Wednesday, July 29, 2015

CLOSE_WAIT Issue

CLOSE_WAIT Problem


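A socket enters the CLOSE_WAIT state when the remote end has closed the connection but the local application has not yet closed its own end. The connection is therefore closed on one side and still open on the other.

CLOSE_WAIT sockets can pile up for many reasons, so the actual cause has to be tracked down case by case.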

Log in to the web server that shows the CLOSE_WAIT problem and run the command below:

netstat -an | egrep 'CLOSE_WAIT' | awk '{print $5}'| sort | uniq -c | sort -nr

Alternatively, save it as a shell script at /usr/local/adm/bin/count_closewait and run count_closewait from then on.

The content of the count_closewait script is shown below:
**begin
#!/bin/sh
# Print the number of CLOSE_WAIT sockets per remote IP+port, highest count first.
clear
echo "Here is the current snapshot of CLOSE_WAIT ... "
echo
echo "Count Application (IP+Port) "
echo "===== ===================== "
netstat -an | egrep 'CLOSE_WAIT' | awk '{print $5}'| sort | uniq -c | sort -nr
echo
**end

Below is a snapshot of the output:

Count Application (IP+Port)
===== =====================
  120 <<ipaddress1>>.9081
  110 <<ipaddress1>>.9087
   90 <<ipaddress1>>.9086
   85 <<ipaddress1>>.9088
   83 <<ipaddress1>>.9083
   88 <<ipaddress2>>.9082
   73 <<ipaddress2>>.9087
   71 <<ipaddress2>>.9084
   60 <<ipaddress2>>.9083

The output shows the number of CLOSE_WAIT sockets for each remote server and port, with the IP+port that has the most CLOSE_WAIT sockets at the top.
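The same tally can be wrapped in small shell functions that match the state column exactly, so that the string CLOSE_WAIT appearing elsewhere on a line is not counted. This is only a sketch: the function names count_close_wait and top_close_wait are made up for illustration, and it assumes the six-column netstat -an layout in which field 5 is the foreign address and field 6 is the state.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -an` style output on stdin and prints
# "count remote_address" lines, highest count first.
# Assumption: field 5 is the foreign address and field 6 is the TCP state.
count_close_wait() {
    awk '$6 == "CLOSE_WAIT" { print $5 }' | sort | uniq -c | sort -rn
}

# Hypothetical helper: prints only the remote address that has the most
# CLOSE_WAIT sockets (the first line of the tally above).
top_close_wait() {
    count_close_wait | awk 'NR == 1 { print $2 }'
}

# Typical use on the web server:
#   netstat -an | count_close_wait
#   netstat -an | top_close_wait
```

Because the functions read from standard input, they can also be run against a saved netstat capture instead of the live system.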

  1. Out of all the outgoing connections from that server, note the IP and port number with the largest number of CLOSE_WAIT sockets.
  1. After identifying the IP and port number, run nslookup to find the name of that server.
    nslookup <ipaddress>
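The netstat output above joins the IP and the port with a dot, so the address has to be split before it can be passed to nslookup. A minimal sketch of that split follows; split_remote is a hypothetical name, and the code assumes the port is whatever follows the last "." or ":" in the address.

```shell
#!/bin/sh
# Hypothetical helper: split "a.b.c.d.port" (or "a.b.c.d:port") into
# "ip port" on one line.
# Assumption: the port is the text after the last "." or ":".
split_remote() {
    addr=$1
    port=${addr##*[.:]}   # drop everything up to the last separator
    ip=${addr%[.:]*}      # drop the last separator and the port
    printf '%s %s\n' "$ip" "$port"
}

# Typical use:
#   set -- $(split_remote 10.0.0.5.9081)
#   nslookup "$1"        # $1 is the IP, $2 is the port
```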

Now log in to the server found in the above step.

  1. Find the process ID of the process running on the port number for which the CLOSE_WAIT sockets were detected. This can be done with the lsof command:
  lsof  | grep  <port no>

The process ID of the corresponding Java/JVM instance will be listed:

java    38863002  was  .....                0t0    TCP *:9083 (LISTEN)
Then run ps -ef | grep <<pid>>. In the above case, run

ps -ef | grep 38863002

and the JVM instance will be found.
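A plain grep on the port number can also match unrelated lines, for example a PID that happens to contain the same digits. The sketch below narrows the lsof output to listeners on exactly that port and prints only their PIDs. listener_pids is a hypothetical name, and the code assumes the usual lsof layout in which the PID is the second field and the line ends with the address followed by (LISTEN).

```shell
#!/bin/sh
# Hypothetical helper: reads lsof output on stdin and prints the PIDs of
# processes listening on the given port.
# Assumptions: field 2 is the PID, the last field is "(LISTEN)" and the
# field before it is the address, ending in ":<port>".
listener_pids() {
    awk -v p=":$1" '
        $NF == "(LISTEN)" {
            name = $(NF - 1)
            # Keep the line only if the address ends with ":<port>".
            if (substr(name, length(name) - length(p) + 1) == p) print $2
        }' | sort -u
}

# Typical use (-i lists network files, -P and -n keep ports and hosts numeric):
#   lsof -i -P -n | listener_pids 9083
#   ps -ef | grep "$(lsof -i -P -n | listener_pids 9083)"
```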

NOTE: Alternatively, you can look up the above port number in the web server's plugin-cfg.xml, which will show the application server instance that the port belongs to.

Now check the SystemOut logs of that JVM for the reason for the CLOSE_WAIT sockets.

  1. Sometimes the CLOSE_WAIT sockets are the result of network connectivity issues. In that case, contact the network team to resolve the issue.
  1. If there is no network connectivity problem, check the applications running on that JVM. These can be found with the info_app command:
      info_app | egrep '<JVM>|Server'

  1. If any application has a problem, the CLOSE_WAIT sockets may be due to that application not responding. Contact the application support team for it and, in the meantime, recycle the web server so that other applications are not affected.
          If all the applications are running fine, go ahead and recycle the web server.

      Notes

-  If at any point the number of CLOSE_WAIT sockets has grown too high, recycle the web server. That should clear the issue.
-  Usually, after one web server is recycled, the number of CLOSE_WAIT sockets on the other web server also increases because of the increased load on it. It is therefore advisable to recycle the other web server after the first one.
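What counts as "too high" is left to judgment above. One way to make it concrete is to compare the total number of CLOSE_WAIT sockets against a fixed limit, as in the sketch below. check_close_wait is a hypothetical name, the limit is whatever value suits the site, and the code again assumes that field 6 of netstat -an is the TCP state.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -an` style output on stdin, counts the
# CLOSE_WAIT sockets and compares the total with a threshold given as $1.
# Returns 1 (and prints a warning) when the total exceeds the threshold.
# Assumption: field 6 is the TCP state.
check_close_wait() {
    threshold=$1
    total=$(awk '$6 == "CLOSE_WAIT" { n++ } END { print n + 0 }')
    if [ "$total" -gt "$threshold" ]; then
        echo "WARN: $total CLOSE_WAIT sockets exceed threshold $threshold"
        return 1
    fi
    echo "OK: $total CLOSE_WAIT sockets (threshold $threshold)"
}

# Typical use (500 is an arbitrary example value, not a recommendation):
#   netstat -an | check_close_wait 500 || echo "consider recycling the web server"
```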

