Tomcat remote monitoring

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port={port for access}
-Dcom.sun.management.jmxremote.rmi.port={port for access}
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname={optional, IP/hostname that remote clients should use to reach the server}

After adding the JVM arguments above to the init.d script, you can monitor Tomcat remotely with VisualVM.
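To check the JMX port from code instead of VisualVM, here is a minimal Java sketch. It assumes the flags above with authenticate=false and ssl=false, and the class name JmxProbe is hypothetical. The URL pattern service:jmx:rmi:///jndi/rmi://host:port/jmxrmi is the standard form for a JVM started with com.sun.management.jmxremote.port. The sketch connects with JMXConnectorFactory and reads heap usage and the live thread count through platform MXBean proxies.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class JmxProbe {
    // Standard RMI connector URL for a JVM started with -Dcom.sun.management.jmxremote.port=<port>
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java JmxProbe <host> <port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(serviceUrl(args[0], Integer.parseInt(args[1])));
        // authenticate=false on the server, so no credentials map is passed (null environment)
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Proxies that forward each getter call to the remote JVM's platform MXBeans
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            System.out.println("heap used (bytes): " + memory.getHeapMemoryUsage().getUsed());
            System.out.println("live threads: " + threads.getThreadCount());
        }
    }
}
```

Usage (hypothetical host and port): java JmxProbe 10.0.0.5 9010. Sometimes the port is open but the connection still hangs or is refused. In that case, java.rmi.server.hostname is usually the first thing to check, because it decides which address the server tells clients to connect back to.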

Tomcat accumulating a large number of CLOSE_WAIT connections

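Recently I ran into Tomcat accumulating a large number of CLOSE_WAIT connections, and I still haven't found the cause.
Here are the steps of a TCP connection teardown: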

    CLIENT                              SERVER

1.  ESTABLISHED                         ESTABLISHED
2.  (Close)
    FIN-WAIT-1   --> <FIN,ACK> -->      CLOSE-WAIT
3.  FIN-WAIT-2   <-- <ACK>     <--      CLOSE-WAIT
4.                                      (Close)
    TIME-WAIT    <-- <FIN,ACK> <--      LAST-ACK
5.  TIME-WAIT    --> <ACK>     -->      CLOSED
    (2 MSL)

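A CLOSE_WAIT on the server side means the server has "passively" received the request to close the connection (the client's FIN). The server sends its own FIN back to the client and moves to LAST-ACK only after it actually closes the socket.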
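In Java, server code learns about the client's FIN when InputStream.read() returns -1. The minimal loopback sketch below reproduces that moment. The class name PassiveCloseDemo is hypothetical. The client closes first, and the server's read() returns -1. The server-side socket then stays in CLOSE_WAIT until close() is called on it. Code that sees -1 or an IOException and never closes the socket is what leaves CLOSE_WAIT entries behind.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class PassiveCloseDemo {
    /** Returns what the server side's read() sees after the client has closed first. */
    static int readAfterPeerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket serverSide = listener.accept()) {
            serverSide.setSoTimeout(5000);          // do not hang forever if the FIN never arrives
            client.close();                          // active close: the client sends FIN
            InputStream in = serverSide.getInputStream();
            int result = in.read();                  // blocks until the FIN arrives, then returns -1
            // From here until serverSide is closed, its TCP state is CLOSE_WAIT.
            // Leaving the try block closes it: our FIN goes out and the socket moves to LAST-ACK.
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server read() returned " + readAfterPeerClose());
    }
}
```

The try-with-resources block is the design choice that matters here. It guarantees close() runs on every path, including exceptions, which is exactly what a CLOSE_WAIT leak is missing.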
CLOSE_WAIT is normally very short-lived. A large number of CLOSE_WAIT connections can have several causes:
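1. The CPU is overloaded and requests are queued waiting for their connections to be closed: this is normal and usually clears up after a while; consider upgrading the hardware.
2. The backend code has an infinite loop, so the server cannot respond to the request: this can basically be ruled out, since code with such an obvious bug is usually caught quickly, unless the engineer is a complete klutz. It can be checked with jstack or VisualVM.
3. Multi-threaded backend code has deadlocked, so the server cannot respond to the request: check every place that uses synchronized; jstack or VisualVM will show it.
4. Backend code that opens its own TCP connections to a remote host does not close them properly: when the server's backend code uses a client class to connect to a remote host and the remote side drops the connection (for example after a timeout), the local socket is closed passively; if the code does not then close it, the local side stays in CLOSE_WAIT.
5. A bug in a particular Tomcat or JVM version prevents sockets from being closed.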
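For cause 3, the JVM can report monitor deadlocks itself through ThreadMXBean.findDeadlockedThreads(). This works as a programmatic counterpart to searching jstack output for a deadlock. Below is a minimal sketch; the class and method names are hypothetical. startDemoDeadlock() exists only to create a deadlock for the check to find. The same ThreadMXBean can also be reached through the remote JMX port configured at the top, via a platform MXBean proxy.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class DeadlockCheck {
    /** Describes every thread the JVM reports as deadlocked; empty list if there is none. */
    static List<String> deadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();   // null when nothing is deadlocked
        List<String> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info != null) {                          // the thread may have died in the meantime
                result.add(info.getThreadName() + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }

    /** Demo only: two daemon threads take the same two monitors in opposite order. */
    static void startDemoDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("demo-1", lockA, lockB, bothHoldFirstLock);
        startLocker("demo-2", lockB, lockA, bothHoldFirstLock);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();                       // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    System.out.println(name + " got both locks");  // never reached
                }
            }
        }, name);
        t.setDaemon(true);                               // daemon threads, so the JVM can still exit
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        startDemoDeadlock();
        Thread.sleep(500);                               // give both threads time to block
        deadlockedThreads().forEach(System.out::println);
    }
}
```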

Command to inspect sockets: ss -tulpn (some fd/process information only appears when run with sudo)
[Description]:
ss : a utility for investigating sockets.
-t : show TCP sockets.
-u : show UDP sockets.
-l : show only listening sockets.
-p : show the process using each socket.
-n : show addresses and ports numerically instead of resolving host and service names.
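Because of the -l flag, ss -tulpn lists only listening sockets. To see the CLOSE_WAIT connections themselves, ss -tn state close-wait filters by TCP state. The kernel also exposes the TCP table in /proc/net/tcp and /proc/net/tcp6. There, the st column holds the state as a hex code, and 08 is CLOSE_WAIT. Below is a minimal Java sketch that groups CLOSE_WAIT sockets by local port; it assumes a Linux host, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CloseWaitCounter {
    // Linux TCP state code for CLOSE_WAIT as printed in the "st" column
    static final String CLOSE_WAIT = "08";

    /** Counts CLOSE_WAIT sockets per local port, given the lines of /proc/net/tcp[6] (header allowed). */
    static Map<Integer, Integer> closeWaitByLocalPort(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Data rows start with "N:"; this skips the header row and blank lines
            if (f.length < 4 || !f[0].endsWith(":")) {
                continue;
            }
            if (!CLOSE_WAIT.equals(f[3])) {
                continue;
            }
            String local = f[1];                         // e.g. "0100007F:1F90" (hex address:hex port)
            int port = Integer.parseInt(local.substring(local.lastIndexOf(':') + 1), 16);
            counts.merge(port, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(Paths.get("/proc/net/tcp")));
        Path tcp6 = Paths.get("/proc/net/tcp6");
        if (Files.exists(tcp6)) {
            lines.addAll(Files.readAllLines(tcp6));      // IPv6 (and IPv4-mapped) sockets are listed separately
        }
        closeWaitByLocalPort(lines).forEach((port, n) ->
                System.out.println("local port " + port + ": " + n + " in CLOSE_WAIT"));
    }
}
```

Most counts may sit on Tomcat's connector port (for example 8080). In that case, the CLOSE_WAITs are on inbound requests. If instead they are scattered across many high, ephemeral ports, they are more likely outbound connections that the backend code never closed (cause 4).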

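(Once you have the pid:)
List the process's socket file descriptors (each one appears as socket:[inode]):
root@hostname:/proc/7112/fd# ls -al | grep socket
Another way, which also shows each socket's local and remote ports:
root@hostname:/proc/7112/fd# lsof -i -a -p 7112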
Reference: http://www.dark-hamster.com/operating-system/linux/ubuntu/show-list-of-listening-services-in-linux-using-ss/
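The ls -al output in /proc/<pid>/fd shows each socket as a link target of the form socket:[inode]. That inode is the same number as the inode column of /proc/net/tcp, so it can tie a file descriptor to a connection and its state. Below is a minimal Java sketch that reads the fd links of a process; it is Linux only, and the class name is hypothetical. Like the shell commands above, it needs to run as the same user as the process or as root.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

final class SocketFds {
    private static final String PREFIX = "socket:[";

    /** Parses a /proc/<pid>/fd link target such as "socket:[123456]"; empty if it is not a socket. */
    static OptionalLong socketInode(String linkTarget) {
        if (!linkTarget.startsWith(PREFIX) || !linkTarget.endsWith("]")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(
                linkTarget.substring(PREFIX.length(), linkTarget.length() - 1)));
    }

    /** Maps fd number to socket inode for every socket the process currently holds open. */
    static Map<Integer, Long> socketFds(String pid) throws IOException {
        Map<Integer, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue;                            // fd was closed while we were iterating
                }
                OptionalLong inode = socketInode(target);
                if (inode.isPresent()) {
                    result.put(Integer.parseInt(fd.getFileName().toString()), inode.getAsLong());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";   // e.g. 7112 for the Tomcat process
        socketFds(pid).forEach((fd, inode) ->
                System.out.println("fd " + fd + " -> socket inode " + inode));
    }
}
```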

----
What causes the problem:
http://jschu.blog.51cto.com/5594807/1732414

http://m.myexception.cn/open-source/921974.html

http://serverfault.com/questions/160558/how-to-not-get-so-many-apache-close-wait-connections

http://ahuaxuan.iteye.com/blog/657511

Killing CLOSE_WAIT connections by brute force:
https://github.com/rghose/kill-close-wait-connections

https://www.experts-exchange.com/questions/20568402/How-to-clear-CLOSE-WAIT-state-of-a-TCP-connection.html

http://www.shellhacks.com/en/HowTo-Kill-TCP-Connections-in-CLOSEWAIT-State