During some monitoring work I niticed taht top shows the process 12013 using 800% CPU (We have a 12 CPU machine) and had to find out what’s happening !!
I used top to check the process ID and then the threads taking the CPU
ps -ef |grep 12013 weblogic 12013 11836 23 Apr03 ? 03:38:31 /app/weblogic/Java/jdk/bin/java -server -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dweblogic.Name=msD2-02 -Djava.security.policy=/app/weblogic/Middleware/wlserver/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Dweblogic.system.BootIdentityFile=/app/weblogic/domains/myDomain/servers/msD2-02/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.nmservice.RotationEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=false -Dweblogic.ReverseDNSAllowed=false -Xms2048m -Xmx2048m -XX:MaxMetaspaceSize=512m -Dcom.sun.xml.ws.api.streaming.XMLStreamReaderFactory.woodstox=true -Dcom.sun.xml.ws.api.streaming.XMLStreamWriterFactory.woodstox=true -Djava.io.tmpdir=/app/weblogic/tmp/myDomain/msD2-02 -Ddomain.home=/app/weblogic/domains/myDomain -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.protocolVersion=TLS1 -Dweblogic.security.disableNullCipher=true -Djava.security.egd=file:///dev/./urandom -Dweblogic.security.allowCryptoJDefaultJCEVerification=true -Dweblogic.nodemanager.ServiceEnabled=true -Djava.endorsed.dirs=/app/weblogic/Java/jdk1.8.0_45/jre/lib/endorsed:/app/weblogic/Middleware/wlserver/../oracle_common/modules/endorsed -da -Dwls.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.home=/app/weblogic/Middleware/wlserver/server -Dweblogic.management.server=https://10.10.1.1:7001 -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true weblogic.Server top -H -b -p 12013 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12624 weblogic 20 0 10.5g 3.6g 30m S 83.1 11.5 18053:38 java 2540 weblogic 20 0 10.5g 3.6g 30m S 79.4 11.5 3:11.62 java 2591 weblogic 20 0 10.5g 3.6g 30m S 79.4 11.5 3:08.02 java 12436 weblogic 20 0 10.5g 3.6g 30m S 9.2 11.5 3:14.59 java
Now that I have the Unix threads PID, I will get the thread dump of the WebLogic Server process and try to figure out the culprits. To generate the thread dump I used the jstack java tool.
jstack -l 12013 > $HOME/jstack_100_CPU.txt
The Thread dump provides the threads with the ID in Hexadecimal, thus those provided from the top needs to be converted:
[myDomain]$ printf '%x\n' 12624 3150 [myDomain]$ printf '%x\n' 2540 9ec [myDomain]$ printf '%x\n' 2591 a1f [myDomain]$
"LDAPConnThread-1172 ldaps://ldapserver.dbi-lab.com:3636" #167119 daemon prio=5 os_prio=0 tid=0x00007f5444087800 nid=0xa1f runnable [0x00007f5409e88000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked (a sun.nio.ch.Util$2) - locked (a java.util.Collections$UnmodifiableSet) - locked (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729) at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462) at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424) at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked (a java.io.BufferedInputStream) at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source) at netscape.ldap.LDAPConnThread.run(Unknown Source) at java.lang.Thread.run(Thread.java:745) "[STUCK] ExecuteThread: '302' for queue: 'weblogic.kernel.Default (self-tuning)'" #326 daemon prio=1 os_prio=0 tid=0x00007f5560376000 nid=0x3150 runnable [0x00007f543636e000] java.lang.Thread.State: RUNNABLE at java.io.InputStream.skip(InputStream.java:224) at weblogic.utils.http.HttpChunkInputStream.skip(HttpChunkInputStream.java:287) at weblogic.utils.http.HttpChunkInputStream.skipAllChunk(HttpChunkInputStream.java:497) at weblogic.servlet.internal.ServletInputStreamImpl.ensureChunkedConsumed(ServletInputStreamImpl.java:51) at weblogic.servlet.internal.ServletRequestImpl.skipUnreadBody(ServletRequestImpl.java:217) at weblogic.servlet.internal.ServletRequestImpl.reset(ServletRequestImpl.java:169) at weblogic.servlet.internal.HttpConnectionHandler.prepareRequestForReuse(HttpConnectionHandler.java:258) at weblogic.servlet.internal.HttpConnectionHandler.requeue(HttpConnectionHandler.java:665) at weblogic.servlet.internal.VirtualConnection.requeue(VirtualConnection.java:332) at weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1657) at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1582) at weblogic.servlet.provider.ContainerSupportProviderImpl$WlsRequestExecutor.run(ContainerSupportProviderImpl.java:255) at weblogic.work.ExecuteThread.execute(ExecuteThread.java:311) at weblogic.work.ExecuteThread.run(ExecuteThread.java:263) "LDAPConnThread-1171 ldaps://ldapserver.dbi-lab.com:3636"" #167118 daemon prio=5 os_prio=0 tid=0x00007f5544048000 nid=0x9ec runnable [0x00007f540a28c000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method) at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317) at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:193) - locked (a java.lang.Object) at java.nio.channels.spi.AbstractSelector$1.interrupt(AbstractSelector.java:213) at java.nio.channels.spi.AbstractSelector.begin(AbstractSelector.java:219) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:78) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked (a sun.nio.ch.Util$2) - locked (a java.util.Collections$UnmodifiableSet) - locked (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at weblogic.socket.NIOSocketMuxer$NIOInputStream.readInternal(NIOSocketMuxer.java:802) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:746) at weblogic.socket.NIOSocketMuxer$NIOInputStream.read(NIOSocketMuxer.java:729) at weblogic.socket.JSSEFilterImpl.readFromNetwork(JSSEFilterImpl.java:462) at weblogic.socket.JSSEFilterImpl.read(JSSEFilterImpl.java:424) at weblogic.socket.JSSESocket$JSSEInputStream.read(JSSESocket.java:98) at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at java.io.BufferedInputStream.read(BufferedInputStream.java:265) - locked (a java.io.BufferedInputStream) at netscape.ldap.ber.stream.BERElement.getElement(Unknown Source) at netscape.ldap.LDAPConnThread.run(Unknown Source) at java.lang.Thread.run(Thread.java:745)
This looks to be an issue with the LDAP server or the network.
The WebLogic java threads loops on the LDAP connection to get data. This loop should fail in a timeout depending on the Authentication provider LDAP connection timeout settings. But it looks there is an issue with the JDK that reinitiates the connection in the sun.nio.ch.EPollArrayWrapper class.
Once the LDAP connection was back the WebLogic Domain could be restarted properly and the service was back.
A possible workaround could be to not use the weblogic.socket.NIOSocketMuxer but the weblogic.socket.PosixSocketMuxer. This workaround is described in Oracle Suport note:2128032.1
Or a better solution is to upgrade to the latest WebLogic Software version and apply the latest PSU where this issue is fixed.
Cet article WebLogic Server process takes 100% CPU est apparu en premier sur Blog dbi services.