System Administration

Re: What values are OK for IPC Subsystem Queuing?

Occasional Visitor

What values are OK for IPC Subsystem Queuing?

We have a tricky performance issue and I am curious as to how IPC Subsystem Queuing may come into play here (if at all).  We are running HP-UX 11i V3 on a BL870c i2 blade which is one node of a five-node SG cluster.  This node talks to an Oracle RAC Cluster.

What we are seeing is that a Cobol job that makes many (as in millions) of calls to the Oracle RDBMS is spending most of its time blocked on SOCKT.  The Oracle folks assure us that the database is performing fine.  Ditto for the storage and networking folks.  On our end, system resources are barely being touched.  The network interface in play here is averaging about 0.8% utilization.

One thing we have discovered is that the IPC Subsystem queue rises to as high as ~300 during the day (when users are on the system) and falls to a sustained 50-70 throughout the evening hours, which is when our batch job is showing poor performance.  The problem is this: I have no idea if those values (during the day and/or at night) are OK or not.  The University of Google has not been of much help.

Also, I am curious if there could possibly be a link between the IPC Subsystem queuing and the poorly performing process spending so much time blocked on SOCKT.

Finally, if there is by some chance a link, how might one reduce the IPC Subsystem Queuing?  Again, the UoG has not been my friend here.


Bill Hassell
Honored Contributor

Re: What values are OK for IPC Subsystem Queuing?

A socket connection is just a simple handshake between processes. In a simple model, a request goes out through the socket connection and then the process waits (or is blocked) for a response. You'll have to identify the processes that appear to be in a blocked state and see what they are waiting for. You'll need to add instrumentation (logs, stats) to the supplier of the responses to see why they are taking so long to reply.
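To illustrate that blocking model, here is a minimal, self-contained Python sketch (not HP-UX-specific; the half-second delay is an arbitrary stand-in for a slow responder such as the database). The requester spends essentially all of its wall time blocked in recv() - the same state a process shows as a SOCKT wait:

```python
import socket
import threading
import time

def slow_responder(sock, delay):
    """Simulate a slow supplier: read the request, wait, then reply."""
    request = sock.recv(64)
    time.sleep(delay)          # the "why is it taking so long" part
    sock.sendall(b"reply to " + request)

# A socketpair stands in for the client/server connection.
client, server = socket.socketpair()
threading.Thread(target=slow_responder, args=(server, 0.5), daemon=True).start()

start = time.monotonic()
client.sendall(b"query")       # request goes out through the socket
reply = client.recv(64)        # caller blocks here until the reply arrives
blocked_for = time.monotonic() - start

print(f"reply={reply!r} blocked_for={blocked_for:.2f}s")
```

The point of instrumenting the responder side is that the blocked caller looks identical whether the delay is in the network, the listener, or the database itself - only timing on the supplier's end tells you which.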

Bill Hassell, sysadmin
Occasional Visitor

Re: What values are OK for IPC Subsystem Queuing?

Bill - thanks for the info.  Since I posted this, I have been reassured by multiple resources that any sustained queuing on the network interface above zero is not good.  We are seeing sustained queuing *MUCH* higher than that (rising as high as ~300 during the day and averaging about 50-70 during evening batch processing).  The other four nodes in this cluster (all BL870c i2's, as opposed to this node, an rx2800) have network queuing peaks of < 3 and averages of zero.  So, something is amiss with this rx2800 -- which was ignited (via DRD rehosting) from one of those blades.  We have another rx2800 in the cluster which is not hosting even a single package at this point, yet is also showing high network queuing.

So, I am thinking that this has to have something to do with the way we built these systems.  Problem is, I have come up empty looking for causes despite digging for two weeks now.  All nodes (including the problem one) show network utilization averaging less than 2%.  All kernel parameters and ndd (tcp) settings are identical, which stands to reason given the way this node was built.

Still, something that eludes me thus far is causing network queuing that is orders of magnitude higher than on the other nodes in the cluster.  Any ideas of where to look would be greatly appreciated.

The BL870c i2 blades (which are fine) all use NICs exposed by FLEX-10 VC modules and use the iexgbe driver.  Conversely, the rx2800 node is using 10Gb ports from AT118A and AT094A cards.  On all nodes, we have APA configured for the public interfaces. 

TIA, jjg

Bill Hassell
Honored Contributor

Re: What values are OK for IPC Subsystem Queuing?

You wrote: ...which was ignited (via DRD rehosting) from one of those blades.

The rx2800 doesn't have a lot in common with a blade, especially at the hardware level. Cloning using Ignite is tricky when there is a hardware mismatch. It may be worth the effort to do a cold install from the DVD so that the resulting kernel and driver setup matches the rx2800. Also, APA requires fairly recent patches. Are your patches up to date within the last 2 years? The patch set for the rx2800 won't be the same as for the blades, especially with networking.

And just to rule out any networking setup issues, run lanadmin -g # where # is your lan ID (0=lan0, 901=lan901, etc). There are two parts to the listing. The second part should be all zeros.

Then transfer a 1-2 GB file using scp. Start with a reference transfer between two blades that perform well, then run the transfer between the rx2800 and a blade. The second half of the lanadmin listing contains the error counts. If they are not all zeros, use lanadmin -c # (# = lan number) to clear the stats, run some production work (where the problem shows up), and then check the lan counts again.

Bill Hassell, sysadmin
Occasional Visitor

Re: What values are OK for IPC Subsystem Queuing?

Bill - thanks once again for the feedback.  I ran the tests you suggested and found no errors reported at all.  The wall times for the operations tested were all similar as well.  That said, here is the output of the lanadmin command from the node at issue, captured immediately after the scp from it (to another node in the cluster) completed:

[root@stlam54p]:/tmp # lanadmin -g 48

                      LAN INTERFACE STATUS DISPLAY
                       Wed, Dec 28,2016  12:04:17

PPA Number                      = 48
Description                     = lan48 HP 10GBase-SFP Release IOCXGBE_B.11.31.1503
Type (value)                    = ethernet-csmacd(6)
MTU Size                        = 9000
Speed                           = 10000000000
Station Address                 = 0x40a8f0b31fdc
Administration Status (value)   = up(1)
Operation Status (value)        = up(1)
Last Change                     = 7844
Inbound Octets                  = 3606040782
Inbound Unicast Packets         = 952779846
Inbound Non-Unicast Packets     = 93931
Inbound Discards                = 1
Inbound Errors                  = 0
Inbound Unknown Protocols       = 149
Outbound Octets                 = 3058005149
Outbound Unicast Packets        = 1733380368
Outbound Non-Unicast Packets    = 16050
Outbound Discards               = 0
Outbound Errors                 = 0
Outbound Queue Length           = 510
Specific                        = 655367

Ethernet-like Statistics Group

Index                           = 2
Alignment Errors                = 0
FCS Errors                      = 0
Single Collision Frames         = 0
Multiple Collision Frames       = 0
Deferred Transmissions          = 0
Late Collisions                 = 0
Excessive Collisions            = 0
Internal MAC Transmit Errors    = 0
Carrier Sense Errors            = 0
Frames Too Long                 = 0
Internal MAC Receive Errors     = 0

Note the high value for network queuing (Outbound Queue Length = 510) -- this seemingly super-high value appears to be the norm for this interface, while the same metric on the lan900 APA NICs of all the other cluster nodes (all blades) is effectively zero.  To be honest, I cannot even state with any degree of certainty that this is manifesting itself as a performance issue -- I don't think it is.  But the value seems so far from normal that I am compelled to follow up and investigate.

I am aware of the issues with using Ignite to migrate from a BL870c i2 to an rx2800 (and I assume that a DRD rehosting operation is, for all intents and purposes, using the innards of Ignite to do its work).  This node, FWIW, has been in use for well over a year now - again, it was only lately that this issue was brought to light as part of an investigation into a larger, over-arching performance issue with the cluster as a whole.  I do have a case open with HPE support to ensure I have all the proper iocxgbe/networking configuration in place.

Bill Hassell
Honored Contributor

Re: What values are OK for IPC Subsystem Queuing?

The lanadmin stats look perfect. So you might want to eliminate a couple of complex factors: the APA subsystem and possible router issues. As you know, the gateway needs special handling for the duplicate IP addresses on multiple NICs. It may be configured OK but may still be contributing to the high queue depth due to a firmware bug or strange GBIC behavior. If this is also on a unique VLAN, there may be some contribution there.

I would try a single NIC with no features as a start. If the queue becomes normal or drops significantly, the issue may be in the network appliances. 

Another approach would be to run Wireshark and take a look at the same traffic on the blades and on the rx2800s. Of particular interest will be the elapsed time from open to close for similar IPC handshakes.
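As a rough stand-in for the Wireshark comparison, a small timing harness like this (plain Python against a local echo peer; the loopback address and round-trip count are placeholders, not your actual endpoints) measures the same thing: elapsed time from open to close for a batch of request/response exchanges. Run the equivalent on a blade and on the rx2800 against the same peer and compare the per-call figures:

```python
import socket
import threading
import time

def echo_server(listener):
    """Accept one connection and echo each request straight back."""
    conn, _ = listener.accept()
    while data := conn.recv(64):
        conn.sendall(data)
    conn.close()

listener = socket.create_server(("127.0.0.1", 0))   # ephemeral port
port = listener.getsockname()[1]
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

# Time the full open -> request/response -> close cycle, the same span
# Wireshark would show from connection open to close.
start = time.monotonic()
with socket.create_connection(("127.0.0.1", port)) as s:
    for _ in range(100):          # 100 round trips on one connection
        s.sendall(b"ping")
        s.recv(64)
elapsed = time.monotonic() - start

print(f"100 round trips in {elapsed * 1000:.1f} ms "
      f"({elapsed * 10:.3f} ms per call)")
```

If the per-call time on the rx2800 is markedly worse than on a blade against the same peer, the delay is on the rx2800's path; if the figures match, the SOCKT waits are coming from the responder side.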

Bill Hassell, sysadmin