
Outbound Queue Length=2, Who?

 
SOLVED
ronlevyca
Frequent Advisor



We have a problem on an rp7420 running HP-UX 11.23, where a 2-port gigabit aggregate is the main outbound network link. This machine is the Oracle RAC database server, and at the moment GlancePlus continually reports a network bottleneck on it. Using lanadmin -g mibstats 900, the main indication of trouble is Outbound Queue Length=2, whereas the Dev and QA boxes, which are of course under less load but have a similar quantity of packet output, show Outbound Queue Length=0.

The web guys in charge of the application boxes that hit this database server complain that there's some delay in getting their responses back. What I want to know is: is there any way of determining WHICH machines are blocking up my output? Whose connections go through fast and whose go slow? Is there any way of connecting this with, say, my netstat -a output and seeing what kind of traffic goes to each machine or connection, so we can troubleshoot this further?
Steven E. Protter
Exalted Contributor

Re: Outbound Queue Length=2, Who?

Shalom,

Run this set on the rp7420 systems and other HP-UX based systems.

http://www.hpux.ws/?p=6

It will let you know if there really is a performance issue.

You might be able to resolve network issues with the ndd utility or /etc/rc.config.d/nddconf settings.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
ronlevyca
Frequent Advisor

Re: Outbound Queue Length=2, Who?

So now that I have the data, what should I be doing with it? Uploading all the various files?
rick jones
Honored Contributor

Re: Outbound Queue Length=2, Who?

The driver transmit queue (the outbound queue) is an equal opportunity delay. It means that the quantity of traffic being sent out the NIC is consistently close to the limits of the NIC, so it will affect any destination reached via that NIC.

Now, if you want to find out which destinations are the most common for traffic through that NIC, you need to take a packet trace with something like tcpdump. If possible, you might want to do that from a third system connected to the switch, where the switch has a "monitor" port configured to mirror the traffic on the port(s) to your server. That way you have less of an effect on the server itself.
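
Once a trace exists, tallying it per destination can be sketched roughly as below. This is only an illustration, assuming the trace was printed as text in the usual IPv4 style of tcpdump -n -q ("IP src.port > dst.port: tcp N"). The server address 10.0.0.1 and the sample lines are made up, and tally_destinations is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Assumed line shape (IPv4, numeric addresses):
#   12:00:00.000001 IP 10.0.0.1.1521 > 10.0.0.5.43210: tcp 1448
ENDPOINTS_RE = re.compile(r"\bIP (\S+) > (\S+?):")
LENGTH_RE = re.compile(r"\b(?:length|tcp|udp) (\d+)")


def strip_port(endpoint):
    """Drop a trailing .port from an IPv4 endpoint such as 10.0.0.5.43210."""
    parts = endpoint.split(".")
    return ".".join(parts[:4]) if len(parts) == 5 else endpoint


def tally_destinations(lines, server_ip):
    """Count packets and payload bytes sent by server_ip, per destination."""
    packets = Counter()
    payload_bytes = Counter()
    for line in lines:
        match = ENDPOINTS_RE.search(line)
        if not match:
            continue  # not an IPv4 line in the assumed format
        src = strip_port(match.group(1))
        dst = strip_port(match.group(2))
        if src != server_ip:
            continue  # only outbound traffic from the server is of interest
        packets[dst] += 1
        length = LENGTH_RE.search(line)
        if length:
            payload_bytes[dst] += int(length.group(1))
    return packets, payload_bytes


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    "12:00:00.000001 IP 10.0.0.1.1521 > 10.0.0.5.43210: tcp 1448",
    "12:00:00.000002 IP 10.0.0.1.1521 > 10.0.0.5.43210: tcp 1448",
    "12:00:00.000003 IP 10.0.0.1.1521 > 10.0.0.6.50000: tcp 100",
    "12:00:00.000004 IP 10.0.0.5.43210 > 10.0.0.1.1521: tcp 0",
]

packets, payload_bytes = tally_destinations(SAMPLE_LINES, "10.0.0.1")
for dst, count in packets.most_common():
    print(f"{dst}: {count} packets, {payload_bytes[dst]} bytes")
```

Sorting by packet count rather than bytes is a deliberate choice here, since a destination receiving many small packets can matter as much as one receiving a few large ones.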
there is no rest for the wicked yet the virtuous have no pillows
ronlevyca
Frequent Advisor

Re: Outbound Queue Length=2, Who?

We have OpenView running and watching the router that these machines connect to. My network admin tells me that neither of the two ports of the aggregation link should be close to its maximum. He says that the average throughput per 1 Gb port is 12 Mbps, with a momentary spike on one system of 450 Mbps incoming and another on the other system of 350 Mbps incoming. But normally the rate is low.

But I still get a continuous network bottleneck warning in Glance and that continuous outbound queue length.

rick jones
Honored Contributor

Re: Outbound Queue Length=2, Who?

12 Mbps doesn't sound like all that much. Still, all the world is not necessarily bits per second. It is also packets per second. So, you might ask the network admin about the packets per second.
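
To put rough numbers on that: the same 12 Mbps can be very different packet rates depending on frame size. The sketch below is just arithmetic. The frame sizes are assumptions (nothing in this thread says what the real sizes are), and the 20 bytes of per-frame overhead used for the line-rate figure is the standard Ethernet preamble plus inter-frame gap.

```python
def packets_per_second(bits_per_second, frame_bytes, overhead_bytes=0):
    """Packet rate implied by a bit rate and a fixed frame size."""
    return bits_per_second / ((frame_bytes + overhead_bytes) * 8)


AVG_BPS = 12_000_000  # the 12 Mbps average quoted by the network admin

# Assumed frame sizes, for illustration only.
for frame_bytes in (64, 512, 1500):
    rate = packets_per_second(AVG_BPS, frame_bytes)
    print(f"{frame_bytes:>5}-byte frames: {rate:,.0f} packets/s")

# Upper bound for comparison: minimum-size frames at gigabit line rate,
# counting 20 bytes of preamble and inter-frame gap per frame.
line_rate = packets_per_second(1_000_000_000, 64, overhead_bytes=20)
print(f"gigabit line rate, 64-byte frames: {line_rate:,.0f} packets/s")
```

So small packets alone do not turn 12 Mbps into anything near a gigabit port's limit, which is one more reason to measure the packet rate rather than guess at it.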

Also, it is entirely possible that the outbound queue length stat for an aggregate is "off", i.e., buggy. Some ideas on how to check:

*) if the machine can be idle, really idle - does glance or lanadmin -g mibstats still show an outbound queue of 2?

*) if the aggregate can be split and the traffic sent over just a plain old NIC, does glance/lanadmin still show an outbound queue of 2?
there is no rest for the wicked yet the virtuous have no pillows
ronlevyca
Frequent Advisor

Re: Outbound Queue Length=2, Who?

You're right that it doesn't seem like a lot. The spikes are only occasional, but the output queue length and the alarm in Glance are continual.

This database server is the backend for a website that is continually crawled by zillions of web robots, so it never goes truly quiet. Night-time usage is about 70% of peak usage.

One possible problem is that it has 5 front-end application machines, and before yesterday they were all 100 Mbit machines. We have upgraded 3 of them to gigabit Ethernet, but it did not seem to change matters.

As it is a production box, we can't really un-aggregate it for easy testing.
rick jones
Honored Contributor
Solution

Re: Outbound Queue Length=2, Who?

The queue being described by the Outbound Queue Length metric is a queue between the system and the NIC. Modulo the enabling of pause frames between the NIC and the switch and backpressure through switches, *remote* systems being 100 Mbit, 1000 Mbit or even 10 Mbit is a don't care as far as this queue is concerned - it is there simply to "buffer" between the host and the NIC. The host puts packets onto the queue, the NIC drains them as fast as it can put them onto the cable.
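
A toy model of that queue, for what it is worth. It is a sketch, not how the driver is implemented: time is cut into ticks, the queue is unbounded (a real one is not), and the arrival and drain numbers are invented. Note that the speed of the remote systems appears nowhere in it.

```python
def simulate_queue(arrivals_per_tick, drain_per_tick):
    """Return the queue depth left at the end of each tick."""
    depth = 0
    samples = []
    for arrivals in arrivals_per_tick:
        depth += arrivals                    # host puts packets onto the queue
        depth -= min(depth, drain_per_tick)  # NIC drains as many as it can
        samples.append(depth)
    return samples


# Invented loads; the NIC drains 2 packets per tick in every case.
light = simulate_queue([1] * 8, drain_per_tick=2)
bursty = simulate_queue([6, 0, 0, 0, 6, 0, 0, 0], drain_per_tick=2)
saturated = simulate_queue([3] * 5, drain_per_tick=2)

print("light:    ", light)
print("bursty:   ", bursty)
print("saturated:", saturated)
```

Under the light load the depth stays at zero, a burst shows up as a queue that drains away again, and only an offered load persistently above the drain rate keeps the queue non-empty at every sample.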

If it is _really_ always two, it means the "NIC" is more or less completely saturated in its outbound traffic.

Glance uses packets-per-second rates, as well as the average queue depth, to guesstimate when a NIC is saturated.
there is no rest for the wicked yet the virtuous have no pillows