Operating System - HP-UX

Interface throughput - am I hitting the limit?

 
SOLVED
Richard I Curtis
Frequent Advisor

Hi,
I have a couple of boxes connected to 100Mbit switch ports via A9784-60002 adapters. We will be moving to gigabit switch ports at some point, but at present the ports are at 100Mb, so the HP-UX boxes (11i v1) are hard-coded at 100/full.

We are getting some errors in our apps which transfer files between these two hosts (no routing - both into the same switch), but the error messages are not very useful ("Unable to get ACK of handshake message").

I have used the attached script (sourced from these forums in the past) to measure throughput, and notice some numbers which I cannot explain.

Over a 24 hour period, the peak inbound rate was 23421 KB/sec, and the peak outbound was 15431 KB/sec.

On a 100Mb interface, how could I be pulling in 23 MB/sec?
Is the logic in the attached script correct, or am I really hitting the limits of the interface and that is what is likely causing the application errors?
Are there any other counters (netstat -n tcp?) that I should be looking at?

Thanks in advance for any suggestions

Richard
13 REPLIES
BUPA IS
Respected Contributor

Re: Interface throughput - am I hitting the limit?

Hello,
Because of the slight delay in getting the numbers off the card with lanadmin -g mibstats, your loop could take more than one second to execute, especially if the card is busy.
However, to see whether the interface itself has any errors, please post the output of
lanadmin -g mibstats_ext n
where n is the interface's PPA number.
Mike
Help is out there always!!!!!
Richard I Curtis
Frequent Advisor

Re: Interface throughput - am I hitting the limit?

Hi Mike,
Please find the output below..


# lanadmin -g mibstats_ext 6

LAN INTERFACE EXTENDED MIB STATISTICS DISPLAY
Thu, Aug 27,2009 15:35:50

Interface Name = lan6
PPA Number = 6
Description = lan6 HP PCI-X 1000Base-T Release B.11.11.25
Interface Type(value) = ethernet-csmacd(6)
MTU Size = 1500
Speed = 100 Mbits/Sec
Station Address = 0x0014c29c263c
Administration Status = up
Operation Status = up
Last Change = Thu Jan 15 13:19:55 2009
Inbound Octets = 8290712634686
Inbound Unicast Packets = 16353160263
Inbound Multicast Packets = 0
Inbound Broadcast Packets = 5977676
Inbound Discards = 0
Inbound Errors = 0
Inbound Unknown Protocols = 233
Outbound Octets = 12992072699912
Outbound Unicast Packets = 16278796164
Outbound Multicast Packets = 0
Outbound Broadcast Packets = 95888
Outbound Discards = 0
Outbound Errors = 0
Counter Discontinuity Time = Thu Jan 15 13:19:08 2009
Physical Promiscuous Mode = false
Physical Connector Present = true
Interface Alias =
Link Up/Down Trap Enable = enabled

Ethernet Specific Extended Statistics Display

Index = 7
Alignment Errors = 0
FCS Errors = 0
Internal MAC Transmit Errors = 0
Frame Too Long Errors = 0
Internal MAC Receive Errors = 0
Symbol Errors = 0
Single Collision Frames = 0
Multiple Collision Frames = 0
SQE Test Errors = 0
Deferred Transmissions = 0
Late Collisions = 0
Excessive Collisions = 0
Carrier Sense Errors = 0
Control Field Errors = 0
Multicasts Accepted = 0
Duplex Status = fullDuplex
Rate Control Ability = false
Rate Control Status = rateControlOff
Collision Count = 0
Collision Frequency = 0
#

Bill Hassell
Honored Contributor

Re: Interface throughput - am I hitting the limit?

Make sure you clear the lan card stats on a regular basis. The numbers will quickly overflow the simple 32-bit integer arithmetic in the sample script, and the results will silently be wrong once a number exceeds 2 billion. Change the script to use bc for all your arithmetic.

You can expect (with no collisions) that a 100 Mbit link can transfer about 80% of the wire speed: 100 Mbit/s gives roughly 80 Mbit/s of throughput, or about 10 MByte/s. You can verify this with ftp. Any use of NFS or scp/rcp, etc. will be significantly slower.
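
To put numbers on the ceiling, here is a minimal sketch of the conversion from link speed to KB/s. The function name link_capacity_kb and the efficiency argument are made up for illustration, and the 80% figure is only the rule of thumb above, not a measured value. It assumes the script's figures are octets divided by 1024.

```shell
#!/bin/sh
# link_capacity_kb MBITS [EFFICIENCY]   (hypothetical helper)
# Convert a nominal link speed in Mbit/s into KB/s (1 KB = 1024 bytes),
# optionally scaled by an assumed efficiency factor (default 1.0).
link_capacity_kb() {
    awk -v mbits="$1" -v eff="${2:-1.0}" 'BEGIN {
        bytes_per_sec = mbits * 1000000 / 8      # 8 bits per octet
        printf "%d\n", int(bytes_per_sec * eff / 1024)
    }'
}

link_capacity_kb 100        # raw wire speed of a 100 Mbit link
link_capacity_kb 100 0.8    # with the assumed 80% efficiency
```

A 100 Mbit link tops out at 12.5 million bytes per second, which is roughly 12207 KB/s, so a reported peak of 23421 KB/s cannot be a real rate on that interface and points at the arithmetic in the script instead.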


Bill Hassell, sysadmin
Hein van den Heuvel
Honored Contributor
Solution

Re: Interface throughput - am I hitting the limit?

Richard,

Thanks for the refreshingly complete problem statement.

Bill is of course correct.
You are running into the shell's 32 bit limitation.
All you have to do to see the problem is to manually execute a one-line let x or y and echo $x or $y. You'll see a number of 10 digits or fewer, often negative.
For example, with the values from your output:
# let y=$(echo "8290712634686")
# echo $y
1425753406
# let y2=$(echo "12992072699912")
# echo $y2
-203370488
# let y2=$(echo "12992072699922")
# echo $y2
-203370478
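
The wrapped values in that session can be reproduced without a 32-bit shell. This is only a sketch that assumes the shell keeps the low 32 bits and treats them as a signed integer; wrap32 is a made-up helper name, and awk is used because it does its arithmetic in floating point, which holds these counters exactly.

```shell
#!/bin/sh
# wrap32 N   (hypothetical helper)
# Show what N becomes when truncated to a signed 32-bit integer.
wrap32() {
    awk -v n="$1" 'BEGIN {
        v = n % 4294967296                      # keep the low 32 bits (2^32)
        if (v >= 2147483648) v -= 4294967296    # top bit set: negative value
        printf "%d\n", v
    }'
}

wrap32 8290712634686     # Inbound Octets from the posted output
wrap32 12992072699912    # Outbound Octets from the posted output
```

The two calls give 1425753406 and -203370488, the same numbers the let commands printed.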

So you could do bc, but why not let AWK do more of the work. You woke it up anyway!

If you want to stay close to the original scheme, then I would suggest you try something along the lines of the following example (working but simplified):

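---------- test --------------
# Poll the interface once a second and print the rates.
# The shell only stores the old counters; awk does all the math.
# The first line printed is meaningless because the old values start at 0.
let PPA=1
let in=0
let ou=0
while true
do
sleep 1
lanadmin -g mibstats $PPA | awk -v o_in=$in -v o_ou=$ou -f speed.awk | read in ou r_in r_ou
echo "Lan $PPA: ${r_in} KB/s inbound, ${r_ou} KB/s outbound"
done

----------- speed.awk ----------
# n_ = new counter, o_ = old counter (passed in with -v), r_ = rate in KB/s
/^Inbound Octets/ {n_in = $4}
/^Outbound Octets/ {n_ou = $4
r_in = (n_in - o_in) / 1024
r_ou = (n_ou - o_ou) / 1024
print n_in, n_ou, int(r_in), int( r_ou)
}
-----------------------------------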

So what this does is have the shell remember the old values and display the rates.
awk uses those old values (explicitly passed along) in the math, and outputs both the rates and the new values, which become the old values for the next pass.

No double grep
No bad math
Somewhat tell-tale variable names.
- Why try to remember that y is new and x = old and y is in and y2 is out?
Call a spade a spade.
Here (albeit cryptically):
n_ for new, o_ for old, r_ for rate.
_in for in, _ou for out. Duh!

This still leaves the one-second interval imprecise.

Personally I would put the whole lot in awk or perl, and have it do the math, looping, waiting and pretty-printing. But that's me, trying to avoid learning a platform dependent shell.

hth,
Hein.
BUPA IS
Respected Contributor

Re: Interface throughput - am I hitting the limit?

Richard ,

Refining the script, I suppose you could report the stats before clearing the registers, to see whether there were any errors you had not noticed:

lanadmin -g mibstats_ext ppa
lanadmin -c ppa


I would then suggest that you only take a sample every 10 seconds, or even once a minute. This will reduce the effect of the delay in the lookup (remember to divide by 1024*secs).
Then report the stats again at the end to check for errors.
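
As a rough sketch of the "divide by 1024*secs" step, the helper below turns two counter readings and the sample interval into KB/s. The name rate_kb and the second counter value in the example are invented for illustration. The math is done in awk so the full-size octet counters never pass through 32-bit shell arithmetic.

```shell
#!/bin/sh
# rate_kb OLD_OCTETS NEW_OCTETS SECS   (hypothetical helper)
# Average rate in KB/s between two readings taken SECS seconds apart.
rate_kb() {
    awk -v old="$1" -v new="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) exit 1                   # refuse a zero or negative interval
        printf "%d\n", int((new - old) / (1024 * secs))
    }'
}

# First value is the Inbound Octets figure posted earlier; the second is an
# assumed reading taken 10 seconds later (102400000 octets more).
rate_kb 8290712634686 8290815034686 10
```

With those example readings the result is 10000 KB/s. If the loop can measure how long each pass really took, passing that elapsed time as the third argument would be more accurate than passing the nominal sleep value.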

Back to the original problem:
Check that the other server's network card is error free.
One other thing caught us out on one occasion: check that the switch is not overloaded. In our case a backup was running between another pair of servers on the same switch. This saturated the switch segment, so the switch delayed and occasionally dropped packets, including ACKs, for all ports on that segment.
Once you get the switch upgraded to gigabit, don't forget to make both ends of each link auto-negotiate.

I hope this is of some help

Mike
Help is out there always!!!!!
Hein van den Heuvel
Honored Contributor

Re: Interface throughput - am I hitting the limit?

As Mike indicates, 10 seconds will reduce any sampling time errors significantly.

Also, with a 10-second sample it becomes more reasonable to pre-divide the number awk found so that it fits in the 32-bit range, as there is 10x less loss of precision.

Instead of :

sleep 1
grep Outbound|awk '{print $4}')
:
let t2=$t2/1024

Use:
sleep 10
... grep Outbound|awk '{print $4/(1024*128)}')
:
let t2=$t2*8/10

fwiw,
Hein.

Richard I Curtis
Frequent Advisor

Re: Interface throughput - am I hitting the limit?

Thanks for all the suggestions guys. I got caught up in other stuff yesterday but will try some of the suggestions on Tuesday and report back.

Points to follow..
Richard I Curtis
Frequent Advisor

Re: Interface throughput - am I hitting the limit?

Hein,
I am a little confused by the suggested changes you described.

After changing to use a sleep 10, why do we divide the number of octets by (1024*128)?

Correct me if I am wrong, but octets = bytes, so if we just did a divide by 1024, we would end up with kilobytes, but I am lost as to why we have *128 in there?

It's probably me being stupid but I just can't see it :)

Regards

Richard
Hein van den Heuvel
Honored Contributor

Re: Interface throughput - am I hitting the limit?

The 32 - 64 bit number game is tricky.
You need to divide by some amount, but not by too much, because you'll lose too much precision when you later take the difference between relatively large numbers.

Specifically, pre-dividing by 1024 is not good enough.
So then it is tempting to just pre-divide by 1024*1024 and work in megabytes. But because the shell works with whole numbers, that's too coarse, even for a 10 second range (imho).

So I suggested dividing by something less than 1MB, but more than 1KB, and took 1024*8.
Then turn it back into KB after the subtraction by multiplying by 8, and correct for the 10 seconds by dividing by 10.
That's the: let t2=$t2*8/10

So the whole complexity is there to get enough precision, yet never exceed 32 bits.

Now admittedly I was thinking INT math, but awk will do fractions, so I should have written: int( $4/(1024*128) )

and... I did not try... just 'thinking fingers' and hoping for the best. I may well have done it wrong.

Makes a little more sense?
Hein.

Richard I Curtis
Frequent Advisor

Re: Interface throughput - am I hitting the limit?

Hein,
It makes sense to me, but only if we were doing int( $4/(1024*128) ) and then again at the end doing let t2=$t2*128/10, but your example has us doing $4/(1024*128) but at the end, we do let t2=$t2*8/10

Was the $4/(1024*128) a typo, or am I still missing the point?

Thanks for your help and time on this one!
Hein van den Heuvel
Honored Contributor

Re: Interface throughput - am I hitting the limit?

Looks like I have the details wrong.
The 8 was supposed to go with the 128 to make another 1024... but then it needs to be a division, not a multiplication. Just fill in some example numbers from your systems and verify that the numbers are in reasonable ranges: more than a few hundred to get precision, less than a billion to stay clear of 32-bit overflows.
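
Filling in example numbers could look like the sketch below. It assumes a pre-divisor of 1024*128 and the consistent conversion Richard described (multiply the difference by 128, divide by the interval). The names PRESCALE, prescale and scaled_rate_kb are invented for illustration, and the second scaled reading is an assumed value.

```shell
#!/bin/sh
PRESCALE=131072   # 1024*128 octets, so scaled counters are in units of 128 KB

# prescale OCTETS: shrink a raw counter in awk so it fits in 32 bits.
prescale() {
    awk -v n="$1" -v d="$PRESCALE" 'BEGIN { printf "%d\n", int(n / d) }'
}

# scaled_rate_kb OLD NEW SECS: KB/s from two prescaled readings.
# Each unit is 128 KB, so multiply by 128 and divide by the interval.
scaled_rate_kb() {
    echo $(( ($2 - $1) * 128 / $3 ))
}

prescale 12992072699912                  # Outbound Octets from the posted output
scaled_rate_kb 99121648 99121728 10      # assumed second reading, 80 units later
```

The scaled counter comes out as 99121648, comfortably under a billion, and the 80-unit difference over 10 seconds works out to 1024 KB/s. Dividing the difference by 8 instead of multiplying by 128 would give MB rather than KB, which seems to be where the 8 came from.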
Good luck!
Hein

rick jones
Honored Contributor

Re: Interface throughput - am I hitting the limit?

I never have been fond of hardcoding duplex settings, even at 100Mbit. I've not had problems with autoneg and HP-UX system NICs and HP ProCurve switches... perhaps my autoneg life has been charmed. Anyway...

That there are zeros for outbound discards suggests to me that even if you are close to saturation, you have not exceeded it for any length of time greater than it would take to fill the driver's transmit queue. I thought there was an Outbound Queue Length in the mibstats - didn't notice it in your output. If one is willing to abuse Little's Law:

AvgQueueLen = Utilization/(1-Utilization)

then averaging the queue length over time will give some idea of the outbound utilization of the NIC.
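
Turning that relation around gives Utilization = AvgQueueLen/(1+AvgQueueLen). A minimal sketch of the inversion follows; util_from_queue is a made-up name, and the result is only as good as the approximation above, so treat it as a rough indicator rather than a measurement.

```shell
#!/bin/sh
# util_from_queue AVG_QUEUE_LEN   (hypothetical helper)
# Invert AvgQueueLen = U/(1-U) to estimate utilization U as a fraction.
util_from_queue() {
    awk -v q="$1" 'BEGIN { printf "%.2f\n", q / (1 + q) }'
}

util_from_queue 0    # empty queue on average
util_from_queue 1    # one packet queued on average
```

An average queue length of 1 would correspond to about 50% utilization under this model, and an average of 0 to an idle link. The input needs to be an average over many samples, not a single reading.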
there is no rest for the wicked yet the virtuous have no pillows
Richard I Curtis
Frequent Advisor

Re: Interface throughput - am I hitting the limit?

Yesterday we were able to restart the app which was showing these errors, and since then we have had no recurrences of the error message, so I am now thinking it had nothing to do with throughput.

Thanks to all of the feedback here I have now amended the script and am getting consistent numbers back.

Rick,
The lack of outbound queue length in the stats I posted seems to be because I used mibstats_ext. If I do the same with mibstats, the outbound queue is shown. Sometimes it is non-zero, but never more than 2.

Again, thanks for everyone's help!