Operating System - HP-UX

Network Bottleneck probability= 100%

 
uadm26
Super Advisor


Hi, guys
OS: HP-UX 11.23
It seems that I have network problems on my IA-64 HP Superdome server. OVO reports:

"Network Bottleneck probability= 100.00%"

And running "dmesg" shows this:
LLT INFO V-14-1-10205 link 1 (lan4) node 2 in trouble
LLT INFO V-14-1-10205 link 2 (lan5) node 2 in trouble
LLT INFO V-14-1-10024 link 2 (lan5) node 2 active
LLT INFO V-14-1-10024 link 1 (lan4) node 2 active
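A minimal Python sketch for tallying how often each LLT link goes "in trouble" and back to "active", assuming the dmesg output has been saved as text. The regular expression only covers the two message formats quoted above; other LLT messages are ignored.

```python
import re
from collections import Counter

# Matches lines such as:
# "LLT INFO V-14-1-10205 link 1 (lan4) node 2 in trouble"
LLT_RE = re.compile(
    r"LLT INFO V-\d+-\d+-\d+ link (\d+) \((\w+)\) node (\d+) (in trouble|active)"
)

def llt_link_events(lines):
    """Count events per (link number, interface, peer node, state)."""
    events = Counter()
    for line in lines:
        m = LLT_RE.search(line)
        if m:
            link, nic, node, state = m.groups()
            events[(int(link), nic, int(node), state)] += 1
    return events
```

Run it over a longer stretch of dmesg or syslog to see whether the trouble/active pairs are a one-off or recur regularly on lan4 and lan5.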

What can I do to find out whether this is a minor problem or a big one?

Thanks for all…
JT
Ivan Ferreira
Honored Contributor

Re: Network Bottleneck probability= 100%

Check network statistics with netstat and lanadmin. Use glance to monitor the network performance. Use tcpdump to diagnose the problem. Check your network switch port configuration and logs.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Peter Godron
Honored Contributor

Re: Network Bottleneck probability= 100%

JT,
Check your alarmdef files for the condition that triggers the alarm:
/var/opt/perf/alarmdef
or
/opt/perf/newconfig/alarmdef

That will give you a better idea if the problem is serious.

Are you doing anything unusual over the network?

Coolmar
Esteemed Contributor

Re: Network Bottleneck probability= 100%

Please see the following link; I hope it helps:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1038934

uadm26
Super Advisor

Re: Network Bottleneck probability= 100%

Hi guys,

I forgot to mention that this is an SVG cluster.
netstat reports:
netstat -a | grep TIME_WAIT| wc -l
136
All the TIME_WAIT entries look like this:
tcp 0 0 localhost.hacl-dlm localhost.62877 TIME_WAIT
tcp 0 0 localhost.hacl-dlm localhost.62878 TIME_WAIT
tcp 0 0 localhost.hacl-dlm localhost.62879 TIME_WAIT
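Instead of a single "wc -l" total, it can help to group TIME_WAIT sockets by local endpoint. This is a minimal Python sketch assuming the six-column layout shown above (proto, recv-q, send-q, local address, foreign address, state).

```python
from collections import Counter

def time_wait_by_local(lines):
    """Count TIME_WAIT sockets per local endpoint in `netstat -a` output."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Expected layout: proto recv-q send-q local foreign state
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "TIME_WAIT":
            counts[fields[3]] += 1
    return counts
```

On the lines posted, every entry is localhost to localhost on the hacl-dlm port, i.e. loopback traffic that never leaves the host through lan4 or lan5.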

netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan4:1 1500 10.3.6.0 10.3.6.156 302694 0 117488 0 0
lan2 1500 172.31.0.0 172.31.0.1 107411856 0 53255049 0 0
lan0 1500 192.168.0.0 192.168.0.3 5940475 0 3042668 0 0
lo0 4136 127.0.0.0 127.0.0.1 113019655 0 113019666 0 0
lan5* 1500 none none 2188 0 2124 0 17
lan4 1500 10.3.6.0 10.3.6.131 1097922594 0 546448458 0 0
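A small Python sketch that parses "netstat -in" output in the nine-column layout above and flags interfaces that are down (trailing "*" on the name) or that show input errors, output errors or collisions. It assumes the first line is the header row.

```python
from dataclasses import dataclass

@dataclass
class IfStats:
    name: str
    up: bool
    ipkts: int
    ierrs: int
    opkts: int
    oerrs: int
    coll: int

def parse_netstat_in(text):
    """Parse `netstat -in` output; the first line is assumed to be the header."""
    stats = []
    for line in text.strip().splitlines()[1:]:
        f = line.split()
        if len(f) != 9:
            continue  # skip lines that do not match the expected layout
        name = f[0]
        stats.append(IfStats(name=name.rstrip("*"), up=not name.endswith("*"),
                             ipkts=int(f[4]), ierrs=int(f[5]),
                             opkts=int(f[6]), oerrs=int(f[7]), coll=int(f[8])))
    return stats

def flag_interfaces(stats):
    """Names of interfaces that are down or report errors or collisions."""
    return [s.name for s in stats if not s.up or s.ierrs or s.oerrs or s.coll]
```

Collisions cannot occur on a link that is really running full duplex, so the 17 collisions on lan5 suggest it ran at half duplex at some point.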
Chan 007
Honored Contributor

Re: Network Bottleneck probability= 100%

Check by running
lanadmin -x <PPA> for every card that is connected (the PPA is the lan instance number, e.g. 4 for lan4),
to see the speed it is running at now. If you are not happy with it, set the correct one with lanadmin -X 100fd <PPA>.

Please ensure that you have GSP/MP access to your server before you make any such change, and do it after office hours.

Also check your connection states with:

netstat |egrep -i "Wait|est"
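The egrep above matches "wait" or "est" anywhere on the line, so host names can produce false hits. A minimal Python sketch that applies the same case-insensitive pattern only to the state column (assumed to be the last field, as in the netstat lines quoted in this thread) and tallies the result per state:

```python
import re
from collections import Counter

# Same pattern as egrep -i "Wait|est", applied to the state column only
STATE_RE = re.compile(r"wait|est", re.IGNORECASE)

def wait_est_states(lines):
    """Tally TCP sockets whose state matches wait|est, per state."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp") and STATE_RE.search(fields[-1]):
            counts[fields[-1]] += 1
    return counts
```

Splitting the count by state separates TIME_WAIT (normal after an active close) from CLOSE_WAIT, which grows when a local application does not close its sockets.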

Check your syslog and dmesg for any errors.

I suspect your network may be running at half duplex, or autonegotiation is off. Check those too.

What is your LAN speed (10/100/1000)?
uadm26
Super Advisor

Re: Network Bottleneck probability= 100%

Hi,

Is it normal to have so many entries like this?
tcp 0 0 localhost.hacl-dlm localhost.54904 TIME_WAIT

# netstat |egrep -i "Wait"| wc -l
143

And it's increasing.
uadm26
Super Advisor

Re: Network Bottleneck probability= 100%

Hi, Chan
The LAN speed is forced to 100FULL. The network devices are 1 Gigabit, but the switch only accepts 100 Mbit/s. At half duplex I had some problems. But there are many other HP servers connected to the same switch, and only those 2 cluster nodes report network errors.
Looking for answers...;)
Chan 007
Honored Contributor

Re: Network Bottleneck probability= 100%

JT,

The LLT messages in dmesg say that you have a cluster heartbeat problem. One workaround is to change the default timeout, which is 15000 ms (gabconfig -l), but that is not the correct way to tackle this problem.

Better to check your LAN setup. Try to put your cluster network on a separate LAN and avoid unwanted traffic. Keep your heartbeat LAN separate from the LAN that carries the heavier traffic.

Also, are you sure that your other systems don't have the same network problem? If they really don't, the setup of these two nodes may be incorrect.

Have you applied all the latest OS and Veritas patches?

Chan
rick jones
Honored Contributor

Re: Network Bottleneck probability= 100%

"Never" disable autoneg, especially with Gigabit cards. Only if you know you have a fubar switch that you _cannot_ replace with a good one should you resort to the kludge that is hardcoding duplex settings. See the attachment about duplex and autoneg.
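Why hardcoding one end is risky can be shown with a simplified model of IEEE 802.3 parallel detection: a port that autonegotiates while its partner is forced can still sense the speed, but it must fall back to half duplex. This Python sketch assumes both ends support the forced speed and both advertise full duplex when autonegotiating; it ignores every other corner case.

```python
from typing import Optional, Tuple

# None = autonegotiation on; otherwise a forced (speed_mbps, duplex) pair
Setting = Optional[Tuple[int, str]]

def resulting_duplex(local: Setting, peer: Setting) -> Tuple[str, str]:
    """Return (local_duplex, peer_duplex) after link-up (simplified model)."""
    if local is None and peer is None:
        return ("full", "full")   # both negotiate; assumes both advertise full duplex
    if local is not None and peer is not None:
        return (local[1], peer[1])  # each side keeps its hardcoded duplex
    if local is None:
        return ("half", peer[1])  # parallel detection: autoneg side falls back to half
    return (local[1], "half")

def is_mismatch(local: Setting, peer: Setting) -> bool:
    l, p = resulting_duplex(local, peer)
    return l != p
```

For example, if the server were forced to 100 full duplex while the switch port were left on autonegotiation, is_mismatch((100, "full"), None) returns True: the switch side would run half duplex.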

TIME_WAIT has _nothing_ to do with network bottlenecks. 100 some-odd TIME_WAITS is nothing to worry about.
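One way to put a TIME_WAIT count in perspective is Little's law: the steady-state number of sockets in TIME_WAIT equals the rate of connections closed times the TIME_WAIT duration. The 60 s duration in this Python sketch is an assumption; check the value your system actually uses (for example with ndd -get /dev/tcp tcp_time_wait_interval, which reports milliseconds).

```python
def implied_close_rate(time_wait_count, time_wait_seconds=60.0):
    """Connections closed per second implied by a steady TIME_WAIT count.

    Little's law: N = rate * T, so rate = N / T.
    The 60 s default is an assumed TIME_WAIT interval, not a measured one.
    """
    if time_wait_seconds <= 0:
        raise ValueError("time_wait_seconds must be positive")
    return time_wait_count / time_wait_seconds
```

With the 143 entries counted in this thread and a 60 s interval, that is only about 2.4 short-lived connections per second; a count that creeps upward simply means that rate has risen a little.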

The lanadmin stats suggestion is the one to go with. You want to see if the outbound queue depth remains non-zero for a non-trivial length of time. You also want to make sure that the link isn't dropping any frames (packets).
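"Non-zero for a non-trivial length of time" can be made concrete by sampling the outbound queue length at a fixed interval and looking for a long enough run of non-zero readings. In this Python sketch the samples list and the min_run threshold are hypothetical; how you collect the readings (e.g. repeated lanadmin statistics dumps) is up to you.

```python
def sustained_nonzero(samples, min_run):
    """True if `samples` contains at least `min_run` consecutive non-zero values.

    `samples`: hypothetical periodic outbound-queue-length readings.
    `min_run`: how many consecutive busy samples count as "sustained".
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    run = 0
    for q in samples:
        run = run + 1 if q > 0 else 0  # reset the run on every empty-queue sample
        if run >= min_run:
            return True
    return False
```

Brief non-zero blips are normal; it is a queue that stays occupied across many consecutive samples that points at a saturated link.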

Also, check the CPU utilization of the CPUs taking interrupts from the NICs. Since this is in the HP-UX hierarchy, you can see the assignments with the intctl command and then look at the per-CPU stats in glance, second page.
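Once the NIC-to-CPU interrupt assignments and the per-CPU utilization are written down, cross-referencing them is simple. In this Python sketch both inputs are hypothetical hand-transcribed data (nic_to_cpu from intctl output, cpu_util from glance), and the 90% threshold is an arbitrary example value.

```python
def saturated_interrupt_cpus(nic_to_cpu, cpu_util, threshold=90.0):
    """List (nic, cpu, util) for NICs whose interrupt CPU is at/above threshold.

    nic_to_cpu: e.g. {"lan4": 2}, transcribed by hand from intctl output.
    cpu_util:   e.g. {2: 97.5}, per-CPU busy percentage read from glance.
    """
    hot = []
    for nic, cpu in sorted(nic_to_cpu.items()):
        util = cpu_util.get(cpu)
        if util is not None and util >= threshold:  # skip CPUs with no reading
            hot.append((nic, cpu, util))
    return hot
```

A NIC whose interrupt CPU is pinned near 100% can drop packets even when the link itself is far from full.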

Also, go through the knowledge base to see if it talks about making sure that TOPS (Thread Optimized Packet Scheduling) remains enabled. That can help with interrupt CPU saturation.
there is no rest for the wicked yet the virtuous have no pillows