
Server access fails because of NIC Bandwidth?

 
cam9269
Regular Advisor

Server access fails because of NIC Bandwidth?

Hello everyone,

I'm in a bit of a mess with networking. One of my customer's machines becomes inaccessible after running for several days. It is a backup server connected to the prod/apps servers via NFS. The NIC is a 10/100 Mbps device. The only explanation I can think of is that the link gets congested over time, and that's when packets start to drop and connections are lost.

I can also see several "NFS Write Failed: RPC timed out" messages.

From the configuration, I can see that they have configured Auto Port Aggregation (APA) using lan7 and lan8, but lan8 is not present! These messages appear in the nettl logs:

**********************Gigabit Ethernet LAN/9000 Networking******************@#%
Timestamp : Fri Jul 24 EST 2009 15:01:23.747475
Process ID : [ICS] Subsystem : GELAN
User ID ( UID ) : -1 Log Class : ERROR
Device ID : 8 Path ID : 0
Connection ID : 0 Log Instance : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<2009> 1000Base-SX in path 1/8/0/0
detected a link down event from the device.

********************************UNKNOWN SUBSYSTEM***************************@#%
Timestamp : Fri Jul 24 EST 2009 15:01:23.752026
Process ID : [ICS] Subsystem : 188
User ID ( UID ) : -1 Log Class : DISASTER
Device ID : 900 Path ID : 0
Connection ID : 0 Log Instance : 0
Location : 00123
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I don't know if these errors are connected to what's happening though.

I need help resolving this, as the backups are badly affected and it's a very risky situation for the customer.

TIA!
Mel Burslan
Honored Contributor

Re: Server access fails because of NIC Bandwidth?

Since I am not an expert on nettl traces, my words are a bit on the common sense side.

You said lan7 and lan8 form an aggregate interface but lan8 is not present. Is that right? If so, is lan8 down only for a short time (say it failed and a replacement is due any day now), or is it more of a "to hell with lan8, lan7 will be sufficient" situation? If it is the latter, it is best to dissolve the aggregate, in my opinion.

Also, a backup server with a 10/100 NIC is not a very efficient way of running things. I am not sure what you mean by "backup server", but if this poor machine is mounting volumes via NFS to back them up to its local drives, congestion will be a big issue. It looks like the GELAN driver is loaded, so I am under the impression that this machine has some gigabit interfaces. Why are they not being used? Also, could you provide the output from:

ioscan -fnC lan

________________________________
UNIX because I majored in cryptology...
Steven E. Protter
Exalted Contributor

Re: Server access fails because of NIC Bandwidth?

Shalom,

The natural response from me in this case would be to use a faster LAN interface on the backup server or install a faster interface.

If that is not possible, designate a different server to be the backup.

SEP
Steven E Protter
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

Thanks for the responses, guys, much appreciated. I agree that we should be using a faster LAN card for this backup server. What I need now is data to support that: how can I show that the current LAN card cannot keep up with the traffic going through it? What kind of network data do I need to collect, and how, to verify this?

Regards,
Chris
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

As a follow-up to my reply, am I correct in assuming that an overloaded NIC can cause server access to fail?
Bill Hassell
Honored Contributor

Re: Server access fails because of NIC Bandwidth?

Well, based on the information you provided, the GELAN NIC is a 1 Gigabit card, not a 10/100 card. Since the APA link is apparently broken, I would figure out what is wrong (or whether it ever worked). Link down means the card lost its link to the switch, typically because the LAN cable was disconnected or the switch port went down. Congestion does not cause these errors. You can run as fast as the network will allow.

Since this appears to be a Gigabit card, I would see what the card is actually using, especially the duplex setting. Use lanadmin and display the current duplex setting:

lanadmin -x 7

If it is half duplex, that is your problem. The link (NIC plus switch) has been clobbered and has likely defaulted to 100 Mbit, half duplex, which means your link speed is horrible and full of collisions. Collisions are impossible at full duplex. Use the attached script to show the settings in an easy-to-read format.


Bill Hassell, sysadmin
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

Thanks Bill,

Excellent script! I will execute this on the server to see the configuration.

As additional information, I just found out that the NFS server has been set up with an APA config using GELAN NICs for NFS purposes. Now, how do I verify that my NFS client is also using its own GELAN NICs to connect to the GELAN NICs on the NFS server?


Regards
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

I think here's the proper picture of this environment:

lan0 (10/100 Mbps) is the access to the server
lan900 (lan7/lan8, 1 Gbps) is the link to the NFS servers this backup machine needs to back up

I would say this setup is good and should not cause any problems accessing the machine even while backups are running, since the backup traffic has been isolated onto another NIC. What may have caused lan0 to become inaccessible, then?

This is turning out to be messy, huh?

Regards,
Chris
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

I just got this routing table from the machine. Is this a correct setup at all?

Routing tables
Destination           Gateway            Flags   Refs Interface  Pmtu
127.0.0.1             127.0.0.1          UH        0  lo0        4136
134.144.188.38        134.144.188.38     UH        0  lan2       4136
134.144.141.8         134.144.141.8      UH        0  lan0       4136
134.144.188.166       134.144.188.166    UH        0  lan900     4136
134.144.141.20        134.144.188.165    UGH       0  lan900     1500
134.144.141.17        134.144.188.162    UGH       0  lan900     1500
134.144.141.16        134.144.188.161    UGH       0  lan900     1500
134.144.141.19        134.144.188.164    UGH       0  lan900     1500
134.144.141.18        134.144.188.163    UGH       0  lan900     1500
134.144.141.12        134.144.188.34     UGH       0  lan2       4352
134.144.141.11        134.144.188.33     UGH       0  lan2       4352
134.144.188.32        134.144.188.38     U         2  lan2       4352
134.144.188.160       134.144.188.166    U         2  lan900     1500
134.144.128.0         134.144.141.8      U         2  lan0       1500
127.0.0.0             127.0.0.1          U         0  lo0        4136
default               134.144.136.50     UG        0  lan0       1500
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

sorry for that last entry, here's a cleaner view:

Routing tables
Destination           Gateway            Flags   Refs Interface  Pmtu
127.0.0.1             127.0.0.1          UH        0  lo0        4136
134.144.188.38        134.144.188.38     UH        0  lan2       4136
134.144.141.8         134.144.141.8      UH        0  lan0       4136
134.144.188.166       134.144.188.166    UH        0  lan900     4136
134.144.141.20        134.144.188.165    UGH       0  lan900     1500
134.144.141.17        134.144.188.162    UGH       0  lan900     1500
134.144.141.16        134.144.188.161    UGH       0  lan900     1500
134.144.141.19        134.144.188.164    UGH       0  lan900     1500
134.144.141.18        134.144.188.163    UGH       0  lan900     1500
134.144.141.12        134.144.188.34     UGH       0  lan2       4352
134.144.141.11        134.144.188.33     UGH       0  lan2       4352
134.144.188.32        134.144.188.38     U         2  lan2       4352
134.144.188.160       134.144.188.166    U         2  lan900     1500
134.144.128.0         134.144.141.8      U         2  lan0       1500
127.0.0.0             127.0.0.1          U         0  lo0        4136
default               134.144.136.50     UG        0  lan0       1500
Bill Hassell
Honored Contributor

Re: Server access fails because of NIC Bandwidth?

It's not clear what is failing at this point. You mentioned lan0 as well as lan7/8. I would first verify what networks are working -- use traceroute to the target machines. Then check the output of my script and make sure there are *NO* HD entries. What do you see in syslog.log as far as networking errors? The RPC timeout may indicate a broken network connection.

At this point, I would perform basic network verifications (ping, traceroute, telnet/ssh) for all the LAN connections and not try to use complicated applications such as backups until basic communication is fixed. If lan8 is truly down (run lanadmin and look at each lan instance), that should be fixed, as it will probably uncover other network problems.
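
For anyone who wants to automate the "no HD entries" check, here is a rough Python sketch. It assumes the script's report is whitespace-separated with the columns in the order LAN-ID, Network, IP-address, Speed, DX, and so on; that layout, the function name, and the lan1 sample row are assumptions made up for illustration, not something the script guarantees.

```python
from typing import List

# Hypothetical sample in the ASSUMED column layout; the lan1 row is invented.
SAMPLE_REPORT = """\
LAN-ID Network IP-address Speed DX AUTO MTU I/O path
====== =============== =============== ===== == ==== ==== ===============
lan0 134.144.128.0 134.144.141.8 100 FD no 1500 0/0/0/0
lan1 192.0.2.0 192.0.2.10 100 HD no 1500 0/1/0/0
"""


def half_duplex_interfaces(report: str) -> List[str]:
    """Return the LAN-IDs whose DX (duplex) column reads HD."""
    flagged = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with an interface name such as lan0 or lan900;
        # the header and separator rows do not, so they are skipped.
        is_data_row = (
            len(fields) >= 5
            and fields[0].startswith("lan")
            and fields[0][3:].isdigit()
        )
        if is_data_row and fields[4].upper() == "HD":
            flagged.append(fields[0])
    return flagged


if __name__ == "__main__":
    for name in half_duplex_interfaces(SAMPLE_REPORT):
        print(name, "is running half duplex")
```

Rows that show "--" in the DX column are not flagged; those would still need a manual look with lanadmin.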


Bill Hassell, sysadmin
cam9269
Regular Advisor

Re: Server access fails because of NIC Bandwidth?

Hi Bill,

Here's the output of the script you sent me:

LAN-ID  Network          IP-address       Speed  DX  AUTO  MTU   I/O path
======  ===============  ===============  =====  ==  ====  ====  ===============
lan0    134.144.128.0    134.144.141.8    100    FD  no    1500  0/0/0/0
lan2    134.144.188.32   134.144.188.38   100    --  --    4352  1/12/0/0
lan900  134.144.188.160  134.144.188.166  1000   --  --    1500  --

======================================================================

We got the following results one time when this issue came up:

1. ping 134.144.188.35
PING 134.144.188.35: 64 byte packets
64 bytes from 134.144.188.35: icmp_seq=0. time=1. ms
64 bytes from 134.144.188.35: icmp_seq=1. time=0. ms

2. ping 134.144.141.14
PING 134.144.141.14: 64 byte packets

3. ping 134.144.141.15
PING 134.144.141.15: 64 byte packets

From the routing table, #1 above is on the network for lan900; #2 and #3 are on the network for lan0.

From here, it seemed lan0 had failed, but there weren't any hardware errors logged when we checked the logs.

Is this any help?
BUPA IS
Respected Contributor

Re: Server access fails because of NIC Bandwidth?

Hello
As Bill suggests, you need to find out whether the network cards are behaving correctly.
Please post the output of lanscan, and then for each real PPA please post the output of
lanadmin -g mibstats_ext <ppa> (where <ppa> is the PPA number, e.g. 0, 1 or 7).
This will list the card settings and the counters.
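
If the statistics include inbound and outbound octet counters, two samples taken a known interval apart give a rough utilization figure, which is the kind of evidence asked for earlier in the thread. A minimal Python sketch, assuming 32-bit counters that wrap to zero and at most one wrap between samples (the function name and parameters are made up for illustration):

```python
COUNTER_MAX = 2 ** 32  # ASSUMPTION: 32-bit octet counters that wrap to zero


def link_utilization(octets_start, octets_end, interval_s, link_mbps):
    """Percent of link capacity used in one direction over the interval.

    octets_start / octets_end: octet counter read at the start and end
    interval_s: seconds between the two readings
    link_mbps: negotiated link speed in Mbit/s (e.g. 100 or 1000)
    """
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval_s and link_mbps must be positive")
    # The modulo tolerates a single counter wrap between the two samples.
    delta_octets = (octets_end - octets_start) % COUNTER_MAX
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # 75,000,000 octets in 60 s on a 100 Mbit/s link
    print(round(link_utilization(0, 75_000_000, 60, 100), 2), "percent")
```

Sampling every minute or so across a backup window and noting how long the figure stays near the top of the range would show whether the card is the bottleneck. A short sampling interval matters on a busy link, since a 32-bit counter can wrap more than once between widely spaced samples.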
regards

Mike .
Help is out there always!!!!!
Andrew Rutter
Honored Contributor

Re: Server access fails because of NIC Bandwidth?

hi,

Well, it seems to me you are a bit confused too.

your successful ping is actually on the lan2 network.
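
To see why, match each ping target against the network routes in the posted table. A small Python sketch of that lookup follows; the routing table does not show netmasks, so the prefix lengths below (/27 and /20) are guesses that merely fit the posted addresses, and the host (UGH) routes are left out because none of them covers the three ping targets.

```python
import ipaddress

# Network routes from the posted table. The prefix lengths are ASSUMED,
# since the table does not show netmasks; check them with ifconfig.
ROUTES = [
    ("134.144.188.32/27", "lan2"),
    ("134.144.188.160/27", "lan900"),
    ("134.144.128.0/20", "lan0"),
]


def interface_for(dest):
    """Return the interface of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for cidr, ifname in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


if __name__ == "__main__":
    for target in ("134.144.188.35", "134.144.141.14", "134.144.141.15"):
        print(target, "->", interface_for(target))
```

Under those assumed masks, 134.144.188.35 falls in the lan2 network, while 134.144.141.14 and 134.144.141.15 fall in the lan0 network.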

You need to check the config files and the physical connections to start with.

Andy