Server Loses Network for no apparent reason?

 
SOLVED
Robert Walker_8
Valued Contributor

Server Loses Network for no apparent reason?

Hi,

For the past few weeks our RHEL4 U4 server, running kernel 2.6.9-42.0.2 and now 2.6.9-42.0.3, has simply stopped responding on the network on Saturdays. A service network restart resolves the problem. Disk space is fine, the interface has link, and the switch it is plugged into is fine. ifconfig shows no problems, although about 81 million packets have gone through the interface.

We aren't doing anything unusual on Saturdays compared with the rest of the week. We have an NFS file transfer running from 6am to 10am, and although this time the network died in the middle of it, the last time it died at 1am, nowhere near it.

The kernel and messages log files show no reaction to the network loss; only when the network service is restarted do we see the expected eth0 up/down messages.
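
Since the logs stay quiet, one way to timestamp the stall would be to sample the interface packet counters in /proc/net/dev at intervals and note when they stop advancing. The following is only a rough sketch: the function names and the two sample snapshots are made up for illustration, and in practice the text would come from reading /proc/net/dev twice, some seconds apart.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a complete counter line
        # 0-based field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def is_stalled(before, after, iface):
    """True when neither the RX nor the TX packet counter advanced."""
    return before[iface] == after[iface]


# Hypothetical snapshots, not taken from the affected server.
HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast"
    "|bytes    packets errs drop fifo colls carrier compressed\n"
)
FIRST = HEADER + "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
SECOND = HEADER + "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
THIRD = HEADER + "  eth0: 1500 15 0 0 0 0 0 0 2600 26 0 0 0 0 0 0\n"

if is_stalled(parse_proc_net_dev(FIRST), parse_proc_net_dev(SECOND), "eth0"):
    print("eth0 counters did not move between samples")
```

Run from cron every minute or so, a check like this could log the moment the counters freeze, which would narrow down what else was happening at the time.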

Does anyone have any ideas?

Robert.
18 REPLIES
Steven E. Protter
Exalted Contributor

Re: Server Loses Network for no apparent reason?

Shalom,

I remember a story about the cleaning staff disconnecting a plug to run the vacuums. Something similar may be happening to some of the network hardware.

Check /var/log/messages.

Check the cron logs for something unusual.

Check the logs on the network hardware.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Steven E. Protter
Exalted Contributor

Re: Server Loses Network for no apparent reason?

Shalom,

Some systems have diagnostics built into the BIOS and some require booting from a CD, but either way hardware diagnostics should be run to check the hardware of the system itself.

I meant this for the first post.

SEP
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday SEP,

I wish it were as simple as the cleaning staff, but alas no. The servers are racked, and there are no cleaners on Saturday mornings.

The system behaves itself on weekdays, when the load is higher; it's really only a Mon-Fri, 9am-5pm server.

There is nothing in the logs. All is quiet, as if someone had done an ifdown eth0 on the device (or unplugged the network cable at the switch); however, ifconfig shows the network adapter as up, as do ethtool, mii-tool, etc.

Robert.
Ivan Krastev
Honored Contributor

Re: Server Loses Network for no apparent reason?

What is your network card? I have had similar problems with Intel NICs, where an overflow occurred under heavy traffic. Check for the latest drivers as well.

regards,
ivan
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Ivan,

Our ProLiant DL380 G4s use "Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet" adapters.

We are also using the standard Red Hat software (i.e. not the drivers supplied by HP).

Tracking the kernels to see whether there is a pattern:

27/09 - Installed Update 4, giving 2.6.9-42
10/10 - Upgraded to 2.6.9-42.0.2
25/10 - Crash/hang, no details unfortunately
04/11 - Network hang, server otherwise OK
08/11 - Upgraded to 2.6.9-42.0.3
11/11 - Network hang like 04/11, server otherwise OK

Maybe something has been wrong since 2.6.9-42?

Robert.
Alpha977
Valued Contributor

Re: Server Loses Network for no apparent reason?

I think it is most likely a switch problem rather than a server problem.

Is it connected to a Cisco Catalyst switch?

If so, do you have a switch simulator or something similar installed on the server? Check the log for CRC errors.


Cisco switches have a protection feature for this: if you connect a switch to a port, the port is shut down immediately.
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday,


The server is plugged into a Cisco switch, more than likely a Catalyst. However, the server had been behaving itself up until kernel 2.6.9-42.0.2.

We don't have any switch simulators, nor any other network devices (switches, hubs, etc.), connected to that port.

Robert.
Ragu_3
Trusted Contributor

Re: Server Loses Network for no apparent reason?

Check which driver module your Broadcom NIC is using. Are both the bcm and tg3 drivers being loaded? There may be old modutils cruft loading the "bcm" module; on 2.6.x kernels this is taken care of by the "module-init-tools" package. Do an rmmod on the bcm module and load tg3 alone.
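
As a rough illustration of that check, the sketch below scans lsmod output for Broadcom NIC modules. It assumes the vendor module is named bcm5700 (the post above only says "bcm"); the function names and the sample listing are hypothetical, and on a real system the text would be the output of lsmod.

```python
# Module names to look for. "bcm5700" is an assumption about what the
# "bcm" module in the post refers to; adjust to match the real system.
BROADCOM_DRIVERS = {"tg3", "bcm5700"}


def loaded_modules(lsmod_text):
    """Return the set of module names listed in lsmod output."""
    names = set()
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def broadcom_drivers(lsmod_text):
    """Return the Broadcom NIC modules that are currently loaded, sorted."""
    return sorted(loaded_modules(lsmod_text) & BROADCOM_DRIVERS)


# Hypothetical listing in which both drivers are loaded at once.
SAMPLE_BOTH = """Module Size Used by
bcm5700 100000 0
tg3 101061 0
ext3 118857 8
"""

found = broadcom_drivers(SAMPLE_BOTH)
if len(found) > 1:
    print("more than one Broadcom driver loaded:", ", ".join(found))
```

If two modules show up, that would be the situation in which the rmmod suggested above applies.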
Debian GNU/Linux for the Enterprise! Ask HP ...
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday,

I just did an lsmod with the following results:

Module                  Size  Used by
nfsd                  214529  17
exportfs               10177  1 nfsd
lockd                  65769  2 nfsd
nfs_acl                 7745  1 nfsd
i2c_dev                14529  0
i2c_core               26049  1 i2c_dev
sunrpc                144037  12 nfsd,lockd,nfs_acl
ipt_LOG                10177  1
ipt_state               5953  24
ip_conntrack           46085  1 ipt_state
iptable_filter          6977  1
ip_tables              22721  3 ipt_LOG,ipt_state,iptable_filter
dm_mirror              31901  0
dm_mod                 60741  1 dm_mirror
button                 10705  0
battery                12997  0
ac                      8901  0
uhci_hcd               32857  0
ehci_hcd               32325  0
hw_random               9685  0
tg3                   101061  0
floppy                 58193  0
ext3                  118857  8
jbd                    59609  1 ext3
cciss                  63913  12
sd_mod                 20545  0
scsi_mod              117709  2 cciss,sd_mod

There is no indication of any Broadcom device driver being loaded.

Robert.
Al Licause
Trusted Contributor

Re: Server Loses Network for no apparent reason?

tg3 is the Broadcom driver. It will eventually replace the bcm5700 driver.

It appears that you are using the tg3 driver.

You might want to confirm that against the contents of /etc/modprobe.conf and/or with ethtool.
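
For the /etc/modprobe.conf side of that check, the interface-to-driver mapping is normally given by alias lines such as "alias eth0 tg3". The sketch below pulls those out; the function name and the sample file contents are hypothetical, not taken from the server in question.

```python
def nic_aliases(modprobe_conf_text):
    """Return {interface: module} for 'alias ethN module' lines."""
    aliases = {}
    for line in modprobe_conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith("eth"):
            aliases[fields[1]] = fields[2]
    return aliases


# Hypothetical modprobe.conf contents for a two-port server.
SAMPLE_CONF = """alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter cciss
# alias eth2 bcm5700
"""

for iface, module in sorted(nic_aliases(SAMPLE_CONF).items()):
    print(iface, "->", module)
```

Any interface mapped to something other than the module shown by lsmod would be worth a closer look.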
L_Dieter
Occasional Advisor

Re: Server Loses Network for no apparent reason?

Hi Robert,


Maybe a rather stupid question: have you checked for duplicate IPs? The only time I had this kind of problem, it was due to duplicate IPs.
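
A duplicate shows up as one IP address being answered for by more than one MAC address. As a minimal sketch, assuming you have already collected (IP, MAC) pairs from ARP replies seen on the segment (the pairs and names below are invented for illustration):

```python
from collections import defaultdict


def duplicate_ips(observations):
    """Return {ip: [macs]} for every IP claimed by more than one MAC.

    observations is an iterable of (ip, mac) pairs, e.g. gathered
    from ARP replies captured on the local segment.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations: 192.168.10.50 is claimed by two different MACs.
OBSERVED = [
    ("192.168.10.50", "00:11:22:33:44:55"),
    ("192.168.10.30", "00:11:22:33:44:66"),
    ("192.168.10.50", "00:AA:BB:CC:DD:EE"),
]

for ip, macs in duplicate_ips(OBSERVED).items():
    print(ip, "is claimed by", ", ".join(macs))
```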

Best regards,
Dieter
Alexander Samad
Frequent Advisor

Re: Server Loses Network for no apparent reason?

Are you using the PSP (ProLiant Support Pack), and thus the HP driver for the NIC? If so, did you upgrade the driver when you upgraded the kernel?

When you lose connectivity, what does a tcpdump on the interface produce? Do you see any traffic at all? What does ethtool report for link state, etc.?
Stuart Browne
Honored Contributor

Re: Server Loses Network for no apparent reason?

I don't suppose you've tried the simple "replace the cable" bit?

Cables die *shrug*.
One long-haired git at your service...
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday,

A couple of answers to your questions. We have just changed its IP address and the problem still persists. We have upgraded it from 100Mb/s to a bonded pair of gigabit NICs (admittedly still the onboard NICs on the DL380's motherboard) with no improvement, i.e. the server didn't fail over to its backup NIC; effectively the whole network stack froze.

We have run tcpdump, as the call was logged with Red Hat and they too asked whether tcpdump showed anything. The only thing it shows is the following:

10:16:22.275318 arp who-has 192.168.10.30 tell myserver.example.com
10:16:25.275570 arp who-has 192.168.10.31 tell myserver.example.com
10:16:26.276324 arp who-has 192.168.10.31 tell myserver.example.com
10:16:27.276075 arp who-has 192.168.10.31 tell myserver.example.com
10:16:30.276330 arp who-has 192.168.10.30 tell myserver.example.com
10:16:31.276085 arp who-has 192.168.10.30 tell myserver.example.com

192.168.10.30 and 192.168.10.31 are our Windows DNS/WINS servers, which are also defined in /etc/resolv.conf.

Again, a service network restart resolves the problem, as does an ifdown/ifup on the interface.
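
The capture contains only outgoing who-has requests and no replies. A small sketch that tallies this from tcpdump text is shown below. It assumes replies would appear as "arp reply <ip> is-at <mac>" lines, as in older tcpdump output; newer versions print ARP differently, so the patterns would need adjusting. The function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns for the old-style tcpdump ARP lines shown in the capture.
# The reply format is an assumption, since the capture contains no replies.
WHO_HAS = re.compile(r"arp who-has (\S+) tell (\S+)")
REPLY = re.compile(r"arp reply (\S+) is-at (\S+)")


def unanswered_arp(lines):
    """Return {ip: request_count} for IPs that were asked for but never answered."""
    requests = Counter()
    answered = set()
    for line in lines:
        match = WHO_HAS.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return {ip: count for ip, count in requests.items() if ip not in answered}


# The six lines from the capture in this post.
CAPTURE = """10:16:22.275318 arp who-has 192.168.10.30 tell myserver.example.com
10:16:25.275570 arp who-has 192.168.10.31 tell myserver.example.com
10:16:26.276324 arp who-has 192.168.10.31 tell myserver.example.com
10:16:27.276075 arp who-has 192.168.10.31 tell myserver.example.com
10:16:30.276330 arp who-has 192.168.10.30 tell myserver.example.com
10:16:31.276085 arp who-has 192.168.10.30 tell myserver.example.com
""".splitlines()

for ip, count in sorted(unanswered_arp(CAPTURE).items()):
    print(ip, "asked", count, "times, no reply seen")
```

Both servers were asked for three times each and no reply was seen, which fits requests being generated by the host while nothing comes back in on the interface.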

Robert.
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday,

The problem is still occurring; it happened again today. We had downgraded the kernel from 2.6.9-42.0.3 to 2.6.9-34.0.2, as the upgrade falls in the time frame when the problem started. It's always the case: two things are changed and then a problem appears.

The problem seems to occur mostly around the NFS file transfer, about 30 minutes to 1 hour into the transfer. As mentioned, tcpdump shows no incoming network activity, as if the network were switched off! Teaming/bonding doesn't seem to help, as the interface appears to be up but is not communicating.

Any ideas?

Robert.
Andrew Gilbrt
New Member

Re: Server Loses Network for no apparent reason?

Robert,

We are in the throes of a very similar situation. It is occurring on a number of our servers, but not on others. We have tried a variety of fixes, including bonding/unbonding, drivers, and kernel revisions. I will try to get a more detailed post to you.

Andrew Gilbrt
New Member
Solution

Re: Server Loses Network for no apparent reason?

Robert,

Have you seen this thread?

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=898761

Looks quite promising
Robert Walker_8
Valued Contributor

Re: Server Loses Network for no apparent reason?

Gday,

I'm going to close this thread, as it appears a similar problem has been posted here:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=898761

Thanks to all who have contributed. I will look at updating my drivers (something that has been festering in my mind for a while, although I thought it was a Red Hat issue). We'll see how that goes.

Messy XMAS & New Year to you all!

Robert.