Brian Shepard
Advisor

Network connections hang on DL380

Environment:
------------
DL380G3
NC7781 Gigabit Adapter
RedHat Enterprise Linux AS 2.1 (2.4.9-3.40smp)
bcm5700-2.2.30-1.src.rpm NIC driver

Problem description:
--------------------
Every 15 minutes, all network connections hang for 30 seconds, then resume. Some network connections drop. A Linux telnet client connection will hang for 30 seconds, then resume. A Windows telnet client connection will drop.

I have 20 DL380s all configured identically.
All 20 DL380s have been up and running fine with NO problems for well over a year.
Several months ago, the problem started occurring on two of my DL380s, lasted for about two weeks, and then mysteriously disappeared. Soon after that, the problem started occurring on two other DL380s (not the same servers), lasted for about two weeks, and then mysteriously disappeared.
Soon after that, it started happening on two other DL380s (again, not the same servers), lasted for about two weeks, and then mysteriously disappeared. Just last week it started happening on yet another two DL380s (not the same servers), and then the problem mysteriously disappeared yesterday.

I am fairly certain the switches the servers are connected to were dropping the servers' MAC addresses. Furthermore, the problem first surfaced right after we replaced all of our switches: we replaced all of our HP 4000 & HP 5308 switches with Nortel Policy Switch 2000s.

This problem has never surfaced on any of my RS/6000s, Sun 450s, or DL380s running Windows.
It only occurred on some of my DL380s running RedHat Linux AS 2.1.

I witnessed the switches dropping the DL380s' MAC addresses, and this problem started right after we replaced our switches. Given these facts, one would assume the problem is with the Nortel switches.

My question is: why is it only happening on my DL380s running RedHat Linux AS 2.1 and not on any of the other servers?

Has anyone ever witnessed a similar problem?

-Brian
Steven E. Protter
Exalted Contributor

Re: Network connections hang on DL380

Yes. It happened when a workstation was brought up on the same IP address as my server. This is usually the problem.

Identify the offending workstation and decide who gets the IP address.
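A minimal sketch of one way to find the offending machine: capture ARP traffic on the server's segment (for example with `tcpdump -n arp`) and flag any IP address that is answered from more than one MAC address. The regular expression assumes tcpdump prints ARP replies roughly as `Reply <ip> is-at <mac>`; the exact text differs between tcpdump versions, so treat the pattern as an assumption to check against your own capture. The addresses in the sample are made up.

```python
import re
from collections import defaultdict

# Assumed tcpdump text formats for an ARP reply, e.g.
#   "ARP, Reply 10.1.1.20 is-at 00:0b:cd:12:34:56, length 46"
#   "arp reply 10.1.1.20 is-at 0:b:cd:12:34:56"   (older style, no leading zeros)
ARP_REPLY = re.compile(
    r"reply (\d{1,3}(?:\.\d{1,3}){3}) is-at ([0-9a-f]{1,2}(?::[0-9a-f]{1,2}){5})",
    re.IGNORECASE,
)


def normalise_mac(mac):
    """Lower-case and zero-pad each octet so '0:B:cd:...' equals '00:0b:cd:...'."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def find_duplicate_ips(lines):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    claims = defaultdict(set)
    for line in lines:
        match = ARP_REPLY.search(line)
        if match:
            ip, mac = match.groups()
            claims[ip].add(normalise_mac(mac))
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical capture lines: 10.1.1.20 is answered by two different MACs.
    sample = [
        "ARP, Reply 10.1.1.20 is-at 00:0b:cd:12:34:56, length 46",
        "ARP, Reply 10.1.1.21 is-at 00:0b:cd:ab:cd:ef, length 46",
        "ARP, Reply 10.1.1.20 is-at 00:50:56:99:88:77, length 46",
    ]
    print(find_duplicate_ips(sample))
```

Any IP the function reports is a candidate conflict; the MAC addresses it lists can then be looked up in the switches' forwarding tables to find the physical port.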

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jeroen Peereboom
Honored Contributor

Re: Network connections hang on DL380

Brian,

I never experienced this problem, but some thoughts may help.

* 'Every 15 minutes' may indicate a timer interval of 900 seconds. What expires / starts every 15 minutes?

* Do you have network bonding on the RHAS servers?

* Could the dropped MAC address be caused by a duplicate IP address? If so, how do the RHAS servers get their IP addresses? Any chance these addresses fall within a DHCP range?

* Did the network admins correctly configure the new switches' advanced features?
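The 900-second hunch can be checked by recording when traffic stalls and measuring the spacing between stalls. This is a rough sketch, assuming you already have a sorted list of timestamps in seconds (for instance one successful probe per second from a simple ping loop, or packet times from `tcpdump -tt`); the 20-second threshold is an arbitrary choice sized to catch the 30-second hangs.

```python
def find_stalls(timestamps, min_gap=20.0):
    """Return (start, duration) for each silence of at least min_gap seconds.

    `timestamps` must be sorted in ascending order (seconds).
    """
    stalls = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev >= min_gap:
            stalls.append((prev, curr - prev))
    return stalls


def stall_intervals(stalls):
    """Return the seconds between the starts of consecutive stalls."""
    starts = [start for start, _ in stalls]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Hypothetical probe log: one reply per second, with replies missing for
    # the first 30 seconds of every 900-second (15-minute) window.
    times = [float(t) for t in range(30, 2800) if t % 900 >= 30]
    stalls = find_stalls(times)
    print(stalls)
    print(stall_intervals(stalls))
```

If the intervals from a real log cluster tightly around 900, that points at something on a fixed 15-minute timer; irregular spacing would argue against that theory.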

HtH

JP
Brian Shepard
Advisor

Re: Network connections hang on DL380

Thanks for the replies. I'm meeting with the NetAdmins to look at the switches' advanced features. As for duplicate IP addresses, wouldn't I see a message in the server's logs saying that some server/workstation has the same IP address? Thanks again, you guys are great! This forum rocks!

-Brian
Mark Travis
Frequent Advisor

Re: Network connections hang on DL380

Since you can pretty much replicate the issue (just wait about 15 minutes), you might as well sniff the traffic on both ends during that time. tcpdump on Linux does this.

If you see packets not arriving from one system to the other then that very strongly implicates the switch or whatever's in between.

If your network people are anything like every other network person on the face of the planet then they will demand hard evidence before they even consider the possibility that their equipment is malfunctioning (or that they've misconfigured it). Sniffing packets at both ends should provide that evidence.
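Once both ends have been captured over the same window, the comparison itself can be scripted. A minimal sketch, assuming each capture has already been reduced to one hashable key per packet; the (source, destination, TCP sequence number) tuple used here is a hypothetical choice, and any combination of header fields that a switch does not rewrite and that identifies a packet reasonably uniquely would do. Keys present in the sender's capture but missing from the receiver's are packets that left one host and never arrived.

```python
from collections import Counter


def missing_packets(sent, received):
    """Return keys seen more often in the sender's capture than the receiver's.

    Each element of `sent` / `received` is a hashable per-packet key. Counter
    subtraction keeps only positive counts, so retransmitted packets (which
    repeat the same key) are matched one for one rather than collapsed.
    Packets that appear only on the receiving side are ignored.
    """
    lost = Counter(sent) - Counter(received)
    return sorted(lost.elements())


if __name__ == "__main__":
    # Hypothetical keys: (source, destination, TCP sequence number).
    sent = [
        ("10.1.1.20", "10.1.1.50", 1000),
        ("10.1.1.20", "10.1.1.50", 1100),
        ("10.1.1.20", "10.1.1.50", 1100),  # retransmission of the same segment
    ]
    received = [
        ("10.1.1.20", "10.1.1.50", 1000),
        ("10.1.1.20", "10.1.1.50", 1100),
    ]
    print(missing_packets(sent, received))
```

A non-empty result that lines up with the 30-second hangs is exactly the kind of hard evidence that shows the packets are being lost between the two hosts.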