c7000 Active Onboard Administrator loses network connection

 
Chris House
Frequent Advisor


I've talked to HP support about this but thought I would mention it here too.

 

I have one c7000 G2 enclosure with dual "BladeSystem c7000 DDR2 Onboard Administrator with KVM" modules (part number 456204-B21). I also have a dozen c7000 enclosures (pre-G2) with dual "BladeSystem c7000 Onboard Administrator" modules (part number 412142-B21). All enclosures are running OA firmware 3.31.

 

Every morning, the active OA in the c7000 G2 enclosure (with the DDR2/KVM OA modules) is not pingable. But the OA is still functional - I can log in through enclosure link from another enclosure and view the Active OA's system log which shows:

 

Aug 18 19:19:57  Kernel: Network packet flooding detected. Disabling network interface for 2 seconds
Aug 18 19:20:05  Kernel: Network packet flooding detected. Disabling network interface for 2 seconds
Aug 18 19:20:11  Kernel: Network packet flooding detected. Disabling network interface for 4 seconds
Aug 18 19:20:17  Kernel: Network packet flooding detected. Disabling network interface for 8 seconds
Aug 18 19:20:27  Kernel: Network packet flooding detected. Disabling network interface for 16 seconds
Aug 18 19:20:55  OA: Network link to gateway is down
Aug 18 19:22:19  OA: Network link to interconnect 3 is down
Aug 18 19:22:31  OA: Network link to interconnect 4 is down
Aug 18 19:27:11  NTP: Failed to update time/date using NTP
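
The disable intervals in the log above double (2, 2, 4, 8, 16 seconds) before the gateway link is reported down. A minimal Python sketch for pulling those events out of a saved copy of the OA system log follows. It assumes the exact line wording shown above, which may differ on other firmware versions; the names `FLOOD_RE`, `flood_events`, and `SAMPLE` are made up for illustration and are not part of any HP tool.

```python
import re

# Pattern assumed from the log excerpt in this post; other OA firmware
# versions may word the message differently.
FLOOD_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2})\s+"
    r"Kernel: Network packet flooding detected\. "
    r"Disabling network interface for (?P<secs>\d+) seconds"
)


def flood_events(lines):
    """Return (timestamp, disable_seconds) for each flood-protection line."""
    events = []
    for line in lines:
        match = FLOOD_RE.match(line.strip())
        if match:
            events.append((match.group("stamp"), int(match.group("secs"))))
    return events


# Sample lines copied from the log excerpt above.
SAMPLE = """\
Aug 18 19:19:57  Kernel: Network packet flooding detected. Disabling network interface for 2 seconds
Aug 18 19:20:05  Kernel: Network packet flooding detected. Disabling network interface for 2 seconds
Aug 18 19:20:11  Kernel: Network packet flooding detected. Disabling network interface for 4 seconds
Aug 18 19:20:17  Kernel: Network packet flooding detected. Disabling network interface for 8 seconds
Aug 18 19:20:27  Kernel: Network packet flooding detected. Disabling network interface for 16 seconds
Aug 18 19:20:55  OA: Network link to gateway is down
""".splitlines()

if __name__ == "__main__":
    for stamp, secs in flood_events(SAMPLE):
        print(f"{stamp}: interface disabled for {secs}s")
```

Run against several nights of logs, this would show whether the flooding always starts at the same time of day, which could help tie it to a scheduled job on the network.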

 

Apparently the active OA feels there is too much network traffic so it shuts down its NIC for a few seconds. However, when it tries to re-enable the NIC, it cannot establish a link back to the switch. If, through enclosure link, I change the NIC options for the Active OA from Auto Negotiate to Full Duplex, or vice-versa, then the link will come online and the OA becomes pingable:

 

Aug 19 07:33:00  OA: Administrator logged into the Onboard Administrator from Enclosure Link
Aug 19 07:46:18  OA: Network Interface link set to Auto negotiation by user Administrator.
Aug 19 07:46:23  Kernel: Network link is up at 100Mbps - Half Duplex
Aug 19 07:46:24  Kernel: Network link is up at 100Mbps - Half Duplex
Aug 19 07:46:24  NTP: Successfully updated time/date using NTP
Aug 19 07:46:25  OA: Network link to interconnect 3 is up
Aug 19 07:46:26  OA: Network link to interconnect 4 is up
Aug 19 07:47:10  OA: Network link to gateway is up

 

I have seen the customer advisory recommending leaving the OA NICs at Auto Negotiate, and I have done this for all enclosures, but it makes no difference for this one with these OA modules.

 

Meanwhile, the standby OA remains online/reachable, in standby mode:

 

Aug 19 07:20:05  OA: LDAP SIM\, HP logged into the Onboard Administrator from 10.10.105.165
Aug 19 07:20:06  OA: LDAP SIM\, HP logged out of the Onboard Administrator
Aug 19 07:35:12  OA: LDAP SIM\, HP logged into the Onboard Administrator from 10.10.105.165
Aug 19 07:35:12  OA: LDAP SIM\, HP logged out of the Onboard Administrator

Aug 19 07:46:12  OA: Network Interface link set to Auto negotiation by user Administrator.
Aug 19 07:46:17  Kernel: Network link is up at 100Mbps - Half Duplex
Aug 19 07:46:19  Kernel: Network link is up at 100Mbps - Half Duplex

 

This issue occurs regardless of which OA is active (occurs in OA Bay 1 and OA Bay 2). I got a replacement OA from HP and the issue remained, regardless of which original OA was standby, or even if an original OA was active and the replacement was in standby.

 

I worked with our network admin to set the switch ports that the OAs are connected to at 100/Full. Even when I match those settings on the OAs, the active one still goes down every night. There are not many errors on the switch ports, and they are not the kind of errors typical of bad cabling or a bad patch panel port.
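
One way to compare what the OA actually negotiated against the switch port setting is to scan the system log for link-up lines. The sketch below is only an illustration in Python. It assumes the "Network link is up at ..." wording seen in the logs in this post, and it assumes a full-duplex link is logged as "Full Duplex", which does not appear in these excerpts. The function names are hypothetical.

```python
import re

# Wording taken from the log excerpts in this post. "Full" is an assumption:
# only "Half Duplex" actually appears in the logs shown here.
LINK_RE = re.compile(
    r"Kernel: Network link is up at (?P<speed>\d+)Mbps - (?P<duplex>Half|Full) Duplex"
)


def negotiated_links(lines):
    """Return (speed_mbps, duplex) for every link-up line found."""
    links = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            links.append((int(match.group("speed")), match.group("duplex")))
    return links


def mismatches(lines, expected_speed, expected_duplex):
    """Return the link-up results that differ from the expected setting."""
    expected = (expected_speed, expected_duplex)
    return [link for link in negotiated_links(lines) if link != expected]


# Sample lines copied from the active OA log earlier in this post.
LINK_SAMPLE = [
    "Aug 19 07:46:18  OA: Network Interface link set to Auto negotiation by user Administrator.",
    "Aug 19 07:46:23  Kernel: Network link is up at 100Mbps - Half Duplex",
    "Aug 19 07:46:24  Kernel: Network link is up at 100Mbps - Half Duplex",
]

if __name__ == "__main__":
    # Compare against a switch port forced to 100/Full.
    for speed, duplex in mismatches(LINK_SAMPLE, 100, "Full"):
        print(f"OA reports {speed}Mbps {duplex} duplex, switch port is 100/Full")
```

This only reports what the OA logged. It does not show that a speed or duplex difference is the cause of the outage.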

 

HP recommends having the OA NICs on a separate VLAN to reduce broadcast traffic, but I don't have a separate VLAN to use, so these NICs are on the same VLAN as my production network.

 

If I replace the DDR2/KVM OA modules with older ones from other c7000 enclosures ("BladeSystem c7000 Onboard Administrator" modules - part number 412142-B21, same firmware 3.31), then the problem goes away. So the problem can be narrowed down to these newer-style OA modules. I also tried swapping in the OA sleeve from another enclosure, but the problem remained. None of the other enclosures report this flooding or go offline, but they are all using the earlier generation of OA module.

 

I wish the network packet flooding protection feature could be disabled, or it could offer more information about the flooding (source, type, etc).

 

This issue has been ongoing for about a year over numerous firmware versions.

 

Anybody seen this or have any suggestions?

2 REPLIES

Re: c7000 Active Onboard Administrator loses network connection

We had a similar issue with our OAs quite a while back.  Ours was occurring once a week, and coincided with our network security scans.  Apparently, some traffic being generated by the tool used by our security team was causing issues.  I wonder if you're seeing a similar issue, especially if it occurs nightly.  It sounds like some scheduled job running on your network is causing the problem...

 

Anyways, our security scans were causing us lots of headaches, as we were also experiencing a problem that affected our iLO 2s and caused them to stop responding.  At least HP fixed that issue with firmware v2.05:

 

iLO 2 v2.05 replaces v2.01 to address issues with iLO stops responding on different scenarios.

  • Fixed an issue where iLO 2 could stop responding when running Nessus Scanner, FoundStone or similar port scan tool in the iLO2 network.
  • Fixed an issue where iLO 2 could stop responding after receiving an Ethernet packet with protocol type 0x8874.

 

Our only way to fix it was for our security team to stop scanning our management network, as we have a dedicated VLAN for all iLOs and OAs.  Our problems went away after that.  Since you share a network with your production systems, you may be able to get your security guys to exclude the addresses of your OAs and see if the problem goes away?

 

Hope you figure it out.  Keep us posted, and good luck!

Chris House
Frequent Advisor

Re: c7000 Active Onboard Administrator loses network connection

A few days after swapping the sleeve, the problem has gone away. It's been about two weeks since it last went down. I doubt the sleeve had any impact; it was probably just a coincidence. I'm sure the problem will come back, but I guess it's just the OA being stupid about the traffic that it sees on the port.