BladeSystem - General

Re: Problems with OA 4.11?

Barmaley
Occasional Advisor

Re: Problems with OA 4.11?

I have lost iLO access (the servers show as unresponsive) to almost all servers in 20 c7000 enclosures.

The servers vary (BL460c/BL490c/BL2x220c, from G1 to G7).

iLO 2.23

OA 4.01/4.11/4.20

It happened after an unsuccessful network firmware upgrade (with SPP 2014.02) on some BL460c G1 servers.

The NIC configuration data was lost (including MAC addresses, etc.) and the NIC adapter is disabled.

This is a known bug in SPP 2014.02.

http://h30499.www3.hp.com/t5/ProLiant-Servers-ML-DL-SL/HP-Proliant-DL380-G5-NIC-s-not-found-after-firmware-update/td-p/6256615

It looks like the corrupted NICs sent some incorrect packets or caused MAC address conflicts, which almost destroyed the entire management network!!!

I can recover the servers only by issuing the "reset server" command on the OA modules, but I can't hard-restart hundreds of servers!

Servers that were not restarted with "reset server" are still in the "Critical error"/"Unknown" state.

Barmaley
Occasional Advisor

Re: Problems with OA 4.11?

Fantastic!

All iLO 2 interfaces in our management VLAN are stuck!

Even on non-blade servers!

# hponcfg
HP Lights-Out Online Configuration utility
Version 4.3.0 Date 12/10/2013 (c) Hewlett-Packard Company, 2014
ERROR: Error communicating with ILO
ERROR: Unable to communicate with the Management Processor.

I can't restart iLO 2 without completely removing power from the servers!

P.S. iLO 3 was not affected.
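A quick way to see which iLO addresses on the management VLAN still accept connections, without logging in to each one, is a plain TCP reachability sweep. This is a minimal sketch, assuming the iLO web interface listens on the default HTTPS port 443; `ilo_reachable` and `unreachable` are hypothetical helper names. It only completes a TCP handshake and sends no TLS or HTTP data, so it is not a Heartbleed test. An address that answers here may still be hung at the application level, and one that does not answer may simply be filtered.

```python
import socket


def ilo_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds.

    Only the TCP handshake is performed; the socket is closed immediately
    and no TLS or HTTP traffic is sent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, etc.
        return False


def unreachable(hosts, port=443, timeout=3.0):
    """Return the hosts (in input order) that did not accept a TCP connection."""
    return [h for h in hosts if not ilo_reachable(h, port, timeout)]
```

For example, `unreachable(["192.0.2.10", "192.0.2.11"])` returns whichever of these (hypothetical, RFC 5737 documentation range) addresses did not answer within three seconds each.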

scharchouf
Trusted Contributor

Re: Problems with OA 4.11?

Hi all,

OA v4.11 and v4.20 contain an OpenSSL version that is vulnerable to Heartbleed.

iLOs are NOT vulnerable, as they don't use SSL/TLS libraries that contain the TLS heartbeat extension. BUT we are receiving reports that the script that tests for the Heartbleed bug is causing iLO 2 to stop responding, and the blades have to be re-fused (power-cycled at the bay) to recover iLO 2 functionality.

bsodenkamp
Occasional Visitor

Re: Problems with OA 4.11?

I was being flooded with these messages until I turned SIM off. Then they all stopped. I'm now working with HP Support to be able to use SIM again.

Ben

Psychonaut
Respected Contributor

Re: Problems with OA 4.11?

I upgraded the two chassis I was seeing the alerts from to firmware 4.21, and it looks like that finally stopped the "Blade, "xxxx", has changed from Failed to OK." flapping.

vinothbala
Senior Member

management processor on blade 6 appears unresponsive

Hi,

The OA syslog shows the following:

Jun 12 04:14:48 OA: Management Processor on Blade 6 appears unresponsive.

Jun 12 04:14:58 OA: Management Process on Blade 6 appears responsive again.

My OA firmware version is 4.21.

Please advise: what is the root cause?
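The two log lines above describe a ten-second outage of the management processor in bay 6. When such pairs are scattered through a long syslog, a small parser can pair each "appears unresponsive" message with the next "appears responsive again" message for the same bay. This is a sketch that assumes the message formats exactly as shown above (including the "Management Process on Blade" spelling of the second line) and that all lines fall within one year; `outages` and `parse_ts` are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches OA syslog lines such as:
#   Jun 12 04:14:48 OA: Management Processor on Blade 6 appears unresponsive.
# "Process(?:or)?" also accepts the "Management Process on Blade" spelling.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) OA: Management Process(?:or)? "
    r"on Blade (?P<bay>\d+) appears (?P<state>unresponsive|responsive again)\."
)


def parse_ts(ts):
    # Syslog timestamps carry no year; 2000 (a leap year) is an arbitrary
    # stand-in used only so that time differences can be computed.
    return datetime.strptime("2000 " + ts, "%Y %b %d %H:%M:%S")


def outages(lines):
    """Return (bay, seconds_unresponsive) pairs.

    seconds_unresponsive is None when no "responsive again" line followed.
    """
    open_since = {}  # bay -> time of the latest "unresponsive" message
    result = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if not m:
            continue
        bay = int(m.group("bay"))
        when = parse_ts(m.group("ts"))
        if m.group("state") == "unresponsive":
            open_since[bay] = when
        elif bay in open_since:
            start = open_since.pop(bay)
            result.append((bay, int((when - start).total_seconds())))
    # Bays that never reported "responsive again" are still unresolved.
    result.extend((bay, None) for bay in open_since)
    return result
```

Fed the two lines above, `outages` returns `[(6, 10)]`, i.e. bay 6 was unresponsive for 10 seconds.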

ElkabelNew
Occasional Advisor

Re: management processor on blade 6 appears unresponsive

Hi guys,

I have had the same problem for 2-3 days with my Onboard Administrator (OA).

It has happened twice, both in the morning: when I come to work, our server room is very noisy. When I log in to the OA, I see my blade servers with the Critical error "Management Processor: Error - lost communication with ILO", and their fans are spinning at 98%.

A reboot does not solve it, but when I pull the OA module out of the chassis and insert it back, everything is OK.

Below are the OA system logs from when I had the problem with my OA:

Aug 17 04:07:54 OA: Management Processor on Blade 1 appears unresponsive.
Aug 17 04:08:04 OA: Management Processor on Blade 2 appears unresponsive.
Aug 17 04:08:14 OA: Management Processor on Blade 3 appears unresponsive.
Aug 17 07:38:28 OA: Authentication failure for user admin from 192.168.11.203, requesting web service
Aug 17 07:38:46 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 17 08:29:45 OA: Blade removed from bay 3
Aug 17 08:29:45 OA: Blade inserted in bay 3
Aug 17 08:29:45 OA: Blade in bay #3 status changed to OK
Aug 17 08:29:45 OA: Management Processor on Blade 3 appears unresponsive.
Aug 17 08:31:45 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 08:33:45 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 08:44:19 OA: Blade removed from bay 3
Aug 17 08:44:19 OA: Blade inserted in bay 3
Aug 17 08:44:19 OA: Blade in bay #3 status changed to OK
Aug 17 08:46:19 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 08:47:37 OA: Management Processor on Blade 3 appears unresponsive.
Aug 17 08:48:19 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 09:31:39 OA: Blade removed from bay 3
Aug 17 09:31:40 OA: Blade inserted in bay 4
Aug 17 09:31:40 OA: Blade in bay #4 status changed to OK
Aug 17 09:33:40 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 09:35:04 OA: Management Processor on Blade 4 appears unresponsive.
Aug 17 09:35:40 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 09:44:11 OA: Blade removed from bay 4
Aug 17 09:44:11 OA: Blade inserted in bay 4
Aug 17 09:44:11 OA: Blade in bay #4 status changed to OK
Aug 17 09:46:11 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 09:47:35 OA: Management Processor on Blade 4 appears unresponsive.
Aug 17 09:48:11 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 10:37:36 OA: PowerDelay server settings have been changed.
Aug 17 10:39:57 OA: Blade removed from bay 4
Aug 17 10:39:57 OA: Blade inserted in bay 4
Aug 17 10:39:57 OA: Blade in bay #4 status changed to OK
Aug 17 10:41:58 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 10:43:21 OA: Management Processor on Blade 4 appears unresponsive.
Aug 17 10:43:58 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 10:44:54 OA: PowerDelay server settings have been changed.
Aug 17 10:57:36 OA: Onboard Administrator is rebooting
Aug 17 10:58:16 Kernel: Network link is up at 100Mbps - Full Duplex
Aug 17 10:58:17 OA: Time zone changed to GMT+2
Aug 17 10:58:19 OA: LCD Status is: OK.
Aug 17 10:58:21 Enclosure-Link: Service started
Aug 17 10:58:22 OA: Onboard Administrator booted successfully
Aug 17 10:58:31 Enclosure-Link: Initial topology scan completed successfully
Aug 17 10:59:48 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 17 11:00:32 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 11:01:24 OA: Management Processor on Blade 1 appears unresponsive.
Aug 17 11:01:30 OA: Management Processor on Blade 2 appears unresponsive.
Aug 17 11:01:42 OA: Management Processor on Blade 4 appears unresponsive.
Aug 17 11:02:56 OA: Blade removed from bay 4
Aug 17 11:03:13 OA: Blade inserted in bay 3
Aug 17 11:03:13 OA: Blade in bay #3 status changed to OK
Aug 17 11:04:56 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 11:06:31 OA: Management Processor on Blade 3 appears unresponsive.
Aug 17 11:06:56 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 11:12:51 Kernel: Network link is up at 100Mbps - Full Duplex
Aug 17 11:12:53 OA: Time zone changed to GMT+2
Aug 17 11:12:54 OA: Server Power Reduction Mode - Enabled
Aug 17 11:12:55 OA: LCD Status is: OK.
Aug 17 11:12:56 Enclosure-Link: Service started
Aug 17 11:12:58 OA: Onboard Administrator booted successfully
Aug 17 11:13:06 Enclosure-Link: Could not acquire bottom enclosure's UUID. Cannot set RUID.
Aug 17 11:13:06 Enclosure-Link: Initial topology scan completed successfully
Aug 17 11:13:08 OA: Blade 3 is reporting nominal health status.
Aug 17 11:13:08 OA: Blade in bay #3 status changed to OK
Aug 17 11:13:13 Enclosure-Link: RUID recovered: 09CZC9027XJ3
Aug 17 11:13:28 OA: Blade 1 is reporting nominal health status.
Aug 17 11:13:28 OA: Blade in bay #1 status changed to OK
Aug 17 11:13:32 OA: Blade 2 is reporting nominal health status.
Aug 17 11:13:32 OA: Blade in bay #2 status changed to OK
Aug 17 11:13:36 OA: Server Power Reduction - Deactivated
Aug 17 11:13:36 OA: Server Power Reduction Mode - Disabled
Aug 17 11:13:40 OA: Server blade in bay 3 has been powered on
Aug 17 11:13:40 OA: Blade 3 is properly cooled.
Aug 17 11:14:09 OA: Blade in bay #3 status changed to OK
Aug 17 11:14:25 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 17 11:14:29 OA: Blade in bay #1 status changed to OK
Aug 17 11:14:33 OA: Blade in bay #2 status changed to OK
Aug 17 11:15:08 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 11:15:08 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 17 11:17:08 Alertmail: Failed to send AlertMail to kiro@elkabel.bg

Aug 18 15:35:48 Kernel: Network packet flooding detected.
Aug 18 15:35:50 Kernel: Network packet flooding detected.
Aug 18 15:35:55 Kernel: Network packet flooding detected.
Aug 18 15:36:02 Kernel: Network packet flooding detected.
Aug 18 20:28:34 OA: Management Processor on Blade 1 appears unresponsive.
Aug 18 20:28:44 OA: Management Processor on Blade 2 appears unresponsive.
Aug 18 20:28:54 OA: Management Processor on Blade 3 appears unresponsive.
Aug 19 07:43:16 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 19 07:51:20 OA: Onboard Administrator is rebooting
Aug 19 07:52:00 Kernel: Network link is up at 100Mbps - Full Duplex
Aug 19 07:52:00 OA: Time zone changed to GMT+2
Aug 19 07:52:01 OA: LCD Status is: OK.
Aug 19 07:52:03 Enclosure-Link: Service started
Aug 19 07:52:04 OA: Onboard Administrator booted successfully
Aug 19 07:52:13 Enclosure-Link: Initial topology scan completed successfully
Aug 19 07:53:13 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 19 07:54:14 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 19 07:55:07 OA: Management Processor on Blade 1 appears unresponsive.
Aug 19 07:55:13 OA: Management Processor on Blade 2 appears unresponsive.
Aug 19 07:55:19 OA: Management Processor on Blade 3 appears unresponsive.
Aug 19 08:15:31 Kernel: Network link is up at 100Mbps - Full Duplex
Aug 19 08:15:32 OA: Time zone changed to GMT+2
Aug 19 08:15:34 OA: Server Power Reduction Mode - Enabled
Aug 19 08:15:36 OA: LCD Status is: OK.
Aug 19 08:15:38 Enclosure-Link: Service started
Aug 19 08:15:41 OA: Onboard Administrator booted successfully
Aug 19 08:15:47 Enclosure-Link: Initial topology scan completed successfully
Aug 19 08:15:47 OA: admin logged into the Onboard Administrator from 192.168.11.203
Aug 19 08:16:10 OA: Blade 1 is reporting nominal health status.
Aug 19 08:16:10 OA: Blade in bay #1 status changed to OK
Aug 19 08:16:15 OA: Blade 2 is reporting nominal health status.
Aug 19 08:16:15 OA: Blade in bay #2 status changed to OK
Aug 19 08:16:16 OA: Server Power Reduction - Deactivated
Aug 19 08:16:16 OA: Server Power Reduction Mode - Disabled
Aug 19 08:16:20 OA: Blade 3 is reporting nominal health status.
Aug 19 08:16:20 OA: Blade in bay #3 status changed to OK
Aug 19 08:17:10 OA: Blade in bay #1 status changed to OK
Aug 19 08:17:15 OA: Blade in bay #2 status changed to OK
Aug 19 08:17:20 OA: Blade in bay #3 status changed to OK

Aug 19 08:17:48 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
Aug 19 08:18:08 Alertmail: Failed to send AlertMail to kiro@elkabel.bg
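In the log above, some of the "appears unresponsive" messages follow an "Onboard Administrator booted successfully" line by about three minutes (for example 10:58:22 to 11:01:24, and 07:52:04 to 07:55:07). One way to check for that pattern across a longer log is to count unresponsive events per bay and measure each one's delay after the most recent OA boot, while also counting the "Network packet flooding detected" lines. A minimal sketch, assuming the message texts exactly as they appear above and one syslog entry per line; `summarize` and `parse_ts` are hypothetical names.

```python
import re
from collections import Counter
from datetime import datetime

# Message texts copied from the OA syslog above.
TS = r"(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2})"
UNRESPONSIVE = re.compile(TS + r" OA: Management Processor on Blade (?P<bay>\d+) appears unresponsive\.")
BOOTED = re.compile(TS + r" OA: Onboard Administrator booted successfully")
FLOOD = "Kernel: Network packet flooding detected."


def parse_ts(ts):
    # Syslog timestamps carry no year; 2000 (a leap year) is an arbitrary
    # stand-in used only so that time differences can be computed.
    return datetime.strptime("2000 " + ts, "%Y %b %d %H:%M:%S")


def summarize(lines):
    """Return (unresponsive count per bay,
               [(bay, seconds since last OA boot or None)],
               number of packet-flooding lines)."""
    per_bay = Counter()
    after_boot = []
    floods = 0
    last_boot = None
    for raw in lines:
        line = raw.strip()
        if m := BOOTED.match(line):
            last_boot = parse_ts(m.group("ts"))
        elif m := UNRESPONSIVE.match(line):
            bay = int(m.group("bay"))
            per_bay[bay] += 1
            delay = None  # stays None if no OA boot appeared earlier in the log
            if last_boot is not None:
                delay = int((parse_ts(m.group("ts")) - last_boot).total_seconds())
            after_boot.append((bay, delay))
        elif FLOOD in line:
            floods += 1
    return per_bay, after_boot, floods
```

For example, with the log saved to a hypothetical `oa_syslog.txt`, `summarize(open("oa_syslog.txt"))` returns the per-bay counts, the (bay, delay) pairs, and the number of flooding lines.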

If anyone has experience with similar errors, please share what needs to be done.

Thank you in advance.