ILO 1.92 update - hosed 3 systems


I recently updated about 45 of our servers that had ILO version 1.91 on them. All but 3 of them updated without a problem; those 3, however, had serious issues:

- 1 DL360 G4 no longer boots at all: no video or anything. ILO is unresponsive (doesn't ping).

- 1 DL360 G4 boots okay, but ILO was in flash recovery mode. Attempts to recover it using FTP failed no matter what version I tried (I tried 1.92, 1.91, 1.89 and 1.88). I eventually discovered that using the ROMPAQ flash for version 1.88 will get ILO working, but trying to put on any newer version (1.89, 1.91 which it had before, or 1.92) will fail and put it back in flash recovery.

- 1 DL380 G4 has the same symptoms as the first DL360 G4... no video at all, no response from the ILO IP address.

On the totally dead systems, I've tried resetting the system to defaults and "blindly" trying to boot from a ROMPAQ disk (it never appears to try reading from the diskette at all, so I really don't know how far it's getting during the POST).

As you can imagine, this was very disturbing. I really wanted to get 1.92 on all the systems so that using "tab" in a remote console would work with the latest Java versions, but I never would have guessed a simple update would hose 3 out of 45 systems...

Anyone else have similar problems? The systems are old enough that they're no longer under warranty, and I've shuffled some machines around from our spare inventory, but these are still decent servers and I'd like to find some way to recover the 2 dead ones. The 3rd one, I'm happy to just leave ILO 1.88 on there and call it good, because at least the thing boots up and I have an older version of ILO for remote management.

Additional info... not sure what the ILO status LEDs actually mean, but on both of the totally dead servers, the ILO status LED is showing lights 1, 6, and 7 lit up. Same on both. I haven't checked a working system to see what the status LEDs look like when everything is okay...

That's the status when the server is powered up. If power is applied but the server is off, all the lights flash in different groups except LED 2 is always off. It looks like 1, 3, 4, 6, 8 flash, and then 5, 7 flash in the alternate time.

Unfortunately, the ILO manual doesn't say anything about those particular patterns in its troubleshooting section.

Oops...correction. On the flashing patterns, I was reading the LED backwards (swapped the MSB and LSB).

It's actually alternating flashing LEDs 1,3,5,6,8 and 2,4. LED 7 is the one that stays off.

If I convert that to decimal, it's alternating 181 and 10. (B5 and 0A in hex). Again, the ILO manual says nothing about that pattern. It's not the hex 99 pattern it mentions (10011001) followed by something else.
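For anyone wanting to double-check the LED arithmetic, here is a minimal Python sketch of the conversion. It assumes LED 1 is the least significant bit and LED 8 the most significant, which is the only ordering that reproduces the 181 and 10 values above; the function name `leds_to_byte` is made up for illustration and is not from any HP tool.

```python
def leds_to_byte(lit_leds):
    """Convert a list of lit LED numbers (1-8) into a byte value.

    Assumption: LED 1 is bit 0 (LSB) and LED 8 is bit 7 (MSB).
    """
    value = 0
    for led in lit_leds:
        if not 1 <= led <= 8:
            raise ValueError(f"LED number out of range: {led}")
        value |= 1 << (led - 1)  # set the bit that corresponds to this LED
    return value


# The LED patterns described in this thread.
patterns = {
    "flashing, first group": [1, 3, 5, 6, 8],
    "flashing, second group": [2, 4],
    "steady after power-on": [1, 6, 7],
}

for label, leds in patterns.items():
    v = leds_to_byte(leds)
    # Show decimal, hex, and binary (LED 8 on the left, LED 1 on the right).
    print(f"{label}: decimal {v}, hex {v:02X}, binary {v:08b}")
```

Under the same assumption, the manual's hex 99 pattern (10011001) would correspond to LEDs 1, 4, 5 and 8, and the steady 1, 6, 7 pattern would be hex 61.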

I got the MSB/LSB correct in the part about the 3 LEDs that remain lit once the server is powered on... 1, 6, 7, just steady, no flashing.

For the one that works with old versions, there is a newer version, in fact 1.92.

Is it just no video on the dead ones? Is the rest fine, fans spinning and so on? If so:

Did you try clearing NVRAM?
Did you remove the power cable for 1 minute?

Yeah, 1.92 is the version I was updating when it killed those 2 systems (and "froze up" ILO on the 3rd). Funny, because all the other systems (about 40+) took the 1.92 update just fine, no problems.

On the dead systems, everything seems to work the way it should during a normal power-on sequence; it just never POSTs. The fans initially come on at high speed, then slow down once the temperature checks out okay. I can see/hear the internal hard drives spinning up, so the array controller seems to be doing its job.

I just don't get any video at all, and it's not like the video output died, because I can leave the system going for a long time but it never boots the OS or anything. I guess it's possible that it boots up and is sitting at some prompt because it detected an error, however, I did have them set to "delay" if there was an error during bootup. And I've hit F1 a few times just in case it was at a "hit f1 to continue" prompt somewhere.

The systems have had power totally removed for over 24 hours as I moved them out of our datacenter and into my lab, so that should have been enough time for ILO to drop any residual power.

I did set the dip switches to clear the NVRAM (1, 5 and 6 on, then power-up). I hear the loud/long beeps, so the setup info should be cleared out okay.

Yeah, this is just a real mystery. Both systems had been operating fine for years and years.

What happened was that I used CPQLOCFG.EXE with an XML script to update all my ILO management processors in one batch. It reported a successful flash on the machines that failed, but on the 2 totally dead machines, System Management Homepage and Insight Manager showed that ILO was missing/reporting an error. So I rebooted the systems with a power-cycle so ILO could reset (10 seconds of no-power). Unfortunately after that, those 2 servers wouldn't boot up at all. And as mentioned, the 3rd server's ILO was stuck in flash recovery. I tried to reflash using the FTP method and it took the ILO image file, said it was okay, flashed all 31 blocks okay without reporting any errors, but then it just reset back into flash recovery each time.
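Since the flash tool reported success even on the machines whose ILO then disappeared, a separate reachability check after each flash might catch this earlier in a batch. The following is only a rough Python sketch of that idea, not part of CPQLOCFG or any HP tool: the function names, the host names, and the choice of TCP port 443 (the ILO web interface, if it is left on its default HTTPS port) are all assumptions.

```python
import socket


def tcp_probe(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the ILO web interface listens on port 443.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def split_by_reachability(hosts, probe=tcp_probe):
    """Split hosts into (reachable, unreachable) lists using the given probe."""
    reachable, unreachable = [], []
    for host in hosts:
        if probe(host):
            reachable.append(host)
        else:
            unreachable.append(host)
    return reachable, unreachable


# Demo with a fake probe so nothing touches the network.
# The host names below are hypothetical placeholders.
fake_status = {"ilo-a.example": True, "ilo-b.example": False}
ok, failed = split_by_reachability(fake_status, probe=lambda h: fake_status[h])
print("reachable:", ok)
print("unreachable:", failed)
```

Running this against a small first group of ILOs, and stopping the batch if anything lands in the unreachable list, would limit the damage to that first group. It would not explain why a flash failed, and it cannot tell a dead ILO from one sitting in flash recovery.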

The ONLY way I could get that 3rd ILO to work at all was to set dipswitch 1, boot up using the version 1.88 ROMPAQ and reflash with that. I now have a bunch of floppies on my desk since I created ROMPAQ disks for 1.92, 1.91, 1.89 and 1.88. Oddly, even though it had 1.91 previously, only the 1.88 ROMPAQ did anything. And if I now try to upgrade it back to 1.89, 1.91 or 1.92, it goes into that flash recovery mode. So strange.

Oh, and both the DL360 G4 and DL380 G4 that are dead had the most recent system BIOS... 2007.07.16 and 2007.07.19 respectively. I have exactly identical systems (purchased at the same time, same hardware, etc.) that took the 1.92 update just fine.

I guess if no solution can be found, I'll have to use these servers for spare parts which is a shame... dual 3.6GHz processors and all that. :(

HP... stop making viruses instead of firmware!


See this:

