iLO 2 Access all of a sudden not working on multiple servers

 
Robert W. Eastman Jr._2
Frequent Advisor

We are starting to experience an issue where iLO 2 will all of a sudden stop functioning. Sometimes you can ping the iLO address, but most of the time you cannot. This is happening on DL380 G5s and ML350 G5s. Firmware is at least 1.60, since they are almost all new servers as of the quarter end of 2008. The only thing that has really changed is that we updated HPSIM to 5.3. I can't see how communication from HPSIM to the servers would stop the iLO from functioning. It appears that a restart of the server gets the iLO communicating once again.

Is anyone else seeing this?
(NTFS) No Time For Stupidity
25 REPLIES
Bijl
Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Hi Robert,
QQ, what OS are you using?
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

This is happening on Windows 2003 SP2 and also ESX 3.5 Update 1. I had another one stop functioning last night. Of course, since the only thing that has really changed is the HPSIM version, I am beginning to wonder if that is what's causing it.
(NTFS) No Time For Stupidity
Cookie_2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Hi Rob,

I'm not sure if HP SIM is causing this. But since it's an issue with iLO 2, I would suggest upgrading the firmware to 1.70.

Regards,
Cookie
Sometimes a Loser Wins!!
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

I don't think HP SIM is causing this either, but just a coincidence that I updated HPSIM and a week later it started happening.

I have a total of about 30 servers now, all on version 1.60 of the iLO 2 firmware, that are just disappearing. The only way to get them back, even just to update them, appears to be to power the server down and pull the power cords so the iLO itself loses power. After that we are able to access the iLO. Unfortunately we are in the midst of our busy season (tax) and we don't have competent people in our remote offices who can do what we need them to do. The biggest issue is that most of the affected servers are running ESX 3.5, so when we do have an issue with the server we have no way of power-cycling it without the iLO to get it back up and running.

I guess my only alternative is to play a waiting game: wait until we have to reboot each server, then walk the users in the remote offices through unplugging each power cord so the iLO no longer receives power.

HP is still indicating that they have no known issues with 1.60, but hell, I can show them 30 :)
(NTFS) No Time For Stupidity
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

I'm also getting this exact same problem on 4 of our 5 Windows 2003 SP2 file servers. Interestingly, all 4 are DL360 G5's. The only one with a working iLO is a DL360 G3. Firmware is v1.61 and they've been working fine since they were installed at the end of last year. They've all stopped working in the last few weeks (earliest was on 4th Feb) and we've done nothing to the servers that might cause this.
The HP System Management page is showing the Interface Status as 'Not Responding' and the iLO port won't ping.
I've tried to update the firmware to 1.70, but it fails saying 'Unable to communicate with the management processor'.

There are also eventlog entries saying:
Remote Insight Agent: The Remote Insight Board/Integrated Lights-Out has detected a controller interface error.
[SNMP TRAP: 9006 in CPQSM2.MIB]

Does anyone know if there is a hard iLO 2 reset we can try? I'm obviously trying to find a solution that doesn't involve taking the file servers down.
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

The only way we have found so far to get the iLO to start responding again in order to update it to 1.70 was to physically power off the server and have the plugs removed. This way the iLO no longer gets power and will reset. Simply shutting down the server does not work because power is still flowing to the iLO.

Hopefully your servers are local and you can do this. Unfortunately the majority of ours are in remote offices with no one technical on site.
(NTFS) No Time For Stupidity
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Jason, are you using HPSIM 5.3?
(NTFS) No Time For Stupidity
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Hi Rob,

Thanks for the reply. Yes, we're running HPSIM 5.3, which was upgraded from 5.2 on 3rd Feb. The first of our iLO2 problems started on the 4th Feb, but cleared 5 minutes later. It then went down again on 9th and has stayed down since.

I'm reluctant to point the finger at SIM (although it does seem like quite a coincidence), but that's based on nothing but gut instinct as I honestly don't know enough about how it works yet.

Thankfully, all my servers are fairly local, so I can come in and power them down out of hours. Just out of interest, what happens after you upgrade to firmware 1.70? Does iLO 2 miraculously start working again and, if so, for how long?

I'm going to try and do some more investigation this week, so I'll post any results.
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Just as a test, I updated one server locally to firmware version 1.70, and on another server I only updated the support pack to PSP 8.15. On the PSP 8.15 server I had to have the power completely removed in order to get the iLO back to a responsive state; once the power had been removed I was again able to access its iLO. After about 3 days of running with just the PSP 8.15 update, that iLO became unresponsive again, while the one on firmware 1.70 is still up and functioning. It appears that 1.60 is the issue here.
(NTFS) No Time For Stupidity
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

I have also been working with the Western Michigan technical lead for HPSIM, and he does not believe HPSIM to be the issue, but he has also seen no customer alerts from HP indicating that customers should move away from the 1.60 version of the firmware. None of the iLOs that are at 1.70, 1.50, or 1.61 have this issue, only those at 1.60.

Maybe HPSIM is just reporting better than the previous version? Not sure here.
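
For anyone triaging a larger fleet by firmware level, here is a minimal Python sketch. The inventory dict, the host names and the 1.70 floor are assumptions for illustration only (1.70 is simply the version being tested in this thread); nothing here talks to an iLO, so the versions would have to come from whatever inventory you already keep.

```python
def parse_version(text):
    """Turn a version string such as '1.60' into (1, 60) for numeric comparison.

    Note: '1.7' parses as (1, 7), which sorts below '1.60', so record
    versions with both digits ('1.70') as iLO 2 reports them.
    """
    return tuple(int(part) for part in text.strip().split("."))


def below_floor(inventory, minimum="1.70"):
    """Return the sorted host names whose firmware is older than `minimum`.

    `inventory` maps a host name to its iLO firmware version string.
    """
    floor = parse_version(minimum)
    return sorted(
        host for host, version in inventory.items()
        if parse_version(version) < floor
    )
```

Calling `below_floor({"fs1": "1.60", "fs2": "1.70"})` would list only `fs1`. Treat the result as a to-do list for updates, not as proof that the listed hosts are the ones that will hang.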
(NTFS) No Time For Stupidity
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Your testing does seem to point to firmware issues. Sounds like I'll have to bite the bullet and arrange some downtime for my file servers.

This could take a while, but I'll try updating the firmware on one, leave it for a while and see if iLO stays functional.

I'm running v1.61 firmware, so I'm surprised you've only seen issues with your v1.60 servers. Maybe there are problems with the latest SIM version and the older iLO firmware on G5 servers, but you'd have thought more people would have noticed.

Thanks for your help. I'll post back when I've done some testing.
Graeme Bray
Regular Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

We had this issue today on an iLO at 1.60; it would not load properly. I updated to 1.70 and the iLO worked just fine.

That's my suggestion.
Banksy
Occasional Visitor

Re: iLO 2 Access all of a sudden not working on multiple servers

We upgraded to SIM 5.3 and the iLO 2 on DL380 G5 and BL460c servers started to fail. iLOs that have failed are on different firmware versions, including 1.60, 1.60 and 1.70. I assume the following from the SIM 5.3 readme points to the culprit:
In HP SIM 5.3, the new WS-Management protocol is used to identify and
access ProLiant iLO 2 management processors that have firmware revision
1.30 or higher. WBEM credentials are used with WS-Management as well as
with WBEM, so to properly identify an iLO with WS-Management you must
first add either a default WBEM credential or a system-specific WBEM
credential with the WS-Management user name and password. WS-Management is
accessed through port 443 by default, but other ports can be configured by
editing the wsmanportlist.xml file in the config/identification directory.

I have to assume the problem is with the WS-Management protocol.
Will be reinstalling SIM 5.2
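
Since the readme says WS-Management reaches the iLO on port 443 by default, one quick check is whether the iLO still accepts TCP connections there. Below is a minimal Python sketch using only the standard library. The function names and the choice to also try port 80 are assumptions, and a successful connect only shows the port is listening, not that the iLO web or WS-Management service is healthy.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_ilo(host, ports=(443, 80), timeout=3.0):
    """Map each port to True or False depending on whether it accepted a connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `probe_ilo("192.0.2.10")` (a placeholder address) returns a dict keyed by port number. An iLO that answers ping but refuses both ports would be in the half-alive state where the address pings but the interface cannot be reached.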
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Thanks for pointing me in the direction of the HP SIM 5.3 ReadMe Banksy, should have really had a look at that myself ;-)

One of our file servers failed to come back up after a patch reboot at 3am this morning, so we took the opportunity to pull the power out before bringing it back up again.

The iLO port worked fine after this reset and I was able to update the firmware to v1.70 successfully. I also saw this in the ReadMe (p50):

"WBEM Indications
For ProLiant iLO2's to be properly identified with WS-MAN functionality, the iLO2 credentials must be the first credentials specified in either the discovery credential list or the global credential list when discovery is run. Otherwise, system credentials can be set directly for the system after it is discovered."

Having read this, we've removed all the entries in the Global Credentials page and re-entered them, with the iLO username and password at the top of the list.
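
To make the ordering rule from the readme concrete, here is a small Python sketch. It is purely illustrative: HP SIM keeps its credential lists in its own configuration pages, and the `user` key, the list layout and the account names are hypothetical. It only shows what "iLO credentials first, everything else in its original order" means.

```python
def put_first(credentials, preferred_user):
    """Return a new list with entries for `preferred_user` moved to the front.

    The relative order of all other entries is preserved and the input
    list is left unmodified.
    """
    first = [c for c in credentials if c["user"] == preferred_user]
    rest = [c for c in credentials if c["user"] != preferred_user]
    return first + rest
```

With entries for `root`, `ilo-admin` and `svc` in that order, `put_first(creds, "ilo-admin")` yields `ilo-admin`, `root`, `svc`.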

I'll monitor this server for the rest of the week and if iLO stays up, I'll update the other 3 servers at the same time.

Thanks for everyone's help and I'll post back with any further info...
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

I'm not sure how this relates to the servers I have checked that are having the issue. They are set to SNMP and not WBEM, but I don't know a lot about SIM 5.3. Unless HPSIM tries WBEM first by default and then falls back to SNMP.
(NTFS) No Time For Stupidity
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

I'm not sure whether this has anything to do with the problems either; my knowledge of HP SIM is fairly limited as well. Having checked the Global Protocol Settings, WBEM, WS-MAN and SNMP are all enabled in my config (the default, I think). I take this to mean that WBEM will be used at some point to get information from systems (along with the other protocols), so re-ordering the username and password list can't hurt. It's certainly something I'm happy to try alongside the firmware update before re-installing HP SIM 5.2...
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

For information, our file server iLO has been stable all week, so we're going to do the same for the other 3 when we can arrange downtime. Not sure if it was a combination of the firmware and HP SIM password re-ordering or just the firmware, but it seems to have worked for now...
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Well I see that HP has finally come out with a customer advisory on this issue:


They have also come out with an update to HPSIM 5.3

http://h18013.www1.hp.com/products/servers/management/hpsim/dl_windows.html#hotfix
(NTFS) No Time For Stupidity
MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Thanks for that Robert. It's funny, I've had a call outstanding with HP about this since the start and heard nothing. I'll install this hotfix on Monday.

Can you post a link to the advisory, please? Thanks.
Robert W. Eastman Jr._2
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

MacSWW
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Thanks for that Rob.

For info - I've now managed to update the firmware on my fileservers to v1.70, installed the hotfix on the HP SIM server and all the iLO ports are up and stable. The first one's been up for over a week with no issues. Hopefully, they'll stay that way...

Thanks for all the help...
engineering and project
Occasional Contributor

Re: iLO 2 Access all of a sudden not working on multiple servers

hi, we have started to get this issue this week

but we are running SIM 5.3 with the hotfix

plus one of the servers is running firmware 1.75

we also see on some Red Hat servers that when the iLO fails it kills the servers; only removing the power will bring them back, but the iLO remains failed
ThomasVu
Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Same here. A customer of ours had this issue, and we massively deployed firmware 1.77 and the hotfix on HP SIM. And now they say the issue is back...

Anyone else having issues with this?
Jerame
Frequent Advisor

Re: iLO 2 Access all of a sudden not working on multiple servers

Hi,
We are experiencing this as well.

About a month back, I upgraded SIM to 5.3 and applied the hotfix immediately.

Within the following week, I upgraded the firmware on all my iLOs/iLO 2s, and also used the scripting tools to change some network settings, as well as to change the administrator password.

Since then, I have had iLo's dropping like flies. Sometimes they will reply to pings, but fail to connect via http (or telnet for my Netware servers).

Once they are in that state, or are not replying to pings at all, the only thing I have found that resolves the issue is to power down the server and unplug the power for 30 seconds.

After plugging the servers back in and powering on, the iLOs have been fine. I haven't had a repeat on the same iLO.

I have the problems on servers running Win2k, Win2k3 and Netware 6.5

It would be nice if HP Engineering would step up and look into this issue!