Greg Bates
Advisor

Multiple Ilo errors weekly

We have been seeing an anomaly in our environment over the past month: most of our iLO2 devices report an interface error to their server, which then passes it on to SIM (Integrated LightsOut Interface Error). It recurs roughly weekly, consistently every 7.5 days. Our environment is large and I don't know everything that might be polling the hardware, but I know the timing does not line up with any polling that our SIM servers do. There are no errors on the iLO devices themselves and none that I can find in the OAs. The servers have been configured and running for about 6 months, but this issue only started at the beginning of this month.
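
The 7.5-day spacing is itself a clue: 7.5 days is 180 hours, or one week plus 12 hours, so each episode lands 12 hours later in the week than the one before it, and a job on a plain weekly schedule (168 hours) would not produce that pattern. Below is a minimal Python sketch for checking the spacing and projecting the next episode. It assumes the episode start times have already been pulled out of the SIM event list; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def interval_hours(episodes):
    """Return the gaps, in hours, between consecutive episode start times."""
    ordered = sorted(episodes)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]


def predict_next(episodes):
    """Project the next episode by adding the mean gap to the latest one.

    Needs at least two episodes so that there is at least one gap.
    """
    gap = mean(interval_hours(episodes))
    return max(episodes) + timedelta(hours=gap)


# Hypothetical episode start times, 180 hours (7.5 days) apart.
start = datetime(2010, 1, 1, 3, 0)
episodes = [start + timedelta(hours=180 * i) for i in range(4)]
print(interval_hours(episodes))  # [180.0, 180.0, 180.0]
print(predict_next(episodes))    # 2010-01-31 03:00:00
```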

I know the default answer is to upgrade the firmware, but that is just not possible in an environment like ours without proper testing, which is in progress. Are there any known issues with these firmware versions that might produce this type of error?

Our environment in a nutshell:
Windows servers
BL460c G6
PSP 8.30
iLO2 FW = 1.80 & 1.81
SIM = 6.0
OA FW = 2.60 & 3.10
Oscar A. Perez
Honored Contributor

Re: Multiple Ilo errors weekly

Do they all stop responding at about the same time? Can you ping the iLO2s?

Are you running a port scan tool like Nessus Scanner against the iLO2s?

In the past month, have you added any new devices to the network the iLO2s are on?
Greg Bates
Advisor

Re: Multiple Ilo errors weekly

I do not believe that they stop responding, at least not to pings.
The errors happen in bunches: I get alerts for about 10 or 15 iLO errors at once, then an hour later another 10 or 15. It seems random at times. As far as I can tell the bunches are not related: the IP addresses are not similar and the devices are not all in one enclosure.
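
One way to make the "bunches" concrete is to group the exported alerts by the time gap between them and list which devices fall into each burst. A minimal sketch, assuming the alerts are available as (timestamp, host) pairs; the 10-minute gap threshold and the host names are hypothetical.

```python
from datetime import datetime, timedelta


def group_bursts(alerts, max_gap=timedelta(minutes=10)):
    """Split (timestamp, host) alerts into bursts.

    Alerts are sorted by timestamp; a new burst starts whenever the gap
    since the previous alert exceeds max_gap.
    """
    bursts = []
    last_time = None
    for when, host in sorted(alerts):
        if last_time is None or when - last_time > max_gap:
            bursts.append([])
        bursts[-1].append((when, host))
        last_time = when
    return bursts


# Hypothetical alerts: two bursts a little over an hour apart.
t0 = datetime(2010, 1, 1, 10, 0)
alerts = [
    (t0, "ilo-a"),
    (t0 + timedelta(minutes=1), "ilo-b"),
    (t0 + timedelta(minutes=2), "ilo-c"),
    (t0 + timedelta(minutes=65), "ilo-d"),
    (t0 + timedelta(minutes=66), "ilo-e"),
]
for burst in group_bursts(alerts):
    first = burst[0][0]
    hosts = sorted({host for _, host in burst})
    print(first.strftime("%H:%M"), len(burst), hosts)
# 10:00 3 ['ilo-a', 'ilo-b', 'ilo-c']
# 11:05 2 ['ilo-d', 'ilo-e']
```

Adding the enclosure or subnet of each host to the tuples would show directly whether a burst spans enclosures.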

I don't know if we have port scans running; that is one thing I hope to discover and prove through my troubleshooting. Our environment is extremely dynamic, and sometimes devices get added without any thought as to whether they might affect other items. Thanks.
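
If firewall or flow records are available for the iLO network, a port scan tends to show up as one source touching many distinct ports on the same target within seconds. A minimal sketch of that check, assuming the events have been exported as (timestamp, src_ip, dst_ip, dst_port) tuples; the 60-second window, the 20-port threshold, and all addresses are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_scanners(events, window=timedelta(seconds=60), min_ports=20):
    """Return sources that hit at least min_ports distinct ports on a
    single target within any sliding window of the given length.

    events: iterable of (timestamp, src_ip, dst_ip, dst_port) tuples.
    """
    by_pair = defaultdict(list)
    for when, src, dst, port in events:
        by_pair[(src, dst)].append((when, port))

    flagged = set()
    for (src, _dst), hits in by_pair.items():
        hits.sort()
        left = 0
        for right in range(len(hits)):
            # Move the left edge forward until the window spans <= window.
            while hits[right][0] - hits[left][0] > window:
                left += 1
            ports = {port for _, port in hits[left:right + 1]}
            if len(ports) >= min_ports:
                flagged.add(src)
                break
    return flagged


# Hypothetical events: one host sweeps 30 ports in 30 seconds, while a
# management server repeatedly uses the same two ports.
t0 = datetime(2010, 1, 1, 2, 0)
events = [(t0 + timedelta(seconds=i), "10.0.0.5", "10.0.1.20", 1 + i) for i in range(30)]
events += [(t0 + timedelta(seconds=5 * i), "10.0.0.9", "10.0.1.20", 443 if i % 2 else 80)
           for i in range(30)]
print(find_scanners(events))  # {'10.0.0.5'}
```

Rebuilding the port set for every window is fine for a sketch; a large export would call for a running per-port count instead.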
Greg Bates
Advisor

Re: Multiple Ilo errors weekly

I have still had no luck figuring this out, and it has been causing big issues with our server stability. Some servers that reboot after an iLO error have failed to come back up properly, forcing a blade re-seat. I have moved forward with upgrading the iLO firmware, but haven't gotten far enough to notice any results.

For the cases where a Nessus-type port scanner was causing an issue, did the iLO record anything in its logs? I am looking for that one spot in any of my HP logs that might point to an IP address or something. The only thing in my iLO logs is HP SIM accessing it.

Are any other port scanner types known for hurting iLOs? Are there specific protocols these tools hit that cause iLOs to fail? I need something here because I am striking out in my own troubleshooting.

Thanks.
Oscar A. Perez
Honored Contributor

Re: Multiple Ilo errors weekly

Greg,

iLO2 will not log activity from Nessus scanner.

There are two bugs we are fixing in iLO2 in the next release:

One is related to the handling of the TCP backlog queue. Basically, if you have a tool that opens and closes TCP sessions faster than the iLO2 application layer can handle, iLO2 could mishandle the removal of these entries from the backlog queue, potentially causing a hang down the road. Nessus Scanner and similar port scanner tools open and close TCP sessions very fast. This is why I was asking you if you have something like that running in your network.
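
The "hang down the road" behavior fits a fixed-size queue: if even a small fraction of fast open/close sessions fail to release their backlog entry, the free slots eventually run out. The sketch below is a hypothetical model of that arithmetic only, not iLO2's implementation; the backlog size, churn rate, and leak ratio are invented for illustration.

```python
def ticks_until_full(capacity, churn_per_tick, leak_every):
    """Return the tick on which a leaking backlog loses its last free slot.

    Hypothetical model: every tick, a scanner opens and immediately closes
    churn_per_tick sessions, and every leak_every-th early close fails to
    release its backlog entry, so that slot stays occupied for good.
    Assumes all three arguments are positive integers.
    """
    leaked = 0
    closes = 0
    tick = 0
    while leaked < capacity:
        tick += 1
        for _ in range(churn_per_tick):
            closes += 1
            if closes % leak_every == 0:
                leaked += 1
                if leaked == capacity:
                    break
    return tick


# 16 slots, 100 open/close pairs per tick, 1 leak per 1000 closes.
print(ticks_until_full(capacity=16, churn_per_tick=100, leak_every=1000))  # 160
```

In this model the time to failure is capacity × leak_every / churn_per_tick ticks, rounded up, so a heavier scan exhausts the backlog sooner and a lighter one only delays the hang.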

Another bug we are fixing is related to Ethernet packets with protocol type (EtherType) 0x8874. These packets could cause iLO2 to stop responding. We are not sure of the source of these packets, but so far they seem to come from some EMC CLARiiON storage devices.
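
To check whether such frames are present on the iLO network, a packet capture can be filtered on the EtherType field; for untagged frames the pcap capture filter "ether proto 0x8874" (as used by tcpdump) does this. The same check on raw frame bytes is sketched below in Python. It assumes Ethernet II framing and skips 802.1Q/802.1ad VLAN tags; the sample frames are made up.

```python
import struct

SUSPECT_ETHERTYPE = 0x8874
VLAN_TPIDS = {0x8100, 0x88A8}  # 802.1Q and 802.1ad tag identifiers


def ethertype(frame: bytes) -> int:
    """Return the EtherType of a raw Ethernet II frame, skipping VLAN tags.

    Raises struct.error if the frame is too short to hold the header.
    """
    offset = 12  # skip destination and source MAC addresses (6 + 6 bytes)
    (etype,) = struct.unpack_from("!H", frame, offset)
    while etype in VLAN_TPIDS:
        offset += 4  # skip the 2-byte TPID and the 2-byte tag control field
        (etype,) = struct.unpack_from("!H", frame, offset)
    return etype


def is_suspect(frame: bytes) -> bool:
    return ethertype(frame) == SUSPECT_ETHERTYPE


# Made-up frames: untagged 0x8874, VLAN-tagged 0x8874, and plain IPv4.
dst, src = b"\xff" * 6, bytes.fromhex("00aabbccddee")
plain = dst + src + struct.pack("!H", 0x8874) + b"\x00" * 46
tagged = dst + src + struct.pack("!HH", 0x8100, 10) + struct.pack("!H", 0x8874) + b"\x00" * 42
ipv4 = dst + src + struct.pack("!H", 0x0800) + b"\x00" * 46
print(is_suspect(plain), is_suspect(tagged), is_suspect(ipv4))  # True True False
```

The source MAC address of any matching frame (bytes 6 to 11) identifies the sending device.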

If you have an HP support case number, please post it here.
Greg Bates
Advisor

Re: Multiple Ilo errors weekly

My case went as expected: upgrade all firmware and PSP versions.

HP Case # 4621297057

I suspect Systems Insight Manager is the culprit. I have multiple SIM servers running in our environment: 3 data centers, 3 SIMs, plus 1 failover SIM that monitors all 3 data centers. I just noticed that a lot of traffic was heading in that direction prior to the iLO errors. A very quick check showed that Insight Power Manager was set up without access rights to the iLOs. Could SIM really cause these errors on iLO2?
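
One way to test the SIM theory is to measure how often an iLO error is preceded by traffic from the suspected SIM server within a short lookback window. A minimal sketch, assuming both sets of timestamps have been exported from logs or flow records; the 30-minute lookback and the timestamps are hypothetical, and a high fraction shows correlation, not cause.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def preceded_fraction(errors, traffic, lookback=timedelta(minutes=30)):
    """Return the fraction of errors with traffic in [error - lookback, error]."""
    traffic = sorted(traffic)
    hits = 0
    for err in errors:
        lo = bisect_left(traffic, err - lookback)
        hi = bisect_right(traffic, err)
        if hi > lo:  # at least one traffic timestamp falls inside the window
            hits += 1
    return hits / len(errors) if errors else 0.0


# Hypothetical timestamps: two of the three errors follow SIM traffic.
t0 = datetime(2010, 1, 1, 0, 0)
errors = [t0 + timedelta(hours=h) for h in (1, 2, 10)]
traffic = [t0 + timedelta(minutes=m) for m in (50, 110)]
print(round(preceded_fraction(errors, traffic), 2))  # 0.67
```

Running the same check with the error times shifted by a few hours gives a rough baseline: if SIM traffic precedes arbitrary times just as often, the match means little.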

SIM version is 6.0.

I noticed that none of the original iLO (not iLO2) devices were affected, and errors on older iLO2 versions (pre-1.7x) were very uncommon.