HPE OneView

Re: Random ILO errors

 
Kerry Quillen
Regular Advisor

Random ILO errors

Wondering if anyone else is experiencing random ILO related errors in Oneview.  I'm currently running Oneview 3.10 (upgraded from 3.0 yesterday).  My environment consists of G7-Gen9 blades and Gen8 & Gen9 DL380's - around 135 total.  ILO's are at 2.50 or 2.54 firmware.  Every day or so I'll see one or more servers showing a status of critical with errors such as these:

"Unable to register this HPE OneView instance with iLO: The iLO initialization was unable to complete"

"Remote Insight/ Integrated Lights-Out self test error 8192"

"Unable to read and save firmware installation status information from the server hardware"

There is no common denominator - site, enclosure, model, etc.  I can usually clear these errors but it requires a server reboot.  I may have to unassign and reassign profiles, format NAND for 8192 errors, or delete the server and re-add it to Oneview.  It varies.  Yesterday I had 3 servers in error prior to upgrading to 3.10.  After the upgrade I had 7.  I'm spending hours each week clearing these up.
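For anyone trying to track which of these errors keeps coming back, here is a minimal sketch that sorts alert text into the three error types quoted above. It assumes the alert descriptions have already been exported from Oneview as plain strings (the export step is not shown), and the category labels are hypothetical names chosen for illustration, not anything Oneview itself uses.

```python
from collections import Counter

# Substrings taken from the three alert messages quoted in this post.
# The labels on the right are hypothetical names, chosen for illustration.
PATTERNS = {
    "unable to register this hpe oneview instance with ilo": "registration",
    "self test error 8192": "nand_8192",
    "unable to read and save firmware installation status": "fw_status",
}


def classify(alert_text):
    """Return the category label for one alert string, or 'other'."""
    text = alert_text.lower()
    for needle, label in PATTERNS.items():
        if needle in text:
            return label
    return "other"


def tally(alerts):
    """Count alerts per category."""
    return Counter(classify(a) for a in alerts)
```

Feeding a week of alert strings to tally() gives a per-category count, which makes it easier to see whether the 8192 errors or the registration errors dominate.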

 

17 REPLIES 17
ChrisLynch
HPE Pro

Re: Random ILO errors

Your iLO errors are likely due to issues with iLO firmware.  Please review this Customer Advisory.


I am an HPE employee


Kerry Quillen
Regular Advisor

Re: Random ILO errors

I'm way too familiar with that advisory having performed its suggested actions numerous times.  Does the NAND flash have a high failure rate?  Do they randomly just fail?  Just trying to understand why a managed server goes from good to bad after a reboot and why rebooting the Oneview appliance causes multiple server failures.  Way too much time spent clearing up these errors.

Michel-CO
Occasional Advisor

Re: Random ILO errors

Hey,

We have the same problem on many servers. It is visible in the ILO under "Diagnostics", where the Embedded Flash/SD-CARD entry shows an error.
We also reported this problem to HP, but did not get a solution.

We have now found a workaround to fix the problems:

1: Update the ILO firmware to 2.54
2: Shut down the server and power it off completely
3: Disconnect the power cable

 

Maybe this is also your solution.

 

Thanks

Kerry Quillen
Regular Advisor

Re: Random ILO errors

I'm seeing the same Embedded Flash/SD-CARD errors.  Sometimes your steps work for me and other times I have to power down the server, unassign the profile, format the NAND, reinstall IP, reassign the profile and reboot.  Strange that I did not have one single ILO issue until I started managing servers with Oneview.  I currently have 3 Gen8 blades that I can't even assign profiles to due to the self test error 8192.  Servers that have been in production for years w/ no issues.  According to the advisories, after performing all of the necessary steps, I need a new motherboard.  No longer have maintenance on them so guess they are junk now.  These servers were running ilo fw 2.55 and I had updated IP to the latest version prior to adding to Oneview.

Kerry Quillen
Regular Advisor

Re: Random ILO errors

latest update - I now have 9 servers, all Gen8 BL460c's and DL380's, that after walking through the steps in the advisory, apparently need new motherboards.  Some of them have been managed by Oneview for over a year.  No problems and then one day out of the blue, ILO errors.  The servers are still functional.  I just don't like all of that red and yellow in my dashboard.  Sorry for the rant but frustration level is peaking.

MarioE
Valued Contributor

Re: Random ILO errors

Hello
I have the same problems. They started when I set out to update the iLO on all ProLiant Gen8 and Gen9 servers (iLO 4) to version 2.54 according to the security bulletin (50 Gen8 and 58 Gen9 servers)
-> https://support.hpe.com/hpsc/doc/public/display?docId=hpesbhf03769en_us

1st problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) Upgrading to iLO 4 Firmware Version 2.20 From an Earlier Version Will Reset Some Settings to Default
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04760191&hprpt_id=HPGL_ALERTS_1990019&jumpid=em_alerts_us-us_Nov17_xbu_all_all_1328324_1990019_ServersEnterpriseSoftware_critical__&DIMID=EMID_7E899A18D3FF6077DDA51F95973BFB03/
Advisory: (Revision) Integrated Lights-Out 4 (iLO 4) - HPE ProLiant Gen8-Series Server iLO 4 Settings May Reset to Factory Defaults During an iLO 4 Reset
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00026427en_us&hprpt_id=HPGL_ALERTS_1982275&jumpid=em_alerts_us-us_Sep17_xbu_all_all_1249691_1982275_ServersEnterpriseSoftwareStorage_critical__/
out of 50 servers (only Gen8 Servers), 20 were affected

2nd problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) - Upgrading to iLO 4 Firmware Version 2.50 through 2.55 May Cause the iLO "Date/Time" to Become "Not Set"
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00027374en_us&hprpt_id=HPGL_ALERTS_1983811&jumpid=em_alerts_us-us_Sep17_xbu_all_all_1272053_1983811_ServersStorageEnterpriseSoftwareEnterpriseSolutions_recommended__/
out of 110 servers, 4 were affected

3rd problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) - HPE Active Health System (AHS) Logs and HPE OneView Profiles May Be Unavailable Causing iLO Self-Test Error 8192, Embedded Media Manager and Other Errors
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097
out of 110 servers, 10 were affected (2 motherboards were replaced)
These errors occur again and again.

I think it has nothing to do with HPE OneView. For me, all of these problems started when I began upgrading the iLO to version 2.54 / 2.55.
I have invested about 120 hours in upgrading the approximately 110 iLOs (with all the problems).
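To put the counts above side by side, here is a small sketch that turns them into percentages. The numbers are the ones reported in this post; the 110 total is approximate, so the results are rough.

```python
# Counts as reported in this post: (affected, total) per problem.
reported = {
    "settings reset (Gen8 only)": (20, 50),
    "date/time not set": (4, 110),
    "self-test error 8192 / NAND": (10, 110),
}


def rate_percent(affected, total):
    """Share of affected servers, in percent, rounded to one decimal."""
    return round(100.0 * affected / total, 1)


rates = {name: rate_percent(a, t) for name, (a, t) in reported.items()}

# About 120 hours spread over approximately 110 iLOs.
hours_per_ilo = round(120 / 110, 2)

for name, pct in rates.items():
    print(f"{name}: {pct}% affected")
print(f"about {hours_per_ilo} hours per iLO")
```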

MeFromil
Frequent Advisor

Re: Random ILO errors

Hi,

Me too. This is how I solve it:

1. Go to the ILO web admin, open ILO Activity and clear the log.

2. Format the NAND by logging in to the OA:

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097 

3. Go back to the ILO web admin and make sure the format ran OK. If the NAND format did not finish successfully, try once again.

If it fails again, you need to replace the board.

4. Run a new HPIP on the server. There is a different version of HPIP for each generation of server.

5. Update the SPP / ILO.

6. Refresh the server hardware and confirm any old active alerts.

7. If Oneview still shows the server red, try to reset the server bay or re-apply the server profile.

Kerry Quillen
Regular Advisor

Re: Random ILO errors

 MeFromil - how do you flash NAND from within the OA?  I can't access the link you provided.

James Bull
Senior Member

Re: Random ILO errors

Log in with admin credentials to the Onboard Administrator (OA) via CLI

Use the following XML, replacing <ENTER BAY TO FLASH> with the number of the server bay you wish to flash.  This also takes ranges, multiple bays, and the ALL argument.  The LOGIN USER_LOGIN credentials are placeholder tokens; if you're logged in as full admin, whatever values you use will be ignored.

hponcfg <ENTER BAY TO FLASH> << end_marker
<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="adminname" PASSWORD="password">
<RIB_INFO MODE="write">
<FORCE_FORMAT VALUE="all" />
</RIB_INFO>
</LOGIN>
</RIBCL>
end_marker

After formatting the iLO NAND you'll need to boot into the Intelligent Provisioning Recovery media appropriate for your generation of server blade to reimage that software component.  AHS will also be wiped, but was probably already corrupted anyway.
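If you have several bays to work through, a minimal sketch like the one below can generate the hponcfg input shown above for a given bay, so it can be pasted into the CLI session. It only builds and prints text and does not connect to anything. The function names are hypothetical, the bay value is passed through unchecked because the exact syntax for ranges and ALL is not spelled out here, and the format itself is destructive, so double check the bay before pasting.

```python
import xml.etree.ElementTree as ET

# RIBCL body copied from the post above. The login values are the same
# placeholder tokens, which are ignored when logged in as full admin.
RIBCL_BODY = """<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="adminname" PASSWORD="password">
<RIB_INFO MODE="write">
<FORCE_FORMAT VALUE="all" />
</RIB_INFO>
</LOGIN>
</RIBCL>"""


def build_hponcfg_input(bay, marker="end_marker"):
    """Return the full text to paste for one bay (hypothetical helper)."""
    # Sanity check: raises xml.etree.ElementTree.ParseError if the
    # RIBCL body has been edited into something malformed.
    ET.fromstring(RIBCL_BODY)
    return f"hponcfg {bay} << {marker}\n{RIBCL_BODY}\n{marker}\n"


if __name__ == "__main__":
    # Example: print the text for bay 3 (the bay number is made up).
    print(build_hponcfg_input(3))
```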