Random ILO errors

Kerry Quillen
Frequent Advisor

Wondering if anyone else is experiencing random iLO-related errors in OneView.  I'm currently running OneView 3.10 (upgraded from 3.0 yesterday).  My environment consists of G7-Gen9 blades and Gen8 & Gen9 DL380s, around 135 total.  iLOs are at 2.50 or 2.54 firmware.  Every day or so I'll see one or more servers showing a status of critical with errors such as these:

"Unable to register this HPE OneView instance with iLO: The iLO initialization was unable to complete"

"Remote Insight/ Integrated Lights-Out self test error 8192"

"Unable to read and save firmware installation status information from the server hardware"

There is no common denominator: not site, enclosure, model, etc.  I can usually clear these errors, but it requires a server reboot.  I may have to unassign and reassign profiles, format the NAND for 8192 errors, or delete the server and re-add it to OneView.  It varies.  Yesterday I had 3 servers in error prior to upgrading to 3.10.  After the upgrade I had 7.  I'm spending hours each week clearing these up.

14 REPLIES
ChrisLynchHPE
Neighborhood Moderator

Re: Random ILO errors

Your iLO errors are likely due to issues with iLO firmware.  Please review this Customer Advisory.

Kerry Quillen
Frequent Advisor

Re: Random ILO errors

I'm way too familiar with that advisory having performed its suggested actions numerous times.  Does the NAND flash have a high failure rate?  Do they randomly just fail?  Just trying to understand why a managed server goes from good to bad after a reboot and why rebooting the Oneview appliance causes multiple server failures.  Way too much time spent clearing up these errors.

Michel-CO
Occasional Advisor

Re: Random ILO errors

Hey,

We have the same problem on many servers. It is visible in the iLO under "Diagnostics", where the Embedded Flash/SD-CARD entry shows an error.
This problem was also reported to HP, but we did not get a solution.

We have now found a workaround to fix the problems:

1: Update the iLO firmware to 2.54
2: Shut down the server and power it off completely
3: Disconnect the power cable

Maybe this will also work for you.

Thanks

Kerry Quillen
Frequent Advisor

Re: Random ILO errors

I'm seeing the same Embedded Flash/SD-CARD errors.  Sometimes your steps work for me, and other times I have to power down the server, unassign the profile, format the NAND, reinstall IP, reassign the profile and reboot.  Strange that I did not have one single iLO issue until I started managing servers with OneView.  I currently have 3 Gen8 blades that I can't even assign profiles to due to the self test error 8192.  These are servers that have been in production for years with no issues.  According to the advisories, after performing all of the necessary steps, I need a new motherboard.  I no longer have maintenance on them, so I guess they are junk now.  These servers were running iLO firmware 2.55 and I had updated IP to the latest version prior to adding them to OneView.

Kerry Quillen
Frequent Advisor

Re: Random ILO errors

latest update - I now have 9 servers, all Gen8 BL460c's and DL380's, that after walking through the steps in the advisory, apparently need new motherboards.  Some of them have been managed by Oneview for over a year.  No problems and then one day out of the blue, ILO errors.  The servers are still functional.  I just don't like all of that red and yellow in my dashboard.  Sorry for the rant but frustration level is peaking.

MarioE
Advisor

Re: Random ILO errors

Hello
I have the same problems. They started when I set out to update the iLOs of all our ProLiant Gen8 and Gen9 servers (iLO 4) to version 2.54, as the security bulletin recommends (50 Gen8 and 58 Gen9 servers):
-> https://support.hpe.com/hpsc/doc/public/display?docId=hpesbhf03769en_us

1st problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) Upgrading to iLO 4 Firmware Version 2.20 From an Earlier Version Will Reset Some Settings to Default
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04760191&hprpt_id=HPGL_ALERTS_1990019&jumpid=em_alerts_us-us_Nov17_xbu_all_all_1328324_1990019_ServersEnterpriseSoftware_critical__&DIMID=EMID_7E899A18D3FF6077DDA51F95973BFB03/
Advisory: (Revision) Integrated Lights-Out 4 (iLO 4) - HPE ProLiant Gen8-Series Server iLO 4 Settings May Reset to Factory Defaults During an iLO 4 Reset
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00026427en_us&hprpt_id=HPGL_ALERTS_1982275&jumpid=em_alerts_us-us_Sep17_xbu_all_all_1249691_1982275_ServersEnterpriseSoftwareStorage_critical__/
out of 50 servers (only Gen8 Servers), 20 were affected

2nd problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) - Upgrading to iLO 4 Firmware Version 2.50 through 2.55 May Cause the iLO "Date/Time" to Become "Not Set"
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00027374en_us&hprpt_id=HPGL_ALERTS_1983811&jumpid=em_alerts_us-us_Sep17_xbu_all_all_1272053_1983811_ServersStorageEnterpriseSoftwareEnterpriseSolutions_recommended__/
out of 110 servers, 4 were affected

3rd problem:
Advisory: (Revision) HPE Integrated Lights-Out 4 (iLO 4) - HPE Active Health System (AHS) Logs and HPE OneView Profiles May Be Unavailable Causing iLO Self-Test Error 8192, Embedded Media Manager and Other Errors
-> https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097
out of 110 servers, 10 were affected (2 motherboards were replaced)
These errors occur again and again.

I think it has nothing to do with HPE OneView. All of these problems started for me when I began upgrading the iLOs to version 2.54 / 2.55.
I have invested about 120 hours for the upgrades of the approximately 110 iLOs (with all the problems).
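
To put those counts in proportion, here is a quick back-of-the-envelope calculation. Python is used only for illustration; the figures are the ones quoted in the post above, and the short labels are paraphrases of the advisory titles, not official names.

```python
# (affected, total) per problem, as reported in the post above.
counts = {
    "settings reset to defaults (Gen8 only)": (20, 50),
    "iLO date/time not set": (4, 110),
    "self-test error 8192 / NAND": (10, 110),
}


def percent(affected: int, total: int) -> float:
    """Share of affected servers as a percentage."""
    return 100.0 * affected / total


for name, (affected, total) in counts.items():
    print(f"{name}: {affected}/{total} = {percent(affected, total):.1f}%")

# About 120 hours spread over roughly 110 iLOs.
minutes_per_ilo = 120 / 110 * 60
print(f"roughly {minutes_per_ilo:.0f} minutes per iLO")
```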

MeFromil
Advisor

Re: Random ILO errors

Hi,

Me too. This is how I solve it:

1. Go to the iLO web admin, open the iLO activity log, and clear the log.

2. Flash the NAND by logging in to the OA:

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c04996097 

3. Go back to the iLO web admin and make sure the format ran OK. If the NAND format did not finish successfully, try once again.

If it fails again, you need to replace the board.

4. Run a new HPIP on the server. There is a different version of HPIP for each generation of server.

5. Update the SPP / iLO.

6. Refresh the server hardware and confirm any old active alerts.

7. If OneView shows the server red, try to reset the server bay or re-apply the server profile.

Kerry Quillen
Frequent Advisor

Re: Random ILO errors

MeFromil - how do you flash the NAND from within the OA?  I can't access the link you provided.

James Bull
Established Member

Re: Random ILO errors

Log in to the OA CLI with admin credentials.

Use the following XML, replacing <ENTER BAY TO FLASH> with the number of the server bay you wish to flash.  This also takes ranges, multiple bays, and ALL as arguments.  The LOGIN USER_LOGIN credentials are placeholder tokens; if you're logged in as full admin, whatever values you use will be ignored.

hponcfg <ENTER BAY TO FLASH> << end_marker
<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="adminname" PASSWORD="password">
<RIB_INFO MODE="write">
<FORCE_FORMAT VALUE="all" />
</RIB_INFO>
</LOGIN>
</RIBCL>
end_marker

After flashing the iLO NAND you'll need to boot into the Intelligent Provisioning Recovery media appropriate for your generation of server blade to reimage that software component.  AHS will also be wiped, but it was probably already corrupted anyway.

Generious
Occasional Advisor

Re: Random ILO errors

What I normally use is the same as James Bull's.
The password and login fields can be whatever.

==============================

Connect to the OA modules using Putty
Paste in the following – but replace X with the target bay!

hponcfg X << <*>

<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="Administrator" PASSWORD="">
<RIB_INFO MODE="write">
<FORCE_FORMAT VALUE="all" />
</RIB_INFO>
</LOGIN>
</RIBCL>

<*>

==============================

 

If successful, the RIBCL output will show status 0x0000:

<RESPONSE
STATUS="0x0000"
MESSAGE='Forcing a format of the partition after the iLO reset.'
/>
</RIBCL>
<?xml version="1.0"?>
<RIBCL VERSION="2.23">
<RESPONSE
STATUS="0x0000"
MESSAGE='No error'
/>
</RIBCL>
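
When formatting several bays, checking each response by eye gets tedious. Below is a minimal sketch, assuming the captured output looks like the sample above, that pulls every RESPONSE element out of the text and reports whether all statuses are 0x0000. The function names are made up for this example, and a regular expression is used rather than an XML parser because the output is several XML documents back to back. A zero status only tells you the request was accepted, not that the NAND is healthy afterwards.

```python
import re

# Matches each <RESPONSE ... /> element and captures STATUS and MESSAGE.
RESPONSE_RE = re.compile(
    r"<RESPONSE\s+STATUS=[\"'](0x[0-9A-Fa-f]+)[\"']\s+MESSAGE=([\"'])(.*?)\2\s*/>",
    re.DOTALL,
)


def parse_responses(output: str) -> list[tuple[int, str]]:
    """Return (status, message) for every RESPONSE element found."""
    return [(int(m.group(1), 16), m.group(3)) for m in RESPONSE_RE.finditer(output)]


def all_ok(output: str) -> bool:
    """True only if at least one response was found and all are 0x0000."""
    responses = parse_responses(output)
    return bool(responses) and all(status == 0 for status, _ in responses)


# Sample copied from the output shown above.
sample = """<RESPONSE
STATUS="0x0000"
MESSAGE='Forcing a format of the partition after the iLO reset.'
/>
</RIBCL>
<?xml version="1.0"?>
<RIBCL VERSION="2.23">
<RESPONSE
STATUS="0x0000"
MESSAGE='No error'
/>
</RIBCL>"""

for status, message in parse_responses(sample):
    print(f"0x{status:04X}: {message}")
print("all OK" if all_ok(sample) else "at least one non-zero status")
```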

Kerry Quillen
Frequent Advisor

Re: Random ILO errors

I've run the force_format command numerous times.  Sometimes it fixes the issue, sometimes not.  However, I execute it from my desktop via this command line:

C:\HP_Lights-Out_Configuration_Utility\hpqlocfg -s ILO_IP -l c:\temp\hpqcfg.log -f Force_Format.xml -v -t user=user1,password=password1

It always runs to completion with no errors, but I can tell by the run time whether it worked or not.  If it has not fixed the issue, it pauses just before finishing.
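
For anyone who has to repeat this against a list of iLOs, here is a rough sketch of how the same command line could be assembled per address. It is an illustration, not a tested procedure. Assumptions: the contents of Force_Format.xml are not shown in this thread, so the sketch reuses the RIBCL posted earlier and assumes %user% and %password% placeholders that the -t option fills in; the function names and the 192.0.2.x addresses are made up. Only the flags from the command line above are used, and nothing is executed.

```python
from pathlib import Path

# ASSUMPTION: Force_Format.xml is not shown in the thread. This reuses the
# RIBCL posted earlier, with %user%/%password% placeholders assumed to be
# substituted by the "-t user=...,password=..." option.
FORCE_FORMAT_RIBCL = """<RIBCL VERSION="2.0">
<LOGIN USER_LOGIN="%user%" PASSWORD="%password%">
<RIB_INFO MODE="write">
<FORCE_FORMAT VALUE="all" />
</RIB_INFO>
</LOGIN>
</RIBCL>
"""


def write_script(path: Path) -> Path:
    """Write the RIBCL script to disk and return its path."""
    path.write_text(FORCE_FORMAT_RIBCL, encoding="ascii")
    return path


def build_command(exe: str, ilo_ip: str, script: str, log: str,
                  user: str, password: str) -> list[str]:
    """Assemble the argument list using the same flags as the post above."""
    return [exe, "-s", ilo_ip, "-l", log, "-f", script, "-v",
            "-t", f"user={user},password={password}"]


if __name__ == "__main__":
    # Hypothetical iLO addresses (documentation range); replace with your own.
    for ip in ["192.0.2.10", "192.0.2.11"]:
        cmd = build_command(
            r"C:\HP_Lights-Out_Configuration_Utility\hpqlocfg",
            ip, "Force_Format.xml", r"c:\temp\hpqcfg.log", "user1", "password1",
        )
        print(" ".join(cmd))
```

Building the arguments as a list rather than one string keeps quoting problems out of the picture if the command is later passed to something like subprocess.run.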

pjo65
Occasional Visitor

Re: Random ILO errors

Chris!

You are wrong! I'm having the same issues with the latest iLO 4 firmware.
My problem is that I'm unable to correctly update one blade without this.
However, accessing the iLO from OneView is not a problem.

I don't want to make the mistake I have heard of: mounting the firmware ISO directly to the blade and performing the upgrade.

It feels like the only thing to do is to open a ticket with support.

ChrisLynchHPE
Neighborhood Moderator

Re: Random ILO errors

@pjo65, please open a support case for your issue.

Re: Random ILO errors

The solution is to replace the entire system board of the blade server, because the iLO NAND chip is embedded in the board itself.

Once the board (and with it the NAND chip) has been replaced, update the iLO firmware to version 2.60.

This advisory explains why the iLO NAND chip fails from time to time. 

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00049583en_us&hprpt_id=HPGL_ALERTS_2014320&jumpid=em_alerts_us-us_Jun18_xbu_all_all_1581050_2014320_ServersEnterpriseSoftwareStorageEnterpriseSolutionsSynergy_critical__&DIMID=EMID_DCB5AF4BC692315B230F3E88B829FACF/