HPE ProLiant Servers (ML,DL,SL)

Moonshot Proliant m750 Failed DIMM Detection Method

 
dnewkirk
Occasional Contributor


Hello,

I have a Moonshot ProLiant m750 blade and I am looking for ways to detect failures in the currently installed DIMMs, preferably using iLO 5 accessed via the CLI.

I've found a method to check for DIMM failures by using Redfish to read the IML. The problem is that the IML records a failure at a point in time, so the DIMM it names may since have been replaced and no longer be installed. I need a method that shows the status of failed DIMMs that are currently installed.

 

Chassis Firmware 2.2.30

iLO Firmware Version 2.48 Aug 02 2021

System ROM H09 v1.60 (07/24/2023)

 

bash-5.2$ curl -ks "https://IP/redfish/v1/Systems/1/LogServices/IML/Entries" -u Credentials | jq -r '.Members[]|select(.Message | contains("DIMM"))'
{
"@odata.context": "/redfish/v1/$metadata#LogEntry.LogEntry",
"@odata.id": "/redfish/v1/Systems/1/LogServices/IML/Entries/164",
"@odata.type": "#LogEntry.v1_1_0.LogEntry",
"Id": "164",
"Created": "2024-08-22T09:44:41Z",
"EntryType": "Oem",
"Message": "DIMM Failure - Uncorrectable Memory Error (Processor 1, DIMM 2)",

 

Please provide suggestions! Thanks.
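
For reference, the jq filter above can be taken one step further in Python. This is a minimal sketch, assuming the entries have already been fetched and that DIMM messages follow the "(Processor N, DIMM M)" format shown in the entry above. The function name `latest_dimm_failures` is hypothetical. It keeps only the most recent DIMM-related entry per slot; it does not tell you whether that DIMM is still installed.

```python
import re
from typing import Dict, List, Tuple

# Pattern assumed from the IML message shown above:
# "DIMM Failure - Uncorrectable Memory Error (Processor 1, DIMM 2)"
DIMM_LOCATION = re.compile(r"\(Processor (\d+), DIMM (\d+)\)")


def latest_dimm_failures(entries: List[dict]) -> Dict[Tuple[int, int], dict]:
    """Return the most recent DIMM-related entry for each (processor, dimm) slot."""
    latest: Dict[Tuple[int, int], dict] = {}
    for entry in entries:
        match = DIMM_LOCATION.search(entry.get("Message", ""))
        if match is None:
            continue  # not a DIMM entry, or a message format we do not recognise
        slot = (int(match.group(1)), int(match.group(2)))
        created = entry.get("Created", "")
        # UTC timestamps in the same ISO 8601 layout sort correctly as strings.
        if slot not in latest or created > latest[slot].get("Created", ""):
            latest[slot] = entry
    return latest
```

Grouping by slot makes it easier to compare the date of the last logged failure against the date a DIMM was replaced.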

Suman_1978
HPE Pro

Re: Moonshot Proliant m750 Failed DIMM Detection Method

Hi,

Have you checked the GET_EMBEDDED_HEALTH RIBCL command?
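
For anyone unfamiliar with it, GET_EMBEDDED_HEALTH is an iLO RIBCL (XML scripting) command that is issued inside a SERVER_INFO block. A minimal sketch of building such a request in Python follows. The helper name `build_health_request` is hypothetical, and whether the m750's iLO accepts RIBCL at all is an assumption that needs to be verified on the hardware.

```python
import xml.etree.ElementTree as ET


def build_health_request(user: str, password: str) -> str:
    """Build a RIBCL document that asks iLO for its embedded health data."""
    ribcl = ET.Element("RIBCL", VERSION="2.0")
    login = ET.SubElement(ribcl, "LOGIN", USER_LOGIN=user, PASSWORD=password)
    # GET_EMBEDDED_HEALTH only reads data, so the block is opened in read mode.
    server_info = ET.SubElement(login, "SERVER_INFO", MODE="read")
    ET.SubElement(server_info, "GET_EMBEDDED_HEALTH")
    return '<?xml version="1.0"?>\n' + ET.tostring(ribcl, encoding="unicode")


if __name__ == "__main__":
    # Placeholder credentials; replace with real ones before sending to iLO.
    print(build_health_request("USERNAME", "PASSWORD"))
```

The resulting XML still has to be sent to iLO with a RIBCL-capable tool, and the memory section of the response has to be inspected; neither step is shown here.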

Thank You!
I work with HPE but opinions expressed here are mine.



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
MV3
HPE Pro

Re: Moonshot Proliant m750 Failed DIMM Detection Method

Hello,

 

Please check the link below and see if it helps. Refer to page 68.

https://www.hpe.com/psnow/doc/a00018323en_us

 

Cheers...



I work at HPE

dnewkirk
Occasional Contributor

Re: Query: Moonshot Proliant m750 Failed DIMM Detection Method

Thank you for the suggestions. I will work with my team to test these methods and update if they are working as needed.

DanRobinson
HPE Pro

Re: Query: Moonshot Proliant m750 Failed DIMM Detection Method

FYI, for Redfish queries I would highly suggest you also sign up for the HPE Developer Slack community.
Some of the best HPE Redfish gurus hang out in there and try to help answer questions.



I work at HPE
dnewkirk
Occasional Contributor

Re: Query: Moonshot Proliant m750 Failed DIMM Detection Method

It seems that the only way to detect correctable or uncorrectable ECC errors is through the IML. My conclusion is that the best way to test DIMMs is to clear the IML, run a memory stress test, and then check the IML for any new ECC errors.

Am I correct, or is there another method that doesn't involve clearing the IML?
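
One possible way to avoid clearing the log, offered as an untested idea rather than an HPE-recommended procedure: note a baseline time before the stress test and afterwards look only at DIMM entries created after it. A minimal sketch, assuming the "Created" format shown in the IML entry earlier in the thread; the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_created(value: str) -> datetime:
    """Parse an IML 'Created' timestamp such as '2024-08-22T09:44:41Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def new_dimm_entries(entries, baseline: datetime):
    """Return DIMM-related entries created strictly after the baseline time."""
    return [
        entry
        for entry in entries
        if "DIMM" in entry.get("Message", "")
        and parse_created(entry["Created"]) > baseline
    ]
```

The baseline should come from the iLO clock (for example the "Created" value of the newest existing entry) rather than from the workstation clock, since the two may differ.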

DanRobinson
HPE Pro

Re: Query: Moonshot Proliant m750 Failed DIMM Detection Method

I pinged some internal folks and they said the IML is the place to go.
You can check it via the GUI or an API call.
If the Uncorrectable ECC entry in the IML has not been "Cleared", the overall Health Status in iLO will also report the Hardware subsystem as degraded.
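
As a rough illustration of the API route, the sketch below reads a Redfish resource and checks its standard `Status.Health` property, which takes the values "OK", "Warning" or "Critical". It assumes that `/redfish/v1/Systems/1` reflects the health status described above, which should be confirmed on the m750. The function names are hypothetical, and certificate checks are disabled to mirror `curl -k`.

```python
import base64
import json
import ssl
import urllib.request


def fetch_resource(host: str, path: str, user: str, password: str) -> dict:
    """GET a Redfish resource with basic auth; TLS verification is off, like curl -k."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}", headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)


def health_is_degraded(resource: dict) -> bool:
    """True when the resource reports a Status.Health other than 'OK'."""
    health = resource.get("Status", {}).get("Health")
    return health is not None and health != "OK"
```

A typical call would be `health_is_degraded(fetch_resource("IP", "/redfish/v1/Systems/1", user, password))`, with the same placeholders as in the curl command earlier in the thread.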

 

Hope that helps



I work at HPE