diag messages

 
Michael Dalanek
Advisor

I'm getting diagnostic emails from my L-class with the following errors:

Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single bit error has been detected and logged.

What do I do to fix them?
8 REPLIES
Stefan Farrelly
Honored Contributor
Solution

Re: diag messages


In my experience these memory errors are almost certainly caused by faulty firmware on the HP server, not actual memory problems. Time after time we replaced memory after errors like this, only to find that the memory we removed worked fine when reseated or installed in a different server. Whenever we upgraded the firmware, the memory errors went away. So for your L-class, load the latest firmware first. It's now in a simple patch: PHSS_21696 is the latest I believe, firmware version 40.26. Once the server reboots after installing it, your memory errors should go away.
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Patrick Wessel
Honored Contributor

Re: diag messages

Nothing!
One single bit error is nothing to worry about. But keep an eye on it. If you receive more of them, you had better contact your hardware support.
There is no good troubleshooting with bad data
Patrick Wessel
Honored Contributor

Re: diag messages

Stefan,
I can agree with you. Of course a bad connection is a possible cause of a single bit error, and so is a diagnostics problem. But a single bit error can be located in either main memory or CPU cache.
Updating the firmware can't be wrong; 40.26 does a lot of good things. But it's not a cure-all for SBEs.
How you react to SBEs should depend on how frequently they appear.
There is no good troubleshooting with bad data
Michael Dalanek
Advisor

Re: diag messages


I used to get 1 email a day but am now up to about 3 emails a day with this error. I want to be proactive now and do something about it before it crashes and I get it in the neck.
Patrick Wessel
Honored Contributor

Re: diag messages

Michael,
An L-Class server should at least be under warranty, right? Contact your HP hardware support. They should take care of the problem.
There is no good troubleshooting with bad data
Stefan Farrelly
Honored Contributor

Re: diag messages


If you're running a mission-critical server setup then even 1 SBE is cause for concern, and I would definitely be proactive and do something about it. I've seen 1 SBE in several weeks and then the server crashes with an HPMC! The vast majority of people do not keep the firmware up to date on their HP servers. This is the first proactive thing to do/check if you get an SBE. If they continue after the firmware upgrade, then the next proactive thing to do is reseat the memory. If they still continue (very unlikely in my opinion), then swap the memory to/from another server. If you have a large number of identical servers, they're up to date with firmware and the environment is good, then you should not be getting even a single SBE. I was on a large project for HP all over Europe, all running identical K-class servers, and we often used to get SBEs, but once I had trawled all over Europe upgrading the firmware we never got any SBEs again.
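
The escalation order in this post (firmware first, then reseat, then swap) could be written down as a small checklist helper. This is only a sketch: the step wording and the function name `next_step` are made up for illustration and are not part of any HP tool.

```python
# Escalation ladder as described in the post above.
# Step names and next_step() are hypothetical, for illustration only.
ESCALATION_STEPS = [
    "upgrade firmware",
    "reseat memory",
    "swap memory with another server",
]


def next_step(steps_already_tried):
    """Return the first step not yet tried while SBEs continue.

    Returns None once every step in the ladder has been tried.
    """
    for step in ESCALATION_STEPS:
        if step not in steps_already_tried:
            return step
    return None
```

For example, `next_step(["upgrade firmware"])` gives the reseat step, and an empty list gives the firmware upgrade.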

I'm from Palmerston North, New Zealand, but somehow ended up in London...
Michael Dalanek
Advisor

Re: diag messages


I'm going to try the firmware upgrade; I've downloaded the patch. Will let you know if that clears it up.
Rick Garland
Honored Contributor

Re: diag messages

If you are getting frequent e-mail notifications about the single-bit errors, call for HW support.
One message every so often is not much to worry about, but you need to keep an eye on it. If you start getting the messages on a more frequent basis, you have a memory module or memory carrier going out. If you have DIAGS loaded and can access the stm (cstm or xstm or mstm) you can see the specific area causing the problem.
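
Since several replies hinge on whether the SBE notifications are getting more frequent, here is a rough sketch of one way to check that from the dates of the diagnostic emails. It assumes you have already collected one `date` per notification yourself; the function names and the 3-day window are arbitrary choices for illustration, not anything the diagnostics provide.

```python
from collections import Counter
from datetime import date, timedelta


def sbe_counts_per_day(event_dates):
    """Count SBE notifications per calendar day."""
    return Counter(event_dates)


def is_increasing(event_dates, window_days=3):
    """Compare the latest window of days against the window before it.

    Returns True if the most recent `window_days` days (ending on the
    latest notification date) saw more notifications than the
    `window_days` days immediately before them.
    """
    if not event_dates:
        return False
    counts = sbe_counts_per_day(event_dates)
    latest = max(counts)
    recent = sum(
        counts.get(latest - timedelta(days=i), 0)
        for i in range(window_days)
    )
    previous = sum(
        counts.get(latest - timedelta(days=i), 0)
        for i in range(window_days, 2 * window_days)
    )
    return recent > previous
```

Going from 1 email a day to 3 a day, as described earlier in the thread, would be flagged by this check; a steady 1 a day would not.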