HPE 9000 and HPE e3000 Servers

single bit error

 
Javier Ortiz Guajardo
Frequent Advisor

single bit error

I received this message:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.

Is this a hardware error on a DIMM?

Please send me any information you have about it.

Thanks.
The obstacles are those things that the people see when they left to see their goals.
Ken Hubnik_2
Honored Contributor

Re: single bit error

Yes, but if it does not happen consistently then you have nothing to worry about. It is only informational.
Michael Duthie
Trusted Contributor
Solution

Re: single bit error

If it only happens once don't worry about it. If it starts to repeat, log a hardware call.
Although the single bit errors are being corrected, it may be advisable to
evaluate whether the system should be rebooted. Rebooting the system will
allow the memory page to be deallocated so it is no longer referenced. If
an excessive rate of single bit errors occur, an event with higher severity
will be generated.

James R. Ferguson
Acclaimed Contributor

Re: single bit error

Hi:

You can track memory errors this way:

# echo "selclass qualifier memory;info;wait;infolog" | cstm > /tmp/meminfo
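If you want to watch the numbers over time rather than look once, one option is to keep a timestamped copy of that output on each run and compare copies later. This is only a sketch: the function name snapshot_meminfo and the meminfo.<timestamp> file naming are made up for illustration, and the cstm command line is the one given above.

```shell
#!/bin/sh
# Save the cstm memory infolog to a timestamped file so that
# successive runs can be compared.
# snapshot_meminfo is a hypothetical helper name, not an HP tool.

snapshot_meminfo() {
    # $1 = directory for the snapshot (defaults to /tmp)
    dir="${1:-/tmp}"
    out="$dir/meminfo.$(date +%Y%m%d-%H%M%S)"
    # Same cstm command sequence as above, redirected to the dated file
    echo "selclass qualifier memory;info;wait;infolog" | cstm > "$out" || return 1
    # Print the snapshot path so the caller can use it
    echo "$out"
}
```

Run snapshot_meminfo from cron or by hand, then diff two of the saved files to see whether the error counts have moved. The exact layout of the infolog text depends on the STM version, so check a file by eye before scripting against it.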

Regards!

...JRF...
S.K. Chan
Honored Contributor

Re: single bit error

The STM or EMS report should show how many single bit errors you have had. Regardless of how many there are, I would still suggest that you look into it. You can call HP and give them the details of your server configuration (e.g. model, memory configuration and firmware) along with the error message. They should be able to tell you whether it is a cause for concern. Sometimes all you need is a firmware upgrade, in which case there is nothing to worry about. Keep a close watch, though: I have seen single bit error counts that keep increasing, and I usually do not wait until it gets worse (it may).
T. M. Louah
Esteemed Contributor

Re: single bit error

Moreover, EMS will usually give the details of the event; check /var/opt/resmon/log/event.log
It should tell you about the threshold. How many of these events are received, and how frequently, is important.
How are the system memory modules laid out? It is very important to fill the first slots first (memory also likes even numbers). You can check the RAM layout with:
# cstm --> map --> sel dev memory# --> info --> infolog --> look at Memory layout 0a/0b 1a/1b and so forth.
For example, you don't want to see 0a/0b with 256MB and 1a/1b with 512MB; that configuration is out of spec.
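To get a rough idea of how often the event has been logged, you can count the matching lines in the EMS log mentioned above. This is a sketch only: count_sbe is a made-up helper name, and it assumes every SBE event contains the text "Single bit error" on one line, as in the message quoted in the original question.

```shell
#!/bin/sh
# Count Single Bit Error (SBE) entries in an EMS event log.
# Assumption: each SBE event carries the text "Single bit error"
# on a single line, as in the message quoted in the question.

count_sbe() {
    # $1 = log file; defaults to the EMS log path mentioned above
    log="${1:-/var/opt/resmon/log/event.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # grep -c prints the number of matching lines (0 if none)
    grep -c "Single bit error" "$log"
}
```

Calling count_sbe with no argument reads the default log; pass another path to check a saved copy. A count that keeps growing between checks is the thing to report when you log the hardware call.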

Cheers,
PAP! (a.k.a. Pliz Assign Points)



Little learning is dangerous!
monasingh_1
Trusted Contributor

Re: single bit error

The next time you boot, check the size of the PDT table. It is in the service submenu. There is a limit of 50 entries; if you are approaching that, then you must replace the memory, because if the errors have filled the table, or are about to fill it, the error has occurred at least 50 times...

If this error has just come once then I will not worry much.

hope this helps..
Chris Vail
Honored Contributor

Re: single bit error

On our production servers, we use only HP RAM. In our test and development servers, we have a mixture of HP and DataRam RAM. In the last year, we've had some trouble with the DataRam product, but never with the HP. Invariably, this trouble has shown up in the form of Single Bit Errors, such as what you describe.
The Processor Dependent Code (PDC) on your machine may differ from that on our L-Class machines, but the PDC has a utility to examine the RAM layout. It will show which slots have HP RAM and which have RAM from another manufacturer. From your error message, you should be able to locate the slot holding the DIMM with the bad memory, and this PDC utility will tell you who made it. If it's HP RAM, and the system is in warranty and/or under service contract, HP will take care of it.
DataRam was very good about honoring their lifetime warranty. They need the stock keeping code off the DIMM itself, but once they have it, they will cross-ship you a new DIMM overnight. You have to return the bad DIMM to them within 10 days or so, or they'll send you a bill. But their service is just great. You just need to tell them that the DIMM was deallocated due to excessive SBEs.
Anyway--let us know how it turns out for you.


Chris
Philip Ladouceur
New Member

Re: single bit error

Does a reboot clear off single bit errors? Does the type of reboot matter? For example, would a soft reset, where you run shutdown -r on the server, be enough? Or is a power-off reboot needed, such as a shutdown -h followed by powering the server off and on?
Charlie_17
Advisor

Re: single bit error

"Does a reboot clear off single bit errors? "

Single bit errors are not cleared from the PDT (Page Deallocation Table) by a reboot.

When a single bit error occurs, the corresponding memory page is entered into the PDT and deallocated from the system. Sometimes this deallocation is not possible, for example when the kernel holds a lock on the page that cannot be removed, but a reboot can resolve this.

Pages entered in the PDT will no longer be used. This is OK and not a problem (because the pages are very small); it is the purpose of the PDT!
Only if you have lots of pages from one DIMM pair (e.g. 0a/b) in the PDT should you consider replacing this DIMM pair (or single DIMM, depending on the type of server you have).
The HP technician then clears the PDT. Any defective pages will of course be re-entered by the system when new errors occur, but the new DIMMs will be fully used until then.

I hope this answered your question. Or where else did you want to clear the single bit errors?

Cheers!
Time flies like an arrow -- fruit flies like a banana