HPE 9000 and HPE e3000 Servers

Kavita Poonia
Regular Advisor

Single Bit Error

Hi Folks,

We received an EMS notification from one of our HP-UX servers reporting a Single Bit Error. I don't understand what this error means or how to troubleshoot and fix it. The full details are below:

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Fri Apr 4 01:53:39 2008

ebzb2bs3 sent Event Monitor notification information:

/system/events/memory/8 is >= 3.
Its current value is MAJORWARNING(3).



Event data from monitor:

Event Time..........: Fri Apr 4 01:53:39 2008
Severity............: MAJORWARNING
Monitor.............: dm_memory
Event #.............: 4300
System..............: ebzb2bs3.eadv.na.jnj.com

Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.


Description of Error:

The memory component:

Cab/Cell or Node: 0
MC/EXT: N/A
DIMM: 1b
Serial Number: N/A
Part Number: N/A

is experiencing correctable single bit errors (SBE) on a single
component.

Probable Cause / Recommended Action:

Although the single bit errors are being corrected, it may be advisable to
monitor the situation. If an excessive rate of single bit errors occur, an
event with higher severity will be generated.

Additional Event Data:
System IP Address...: 10.15.20.15
Event Id............: 0x47f5c26300000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 70
Received within...: 7 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp4440
EMS Version.....................: A.04.20
STM Version.....................: A.57.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4300

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path....: 8
Tag 2...................: 20


>---------- End Event Monitoring Service Event Notification ----------<

Please help me out with this as soon as possible. Thanks a lot!
Mridul Shrivastava
Honored Contributor

Re: Single Bit Error

You posted this in the wrong forum, which is why you haven't received any answers.

What's the current status? Are you still getting these events? Use xstm, cstm, or mstm to isolate which DIMM is having the issue.

It is possible that all of the EMS events are for a single memory page, so check the number of pages entered in the PDT (Page Deallocation Table) and the number of errors logged against each page.

If multiple pages on the same module are logging errors, that memory module has to be replaced; otherwise a reboot will resolve the issue.


Time has a wonderful way of weeding out the trivial
Torsten.
Acclaimed Contributor

Re: Single Bit Error

As said, reboot the server to allow exclusive access to that memory area, then watch for further messages. Come back if you receive any.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

Prashanth.D.S
Honored Contributor

Re: Single Bit Error

Hi Kavita,

Single bit errors (SBEs) are correctable errors on a DIMM. On your server, the DIMM in slot 1b has experienced SBEs. Because they are corrected, they can be ignored and cleared with a reboot.

You only need to worry if these errors occur very frequently on a single DIMM at different addresses. Keep a close watch on the PDT (Page Deallocation Table) via cstm; if it fills beyond 80%, contact your HP support team to have the DIMM replaced.
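
The 80% check is simple arithmetic once you have the two counts, so here is a rough sketch of it. The function names pdt_percent and pdt_check are hypothetical, and the script does not parse cstm output: it assumes you read the used and total PDT entry counts off your own memory info log and pass them in by hand.

```shell
# Hypothetical helper: percentage of PDT entries in use (integer, rounded down).
# Arguments: entries used, total entries. Both are assumed to be read
# manually from the cstm memory information log.
pdt_percent() {
    used=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo "total must be greater than 0" >&2
        return 1
    fi
    echo $(( used * 100 / total ))
}

# Apply the 80% rule of thumb from this thread to the computed percentage.
pdt_check() {
    pct=$(pdt_percent "$1" "$2") || return 1
    if [ "$pct" -ge 80 ]; then
        echo "PDT ${pct}% full: contact HP support"
    else
        echo "PDT ${pct}% full: keep monitoring"
    fi
}
```

For example, with made-up numbers, `pdt_check 3 50` prints "PDT 6% full: keep monitoring".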

Run the commands below and check the output files for SBE alerts:

# echo "map selall info;wait infolog" | cstm > /tmp/cstm.txt
# echo "gop cstmpager cat;scl type mem;info;wait;il" | cstm > /tmp/mem.out
# echo "gop cstmpager cat;ru l\nvd\n" | cstm >> /tmp/mem.out

If possible, run these commands and attach the output files, and I will take a look.
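
For a quick first look before attaching anything, something like the sketch below counts the lines in a saved output file that mention single bit errors. The function name count_sbe_lines is hypothetical, and the search string "single bit" is an assumption about how the log words these entries; adjust it to match what your output actually contains.

```shell
# Hypothetical helper: count lines mentioning single bit errors in a saved
# cstm output file (defaults to /tmp/mem.out from the commands above).
# The search string is an assumption about the log wording, not a documented format.
count_sbe_lines() {
    file=${1:-/tmp/mem.out}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # grep -c exits non-zero when nothing matches; still report the count of 0.
    count=$(grep -ci "single bit" "$file" || true)
    echo "$count"
}
```

Running `count_sbe_lines /tmp/mem.out` prints just the number of matching lines, which gives a rough idea of how much there is to read through.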

Best Regards,
Prashanth