Problem with the RAM
06-06-2006 06:21 PM
I have an L1000 server running HP-UX 11.11, and I have had some really big problems with it in the past. I came to the conclusion that there was a memory problem, and at the beginning of May I replaced the memory. There are fewer problems now, but I got the log output below. I think there is memory I should change in slot 2a/b. Could you please give me any more suggestions? I am including some output from the log file below.
Thank you.
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 08:25:13 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Tue May 30 08:25:13 2006
Severity............: MAJORWARNING
Monitor.............: dm_memory
Event #.............: 4000
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2b
Serial Number: N/A
Part Number: N/A
is experiencing correctable single bit errors (SBE) on a single
component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it may be advisable to
monitor the situation. If an excessive rate of single bit errors occur, an
event with higher severity will be generated.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447bd73900000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 20
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4000
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path....: 8
Tag 2...................: 20
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 09:22:05 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is SERIOUS(4).
Event data from monitor:
Event Time..........: Tue May 30 09:22:05 2006
Severity............: SERIOUS
Monitor.............: dm_memory
Event #.............: 4100
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2b
Serial Number: N/A
Part Number: N/A
is experiencing a high rate of correctable single bit errors on a
single component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it is advisable to
closely monitor the situation. If an excessive rate of single bit errors
occur, an event with higher severity will be generated.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447be48d00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 50
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4100
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 09:39:07 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Tue May 30 09:39:07 2006
Severity............: MAJORWARNING
Monitor.............: dm_memory
Event #.............: 4000
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2a/b
Serial Number: N/A
Part Number: N/A
is experiencing correctable single bit errors (SBE) on a single
component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it may be advisable to
monitor the situation. If an excessive rate of single bit errors occur, an
event with higher severity will be generated.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447be88b00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 20
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4000
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 09:46:07 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is MAJORWARNING(3).
Event data from monitor:
Event Time..........: Tue May 30 09:46:07 2006
Severity............: MAJORWARNING
Monitor.............: dm_memory
Event #.............: 4300
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2b
Serial Number: N/A
Part Number: N/A
is experiencing correctable single bit errors (SBE) on a single
component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it may be advisable to
monitor the situation. If an excessive rate of single bit errors occur, an
event with higher severity will be generated.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447bea2f00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 70
Received within...: 7 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4300
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 10:23:10 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is SERIOUS(4).
Event data from monitor:
Event Time..........: Tue May 30 10:23:10 2006
Severity............: SERIOUS
Monitor.............: dm_memory
Event #.............: 4400
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2b
Serial Number: N/A
Part Number: N/A
is experiencing a high rate of correctable single bit errors on a
single component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it is advisable to
closely monitor the situation. If an excessive rate of single bit errors
occur, an event with higher severity will be generated.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447bf2de00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 100
Received within...: 7 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4400
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue May 30 10:54:13 2006
sathes4 sent Event Monitor notification information:
/system/events/memory/8 is >= 1.
Its current value is CRITICAL(5).
Event data from monitor:
Event Time..........: Tue May 30 10:54:13 2006
Severity............: CRITICAL
Monitor.............: dm_memory
Event #.............: 4200
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cab/Cell or Node: 0
MC/EXT: 0
DIMM: 2b
Serial Number: N/A
Part Number: N/A
is experiencing an excessive rate of single bit errors on a single
component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it is strongly advisable
to closely monitor the situation. This condition indicates a potential
problem. Contact your HP support representative to check the memory boards.
Additional Event Data:
System IP Address...: 172.30.104.27
Event Id............: 0x447bfa2500000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 120
Received within...: 1 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L1000-36
EMS Version.....................: A.04.00
STM Version.....................: A.45.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4200
06-06-2006 07:02 PM
Re: Problem with the RAM
I would suggest you replace DIMMs 2a/2b immediately.
Check the PDT (Page Deallocation Table) entries as well.
-Amit
06-06-2006 07:34 PM
Re: Problem with the RAM
Not all single-bit errors necessarily mean a faulty DIMM, so it is best to collect the cstm info output and post it for us to see:
# echo "map;selall;info;wait;infolog" | cstm > /filename
06-07-2006 09:32 PM
Solution
Yes, those EMS events are telling you that you have a real problem with those DIMMs.
Albert is right that a small number of SBEs is acceptable, which is exactly why the OnlineDiags are present; they monitor the hardware and send appropriate EMS events when certain thresholds are reached. In this case, the 4200 event was generated when over 120 SBEs on the DIMM were recorded in 24 hours.
The documentation page shows the thresholds for the different events: http://docs.hp.com/en/diag/ems/dm_memory.htm
I would also recommend that you upgrade to a current, supported version of the OnlineDiags. You have the A.45.00 version, which was the June 2004 release. See http://www.docs.hp.com/en/diag/stm/stm_upd.htm#table for the supported versions.
The link to download the latest OnlineDiags is:
http://www.software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE
You can also go to http://www.software.hp.com and then type "B6191AAE" in the search box.
http://docs.hp.com/en/diag/stm/stm_ptch.htm shows the latest patches. A.49 (HWE0509) is the latest version of OnlineDiags for 11.11, and the latest patch for that (to be applied after you upgrade to A.49) is PHSS_34288.
Andrew
06-08-2006 01:49 AM
Re: Problem with the RAM
Run the following and paste the output:
cstm
sel pa 8
info
wai
il
06-08-2006 01:56 AM
Re: Problem with the RAM
I did a MAP, then sel dev # (for the corresponding memory device):
cstm>map
esuunix1
Dev                                                 Last        Last Op
Num  Path                 Product                   Active Tool Status
===  ==================== ========================= =========== =============
  1  system               system ()                 Information Successful
  2  0                    Bus Adapter (803)
  3  0/0                  PCI Bus Adapter (782)
  4  0/0/0/0              Core PCI 100BT Interface
  5  0/0/1/0              PCI SCSI Interface (10000 Information Successful
  6  0/0/1/0.0.0          SCSI Tape (HPC1537A)
  7  0/0/1/0.2.0          SCSI Disk (HPDVD-ROM)
  8  0/0/2/0              PCI SCSI Interface (10000
  9  0/0/2/0.6.0          SCSI Disk (SEAGATEST39204
 10  0/0/2/1              PCI SCSI Interface (10000
 11  0/0/2/1.6.0          SCSI Disk (SEAGATEST39204
 12  0/0/4/0              RS-232 Interface (103c104
 13  0/0/5/0              RS-232 Interface (103c104
 14  0/1                  PCI Bus Adapter (782)
 15  0/2                  PCI Bus Adapter (782)
 16  0/4                  PCI Bus Adapter (782)
 17  0/4/0/0              PCI Terminal Multiplexor
 18  0/5                  PCI Bus Adapter (782)
 19  0/5/0/0              PCI SCSI Interface (10000
 20  0/5/0/0.3.0          SCSI Tape (QUANTUMDLT8000 Information Successful
 21  0/8                  PCI Bus Adapter (782)
 22  0/8/0/0              PCI Gigabit Ethernet Link
 23  0/10                 PCI Bus Adapter (782)
 24  0/12                 PCI Bus Adapter (782)
 25  0/12/0/0             PCI 100 BaseT LAN Interfa
 26  1                    Bus Adapter (803)
 27  1/0                  PCI Bus Adapter (782)
 28  1/2                  PCI Bus Adapter (782)
 29  1/4                  PCI Bus Adapter (782)
 30  1/4/0/0              PCI Bus Adapter (80860964
 31  1/4/0/1              I2O Interface Adapter (RA
 32  1/4/0/1.0.0.0        SCSI Disk (I2ORAID1)
 33  1/4/0/1.0.0.1        SCSI Disk (I2ORAID1)
 34  1/4/0/1.0.0.2        SCSI Disk (I2ORAID1)
 35  1/4/0/1.0.0.3        SCSI Disk (I2ORAID1)
 36  1/8                  PCI Bus Adapter (782)
 37  1/10                 PCI Bus Adapter (782)
 38  1/10/0/0             PCI 100 BaseT LAN Interfa
 39  1/12                 PCI Bus Adapter (782)
 40  1/12/0/0             PCI SCSI Interface (10000
 41  37                   CPU (5d3)                 Information Killed
 42  45                   CPU (5d3)
 43  101                  CPU (5d3)
 44  109                  CPU (5d3)
 45  192                  MEMORY (90)               Information Successful
Then:
sel dev # (the memory entry, 45 in the map above)
info
then il
This will show you which set is bad or in error.
We had to replace a couple of DIMMs. Single-bit errors on their own mean very little; CSTM will show if one of the slots is dead.
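If you want to script those steps, something like the following might help. It is only a sketch in Python: it picks the MEMORY row out of saved map output and builds the command string to pipe into cstm, in the same semicolon-separated style as the echo one-liner earlier in the thread. The function names are made up, the row layout is assumed to match the map listing above, and `run_cstm` can only work on a host where cstm is installed.

```python
import re
import subprocess

# A map row is "<dev num> <path> <product> ..."; match rows whose product is MEMORY.
MEMORY_ROW = re.compile(r"^[ \t]*(\d+)[ \t]+(\S+)[ \t]+MEMORY\b", re.M)


def memory_device_numbers(map_output):
    """Return the Dev Num of every MEMORY row in cstm map output."""
    return [int(m.group(1)) for m in MEMORY_ROW.finditer(map_output)]


def info_script(dev_num):
    """Build the cstm commands: map, select the device, info, wait, infolog."""
    commands = ["map", f"sel dev {dev_num}", "info", "wait", "infolog"]
    return ";".join(commands) + "\n"


def run_cstm(script):
    """Pipe the commands to cstm on stdin and return whatever it prints."""
    result = subprocess.run(
        ["cstm"], input=script, capture_output=True, text=True, check=False
    )
    return result.stdout
```

With the map listing above, `memory_device_numbers` would give `[45]`, and `info_script(45)` produces `map;sel dev 45;info;wait;infolog`, which does the map before the info.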
06-08-2006 01:57 AM
Re: Problem with the RAM
Do a MAP before INFO.