03-06-2002 09:07 AM
On an HP9000/813 running HP-UX 10.20, we got an error message from EMS in the syslog.log file.
After executing the command "/opt/resmon/bin/resdata -R 50200578 -r /system/events/memory/63 -n 50200733 -a" (found in the syslog.log file), we got this:
>>>>> BEGINNING
CURRENT MONITOR DATA:
Event Time..........: Wed Mar 6 10:06:50 2002
Severity............: SERIOUS
Monitor.............: dm_memory
Event #.............: 4400
System..............: nodename
Summary:
Memory Event Type : Single bit error (SBE) event. A correctable single
bit error has been detected and logged.
Description of Error:
The memory component:
Cell/Node: 0
MC/EXT: 0
DIMM: 0A
is experiencing a high rate of correctable single bit errors on a
single component.
Probable Cause / Recommended Action:
Although the single bit errors are being corrected, it is advisable to
evaluate whether any memory replacement is warranted at this time. If an
excessive rate of single bit errors occur, an event with higher severity
will be generated.
Additional Event Data:
System IP Address...:
Event Id............: 0x3c85dc2a00000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 100
Received within...: 7 day(s)
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/813
EMS Version.....................: A.03.20
STM Version.....................: A.24.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#4400
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Message in ll_msg (set: 50 msg: 10) did not exist in catalog.
Catalog type is MONITOR_INFO
Catalog version is A.01.00
Module name is dm_memory
Message set number is 50
Message number is 10
Message size is 72
Message parameter 1:
Message parameter size = 12
Message parameter is a literal
Literal text is 63
Message in ll_msg (set: 50 msg: 11) did not exist in catalog.
Catalog type is MONITOR_INFO
Catalog version is A.01.00
Module name is dm_memory
Message set number is 50
Message number is 11
Message size is 72
Message parameter 1:
Message parameter size = 12
Message parameter is a literal
Literal text is 20
<<<<< END
Can you help us? What does this mean, and how can we resolve this problem? We suspect a hardware problem.
Thanks in advance.
03-06-2002 09:10 AM
Solution
This could mean a problem with one of the memory modules on your system. Do this:
echo 'selclass qualifier memory;info;wait;infolog' |cstm >/tmp/meminfo.txt
more /tmp/meminfo.txt
The output shows the details of any errors and which module is reporting them. If the errors keep recurring, it would be good to get the module(s) replaced.
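Since the advice depends on whether the errors recur, it can help to keep each report instead of overwriting /tmp/meminfo.txt every time. The following is a minimal sketch, assuming a POSIX-compliant shell and that cstm is found on the PATH; the function name save_meminfo and the timestamped file naming are hypothetical and not part of STM. It runs the same cstm command sequence shown above and prints the path of the saved report.

```shell
#!/bin/sh
# Sketch: keep each cstm memory report under a timestamped name so that
# successive runs can be compared. Hypothetical helper; assumes a POSIX
# shell and that cstm is found on PATH.
save_meminfo() {
    # $1 = directory for the report (defaults to /tmp)
    dir=${1:-/tmp}
    if ! command -v cstm >/dev/null 2>&1; then
        echo "save_meminfo: cstm not found on PATH" >&2
        return 1
    fi
    out="$dir/meminfo.$(date +%Y%m%d%H%M%S).txt"
    # Same command sequence as above: select the memory class, run info,
    # wait for it to complete, then display the info log
    echo 'selclass qualifier memory;info;wait;infolog' | cstm >"$out"
    echo "$out"      # print the path of the saved report
}
```

For example, f=$(save_meminfo /tmp) && more "$f" saves and displays one report; two saved reports taken days apart can then be compared with diff to see whether new errors have appeared.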
Hope this helps.
Regds
03-06-2002 09:29 AM
Re: EMS Event Notification from /system/events/memory/63 ... CRITICAL
You can also send this output to HP for their assessment.
Sandip
03-06-2002 09:41 AM
Event 4400 for memory indicates that at least 100 single bit errors (SBEs) were received within 7 days (the qualification criteria shown in the event data above). As stated, it is recommended that you monitor the situation closely. These error messages are sometimes accompanied by event 4200, 4300 or 4500, and each of these events has its own threshold for the number of errors over a given period. If this is a production or high-availability system, you should probably go ahead and open a hardware service call.
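To make the "N events within D days" criterion concrete, here is a minimal sketch of such a check using the values from the event data above (100 events received within 7 days). It only illustrates a sliding-window count; it is not how dm_memory actually evaluates its client configuration file. The function name meets_criteria and the input format (one epoch timestamp per line on standard input) are assumptions.

```shell
#!/bin/sh
# Sketch only: a sliding-window "N events within D days" check, similar in
# spirit to the EMS qualification criteria (100 events within 7 days).
# Hypothetical helper; input is one epoch timestamp per line on stdin.
meets_criteria() {
    # $1 = current time (epoch seconds), $2 = window in days, $3 = threshold
    now=$1
    window=$(( $2 * 86400 ))
    count=0
    while read -r ts; do
        [ -z "$ts" ] && continue          # skip blank lines
        # Count events that are not in the future and are younger than the window
        if [ "$ts" -le "$now" ] && [ $(( now - ts )) -lt "$window" ]; then
            count=$(( count + 1 ))
        fi
    done
    if [ "$count" -ge "$3" ]; then
        echo "qualified: $count event(s) within $2 day(s)"
    else
        echo "not qualified: $count event(s) within $2 day(s)"
    fi
}

# Example: 100 hypothetical SBE events logged one hour apart, ending "now"
now=1000000000
i=0
while [ "$i" -lt 100 ]; do
    echo $(( now - i * 3600 ))
    i=$(( i + 1 ))
done | meets_criteria "$now" 7 100   # prints "qualified: 100 event(s) within 7 day(s)"
```

A window check like this is why an older burst of errors can drop out of the count: only events younger than the window contribute toward the threshold.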
As stated earlier in the thread, you can use the cstm, xstm or mstm interfaces to gain further insight into what's going on. If you need help with the diagnostic tools, go to http://docs.hp.com/hpux/diag/index.html
You can log a hardware call on ITRC via:
Maintenance & Support -> Collaborate -> hardware calls
03-07-2002 02:53 AM
From your error log, the errors are not always at the same memory address.
Log a hardware call with a view to getting this memory swapped out.
HTH
Paula
03-11-2002 08:03 AM
According to your report output:
1. All of the memory errors are occurring on Phys Bank 0 (check the address range: it is 0x00..., while Phys Bank 1 has an address range of 0x10...). As has been stated before, you probably need to have a hardware call opened and the memory replaced.
2. As far as trying to determine if this is causing your application to fail, you can use the ipcs command to determine what shared memory addresses are being used by the application, however these addresses may move around depending on what memory is available at the time the application is started and when it gets a slice of memory to use. Therefore you would need to know the memory being used at the time of failure to make any kind of judgement.
3. In the future I would highly recommend that you remove any node specific information (IP addresses, hostname etc.) that might be a security risk to your systems before posting information on public web sites. This information is normally not needed by people trying to help you resolve the issue and while I would hope that this information could not be used destructively...better to be safe than sorry.
4. To that end I am removing your attachment and reposting it minus the node information.
5. Please assign points to the hard working forum members that have given time and effort to provide suggestions to help you resolve your issue.
Martin
03-11-2002 08:08 AM
Hi,
First of all, thanks a lot ...
The result of the "echo 'selclass ...; info; wait; infolog' | cstm" is in the attached file.
- Can you help us understand this log? I think there is a hardware problem on a memory bank, but which one? Is that right?
- On this server, we have an application that often crashes. Is it possible to link these crashes to our memory problem, and if so, how?
Thanks in advance.
03-11-2002 08:30 AM
Looks to me as if the problem is with the 2nd module:
8 1 0A/0B 256 enabled 0x10000000 - 1fffffff N/A
Error address: 0x00000000142667d8
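The reasoning here is that the error address falls inside the address range listed for the 2nd module. A minimal sketch of that check using shell arithmetic, assuming the range bounds are inclusive and written with a 0x prefix (the report prints the end of the range as 1fffffff without one). The function name addr_in_range is hypothetical, and the 0x00000000 - 0x0fffffff range used for comparison is an assumed value for the first module, not taken from the report.

```shell
#!/bin/sh
# Sketch: does an error address fall inside a module's address range?
# Bounds are treated as inclusive; all values are hex with a 0x prefix.
# addr_in_range is a hypothetical helper, not an STM/cstm command.
addr_in_range() {
    # $1 = error address, $2 = range start, $3 = range end
    if [ $(( $1 )) -ge $(( $2 )) ] && [ $(( $1 )) -le $(( $3 )) ]; then
        echo yes
    else
        echo no
    fi
}

# Error address vs. the range listed for the 2nd module (0A/0B, 256)
addr_in_range 0x00000000142667d8 0x10000000 0x1fffffff   # prints "yes"
# Same address vs. an assumed range for the first module
addr_in_range 0x00000000142667d8 0x00000000 0x0fffffff   # prints "no"
```

The same comparison can be repeated against each row of the cstm module table to see which module every logged error address belongs to.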
Hope this helps.
Regds