07-27-2012 04:03 AM
Weird EMS output
Good day Gurus,
Following an accidental power outage, I'm getting the EMS output below on two rp5470 servers. Any suggestions are much appreciated:
on server1:
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Wed Jul 26 12:23:56 2012
server1 sent Event Monitor notification information:
/storage/events/disks/default/0_0_1_1.0.0 is >= 3.
Its current value is SERIOUS(4).
Event data from monitor:
Event Time..........: Wed Jul 26 12:23:56 2012
Severity............: SERIOUS
Monitor.............: disk_em
Event #.............: 100038
System..............: server1.org.com
Summary:
Disk at hardware path 0/0/1/1.0.0 : Media failure
Description of Error:
The format of the medium in the device is corrupt. The medium is unusable.
Probable Cause / Recommended Action:
A format operation in progress on the device may have been interrupted.
Restart the formatting process.
Alternatively, the medium in the device is flawed. If the medium is
removable, replace the medium.
Alternatively, if the medium is not removable, the device has experienced
a hardware failure. Contact your HP support representative to have the
device checked.
Additional Event Data:
System IP Address...: 192.168.50.1
Event Id............: 0x500fd74c00000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L3000-5x
OS Version......................: B.11.11
STM Version.....................: A.49.00
EMS Version.....................: A.04.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100038
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path...: 0/0/1/1.0.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 146 G
Inquiry Product ID.....: ST3146807LC
Firmware Version.......: HPC3
Serial Number..........: 3HY0NQ0P00007337LAV5
Product/Device Identification Information:
Logger ID.........: disc30; sdisk
Product Identifier: Disk
Product Qualifier.: HP 146 GST3146807LC
SCSI Target ID....: 0x00
SCSI LUN..........: 0x00
SCSI Command Data Block:
Command Data Block Contents:
0x0000: 25 00 00 00 00 00 00 00 00 00
Command Data Block Fields (10-byte fmt):
Command Operation Code...(0x25)..: READ CAPACITY
Logical Unit Number..............: 0
Relative Address Bit.............: 0
Partial Medium Indicator Bit.....: 0
Logical Block Address............: 0 (0x00000000)
Hardware Status: (not present in log record).
SCSI Sense Data:
Undecoded Sense Data:
0x0000: 70 00 03 00 00 00 00 0A 00 00 00 00 31 00 05 00
0x0010: 00 00
SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x03
Information Field Valid : FALSE
Information Field : 0x00000000
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x31
Additional Sense Qualifier : 0x00
Field Replaceable Unit : 0x05
Sense Key Specific Data Valid : FALSE
Sense Key Specific Data : 0x00 0x00 0x00
Sense Key 0x03, MEDIUM ERROR, indicates that the command terminated
with a nonrecovered error condition that was probably caused by a
flaw in the medium or an error in the recorded data. This sense key
may also be returned if the device is unable to distinguish between a
flaw in the medium and a specific hardware failure (sense key 0x04).
The combination of Additional Sense Code and Sense Qualifier (0x3100)
indicates: Medium format corrupted.
>---------- End Event Monitoring Service Event Notification ----------<
However, when I run ioscan, everything seems to be normal:
root> ioscan -fnC disk
Class     I  H/W Path     Driver  S/W State  H/W Type  Description
=====================================================================
disk      2  0/0/1/1.0.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146807LC
                          /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
disk      1  0/0/1/1.2.0  sdisk   CLAIMED    DEVICE    COMPAQ BD1468A4B5
                          /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk      3  0/0/2/0.0.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146807LC
                          /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk      4  0/0/2/0.2.0  sdisk   CLAIMED    DEVICE    COMPAQ BD14685A26
                          /dev/dsk/c2t2d0   /dev/rdsk/c2t2d0
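For anyone who wants to check the sense data in the notification by hand, here is a minimal Python sketch that decodes the 18 bytes quoted under "Undecoded Sense Data". It assumes the standard fixed-format SCSI sense layout (sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13, FRU code in byte 14). The function name `decode_sense` and the two lookup tables are illustrative and deliberately partial; they only cover the codes that appear in this thread.

```python
# Partial lookup tables: only the codes mentioned in the EMS notification.
SENSE_KEYS = {0x03: "MEDIUM ERROR", 0x04: "HARDWARE ERROR"}
ASC_ASCQ = {(0x31, 0x00): "Medium format corrupted"}


def decode_sense(hex_dump):
    """Decode fixed-format SCSI sense data given as space-separated hex bytes."""
    data = bytes.fromhex(hex_dump)
    # Fixed-format sense data uses response code 0x70 (current) or 0x71 (deferred).
    if len(data) < 14 or (data[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    key = data[2] & 0x0F          # sense key: low nibble of byte 2
    asc, ascq = data[12], data[13]
    return {
        "sense_key": key,
        "sense_key_name": SENSE_KEYS.get(key, "unknown"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "unknown"),
        "fru": data[14] if len(data) > 14 else None,
    }


# Bytes copied from the EMS notification for 0/0/1/1.0.0.
SENSE = "70 00 03 00 00 00 00 0A 00 00 00 00 31 00 05 00 00 00"
result = decode_sense(SENSE)
print(result["sense_key_name"], "-", result["meaning"])
```

The decoded values agree with the fields EMS printed (sense key 0x03, ASC/ASCQ 0x3100, FRU 0x05), so the notification itself is internally consistent.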
on server2:
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Tue Jul 26 19:10:40 2012
server2 sent Event Monitor notification information:
/system/events/memory/192 is >= 3.
Its current value is CRITICAL(5).
Event data from monitor:
Event Time..........: Tue Jul 26 19:10:39 2012
Severity............: CRITICAL
Monitor.............: dm_memory
Event #.............: 1400
System..............: server2.org.com
Summary:
Memory Event Type : A memory page has been deallocated and entered into
the Page Deallocation Table (PDT).
Description of Error:
The Page Deallocation Table (PDT) is 100% full.
PDT Entries Used: 50
PDT Entries Free: 0
PDT Total Size: 50
A large number of memory pages have been deallocated due to excessive
correctable single bit errors being detected. Since the PDT is 100%
full, no more entries can be added to it.
Probable Cause / Recommended Action:
The Page Deallocation Table (PDT) is full, it is strongly advisable to monitor
the situation. Although the errors are being corrected,
this condition indicates a potential problem.
Contact your HP support representative to check the memory boards.
Additional Event Data:
System IP Address...: 192.168.50.2
Event Id............: 0x500ee52000000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_memory.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Value received met: value(100) = 100
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/L3000-5x
EMS Version.....................: A.04.20
STM Version.....................: A.49.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#1400
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
Component Data:
Physical Device Path....: 192
Tag 2...................: 20
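If notifications like this one arrive by mail from several servers, the PDT counters can be pulled out of the text automatically. The sketch below is only an illustration: it assumes the three "PDT ..." lines keep exactly the wording shown in this event, and the function name `pdt_usage` is made up for the example.

```python
import re

# Matches the three counter lines as they appear in the dm_memory event text.
PDT_FIELD = re.compile(r"PDT (Entries Used|Entries Free|Total Size):\s*(\d+)")


def pdt_usage(event_text):
    """Return used/free/total PDT entries and the fill percentage."""
    values = {name: int(num) for name, num in PDT_FIELD.findall(event_text)}
    used, total = values["Entries Used"], values["Total Size"]
    return {
        "used": used,
        "free": values["Entries Free"],
        "total": total,
        "percent_full": 100.0 * used / total,
    }


# Lines copied from the server2 notification.
EVENT = """\
PDT Entries Used: 50
PDT Entries Free: 0
PDT Total Size: 50
"""
usage = pdt_usage(EVENT)
print(f"PDT {usage['percent_full']:.0f}% full, {usage['free']} entries free")
```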
Thanks in advance
07-27-2012 08:53 AM
Re: Weird EMS output
Well! Looks like you lost a disk on system 1 and a memory DIMM on the other system. I would guess the disk is one of your mirrored OS disks. Give us a vgdisplay -v so we can see where it is used and whether the LVs are still synced. These are older systems, and once in a great while I have seen a disk recover after being re-seated. But bring the system down and power it off before trying that.
The memory DIMM is going to require a call to support to get a replacement. The disk too, if re-seating it doesn't bring it back.
07-27-2012 09:27 AM
Re: Weird EMS output
Hello, and thanks for your reply. However, is it a DIMM issue or, as the output message says, a memory board? Also, regarding the disk at hardware path 0/0/1/1.0.0 that EMS reports as "Media failure": how can this disk show no problem in ioscan, where its status is CLAIMED?
ioscan -fnC disk
Class     I  H/W Path     Driver  S/W State  H/W Type  Description
=====================================================================
disk      2  0/0/1/1.0.0  sdisk   CLAIMED    DEVICE    HP 146 GST3146807LC
                          /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
Could it be that this power outage affected EMS so that it is showing inconsistent error messages?
How should I proceed?
Thanks in advance
07-27-2012 09:41 AM
Re: EMS output (disk and PDT errors)
>is it a DIMM issue or as mentioned by the output message a memory board?
It's most likely a DIMM issue, since 50 page deallocations have piled up in the PDT, probably over the years.
>Could it be that this power outage affected EMS so that it is showing inconsistent error messages?
I wouldn't think so.
>How should I proceed?
You could use dd(1) on the raw disk to read every block.
You could also unmount the filesystem and run fsck on it.
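The usual way to do the read test is a plain `dd if=/dev/rdsk/c1t0d0 of=/dev/null bs=1024k`, which stops with an I/O error if a block cannot be read. For illustration, the same idea as a small Python sketch: read the device sequentially in large blocks and report how far it got before any error. The function name `read_all_blocks` is made up, and having Python available on an 11.11 box is an assumption; the test is read-only either way.

```python
import sys


def read_all_blocks(path, block_size=1024 * 1024):
    """Read path from start to end; return (bytes_read, error_or_None)."""
    total = 0
    try:
        # buffering=0 gives unbuffered reads, which is what a raw device expects.
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(block_size)
                if not chunk:      # empty read means end of device/file
                    break
                total += len(chunk)
    except OSError as exc:         # open failure or read failure (e.g. EIO)
        return total, exc
    return total, None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python readtest.py /dev/rdsk/c1t0d0   (script name is hypothetical)
    done, err = read_all_blocks(sys.argv[1])
    if err is None:
        print(f"read {done} bytes without error")
    else:
        print(f"stopped after {done} bytes: {err}")
```

A clean pass only shows that every block is readable right now. It does not prove the disk is healthy, so the EMS event is still worth following up.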
07-27-2012 09:48 AM - edited 07-27-2012 09:49 AM
Re: EMS output (disk and PDT errors)
Thanks for your reply.
Is there a way to check which DIMM is causing this failure?
Which command can I use to reproduce the same EMS output on the screen?
Thanks in advance
07-27-2012 10:13 AM
Re: EMS output (disk and PDT errors)
Yup, like Dennis said: EMS is seeing a bad disk, while ioscan only sees that the disk is connected to the bus. Being CLAIMED does not mean it works, just that its controller is talking to the bus. Read the EMS errors more closely: they report a media error, not a missing disk device.
Regarding the DIMM: again, the EMS message is "excessive correctable single bit errors". That is a DIMM-related error. They refer you to a single hardware memory slot. They want you to call a service tech. I would follow their advice. In my experience, you will need to replace a DIMM.
The dd test can tell you something about whether the disk can be read, and is not a bad approach. But I would always start with a "vgdisplay -v" so I can see where the disk is being used, and whether I have lost volume groups or mirrors.
Do the vgdisplay and let's see what the impact is.
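When the vgdisplay -v output is long, a few lines of Python can pick out the entries worth looking at. This is only a sketch under stated assumptions: it assumes the usual HP-UX LVM layout with "LV Name" / "LV Status" and "PV Name" / "PV Status" lines, and the sample text below is invented for the example. It is not output from the poster's system. The function name `find_problems` is also made up.

```python
def find_problems(vgdisplay_text):
    """Return (name, status) pairs for stale LVs and non-available PVs."""
    problems = []
    name = None
    for line in vgdisplay_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        kind, field, value = parts[0], parts[1], parts[2]
        if kind in ("LV", "PV") and field == "Name":
            name = value                      # remember the current LV or PV
        elif kind == "LV" and field == "Status" and "stale" in value:
            problems.append((name, value))    # e.g. available/stale
        elif kind == "PV" and field == "Status" and value != "available":
            problems.append((name, value))    # e.g. unavailable
    return problems


# Invented sample in the assumed layout, NOT real output from this thread.
SAMPLE = """\
--- Logical volumes ---
   LV Name                     /dev/vg00/lvol1
   LV Status                   available/syncd
   LV Name                     /dev/vg00/lvol3
   LV Status                   available/stale
--- Physical volumes ---
   PV Name                     /dev/dsk/c1t0d0
   PV Status                   unavailable
   PV Name                     /dev/dsk/c2t0d0
   PV Status                   available
"""
problems = find_problems(SAMPLE)
for item, status in problems:
    print(item, status)
```

A stale LV together with an unavailable PV would point to a broken mirror on that disk. If everything shows as syncd and available, the disk is still in service despite the EMS event.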
08-03-2012 08:05 AM
Re: EMS output (disk and PDT errors)
Many thanks for your valuable help.