
Weird EMS output

meekrob
Super Advisor

Good day Gurus,

Following an accidental power outage, I'm getting the EMS output below on our rp5470 servers. Any suggestions are much appreciated.

On server1:

>------------ Event Monitoring Service Event Notification ------------<
 
Notification Time: Wed Jul 26 12:23:56 2012
 
server1 sent Event Monitor notification information:
 
/storage/events/disks/default/0_0_1_1.0.0 is >= 3.
Its current value is SERIOUS(4).
 
Event data from monitor:
Event Time..........: Wed Jul 26 12:23:56 2012
Severity............: SERIOUS
Monitor.............: disk_em
Event #.............: 100038              
System..............: server1.org.com
 
Summary:
     Disk at hardware path 0/0/1/1.0.0 : Media failure
Description of Error:
     The format of the medium in the device is corrupt. The medium is unusable.
Probable Cause / Recommended Action:
 
     A format operation in progress on the device may have been interrupted.
     Restart the formatting process.
 
     Alternatively, the medium in the device is flawed. If the medium is
     removable, replace the medium.
 
     Alternatively, if the medium is not removable, the device has experienced
     a hardware failure. Contact your HP support representative to have the
     device checked.
 
Additional Event Data:
     System IP Address...: 192.168.50.1
     Event Id............: 0x500fd74c00000000
     Monitor Version.....: B.01.01
     Event Class.........: I/O
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_disk_em.clcfg
     Client Configuration File Version...: A.01.00
          Qualification criteria met.
               Number of events..: 1
     Associated OS error log entry id(s):
          None
     Additional System Data:
          System Model Number.............: 9000/800/L3000-5x
          OS Version......................: B.11.11
          STM Version.....................: A.49.00
          EMS Version.....................: A.04.20
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100038
 
v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v
 
Component Data:
     Physical Device Path...: 0/0/1/1.0.0
     Device Class...........: Disk
     Inquiry Vendor ID......: HP 146 G
     Inquiry Product ID.....: ST3146807LC     
     Firmware Version.......: HPC3
     Serial Number..........: 3HY0NQ0P00007337LAV5
 
Product/Device Identification Information:
 
     Logger ID.........: disc30; sdisk
     Product Identifier: Disk
     Product Qualifier.: HP 146 GST3146807LC     
     SCSI Target ID....: 0x00
     SCSI LUN..........: 0x00
 
SCSI Command Data Block:
     Command Data Block Contents:
          0x0000: 25 00 00 00   00 00 00 00   00 00

     Command Data Block Fields (10-byte fmt):
          Command Operation Code...(0x25)..: READ CAPACITY
          Logical Unit Number..............: 0
          Relative Address Bit.............: 0
          Partial Medium Indicator Bit.....: 0
          Logical Block Address............: 0 (0x00000000)
 
Hardware Status:  (not present in log record).
    
SCSI Sense Data:
     Undecoded Sense Data:
          0x0000: 70 00 03 00   00 00 00 0A   00 00 00 00   31 00 05 00
          0x0010: 00 00
     
     SCSI Sense Data Fields:
          Error Code                      : 0x70
          Segment Number                  : 0x00
          Bit Fields:      
               Filemark                   : 0
               End-of-Medium              : 0
               Incorrect Length Indicator : 0
          Sense Key                       : 0x03
          Information Field Valid         : FALSE               
          Information Field               : 0x00000000
          Additional Sense Length         : 10
          Command Specific                : 0x00000000
          Additional Sense Code           : 0x31
          Additional Sense Qualifier      : 0x00
          Field Replaceable Unit          : 0x05
          Sense Key Specific Data Valid   : FALSE               
          Sense Key Specific Data         : 0x00 0x00 0x00
                       
          Sense Key 0x03, MEDIUM ERROR, indicates that the command terminated
          with a nonrecovered error condition that was probably caused by a
          flaw in the medium or an error in the recorded data.  This sense key
          may also be returned if the device is unable to distinguish between a
          flaw in the medium and a specific hardware failure (sense key 0x04).
                       
          The combination of Additional Sense Code and Sense Qualifier (0x3100)
          indicates: Medium format corrupted.

>---------- End Event Monitoring Service Event Notification ----------<
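The decoded fields in the notification above come straight from the 18 raw sense bytes. To show how they map, here is a minimal Python sketch that parses fixed-format SCSI sense data (response code 0x70). The byte offsets follow the SCSI Primary Commands fixed-format layout; the function name decode_fixed_sense is hypothetical (not part of EMS or STM), and the name tables are deliberately tiny and only cover the values seen in this event.

```python
# Minimal decoder for fixed-format SCSI sense data (response codes 0x70/0x71).
# Offsets follow the SPC fixed-format layout; the name tables below are
# hypothetical and only cover the codes that appear in this EMS event.
SENSE_KEYS = {0x03: "MEDIUM ERROR", 0x04: "HARDWARE ERROR"}
ASC_ASCQ = {(0x31, 0x00): "Medium format corrupted"}


def decode_fixed_sense(data: bytes) -> dict:
    """Decode the same fields that EMS prints under 'SCSI Sense Data Fields'."""
    if len(data) < 18:
        raise ValueError("fixed-format sense data needs at least 18 bytes")
    response_code = data[0] & 0x7F  # low 7 bits; bit 7 is the VALID flag
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02x}")
    sense_key = data[2] & 0x0F  # low nibble of byte 2
    asc, ascq = data[12], data[13]
    return {
        "error_code": response_code,
        "info_valid": bool(data[0] & 0x80),
        "filemark": bool(data[2] & 0x80),
        "end_of_medium": bool(data[2] & 0x40),
        "ili": bool(data[2] & 0x20),
        "sense_key": sense_key,
        "sense_key_name": SENSE_KEYS.get(sense_key, "unknown"),
        "information": int.from_bytes(data[3:7], "big"),
        "additional_length": data[7],
        "command_specific": int.from_bytes(data[8:12], "big"),
        "asc": asc,
        "ascq": ascq,
        "asc_ascq_meaning": ASC_ASCQ.get((asc, ascq), "unknown"),
        "fru": data[14],
        "sksv": bool(data[15] & 0x80),  # Sense Key Specific Data Valid
    }


# Raw bytes copied from the "Undecoded Sense Data" lines of the server1 event.
raw = bytes.fromhex("70 00 03 00 00 00 00 0A 00 00 00 00 31 00 05 00 00 00")
fields = decode_fixed_sense(raw)
print(f"Sense key 0x{fields['sense_key']:02x} ({fields['sense_key_name']}), "
      f"ASC/ASCQ 0x{fields['asc']:02x}{fields['ascq']:02x}: {fields['asc_ascq_meaning']}")
```

Running it prints the same interpretation EMS gives: sense key 0x03 (MEDIUM ERROR) with ASC/ASCQ 0x3100, medium format corrupted.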

However, when I run ioscan, everything looks normal:

root> ioscan   -fnC  disk
Class     I  H/W Path     Driver S/W State   H/W Type     Description
=====================================================================
disk      2  0/0/1/1.0.0  sdisk CLAIMED     DEVICE       HP 146 GST3146807LC
                         /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
disk      1  0/0/1/1.2.0  sdisk CLAIMED     DEVICE       COMPAQ  BD1468A4B5
                         /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk      3  0/0/2/0.0.0  sdisk CLAIMED     DEVICE       HP 146 GST3146807LC
                         /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk      4  0/0/2/0.2.0  sdisk CLAIMED     DEVICE       COMPAQ  BD14685A26
                         /dev/dsk/c2t2d0   /dev/rdsk/c2t2d0
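The S/W State column only reports whether a driver (here sdisk) has claimed the device at that hardware path; it says nothing about the condition of the media. As a hypothetical illustration, the Python sketch below parses the listing above into one record per disk, which makes it easy to see that 0/0/1/1.0.0 is CLAIMED even though EMS reported a media failure for it. The parser assumes the exact column layout shown in this post.

```python
# Hypothetical helper: parse `ioscan -fnC disk` output laid out as in this thread.
SAMPLE = """\
Class     I  H/W Path     Driver S/W State   H/W Type     Description
=====================================================================
disk      2  0/0/1/1.0.0  sdisk CLAIMED     DEVICE       HP 146 GST3146807LC
                         /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0
disk      1  0/0/1/1.2.0  sdisk CLAIMED     DEVICE       COMPAQ  BD1468A4B5
                         /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk      3  0/0/2/0.0.0  sdisk CLAIMED     DEVICE       HP 146 GST3146807LC
                         /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk      4  0/0/2/0.2.0  sdisk CLAIMED     DEVICE       COMPAQ  BD14685A26
                         /dev/dsk/c2t2d0   /dev/rdsk/c2t2d0
"""


def parse_ioscan(text: str) -> list:
    """Return one dict per device; device-file lines attach to the previous device."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Class" or line.startswith("="):
            continue  # skip blank lines, the header and the ==== separator
        if fields[0].startswith("/dev/"):
            # Continuation line listing the block and raw device files.
            records[-1]["device_files"] = fields
            continue
        records.append({
            "class": fields[0],
            "instance": int(fields[1]),
            "hw_path": fields[2],
            "driver": fields[3],
            "sw_state": fields[4],
            "hw_type": fields[5],
            "description": " ".join(fields[6:]),
            "device_files": [],
        })
    return records


disks = {r["hw_path"]: r for r in parse_ioscan(SAMPLE)}
suspect = disks["0/0/1/1.0.0"]
# CLAIMED only means sdisk bound to the device; it is not a media health check.
print(suspect["sw_state"], suspect["device_files"])
```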

 

 

On server2:

 

>------------ Event Monitoring Service Event Notification ------------<

Notification Time: Tue Jul 26 19:10:40 2012

server2 sent Event Monitor notification information:

/system/events/memory/192 is >= 3.
Its current value is CRITICAL(5).

Event data from monitor:
Event Time..........: Tue Jul 26 19:10:39 2012
Severity............: CRITICAL
Monitor.............: dm_memory
Event #.............: 1400
System..............: server2.org.com

Summary:
     Memory Event Type : A memory page has been deallocated and entered into
     the Page Deallocation Table (PDT).

Description of Error:
     The Page Deallocation Table (PDT) is 100% full.
          PDT Entries Used: 50
          PDT Entries Free: 0
          PDT Total Size: 50
       A large number of memory pages have been deallocated due to excessive
       correctable single bit errors being detected. Since the PDT is 100%
       full, no more entries can be added to it.
Probable Cause / Recommended Action:

     The Page Deallocation Table (PDT) is full, it is strongly advisable to monitor
     the situation. Although the errors are being corrected,
     this condition indicates a potential problem.
     Contact your HP support representative to check the memory boards.

Additional Event Data:
     System IP Address...: 192.168.50.2
     Event Id............: 0x500ee52000000000
     Monitor Version.....: B.01.00
     Event Class.........: I/O
     Client Configuration File...........:
     /var/stm/config/tools/monitor/default_dm_memory.clcfg
     Client Configuration File Version...: A.01.00
          Qualification criteria met.
               Value received met: value(100) = 100
     Associated OS error log entry id(s):
          None
     Additional System Data:
          System Model Number.............: 9000/800/L3000-5x
          EMS Version.....................: A.04.20
          STM Version.....................: A.49.00
     Latest information on this event:
          http://docs.hp.com/hpux/content/hardware/ems/dm_memory.htm#1400

v-v-v-v-v-v-v-v-v-v-v-v-v    D  E  T  A  I  L  S    v-v-v-v-v-v-v-v-v-v-v-v-v

Component Data:
     Physical Device Path....: 192
     Tag 2...................: 20
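The PDT figures in this event are internally consistent: used plus free entries equals the total size, and 50 of 50 entries is 100% full. A minimal sketch of that arithmetic follows; the function name pdt_status is hypothetical, and reading the "value(100)" qualification line as this fill percentage is an assumption, not something the dm_memory documentation quoted here confirms.

```python
def pdt_status(used: int, free: int, total: int) -> tuple:
    """Return (percent full, is_full) for a Page Deallocation Table.

    Hypothetical helper; it only reproduces the arithmetic behind the
    "PDT is 100% full" line, not how the dm_memory monitor computes it.
    """
    if total <= 0 or used + free != total:
        raise ValueError("expected used + free == total > 0")
    percent = 100.0 * used / total
    return percent, free == 0


# Values from the server2 event: 50 used, 0 free, 50 total.
percent, full = pdt_status(used=50, free=0, total=50)
print(f"PDT {percent:.0f}% full, full={full}")
```

Once the table is full, further bad pages cannot be recorded, which is why the monitor escalates to CRITICAL and recommends a hardware check.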

 

Thanks in advance

Ken Grabowski
Respected Contributor

Re: Weird EMS output

Well! It looks like you lost a disk on server1 and a memory DIMM on server2. I would guess the disk is one of your mirrored OS disks. Post the output of vgdisplay -v so we can see where the disk is used and whether the LVs are still in sync. These are older systems, and once in a great while I have seen a disk recover after being re-seated. But shut the system down and power it off before trying that.

 

The memory DIMM is going to require a call to support to get a replacement. The disk will too, if re-seating it doesn't bring it back.

meekrob
Super Advisor

Re: Weird EMS output

Hello, and thanks for your reply. Is it a DIMM issue, or a memory board issue as the output message says? Also, regarding the HDD at hardware path 0/0/1/1.0.0 (the "Media failure" in the EMS message): how can this disk show no problems in ioscan, where its status is CLAIMED?

 

ioscan   -fnC  disk
Class     I  H/W Path     Driver S/W State   H/W Type     Description
=====================================================================
disk      2  0/0/1/1.0.0  sdisk CLAIMED     DEVICE       HP 146 GST3146807LC
                         /dev/dsk/c1t0d0   /dev/rdsk/c1t0d0

 

Could the power outage have affected EMS so that it is showing inconsistent error messages?

How should I proceed?

 

Thanks in advance

Dennis Handly
Acclaimed Contributor

Re: EMS output (disk and PDT errors)

>is it a DIMM issue or as mentioned by the output message a memory board?

 

It's most likely a DIMM issue, since over 50 pages have been deallocated over the years.

>how can it be that this specific disk is not showing any inconsistencies while issuing ioscan as it is showing a CLAIMED status?
Left hand, right hand? EMS doesn't talk to ioscan. ;-)
ioscan can connect to the disk; it doesn't check that it can do I/O to every block.

 

>Could it be that this power outage affect EMS so that it is showing inconsistent error messages?

I wouldn't think so.

>How should I proceed?

You could use dd(1) on the raw disk to read every block.
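By default dd stops at the first read error, so it tells you that the disk has a bad spot but not how many. For illustration only, here is a Python sketch of the same sequential read that keeps going past errors and records the failing offsets. It assumes a Python 3 interpreter is available on the host (not a given on HP-UX 11.11); the function name scan_device, the 1 MiB chunk size and the max_errors cut-off are arbitrary choices, not anything from HP tooling.

```python
import os
import sys


def scan_device(path: str, chunk_size: int = 1024 * 1024, max_errors: int = 100) -> list:
    """Read `path` sequentially in chunks and return the offsets of unreadable chunks.

    Unlike a plain dd, a read error does not stop the scan: the failing chunk
    is recorded and skipped. Resolution is one chunk, not one disk block.
    The scan gives up after `max_errors` failures so a dead disk cannot loop forever.
    """
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)  # read-only: nothing is written to the disk
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                data = os.read(fd, chunk_size)
            except OSError:
                bad_offsets.append(offset)  # unreadable chunk: note it, move on
                offset += chunk_size
                continue
            if not data:
                break  # zero-length read: end of the device
            offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (as root): python3 scan_device.py /dev/rdsk/c1t0d0
    for bad in scan_device(sys.argv[1]):
        print(f"read error in chunk starting at byte offset {bad}")
```

The raw device /dev/rdsk/c1t0d0 is the one ioscan lists for hardware path 0/0/1/1.0.0; an empty result means every chunk could be read. A rough dd equivalent is dd if=/dev/rdsk/c1t0d0 of=/dev/null bs=1024k, which simply stops at the first error.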

You could also unmount the filesystem and do a fsck on it.

meekrob
Super Advisor

Re: EMS output (disk and PDT errors)

Thanks for your reply.

Is there a way to check which DIMM is causing this failure?

Which command can I use to display the same EMS output on screen?

 

Thanks in advance

Ken Grabowski
Respected Contributor

Re: EMS output (disk and PDT errors)

Yup! Like Dennis said: EMS is seeing a bad disk, while ioscan is only seeing that the disk is connected to the bus. Being CLAIMED does not mean the disk works, just that its controller is talking to the bus. Read the EMS errors more closely: they report a media error, not a missing disk device.

 

Regarding the DIMM: again, the EMS message says "excessive correctable single bit errors". That is a DIMM-related error, and the event refers you to a single hardware memory slot. EMS wants you to call a service tech, and I would follow that advice. In my past experience, this means you need to replace a DIMM.

 

The dd test can tell you whether the disk can be read, and it is not a bad approach. But I would always start with "vgdisplay -v" so I can see where the disk is being used and whether I have lost volume groups or mirrors.

 

Run the vgdisplay and let's see what the impact is.

meekrob
Super Advisor

Re: EMS output (disk and PDT errors)

Many thanks for your precious help.