Jenni Wolgast
Regular Advisor

bad internal drive??

I have an rp7420 running HP-UX 11i v1 that has been responding very slowly to OS commands for the last few days. Glance shows disk utilization at 100% most of the time. Oracle and the web app running on the server seem normal most of the time; a couple of times they slowed down for a few minutes, but they recovered just fine. When connecting to the server, the login prompt takes longer than usual to appear, and commands like bdf are noticeably slower. I have another HP-UX rp7400 on the same network without issues, and there are no other network problems, so I've ruled the network out. Oracle and the web app mostly run on storage on an EVA4400, which I think is why their performance has not been affected much, but they do keep log and other miscellaneous files in /var and /opt, which might be why they have transient issues sometimes?

I've been watching syslog for any clues, and the info below finally popped up. Could this mean a struggling internal disk? I have two internal disks; are they likely to be mirrored or not? If one of the drives does need to be replaced, what impact would that have on the system? I do have an active HP support agreement, but I'm trying to get an idea of what I should be prepared for when I call. I already made an Ignite tape; what else should I do before I call?

Here is the entry from syslog and the results of the command it lists:

Jan 28 13:08:09 PRODUX EMS [3949]: ------ EMS Event Notification ------
Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_0_3_0.6.0"
(Threshold: >= " 3")
Execute the following command to obtain event details:
/opt/resmon/bin/resdata -R 258801666 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 258801665 -a


CURRENT MONITOR DATA:

Event Time..........: Fri Jan 28 13:08:09 2011
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 3
System..............: PRODUX.HPM.local

Summary:
Disk at hardware path 1/0/0/3/0.6.0 : Drive is not responding.


Description of Error:

As part of the polling functionality, the monitor periodically requests
data from the device. The monitor's request of Test Unit Ready command
failed.

Probable Cause / Recommended Action:

The I/O request that the monitor made to this device failed because the
device timed-out. Check cables, power supply, ensure the drive is powered
ON, and if needed contact your HP support representative.

Additional Event Data:
System IP Address...: deleted
Event Id............: 0x4d43060900000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp7420
OS Version......................: B.11.11
STM Version.....................: A.47.00
EMS Version.....................: A.04.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/disk_em.htm#3

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 1/0/0/3/0.6.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 73.4G
Inquiry Product ID.....: ST373453LC
Firmware Version.......: HPC5
Serial Number..........: 3HW2WH210000753292A7

Product/Device Identification Information:

Logger ID.........: disc30; sdisk
Product Identifier: Disk
Product Qualifier.: HP 73.4GST373453LC
SCSI Target ID....: 0x06
SCSI LUN..........: 0x00

SCSI Command Data Block: (not present in log record)

Hardware Status: (not present in log record).

SCSI Sense Data: (not present in log record)
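
A small aid for reading the event above: the EMS resource name ends in 1_0_0_3_0.6.0, while the summary reports hardware path 1/0/0/3/0.6.0, so in this event the resource name is the hardware path with slashes replaced by underscores. The Python sketch below reverses that mapping and pulls out the SCSI target and LUN. The function name is made up for illustration, and treating the last two dotted fields as target and LUN is an assumption based only on this event (SCSI Target ID 0x06, SCSI LUN 0x00).

```python
def ems_resource_to_hw_path(resource):
    """Turn an EMS disk resource name into (hardware path, target, lun)."""
    leaf = resource.rsplit("/", 1)[-1]      # e.g. "1_0_0_3_0.6.0"
    hw_path = leaf.replace("_", "/")        # e.g. "1/0/0/3/0.6.0"
    fields = hw_path.split(".")
    # Assumption: the last two dotted fields are SCSI target and LUN.
    target, lun = int(fields[-2]), int(fields[-1])
    return hw_path, target, lun


hw_path, target, lun = ems_resource_to_hw_path(
    "/storage/events/disks/default/1_0_0_3_0.6.0"
)
print(hw_path, "target", target, "lun", lun)
```

Both vg00 disks shown later in the thread are target 6, so the target number alone does not say which device file belongs to this path; ioscan -fnC disk lists the hardware path next to each device file.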

Manix
Honored Contributor

Re: bad internal drive??

This is a hardware error alert, and it is for an internal disk. Check the I/O with the dd command.

Run: dd if=/dev/rdsk/cxtxdx of=/dev/null bs=1024 count=100

If the command succeeds, try increasing the count to a higher value; otherwise it will fail with an error.

Paste the dd output.

Thanks
Manix
HP-UX been always lovable - Mani Kalra
Bill Hassell
Honored Contributor

Re: bad internal drive??

You have a disk that is about to fail completely. This requires immediate attention: the system is retrying a lot for now, but eventually the disk will fail outright. If the vg00 disks are not mirrored, start your Ignite backup and get immediate service scheduled.

It seems that you have EMS running but you haven't been getting the error messages by email. Check root's email -- it is probably huge with all the failure messages. Make sure all your systems have root's email aliased to your sysadmin email address so everyone will see the problems sooner.


Bill Hassell, sysadmin
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

I tried the dd command and it seemed to work just fine. It came back quickly with the smaller counts and only took a few seconds at 10000. Is there anything else I can do to test it?


[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=100
100+0 records in
100+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=500
500+0 records in
500+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=1000
1000+0 records in
1000+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=10000
10000+0 records in
10000+0 records out
[PRODUX]:/home/root ->
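
For scale, here is the arithmetic behind the dd runs above: with bs=1024, count=10000 reads 10,240,000 bytes, which is roughly 0.014% of a 73.4 GB drive, so a clean result only covers the very start of the disk. A rough Python calculation follows; treating the "73.4G" in the inquiry string as 73.4 x 10^9 bytes is an assumption.

```python
def dd_coverage(bs, count, disk_bytes):
    """Return (bytes read, fraction of the disk covered) for a dd run from offset 0."""
    read = bs * count
    return read, read / disk_bytes


DISK_BYTES = 73.4e9  # assumption: "73.4G" taken as decimal gigabytes

for count in (100, 500, 1000, 10000):
    read, frac = dd_coverage(1024, count, DISK_BYTES)
    print(f"count={count:>5}: {read:>10} bytes, {frac:.4%} of the disk")
```

Reading the whole raw device (dropping count and using a larger block size such as bs=1024k) would exercise every block, at the cost of extra load on a disk that is already busy.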
Hakki Aydin Ucar
Honored Contributor

Re: bad internal drive??

You probably have a problematic disk.
HP recommends: check cables and power supply, ensure the drive is powered on, and if needed replace the drive.

You can also try this:
# echo 2400?20X | adb /dev/dsk/cxtydz

and see whether there are any nonzero counts other than the first two.
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Bill, how can I tell if the disks are mirrored or not?
Manix
Honored Contributor

Re: bad internal drive??

Run vgdisplay -v vgname (for the volume group that contains these disks)
and check the number of disks used by the lvols there.

Then run lvdisplay -v lvolname | more to see whether they are mapped across two disks.
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

All cables are fine; no one had been near this server before the problems started. I walked around it to check for non-green lights, etc., and did not see anything out of the ordinary. Here is the output from the adb command:

[PRODUX]:/home/root ->echo 2400?20X | adb /dev/dsk/c1t6d0
2400: 44454645 43543031 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
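
One way to read the adb output above: the first two words, 44454645 and 43543031, are ASCII for "DEFECT01", which looks like a header rather than a count, and every word after them is zero. The Python sketch below decodes the header and checks the remaining words. The idea that nonzero trailing words would point to recorded bad blocks follows the hint given earlier in the thread and is not verified here; the helper name is hypothetical.

```python
ADB_OUTPUT = """\
2400: 44454645 43543031 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
"""


def parse_adb_words(text):
    """Return the hex words from adb 'X' format output as integers."""
    words = []
    for line in text.splitlines():
        # Drop the leading "address:" label if the line has one.
        _, _, rest = line.rpartition(":")
        words.extend(int(tok, 16) for tok in rest.split())
    return words


words = parse_adb_words(ADB_OUTPUT)
# The first two 32-bit words, read big-endian, spell out the header text.
header = b"".join(w.to_bytes(4, "big") for w in words[:2]).decode("ascii")
nonzero = [hex(w) for w in words[2:] if w]

print("header:", header)
print("nonzero words after header:", nonzero or "none")
```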
Bijeesh
Respected Contributor

Re: bad internal drive??

Hi,
If this is your boot disk, you can check whether it is mirrored using:
# lvlnboot -v

Rgds
Bijeesh
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Can you tell if I am mirrored?

[PRODUX]:/home/root ->vgdisplay -v /dev/vg00
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 7
Open LV 7
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 4384
VGDA 4
PE Size (Mbytes) 16
Total PE 8748
Alloc PE 7528
Free PE 1220
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 304
Current LE 19
Allocated PE 38
Used PV 2

LV Name /dev/vg00/lvol2
LV Status available/syncd
LV Size (Mbytes) 4096
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol3
LV Status available/syncd
LV Size (Mbytes) 512
Current LE 32
Allocated PE 64
Used PV 2

LV Name /dev/vg00/lvol4
LV Status available/syncd
LV Size (Mbytes) 30000
Current LE 1875
Allocated PE 3750
Used PV 2

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 12000
Current LE 750
Allocated PE 1500
Used PV 2

LV Name /dev/vg00/lvol7
LV Status available/syncd
LV Size (Mbytes) 4400
Current LE 275
Allocated PE 550
Used PV 2

LV Name /dev/vg00/lvol8
LV Status available/syncd
LV Size (Mbytes) 8912
Current LE 557
Allocated PE 1114
Used PV 2


--- Physical volumes ---
PV Name /dev/dsk/c1t6d0
PV Status available
Total PE 4374
Free PE 610
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c4t6d0
PV Status available
Total PE 4374
Free PE 610
Autoswitch On
Proactive Polling On

[PRODUX]:/home/root ->lvdisplay -v /dev/vg00/lvol1 | more
--- Logical volumes ---
LV Name /dev/vg00/lvol1
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 304
Current LE 19
Allocated PE 38
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c1t6d0 19 19
/dev/dsk/c4t6d0 19 19

--- Logical extents ---
LE PV1 PE1 Status 1 PV2 PE2 Status 2
00000 /dev/dsk/c1t6d0 00000 current /dev/dsk/c4t6d0 00000 current
00001 /dev/dsk/c1t6d0 00001 current /dev/dsk/c4t6d0 00001 current
00002 /dev/dsk/c1t6d0 00002 current /dev/dsk/c4t6d0 00002 current
00003 /dev/dsk/c1t6d0 00003 current /dev/dsk/c4t6d0 00003 current
00004 /dev/dsk/c1t6d0 00004 current /dev/dsk/c4t6d0 00004 current
00005 /dev/dsk/c1t6d0 00005 current /dev/dsk/c4t6d0 00005 current
00006 /dev/dsk/c1t6d0 00006 current /dev/dsk/c4t6d0 00006 current
00007 /dev/dsk/c1t6d0 00007 current /dev/dsk/c4t6d0 00007 current
00008 /dev/dsk/c1t6d0 00008 current /dev/dsk/c4t6d0 00008 current
00009 /dev/dsk/c1t6d0 00009 current /dev/dsk/c4t6d0 00009 current
00010 /dev/dsk/c1t6d0 00010 current /dev/dsk/c4t6d0 00010 current
00011 /dev/dsk/c1t6d0 00011 current /dev/dsk/c4t6d0 00011 current
00012 /dev/dsk/c1t6d0 00012 current /dev/dsk/c4t6d0 00012 current
00013 /dev/dsk/c1t6d0 00013 current /dev/dsk/c4t6d0 00013 current
00014 /dev/dsk/c1t6d0 00014 current /dev/dsk/c4t6d0 00014 current
00015 /dev/dsk/c1t6d0 00015 current /dev/dsk/c4t6d0 00015 current
00016 /dev/dsk/c1t6d0 00016 current /dev/dsk/c4t6d0 00016 current
00017 /dev/dsk/c1t6d0 00017 current /dev/dsk/c4t6d0 00017 current
00018 /dev/dsk/c1t6d0 00018 current /dev/dsk/c4t6d0 00018 current

[PRODUX]:/home/root ->
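
Reading the output above for mirroring: lvdisplay reports "Mirror copies 1" for lvol1, and in vgdisplay every logical volume has Allocated PE equal to twice Current LE with Used PV 2, which is what one extra copy of each extent on a second disk would look like. A minimal Python sketch of that check follows, using three of the LV stanzas copied from the vgdisplay output. The parsing helper is hypothetical and assumes the flattened "key value" layout shown in this thread.

```python
import re

VGDISPLAY = """\
LV Name /dev/vg00/lvol1
Current LE 19
Allocated PE 38
Used PV 2

LV Name /dev/vg00/lvol2
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol4
Current LE 1875
Allocated PE 3750
Used PV 2
"""


def copies_per_lv(text):
    """Map each LV name to Allocated PE / Current LE (2.0 means one mirror copy)."""
    result = {}
    name = le = None
    for line in text.splitlines():
        if m := re.match(r"LV Name\s+(\S+)", line):
            name = m.group(1)
        elif m := re.match(r"Current LE\s+(\d+)", line):
            le = int(m.group(1))
        elif m := re.match(r"Allocated PE\s+(\d+)", line):
            result[name] = int(m.group(1)) / le
    return result


for lv, ratio in copies_per_lv(VGDISPLAY).items():
    print(lv, "copies of each extent:", ratio)
```

All seven LVs in the listing give a ratio of 2, and for lvol1 every logical extent is marked current on both c1t6d0 and c4t6d0.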