
bad internal drive??

 
Jenni Wolgast
Regular Advisor

bad internal drive??

I have an rp7420 running HP-UX 11i v1 that has been responding very slowly to OS commands for the last few days. Glance shows disk utilization at 100% most of the time. Oracle and the web app running on the server seem to run normally most of the time; a couple of times they have slowed down for a few minutes, but they recovered just fine. When I connect to the server, the login prompt takes longer than usual to appear, and commands like bdf are very noticeably slower. I have another HP-UX server (an rp7400) on the same network without issues, and there are no other network problems, so I've ruled the network out. Oracle and the web app mostly run on storage on an EVA4400, which I think is why their performance has not been affected much, but they do have log and other miscellaneous files in /var and /opt, which might explain the occasional transient issues?

I've been watching syslog for clues, and the entry below finally popped up. Could this mean a struggling internal disk? I have 2 internal disks; are they likely to be mirrored or not? If one of the drives does need to be replaced, what impact would that have on the system? I do have an active HP support agreement, but I'm trying to get an idea of what I should be prepared for when I call. I already made an Ignite tape; what else should I do before I call?

Here is the entry from syslog and the results of the command it lists:

Jan 28 13:08:09 PRODUX EMS [3949]: ------ EMS Event Notification ------ Value:
"CRITICAL (5)" for Resource: "/storage/events/disks/default/1_0_0_3_0.6.0"
(Threshold: >= " 3") Execute the following command to obtain event details:
/opt/resmon/bin/resdata -R 258801666 -r /storage/events/disks/default/1_0_0_3_
0.6.0 -n 258801665 -a


CURRENT MONITOR DATA:

Event Time..........: Fri Jan 28 13:08:09 2011
Severity............: CRITICAL
Monitor.............: disk_em
Event #.............: 3
System..............: PRODUX.HPM.local

Summary:
Disk at hardware path 1/0/0/3/0.6.0 : Drive is not responding.


Description of Error:

As part of the polling functionality, the monitor periodically requests
data from the device. The monitor's request of Test Unit Ready command
failed.

Probable Cause / Recommended Action:

The I/O request that the monitor made to this device failed because the
device timed-out. Check cables, power supply, ensure the drive is powered
ON, and if needed contact your HP support representative.

Additional Event Data:
System IP Address...: deleted Event Id............: 0x4d43060900000000
Monitor Version.....: B.01.01
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/rp7420
OS Version......................: B.11.11
STM Version.....................: A.47.00
EMS Version.....................: A.04.00
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/disk_em.htm#3

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 1/0/0/3/0.6.0
Device Class...........: Disk
Inquiry Vendor ID......: HP 73.4G
Inquiry Product ID.....: ST373453LC
Firmware Version.......: HPC5
Serial Number..........: 3HW2WH210000753292A7

Product/Device Identification Information:

Logger ID.........: disc30; sdisk
Product Identifier: Disk
Product Qualifier.: HP 73.4GST373453LC
SCSI Target ID....: 0x06
SCSI LUN..........: 0x00

SCSI Command Data Block: (not present in log record)

Hardware Status: (not present in log record).

SCSI Sense Data: (not present in log record)
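
The resource name in the syslog line and the hardware path in the event summary identify the same disk: in this event, the last component of the resource name is the hardware path with the slashes written as underscores. A minimal sketch of that conversion, assuming the same pattern holds for other disk_em resources (ems_resource_to_hwpath is a hypothetical helper name, not part of EMS):

```shell
#!/bin/sh
# Hypothetical helper: turn an EMS disk resource name into a hardware path.
# Assumes the last path component is the hardware path with "/" written as "_",
# as seen in this thread's event.
ems_resource_to_hwpath() {
    # ${1##*/} keeps only the text after the last slash of the resource name
    printf '%s\n' "${1##*/}" | tr '_' '/'
}

ems_resource_to_hwpath /storage/events/disks/default/1_0_0_3_0.6.0
# prints 1/0/0/3/0.6.0
```

Note that syslog wrapped the resource name across two lines, so the two pieces have to be joined before using it.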

27 REPLIES
Manix
Honored Contributor

Re: bad internal drive??

This is a hardware error alert, and it's an internal disk. Check the I/O with the dd command:

dd if=/dev/rdsk/cXtXdX of=/dev/null bs=1024 count=100

If the command succeeds, try increasing the count to a higher value; if the disk is unreadable, dd fails with an error.

Paste the dd output.
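
The stepped read test can be wrapped in a small loop. This is only a sketch: read_test is a hypothetical helper name, and the dd invocation is the same read-only one shown above. Keep in mind that bs=1024 count=10000 covers only about the first 10 MB of the disk, so a clean result does not say much about the rest of the surface.

```shell
#!/bin/sh
# Hypothetical helper: read increasingly large chunks from a device (or file)
# and stop at the first count that dd cannot read.
read_test() {
    dev=$1
    shift
    for count in "$@"; do
        # Read-only test: data is discarded to /dev/null, dd's record summary is hidden
        if dd if="$dev" of=/dev/null bs=1024 count="$count" 2>/dev/null; then
            echo "count=$count ok"
        else
            echo "count=$count FAILED"
            return 1
        fi
    done
}

# Example call (as root, against the raw device file of the suspect disk):
# read_test /dev/rdsk/cXtXdX 100 500 1000 10000
```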

Thanks
Manix
HP-UX been always lovable - Mani Kalra
Bill Hassell
Honored Contributor

Re: bad internal drive??

You have a disk that is about ready to completely fail. This requires immediate attention since the system is retrying a lot but eventually it will fail. If the vg00 disks are not mirrored, start your Ignite backup and get immediate service scheduled.

It seems that you have EMS running but you haven't been getting the error messages by email. Check root's email -- it is probably huge with all the failure messages. Make sure all your systems have root's email aliased to your sysadmin email address so everyone will see the problems sooner.


Bill Hassell, sysadmin
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

I tried the dd command and it seemed to work just fine. It came back quickly with the smaller counts and only took a few seconds at 10000. Is there anything else I can do to test it?


[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=100
100+0 records in
100+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=500
500+0 records in
500+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=1000
1000+0 records in
1000+0 records out
[PRODUX]:/home/root ->dd if=/dev/rdsk/c1t6d0 of=/dev/null bs=1024 count=10000
10000+0 records in
10000+0 records out
[PRODUX]:/home/root ->
Hakki Aydin Ucar
Honored Contributor

Re: bad internal drive??

You probably have a problematic disk.
HP recommends: check cables and power supply, ensure the drive is powered ON, and if needed replace the drive.

You can also try this:
# echo 2400?20X | adb /dev/dsk/cxtydz

and see whether there are any nonzero values apart from the first two.
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Bill, how can I tell if the disks are mirrored or not?
Manix
Honored Contributor

Re: bad internal drive??

Run vgdisplay -v vgname (for the volume group that contains these disks) and check the number of disks used by each lvol.

Then run lvdisplay -v lvolname | more to see whether its extents are mapped onto two disks.
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

All cables are fine, no one had been near this server before the problems started. I walked around it to check for non-green lights etc and did not see anything out of the ordinary... Here is the output from the adb command

[PRODUX]:/home/root ->echo 2400?20X | adb /dev/dsk/c1t6d0
2400: 44454645 43543031 0 0
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
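
For anyone reading the adb output: the first two words are not counts but ASCII text. Decoding them gives DEFECT01, which looks like a header for the area the command inspects (that interpretation is an assumption; the decoding itself is plain hex-to-ASCII). The remaining words are all zero, which is the "no nonzero values" case asked about. A small sketch of the decoding, with hex_words_to_ascii as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the ASCII characters encoded in hex words,
# two hex digits per character.
hex_words_to_ascii() {
    for word in "$@"; do
        rest=$word
        while [ "${#rest}" -ge 2 ]; do
            byte=${rest%"${rest#??}"}   # first two hex digits
            rest=${rest#??}             # drop them from the remainder
            # Convert the hex byte to a 3-digit octal escape and print it
            printf "\\$(printf '%03o' "0x$byte")"
        done
    done
    echo
}

hex_words_to_ascii 44454645 43543031
# prints DEFECT01
```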
Bijeesh
Respected Contributor

Re: bad internal drive??

Hi,
If this is your boot disk, you can check whether it is mirrored using:
# lvlnboot -v

Rgds
Bijeesh
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Can you tell if I am mirrored?

[PRODUX]:/home/root ->vgdisplay -v /dev/vg00
--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 7
Open LV 7
Max PV 16
Cur PV 2
Act PV 2
Max PE per PV 4384
VGDA 4
PE Size (Mbytes) 16
Total PE 8748
Alloc PE 7528
Free PE 1220
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 304
Current LE 19
Allocated PE 38
Used PV 2

LV Name /dev/vg00/lvol2
LV Status available/syncd
LV Size (Mbytes) 4096
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol3
LV Status available/syncd
LV Size (Mbytes) 512
Current LE 32
Allocated PE 64
Used PV 2

LV Name /dev/vg00/lvol4
LV Status available/syncd
LV Size (Mbytes) 30000
Current LE 1875
Allocated PE 3750
Used PV 2

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 12000
Current LE 750
Allocated PE 1500
Used PV 2

LV Name /dev/vg00/lvol7
LV Status available/syncd
LV Size (Mbytes) 4400
Current LE 275
Allocated PE 550
Used PV 2

LV Name /dev/vg00/lvol8
LV Status available/syncd
LV Size (Mbytes) 8912
Current LE 557
Allocated PE 1114
Used PV 2


--- Physical volumes ---
PV Name /dev/dsk/c1t6d0
PV Status available
Total PE 4374
Free PE 610
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c4t6d0
PV Status available
Total PE 4374
Free PE 610
Autoswitch On
Proactive Polling On

[PRODUX]:/home/root ->lvdisplay -v /dev/vg00/lvol1 | more
--- Logical volumes ---
LV Name /dev/vg00/lvol1
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 304
Current LE 19
Allocated PE 38
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c1t6d0 19 19
/dev/dsk/c4t6d0 19 19

--- Logical extents ---
LE PV1 PE1 Status 1 PV2 PE2 Status 2
00000 /dev/dsk/c1t6d0 00000 current /dev/dsk/c4t6d0 00000 current
00001 /dev/dsk/c1t6d0 00001 current /dev/dsk/c4t6d0 00001 current
00002 /dev/dsk/c1t6d0 00002 current /dev/dsk/c4t6d0 00002 current
00003 /dev/dsk/c1t6d0 00003 current /dev/dsk/c4t6d0 00003 current
00004 /dev/dsk/c1t6d0 00004 current /dev/dsk/c4t6d0 00004 current
00005 /dev/dsk/c1t6d0 00005 current /dev/dsk/c4t6d0 00005 current
00006 /dev/dsk/c1t6d0 00006 current /dev/dsk/c4t6d0 00006 current
00007 /dev/dsk/c1t6d0 00007 current /dev/dsk/c4t6d0 00007 current
00008 /dev/dsk/c1t6d0 00008 current /dev/dsk/c4t6d0 00008 current
00009 /dev/dsk/c1t6d0 00009 current /dev/dsk/c4t6d0 00009 current
00010 /dev/dsk/c1t6d0 00010 current /dev/dsk/c4t6d0 00010 current
00011 /dev/dsk/c1t6d0 00011 current /dev/dsk/c4t6d0 00011 current
00012 /dev/dsk/c1t6d0 00012 current /dev/dsk/c4t6d0 00012 current
00013 /dev/dsk/c1t6d0 00013 current /dev/dsk/c4t6d0 00013 current
00014 /dev/dsk/c1t6d0 00014 current /dev/dsk/c4t6d0 00014 current
00015 /dev/dsk/c1t6d0 00015 current /dev/dsk/c4t6d0 00015 current
00016 /dev/dsk/c1t6d0 00016 current /dev/dsk/c4t6d0 00016 current
00017 /dev/dsk/c1t6d0 00017 current /dev/dsk/c4t6d0 00017 current
00018 /dev/dsk/c1t6d0 00018 current /dev/dsk/c4t6d0 00018 current

[PRODUX]:/home/root ->
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

[PRODUX]:/home/root ->lvlnboot -v
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg03".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg04".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg05".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg07".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg01".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg02".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg06".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg08".
lvlnboot: Volume group not activated.
Cannot display volume group "/dev/vg09".
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c1t6d0 (1/0/0/3/0.6.0) -- Boot Disk
/dev/dsk/c4t6d0 (1/0/1/1/0/1/1.6.0) -- Boot Disk
Boot: lvol1 on: /dev/dsk/c1t6d0
/dev/dsk/c4t6d0
Root: lvol3 on: /dev/dsk/c1t6d0
/dev/dsk/c4t6d0
Swap: lvol2 on: /dev/dsk/c1t6d0
/dev/dsk/c4t6d0
Dump: lvol2 on: /dev/dsk/c1t6d0, 0

Current path "/dev/dsk/c17t0d1" is an alternate link, skip.
Current path "/dev/dsk/c13t0d1" is an alternate link, skip.
Current path "/dev/dsk/c11t0d1" is an alternate link, skip.
Current path "/dev/dsk/c15t0d2" is an alternate link, skip.
Current path "/dev/dsk/c15t0d3" is an alternate link, skip.
Current path "/dev/dsk/c13t0d2" is an alternate link, skip.
Current path "/dev/dsk/c13t0d3" is an alternate link, skip.
Current path "/dev/dsk/c11t0d2" is an alternate link, skip.
Current path "/dev/dsk/c11t0d3" is an alternate link, skip.
Current path "/dev/dsk/c11t0d4" is an alternate link, skip.
Current path "/dev/dsk/c11t0d5" is an alternate link, skip.
Current path "/dev/dsk/c11t1d0" is an alternate link, skip.
Current path "/dev/dsk/c17t0d4" is an alternate link, skip.
Current path "/dev/dsk/c17t0d5" is an alternate link, skip.
Current path "/dev/dsk/c17t1d0" is an alternate link, skip.
Current path "/dev/dsk/c15t1d0" is an alternate link, skip.
Current path "/dev/dsk/c15t0d4" is an alternate link, skip.
Current path "/dev/dsk/c15t0d5" is an alternate link, skip.
Current path "/dev/dsk/c13t0d6" is an alternate link, skip.
Current path "/dev/dsk/c13t0d7" is an alternate link, skip.
Current path "/dev/dsk/c15t0d6" is an alternate link, skip.
Current path "/dev/dsk/c15t0d7" is an alternate link, skip.
Current path "/dev/dsk/c17t0d6" is an alternate link, skip.
Current path "/dev/dsk/c17t0d7" is an alternate link, skip.
Current path "/dev/dsk/c11t1d1" is an alternate link, skip.
Current path "/dev/dsk/c17t1d1" is an alternate link, skip.
Current path "/dev/dsk/c15t1d1" is an alternate link, skip.
[PRODUX]:/home/root ->
Manix
Honored Contributor

Re: bad internal drive??

So these two are mirrors:

/dev/dsk/c1t6d0 & /dev/dsk/c4t6d0

Check with ioscan -fnH 1/0/0/3/0.6.0 that it shows the same disk.
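
The mirroring can also be read straight off the vgdisplay -v output posted earlier: every lvol has Allocated PE equal to twice Current LE, which matches the "Mirror copies 1" line from lvdisplay. Here is a rough sketch that does that arithmetic with awk. lv_mirror_copies is a hypothetical helper name, and it assumes the field labels appear exactly as in the output in this thread; the sample input is copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: read "vgdisplay -v" output on stdin and report, per
# logical volume, how many mirror copies the extent counts imply
# (Allocated PE / Current LE - 1).
lv_mirror_copies() {
    awk '
        $1 == "LV" && $2 == "Name"     { lv = $3 }
        $1 == "Current" && $2 == "LE"  { le = $3 }
        $1 == "Allocated" && $2 == "PE" {
            if (le > 0 && $3 % le == 0)
                printf "%s mirror_copies=%d\n", lv, $3 / le - 1
            else
                printf "%s mirror_copies=unknown\n", lv
        }
    '
}

# Sample taken from the vgdisplay output in this thread
lv_mirror_copies <<'EOF'
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 304
Current LE 19
Allocated PE 38
Used PV 2
EOF
# prints /dev/vg00/lvol1 mirror_copies=1
```

On the live system the equivalent call would be vgdisplay -v /dev/vg00 | lv_mirror_copies.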
Manix
Honored Contributor

Re: bad internal drive??

adb looks good, and so does dd, and the "power lights" too. I suspect this could be an I/O timeout.

Is the server still running as slowly as you posted earlier?

>The I/O request that the monitor made to this device failed because the device timed-out

The default I/O timeout value is 30 seconds. We could raise it to a higher limit, for example 90 or 180 seconds:

pvchange -t 90 /dev/dsk/cXtXdX
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Ok, so if the syslog message has not appeared again and all the tests I've done against this disk have seemed to come back clean, is it really the problem? I've tried all the same commands against the second internal disk and it has passed everything as well...
Manix
Honored Contributor

Re: bad internal drive??

If the machine is still slow, sar -d 5 5 shows this disk as busy, Glance still shows high disk utilization as you said, or more of these errors appear, it's best to take Bill's advice and contact HP. Let them diagnose it.
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

I am still seeing 100% disk util in Glance and I am still having to wait quite a bit longer than usual for commands like bdf. Even running a man on adb was really slow. (Don't be offended, I like to check to see what I'm running before trying it on my production server ;) )
Manix
Honored Contributor

Re: bad internal drive??

Sure! Take your time. We are just here to help :)
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

Is there any way I can edit my original post? Didn't catch that it had my sys IP address in there until just now...

Re: bad internal drive??

>Is there any way I can edit my original post?

No, you can only make requests to the moderators to do that. Just reply in the following thread with your request and the URL of the thread to edit:
http://h30499.www3.hp.com/t5/Your-Questions-Regarding-ITRC/January-February-March-2011-Issues-Requiring-Moderator/m-p/5268800#M5401

 

>had my sys IP address in there until just now.

I can't ping it, so I'm not sure if you should worry about that?

Torsten.
Acclaimed Contributor

Re: bad internal drive??

Don't worry, the 10.0.0... IP addresses belong to a private range. Everyone uses the same ones, and nobody can connect to you from outside your network.

Hope this helps!
Regards
Torsten.

__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.

__________________________________________________
No support by private messages. Please ask the forum!

If you feel this was helpful please click the KUDOS! thumb below!   
Jenni Wolgast
Regular Advisor

Re: bad internal drive??

I know the address isn't externally available but I'm just paranoid ;) Thanks for the info!
Torsten.
Acclaimed Contributor

Re: bad internal drive??

IMHO the address is not a problem here.


I'm using 192.168.1.13 - you cannot reach me ... same for your address :-)
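
For reference, the private IPv4 ranges (RFC 1918) are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. A quick sketch of a check on the dotted prefix follows; is_private_ipv4 is a hypothetical helper name, and it assumes its argument is already a well-formed IPv4 address (it does not validate the format).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the address falls in an RFC 1918 private range.
is_private_ipv4() {
    case $1 in
        10.*|192.168.*) return 0 ;;                         # 10.0.0.0/8, 192.168.0.0/16
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;  # 172.16.0.0/12
        *) return 1 ;;
    esac
}

if is_private_ipv4 192.168.1.13; then
    echo "192.168.1.13 is private"
fi
# prints 192.168.1.13 is private
```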


However, only the mods can edit the posts, ask them here

http://h30499.www3.hp.com/t5/Community-Feedback-Suggestions/bd-p/community-feedback-suggestions


then select the current thread like this one:

January/February/March 2011 Issues Requiring Moderator Intervention

http://h30499.www3.hp.com/t5/Your-Questions-Regarding-ITRC/January-February-March-2011-Issues-Requiring-Moderator/m-p/5268800#M5401


Hope this helps!
Regards
Torsten.

Jenni Wolgast
Regular Advisor

Re: bad internal drive??

I'm not so much worried that ITRC users can see that info, but all the posts here pop up on Google too... I just don't like the idea of publicly announcing the host name, IP address, model, OS, etc. of my production server... I don't really think it would cause a problem, but I would feel better if I made some attempt to remove it :)
Torsten.
Acclaimed Contributor

Re: bad internal drive??

Well, post to the thread I told you.

The server model and OS are always important information for a question.



BTW, did you run the suggested command?

# /opt/resmon/bin/resdata -R 258801666 -r /storage/events/disks/default/1_0_0_3_0.6.0 -n 258801665 -a






And by the way, your diags software is from 2004!


Update!

Hope this helps!
Regards
Torsten.

melvyn burnard
Honored Contributor

Re: bad internal drive??

IP address deleted
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!