disk issue

I don't have much experience with HP-UX.
I know I have a bad disk in the disk array. I keep getting the following message in the syslog.log file, and the SAM display shows the following entry for the disk:


***********************************************

SAM:
1/12/0/0.8.0 1 Unused --- 0 SEAGATE.

SYSLOG:

Jul 18 13:58:35 ux01pwow EMS [3045]: ------ EMS Event Notification ------ Value: "SERIOUS (4)" for Resource: "/storage/events/disks/default/1_12_0_0.8.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 181731421 -r /storage/events/disks/default/1_12_0_0.8.0 -n 199557146 -a
Jul 19 14:00:36 ux01pwow EMS [3045]: ------ EMS Event Notification ------ Value: "SERIOUS (4)" for Resource: "/storage/events/disks/default/1_12_0_0.8.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 181731421 -r /storage/events/disks/default/1_12_0_0.8.0 -n 199557147 -a
************************************************

How do I physically identify the bad disk? There are no lights on the storage array (SC10 Sure Store E with LVD 18.2 disks).



Running vgdisplay on the sybdg volume group gives the following output.

not correspond to physical volume attached to
this volume group
vgdisplay: Warning: couldn't query all of the physical volumes.
vgdisplay: Warning: couldn't query physical volume "/dev/dsk/c10t8d0":
The specified path does not correspond to physical volume attached to
this volume group
vgdisplay: Warning: couldn't query all of the physical volumes.
--- Volume groups ---
VG Name /dev/sybdg
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 43
Open LV 43
Max PV 16
Cur PV 10
Act PV 9
Max PE per PV 4342
VGDA 18
PE Size (Mbytes) 4
Total PE 39060
Alloc PE 36825
Free PE 2235
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/sybdg/rlog01
LV Status available/stale
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 1

LV Name /dev/sybdg/rlog02
LV Status available/stale
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 1

LV Name /dev/sybdg/rdata01
LV Status available/stale
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 1

LV Name /dev/sybdg/rdata02
LV Status available/stale
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 1

..........................................
,...................................
info deleted...
......................
LV Name /dev/sybdg/rlog08
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2

vgdisplay: Warning: couldn't query physical volume "/dev/dsk/c10t8d0":
The specified path does not correspond to physical volume attached to
this volume group
vgdisplay: Warning: couldn't query all of the physical volumes.
vgdisplay: Warning: couldn't query physical volume "/dev/dsk/c10t8d0":
The specified path does not correspond to physical volume attached to
this volume group

LV Name /dev/sybdg/rdata19
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 3

LV Name /dev/sybdg/rdata20
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2

LV Name /dev/sybdg/rdata21
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2

LV Name /dev/sybdg/rdata22
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2

LV Name /dev/sybdg/rdata23
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2

LV Name /dev/sybdg/rdata24
LV Status available/syncd
LV Size (Mbytes) 2000
Current LE 500
Allocated PE 1000
Used PV 2



Thanks for all your help.

--irfan
Michael Tully
Honored Contributor
Solution

Re: disk issue

Well we can safely say that the LUN affected is "/dev/dsk/c10t8d0"

What is the output if you run
/opt/resmon/bin/resdata -R 181731421 -r /storage/events/disks/default/1_12_0_0.8.0 -n 199557147 -a
?

I don't know much about the SC10, but is it set up as RAID 5?

Looks as though you have a spare LUN as well.
Anyone for a Mutiny ?

Re: disk issue

Event Time..........: Mon Jun 28 13:18:00 2004
Severity............: SERIOUS
Monitor.............: disk_em
Event #.............: 100472
System..............: ux01pwow.newcorp.com

Summary:
Disk at hardware path 1/12/0/0.8.0 :


Description of Error:

Message in ll_msg (set: 15 msg: 0) did not exist in catalog.
Catalog type is MONITOR_INFO
Catalog version is A.01.00
Module name is disk_em
Message set number is 15
Message number is 0
Message size is 84
Message parameter 1:
Message parameter size = 24
Message parameter is a literal
Literal text is Test Unit Ready

Probable Cause / Recommended Action:

Message in ll_msg (set: 16 msg: 0) did not exist in catalog.
Catalog type is MONITOR_INFO
Catalog version is A.01.00
Module name is disk_em
Message set number is 16
Message number is 0
Message size is 60

Additional Event Data:
System IP Address...: 10.200.200.44
Event Id............: 0x40e052c800000000
Monitor Version.....: B.01.00
Event Class.........: I/O
Client Configuration File...........:
/var/stm/config/tools/monitor/default_disk_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800
OS Version......................: B.11.00
STM Version.....................: A.24.00
EMS Version.....................: A.03.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/scsi.htm#100472

v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v



Component Data:
Physical Device Path...: 1/12/0/0.8.0
Device Class...........: Disk
Inquiry Vendor ID......: SEAGATE
Inquiry Product ID.....:
Firmware Version.......: HP02
Serial Number..........:

Product/Device Identification Information:

Logger ID.........: disc30; sdisk
Product Identifier: Disk
Product Qualifier.: SEAGATE
SCSI Target ID....: 0x08
SCSI LUN..........: 0x00

Hardware Status: (not present in log record).

SCSI Sense Data:

Undecoded Sense Data:
0x0000: 70 00 02 00 00 00 00 0A 00 00 00 00 04 03 FF 00
0x0010: 00 00

SCSI Sense Data Fields:
Error Code : 0x70
Segment Number : 0x00
Bit Fields:
Filemark : 0
End-of-Medium : 0
Incorrect Length Indicator : 0
Sense Key : 0x02
Information Field Valid : FALSE
Information Field : 0x00000000
Additional Sense Length : 10
Command Specific : 0x00000000
Additional Sense Code : 0x04
Additional Sense Qualifier : 0x03
Field Replaceable Unit : 0xFF
Sense Key Specific Data Valid : FALSE
Sense Key Specific Data : 0x00 0x00 0x00

Sense Key 0x02, NOT READY, indicates that the logical unit addressed
cannot be accessed. Operator intervention may be required to correct
this condition.

The combination of Additional Sense Code and Sense Qualifier (0x0403)
indicates: Logical unit not ready, manual intervention required.
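
The decoded fields in this report can be cross-checked by hand against the raw dump. A minimal sketch, assuming the fixed sense data format (error code 0x70 or 0x71), where the sense key is the low nibble of byte 2 and the additional sense code and qualifier are bytes 12 and 13. decode_sense is a made-up helper name, not an HP-UX tool:

```sh
# decode_sense: hypothetical helper, not part of HP-UX or STM.
# Takes one argument: the sense bytes as space-separated hex pairs.
decode_sense() {
    # Split the argument into positional parameters, one per byte.
    # $1 is byte 0, so byte N ends up in parameter N+1.
    set -- $1
    if [ "$#" -lt 14 ]; then
        echo "decode_sense: need at least 14 sense bytes" >&2
        return 1
    fi
    # Byte 2: sense key in the low nibble. Bytes 12 and 13: ASC and ASCQ.
    printf 'sense_key=0x%02X asc=0x%02X ascq=0x%02X\n' \
        "$(( 0x$3 & 0x0F ))" "$(( 0x${13} ))" "$(( 0x${14} ))"
}
```

Called with the 18 bytes from the dump above, it reports sense key 0x02 with code and qualifier 0x04 and 0x03, which matches the NOT READY decoding in the report.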





KapilRaj
Honored Contributor

Re: disk issue

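I don't think the SC10 can be configured as RAID 5 without a controller. As far as I remember, it is just like a HASS box where you install disks and use them. Maybe you can identify them by running dd against each one, something like:

for disk in c1t8d0 c1t9d0   # ...and the rest of your disks
do
    echo "Look at the array now: is there a light on a disk?"
    # Read from the raw device (under /dev/rdsk on HP-UX) to keep the disk busy.
    # bs and count are assumed values; use whatever runs long enough to watch.
    dd if=/dev/rdsk/$disk of=/dev/null bs=1024k count=500
    echo "If you saw it, put a sticker on that disk marked $disk"
    echo "Press Enter to continue"; read junk
done

If the disk is totally dead, you will find no light for that one disk, so that's the one!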

Regds,

Done
Nothing is impossible

Re: disk issue

The SC10 is just a JBOD array.

LVM is used on top to do RAID. I don't think there is any RAID 5, though.

How did you make out that there was a spare?
thanks
Michael Tully
Honored Contributor

Re: disk issue

From the top of your first message:

SAM:
1/12/0/0.8.0 1 Unused --- 0 SEAGATE

You can check this by using the 'pvdisplay' command, most likely on /dev/dsk/c10t8d1

# pvdisplay -v /dev/dsk/c10t8d1

There are various other ways to check as well, such as taking the output of 'ioscan -fnkC disk' and comparing it to the list of LUNs used in LVM:

# ioscan -fnkC disk
and checking them off against
# vgdisplay -v

This will tell you what is available.

Also, the additional output from running the command I gave you in my reply indicates that the disk is indeed dead.
Anyone for a Mutiny ?
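
The cross-check described in this reply, ioscan device files against the physical volumes LVM knows about, can be sketched as a small filter. This assumes both command outputs were saved to files first, that ioscan prints the /dev/dsk/... device files on the lines under each disk, and that vgdisplay -v lists each disk on a "PV Name" line. unused_disks is a hypothetical helper name:

```sh
# unused_disks: hypothetical helper.
#   $1 = file holding saved 'ioscan -fnkC disk' output
#   $2 = file holding saved 'vgdisplay -v' output
# Prints every /dev/dsk/... device file from ioscan that is not a PV in LVM.
unused_disks() {
    awk '
        # First file (vgdisplay): remember every path listed as a PV Name.
        NR == FNR {
            if ($1 == "PV" && $2 == "Name") used[$3] = 1
            next
        }
        # Second file (ioscan): print block device files LVM does not list.
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\/dev\/dsk\// && !($i in used)) print $i
        }
    ' "$2" "$1"
}
```

For example, save the two outputs to /tmp/ioscan.out and /tmp/vg.out and run 'unused_disks /tmp/ioscan.out /tmp/vg.out'. A disk that shows up may still belong to a volume group that is not activated, so treat the list as candidates only.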

Re: disk issue

Thanks, guys, for your help.

By running dd I was able to find the bad disk in the array.

The question now is how to replace the disk and how to include it in LVM. Is this disk hot-swappable? How can I check whether the old disk was part of any mirroring?

From the vgdisplay output I sent you, it looks like it was part of a mirror.

thanks again

Michael Tully
Honored Contributor

Re: disk issue

You don't need to do anything when you replace the disk of a mirrored pair; LVM will resync it for you. I am not familiar with this model, so I am assuming it can be hot-swapped. The output shows stale extents, and the difference between Cur PV = 10 and Act PV = 9 indicates that this is indeed a mirrored volume.
Anyone for a Mutiny ?
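
The Cur PV versus Act PV comparison can also be done mechanically. A rough sketch, assuming vgdisplay output in the layout posted earlier in this thread arrives on standard input. missing_pvs is an illustrative name:

```sh
# missing_pvs: hypothetical helper. Reads 'vgdisplay' output on stdin and
# prints each volume group that has fewer active PVs than configured PVs.
missing_pvs() {
    awk '
        $1 == "VG"  && $2 == "Name" { vg = $3 }
        $1 == "Cur" && $2 == "PV"   { cur = $3 + 0 }
        $1 == "Act" && $2 == "PV"   {
            act = $3 + 0
            if (act < cur) print vg ": " (cur - act) " PV(s) missing"
        }
    '
}
```

Usage would be 'vgdisplay | missing_pvs'. For the sybdg output in the first post (Cur PV 10, Act PV 9) it prints "/dev/sybdg: 1 PV(s) missing".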
KapilRaj
Honored Contributor

Re: disk issue

You may replace the disk (hot-pluggable, I suppose) and run vgcfgrestore to include it in LVM.

If it was part of a mirror, a vgsync would start the resynchronization.

You can find the lvols with lvdisplay -v; look for stale physical extents!

Regds,

Kaps
Nothing is impossible
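
To list only the logical volumes that need a resync, the LV Status lines are enough. A small sketch, assuming 'vgdisplay -v' output in the format shown in the first post arrives on standard input. stale_lvs is a hypothetical helper:

```sh
# stale_lvs: hypothetical helper. Reads 'vgdisplay -v' output on stdin and
# prints the name of every logical volume whose status contains "stale".
stale_lvs() {
    awk '
        $1 == "LV" && $2 == "Name" { lv = $3 }
        $1 == "LV" && $2 == "Status" && $3 ~ /stale/ { print lv }
    '
}
```

Usage would be 'vgdisplay -v /dev/sybdg 2>/dev/null | stale_lvs'; the warnings about the missing disk go to standard error, so they stay out of the parsed text.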

Re: disk issue

Thanks again for your input, guys.

Isn't there a command that shows which disk is mirrored to which disk?

Can you guys clarify whether I have to run the vgcfgrestore command, or will the resyncing happen automatically? This is a production system and I don't have the luxury of trying it somewhere else.

Sure enough, lvdisplay shows a bunch of stale statuses.

thanks
KapilRaj
Honored Contributor

Re: disk issue

My responses :-

Isn't there a command that shows which disk is mirrored to which disk?

kaps:> lvdisplay -v is the command. Near the top, it shows which disk holds the PE copy for every LE (logical extent). Mirroring is not done disk ==> disk but LV ==> LV.

Can you guys clarify whether I have to run the vgcfgrestore command, or will the resyncing happen automatically? This is a production system and I don't have the luxury of trying it somewhere else.

kaps:> Insert the disk in the slot and then run vgcfgrestore on it. vgcfgrestore will kick in the vgsync command automatically.

Regds,

Kaps
Nothing is impossible

Re: disk issue

thanks guys..

i will try it and let you know how it goes.

thanks once again.

--irfan

Re: disk issue

The disks in the SC10 are hot-swappable, right?
Ted Buis
Honored Contributor

Re: disk issue

The SC10 was also used as a disk enclosure with the FC60, which is a RAID controller pair, so make sure you don't have one of those.
Mom 6

Re: disk issue

The new hard disk is in and can be seen by ioscan.

vgcfgrestore -n /dev/sybdg -l
Volume Group Configuration information in "/etc/lvmconf/sybdg.conf"
VG Name /dev/sybdg
---- Physical volumes : 10 ----
/dev/rdsk/c9t8d0 (Non-bootable)
/dev/rdsk/c9t9d0 (Non-bootable)
/dev/rdsk/c9t10d0 (Non-bootable)
/dev/rdsk/c9t11d0 (Non-bootable)
/dev/rdsk/c9t12d0 (Non-bootable)
/dev/rdsk/c10t8d0 (Non-bootable)
/dev/rdsk/c10t9d0 (Non-bootable)
/dev/rdsk/c10t10d0 (Non-bootable)
/dev/rdsk/c10t11d0 (Non-bootable)
/dev/rdsk/c10t12d0 (Non-bootable)

c10t8d0 was the bad disk.

Just confirming:
the command that I run now is

vgcfgrestore -n /dev/sybdg /dev/rdsk/c10t8d0

and that's it, right?

thanks
--irfan

Re: disk issue

Looks like the system is working fine after running vgcfgrestore, vgchange and vgsync.

thanks for all your help
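
For reference, the sequence reported in this last post can be written down as a dry run. This is only a sketch: replace_plan is a made-up helper that prints the commands instead of running them, and the exact arguments (vgchange -a y to reattach the disk, vgsync on the whole volume group) are assumptions based on the usual usage of these commands, not something the thread spells out:

```sh
# replace_plan: hypothetical dry-run helper. It only prints commands.
#   $1 = volume group name without /dev/ (for example sybdg)
#   $2 = device name of the replaced disk (for example c10t8d0)
replace_plan() {
    if [ "$#" -ne 2 ]; then
        echo "usage: replace_plan <vg> <disk>" >&2
        return 1
    fi
    # 1. Write the saved LVM configuration back onto the new disk.
    echo "vgcfgrestore -n /dev/$1 /dev/rdsk/$2"
    # 2. Re-activate the volume group so LVM attaches the disk again.
    echo "vgchange -a y /dev/$1"
    # 3. Resynchronize the stale mirror copies.
    echo "vgsync /dev/$1"
}
```

'replace_plan sybdg c10t8d0' prints the three commands for the disk replaced here; review them before running anything on a production system.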