
I/O error for lvol

 
Atul Goel
Frequent Advisor

I/O error for lvol

We have an application deployed on an rx2660 Itanium server running HP-UX 11.23. The application uses Serviceguard for cluster management and also runs Oracle RAC.
We have an lvol mounted on external storage (MSA30) which is mirrored onto another disk.
We took out one of the disks for this lvol and then broke the mirroring using "lvreduce -m 0 -k /dev/vg01/ key". Everything was fine until the system was rebooted. After the reboot, lvdisplay showed Mirror copies as 1, "???" for the current physical extents, and the remaining disk partition was shown in the available/stale state.
fsck for the lvol fails with the following error: "vxfs fsck: read of super-block on failed: I/O error".
I have the mirrored copy of this as it is. Is there any way to recover the current disk, preferably without using the mirrored disk?
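For reference, a minimal sketch of the split-mirror sequence being described, assuming the lvol is /dev/vg01/softwarelvol (the name that appears later in this thread); the key value is a placeholder read from the lvdisplay output, not a known value:

# List logical extents together with PV keys; the missing mirror PV shows up
# as "???" and the key column gives the number that lvreduce needs
lvdisplay -v -k /dev/vg01/softwarelvol

# Drop the mirror copy that lived on the missing PV; KEY is the value taken
# from the lvdisplay output above (placeholder, not a confirmed value)
lvreduce -m 0 -k /dev/vg01/softwarelvol $KEY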
7 REPLIES
Atul Goel
Frequent Advisor

Re: I/O error for lvol

Some of the errors in syslog are as follows.
(I don't know whether they relate to the volume group I referred to in my post, but the following volume-group-related errors are in syslog:)
###########################################
Dec 16 20:20:28 lcsnew10 vmunix: LVM: VG 64 0x020000: Lost quorum.
Dec 16 20:20:28 lcsnew10 vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Dec 16 20:20:28 lcsnew10 vmunix: <31 0x013002>
Dec 16 20:20:28 lcsnew10 vmunix: LVM: VG 64 0x020000: PVLink 31 0x013002 Failed! The PV is not accessible.
Dec 16 20:20:28 lcsnew10 EMS [5524]: ------ EMS Event Notification ------ Value: "CRITICAL (5)" for Resource: "/storage/events/disks/default/0_4_1_0.3.0" (Threshold: >= " 3") Execute the following command to obtain event details: /opt/resmon/bin/resdata -R 362020876 -r /storage/events/disks/default/0_4_1_0.3.0 -n 362020865 -a
Dec 16 20:24:47 lcsnew10 vmunix: LVM: WARNING: VG 64 0x020000: LV 4: Some I/O requests to this LV are waiting
Dec 16 20:24:47 lcsnew10 vmunix: indefinitely for an unavailable PV. These requests will be queued until
Dec 16 20:24:47 lcsnew10 vmunix: the PV becomes available (or a timeout is specified for the LV).
#####################################

Atul Goel
Frequent Advisor

Re: I/O error for lvol

Following is the lvdisplay output for the problematic lvol.
The state of /dev/dsk/c1t3d0s2 is shown as stale, while it should be current.
####################################################
LV Name /dev/vg01/softwarelvol
VG Name /dev/vg01
LV Permission read/write
LV Status available/stale
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 19456
Current LE 4864
Allocated PE 9728
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c1t3d0s2 4864 4864

--- Logical extents ---
LE PV1 PE1 Status 1 PV2 PE2 Status 2
00000 ??? 10240 current /dev/dsk/c1t3d0s2 10240 stale
00001 ??? 10241 current /dev/dsk/c1t3d0s2 10241 stale
00002 ??? 10242 current /dev/dsk/c1t3d0s2 10242 stale
00003 ??? 10243 current /dev/dsk/c1t3d0s2 10243 stale
00004 ??? 10244 current /dev/dsk/c1t3d0s2 10244 stale
00005 ??? 10245 current /dev/dsk/c1t3d0s2 10245 stale
00006 ??? 10246 current /dev/dsk/c1t3d0s2 10246 stale
00007 ??? 10247 current /dev/dsk/c1t3d0s2 10247 stale
00008 ??? 10248 current /dev/dsk/c1t3d0s2 10248 stale
00009 ??? 10249 current /dev/dsk/c1t3d0s2 10249 stale
00010 ??? 10250 current /dev/dsk/c1t3d0s2 10250 stale
00011 ??? 10251 current /dev/dsk/c1t3d0s2 10251 stale
00012 ??? 10252 current /dev/dsk/c1t3d0s2 10252 stale
00013 ??? 10253 current /dev/dsk/c1t3d0s2 10253 stale
00014 ??? 10254 current /dev/dsk/c1t3d0s2 10254 stale
00015 ??? 10255 current /dev/dsk/c1t3d0s2 102
####################################################
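As a quick sanity check (a sketch, using the lvol name from the output above), the stale extents can be counted straight from lvdisplay:

# Count the logical extents currently marked stale on the remaining PV
lvdisplay -v /dev/vg01/softwarelvol | grep -c stale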

Re: I/O error for lvol

1) I'm not sure using disk slices on non-root disks is actually supported...

2) I can't see how your lvreduce can have worked... what "key" did you use for the lvreduce? Had you already checked that against lvdisplay -v -k /dev/vg01/softwarelvol?


HTH

Duncan

I am an HPE Employee
Atul Goel
Frequent Advisor

Re: I/O error for lvol

We used the command lvdisplay -v -k to find out the key, and used that number to split the mirroring. We also checked after splitting the mirroring that the split was successful. But after the reboot, this kind of status was shown by lvdisplay.

Re: I/O error for lvol

I note this is part of a Serviceguard/RAC cluster - I'm presuming that this LV *isn't* one of the disks used for RAC itself (i.e. an SLVM shared volume), and is just used on one node at a time for some sort of filesystem, yes?

Does the volume failover to another node in the cluster? Did you actually vgexport/vgimport on the other nodes in the cluster after getting rid of the disk? Is this activation on the *same* node as the lvreduce was run on, or another node?
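For what it's worth, a map-file refresh between nodes usually looks something like the sketch below; the map-file path is made up, and the group-file minor number is copied from the syslog messages earlier in the thread, so treat both as assumptions:

# On the node where the lvreduce/vgreduce was run: write a map file in
# preview mode (-p changes nothing on this node)
vgexport -p -v -s -m /tmp/vg01.map /dev/vg01

# Copy /tmp/vg01.map to the other node, then on that node:
vgexport /dev/vg01                      # remove the stale VG definition
mkdir /dev/vg01
mknod /dev/vg01/group c 64 0x020000     # minor number must match cluster-wide
vgimport -v -s -m /tmp/vg01.map /dev/vg01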

HTH

Duncan

I am an HPE Employee
Atul Goel
Frequent Advisor

Re: I/O error for lvol

Yes, you are right, this is not the disk used by RAC itself. But part of this disk has some Oracle raw partitions.

Yes, volume failover is there, but I am not sure of it. What configuration should I check to confirm it?

No, I did not do a vgexport/vgimport after getting rid of the disk.

I guess I ran lvreduce on one node only.
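On the failover question, a hedged sketch of what one might look at; the package name vg_activate_pkg is taken from the upgrade steps posted below, and whether that package actually owns this VG is an assumption:

# Show cluster, node and package status: which node each package currently
# runs on and which alternate nodes it is configured to fail over to
cmviewcl -v

# Then review the configuration of the package that activates vg01
# (the path below is only a guess at a typical Serviceguard layout)
more /etc/cmcluster/vg_activate_pkg/vg_activate_pkg.conf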
Atul Goel
Frequent Advisor

Re: I/O error for lvol

Following are the steps for our application upgrade:

1. Stop the application
2. Stop Oracle RAC
3. cmhaltcl -v -f (halt the cluster)
4. vgchange -c n -S n -q n /dev/vg_rac (make the RAC volume group non-cluster-aware)
5. vgchange -c n -S n -q n /dev/vg01 (make the vg01 volume group non-cluster-aware)
6. Take out the external disk only
7. vgchange -a y -q n /dev/vg_rac (activate the RAC volume group)
8. vgchange -a y -q n /dev/vg01 (activate the vg01 volume group)
9. Use the "lvdisplay -v -k /dev//" command on all logical volumes to find the key number of the stale partition
10. Use "lvreduce -m 0 -k dev//" to split the mirroring
11. After step 10 is completed for all vg01 and vg_rac partitions, use "vgreduce -f" to modify the volume group (see the verification sketch after this list)
12. Copy the /etc/lvmtab file to some other place
13. Perform vgscan -v
14. Deactivate vg01 and vg_rac using the vgchange -a n command
15. Perform steps 4, 5, 7, 8, 11, 12, 13, 14 on the secondary DB machine
16. cmruncl -v
17. vgchange -c y -S y /dev/vg_rac
18. vgchange -a s /dev/vg_rac
19. vgchange -c y -S y /dev/vg01
20. vgchange -a e /dev/vg01
21. cmmodpkg -e vg_activate_pkg
22. cmmodpkg -e nfsPkg
23. cmmodpkg -e vg_activate_pkg_remote
24. Start Oracle RAC
25. Take out the primary vg00 disk from both servers
26. Perform the upgrade
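As referenced in step 11, a small verification sketch (VG names taken from the steps above) that could be run between the vgreduce and the cluster restart, to confirm the removed PV is really gone and to capture the new configuration:

# Confirm that only the remaining PV is listed for each volume group
vgdisplay -v /dev/vg01
vgdisplay -v /dev/vg_rac

# Save the updated LVM configuration explicitly after the change
vgcfgbackup /dev/vg01
vgcfgbackup /dev/vg_rac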

Can anybody tell me whether anything is wrong in these steps, or whether they need any update?