Operating System - HP-UX

system hang when more than half of vg00's disks are offline?

 
SOLVED
vaughan_1
Advisor

system hang when more than half of vg00's disks are offline?

My vg00 consists of 5 PVs: two are 35 GB local disks, and the other three are disks from the SAN. Mirror copies of the logical volumes on the two local disks reside on the three SAN disks. Quorum for vg00 has been turned off with 'vgchange -a y -q n /dev/vg00', and the mirror status of all volumes is syncd. But when I pull the fibre out and the 3 SAN disks go offline, the system hangs (for example, an 'ls' command never returns).
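For reference, here is roughly how vg00 was activated and how I verified the mirrors before the pull test (a sketch; device paths are from my box):

# vgchange -a y -q n /dev/vg00
# vgdisplay -v /dev/vg00 | grep "LV Status"

The second command showed available/syncd for every lvol.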

I found this statement in the HP-UX System Administrator's Guide: Logical Volume Management:
"For the volume group to remain fully operational, at least half the disks must remain present and available."

My questions are:
1. Is a system hang normal when more than half of the disks are absent? I would have expected a more moderate way of warning me that quorum was lost, such as sending mail to root...
2. Can this behavior be turned off?
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

Check whether all LVOLs are really mirrored. Sometimes people think an unmirrored swap is enough ... which is wrong.

# lvdisplay -v /dev/vg00/lvol...

and take a look at the mirror copies count.
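To check all of them in one go, a quick loop like this should do (untested sketch):

# for lv in /dev/vg00/lvol*; do echo $lv; lvdisplay $lv | grep "Mirror copies"; done

Every lvol should report "Mirror copies" of 1 or more.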

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

All lvols of vg00 are mirrored, including lvol2 (swap).
Each lvol of vg00 has one mirror copy on the SAN disks.
There are 2 local disks and 3 SAN disks.
I pulled out both local disks (the primary copies of the lvols); the system switched to the mirror copies on the SAN disks and responded normally within 60 s.
But in contrast, when I pull out the fibre (all 3 SAN disks offline), the system hangs forever ... unless I plug the fibre in again.
VK2COT
Honored Contributor
Solution

Re: system hang when more than half of vg00's disks are offline?

Hello,

There are two types of quorum:

Activation quorum
Running quorum

a) Activation quorum applies when the VG is activated, and requires that at least 50% of the disks that were in the VG at the end of its last activation be present.

The important part is that it is based on how many PVs were left in the VG at the end of the last activation. If you have 5 PVs in the VG, then at least 3 of them must be present to reactivate it.

Activation quorum can be overridden from the command line (the "-q n" flag of vgchange(1M)).

b) Running quorum defines what happens when a PV fails in an activated VG, and requires that 50% or more of the PVs in the VG remain available at all times.

You cannot override running quorum. The "-q n" option of vgchange(1M) applies only to activation quorum.

You cannot drop below 50% of the active PVs in a VG in one step - that is a running quorum failure and cannot be overridden!

Make sure the VG never loses more than 50% of its PVs in a single failure.
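In command terms, the only quorum you can relax is the activation one, as in the original post:

# vgchange -a y -q n /dev/vg00

There is no corresponding option for running quorum; once the VG is active, at least half of its PVs must stay available.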

Cheers,

VK2COT
VK2COT - Dusan Baljevic
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

The system should continue to run.


I assume you have multiple paths to the SAN disks. Are all of them configured? If there are multiple paths, did you pull all the cables? Are there other VGs on SAN only? Maybe they are hanging the system.

Consider posting some more configuration details (ioscan -fn, strings /etc/lvmtab, vgdisplay -v).

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

I understand. I should try my best to avoid such a terrible single point of failure then...
Thank you, guys, especially VK2COT for the awesome explanation of quorum.
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

I have never heard about this "running quorum". In a typical configuration, if you have 2 mirrored disks and lose one of them, you have exactly 50%, but not *more* than 50%, hence a quorum problem! Same as the 2-disk vs. 3-disk situation.

Will a system hang if you lose 1 out of 2 disks? No, it should continue to run!

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

to Torsten,
Yes! The system is running, but abnormally. Any command like 'ls' does not exit, and an ssh login hangs after I enter the password. Though it returns to normal once the fibre is plugged in again, that abnormal state means the system is totally unavailable for me.
My HBA is an A6795A, a single-port FC card, with only those 3 disks on the SAN. No multipath software is installed.
vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

to Torsten,
I lost all 3 SAN disks while the 2 local disks were still connected to the system. Is that the same as the 2-disk vs. 3-disk situation you mentioned?
But I am still as puzzled as you are by the unbelievable state the system got into, which is much more than a warning to the administrator.
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

Let's take a closer look. Please post (with the cables connected):

# lvlnboot -v

# vgdisplay -v

# strings /etc/lvmtab

# ioscan -fn

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

The 3-SAN-disk environment is not at my company any more. Here I created a simpler one with one local disk and 2 SAN disks. It behaves the same way when the cable is pulled out (I just tried it).

bash-3.2# ls
.ICEauthority 2 dev inq.hpux64 stand
.TTauthority SD_CDROM etc lib t.sh
.Xauthority a feedback.tar loop.pl tmp
.bash_history banner getinfo lost+found tmp_mnt
.dt bin getinfo.txt mapfile usr
.dtprofile cma_dump.log head mnt var
.gpmhp-hpa500 collect home net vaughan.bak
.profile collect.tar infile opt vgwrite1.map
.sh_history core info1.txt rhead vgwrite1.out
.sw cp.txt inq.hpux1100 sbin
bash-3.2# ls
asdfadsf
sdfadsfdsaf

(the system hung after I pulled the cable out)

bash-3.2# asdfadsf
bash: asdfadsf: command not found
bash-3.2# sdfadsfdsaf
bash: sdfadsfdsaf: command not found
bash-3.2# sdf

(the system returned to normal after I plugged the cable back in)


bash-3.2# lvlnboot -v
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
/dev/dsk/c3t15d0 (0/0/2/1.15.0) -- Boot Disk
/dev/dsk/c40t0d0 (0/2/0/0.1.12.255.0.0.0) -- Boot Disk
/dev/dsk/c40t0d1 (0/2/0/0.1.12.255.0.0.1)
Boot: lvol1 on: /dev/dsk/c3t15d0
/dev/dsk/c40t0d0
Root: lvol3 on: /dev/dsk/c3t15d0
/dev/dsk/c40t0d0
Swap: lvol2 on: /dev/dsk/c3t15d0
/dev/dsk/c40t0d0
Dump: lvol2 on: /dev/dsk/c3t15d0, 0

--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 8
Open LV 10
Max PV 16
Cur PV 3
Act PV 3
Max PE per PV 4384
VGDA 6
PE Size (Mbytes) 8
Total PE 9876
Alloc PE 4164
Free PE 5712
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 112
Current LE 14
Allocated PE 28
Used PV 2

LV Name /dev/vg00/lvol2
LV Status available/syncd
LV Size (Mbytes) 256
Current LE 32
Allocated PE 64
Used PV 2

LV Name /dev/vg00/lvol3
LV Status available/syncd
LV Size (Mbytes) 144
Current LE 18
Allocated PE 36

LV Name /dev/vg00/lvol4
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol5
LV Status available/syncd
LV Size (Mbytes) 24
Current LE 3
Allocated PE 6
Used PV 2

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 11000
Current LE 1375
Allocated PE 2750
Used PV 2

LV Name /dev/vg00/lvol9dup
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol8
LV Status available/syncd
LV Size (Mbytes) 1024
Current LE 128
Allocated PE 256
Used PV 2

LV Name /dev/vg00/lvol7new
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol9
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Used PV 2


--- Physical volumes ---
PV Name /dev/dsk/c3t15d0
PV Status available
Total PE 4374
Free PE 2292
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c40t0d0
PV Status available
Total PE 3583
Free PE 2876
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c40t0d1
PV Status available
Total PE 1919
Free PE 544
Autoswitch On
Proactive Polling On

bash-3.2# strings /etc/lvmtab
/dev/vg00
/dev/dsk/c3t15d0
/dev/dsk/c40t0d0
/dev/dsk/c40t0d1

bash-3.2# ioscan -fn
class I H/W Path Driver S/W State H/W Type Description
============================================================================
root 0 root CLAIMED BUS_NEXUS
ioa 0 0 sba CLAIMED BUS_NEXUS System Bus Adapter (582)
ba 0 0/0 lba CLAIMED BUS_NEXUS Local PCI Bus Adapter (782)
lan 0 0/0/0/0 btlan CLAIMED INTERFACE HP PCI 10/100Base-TX Core
/dev/diag/lan0 /dev/ether0 /dev/lan0
ext_bus 0 0/0/1/0 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide LVD
target 0 0/0/1/0.7 tgt CLAIMED DEVICE
ctl 0 0/0/1/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c0t7d0
ext_bus 1 0/0/1/1 c720 CLAIMED INTERFACE SCSI C896 Ultra Wide Single-Ended
target 1 0/0/1/1.7 tgt CLAIMED DEVICE
ctl 1 0/0/1/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c1t7d0
ext_bus 2 0/0/2/0 c720 CLAIMED INTERFACE SCSI C87x Fast Wide Single-Ended
target 2 0/0/2/0.7 tgt CLAIMED DEVICE
ctl 2 0/0/2/0.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c2t7d0
ext_bus 3 0/0/2/1 c720 CLAIMED INTERFACE SCSI C87x Ultra Wide Single-Ended
target 3 0/0/2/1.7 tgt CLAIMED DEVICE
ctl 3 0/0/2/1.7.0 sctl CLAIMED DEVICE Initiator
/dev/rscsi/c3t7d0
target 4 0/0/2/1.15 tgt CLAIMED DEVICE
disk 1 0/0/2/1.15.0 sdisk CLAIMED DEVICE SEAGATE ST336706LC
/dev/dsk/c3t15d0 /dev/rdsk/c3t15d0
tty 0 0/0/4/0 asio0 CLAIMED INTERFACE PCI Serial (103c1048)
/dev/GSPdiag1 /dev/mux0 /dev/tty0p1
/dev/diag/mux0 /dev/tty0p0 /dev/tty0p2
tty 1 0/0/5/0 asio0 CLAIMED INTERFACE PCI Serial (103c1048)
/dev/GSPdiag2 /dev/mux1
/dev/diag/mux1 /dev/tty1p1
ba 1 0/2 lba CLAIMED BUS_NEXUS Local PCI Bus Adapter (782)
fc 0 0/2/0/0 td CLAIMED INTERFACE HP Tachyon XL2 Fibre Channel Mass Storage Adapter
/dev/td0
fcp 0 0/2/0/0.1 fcp CLAIMED INTERFACE FCP Domain
ext_bus 40 0/2/0/0.1.12.255.0 fcpdev CLAIMED INTERFACE FCP Device Interface
target 5 0/2/0/0.1.12.255.0.0 tgt CLAIMED DEVICE
disk 59 0/2/0/0.1.12.255.0.0.0 sdisk CLAIMED DEVICE ODYSYS UWS_DISK
/dev/dsk/c40t0d0 /dev/rdsk/c40t0d0
disk 60 0/2/0/0.1.12.255.0.0.1 sdisk CLAIMED DEVICE ODYSYS UWS_DISK
/dev/dsk/c40t0d1 /dev/rdsk/c40t0d1
ba 2 0/4 lba CLAIMED BUS_NEXUS Local PCI Bus Adapter (782)
ba 3 0/6 lba CLAIMED BUS_NEXUS Local PCI Bus Adapter (782)
memory 0 8 memory CLAIMED MEMORY Memory
processor 0 160 processor CLAIMED PROCESSOR Processor
iscsi 0 255/0 iscsi CLAIMED VIRTBUS iSCSI Virtual Node

bash-3.2# bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 147456 78729 64464 55% /
/dev/vg00/lvol1 111637 34689 65784 35% /stand
/dev/vg00/lvol8 1048576 920831 120115 88% /var
/dev/vg00/lvol7new 2097152 1632643 435537 79% /usr
/dev/vg00/lvol4 2097152 205733 1773802 10% /tmp
/dev/vg00/lvol6 11264000 8861454 2333328 79% /opt
/dev/vg00/lvol5 24576 9110 14561 38% /home

Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

Same number of used PEs on the local disk and on the SAN.

BTW, what kind of array is it?


To check whether all mirrors are really on both, local and remote, consider getting a

# lvdisplay -v /dev/vg00/lvol1 | head -n 20

from all lvols.

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

Yes, the same number of used PEs: the primary data copy resides on /dev/dsk/c3t15d0 (local), and one mirror copy on /dev/dsk/c40t0d0 (SAN disk 1) and /dev/dsk/c40t0d1 (SAN disk 2) (not striped).
The SAN disks are provided by an FC target server developed by my company.

--- Logical volumes ---
LV Name /dev/vg00/lvol1
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 112
Current LE 14
Allocated PE 28
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 14 14
/dev/dsk/c40t0d0 14 14

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol2
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery NONE
Schedule parallel
LV Size (Mbytes) 256
Current LE 32
Allocated PE 64
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 32 32
/dev/dsk/c40t0d0 32 32

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol3
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 144
Current LE 18
Allocated PE 36
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 18 18
/dev/dsk/c40t0d0 18 18

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol4
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 256 256
/dev/dsk/c40t0d0 256 256

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol5
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 24
Current LE 3
Allocated PE 6
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 3 3
/dev/dsk/c40t0d0 3 3

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol6
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 11000
Current LE 1375
Allocated PE 2750
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 1375 1375
/dev/dsk/c40t0d1 1375 1375

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol7new
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 256 256
/dev/dsk/c40t0d0 256 256

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol8
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 1024
Current LE 128
Allocated PE 256
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 128 128
/dev/dsk/c40t0d0 128 128

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol9
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 256 256
/dev/dsk/c40t0d0 256 256

--- Logical extents ---
--- Logical volumes ---
LV Name /dev/vg00/lvol9dup
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Stripes 0
Stripe Size (Kbytes) 0
Bad block on
Allocation strict/contiguous
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 256 256
/dev/dsk/c40t0d0 256 256

--- Logical extents ---

bash-3.2# ls -l /dev/vg00/* |grep '^b'
brw-r----- 1 root root 64 0x000001 Sep 10 21:48 /dev/vg00/lvol1
brw-rw-rw- 1 root root 64 0x000002 Jun 11 19:45 /dev/vg00/lvol2
brw-r----- 1 root root 64 0x000003 Jun 11 15:49 /dev/vg00/lvol3
brw-r----- 1 root root 64 0x000004 Jun 11 15:49 /dev/vg00/lvol4
brw-r----- 1 root root 64 0x000005 Jun 11 15:49 /dev/vg00/lvol5
brw-r----- 1 root root 64 0x000006 Jun 11 15:49 /dev/vg00/lvol6
brw-r----- 1 root sys 64 0x000009 Aug 21 17:23 /dev/vg00/lvol7new
brw-r----- 1 root root 64 0x000008 Jun 11 15:49 /dev/vg00/lvol8
brw-r----- 1 root root 64 0x000009 Aug 5 18:40 /dev/vg00/lvol9
brw-rw-rw- 1 root sys 64 0x000009 Sep 1 14:58 /dev/vg00/lvol9dup
VK2COT
Honored Contributor

Re: system hang when more than half of vg00's disks are offline?

Hello,

May I offer some clarification of LVM running quorum, as it seems to puzzle some.

The VG and its LVs should still be available and responding even when running quorum is below 50% of the disks in the VG.

The issue is that if fewer than 50% of the disks in the volume group respond to the quorum check, no LVM configuration change can proceed!
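For example, with running quorum lost, a configuration change such as the following should be refused (a hypothetical test; lvextend(1M) sizes are in MB):

# lvextend -L 2560 /dev/vg00/lvol9

while plain reads and writes to LVs that still have an available copy should go through.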

So your ls(1) command should not be hanging. You have some other issue, and we need to verify what it is.

Cheers,

VK2COT
VK2COT - Dusan Baljevic
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

I'm with you, VK2COT.

The configuration looks good to me so far.

Some minor changes I would make: set all Consistency Recovery to MWC, and always disable bad block relocation. But IMHO this is not related to the problem. Next I would check for LVM patches. Can you post a

# swlist

BTW, did you ever try to boot from this SAN device (without the local disks)?

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

"BTW, did you ever try to boot from this SAN device (without the local disks)?"
Yes! Many times, all OK.

Now I have switched my SAN disks to ones from a Mylex FC array, with mirroring onto them, and will check whether the system hang happens this time too.

Just now I added the 2 SAN disks into vg00 without mirror copies on them, and then pulled the cable out. The system behaved normally, with only a warning indicating that quorum was lost. So I suppose it is something related to mirroring, not running quorum, that causes the system hang.
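Roughly what I did for that last test (a sketch; device paths from my config):

# vgextend /dev/vg00 /dev/dsk/c40t0d0 /dev/dsk/c40t0d1

with no mirror copies placed on those PVs, and then pulled the cable.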

bash-3.2# swlist
# Initializing...
# Contacting target "hpa500"...
#
# Target: hpa500:/
#

#
# Bundle(s):
#

100BaseT-01 B.11.11.01 HP-PB 100BaseT;Supptd HW=A3495A;SW=J2759BA
ATM-00 K.11.11 PCI ATM;Supptd HW=A5483A/A5513A/A5515A/J3557A;SW=J3572AA/J3572BA
ATM-01 K.11.11 HSC ATM;Supptd HW=J2468A/J2469A/J2499A/J3420B/J3573A;SW=J2806CA
B5725AA B.3.0.502 HP-UX Installation Utilities (Ignite-UX)
B9788AA 1.3.1.01.00release Java 2 SDK 1.3 for HP-UX (700/800), PA1.1 + PA2.0 Add On
BUNDLE B.2009.08.31 Patch Bundle
CDE-English B.11.11 English CDE Environment
FDDI-00 B.11.11.01 PCI FDDI;Supptd HW=A3739A/A3739B;SW=J3626AA
FibrChanl-00 B.11.11.17 PCI/HSC FibreChannel;Supptd HW=A6684A,A6685A,A5158A,A6795A
GOLDAPPS11i B.11.11.0406.5 Gold Applications Patches for HP-UX 11i v1, June 2004
GOLDBASE11i B.11.11.0406.5 Gold Base Patches for HP-UX 11i v1, June 2004
GigEther-00 B.11.11.14 PCI/HSC GigEther;Supptd HW=A4926A/A4929A/A4924A/A4925A;SW=J1642AA
HPUX11i-OE-MC B.11.11 HP-UX Mission Critical Operating Environment Component
HPUXBase64 B.11.11 HP-UX 64-bit Base OS
HPUXBaseAux B.11.11 HP-UX Base OS Auxiliary
HyprFabrc-00 B.11.11.00 PCI/HSC HyperFabric; Supptd HW=A6092A/A4921A/A4920A/A4919A;SW=B6257AA
Ignite-UX-10-20 B.3.0.502 HP-UX Installation Utilities for Installing 10.20 Systems
Ignite-UX-11-00 B.3.0.502 HP-UX Installation Utilities for Installing 11.00 Systems
Ignite-UX-11-11 B.3.0.502 HP-UX Installation Utilities for Installing 11.11 Systems
J4258BA B.04.11 Netscape Directory Server v4 for HP-UX
J4274AA B.01.02.06 HP WebQoS Peak Packaged Edition
OnlineDiag B.11.11.00.04 HPUX 11.11 Support Tools Bundle
RAID-00 B.11.11.00 PCI RAID; Supptd HW=A5856A
SLP B.11.11.1.0.1.2 Service Location Protocol components
TermIO-00 B.11.11.01 PCI MUX; Supptd HW=J3592A/J35923A; SW=J3596A
iSCSI-00 B.11.11.03e HP-UX iSCSI Software Initiator
perl D.5.8.8.D Perl Programming Language
#
# Product(s) not contained in a Bundle:
#

LPFC B.04.21.05 Light Pulse Adapter Driver
PHCO_31314 1.0 cumulative SAM patch
PHCO_32181 1.0 ugm cumulative patch
PHCO_33215 1.0 libpam_unix cumulative patch
PHCO_33288 1.0 Device IDs, mount(1M) cumulative patch
PHCO_33533 1.0 libc cumulative patch
PHKL_24554 1.0 vPar enablement patch
PHKL_30398 1.0 KI FSS ID and KI_rfscall
PHKL_32002 1.0 physio thread performance degradation
PHKL_32005 1.0 thread suspend, DaS, panic, physio
PHKL_32668 1.0 vPar enablement,DLKM load panic
PHKL_33258 1.0 VxFS cumulative patch ;ml_flag race
PHKL_33270 1.0 Cumulative VM patch
PHKL_33363 1.0 vPars panic;Syscall cumulative;FSS;msem_lock
PHNE_32477 1.0 ONC/NFS General Release/Performance Patch
PHSS_30726 1.0 rp24xx 43.50 PDC Firmware Patch
PHSS_30966 1.0 ld(1) and linker tools cumulative patch
bash 3.2 bash
gettext 0.17 gettext
libiconv 1.12 libiconv
make 3.81 make
openssl 0.9.8k openssl
termcap 1.3.1 termcap
wget 1.11.4 wget
vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

to Torsten,
Setting the Consistency Recovery of lvol2 (used as primary swap) to NONE is recommended by the "HP-UX System Administrator's Guide: Logical Volume Management", and this setting only affects the boot process. So I don't think it matters.
Could you tell me how to disable bad block relocation, please? I forgot how...
vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

Test done. The same system hang happened.
One local disk and two SAN disks from the FC array, through an FC switch. All lvols in vg00 are mirrored, with the primary copy on the local disk and the mirror copy on the SAN disks. Consistency Recovery is MWC everywhere except lvol2 (NONE). Bad block relocation is turned off everywhere.
The system hung as usual after the array's cable was pulled from the FC switch, and responded normally after the cable was plugged back in.
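(The settings were applied with lvchange(1M), roughly as below; lvolN stands for each logical volume - check the man page for the exact flags on your release:)

# lvchange -M y /dev/vg00/lvolN
(sets Consistency Recovery to MWC)
# lvchange -M n -c n /dev/vg00/lvol2
(sets Consistency Recovery to NONE, for swap)
# lvchange -r n /dev/vg00/lvolN
(turns bad block relocation off)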

Part of the dmesg output is below:
Sep 18 17:03:41 hpa500 vmunix: DIAGNOSTIC SYSTEM WARNING:
Sep 18 17:03:41 hpa500 vmunix: The diagnostic logging facility is no longer receiving excessive
Sep 18 17:03:41 hpa500 vmunix: errors from the I/O subsystem. 82 I/O error entries were lost.
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: Lost quorum.
Sep 18 17:08:48 hpa500 vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Sep 18 17:08:48 hpa500 vmunix: <31 0x280100> <31 0x280200>
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280100 Failed! The PV is not accessible.
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280200 Failed! The PV is not accessible.
Sep 18 17:08:48 hpa500 vmunix:
Sep 18 17:08:48 hpa500 vmunix: SCSI: Read error -- dev: b 31 0x280100, errno: 126, resid: 1024,
Sep 18 17:08:48 hpa500 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 1024.
Sep 18 17:08:48 hpa500 vmunix: DIAGNOSTIC SYSTEM WARNING:
Sep 18 17:08:48 hpa500 vmunix: The diagnostic logging facility has started receiving excessive
Sep 18 17:08:48 hpa500 vmunix: errors from the I/O subsystem. I/O error entries will be lost
Sep 18 17:08:48 hpa500 vmunix: until the cause of the excessive I/O logging is corrected.
Sep 18 17:08:48 hpa500 vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Sep 18 17:08:48 hpa500 vmunix: in stm to start it.
Sep 18 17:08:48 hpa500 vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Sep 18 17:08:48 hpa500 vmunix: to determine which I/O subsystem is logging excessive errors.
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: Reestablished quorum.
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280100 Recovered.
Sep 18 17:08:48 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280200 Recovered.

-----------
Is there any special patch needed on my system?
vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

Is it the logging facility that causes the hang?
Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

The last quality patch bundle is from 2004, but there is an individual bundle (BUNDLE B.2009.08.31 Patch Bundle) - not sure what it is.

I would install the latest patch bundles and Online Diags now.
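Once you have downloaded a patch bundle depot, installation is roughly like this (the depot path here is just an example):

# swinstall -x autoreboot=true -s /tmp/depot GOLDBASE11i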

Hope this helps!
Regards
Torsten.

Torsten.
Acclaimed Contributor

Re: system hang when more than half of vg00's disks are offline?

Patch bundles:

http://www13.itrc.hp.com/service/patch/releasePage.do?BC=main|releaseIndexPage|&releaseId=0906-11.11

Diags:

https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=B6191AAE

Hope this helps!
Regards
Torsten.

vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

Thank you for the links! I will continue this thread next week. Have a good weekend!
vaughan_1
Advisor

Re: system hang when more than half of vg00's disks are offline?

I haven't installed those patches yet, but there is a strange phenomenon to report: when the VG loses its (running) quorum (more than half of its PVs lost), writes to an lvol hang. I am quite sure this is the cause of the system hang, because log messages such as "The PV is not accessible" have to be written to the /var lvol; that hang causes further hangs and finally makes the whole system unresponsive.

I post my test below.
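To spot stale mirror copies quickly, something like this works (a sketch; note that /dev/vg00/lvol7 shows available/stale in the full output below):

# vgdisplay -v /dev/vg00 | grep -e "LV Name" -e "LV Status"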

--- Volume groups ---
VG Name /dev/vg00
VG Write Access read/write
VG Status available
Max LV 255
Cur LV 9
Open LV 11
Max PV 16
Cur PV 5
Act PV 5
Max PE per PV 4384
VGDA 10
PE Size (Mbytes) 8
Total PE 15828
Alloc PE 2594
Free PE 13234
Total PVG 0
Total Spare PVs 0
Total Spare PVs in use 0

--- Logical volumes ---
LV Name /dev/vg00/lvol1
LV Status available/syncd
LV Size (Mbytes) 112
Current LE 14
Allocated PE 14
Used PV 1

LV Name /dev/vg00/lvol2
LV Status available/syncd
LV Size (Mbytes) 256
Current LE 32
Allocated PE 32
Used PV 1

LV Name /dev/vg00/lvol3
LV Status available/syncd
LV Size (Mbytes) 144
Current LE 18
Allocated PE 18
Used PV 1

LV Name /dev/vg00/lvol4
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 256
Used PV 1

LV Name /dev/vg00/lvol5
LV Status available/syncd
LV Size (Mbytes) 24
Current LE 3
Allocated PE 3
Used PV 1

LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 11000
Current LE 1375
Allocated PE 1375
Used PV 1

LV Name /dev/vg00/lvol9dup
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 256
Used PV 1

LV Name /dev/vg00/lvol8
LV Status available/syncd
LV Size (Mbytes) 1024
Current LE 128
Allocated PE 128
Used PV 1

LV Name /dev/vg00/lvol7new
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 256
Used PV 1

LV Name /dev/vg00/lvol7
LV Status available/stale
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Used PV 2

LV Name /dev/vg00/lvol9
LV Status available/syncd
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 256
Used PV 1


--- Physical volumes ---
PV Name /dev/dsk/c3t15d0
PV Status available
Total PE 4374
Free PE 2036
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c1t15d0
PV Status available
Total PE 4384
Free PE 4384
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c40t0d1
PV Status unavailable
Total PE 4384
Free PE 4128
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c40t0d0
PV Status unavailable
Total PE 127
Free PE 127
Autoswitch On
Proactive Polling On

PV Name /dev/dsk/c40t0d2
PV Status unavailable
Total PE 2559
Free PE 2559
Autoswitch On
Proactive Polling On

----------------------------
bash-3.2# bdf
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 147456 79008 64202 55% /
/dev/vg00/lvol1 111637 34689 65784 35% /stand
/dev/vg00/lvol8 1048576 924558 116612 89% /var
/dev/vg00/lvol7new 2097152 1632648 435532 79% /usr
/dev/vg00/lvol4 2097152 205735 1773800 10% /tmp
/dev/vg00/lvol6 11264000 9616120 1602240 86% /opt
/dev/vg00/lvol5 24576 9110 14561 38% /home
/dev/vg00/lvol7 2097152 26190 1941534 1% /mnt/1

-------------------------------------
bash-3.2# lvdisplay -v /dev/vg00/lvol7 |more
--- Logical volumes ---
LV Name /dev/vg00/lvol7
VG Name /dev/vg00
LV Permission read/write
LV Status available/syncd
Mirror copies 1
Consistency Recovery MWC
Schedule parallel
LV Size (Mbytes) 2048
Current LE 256
Allocated PE 512
Stripes 0
Stripe Size (Kbytes) 0
Bad block off
Allocation strict
IO Timeout (Seconds) default

--- Distribution of logical volume ---
PV Name LE on PV PE on PV
/dev/dsk/c3t15d0 256 256
/dev/dsk/c40t0d1 256 256

--- Logical extents ---

The /dev/dsk/c40t0dN devices are the SAN disks.
Only /dev/vg00/lvol7 is mirrored.
I pulled the cable out; syslog.log shows the following:

Sep 22 17:55:02 hpa500 vmunix: LVM: VG 64 0x000000: Lost quorum.
Sep 22 17:55:02 hpa500 vmunix: This may block configuration changes and I/Os. In order to reestablish quorum at least 1 of the following PVs (represented by current link) must become available:
Sep 22 17:55:02 hpa500 vmunix: <31 0x280100> <31 0x280200> <31 0x280000>
Sep 22 17:55:02 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280100 Failed! The PV is not accessible.
Sep 22 17:55:02 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280200 Failed! The PV is not accessible.
Sep 22 17:55:02 hpa500 vmunix: LVM: VG 64 0x000000: PVLink 31 0x280000 Failed! The PV is not accessible.

And everything was fine.
Then I executed:

bash-3.2# dd if=/dev/zero of=/mnt/1/3.8M_5pv bs=1024k count=8

This dd hung. 'bdf' hung. 'tail /var/adm/syslog/syslog.log' was fine.
After I plugged the cable back in, everything returned to normal.
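(Side note: if a mirror copy stays stale after the links recover, it can be resynchronized by hand with something like the following; in my logs, LVM recovered the PV links by itself:)

# vgsync /dev/vg00
or, for a single LV:
# lvsync /dev/vg00/lvol7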