Filesystem hang after removing disk

 
meno
Advisor

Hello,

I have an HP-UX 11.23 (IA64) box with an A7143A SCSI adapter and disks in an MSA30. I tried to simulate a disk failure. I run an Oracle database with the redo logs on one disk and the mirrored redo logs on a second disk. When I remove the second disk from the MSA30, the filesystem holding the mirrored redo logs hangs, and the Oracle database hangs too. This is not what I want: I have mirrored redo logs, yet the database hangs until ANY disk (the original or a new one) is inserted. Would a real disk failure play out the same way? Would the filesystem hang then too?
I haven't had a real disk failure yet, so I don't know whether the database would hang.
Thanks.

Marian

Re: Filesystem hang after removing disk

Marian,

What do you mean by mirror?

Do you mean you are using disk mirroring like Mirrordisk/UX?

Or do you mean you're creating multiple log group members for Oracle redo logs?

HTH

Duncan

I am an HPE Employee
F Verschuren
Esteemed Contributor

Re: Filesystem hang after removing disk

It all depends on the setup.
More details on the MSA30: http://h18004.www1.hp.com/products/quickspecs/11738_na/11738_na.html
It depends on whether you use a disk controller or RAID software (e.g. HP StorageWorks) with the MSA30.
If you connected the MSA30 directly to your system and take out one disk, the mirror will make sure you can keep working; after the second disk, it depends very much on which disk it is.
If you have a RAID controller and you pull out the second disk during the reconfiguration, I do not know what will happen, but I bet an MSA storage box will not like it.

Please post how you use the MSA30, how you have configured it, and which disks you pulled out; then good advice can be given.
Steven E. Protter
Exalted Contributor

Re: Filesystem hang after removing disk

Shalom Marian,

Clearly what you thought was your mirror layout was not what it was in fact.

lvdisplay -v on the logical volumes involved should have been consulted before the test.

To provide you a good answer, it would be necessary to know what disks are in what logical volumes and what the layout is.

If you have five disks in a RAID 5 layout you should be able to survive the loss of a single disk.

However, if you only have the minimum of 3 disks in a RAID 5 layout (we have an MSA-30 here) then you may not survive the loss of a single disk.

Post complete lvdisplay -v layouts on all involved disks and someone here may be able to predict the performance in a disk loss scenario. I suspect right now your raid layout provides no protection at all.
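
For anyone gathering that information, here is a minimal sketch of how the layouts could be collected. The LV device paths are assumptions built from the volume group and LV names Marian gives later in the thread; substitute your own. By default the script only prints the commands it would run. Setting DRY_RUN=0 on the HP-UX host (as root) runs them for real.

```shell
#!/bin/sh
# Hypothetical LV device paths; replace with the volumes under test.
LVS="/dev/vg03/redoA /dev/vg04/redoB"

# Print (or, with DRY_RUN=0, execute) "lvdisplay -v" for every LV in $LVS.
show_layout() {
    for lv in $LVS; do
        if [ "${DRY_RUN:-1}" = "0" ]; then
            lvdisplay -v "$lv"
        else
            echo "lvdisplay -v $lv"
        fi
    done
}

show_layout
```

In the real output, the "Mirror copies" field and the list of physical volumes under each LV show whether the volume has any redundancy at the LVM level.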

SEP
Steven E Protter
skt_skt
Honored Contributor

Re: Filesystem hang after removing disk

Have a look at this too, FYI.

Abstract

The results of performing an immediate path switch using 'pvchange -s' while autoswitchback is enabled can seem unpredictable. When manually switching paths, autoswitchback should be disabled.

Background

LVM provides multipathing support for multiported devices. LVM's multipath support uses only one active path where the other path(s) are on standby. When the current path becomes unavailable, LVM will switch to the first available standby path. The paths are selected in the order they were configured.
The pvchange command provides several options for managing LVM behavior with paths:
- The -s (lower case s) option (immediate switch) requests that the specified path be used for all future I/O.
- The -S (upper case s) option (autoswitchback), when set (default value), tells LVM to automatically switch to the best available path. When autoswitchback is disabled, LVM will still switch paths when the current path becomes unavailable, but it will not switch back to a higher priority path that returns. The pvdisplay command will show the current autoswitchback setting.

Make note that the best available path is determined based on the order the paths were configured into the volume group (using vgcreate, vgextend, vgimport or vgscan). The order can be determined by using vgdisplay or pvdisplay and changed using vgreduce and vgextend.

When performing an immediate link switch (using pvchange -s), LVM's behavior will vary depending upon the autoswitchback (-S) setting.
- When autoswitchback (-S) is disabled, LVM will switch paths as directed by pvchange -s (immediate switch) and continue to use this path until it becomes unavailable or another manual switch is performed.
- When autoswitchback (-S) is enabled, LVM will switch to the selected path as directed by pvchange -s. However, to comply with the autoswitchback setting, LVM will then switch back to the best available path.

Recent Improvements

With patch PHKL_34518(11.11)/ PHKL_34094(11.23), LVM improved its monitoring of path health. LVM now proactively monitors the health of all configured paths. If autoswitchback is enabled, LVM is far more responsive in returning to a better path when it becomes available.

With patches PHCO_35955 and PHKL_35970(11.11) / PHCO_35524 and PHKL_35965(11.23), proactive path monitoring can be configured with pvchange -p. LVM's proactive polling is on by default and can be displayed using pvdisplay.

Conclusion

If you want to manually select a path using pvchange -s, autoswitchback should be disabled first.
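
As a rough sketch of that sequence: the device files below are made-up placeholders for the primary and alternate links of one physical volume, and the y/n argument to -S is my understanding of the pvchange syntax, so check pvchange(1M) on your release. The script prints the commands by default and only executes them with DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical device files for two links to the same physical volume.
PRIMARY=/dev/dsk/c4t0d0
ALTERNATE=/dev/dsk/c6t0d0

# Echo the command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

manual_switch() {
    # 1. Turn autoswitchback off so LVM does not undo the manual switch.
    run pvchange -S n "$PRIMARY"
    # 2. Request that all future I/O use the alternate link.
    run pvchange -s "$ALTERNATE"
    # 3. Display the PV to confirm the autoswitchback setting.
    run pvdisplay "$PRIMARY"
}

manual_switch
```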
meno
Advisor

Re: Filesystem hang after removing disk

I'll try to clarify my configuration:
- rx2600 box, A7173A SCSI controller, MSA30DB, two 73GB disks in MSA30
- no Mirrordisk/UX, no RAID controller
- one physical disk - volume group vg03, LV redoA, FS /oracle/redoA
- second physical disk - volume group vg04, LV redoB, FS /oracle/redoB
- Oracle: 2 redo log groups, each with two members, 4 redo logs in total
- when I pull out the second disk, Oracle hangs, and the command ls /oracle/redoB/* hangs too
- the system is probably waiting for the disk (how long will it wait? is there a timeout?)
- but when I inserted a new disk (in place of the one I pulled out) into the MSA, HP-UX returned an I/O error and Oracle carried on (some ORA- errors are in alert.log, which is expected)
- my question is: does the system hang when a real failure occurs? Or does the system return an I/O error so that Oracle can keep running?

Marian
skt_skt
Honored Contributor

Re: Filesystem hang after removing disk

- one physical disk - volume group vg03, LV redoA, FS /oracle/redoA
- second physical disk - volume group vg04, LV redoB, FS /oracle/redoB


As per the above information (one disk in each LV), there is no mirroring or redundancy for the LV, so it is going to hang.
meno
Advisor

Re: Filesystem hang after removing disk

I resolved my problem.

I set the IO_timeout parameter for the LV to 30 seconds.
After I pulled the disk out of the MSA30, the filesystem was disabled after about 1-2 minutes and Oracle kept running. The end user sees nothing of the disk problem (one redo log member in each redo log group is active and the second member is invalid).
This behaviour is what I wanted.
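
For reference, a sketch of how such a timeout might be set. The post does not give the exact command; this assumes the lvchange -t IO_timeout option, and the LV path is put together from the vg04/redoB names earlier in the thread. The script prints the commands by default and only executes them with DRY_RUN=0.

```shell
#!/bin/sh
# Assumed LV device path and the 30-second timeout from the post.
LV=/dev/vg04/redoB
TIMEOUT=30

# Echo the command unless DRY_RUN=0, in which case run it.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

set_lv_timeout() {
    # Set the LV I/O timeout so pending I/O fails instead of waiting forever.
    run lvchange -t "$TIMEOUT" "$LV"
    # Show the LV so the "IO Timeout (Seconds)" field can be checked.
    run lvdisplay "$LV"
}

set_lv_timeout
```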

Marian