Re: catch22; trapped by failed disk

Ralph Grothe · ‎02-26-2002

Hello,

A disk seems to be defective.
The usual test read of a few blocks from the raw device, say
dd if=/dev/rdsk/c2t9d0 of=/dev/null bs=1024k count=10
hangs so hard that it ignores even SIGKILL.
In addition, the kernel SCSI driver floods syslog with messages like this:

Feb 26 12:04:49 ganymed vmunix: SCSI: Unexpected Disconnect -- lbolt: 102770286, dev: bc029000, io_id: 201dab0
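
A side note on the test read: since a hung dd can take the terminal with it, one option is to start the read in the background and only poll for it. The following is a minimal sketch in POSIX shell, not a tested HP-UX procedure. The function name probe_read and the 30-second default are made up for illustration. A read stuck in the driver still cannot be killed, so the sketch only reports the hang and leaves the process alone.

```shell
# probe_read DEVICE [SECONDS]
# Start a small test read in the background and report whether it
# finished within the allowed time. The helper name and the default
# time limit are illustrative assumptions, not part of any HP-UX tool.
probe_read() {
    dev=$1
    limit=${2:-30}

    # Same read as in the post, but detached from the terminal.
    dd if="$dev" of=/dev/null bs=1024k count=10 >/dev/null 2>&1 &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Do not wait on the process: a read stuck in the driver
            # may ignore every signal, including SIGKILL.
            echo "HUNG: $dev (dd pid $pid still running after ${limit}s)"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # dd has exited; collect its status.
    if wait "$pid"; then
        echo "OK: $dev"
    else
        echo "ERROR: $dev (dd exited with a non-zero status)"
        return 2
    fi
}
```

Called as probe_read /dev/rdsk/c2t9d0 60, it prints one status line and returns 0, 1 or 2, so the shell prompt comes back even when the disk does not answer.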

Unfortunately, the affected disk is the first mirror disk of the VG that serves as the cluster lock disk of a two-node SG cluster.

The nice side effect is that every attempt to remove the disk from the current LVM configuration (e.g. lvreduce with or without the -k option, vgremove, etc.) hangs in the same way as the dd.
Of course, the same goes for attempts to rebuild the cluster binary, since every cmapplyconf hangs too.

Even when I force a SCSI bus reset by pulling and replugging the hot-swap disk, every system command that talks to the device still hangs.

Looks to me like a chicken-and-egg problem.

Does anyone have a spell to break this infinite loop?

Regards
Ralph

Madness, thy name is system administration

harry d brown jr · ‎02-26-2002

Ralph,

Have you tried replacing the bad disk with a good one?

live free or die
harry

Live Free or Die

A. Clay Stephenson · ‎02-26-2002

Hi Ralph:

I would pull the bad disk; replace it with a good one; and start the normal procedure.
1) vgcfgrestore 2) vgchange -a y 3) vgsync

This really should be no different from a completely failed disk. I never bother with the lvreduce/vgreduce operation.

If it ain't broke, I can fix that.

Sanjay_6 · ‎02-26-2002

hi Ralph,

Reboot the box, then replace the disk and restore the VG information to it.

Try this link,

http://docs.hp.com/cgi-bin/fsearch/framedisplay?top=/hpux/onlinedocs/B3936-90053/B3936-90053_top.html&con=/hpux/onlinedocs/B3936-90053/00/00/53-con.html&toc=/hpux/onlinedocs/B3936-90053/00/00/53-toc.html&searchterms=troubleshooting%20lock%20disk&queryid=20020226-110527

Look for "Replacing Disks" --> "Replacing a Lock Disk"

Hope this helps.

Regds

Krishna Prasad · ‎02-26-2002

I am not sure whether the fact that it is a lock disk will force a reboot. However, I do know that under normal circumstances you do not need to reboot.

replace bad disk
vgcfgrestore /dev/vg00
vgsync

Positive Results requires Positive Thinking

Krishna Prasad · ‎02-26-2002

Correction to my post above: add vgchange -a y /dev/vg00 between the vgcfgrestore and the vgsync.
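
Put together, the sequence suggested in this thread (vgcfgrestore, vgchange -a y, vgsync) could be sketched as below. This is a dry-run sketch under assumptions, not a verified procedure for a lock disk: the function names and the RUN switch are hypothetical, and /dev/vg00 and /dev/rdsk/c2t9d0 in the usage line are only the names mentioned in the posts. By default the commands are printed rather than executed.

```shell
# run_or_print CMD [ARGS...]
# Print the command by default; execute it only when RUN=1 is set.
# The RUN switch is a hypothetical safety catch for this sketch.
run_or_print() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# replace_mirror_disk VG RAW_DISK
# Steps after the physical swap, in the order given in the thread.
# Each step runs only if the previous one succeeded.
replace_mirror_disk() {
    vg=$1
    rdisk=$2

    # 1) write the saved LVM configuration back to the new disk
    run_or_print vgcfgrestore -n "$vg" "$rdisk" &&
    # 2) reactivate the volume group so the new disk is attached
    run_or_print vgchange -a y "$vg" &&
    # 3) resynchronize the stale mirror copies
    run_or_print vgsync "$vg"
}
```

Running replace_mirror_disk /dev/vg00 /dev/rdsk/c2t9d0 prints the three commands for review. Setting RUN=1 first would execute them, which is only sensible on the affected host after the disk has actually been swapped.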

Positive Results requires Positive Thinking

Ralph Grothe · ‎02-27-2002

Thanks to everyone for their suggestions.

Unfortunately, replacing the disk with a new one isn't really an option in this case.

The two nodes are old D-class boxes that were slated to be scrapped.

Then our planners and decision makers decided to introduce Tivoli, so they needed a test platform.

That was how I could "save" the two boxes, as we had a reasonable use for them.

But of course, I won't get new hardware such as replacement hard disks.

Nevertheless, yesterday I somehow managed to get the defective disk out of the LVM and cluster configuration by pulling and replugging the hot-swappable disk, thus triggering a SCSI bus reset.
I tried this a couple of times, each time issuing a diskinfo command afterwards.
When the disk finally reported its characteristics, I was able to issue the lvreduce command, which was acknowledged with a success message :-) followed by a system panic :-(
However, once the machine was up again I could confirm that the lvreduce must have succeeded, since the defective disk no longer appeared in the lvdisplay output.
After that the rest was easy, and my cluster now runs with a different lock disk.
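
For anyone in the same spot, the reseat-and-probe routine described above could be sketched roughly like this. It is only an illustration: the function name wait_for_disk, the PROBE_CMD override, and the attempt and pause defaults are all made up, and the probe itself may hang on a bad disk just like the other commands did.

```shell
# wait_for_disk RAW_DISK [ATTEMPTS] [PAUSE_SECONDS]
# Probe the disk repeatedly and stop as soon as it answers.
# PROBE_CMD defaults to diskinfo; the override exists only so the
# loop can be tried out without the real hardware (hypothetical).
wait_for_disk() {
    rdisk=$1
    attempts=${2:-5}
    pause=${3:-10}
    probe=${PROBE_CMD:-diskinfo}

    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$probe" "$rdisk" >/dev/null 2>&1; then
            echo "disk answered on attempt $n"
            return 0
        fi
        # Reseating the disk between attempts is a manual step.
        echo "attempt $n: no answer, reseat the disk (pausing ${pause}s)" >&2
        sleep "$pause"
        n=$((n + 1))
    done

    echo "disk did not answer after $attempts attempts"
    return 1
}
```

Once wait_for_disk /dev/rdsk/c2t9d0 returns 0, that is the window in which the lvreduce went through in my case. Given the panic that followed, it is worth stopping the packages first if that is possible.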

Madness, thy name is system administration
