
Mirroring explanation

 
Chris Fadrowski
Super Advisor

Mirroring explanation

I currently have two mirrored drives (the boot drives, in vg00), and I had a disk failure on the primary boot drive. When this occurred, I could not get to the box via telnet or the console. I did have processes running, such as backups, and the box came back online after about 6 hours, apparently when some of these processes finished. However, I am concerned about why the server stopped responding when the primary was down but mirrored. I have lost boot drives before and the server has always remained up. Can anyone explain possible causes for this type of behavior?
BFA6
Respected Contributor

Re: Mirroring explanation

Hi Chris,

It's possible that the failed disk was causing the bus to reset repeatedly. If the backups were accessing data on the same bus, they too would seem to hang.

We had a similar problem recently where the boot disk failed and the box became unusable. No one could log in, and anyone already logged in was thrown out. We eventually rebooted the box; it took ages to reboot and kept hanging when trying to access anything. Once the failed drive was pulled out, there was no problem.

Hope this helps.

Regards,

Hilary
A. Clay Stephenson
Acclaimed Contributor

Re: Mirroring explanation

I've never had the problem you describe but I always mirror boot drives on separate SCSI buses. It might be possible to have a drive failure just bad enough to hang and cause bus resets very often so that things appear to stop. I'll bet that if you pulled the bad drive the problems would disappear. I do remember one thread in which a user experienced a similar problem and it turned out that both boot drives had failed. The first one failed and the second one failed soon thereafter. A low (but not zero) probability scenario.

Finally, are you absolutely certain that all LVOLs were mirrored, including swap?
If it ain't broke, I can fix that.
Chris Fadrowski
Super Advisor

Re: Mirroring explanation

Absolutely certain they are mirrored. The bus explanation makes sense; however, the two drives are c2t6d0 and c1t6d0, so it appears they were already on different controllers.

Here was the lvlnboot output before the failure:

/dev/dsk/c1t6d0 (0/0/2/0.6.0) -- Boot Disk
/dev/dsk/c2t6d0 (0/0/2/1.6.0) -- Boot Disk
Boot: lvol1 on: /dev/dsk/c1t6d0
                /dev/dsk/c2t6d0
Root: lvol3 on: /dev/dsk/c1t6d0
                /dev/dsk/c2t6d0
Swap: lvol2 on: /dev/dsk/c1t6d0
                /dev/dsk/c2t6d0
Dump: lvol2 on: /dev/dsk/c1t6d0, 0
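
For anyone who wants to check a listing like this mechanically, here is a minimal Python sketch. It assumes the lvlnboot text has been saved as a string in the layout quoted above; the function names `parse_lvlnboot` and `check_mirroring` are made up for illustration and are not part of any HP-UX tool. It only confirms that boot, root, and swap each list two disks and that those disks carry different `cN` controller instance numbers, which is the same criterion used in this post. It says nothing about whether the mirror copies are current.

```python
import re

# The listing quoted in the post, pasted in as plain text.
LVLNBOOT_TEXT = """\
/dev/dsk/c1t6d0 (0/0/2/0.6.0) -- Boot Disk
/dev/dsk/c2t6d0 (0/0/2/1.6.0) -- Boot Disk
Boot: lvol1 on: /dev/dsk/c1t6d0
/dev/dsk/c2t6d0
Root: lvol3 on: /dev/dsk/c1t6d0
/dev/dsk/c2t6d0
Swap: lvol2 on: /dev/dsk/c1t6d0
/dev/dsk/c2t6d0
Dump: lvol2 on: /dev/dsk/c1t6d0, 0
"""

# Device file pattern; group 1 is the controller instance (c1, c2, ...).
DISK_RE = re.compile(r"/dev/dsk/(c\d+)t\d+d\d+")
ROLE_RE = re.compile(r"(Boot|Root|Swap|Dump):\s+\S+\s+on:\s+(.*)")


def parse_lvlnboot(text):
    """Map each role (Boot, Root, Swap, Dump) to the disks listed for it."""
    roles = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head = ROLE_RE.match(line)
        if head:
            current = head.group(1)
            roles[current] = []
            line = head.group(2)
        elif "--" in line or current is None:
            # Physical volume summary line ("... -- Boot Disk"); skip it.
            continue
        found = DISK_RE.search(line)
        if found:
            roles[current].append(found.group(0))
    return roles


def check_mirroring(roles):
    """Return {role: (has_two_disks, on_different_controller_instances)}."""
    report = {}
    for role in ("Boot", "Root", "Swap"):
        disks = roles.get(role, [])
        controllers = {DISK_RE.match(d).group(1) for d in disks}
        report[role] = (len(set(disks)) >= 2, len(controllers) >= 2)
    return report


if __name__ == "__main__":
    for role, result in check_mirroring(parse_lvlnboot(LVLNBOOT_TEXT)).items():
        print(role, result)
```

The dump line is parsed but deliberately left out of the check, since the listing shows the dump device on a single disk.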
S.K. Chan
Honored Contributor

Re: Mirroring explanation

That is odd. Typically, when a primary drive fails you get a momentary hang (a matter of a few minutes) and that's it; the system continues to run. LVM will show stale extents on the LVs residing on that drive. Some of my mirrored drives are on the same bus, but I have not seen this happen to me. If it were me, I would double-check and do a few things:
- Check whether the mirror disk has any I/O errors at all (STM diagnostics).
- Patches (make sure you're up to date on LVM patches).
- Re-mirror the drives and simulate a "primary" disk failure, just to make sure it's running okay before releasing the system to the rest of the users.
Sajid_1
Honored Contributor

Re: Mirroring explanation

hi,

I would check the patch level of the system. Install the latest patch bundles, which contain many cumulative patches for LVM, the SCSI bus, disks, etc.

hth
learn unix ..