Operating System - HP-UX

filesystem corrupted after disk pull out

 
Tapas Jha
Valued Contributor

Hi,

Yesterday we ran into a peculiar problem on HP-UX 11.00 on an L2000.
All our systems use SC10 storage with LVM RAID and mirroring.
One hard disk in the SC10 on Server A crashed. The HP engineer came and replaced the disk on a different server (Server B), but ran the command (vgcfgrestore -n vg01 /dev/rdsk/c5t8d0) on the affected server (Server A).

The second server (Server B) then started logging the messages below. We logged a call, because this is a Netscape mail server, it is critical, and no service was working.

"Aug 31 17:48:31 serverb vmunix: vxfs: mesg 003: vx_mapbad -
/dev/vg01/lvol1 file system free extent bitmap in au 15 marked bad
Aug 31 17:49:22 serverb vmunix: vxfs: mesg 008: vx_direrr -
/it file system inode 11511 block 98859 error 6
Aug 31 17:49:35 serverb vmunix: vxfs: mesg 008: vx_direrr -
/usrmlbox file system inode 53381 block 4297572 error 6
Aug 31 17:50:59 serverb vmunix: vxfs: mesg 017: vx_dirlook -
/usrmlbox file system inode 197266 marked bad
Aug 31 17:51:00 serverb vmunix: vxfs: mesg 017: vx_dirlook -
/usrmlbox file system inode 204349 marked bad "

The solution centre engineer was working on serverb over a remote modem connection and ran a dd command against the c5t8d0 disk (Server B); at the same time the server rebooted. We don't know exactly what command he typed.
From the shutdown log I found that the server rebooted with "Reboot after panic: vn_rele".

After serverb came back online we found that /usrmlbox and /it had a large
number of files in lost+found, and users had lost a lot of mail.

After that they did the following:
1) Unmounted all file systems in vg01
2) vgchange -a n vg01 (this gave an error)
3) Put the original disk back at c5t8d0 (pulled out the new one)
4) ioscan -fnC disk
5) diskinfo /dev/rdsk/c5t8d0
6) vgchange -a y vg01
7) vgdisplay -v vg01|more
8) vgsync vg01
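
Step 1 does not name a command. As a minimal sketch of one way to work out what has to be unmounted, the function below lists the mount points that sit on logical volumes of one volume group. It assumes the mount table uses the mnttab layout (device in the first field, mount point in the second); the function name vg_mount_points is hypothetical, and the function only prints mount points, it does not unmount anything.

```shell
#!/bin/sh
# Hypothetical helper: print the mount points of file systems that live on
# logical volumes of the given volume group, last-mounted first, so that
# nested mounts can be unmounted before their parents.
# Reads mnttab-style lines (device mountpoint type ...) from standard input.
vg_mount_points() {
    vg=$1
    awk -v prefix="/dev/$vg/" '
        # keep only entries whose device path starts with /dev/<vg>/
        index($1, prefix) == 1 { mp[++n] = $2 }
        # emit in reverse order of appearance
        END { for (i = n; i >= 1; i--) print mp[i] }
    '
}
```

Typical use would be `vg_mount_points vg01 < /etc/mnttab`, reviewing the list before running umount on each entry by hand.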

Then we restored the previous day's backup.

One issue remains: users still get "receiving 1 of 18" when fetching
messages from the Netscape client. We are trying to solve this permanently
for all users, but without success so far.


Now my question: since the disk is mirrored, the system should not throw
error messages when a hard disk goes offline and a new one is put in.
LVM should take care of it.
The system should not panic and reboot, and the file systems should not go bad.

What went wrong, why did all this happen, and were all the steps taken
after the local HP engineer's wrong disk replacement correct?


Rgds
Tapas
A. Clay Stephenson
Acclaimed Contributor

Re: filesystem corrupted after disk pull out

The system will run just fine with a failed mirror -- I have not shut down in years to replace a disk -- BUT vgcfgrestore is supposed to be run on a disk that is not activated as part of a volume group. Running the command on a disk that is already activated can cause chaos. It is also a good idea to run the command on the system it is intended for.
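
The "right system" check can be made mechanical. The following is only a sketch, not an HP-supplied procedure: the function name guarded_vgcfgrestore and the THIS_HOST and DRY_RUN variables are hypothetical. It refuses to go further unless the host name matches the one the operator types in, and by default it only prints the vgcfgrestore command instead of running it. It does not check whether the disk is currently activated in the volume group; that still has to be confirmed separately.

```shell
#!/bin/sh
# Hypothetical pre-flight guard around vgcfgrestore.
# Usage: guarded_vgcfgrestore EXPECTED_HOST VG RAW_DISK
# THIS_HOST (assumption, for testing) overrides the real host name.
# DRY_RUN defaults to 1, which prints the command instead of running it.
guarded_vgcfgrestore() {
    expected_host=$1
    vg=$2
    raw_disk=$3
    actual_host=${THIS_HOST:-$(hostname)}

    # Stop if we are logged in to a different machine than intended.
    if [ "$actual_host" != "$expected_host" ]; then
        echo "refusing: this is $actual_host, not $expected_host" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: vgcfgrestore -n $vg $raw_disk"
    else
        vgcfgrestore -n "$vg" "$raw_disk"
    fi
}
```

For example, `guarded_vgcfgrestore servera vg01 /dev/rdsk/c5t8d0` typed on serverb would stop with a refusal message rather than touch the disk.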

While it is easy to blame the HP Mr. Goodwrench for running the command on the wrong system, he is probably not familiar with your systems, but you are. You should have been standing at his side while any commands were run --- or better still, run them yourself. I NEVER let the HP guys run anything because if something goes wrong, regardless of why, it's my responsibility.
If it ain't broke, I can fix that.