Operating System - HP-UX

Recovery of corrupted filesystem

 
A Pedersen
Occasional Advisor

Recovery of corrupted filesystem

I am in deep trouble, so I need someone to help me out 'a bit'.

I have been in the process of moving from one EVA SAN system to another. Basically, the process I went through was:
1) Attach one of the HBAs to the new EVA SAN
2) Present the new LUNs
3) vgextend the VG with the new disks/LUNs
4) lvextend (mirror) onto the new disks/LUNs
5) lvreduce the disks from the old SAN
6) vgreduce the disks from the old SAN
7) Unpresent the LUNs from the old SAN
8) Done...
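For reference, the procedure above can be sketched as HP-UX LVM commands for one LV and one old/new LUN pair. This is a minimal sketch, not the poster's exact commands: the VG, LV and device names are hypothetical, the pvcreate step implied by presenting a new LUN is spelled out, and two extra steps at the top save the LVM configuration and a sharable map file while the old LUN is still a VG member. The function only prints the commands for review; it executes nothing.

```sh
#!/bin/sh
# print_migration_plan VG LV OLD_PV NEW_PV
# Prints (does not run) the HP-UX LVM commands for moving one LV from
# OLD_PV to NEW_PV by mirroring. All names passed in are hypothetical.
print_migration_plan() {
    vg=$1 lv=$2 old_pv=$3 new_pv=$4
    name=$(basename "$vg")
    # pvcreate works on the raw device file, so derive /dev/rdsk/... from /dev/dsk/...
    new_rpv=$(printf '%s\n' "$new_pv" | sed 's|/dsk/|/rdsk/|')
    cat <<EOF
# Save recovery information while the old LUN is still a member of the VG
vgcfgbackup $vg
vgexport -p -s -m /tmp/$name.map $vg
# Initialise the new LUN and add it to the VG
pvcreate $new_rpv
vgextend $vg $new_pv
# Mirror onto the new LUN and inspect the mirror before dropping the old side
lvextend -m 1 $vg/$lv $new_pv
lvdisplay -v $vg/$lv
# Remove the old side of the mirror, then the old LUN from the VG
lvreduce -m 0 $vg/$lv $old_pv
vgreduce $vg $old_pv
EOF
}
```

Example: `print_migration_plan /dev/vg09 lvol1 /dev/dsk/c4t0d1 /dev/dsk/c10t0d1`. Copying /etc/lvmconf/vg09.conf and the map file off the host before starting would give a documented starting point if the old LUNs ever need to be brought back.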

...everything seemed to be a success. The filesystem is mounted and I can 'du' it. But when the applications (housed on that filesystem) are launched (they were shut down during this process, just in case), they don't work. It actually seems like the filesystem has become corrupt in some way. I don't know how, but the files are 'accessible' while the binaries seem to be corrupt.

Now I'm desperate for any kind of solution where I can attach the old LUNs and recover the filesystem from them. There are 8 LUNs for that filesystem, and I have presented them to the system again... but I can't use vgimport with the old map file, as the VGIDs are gone from the old disks (I guess because of the vgreduce).

Does anyone have any ideas? Restoring from the backup system is not great (we're talking about ~1 TB of data) and would take a very long time.

Please please please help me...
Regards
Allan P

15 replies
Dennis Handly
Acclaimed Contributor

Re: Recovery of corrupted filesystem

>but the binaries seems to be corrupt.

How, and in what way?
Can you compare the cksum with the one from your backup?
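A minimal way to do this checksum comparison, assuming the backup copy has been restored to a scratch directory (the function name and the paths in the usage line are hypothetical). cksum prints a CRC and a byte count; matching values make identical contents very likely, while a mismatch proves the files differ.

```sh
#!/bin/sh
# same_cksum FILE_A FILE_B
# Exit status 0 if both files have the same cksum CRC and byte count.
same_cksum() {
    # Reading via stdin makes cksum print "CRC SIZE" without a file name.
    a=$(cksum < "$1" | awk '{print $1, $2}')
    b=$(cksum < "$2" | awk '{print $1, $2}')
    [ "$a" = "$b" ]
}
```

Example: `same_cksum /restore/app/bin/server /app/bin/server && echo same || echo DIFFERENT`.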
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

I have a testing environment identical to this production one. I have copied a few of the binaries from the testing environment to the production one, and they seem to work... but my problem is that I don't know what has been corrupted and what has not.

I also ran cmp on two copies of a binary (one from the backup and one from the system), and they differed...

\Allan
Dennis Handly
Acclaimed Contributor

Re: Recovery of corrupted filesystem

>I also tried to do a cmp between 2 binaries - they differed.

You could use "cmp -l" on the two files and see what the first 50 diffs are. Any modification date changes?
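The cmp -l suggestion as a small helper (the function name is hypothetical): it prints the first N differing bytes, each as a 1-based offset followed by both byte values in octal as cmp -l reports them, and then the total number of differing bytes. The total only covers the bytes both files have in common, because cmp stops at the end of the shorter file.

```sh
#!/bin/sh
# cmp_summary FILE_A FILE_B [N]
# Shows the first N (default 50) byte differences and the total count.
cmp_summary() {
    n=${3:-50}
    # cmp reports "EOF on ..." on stderr when lengths differ; hide it here.
    cmp -l "$1" "$2" 2>/dev/null | head -n "$n"
    total=$(cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' ')
    echo "differing bytes: $total"
}
```

Example: `cmp_summary /restore/app/bin/server /app/bin/server 50`. Scattered single-byte differences and whole differing blocks point to rather different causes, so the offsets are worth a look.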

Also, if your binaries are corrupted, there isn't much you can do about it, except to try to see how many are corrupted.

And either restore from the backup or get your original LUNs back.
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

There seem to be a lot of differences in the binary I tested. The date and size are the same...

I'm not able to test all my binaries or, for that matter, all the data: if some of the data/binaries are corrupt, I can't be sure that the rest are not.
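Rather than testing binaries one by one, one way to see how widespread the damage is would be to compare the whole tree against a restore of the last backup. This sketch assumes the backup has been restored to a separate directory and that both trees are readable; the function name is hypothetical. Files legitimately changed since the backup will also be listed, so the output is a starting point for triage, not proof of corruption.

```sh
#!/bin/sh
# list_changed BACKUP_ROOT LIVE_ROOT
# For every regular file under BACKUP_ROOT, print its relative path if the
# file at the same path under LIVE_ROOT is missing or has different contents.
list_changed() {
    backup=$1 live=$2
    (cd "$backup" && find . -type f) | while IFS= read -r rel; do
        if [ ! -f "$live/$rel" ]; then
            echo "MISSING $rel"
        elif ! cmp -s "$backup/$rel" "$live/$rel"; then
            echo "DIFFERS $rel"
        fi
    done
}
```

Example: `list_changed /restore/app /app > /tmp/changed.txt`.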

So my hope was rather to find out whether there is any chance of getting back to the old LUNs, and thereby getting hold of the data as it looked before the mirror procedure...

\Allan
Kapil Jha
Honored Contributor

Re: Recovery of corrupted filesystem

There should not be any problem if you mirrored the LV correctly. Did you face any issue while mirroring?

If you are really unsure about the mirroring, I would suggest re-mirroring the disk, splitting off the copy with lvsplit, mounting it on another mount point, and comparing.
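This suggestion, sketched as the commands it would involve (printed, not executed). It assumes the LV has been mirrored again first (lvsplit needs a mirrored LV), that lvsplit uses its default "b" suffix for the split-off copy, and that the filesystem is VxFS; the function name and mount point are hypothetical. The split copy is checked with fsck before mounting, since it is taken from a mounted filesystem. As Allan points out in his reply, this only compares the two mirror copies with each other; it cannot show what the data on the old LUNs looked like.

```sh
#!/bin/sh
# print_split_check VG LV MOUNTPOINT
# Prints the commands to split off one mirror copy, check it, mount it
# read-only at a scratch mount point, and merge it back afterwards.
print_split_check() {
    vg=$1 lv=$2 mnt=$3
    cat <<EOF
lvsplit $vg/$lv
fsck -F vxfs -y $vg/r${lv}b
mkdir -p $mnt
mount -F vxfs -o ro $vg/${lv}b $mnt
# compare files under $mnt with the live mount point here
umount $mnt
lvmerge $vg/${lv}b $vg/$lv
EOF
}
```

In `lvmerge`, the first argument is the copy that gets overwritten, so the split-off `lvol1b` is resynchronised from `lvol1` and not the other way round.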

BR,
Kapil+
I am in this small bowl, I wane see the real world......

Re: Recovery of corrupted filesystem

Attach the disks to the system and run vgscan on them. The OS will read the VGRA on the disks and report that they belong to a volume group. Then create a temporary VG with the disks; you can then mount the LVs and use the data to restore the corrupted binaries.
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

Kapil: As I see it, there is no point in re-mirroring. If the files are corrupt in any way, the mirroring would just copy the corrupted files to the new mirror.

Amit: I tried to attach the disks to the system... but vgscan does not work, as it seems that the VG information on the disks was removed during the procedure described above (i.e. by the vgreduce).

Re: Recovery of corrupted filesystem

vgreduce does not remove the VGRA data from the disks. If you vgreduce a disk, you can still access it by vgextending it back into the same VG.

However, if vgscan does not show anything, it means that the VGRA has been deleted, perhaps by a pvcreate command.

Did you try vgextending the disks into the same VG?
chris huys_4
Honored Contributor

Re: Recovery of corrupted filesystem

Hi Allan,

If there were filesystem problems, there would be filesystem errors logged in syslog.log. Is this the case?

If there are filesystem problems, back up what you have now, then unmount and remount the filesystem. If it remounts without asking for an fsck, it certainly isn't a "usual" filesystem corruption.

To check for "unusual" filesystem corruption, unmount the filesystem again and run a full fsck. If that still doesn't find any problems, there are no filesystem problems and it's probably a problem with the application binaries themselves: perhaps someone patched the application binaries but didn't restart the application afterwards to check that the patching was successful. (Was the application restarted before beginning the "migrate" operation, to see if it was OK?)
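These checks, sketched (function names hypothetical): the first function greps a syslog file for VxFS messages (on HP-UX the log is normally /var/adm/syslog/syslog.log); the second prints the unmount/remount and full, check-only fsck sequence described above. `fsck -F vxfs -o full -n` forces a full structural check and answers "no" to every repair prompt. The LV paths and mount point passed in are hypothetical.

```sh
#!/bin/sh
# vxfs_log_errors LOGFILE
# Prints log lines that mention vxfs; exit status 1 if there are none.
vxfs_log_errors() {
    grep -i 'vxfs' "$1"
}

# print_fsck_check LV_BLOCK LV_RAW MOUNTPOINT
# Prints (does not run) unmount, remount, unmount, then a full fsck that
# reports problems without repairing anything.
print_fsck_check() {
    cat <<EOF
umount $3
mount -F vxfs $1 $3
umount $3
fsck -F vxfs -o full -n $2
EOF
}
```

Example: `vxfs_log_errors /var/adm/syslog/syslog.log` followed by `print_fsck_check /dev/vg09/lvol1 /dev/vg09/rlvol1 /app`.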

With VxVM you would be able to get back to the lost data even after an 'lvreduce'; with LVM, it's restore time...

Greetz,
Chris
PS. Oh yes, logging a call with HP support would also be a good idea.
PPS. Restoring 1 TB nowadays is nothing. ;)
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

Amit: No, the data is not lost, I know that. But the VGID information on the disks was removed by the vgreduce command. If it were only one LUN, I could just create a new VG with that disk and reuse the FS... but the filesystem is spread over 8 LUNs, which means that I cannot be sure in which order they belong, and the system doesn't recognise them when I attach the LUNs again.

Chris:
After some time, I have also realised that there's not much to do but restore (already initiated)... but the problem is also that there's a time gap (12 hours) between the last backup and the actual migration. Yes, I know that I should have backed up beforehand, but I only got a small service window for this, and I was also 100% sure that it would be a walk in the park, as we do mirroring, expansions and other VG/LV-related tasks online several times a month.
The weird thing about the binaries is that they have the same size, the same date and the same metadata... but the application fails when starting up.
The FS does not report any errors on mount.
I did place a call with HP support, and they were not able to help, as they also said that the VGID information was lost. They gave me an unofficial, at-your-own-risk, "you never heard that from HP" thing to try, which didn't work either :/ (the attempt was vgimport -v -m mapfile vg -f infile).
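For completeness: the `vgimport -f infile` form takes the list of physical volumes from a file, one device file per line. A minimal sketch for building that file from the eight old LUNs (function name and device names hypothetical). The infile only tells vgimport which disks to use; vgimport still depends on the LVM header on each disk, which is presumably why this did not help once the VGID was gone.

```sh
#!/bin/sh
# write_infile INFILE PV...
# Writes one block device file per line, in the order given.
write_infile() {
    infile=$1
    shift
    : > "$infile"    # truncate or create the file
    for pv in "$@"; do
        printf '%s\n' "$pv" >> "$infile"
    done
}
```

Example: `write_infile /tmp/vgold.in /dev/dsk/c4t0d1 /dev/dsk/c4t0d2 ...`, then `vgimport -v -m /tmp/vg09.map /dev/vgold -f /tmp/vgold.in`, where `/dev/vgold` is a hypothetical temporary VG name.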


So I guess the conclusion must be that I can't get back to the old LUNs and a restore must be done.
Also, it should not be possible to get this kind of error during a mirror operation, which must mean that the error occurred elsewhere, maybe when shutting down the application (inside an MC/S package)...

Thanks, all, for your time and answers. I'll leave this open for a few days to see whether someone else can suggest something good, and will close it afterwards.
\Allan
Elmar P. Kolkman
Honored Contributor

Re: Recovery of corrupted filesystem

The only thing I can add is that we did this a lot, and it worked as long as the vgreduce was successful. (Sometimes the lvreduce didn't take the right disks out of the mirror, so the vgreduce failed and the administrator still removed the disks, but that resulted in very visible errors...)

Are you sure your new EVA is working correctly? What you describe is very strange. Strangest is that the inode information seems correct while the in-file data itself is 'corrupted'... If you had copied using tar, that could be explained, but an LVM mirror doesn't know anything about filesystems...

Just my 2 cents.
Every problem has at least one solution. Only some solutions are harder to find.
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

Elmar:
Thanks for your '2 cents' :)
The lvreduce was successful, and therefore so was the vgreduce.
I believe the new EVA is working correctly. As part of the migration we moved 10 systems from the old SAN to the new one, including 3 cluster sets with several filesystems... and so far 'only' one filesystem has given problems (the one mentioned all along).
And yes, it was a standard LVM mirror (lvextend -m 1 /dev/vg09/lvol1 /dev/dsk/; and afterwards lvreduce -m 0 /dev/vg09/lvol1 /dev/dsk/)
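Before the `lvreduce -m 0` step it is cheap to confirm that no extent of the mirror is still stale. This sketch counts entries marked "stale" in `lvdisplay -v` output read from stdin; it assumes the extent map lists each mirror copy's status as a separate word ("current" or "stale"), and the function names are hypothetical. It only guards against dropping the old side before the new side is in sync; it would not catch data that was already bad on the source.

```sh
#!/bin/sh
# count_stale: read "lvdisplay -v" output on stdin and print how many
# whitespace-separated fields are exactly "stale".
count_stale() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "stale") n++ } END { print n + 0 }'
}

# safe_to_reduce: exit status 0 only if no stale extent is reported on stdin.
safe_to_reduce() {
    [ "$(count_stale)" -eq 0 ]
}
```

Example: `lvdisplay -v /dev/vg09/lvol1 | safe_to_reduce && echo "no stale extents"`.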
Viktor Balogh
Honored Contributor

Re: Recovery of corrupted filesystem

Hi,

>"...the VGID information on the disks are removed when doing the vgreduce command."

You could add the LVM header back to your disks with vgcfgrestore. For this, I would first deactivate the VG on the new SAN. After that, I would run vgchgid on the disks from the old SAN and try to import the old LUNs as a completely new VG, independent of the VG on the new SAN. Use a map file from the new VG, with the device files fed into vgimport.

Let's see if it works. I never did things like this, but in theory it should work... ;)
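This idea, sketched as an ordered command list (printed, not executed). Every input is an assumption to verify first. The configuration file must be an LVM configuration backup taken while the old LUNs were still VG members, for example a copy saved before the migration, or possibly `/etc/lvmconf/vg09.conf.old` if it is old enough. The temporary VG name and the group-file minor number are hypothetical, and the minor number must be unused. The live filesystem must be unmounted before its VG is deactivated. The old LUNs get a new VGID and are activated read-only so the data can be inspected without touching the live VG. For a VG controlled by a Serviceguard package, the activation steps would need adapting.

```sh
#!/bin/sh
# print_recovery_plan LIVE_VG TEMP_VG CONF_FILE MINOR MAPFILE RAW_PV...
# Prints (does not run) the steps: restore the old LVM header onto each old
# LUN, give the old LUNs a new VGID, then import them as a separate VG.
print_recovery_plan() {
    live_vg=$1 temp_vg=$2 conf=$3 minor=$4 map=$5
    shift 5
    echo "vgchange -a n $live_vg"
    for rpv in "$@"; do
        echo "vgcfgrestore -f $conf -n $live_vg $rpv"
    done
    # vgchgid must be given all PVs of the VG at once (raw device files).
    echo "vgchgid $*"
    echo "vgchange -a y $live_vg"
    echo "mkdir $temp_vg"
    echo "mknod $temp_vg/group c 64 $minor"
    # vgimport takes block device files, so map /dev/rdsk/... to /dev/dsk/...
    blk=
    for rpv in "$@"; do
        blk="$blk $(printf '%s\n' "$rpv" | sed 's|/rdsk/|/dsk/|')"
    done
    echo "vgimport -v -m $map $temp_vg$blk"
    echo "vgchange -a r $temp_vg"
}
```

Example: `print_recovery_plan /dev/vg09 /dev/vgold /etc/lvmconf/vg09.conf.old 0x0a0000 /tmp/vg09.map /dev/rdsk/c4t0d1 /dev/rdsk/c4t0d2 ...` with all eight old LUNs listed. The restored header describes where the LV extents lived, so it only helps if it matches the layout the data actually had on the old LUNs.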

Regards,
Viktor
****
Unix operates with beer.
Viktor Balogh
Honored Contributor

Re: Recovery of corrupted filesystem


> "It actually seems like the filesystem has become corrupt in some way. Not that I know of how, but the files are 'accessible' but the binaries seems to be corrupt."

> "I have copied a few of the binaries from the testing environment to the productive one, and they seem to work... "

It is not clear to me how only the binaries could get corrupted and not the whole FS. Does it have something to do with copying from test to prod? Did you run fsck on the filesystem? Did it show any consistency errors? Should the binaries be the same on test and prod? (If yes, why not overwrite all prod executables with the test ones, after a backup of prod?)
****
Unix operates with beer.
A Pedersen
Occasional Advisor

Re: Recovery of corrupted filesystem

Viktor:
Regarding the corrupted binaries: that is exactly my point. I can't be sure that only the binaries are corrupt. Those were just the easiest to identify, as the processes wouldn't start up. But when I replaced those specific binaries with the ones from the testing environment, I could start some of the processes.
Plus, some of the binaries differ between the testing and production environments, as there have been development changes which have been implemented in the testing environment but not yet in the production environment.

I'll look a bit more into your suggestion regarding vgcfgrestore, vgimport, vgchgid etc.