#$^%@# SCSI drive!!
01-02-2002 11:56 AM
vxfs: mesg 021: vx_mountsetup: /home file system validation failure
The boot then continues and finishes up. I then go in, run fsck on /home, and it says there are I/O errors. I fix them, and everything is fine until the next reboot, when I do it all again. This problem has been happening since I upgraded to HP-UX 11.0, though I don't think it started right away; it began several months later. I have just put in the new patches, and it did it both before and after. The interesting thing is that /download mounts fine with no issue whatsoever. In the syslog there is an error with a SCSI reset and lbolt, though I don't remember if that appears every time or not. There aren't any other issues (crashes, performance, etc.). I'd appreciate any ideas as to what's going on. Thanks.
Mark
01-02-2002 12:07 PM
Re: #$^%@# SCSI drive!!
This is not a lot to go on, but:
1) Are the two file systems in different VGs, and thus on separate LUNs? In that case it's possible that the I/O timeouts are set differently on the two PVs. Typically RAID LUNs require timeouts in the range of 120-180 seconds rather than the default value; see the sketch below.
2) You mentioned patches, but have you installed all the LVM, SCSI, and VxFS patches?
3) Don't overlook the obvious: cable length, proper termination, termination power. This could be a case of everything working almost perfectly.
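As a sketch (the disk paths here are only placeholders; substitute your actual PV paths):
pvdisplay /dev/dsk/c2t0d0 | grep -i timeout    # show the current IO timeout on this PV
pvchange -t 180 /dev/dsk/c2t0d0                # raise it to 180 seconds for a RAID LUN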
01-02-2002 12:07 PM
Re: #$^%@# SCSI drive!!
If the lbolt error in your syslog includes something like:
lbolt dev: 1f00500
then the leading 1f is hex for 31, the SCSI disk driver's block major number on HP-UX; in other words, a disk is having hardware problems.
So more info on the lbolt error would help...
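To map a dev number like that back to a device file, something like this should work (assuming the standard /dev/dsk layout):
ll /dev/dsk           # match the major/minor pair shown in the error
ioscan -fnC disk      # check the hardware state of the suspect disk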
Rgrds,
Rita
01-02-2002 12:07 PM
Re: #$^%@# SCSI drive!!
Are you mounting these file systems with the nolog or tmplog option? If so, mount them with log or delaylog instead. The "mount" command will display how they are mounted. I can think of this only if there are no hardware problems with the disk subsystem.
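For example (the device and mount-point names below are just placeholders):
mount -v                                       # shows the options each filesystem is mounted with
# and in /etc/fstab, a logging VxFS entry looks like:
/dev/vgraid/home  /home  vxfs  delaylog  0  2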
-Sri
01-02-2002 12:10 PM
Re: #$^%@# SCSI drive!!
When a VERITAS File System is mounted, the structure is read from disk. If the file system is marked clean, the structure is correct and the first block of the intent log is cleared. If there is any I/O problem or the structure is inconsistent, the kernel sets the VX_FULLFSCK flag and the mount fails. If the error isn't related to an I/O failure, this may have occurred because a user or process has written directly to the device or used fsdb to change the file system.
Action
Check the console log for I/O errors. If the problem is a disk failure, replace the disk. If the problem is not related to an I/O failure, find out how the disk became corrupted. If no user or process is writing to the device, report the problem to your customer support organization. In either case, unmount the file system and use fsck to run a full structural check.
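A sketch of that check, assuming /home lives in /dev/vgraid (adjust the device names to yours):
dmesg | grep -i -e scsi -e lbolt              # recent kernel/console I/O errors
umount /home
fsck -F vxfs -o full -y /dev/vgraid/rhome     # full structural check on the raw device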
Check this out:
http://us-support2.external.hp.com/cki/bin/doc.pl/sid=2d1abe5919d9550f85/screen=ckiDisplayDocument?docId=200000024613750
http://us-support2.external.hp.com/cki/bin/doc.pl/sid=2d1abe5919d9550f85/screen=ckiDisplayDocument?docId=200000050087829
http://us-support2.external.hp.com/cki/bin/doc.pl/sid=2d1abe5919d9550f85/screen=ckiDisplayDocument?docId=200000035869465
HTH,
Shiju
01-02-2002 12:10 PM
Re: #$^%@# SCSI drive!!
Technical Knowledge Base document #GLPKBRC00002290 notes the following:
/begin_quote/
Message: 021
WARNING: msgcnt x: vxfs: mesg 021: vx_mountsetup - mount_point file system validation failure
Explanation
When a VERITAS File System is mounted, the structure is read from disk. If the file system is marked clean, the structure is correct and the first block of the intent log is cleared. If there is any I/O problem or the structure is inconsistent, the kernel sets the VX_FULLFSCK flag and the mount fails. If the error isn't related to an I/O failure, this may have occurred because a user or process has written directly to the device or used fsdb to change the file system.
Action
Check the console log for I/O errors. If the problem is a disk failure, replace the disk. If the problem is not related to an I/O failure, find out how the disk became corrupted. If no user or process is writing to the device, report the problem to your customer support organization. In either case, unmount the file system and use fsck to run a full structural check.
/end_quote/
Thus, make sure no user or process is writing to the filesystem, and make *especially* sure no one is running 'fsdb'.
Since you note 'lbolt' errors in your syslog, it would also be worthwhile to correlate those to the disk(s) that map to this filesystem, although these may be relatively benign.
Regards!
...JRF...
01-02-2002 12:16 PM
Re: #$^%@# SCSI drive!!
scb ->cdb: 28 00 01 ad dc c8 00 00 08 00
scb ->cdb: 4d 00 40 00 00 00 00 04 00 00
SCSI: Resetting SCSI -- lbolt: 55278, bus: 0
SCSI: reset detected -- lbolt: 55278, bus: 0
Secondly, both /download and /home are created in /dev/vgraid. I set the /home timeout high because of past problems, but I don't know what /download is set to. As for patches, I ran the custom patch manager, so it grabbed everything HP thought the server needed. Nothing has changed with the cables, but I would expect that to affect both /home and /download.
Lastly, on the server, mount shows:
/download on /dev/vgraid/download delaylog,nodatainlog (same for /home)
Please let me know if you need anything else. Thanks!
Mark
01-02-2002 12:24 PM
Re: #$^%@# SCSI drive!!
Mark
01-02-2002 12:33 PM
Re: #$^%@# SCSI drive!!
I don't suppose you have any sort of error logging on your RAID? I would try a couple of things at this point in an attempt to separate the hardware problems from filesystem problems.
dd if=/dev/vgraid/rlvol1 (or whatever) bs=256k of=/dev/null - this is a read-only operation and thus safe. If that passes, repeat it for the other lvols in the VG. If those pass, then the underlying I/O is probably okay.
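For example, a read-only sweep over the whole VG might look like this (the raw device names are assumptions; adjust to your layout):
for lv in /dev/vgraid/r*
do
    echo "reading $lv"
    dd if=$lv bs=256k of=/dev/null || echo "read failure on $lv"
done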
I would then be tempted to create a new /home filesystem and restore from backup.
Finally, one last thought: is your RAID powered on and 'READY' or 'ONLINE' before you boot your server? You just might have a timing issue or transients on the bus otherwise.
01-02-2002 12:44 PM
Re: #$^%@# SCSI drive!!
I am running the dd command and will see what it does. The RAID is normally powered on and ready; in this last case, the server rebooted after the patch install, so there should not have been a change in state for the RAID.
If nothing else works, I could create the new fs, but will that solve the problem? Everything I can think of concerning a bad RAID would affect both /home and /download, and yet I have never (repeat, never) had an issue with /download, while /home has crapped out on me, crashed, and caused me all sorts of trouble over the last year or so (both with 11.0 and 10.20). Both are referenced (to the best of my knowledge) the same way on the server (/etc/fstab, for example). So why does one not behave?
Mark
01-02-2002 12:53 PM
Re: #$^%@# SCSI drive!!
The results of dd if=/dev/vgraid/home bs=256k of=/dev/null were:
dd read error: Invalid argument
56251+0 records in
56251+0 records out
The file system, from bdf, is 25.6 GB, with 1665172 KB used and 8380236 KB free.
I'll run the same dd on /download to see what that does.
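(For reference: 56251 records x 256 KB is roughly 14.7 GB, so the read stopped partway into the volume. One caveat: dd can also report "Invalid argument" at end-of-device when the device size isn't a multiple of the block size, so comparing against the LV size from lvdisplay would show whether the failure is at the end or genuinely mid-volume.)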
Mark
01-02-2002 12:54 PM
Re: #$^%@# SCSI drive!!
01-02-2002 01:00 PM
Solution
After this, there is a rather good chance that making a new filesystem will fix you. I think you have a very subtle fs problem, or you have not done a full fsck and the log replays are not really clearing your problem. There have been a number of VxFS patches that corrected various seldom-seen problems, and you may be one of the lucky few. I would run a newfs and then restore /home from backup.
01-02-2002 01:05 PM
Re: #$^%@# SCSI drive!!
Just for reference, the dd on /download showed no problems. So let's assume that I need to go ahead and remake the fs.
I'm going to search for how to do it, but I do have a question about what to do with the old one. Do I need to dump it first and then make the new one, calling it /home? Do I keep the old one around? Also, will I need to redo all the files like /etc/fstab, where it lists /home, or will it just look to the new one? Needless to say, this whole thing makes me a little nervous (mass restoration of hard drives). Any advice is welcome. Thanks!
Mark
01-02-2002 01:06 PM
Re: #$^%@# SCSI drive!!
Do they by any chance have a lot of links inside? What happens if you unmount /home or /download (whichever is easier) and mount it back manually?
Also, what do you have in /etc/rc.log and syslog.log? Do you see any errors other than the ones you mentioned?
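For example (assuming the usual device names):
umount /home
mount -F vxfs /dev/vgraid/home /home    # watch the console/syslog for errors while it mounts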
-Sri
01-02-2002 01:10 PM
Re: #$^%@# SCSI drive!!
I looked through rc.log and syslog, and there was nothing new. In rc.log, it says that /home is corrupted and needs to be checked. There are a few items where a command was not found, but that is because it lives under /home, which is not mounted at that point. I end up mounting /home myself, but I just use mount /home or mount -a. Should I type in the entire line?
mark
01-02-2002 01:20 PM
Re: #$^%@# SCSI drive!!
This is rather easy (it assumes a vxfs filesystem):
1) Backup existing home directory:
cd /home
tar cvf /dev/rmt/0m .
where /dev/rmt/0m is your tape drive
then
tar tvf /dev/rmt/0m
to list the contents of the backup and confirm that you have a good backup.
You can also use cpio if you like (or fbackup).
P.S. You might want to backup both /home and /download to protect yourself from yourself.
2) cd /
umount /home
3) newfs -F vxfs /dev/vgraid/rlvol1 (or whatever the current raw device for this filesystem is; be sure).
4) mount /home (since you haven't messed with /etc/fstab; no changes are needed).
5) cd /home
tar xvf /dev/rmt/0m
That should have you back in business. You can then do an exportfs -a and that should get your NFS stuff fixed as well.
Regards, Clay
01-02-2002 01:33 PM
Re: #$^%@# SCSI drive!!
I say that all the time. ; )
Another thought:
I had a similar situation a while back (and said "#$^%@# SCSI drive!!" many times), which turned out to be some sort of latent corruption on the drive.
I backed up the contents of the drive, ran mediainit on it, and restored the contents.
No more problem.
Could be another possibility to consider.
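For a plain SCSI disk that would be something like (the device path here is just a placeholder, and mediainit destroys everything on the disk):
mediainit /dev/rdsk/c0t5d0    # WARNING: wipes the disk; restore from backup afterwards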
Good Luck and Happy New Year.
Kel
01-02-2002 01:33 PM
Re: #$^%@# SCSI drive!!
Hate to be a bother, but one point of clarification, please.
I umount /home and then do a newfs -F vxfs /dev/vgraid/???? Do I use rhome here, and does that effectively destroy the old rhome? Or do I use something like rlvol1, and then just reference that when I mount /home (/dev/vgraid/lvol1 /home)? Does it really matter?
Then, when I mount /home again, there should be nothing in it, right? Thus, restore from tape.
Thanks!
mark
01-02-2002 01:48 PM
Re: #$^%@# SCSI drive!!
What I meant was to manually unmount the file system and mount it back to see if there are any errors. Some more possibilities:
1. The disks (LUNs) used by /home and /download may be shared across multiple systems. Particularly in a SAN environment, the possibility increases due to LUN security issues. What kind of backend storage do you have?
2. The logical volume device files shouldn't be accessible to others, so that ordinary users have no chance of corrupting them by hand (a quick check is sketched below).
3. Do they see a lot of activity, and are the reboots completing normally? Heavily active file systems can end up in this kind of situation when some homegrown, stubborn process refuses to exit and goes zombie; sometimes we lose patience with the reboot and power the server off before it has closed the logical volumes and flushed the buffers.
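For point 2, a quick check (expect root:sys ownership and no world access on the lvol device files):
ll /dev/vgraid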
-Sri
01-02-2002 02:00 PM
Re: #$^%@# SCSI drive!!
You do a newfs -F vxfs /dev/vgraid/rhome, and yes, your old filesystem is toast. That is why it is essential that you have a good backup before doing this. Your other option is to create an entirely new logical volume and mount it as /newhome. Then copy everything from /home to /newhome. You would then umount /home and umount /newhome. Finally, modify /etc/fstab, changing the lvol for /home, and then mount /home. This would leave your old home filesystem intact but unmounted. This is the safest way to do it but does require available disk space.
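A sketch of that safer route (the sizes and names below are placeholders):
lvcreate -L 25600 -n homenew /dev/vgraid                   # new lvol, size in MB
newfs -F vxfs /dev/vgraid/rhomenew                         # newfs runs on the raw device
mkdir /newhome
mount -F vxfs /dev/vgraid/homenew /newhome                 # mount uses the block device
cd /home && find . -depth -print | cpio -pdmu /newhome     # preserves permissions and timestamps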
01-03-2002 12:14 PM
Re: #$^%@# SCSI drive!!
Mark