Operating System - HP-UX

Matt Hearn
Regular Advisor

Repairable, recurring errors on filesystem

Greetings!

On an HP-UX 11.0 system running SAP, one of the sapdata filesystems became corrupted for no discernible reason. fsck was unable to repair it, so I was forced to do a newfs and begin a restore.

I also checked all the disks: I looked for stale extents, checked the ioscan output for "NO_HW" entries, exercised the disks through STM, and even dd'd the entire disk into /dev/null, hoping for an I/O error. I found nothing. (The filesystem is unmirrored.)
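For anyone repeating the dd read test described above, here is a minimal sketch of how it could be wrapped in a shell function. The function name read_check and the device name in the final comment are hypothetical, not taken from this system; only the standard dd operands if=, of= and bs= are used.

```shell
# Read every block of a device (or file) and throw the data away.
# An unreadable block should show up as a non-zero exit status from dd.
read_check() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=1024k 2>/dev/null; then
        echo "read OK: $dev"
    else
        echo "read FAILED: $dev"
        return 1
    fi
}

# Hypothetical device name; substitute each physical volume in turn.
# read_check /dev/rdsk/c1t2d0
```

A clean pass only shows that the blocks were readable at that moment, which matches what happened here: the read test came back clean and the errors still recurred.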

I had the SAP folks start restoring their data (using brrestore from a DLT7000), but they ran into tape drive problems. I BELIEVE these are unrelated, but I could be wrong: the tape drive in question shares a dual-port SCSI card with the drives attached to the lvol that got corrupted. After we resolved the tape issue, they began having other restore problems, getting this error:

"/oracle/JB1/sapdata1/JB1_TMP/sapreorg/btabd.data10: No such device or address"

I unmounted the device to fsck it, and got this error:
vxfs fsck: file system had I/O error(s) on user data.

The error is fixable by fsck -o full. This has happened a few times now. At the moment, the SAP folks are restoring into a different directory and it seems to be working.

Is it possible that bad data on the tape is somehow corrupting the filesystem? Or should I be taking a close look at the disks? Thanks!
2 REPLIES
Steven E. Protter
Exalted Contributor

Re: Repairable, recurring errors on filesystem

There is an I/O problem happening here: cable, disk, or bus.

I'd test the hardware using mstm, cstm, or xstm. I would also bring out HP to look hard at the hardware.

I have seen this in the past. It's tough to track down, but it won't stop happening until the cause is uncovered.

The tape drive should NOT share a SCSI card with disks. The tape can't get the throughput it needs unless it has its own SCSI card. I suspect this setup could contribute to the problem if that SCSI card is the same one the disks are on.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Matt Hearn
Regular Advisor

Re: Repairable, recurring errors on filesystem

The tape and disks shouldn't share a card, even if it's a dual-port card? They aren't chained together; they're on different hardware paths (0/4/0/0.5.0 for the tape drive and 0/4/0/1.*.0 for the disks).
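To make the comparison of the two hardware paths concrete, here is a small sketch that splits them apart. It assumes the usual reading of these paths, where everything before the first dot is the SCSI bus (one port of the card) and what follows is target and LUN; the function names hw_bus and hw_card are made up for illustration.

```shell
# Strip everything from the first "." onward; what remains is the bus path.
hw_bus() {
    printf '%s\n' "${1%%.*}"
}

# Strip the last "/" component of the bus path to get the card it sits on.
hw_card() {
    bus=$(hw_bus "$1")
    printf '%s\n' "${bus%/*}"
}

hw_bus  '0/4/0/0.5.0'    # prints 0/4/0/0  (tape drive's bus)
hw_bus  '0/4/0/1.*.0'    # prints 0/4/0/1  (disks' bus)
hw_card '0/4/0/0.5.0'    # prints 0/4/0
hw_card '0/4/0/1.*.0'    # prints 0/4/0
```

Under that assumption, the tape and the disks are on separate buses (0/4/0/0 and 0/4/0/1) but both hang off the same card at 0/4/0.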

I do have our hardware vendor looking at the disks now. We have replaced the dual SCSI card and the cabling to the tape drive already; depending on what he sees happening with the disks, I may have him swap the cabling to the disk arrays.

Thanks!