Inode corruption?

 
SOLVED
Nobody's Hero
Valued Contributor

Does anyone have any idea what this error message means? I'm running HP-UX 11.11 on an older K-class with a PeopleSoft application server. I found the message in dmesg:

msgcnt 228 vxfs: mesg 008: vx_direrr: vx_readdir2_1 - /app file system inode 13 block 23 error 6

I don't see any stale inodes in the lvol.
UNIX IS GOOD
7 REPLIES 7
Florian Heigl (new acc)
Honored Contributor

Re: Inode corruption?

I can't explain it exactly, but I'd recommend you take the following steps soon:

umount the filesystem
take a full backup via the raw device
dd if=/dev/vgNN/rlvolMM bs=64k conv=noerror | gzip -c - > /var/adm/crash/backup_vgNN_lvolMM
(conv=noerror makes dd continue past bad blocks, so you definitely get all the good data collected in case there is an issue with the hardware)

then go ahead and try to fsck the filesystem:

fsck -Fvxfs -o full,nolog /dev/vgNN/rlvolMM

everything fixed? -> OK, remount, test the app

still errors when doing a second fsck run?
if time is critical and the LV is small, try to restore from the tape backup
otherwise open a call with HP's competency center to let them debug the filesystem.
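
The sequence above could be sketched as a dry-run helper that only prints the commands, so they can be reviewed before anything touches the volume. The function name and its three arguments are hypothetical, and the final "mount" line assumes the filesystem has an /etc/fstab entry; the dd and fsck lines are the ones from the steps above.

```shell
# Hypothetical dry-run helper: prints the commands, runs nothing.
# $1 = mount point, $2 = volume group, $3 = logical volume (placeholders).
print_recovery_plan() {
    mnt=$1
    vg=$2
    lv=$3
    raw="/dev/$vg/r$lv"          # raw (character) device of the lvol

    echo "umount $mnt"
    echo "dd if=$raw bs=64k conv=noerror | gzip -c - > /var/adm/crash/backup_${vg}_${lv}"
    echo "fsck -Fvxfs -o full,nolog $raw"
    echo "mount $mnt"            # assumption: mount point is listed in /etc/fstab
}
```

Something like "print_recovery_plan /app vgNN lvolMM" would then list the four commands in order for a final check before running them by hand.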
yesterday I stood at the edge. Today I'm one step ahead.
Florian Heigl (new acc)
Honored Contributor

Re: Inode corruption?

uh, err... missed points:

- check the backup taken (e.g. with md5sum)
- before restoring the tape backup: newfs!
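
A minimal sketch of the "check the backup" step, assuming md5sum may not be installed and using the POSIX cksum utility instead. Both helper names are hypothetical: one records a checksum next to the image right after it is written, the other confirms later that the image is unchanged.

```shell
# Hypothetical helper: store the checksum of image $1 in "$1.cksum".
record_cksum() {
    cksum < "$1" > "$1.cksum"
}

# Hypothetical helper: succeed only if image $1 still matches "$1.cksum".
verify_cksum() {
    [ "$(cksum < "$1")" = "$(cat "$1.cksum")" ]
}
```

Since the image is gzip-compressed, running "gzip -t" on it is a further cheap check that the stream itself is intact. Neither check proves the image matches the disk, only that the file has not changed since it was written.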
yesterday I stood at the edge. Today I'm one step ahead.
generic_1
Respected Contributor

Re: Inode corruption?

I think you had a read error on the disk. Take a look at the disks in the related VG in cstm or stm and see whether any errors show up; disk errors should be listed there. Also check /var/adm/tombstones and /var/adm/syslog.
Nobody's Hero
Valued Contributor

Re: Inode corruption?

No hardware errors.
I guess I'll dump the FS and run an fsck.
UNIX IS GOOD
Sreedhar Nathani
Valued Contributor

Re: Inode corruption?

Hi Robert,

An inode marked bad means that the system ran into a problem when accessing the inode; since it could not access the inode, it marked it bad.

When you umount and run fsck, this inode will be deleted from the filesystem.

First find out what this inode is and which data it contains. If you can copy the file to some other name, then copy it, umount and fsck.
# ls -li /app should be enough, because the inode number is very low

If you are sure there is no problem and you can access the file, then you can mark this inode good again with the fsdb command.

ex:
# fsdb
>13 i
>af=0
>quit
** you will see "aflags 1" if the inode is marked bad, and "aflags 0" if it is marked good.

After that, unmount/fsck/mount.
You won't lose that file.

Hope this helps

Bill Hassell
Honored Contributor
Solution

Re: Inode corruption?

The message: msgcnt 228 vxfs: mesg 008: vx_direrr: vx_readdir2_1 - /app file system inode 13 block 23 error 6

ends with the error number, and in this case, the error is an errno value, the classic Unix error code. From the include file and man page for errno:

from /usr/include/sys/errno.h:
#define ENXIO 6 /* No such device or address */

from man 2 errno:
[ENXIO] No such device or address. I/O on a special file refers to a subdevice that does not exist, or is beyond the limits of the device. It can also occur when, for example, a tape drive is not on line or no disk pack is loaded on a drive.
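
The lookup above can be done from the shell as well. This is only a sketch: the function name is hypothetical, and it assumes the header uses plain "#define NAME number" lines (it also tolerates whitespace between "#" and "define"). The header path defaults to the one quoted above and can be overridden with a second argument.

```shell
# Hypothetical helper: print the symbolic name for a numeric errno.
# $1 = errno number, $2 = header to search (optional).
errno_name() {
    awk -v n="$1" '
        { sub(/^#[ \t]*/, "#") }                  # normalise "#  define" to "#define"
        $1 == "#define" && $3 == n { print $2; exit }
    ' "${2:-/usr/include/sys/errno.h}"
}
```

With the header line quoted above, "errno_name 6" would print ENXIO. It prints nothing if no define with that number is found.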

So it would appear that the disk drive went offline. You can verify this by looking in /var/adm/syslog/syslog.log to see if there are disk error messages.

Now this could be caused by someone doing an lvreduce on the filesystem. lvreduce is a critically dangerous command as it simply cuts off the end of the volume. It knows nothing about filesystems and does no error checking since an lvol might be raw or swap and not a real filesystem. Recovery from this is possible as long as you know the previous size of the lvol. Just use lvextend to return the lvol to the original size.


Bill Hassell, sysadmin
Alessandro Pilati
Esteemed Contributor

Re: Inode corruption?

Robert, I found these hints for you:

A directory operation failed in an unexpected manner. The mount point, inode
and block number identify the failing directory. If the inode is an
immediate directory, the directory entries are stored in the inode, so no
block number is reported. If the error is ENOENT or ENOTDIR, an
inconsistency was detected in the directory block. This inconsistency could
be a bad free count, a corrupted hash chain or any similar directory
structure error. If the error is EIO or ENXIO, an I/O failure occurred while
reading or writing the disk block.

A full fsck is required to fix the structure of the file system.

In this example, the error number is "6", which, when looked up in
/usr/include/sys/errno.h, translates to:

#define ENXIO 6 /* No such device or address */

Very often ENXIO is caused by intermittent hardware problems and those
should be investigated before taking any corrective action with the file
system.
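
As a rough sketch of how the description above could be applied to a dmesg line: the two helper names are hypothetical, the numeric values are the usual errno numbers for ENOENT (2), EIO (5), ENXIO (6) and ENOTDIR (20), and the pattern assumes the message has the same layout as the one in this thread. It will not match the immediate-directory case, where no block number is reported.

```shell
# Hypothetical helper: print "mountpoint inode block errno" for a vx_direrr line.
parse_direrr() {
    printf '%s\n' "$1" | sed -n \
        's/.* - \(.*\) file system inode \([0-9][0-9]*\) block \([0-9][0-9]*\) error \([0-9][0-9]*\).*/\1 \2 \3 \4/p'
}

# Hypothetical helper: classify the errno the way the description above does.
classify_errno() {
    case "$1" in
        2|20) echo "directory structure inconsistency (ENOENT/ENOTDIR)" ;;
        5|6)  echo "I/O failure (EIO/ENXIO)" ;;
        *)    echo "other" ;;
    esac
}

msg='msgcnt 228 vxfs: mesg 008: vx_direrr: vx_readdir2_1 - /app file system inode 13 block 23 error 6'
fields=$(parse_direrr "$msg")        # "/app 13 23 6"
classify_errno "${fields##* }"       # last field is the errno
```

For the message in this thread the classification comes out as an I/O failure, which is why the hardware should be checked before the file system is touched.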

Check also this link:
http://www.sunmanagers.org/pipermail/sunmanagers/2002-November/018281.html

Anyway, run a full fsck.

Rgds,
Alex
if you don't try, you'll never know if you are able to