Operating System - HP-UX

Disk failure or filesystem corruption

 
kenny chia
Regular Advisor


Hi
I have a workstation with the following problems, and I'm not sure whether it is a disk failure or filesystem corruption:

1) /dev/vg00/lvol6 cannot be mounted on /tmp during bootup

2) Tried running fsck on /dev/vg00/lvol6
# fsck -F hfs /dev/vg00/lvol6
** /dev/vg00/lvol6
** Last Mounted on /tmp
** Phase 1 - Check Blocks and Sizes

CANNOT READ: BLK 32
CONTINUE? n

Program terminated

3) Tried reading the lvol itself:
# dd if=/dev/vg00/lvol6 of=/dev/null
dd read error: I/O error
64+0 records in
64+0 records out

Thanks for any advice..
All Your Bases Are Belong To Us!
Bill Hassell
Honored Contributor
Solution

Re: Disk failure or filesystem corruption

fsck is reporting a disk read error--it cannot correct any disk I/O problems, only directory structure issues. dd proves that you have a bad disk. You'll need to replace the disk, or try using mediainit to get the disk to relocate the bad blocks, and then recover the disk from your Ignite/UX backup.


Bill Hassell, sysadmin
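
A minimal sketch of the read test described above: if a plain sequential read of the device fails, the problem sits below the filesystem and fsck cannot repair it. The function name read_check and the 1 KB block size are illustrative choices, not part of the original advice.

```shell
# read_check PATH: read PATH from start to end and report whether the read succeeded.
# Assumption: dd exits non-zero when it hits a read error (or cannot open PATH at all).
read_check() {
    if dd if="$1" of=/dev/null bs=1024 2>/dev/null; then
        echo "readable: $1"
    else
        echo "read error: $1"
    fi
}

# Example (device name taken from the original post):
#   read_check /dev/vg00/lvol6
```

A "read error" result here only says the read did not complete; it does not say which physical disk is at fault.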
KCS_1
Respected Contributor

Re: Disk failure or filesystem corruption

Hi,

Have a look at the output of the following commands.

# vgdisplay -v vg00

This shows how many LVs are in vg00.

# strings /etc/lvmtab

This shows which disks are in vg00.

# diskinfo -v /dev/dsk/cXtYdZ

If this returns no information about the specified disk, the disk is at fault.

# pvdisplay -v /dev/dsk/cXtYdZ

Here cXtYdZ is a disk in vg00. This shows how data is allocated across vg00, especially for /dev/vg00/lvol6.

# lvdisplay -v /dev/vg00/lvol6

Check whether the status is current or stale.

From these outputs you can tell whether you have filesystem corruption or a failed disk.

Easy going at all.
kenny chia
Regular Advisor

Re: Disk failure or filesystem corruption

Hi Patrick
I've tried all of those commands, but none of them shows anything unusual. Syslog has no errors either. Possible reasons:

1) lvol6 was not mounted
2) Nothing was therefore read from or written to lvol6, so no read/write errors were detected.
All Your Bases Are Belong To Us!
KCS_1
Respected Contributor

Re: Disk failure or filesystem corruption

Ok.

Oops, I didn't read your message carefully!

>> # diskinfo -v /dev/dsk/cXtYdZ
If, no information about specified disk, It was disk fault.<<

Does this command produce no output?

For example:

# strings /etc/lvmtab
/dev/vg00
/dev/dsk/c0t6d0
/dev/dsk/c1t0d2

# diskinfo -v /dev/rdsk/c0t6d0

What does this show?


Easy going at all.
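
diskinfo expects the character (raw) device file, which is why the example above uses /dev/rdsk while lvmtab lists /dev/dsk paths. A small sketch of that conversion follows; the helper name to_raw is made up for illustration.

```shell
# to_raw: read device paths on stdin and print the raw (character) device for each disk.
# Lines that are not /dev/dsk/... paths (such as the /dev/vg00 line) are skipped.
to_raw() {
    sed -n 's|^/dev/dsk/|/dev/rdsk/|p'
}

# Example: query every disk listed in lvmtab.
#   strings /etc/lvmtab | to_raw | while read -r d; do diskinfo -v "$d"; done
```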
kenny chia
Regular Advisor

Re: Disk failure or filesystem corruption

hmmm
nothing abnormal?

# diskinfo -v /dev/rdsk/c0t6d0
SCSI describe of /dev/rdsk/c0t6d0:
vendor: IBM
product id: DDRS-34560WS
type: direct access
size: 4194157 Kbytes
bytes per sector: 512
rev level: HP01
blocks per disk: 8388314
ISO version: 0
ECMA version: 0
ANSI version: 2
removable media: no
response format: 2
(Additional inquiry bytes: (32)44 (33)37 (34)34 (35)32 (36)35 (37)31 (38)30 (39)0
(40)0 (41)0 (42)0 (43)0 (44)0 (45)0 (46)0 (47)0 (48)0 (49)0
(50)0 (51)0 (52)0 (53)0 (54)0 (55)0 (56)0 (57)0 (58)0 (59)0
(60)0 (61)0 (62)0 (63)0 (64)0 (65)0 (66)0 (67)0 (68)0 (69)0
(70)0 (71)0 (72)0 (73)0 (74)0 (75)0 (76)0 (77)0 (78)0 (79)0
(80)0 (81)0 (82)0 (83)0 (84)0 (85)0 (86)0 (87)0 (88)0 (89)0
(90)0 (91)28 (92)43 (93)29 (94)20 (95)43 (96)6f (97)70 (98)79 (99)72
(100)69 (101)67 (102)68 (103)74 (104)20 (105)49 (106)42 (107)4d (108)20 (109)43
(110)6f (111)72 (112)70 (113)2e (114)20 (115)31 (116)39 (117)39 (118)37 (119)2e
(120)20 (121)41 (122)6c (123)0 (124)7f (125)fe (126)da (127)0 (128)0 (129)2
(130)0 (131)0 (132)0 (133)0 (134)0 (135)0 (136)0 (137)0 (138)0 (139)0
(140)0 (141)0 (142)0 (143)0 (144)0 (145)0 (146)0 (147)44 (148)44 (149)52
(150)53 (151)2d (152)33 (153)34 (154)35 (155)36 (156)30 (157)57 (158)53 )
#
All Your Bases Are Belong To Us!
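
For anyone reading the diskinfo output above: the additional inquiry bytes are hex values, and the size line follows from the block count. The sketch below decodes bytes 147 to 158 and cross-checks the size. The helper name hex_to_ascii is illustrative only.

```shell
# hex_to_ascii: print the ASCII characters for a list of hex byte values.
hex_to_ascii() {
    for h in "$@"; do
        # hex -> decimal -> three-digit octal escape, which printf can emit portably
        printf "\\$(printf '%03o' "$((0x$h))")"
    done
    echo
}

# Inquiry bytes 147-158 from the output above.
hex_to_ascii 44 44 52 53 2d 33 34 35 36 30 57 53    # prints DDRS-34560WS

# Size cross-check: 8388314 blocks of 512 bytes, i.e. two blocks per Kbyte.
echo $((8388314 / (1024 / 512)))                    # prints 4194157
```

Both results match the "product id" and "size" lines, so the drive is answering SCSI inquiries normally. That says nothing about whether every block on the media is readable.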
Michael Tully
Honored Contributor

Re: Disk failure or filesystem corruption

Are there any messages like SCSI read errors in your /var/adm/syslog/syslog.log file?

If so, this will explain the reason. A bad block report may produce a file under the lost+found directory. Even so, I suspect that the disk has a problem, so I wouldn't fool around with it: take a good backup and get the disk replaced while you have the opportunity to do so.
Anyone for a Mutiny ?
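
A quick way to search the log for such messages is sketched below. The keyword list is an assumption about what disk errors tend to look like and is not exhaustive; scan_log is a made-up helper name.

```shell
# scan_log FILE: print lines in FILE that look like disk trouble.
# Assumed keywords: "scsi", "lbolt" and "i/o error", matched case-insensitively.
scan_log() {
    grep -i -E 'scsi|lbolt|i/o error' "$1"
}

# Example:
#   scan_log /var/adm/syslog/syslog.log
```

No output means only that none of these keywords appeared, not that the disk is healthy.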
kenny chia
Regular Advisor

Re: Disk failure or filesystem corruption

Hi Michael
I can't find any errors in the logs except /etc/rc.log (which states that it can't mount /tmp due to fsck error)

I suspect that a hardware reboot occurred before anything could be written to syslog
All Your Bases Are Belong To Us!
T G Manikandan
Honored Contributor

Re: Disk failure or filesystem corruption

Some blocks on the disk are corrupted.

Since this is a vg00 disk, replace it and recover using Ignite.

I am not sure whether you have a recovery tape ready.

Later you can run mediainit on the bad disk and recreate the file system on it.

Steven E. Protter
Exalted Contributor

Re: Disk failure or filesystem corruption

I've read this carefully and gone over it on and off for about half an hour.

Ignite seems to be the only hope for recovering this system.

I would think that dmesg would show an lbolt or something, but maybe that will happen later.

If you have a recent make_tape_recovery tape get ready to use it. If not, get what you can off the workstation and prepare to cold install it or Ignite it from your system image.

SEP
Steven E Protter
T G Manikandan
Honored Contributor

Re: Disk failure or filesystem corruption

What is the size of your /tmp file system? Is it mounted on lvol6, and is that what is causing the problem?

Please reply with the details.
Bill Hassell
Honored Contributor

Re: Disk failure or filesystem corruption

You probably aren't seeing any errors in syslog because /tmp is no longer being mounted, and subsequent reboots (or cron jobs) have wiped out the original disk errors. You aren't seeing any new disk errors because lvol6 is no longer mounted and therefore not being accessed. Run the dd command a couple more times, always using if=/dev/vg00/rlvol6 (not lvol6) to bypass the buffer cache. Then look in syslog.log and you'll likely see the errors.

NOTE: your system probably seems to run OK without /tmp mounted because the mountpoint directory itself is being used for storage! This means that the root filesystem / may be close to full, which is not good. Keep /tmp clean until you can make a complete make_tape_recovery backup and order a new disk. I would not use the mediainit method because you'll have nothing to go back to if something goes wrong with the backup tape.


Bill Hassell, sysadmin
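
To keep an eye on how full / is while /tmp lives on the root filesystem, something like the sketch below may help. It assumes df accepts the POSIX -P option on this release (bdf reports similar information), and pct_used is a made-up helper name.

```shell
# pct_used: read `df -P` output on stdin and print the capacity figure
# (5th column) of the first filesystem listed, without the % sign.
pct_used() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   df -P / | pct_used
```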
kenny chia
Regular Advisor

Re: Disk failure or filesystem corruption

Hi Bill

# dd if=/dev/vg00/rlvol6 of=/dev/null bs=1024
dd read error: I/O error
32+0 records in
32+0 records out

the strange thing is that nothing appears in dmesg or syslog

thanks
All Your Bases Are Belong To Us!
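
Worth noting: both dd runs in this thread stop at the same place. The arithmetic is sketched below, with fail_offset as an illustrative helper name. If fsck counts blocks in 1 KB units (an assumption), this is the same spot as its "CANNOT READ: BLK 32".

```shell
# fail_offset RECORDS BS: byte offset at which dd stopped,
# given the number of full records read and the block size in bytes.
fail_offset() {
    echo $(( $1 * $2 ))
}

fail_offset 64 512     # first run, dd's default 512-byte blocks: prints 32768
fail_offset 32 1024    # second run, bs=1024: prints 32768
```

A failure at the same offset (32768 bytes, or 32 KB into the lvol) through both the block and the raw device points at one bad spot on the media rather than a transient fault.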
Bill Hassell
Honored Contributor

Re: Disk failure or filesystem corruption

Sounds like diagnostics are not loaded or running properly, or syslogd isn't running. To test syslog:

logger -p user.warn "This is a test"

But since dd is failing with an I/O error, get the data off the disk and replace it ASAP.


Bill Hassell, sysadmin
kenny chia
Regular Advisor

Re: Disk failure or filesystem corruption

The logger command works
and
# ps -ef | grep dia
root 1341 1081 0 Oct 13 ? 0:00 diaglogd
root 1081 1 0 Oct 13 ? 0:03 /usr/sbin/stm/uut/bin/sys/diagmond
#

anyway, I have logged a call and they will replace the drive

Thanks to all!
All Your Bases Are Belong To Us!
twang
Honored Contributor

Re: Disk failure or filesystem corruption

4 simple steps to replace a non-mirrored disk:
# vgcfgrestore -n /dev/vgXX /dev/rdsk/cxtxdx
# vgchange -a y /dev/vgXX
(create a FS for every LV on the PV)
# newfs -F vxfs /dev/vgXX/rlvolx
(mount all the newly created FS)
# mount /mountpoint
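
The steps above can be laid out as a dry run that only prints the commands, so they can be reviewed before anything is executed. This is a sketch: plan_replace and its arguments are hypothetical, it assumes the restore step uses the -n option of vgcfgrestore to name the volume group, and vxfs is taken from the post and may not match your filesystems.

```shell
# plan_replace VG PV LV:MOUNTPOINT...: print the replacement commands without running them.
# Every name passed in is a placeholder supplied by the caller.
plan_replace() {
    vg=$1
    pv=$2
    shift 2
    echo "vgcfgrestore -n /dev/$vg /dev/rdsk/$pv"
    echo "vgchange -a y /dev/$vg"
    # Create a filesystem on the raw device of every LV that lived on the disk.
    for pair in "$@"; do
        echo "newfs -F vxfs /dev/$vg/r${pair%%:*}"
    done
    # Then mount each one by its mountpoint.
    for pair in "$@"; do
        echo "mount ${pair#*:}"
    done
}

# Example (hypothetical names):
#   plan_replace vgXX cxtxdx lvolx:/mountpoint
```

This covers only the four steps listed above and does not restore any data.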