Operating System - HP-UX

Re: help in diagnosing a possible disk failure

SOLVED
kevin donnelly
Advisor

help in diagnosing a possible disk failure

Our HP D350 running HP-UX 11.0 is not completing its boot.

I replaced a broken tape drive, and now the box will not boot. The last time it was shut down, it stuck at this same point; I unplugged the next-to-last drive, plugged it back in, and it booted fine.

This time, unplugging and replugging the drive only produces three beeps.

I have booted into single user mode but I don't know what to do next.

If I interrupt the boot process and have it list possible boot sources, it shows all the disk drives, the tape drive, and the CD-ROM. The drive is at least visible there, but it does not seem to respond correctly when it is time to mount the filesystems on it.

What can I do to determine if the drive really has failed or if there is something else wrong?

None of the files in /etc/lvmconf have been modified in at least 6 months. The box reboots every Sunday and has been doing so without problems. It just does not like to come back up after the power has been turned off.

Help!
Geoff Wild
Honored Contributor

A couple of things you can do...

Try dd'ing the questionable drive, either to a blank disk or to tape. If you get a failure, you might have a bad disk.

If you have a support contract with HP, I would place a service call.

Rgds...Geoff
Paula J Frazer-Campbell
Honored Contributor

Kevin

Two points

1. Power down, unplug the tape drive and reboot. If it sticks again, the tape drive is not at fault. Have a look at dmesg and syslog (/var/adm/syslog/syslog.log); the offending device should be logged there.

2. Why do you reboot once a week? This is a Unix server, and reboots are normally only carried out for patching or software installs.

Search the forum for how often servers are rebooted and you will find that "not often" is the answer. Unix is not, and never will be, Windoze (reboot twice a day).


Paula
S.K. Chan
Honored Contributor
Solution

At least you're able to boot in single-user mode, which gives us something to work on. All your disks in vg00 must be fine, since booting into single-user mode activates the root VG. Mount all your filesystems on vg00 first, then run ..
# ioscan -fnC disk
.. take note of all the other non-vg00 disks (i.e. match them with what's in /etc/lvmtab). Check whether they respond to diskinfo.
# /etc/diskinfo /dev/rdsk/cXtXdX
If they do, the next step is to manually activate all your other VGs.
# vgchange -a y vg01
# vgchange -a y vg02
Post any errors you see. If a disk does not respond to diskinfo, you can run further diagnostics (i.e. the exerciser) on it using "cstm".
# cstm
cstm> map
cstm> sel dev
cstm> exc
cstm> map
==> Check with the map command once in a while to make sure the exerciser eventually completes.
cstm> eal
cstm> efl
cstm> einf
Update this thread with your findings.
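A small POSIX sh sketch for the diskinfo step above. It assumes the usual HP-UX layout, where /etc/lvmtab records block device files (/dev/dsk/...) while diskinfo expects the matching raw (character) device under /dev/rdsk. Both function names, `raw_dev` and `check_lvm_disks`, are hypothetical helpers, not HP-UX commands.

```shell
# raw_dev: hypothetical helper. Maps a block device file such as
# /dev/dsk/c1t2d0 (as listed in /etc/lvmtab) to its raw counterpart
# /dev/rdsk/c1t2d0, which is what diskinfo expects. Other paths pass through.
raw_dev() {
    echo "$1" | sed 's|^/dev/dsk/|/dev/rdsk/|'
}

# check_lvm_disks: hypothetical helper. Runs diskinfo against every disk
# recorded in /etc/lvmtab. A disk that makes diskinfo hang (as in this
# thread) will stall the loop, so run it from the console and watch it.
check_lvm_disks() {
    for d in $(strings /etc/lvmtab | grep '^/dev/dsk/'); do
        echo "== $d"
        /etc/diskinfo "$(raw_dev "$d")"
    done
}
```

Because a "== device" header is printed before each diskinfo call, if one disk hangs you at least know which one it was.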
kevin donnelly
Advisor

What would be the command to dd the drive? If I can't do it in SAM, my knowledge is pretty thin.

Also, how can I mount some of the other filesystems when in single-user mode?

When I try mount /usr or mount /oracle, I get a message that the volume manager device file does not exist. I must need to start something to get LVM running so I can mount the partitions.
kevin donnelly
Advisor

I ran the ioscan -fnC disk command and all the disks were identified.

How do I view the /etc/lvmtab file? When I cat it to the screen, the output overwrites itself, and I don't get to see the part that relates to vg02.
Ramkumar Devanathan
Honored Contributor

strings /etc/lvmtab should display the lvmtab in a readable form.

- ramd.
S.K. Chan
Honored Contributor

The command to run a dd test is ..
# dd if=/dev/dsk/cXtXdX of=/dev/null bs=32k
Are you sure you're in single-user mode, or in LVM maintenance mode? Either way, activate your vg00 first ..
# vgchange -a y /dev/vg00
and then run ..
# /sbin/mount -a
==> That will attempt to mount all your entries in /etc/fstab. You will see some errors about not being able to mount non-vg00 filesystems; ignore those errors for now.
To look at /etc/lvmtab, you would run ..
# strings /etc/lvmtab
since it's a binary file, not a text file.
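The strings output lists each volume group's directory (for example /dev/vg00) followed by the physical-volume device files that belong to it. Assuming that layout, a short awk filter can group the disks under each VG so they are easy to match against ioscan output. `lvmtab_summary` is a hypothetical helper name; lines that do not start with /dev/ are ignored.

```shell
# lvmtab_summary: hypothetical helper. Reads `strings /etc/lvmtab` output
# on stdin and prints one line per volume group with its disks.
# Assumes each VG name (/dev/vgNN) is followed by its /dev/dsk/... entries.
lvmtab_summary() {
    awk '
        # A physical volume: append it to the current VG (if any).
        /^\/dev\/dsk\// { if (vg != "") disks[vg] = disks[vg] " " $1; next }
        # Any other /dev/ path is taken to be a VG name; remember the order.
        /^\/dev\//      { vg = $1; order[++n] = vg; disks[vg] = ""; next }
        END { for (i = 1; i <= n; i++) print order[i] ":" disks[order[i]] }
    '
}
```

Usage: `strings /etc/lvmtab | lvmtab_summary` prints lines of the form `/dev/vgNN: /dev/dsk/... /dev/dsk/...`.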


John Dvorchak
Honored Contributor

Once you have activated the volume groups with vgchange, as S.K. outlines, you can mount all filesystems with the mount -a command. This reads /etc/fstab and mounts every filesystem listed there. If a filesystem is already mounted, like /stand or /, it simply moves on and mounts the rest.

To use the dd command (see man dd): let's say the suspect disk is c0t1d0, then you would run:

dd if=/dev/dsk/c0t1d0 of=/dev/null count=2000

You can increase the count number: it is how many blocks of data dd reads, and the data is just dumped to /dev/null, which is 'nowhere'. Be careful with "if=" and "of=", as they are input and output respectively. You don't want to write to the disk, just read from it.
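A minimal POSIX sh sketch of the same read test wrapped in a function. `read_test` is a hypothetical helper name, not an HP-UX command. It relies only on dd's exit status, assuming dd returns non-zero when it hits an unreadable block; a disk that hangs (like the one that made diskinfo hang later in this thread) may simply block dd instead of producing an error.

```shell
# read_test DEVICE [COUNT]
# Hypothetical helper: reads COUNT blocks of 32 KB from DEVICE and throws
# the data away. The device is only ever the INPUT (if=); nothing is written.
read_test() {
    dev=$1
    count=${2:-2000}            # number of 32 KB blocks to read
    if [ ! -r "$dev" ]; then
        echo "read_test: cannot read $dev" >&2
        return 2
    fi
    # if= is the suspect disk; of=/dev/null discards everything read.
    if dd if="$dev" of=/dev/null bs=32k count="$count" 2>/dev/null; then
        echo "OK: read $count blocks from $dev"
        return 0
    else
        echo "FAILED: read error on $dev" >&2
        return 1
    fi
}
```

For scale: with bs=32k and the default count of 2000, the function reads 2000 × 32 KB = 64,000 KB (about 62.5 MB). The count=2000 in the command above uses dd's default 512-byte block, so it reads 2000 × 512 = 1,024,000 bytes, roughly 1 MB.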
kevin donnelly
Advisor

OK, I am making progress with all your help.

I can get everything mounted except the file systems on vg02.

The drive I suspected was bad just hangs when I run the diskinfo command.

I was not able to break out. What breaks out of a command when in single-user mode? I tried Ctrl-C, Ctrl-X, Ctrl-D, Backspace, etc.

When I get the box back in single user mode I will try the cstm command and see what it says.

It looks like I need to get a new disk.

What are the steps to get the box to boot without trying to mount vg02? That VG is the only one affected by the bad disk.

kevin donnelly
Advisor

Actually the problem volume group is 01.

How do I change the boot process so that vg01 no longer is activated at boot time? It is apparently hanging on the vgchange -a y vg01 command.

Once I get the new drive, I can recreate vg01 and restore the affected filesystems from backup, hopefully.
S.K. Chan
Honored Contributor

If you're willing to blow away vg01 (since you're going to recreate it and restore the data), then, still in single-user mode and with vg01 deactivated, all you have to run is:
# /sbin/vgexport /dev/vg01
That should delete its entry from /etc/lvmtab along with all of its device files. I assume you know how to create a new vg01 later on, once the new disk is installed and everything is online?
John Poff
Honored Contributor

Hi,

You probably should also comment out the entries for vg01 in your /etc/fstab file.

JP
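A minimal sketch of commenting out the vg01 entries in /etc/fstab. It assumes each vg01 entry starts at the beginning of the line with /dev/vg01/... (the usual fstab layout). `comment_out_vg` is a hypothetical helper name. It writes through a temporary file because POSIX sed has no in-place (-i) option, and it keeps the untouched original as fstab.bak.

```shell
# comment_out_vg VG [FSTAB]
# Hypothetical helper: prefixes with '#' every fstab line whose device
# field starts with /dev/VG/ (e.g. /dev/vg01/lvol1). Keeps a backup copy.
comment_out_vg() {
    vg=$1
    fstab=${2:-/etc/fstab}
    # Keep an untouched copy first, preserving mode and ownership (-p).
    cp -p "$fstab" "$fstab.bak" || return 1
    # Prefix matching lines with '#'; '&' re-inserts the matched text.
    sed "s|^/dev/$vg/|#&|" "$fstab.bak" > "$fstab.new" || return 1
    # Copy over the original (keeps its permissions), then tidy up.
    cp "$fstab.new" "$fstab" && rm -f "$fstab.new"
}
```

Usage: `comment_out_vg vg01`. Running it a second time overwrites fstab.bak with the already-edited file, so copy the backup somewhere safe if you need the original later. Once the new vg01 is built, remove the '#' again.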
kevin donnelly
Advisor

It boots fine again.

Thanks everyone.

Now I just need to get a new disk and do some restoring.