System Panic

 
이정환_2
New member

System Panic

I am asking for your help.

Before filing a service (AS) request, I wanted to ask whether there might be a quick fix.



The current situation is that the server will not boot.

It goes into a System Panic and just keeps rebooting,

showing "all VFS_MOUNTROOTs failed".

If you know of a solution, please let me know.





Machine model: rx5670 (Itanium 2 processor)

OS: HP-UX 11.23



Error message:



Starting the STREAMS daemons-phase 1



System Panic:



panic: all VFS_MOUNTROOTs failed: NEED DRIVERS ?????

Stack Trace:

IP Function Name

0xe0000000010b2410 DoCalllist+0x3a0

End of Stack Trace



linkstamp: Fri Jan 30 02:24:39 EST 2004

_release_version: @(#) $Revision: vmunix: B11.23_LR FLAVOR=perf Fri Aug 29

22:35:38 PDT 2003 $



sync'ing disks (0 buffers to flush): (0 buffers to flush):

0 buffers not flushed

0 buffers still dirty

crash_callback: calling function e000000000c2b0e0





*** A system crash has occurred. (See the above messages for details.)

*** The system is now preparing to dump physical memory to disk, for use

*** in debugging the crash.



1 reply
김병수
Regular member

System Panic

Well... it's hard to say anything for certain.

I am posting the related ITRC material below.

Please refer to it.



Various Causes And Resolution Of VFS_MOUNTROOTs Panic (DocId: KBAN00000228, Updated: 2004-04-08)



DOCUMENT

There have been a number of calls generated by the following panic:



panic: (display==0xb800, flags==0x0) all VFS_MOUNTROOTs failed:

NEED DRIVERS ?????



The basic cause of the panic is that the root filesystem could not be

mounted. However, the panic message itself does not give any real indication

of why the root filesystem failed to mount. It is most often NOT caused

by a missing driver. Below is a summary of possible causes:



(1) Bad contents for Autoboot file

(2) Bad or old data in /stand/rootconf using JFS as root filesystem

(3) Corrupt LVM Boot Data Reserved Area (BDRA)

(4) Missing /dev/console on 10.20 with JFS for root filesystem

(5) Root filesystem is corrupted beyond repair

(6) Missing driver (or Bad/Corrupted kernel)

(7) Known problems



These panics have become more common on 10.20 as customers move the root

filesystem to JFS. Also, these panics have become evident on 11.0 when

new kernels are regenerated via SAM without the patch PHCO_17792.



Below are some causes and suggested fixes for VFS_MOUNTROOTs

panics. Many of the suggestions that use HP-UX commands assume

you can boot off either the original disk, a different boot disk,

or the Support CD.



NOTE: for 11.X the support CD is the same as the Install CD.



(1) Bad contents for Autoboot file

------------------------------



Many users try to get fancy when specifying their Autoboot file. Often

they add a hardware path in the Autoboot string. This causes problems if

the disk device is moved to a different hardware address, or if the

wrong hardware path is used.



Reset the autoexecute file contents to just hpux as follows:



ISL> hpux set autofile "hpux"



Then reboot using the following command:



ISL> hpux



It is not necessary to specify the hardware path of the boot device. It will,

and should, always be the device that was booted from (assuming 10.01 or

later with LVM). By default, it will boot /stand/vmunix as well.
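For example, booting an alternate kernel from ISL likewise needs no hardware path; giving just the kernel file is enough (illustration only):

ISL> hpux /stand/vmunix.prev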



To verify the contents of the Autoboot file, you can do the

following:



# lifcp /dev/rdsk/cXtYdZ:AUTO /tmp/AUTO

# cat /tmp/AUTO (to see the contents)



OR simply type the following



# lifcp /dev/rdsk/cXtYdZ:AUTO -



To modify the contents:



# mkboot -a "hpux" /dev/rdsk/cXtYdZ



where X = controller ID, Y = target ID, and Z = LUN ID
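As a concrete illustration (the device name c0t6d0 is hypothetical; substitute your own boot disk path):

# lifcp /dev/rdsk/c0t6d0:AUTO - (shows the current contents)
# mkboot -a "hpux" /dev/rdsk/c0t6d0 (resets them to just "hpux")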



(2) Bad or old data in /stand/rootconf using JFS as root filesystem

---------------------------------------------------------------



The system will usually boot in multi-user mode but fail to boot in

maintenance mode.



Typically, this occurs when expanding the /stand or the swap lvol while

using JFS for the root filesystem. The only way to expand root, boot, or

primary swap is to create another boot disk and copy the contents of the

old boot disk to the new one. If the starting disk block of the root lvol is

different from that on the original disk, then the panic can occur.



/stand/rootconf is a binary file that keeps the starting disk block and

size of the root filesystem. This file can be examined using "xd":



# xd /stand/rootconf

0000000 dead beef 0003 cb60 0001 5000

Here "dead beef" is the magic number, "0003 cb60" is the starting disk

block, and "0001 5000" is the size in KB.



So in the above example, the root lvol starts at disk block address

0x3cb60 for 0x15000 blocks. To confirm this looks good, you can use adb

to dump the filesystem superblock:



# adb /dev/dsk/cXtYdZ

3cb60*400+2000?X

0xF2DA000: 0xA501FCF5 (0xA501FCF5 is the JFS Version 3 magic number)



Where 0x400 is the size of a disk block (1K), 0x2000 is the offset into

the lvol of the superblock (8K), and 0x3cb60 is the block address found

in /stand/rootconf.
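As a quick cross-check, adb can do the same arithmetic for you; the result should match the address shown above (0x3cb60 * 0x400 + 0x2000 = 0xF2DA000):

# echo '3cb60*400+2000=X' | adb
F2DA000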



If the JFS Magic Number does not match, either the filesystem is corrupt

or /stand/rootconf is bad/old. To recreate the rootconf file, do the

following:



a) if the system is already up then run the command



# lvlnboot -c



b) if the system is down, then you need to boot off the Support CD,

run the recovery shell, and then exit to the shell and run



# chroot_lvmdisk

(confirm the boot disk path and let it fsck the root and stand).



The system will not be able to distinguish between the root

filesystem (/) and the /stand filesystem, so both

/dev/dsk/cXtYdZs1lvm and /dev/dsk/cXtYdZs2lvm will point

to /ROOT/stand. Mount only /dev/dsk/cXtYdZs1lvm on /stand and then

restore /stand/rootconf from a backup tape, if one exists, using the

recovery mode of the backup command that was originally used to back it up.



Having restored /stand/rootconf, you can now unmount /ROOT/stand

and /ROOT, fsck /dev/rdsk/cXtYdZs2lvm and /dev/rdsk/cXtYdZs1lvm,

and then mount them on /ROOT and /ROOT/stand respectively.

A command-level sketch of this sequence is shown below.
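A minimal sketch of that sequence, assuming the backup was made with fbackup(1M) and the tape drive is /dev/rmt/0m (substitute the recovery command of whatever backup tool was actually used):

# mount /dev/dsk/cXtYdZs1lvm /ROOT/stand
# cd /ROOT
# frecover -x -X -v -f /dev/rmt/0m -i /stand/rootconf
# cd /
# umount /ROOT/stand
# umount /ROOT
# fsck /dev/rdsk/cXtYdZs2lvm
# fsck /dev/rdsk/cXtYdZs1lvm
# mount /dev/dsk/cXtYdZs2lvm /ROOT
# mount /dev/dsk/cXtYdZs1lvm /ROOT/stand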



c) Re-create the file manually as described below.



For this you'll need to know how big the root filesystem is, how large

an extent is (usually 4MB) and what the starting extent is. I'll

describe an additional process to figure this out should you not know

this.



1. Rebuilding the rootconf where you know:

* start block of physical extent zero

* starting physical extent number

* size of the root lvol



Let's say that the root lvol starts at PE 36, that it is

21 4 MB extents in size, and that the PV which holds the root LV

is bootable (it was pvcreate'd with -b when initially set up).



Therefore the starting block of the LV is:

4096 * 36 + 2912 = 150368 = 0x24b60

where 4096 is the PE size in 1 KB blocks, 36 is the starting PE, and

2912 is the start of user data (the LIF area) on a bootable PV.



To convert decimal to hex:

echo '0d150368=X' | adb

24B60



The size of the LV is:

4096 * 21 = 86016 = 0x15000

where 4096 is the PE size in 1 KB blocks and 21 is the number of extents.



echo '0d86016=X' | adb

15000



This means our rootconf needs to hold 3 words which xd(1) would

show as



0000000 dead beef 0002 4b60 0001 5000

000000c





To check a /stand/rootconf on your system enter:

xd /stand/rootconf





(dead beef is the magic identifier for all rootconf files).

We're going to use echo(1) to build this file and therefore we

need to convert each byte into octal.

Hex Oct

de 336

ad 255

be 276

ef 357

00 000

02 002

4b 113

60 140

00 000

01 001

50 120

00 000





To convert hex to octal

echo '0x4b=O' | adb

0113

Then do

echo "\0336\0255\0276\0357\0000\0002\0113\0140\0000\0001\0120\0000\c" >

rootconf

where the octal values are preppended with a backslash and we trail

the echo command with a \c.

Each field must be four digits.
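Once the file is built, it is worth verifying it with xd(1) before copying it into place (this assumes the file was created in the current directory and that /stand, or /ROOT/stand from the recovery shell, is mounted writable):

# xd rootconf
0000000 dead beef 0002 4b60 0001 5000
000000c
# cp rootconf /stand/rootconf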





2. Finding out the values for the root lvol.

For this you'll need to load the lvm program onto the system. You

should be able to do this via DDS. (The compressed lvm program is

obtainable from hppine34.uksr.hp.com:~ftp/pub/lvm.Z).



There is a copy of "lvm" at the ftp site,

ftp://contrib:9unsupp8@hprc.external.hp.com/crash/lvm



(a) Look at primary lvmrec.

lvm -ld /dev/rdsk/c#t#d#



You need the three lines:

/* Physical Volume Number in VG */ 0

/* Size of each physical extent.*/ 22 bytes

/* The start of the user data. */ 2912



The "Size of each physical extent" is the power of 2 for the PE size.

i.e. a value of 22 means 4Mb.

The "The start of the user data" is in Kb blocks (2912 for a bootable

PV).
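For example, an exponent of 22 means 2^22 = 4,194,304 bytes = 4 MB, which is the default extent size.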







(b) Look at primary BDRA

lvm -PBd /dev/rdsk/c#t#d#

You need the line:

/* Root LV */ 64 0x3

It will display 64 0xn where n is the LV number for the root LV.







(c) Look at the primary VGDA

lvm -PVd /dev/rdsk/c#t#d#

You need the line for "LV entry n-1". i.e. if the root LV is 3 then

you need "LV entry 2". It will display something like:

LV entry 2 offset within VGDA 1056

/* Maximum size of the LV. */21

/* Logical volume flags. */

LVM_LVDEFINED LVM_NORELOC LVM_STRICT LVM_CONSISTENCY

LVM_CLEAN LVM_CONTIGUOUS

/* The scheduling strategy. */ LVM_PARALLEL

/* The maximum number of mirrors. */0

So the LV is "Maximum size of the LV." extents in size.



You also need to look at the PE layout for the "Physical Volume Number in

VG" you obtained in (a). e.g. If the "Physical Volume Number in VG"

was zero then you need to look at the entries which start:

PV header 0

Look down the PX to LX map which follows this for LE 0 for

the root LV. e.g. say root LV is 3.



PX = PE

LX = LE



Sample:

PX LV LX | PX LV LX | PX LV LX | PX LV LX

0 1 0 | 1 1 1 | 2 1 2 | 3 1 3

4 1 4 | 5 1 5 | 6 1 6 | 7 1 7

8 1 8 | 9 1 9 | 10 1 10 | 11 1 11

12 2 0 | 13 2 1 | 14 2 2 | 15 2 3

16 2 4 | 17 2 5 | 18 2 6 | 19 2 7

20 2 8 | 21 2 9 | 22 2 10 | 23 2 11

24 2 12 | 25 2 13 | 26 2 14 | 27 2 15

28 2 16 | 29 2 17 | 30 2 18 | 31 2 19

32 2 20 | 33 2 21 | 34 2 22 | 35 2 23

36 <-> 3 <-> 0 | 37 3 1 | 38 3 2 | 39 3 3

40 3 4 | 41 3 5 | 42 3 6 | 43 3 7

44 3 8 | 45 3 9 | 46 3 10 | 47 3 11



So PE 36 corresponds to LV 3, LE 0.





Now we have all the information and we can rebuild the rootconf.

Go back up to the top and build the rootconf file.





(3) Corrupt LVM Boot Data Reserved Area (BDRA)

------------------------------------------



Typically, if the BDRA is corrupt, the system will still boot in

maintenance mode, but fail to boot in single-user or multi-user mode. To

verify or check the BDRA, you will need to boot in maintenance mode or

off the Support CD.



# lvlnboot -v /dev/<vg name>



Check the Boot, Root, Swap, and Dump areas. Use lvlnboot to add any

missing areas. One common problem is the missing Boot lvol. If all else

fails, try rebuilding the entire LIF and BDRA.
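For reference, on a healthy LVM boot disk the lvlnboot -v output contains entries roughly of the following form (the volume group, lvol, and device names here are only illustrative):

Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c0t6d0 -- Boot Disk
Boot: lvol1     on:     /dev/dsk/c0t6d0
Root: lvol3     on:     /dev/dsk/c0t6d0
Swap: lvol2     on:     /dev/dsk/c0t6d0
Dump: lvol2     on:     /dev/dsk/c0t6d0, 0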



1) Assuming separate root and boot logical volumes.



# lvrmboot -r /dev/<vg name>

# mkboot /dev/rdsk/cXtYdZ

# mkboot -a "hpux" /dev/rdsk/cXtYdZ

# lvlnboot -b /dev/<vg name>/<boot lvol>

# lvlnboot -r /dev/<vg name>/<root lvol>

# lvlnboot -s /dev/<vg name>/<swap lvol>

# lvlnboot -d /dev/<vg name>/<dump lvol>

# lvlnboot -R /dev/<vg name>

# lvlnboot -v /dev/<vg name>



2) Assuming same root/boot logical volumes.



# lvrmboot -r /dev/<vg name>

# mkboot /dev/rdsk/cXtYdZ

# mkboot -a "hpux" /dev/rdsk/cXtYdZ

# lvlnboot -r /dev/<vg name>/<root lvol>

# lvlnboot -s /dev/<vg name>/<swap lvol>

# lvlnboot -d /dev/<vg name>/<dump lvol>

# lvlnboot -R /dev/<vg name>

# lvlnboot -v /dev/<vg name>





(4) Missing /dev/console on 10.20 with JFS for root filesystem

----------------------------------------------------------



Odd as it may seem, if /dev/console is missing and the system takes a

hard boot (panic, TC, powerfail, etc), the system will not boot in either

maintenance mode or multi-user mode.



When the system boots, main() calls vfs_mountroot to mount the filesystem

in read-only mode. Then it runs pre_init_rc to fsck the root filesystem.

However, if /dev/console is missing, the pre_init_rc fails and the root

filesystem is not fsck'ed. The subsequent vfs_mountroot for read-write

access will cause the panic since the filesystem is still marked 'dirty'.



To resolve the problem, you must boot from the support CD. You can "exit

to a shell" and run chroot_lvmdisk. This will fsck the root filesystem

and possibly allow the system to boot. However, you should check for the

following device files and create them if they do not exist:



# mknod /dev/systty c 0 0x000000

# mknod /dev/console c 0 0x000000

# mknod /dev/tty c 207 0x000000

# ln /dev/systty /dev/syscon
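Afterwards, you can confirm that the device files exist with the expected major and minor numbers:

# ls -l /dev/console /dev/systty /dev/syscon /dev/tty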



(5) Root filesystem is corrupted beyond repair

------------------------------------------



If the root filesystem is totally corrupted, the vfs_mountroot() call to

mount the filesystem in read-only mode will fail. If you boot from the

support tape/CD and try to fsck the filesystem, it may indicate that there

is a bad superblock. In some cases, using an alternate superblock location

may help fix the problem, but it's always advisable to have the boot

disk diagnosed by a CE.



For an HFS root filesystem, you may try the first alternate superblock

at location 16. If that does not fix the problem, do not try the rest;

instead, consult the CE about replacing the boot drive.



# /sbin/fs/hfs/fsck -b 16 /dev/rdsk/cXtYdZs1lvm

(i.e. if the root and boot logical volumes are the same)



OR



# /sbin/fs/hfs/fsck -b 16 /dev/rdsk/cXtYdZs2lvm

(i.e. if the root and boot logical volumes are separate)



Now if you were able to fix the root filesystem under the circumstance

cited above, then you should also go ahead and fix the BDRA. See above

for the details on fixing/updating the BDRA.



Finding the root cause is difficult. It could be that someone did a tar

backup to /dev/vg00/lvol3, or a dd over the drive. If fsck from the

support media cannot fix the root filesystem, the customer will have to

reinstall the operating system.



If the customer must know the root cause, it is best to replace the root

disk and leave the corrupted root disk for analysis. Then perform an

install on the new root disk. We cannot guarantee that the root cause

will be identified, but it will help.



(6) Missing driver (or Bad/Corrupted kernel)

----------------------------------------



This is rare, but the problem could actually be caused by a missing driver,

such as vxbase, hfs, sdisk, disc3, etc. How such a driver got deleted is a

mystery on 10.X. However, on 11.0 the only known missing-driver issue is

usually caused by SAM without the patch PHCO_17792, when SAM is used to

regenerate a new kernel after installing the patch PHKL_18543.

You can try to boot off the backup kernel or a known good kernel and

regenerate a new one:



ISL> hpux /stand/vmunix.prev



# cd /stand/build

# /usr/lbin/sysadm/system_prep -v -k -s system

# /usr/sbin/mk_kernel -v -s system

# mv /stand/build/system /stand/system

# cd /

For 10.X, # mv /stand/build/vmunix /stand/vmunix

For 11.X, # kmupdate /stand/build/vmunix_test

# shutdown -r -y 0



(for 11.X systems, you can simply install SAM patch PHCO_17792 and

regenerate a new kernel.)



Or, boot off the support CD (NOTE: for 11.X systems, it's the Install CD)

and examine /stand/system for any missing drivers and install all of them

using the following insf commands:



# insf

# insf -e



And if possible, replace/restore the /stand/system and /stand/vmunix from

your backup tape.



(7) Known problems

--------------



Check patches for known problems that have been resolved. For

example:



- Large number of NIO devices



Fixed with the following patches:



10.01 s800 - PHKL_12566

10.10 s800 - PHKL_12567

10.20 s800 - PHKL_12568

10.30 s800 - PHKL_12569



- Fails to boot in Maintenance mode if BDRA is missing



10.01 s700 - PHKL_17447

10.01 s800 - PHKL_17448



- Boot disk is having I/O problems



11.00 s700/800 - PHKL_20333



- Problem with lvlnboot with separate root/boot lvols



10.20 s700/800 - PHCO_18563 and its dependencies



- JFS patches. Be sure the customer is up to date on the latest JFS

patches for their respective version of HP-UX if they are using JFS

for the root filesystem. Specifically look for any patches that

affect the vx_mount.o module, or any patch that may cause a

filesystem to fail. One example of a necessary patch is:



10.20 s800 - PHKL_16751 and its dependencies

10.20 s700 - PHKL_16750 and its dependencies

11.00 s700_800 - PHKL_18543 and its dependencies





In my opinion, this looks like a kernel problem.

Please try booting from the previous kernel.

( /stand/vmunix.prev )