
Tru64 cluster failed to boot -- is rootvol corrupted?

Sinan Alyuruk
Occasional Visitor

Hello,

Both nodes fail to boot, with the following error during the boot sequence:


UNIX boot - Wednesday October 16, 2002

Loading vmunix ...
Loading at 0xfffffc0000230000

Sizes:
text = 9099968
data = 1949056
bss = 4036656
Starting at 0xfffffc0000243e20

Loading vmunix symbol table ... [2305872 bytes]
DUMP: A dump found in memory will tie up 5914624 bytes until released.
Memory trolling not supported, cpu Major id 13, Minor id 4
Alpha boot: available memory from 0x2f30000 to 0x3ff44000
Compaq Tru64 UNIX V5.1B (Rev. 2650); Fri Jun 27 18:10:32 EEST 2003
physical memory = 1024.00 megabytes.
available memory = 976.07 megabytes.
using 3856 buffers containing 30.12 megabytes of memory
Firmware revision: 6.6-19
PALcode: UNIX version 1.92-105
AlphaServer ES40

.
[long boot message deleted from post]
.

Beginning Adapter/Chip reinitialization (0x1)
cam_logger: SCSI event packet
cam_logger: bus 1
isp_cam_bus_reset_tmo
SCSI Bus Reset performed
clsm: checking for peer configurations
clsm: initialized
clsm: loading root configuration
clsm: started volume cluster_rootvol
clsm: root configuration loaded
Waiting for cluster mount to complete
panic (cpu 0): malloc: invalid size
syncing disks... done
drd: Not Clean Shutdown

DUMP: Warning: no disk available for dump.

DUMP: first crash dump failed: attempting memory dump...
DUMP: compressing 102256KB into 925503KB memory...
DUMP: Starting Address Ending Address Size(MB)
DUMP: ------------------ ------------------ --------
DUMP: 0xfffffc003fb44000 - 0xfffffc003ff43fef 4.0 (indicator)
DUMP: Writing data...... [6MB]
DUMP: crash dump complete.

P00>>>
P00>>>show dev
dka100.1.0.5.1 DKA100 COMPAQ BF01885A34 HPB3
dkb0.0.0.3.0 DKB0 COMPAQ BD0366459B B016
dkb100.1.0.3.0 DKB100 COMPAQ BD0366459B B016
dkb200.2.0.3.0 DKB200 COMPAQ BD0366459B B016
dkc0.0.0.4.0 DKC0 COMPAQ BD0366459B B016
dkc100.1.0.4.0 DKC100 COMPAQ BD0366459B B016
dkc200.2.0.4.0 DKC200 COMPAQ BD0366459B B016
dkc300.3.0.4.0 DKC300 RZ1CB-CS 0844
dkc400.4.0.4.0 DKC400 RZ1CB-CS 0844
dkc500.5.0.4.0 DKC500 COMPAQ BB00911CA0 3B05
dqa0.0.0.15.0 DQA0 HL-DT-ST CD-ROM GCR-8480 2.11
dva0.0.0.1000.0 DVA0
eia0.0.0.2004.1 EIA0 00-0B-CD-4B-0B-74
eib0.0.0.2005.1 EIB0 00-0B-CD-4B-0B-75
eic0.0.0.2004.0 EIC0 00-0B-CD-4B-0B-0E
eid0.0.0.2005.0 EID0 00-0B-CD-4B-0B-0F
pka0.7.0.5.1 PKA0 SCSI Bus ID 7
pkb0.7.0.3.0 PKB0 SCSI Bus ID 7 5.57
pkc0.7.0.4.0 PKC0 SCSI Bus ID 7 5.57

P00>>>show bootdef_dev
bootdef_dev dka100.1.0.5.1

Is this a SCSI problem, or a corrupted file system?

Thanks in advance..

5 REPLIES
Venkatesh BL
Honored Contributor

Re: Tru64 cluster failed to boot -- is rootvol corrupted?

The panic string says: "panic (cpu 0): malloc: invalid size"...so, it may not be a scsi or fs problem per se.

When did you start seeing this? What changed in the cluster?

I suggest you raise a service call with Tru64 support to get the problem analysed thoroughly.
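
For anyone triaging a similar console capture before opening the call, here is a minimal Python sketch that pulls out the panic string and the few lines that precede it. It assumes the console output was saved as plain text; the name `find_panic` and the context size are illustrative choices, not part of any Tru64 tool.

```python
import re

# Matches lines such as "panic (cpu 0): malloc: invalid size".
PANIC_RE = re.compile(r"^panic \(cpu (\d+)\): (.*)$")

def find_panic(log_text, context=3):
    """Return (cpu, panic_string, preceding_lines) for the first panic, or None."""
    # Drop blank lines so the context holds only real console messages.
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        m = PANIC_RE.match(line)
        if m:
            return int(m.group(1)), m.group(2), lines[max(0, i - context):i]
    return None

# Excerpt taken from the boot log posted above.
sample = """clsm: started volume cluster_rootvol
clsm: root configuration loaded
Waiting for cluster mount to complete
panic (cpu 0): malloc: invalid size
syncing disks... done
"""

result = find_panic(sample)
if result:
    cpu, reason, before = result
    print(f"cpu {cpu}: {reason}")
    for ln in before:
        print("  before:", ln)
```

The preceding lines show that the last thing the kernel was doing was the cluster root mount, which is worth quoting in the service call.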
Kapil Jha
Honored Contributor

Re: Tru64 cluster failed to boot -- is rootvol corrupted?

Do both nodes give the same error?
Did you try halting one box and then starting the other one?

BR,
Kapil+
I am in this small bowl, I wane see the real world......
Sinan Alyuruk
Occasional Visitor

Re: Tru64 cluster failed to boot -- is rootvol corrupted?

The system suffered a power outage.

I have tried shutting down one node and booting the other, and vice versa.

Both fail at the same point, trying to mount cluster_rootvol.

Rob Leadbeater
Honored Contributor

Re: Tru64 cluster failed to boot -- is rootvol corrupted?

Hi,

What is your quorum disk for the cluster, and what's the hardware layout of the storage?

Looking at your "show dev" output, I can only see locally attached disks. Whilst some of these might be shared SCSI devices, I'm wondering whether you may have lost some SAN connectivity somewhere during the power outage.
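
To make the bus layout easier to read, a "show dev" listing can be grouped by controller. The following is only a rough Python sketch: it assumes the listing was saved as text in the format posted above, and the helper name `disks_by_bus` is made up for illustration.

```python
from collections import defaultdict

def disks_by_bus(show_dev_text):
    """Group disk device names (DKxNNN) by their three-letter controller prefix."""
    buses = defaultdict(list)
    for line in show_dev_text.splitlines():
        fields = line.split()
        # Disk lines look like: dkb100.1.0.3.0 DKB100 COMPAQ BD0366459B B016
        if len(fields) >= 2 and fields[1].startswith("DK"):
            name = fields[1]
            buses[name[:3]].append(name)
    return dict(buses)

# A shortened excerpt of the listing posted above.
sample = """dka100.1.0.5.1 DKA100 COMPAQ BF01885A34 HPB3
dkb0.0.0.3.0 DKB0 COMPAQ BD0366459B B016
dkb100.1.0.3.0 DKB100 COMPAQ BD0366459B B016
dkc0.0.0.4.0 DKC0 COMPAQ BD0366459B B016
dqa0.0.0.15.0 DQA0 HL-DT-ST CD-ROM GCR-8480 2.11
pkb0.7.0.3.0 PKB0 SCSI Bus ID 7 5.57
"""

for bus, disks in sorted(disks_by_bus(sample).items()):
    print(bus, len(disks), ", ".join(disks))
```

Running this against the listing from each node gives a quick per-bus disk count, which makes a missing disk on a shared bus easier to spot.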

Hope this helps,

Regards,

Rob
Sinan Alyuruk
Occasional Visitor

Re: Tru64 cluster failed to boot -- is rootvol corrupted?

Hi,

There is no SAN in the system; both nodes connect to the LSM disks via shared SCSI cables (a weird Y-cable).

We booted from CD-ROM and re-initialized the LSM partitions. That way it booted. Now I am restoring the / file system from backups.

The problem seems to have been caused by a corrupted volume.