TruCluster

Issue in shared scsi bus cluster

Occasional Visitor

Issue in shared scsi bus cluster

I would appreciate your expert explanation of the following scenario.

We have a TruCluster consisting of two DS25 servers and two 4314R external storage cages. Each server has two dual-port SCSI cards. One of the servers uses Y cables to connect to the other server and the 4314R shelves. Refer to the attached diagram.

By mistake, on C1, port P1-1, channel A was connected to channel B, and the cluster was booted; the server crashed. The connection was then corrected, but the server crashed again with the same result. I hope you can explain this situation for my knowledge.
7 REPLIES
Honored Contributor

Re: Issue in shared scsi bus cluster

This is just a theory:

A wrong SCSI connection can cause a problem with SCSI reservations.

SCSI reservations may prevent a node from accessing the disks.

If the SCSI reservations are not cleared, you could try shutting down the storage or, as a last resort, use the cleanPR command.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Honored Contributor

Re: Issue in shared scsi bus cluster

Were the systems down when you reconfigured the cabling?

Can you still boot one of the two nodes?

What messages are reported when it crashes?

Is the storage now properly connected? How about a SHOW CONFIG at the SRM console prompt (P00>>>) on both boxes?

hth,
Hein.
Occasional Visitor

Re: Issue in shared scsi bus cluster

Hi Ivan,

OK, a SCSI ID conflict I can understand, since it is a shared SCSI bus. But why does it crash
even after connecting to the correct port? And please elaborate on the use of the cleanPR command.
Thanks
Occasional Visitor

Re: Issue in shared scsi bus cluster

Hi Hein,

At the SRM prompt on both servers I can see all 6 drives in external bay 1 and bay 2 once the SCSI channels are connected properly. What is not clear is why it keeps crashing permanently after the wrong SCSI connection. I have included the crash output of both servers for your analysis.

C1 output at the crash:

Initializing CFSREC ICS Service
Registering CFSMSFS remote syscall interface
Registering CMS Services
CNX QDISK: Adding 1 quorum disk vote toward formation.
CNX MGR: Cluster sdp1sdp2 incarnation 0xabc25 has been formed
CNX MGR: Founding node id is 1 csid is 0x10001
CNX MGR: membership configuration index: 1 (1 additions, 0 removals)
CNX MGR: Node sdp1 1 incarn 0xabc25 csid 0x10001 has been added to the cluster
dlm: resuming lock activity
kch: resuming activity
CNX QDISK: Successfully claimed quorum disk, adding 1 vote.
CNX MGR: quorum (re)gained, (re)starting cluster operations.
Joining versw kch set.
clsm: checking for peer configurations
clsm: initialized
clsm: loading root configuration
clsm: started volume cluster_rootvol
clsm: root configuration loaded
Waiting for cluster mount to complete
ADVFS EXCEPTION
Module = ../../../../src/kernel/msfs/osf/msfs_io.c, Line = 1177
domain panic promoted because the domain is the cluster root: Found bad xor in sbm_total_free_space! Corrupted SBM metadata file!
panic (cpu 1): domain panic promoted because the domain is the cluster root: Found bad xor in sbm_total_free_space! Corrupted SBM metadata file!
syncing disks... done
drd: Clean Shutdown

DUMP: Warning: no disk available for dump.
bcm0: Link down

DUMP: first crash dump failed: attempting memory dump...
DUMP: compressing 1049408KB into 15285279KB memory...
DUMP: Starting Address Ending Address Size(MB)
DUMP: ------------------ ------------------ --------
DUMP: 0xfffffc03feec4000 - 0xfffffc03fffedfef 17.1 (indicator)
DUMP: Writing data................................................. [49MB]
DUMP: crash dump complete.
halted CPU 1

halted CPU 0

halt code = 5
HALT instruction executed
PC = fffffc0000aaa030


#######################################
C2 crash output:

ics_ll_tcp: cluster network interface started: rendezvous port is 900
ics_tcp_init: Declaring this_node up 2
icsnet: configured
drd configured 0
kch: configured
dlm: configured
Starting CFS daemons
Registering CFS Services
Initializing CFSREC ICS Service
Registering CFSMSFS remote syscall interface
Registering CMS Services
CNX QDISK: Adding 1 quorum disk vote toward formation.
CNX MGR: Cluster sdp1sdp2 incarnation 0x9c05f has been formed
CNX MGR: Founding node id is 2 csid is 0x10001
CNX MGR: membership configuration index: 1 (1 additions, 0 removals)
CNX MGR: Node sdp2 2 incarn 0x9c05f csid 0x10001 has been added to the cluster
dlm: resuming lock activity
kch: resuming activity
CNX QDISK: Successfully claimed quorum disk, adding 1 vote.
CNX MGR: quorum (re)gained, (re)starting cluster operations.
Joining versw kch set.
clsm: checking for peer configurations
clsm: initialized
clsm: loading root configuration
clsm: started volume cluster_rootvol
clsm: root configuration loaded
Waiting for cluster mount to complete
ADVFS EXCEPTION
Module = ../../../../src/kernel/msfs/osf/msfs_io.c, Line = 1177
domain panic promoted because the domain is the cluster root: Found bad xor in sbm_total_free_space! Corrupted SBM metadata file!
panic (cpu 1): domain panic promoted because the domain is the cluster root: Found bad xor in sbm_total_free_space! Corrupted SBM metadata file!
syncing disks... done
drd: Clean Shutdown

DUMP: Warning: no disk available for dump.
bcm0: Link down

DUMP: first crash dump failed: attempting memory dump...
DUMP: compressing 1049408KB into 15283399KB memory...
DUMP: Starting Address Ending Address Size(MB)
DUMP: ------------------ ------------------ --------
DUMP: 0xfffffc03feec6000 - 0xfffffc03fffedfef 17.1 (indicator)
DUMP: Writing data........................
DS25 RMC V1.2
RMC>halt in

Honored Contributor

Re: Issue in shared scsi bus cluster

Hello,
It looks like the AdvFS domain of your cluster root may be corrupted.
You can try to boot to single-user mode (>>> boot -flag 0) and send the output.
The solution can be the following:
1. Boot the first member from the boot CD.
2. Create the fdmns links (if you need detailed instructions, no problem).
3. Use the /sbin/advfs/verify utility to check the integrity of the domain.
Then repair the domain (do not delete or recreate it). If repair is not possible, restore from backup. It should work, because the hardware configuration is back to the original.
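A minimal sketch of steps 1-3 above, once you have a shell from the installation CD. The cluster root domain is assumed to be named cluster_root, and dsk6b is a hypothetical device name; substitute the actual device your cluster root lives on (check it against SHOW CONFIG / hwmgr output):

```shell
# After booting the first member from the Tru64 UNIX installation CD,
# exit the installer to a shell.

# Recreate the AdvFS domain directory and the device link for the
# cluster root domain. dsk6b is a placeholder device name -- use the
# real cluster_root device on your system.
mkdir -p /etc/fdmns/cluster_root
cd /etc/fdmns/cluster_root
ln -s /dev/disk/dsk6b dsk6b

# Check the on-disk metadata of the domain.
/sbin/advfs/verify cluster_root
```

verify should report inconsistencies such as the corrupted SBM metadata seen in the panic output above; depending on your patch level, an in-place repair utility may be available, otherwise restore from backup as suggested.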
In vino veritas, in VMS cluster
Honored Contributor

Re: Issue in shared scsi bus cluster

Are you booting from the right member boot disks? Be sure that you don't try to boot directly from the shared root filesystem.

I agree that you may need to restore the root filesystem from a backup. To do that, boot with the original OS installation media used to create the cluster, recreate the root_domain, and restore the backup.
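A hedged sketch of that restore path, assuming a vdump backup of the cluster root exists on tape; the domain name cluster_root, the device dsk6b, and the tape device are all placeholders for your actual configuration:

```shell
# From the installation CD shell: recreate the AdvFS domain and fileset.
# -o overwrites the existing (corrupt) domain on that partition.
mkfdmn -o /dev/disk/dsk6b cluster_root
mkfset cluster_root root

# Mount the empty fileset and restore the saved vdump image into it.
mount -t advfs cluster_root#root /mnt
cd /mnt
vrestore -xf /dev/tape/tape0
```

Note that a cluster root has member-specific links and boot partitions beyond a plain AdvFS domain, so treat this as the restore step only; the surrounding cluster configuration must already match the original hardware, as noted above.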
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Honored Contributor

Re: Issue in shared scsi bus cluster

I do not think it is possible that you are trying to boot from the shared root. But I do agree that the best way is to boot the installation OS and repair or restore the cluster root domain. If you do not have the installation OS any more, you must boot from CD.
In vino veritas, in VMS cluster