TruCluster

panic (cpu 1): Cluster member crashed

Advisor

Hello All,
One of our TruCluster members crashed yesterday. The same thing happened last year.
The logs show a panic (cpu 1) and Memory Channel messages.
Is this problem caused by bad hardware?
Can anyone shed some light on this?

Details are given below.

OS: Compaq Tru64 UNIX V5.1A with TruCluster V5.1A.


/var/adm/messages from Crashed member :
=====================================
Jan 30 08:38:51 member1 vmunix: panic (cpu 1): ics_unable_to_make_progress: heartbeat checking blocked


/var/adm/messages from other member :
=====================================

Jan 29 09:52:16 member2 vmunix: rm_state_change: mchan0 slot 6 offline
Jan 29 09:52:49 member2 vmunix: rm_lrail_remove_node: logical_rail 0 hubslot 6
Jan 29 09:52:49 member2 vmunix: rm_state_change: mchan1 slot 6 offline
Jan 29 09:52:49 member2 vmunix: rm_lrail_remove_node: logical_rail 0 hubslot 6
Jan 29 09:52:49 member2 vmunix: CNX MGR: communication error detected for node 2
Jan 29 09:52:49 member2 vmunix: CNX MGR: delay 1 secs 0 usecs
Jan 29 09:52:49 member2 vmunix: CNX QDISK: Cluster transition, releasing claim to 1 quorum disk vote.
Jan 29 09:52:49 member2 vmunix: CNX MGR: quorum lost, suspending cluster operations.
Jan 29 09:52:50 member2 vmunix: kch: suspending activity
Jan 29 09:52:50 member2 vmunix: dlm: suspending lock activity
Jan 29 09:52:50 member2 vmunix: CNX MGR: Reconfig operation complete
Jan 29 09:52:50 member2 vmunix: CNX MGR: membership configuration index: 3 (2 additions, 1 removals)
Jan 29 09:52:50 member2 vmunix: ics_mct: Node 2 is now down
Jan 29 09:52:50 member2 vmunix: CNX MGR: Node bwga456 2 incarn 0x22310 csid 0x10001 has been removed from the cluster
Jan 29 09:52:50 member2 vmunix: CLSM Rebuild: starting...
Jan 29 09:52:50 member2 vmunix: dlm: resuming lock activity
Jan 29 09:52:50 member2 vmunix: kch: resuming activity
Jan 29 09:52:50 member2 vmunix: ipintr: IP addr 0.0.0.0 on ee0: access denied
Jan 29 09:52:50 member2 vmunix: CNX QDISK: Successfully claimed quorum disk, adding 1 vote.
Jan 29 09:52:50 member2 vmunix: CNX MGR: quorum (re)gained, (re)starting cluster operations.
Jan 29 09:52:50 member2 vmunix: clua: reconfiguring for member 2 down
Jan 29 09:52:50 member2 vmunix: CLSM Rebuild: initiated
Jan 29 09:52:50 member2 vmunix: CLSM Rebuild: completed
Jan 29 09:52:51 member2 vmunix: CLSM Rebuild: done.
Jan 29 09:52:51 member2 vmunix: ipintr: IP addr 0.0.0.0 on ee0: access denied
Jan 29 09:52:51 member2 vmunix: Recovering filesystem mounted at / to this node (member id 1)
Jan 29 09:52:51 member2 vmunix: Recovery to this node (member id 1) complete for filesystem mounted at /
Jan 29 09:52:51 member2 vmunix: chk_bf_quota: user quota underflow for user 1545 on fileset
Jan 29 09:52:51 member2 vmunix: chk_blk_quota: user quota underflow for user 1545 on fileset
Jan 29 09:52:51 member2 vmunix: chk_bf_quota: user quota underflow for user 1545 on fileset
Jan 29 09:52:51 member2 vmunix: Recovering filesystem mounted at /var to this node (member id 1)
Jan 29 09:52:51 member2 vmunix: Recovery to this node (member id 1) complete for filesystem mounted at /var
Jan 29 09:52:51 member2 vmunix: Recovering filesystem mounted at /usr to this node (member id 1)
Jan 29 09:52:51 member2 vmunix: ipintr: IP addr 0.0.0.0 on ee0: access denied
Jan 29 09:52:51 member2 vmunix: Recovery to this node (member id 1) complete for filesystem mounted at /usr
Jan 29 09:52:51 member2 vmunix: Recovering filesystem mounted at /admin_home to this node (member id 1)
Jan 29 09:52:51 member2 vmunix: Recovery to this node (member id 1) complete for filesystem mounted at /admin_home
Jan 29 09:52:51 member2 vmunix: Recovering filesystem mounted at /tools to this node (member id 1)
7 REPLIES
Honored Contributor

Re: panic (cpu 1): Cluster member crashed

You should call for hardware verification. You can also run mc_diag and mc_cable from the console (with both servers down). These messages could be a consequence of the problem rather than the cause: the node may have crashed first, and these messages were logged afterwards. That would be expected, because the node was no longer available.

It would help to see the crash-data file that was generated in the /var/adm/crash directory (the one with yesterday's date). Also, make sure you have the latest patch kit installed.
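
One way to check whether the Memory Channel messages came before or after the panic is to merge the messages files from both members into a single timeline. The following is only a minimal sketch in Python, not a Tru64 tool: the function names are made up for illustration, the sample lines are copied from the excerpts posted above, and the year is a placeholder because syslog lines do not carry one.

```python
import re
from datetime import datetime

# Assumption: syslog lines carry no year, so one has to be supplied.
# 2004 is a placeholder, not a value taken from the thread.
ASSUMED_YEAR = 2004

# Matches lines such as "Jan 30 08:38:51 member1 vmunix: some message".
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+\S+:\s+(.*)$"
)

def parse_line(line, year=ASSUMED_YEAR):
    """Return (timestamp, host, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, host, message = match.groups()
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, host, message

def merge_timeline(*logs, year=ASSUMED_YEAR):
    """Merge the text of several messages files into one list sorted by time."""
    events = []
    for log in logs:
        for line in log.splitlines():
            parsed = parse_line(line, year)
            if parsed is not None:
                events.append(parsed)
    return sorted(events, key=lambda event: event[0])

# Sample lines copied from the excerpts posted in this thread.
MEMBER1_LOG = """\
Jan 30 08:38:51 member1 vmunix: panic (cpu 1): ics_unable_to_make_progress: heartbeat checking blocked
"""

MEMBER2_LOG = """\
Jan 29 09:52:16 member2 vmunix: rm_state_change: mchan0 slot 6 offline
Jan 29 09:52:49 member2 vmunix: CNX MGR: communication error detected for node 2
Jan 29 09:52:50 member2 vmunix: ics_mct: Node 2 is now down
"""

if __name__ == "__main__":
    for when, host, message in merge_timeline(MEMBER1_LOG, MEMBER2_LOG):
        print(when.strftime("%b %d %H:%M:%S"), host, message)
```

Note that the excerpts as posted carry different dates (Jan 29 on member2, Jan 30 on member1), so it is worth confirming that the clocks on both members agree, and that the excerpts belong to the same incident, before reading anything into the ordering.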
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Advisor

Re: panic (cpu 1): Cluster member crashed

Hello,

Thanks for your reply.

Only member1 was down. It was sitting at the boot prompt. I just entered "boot", and now it is up and running.

During this time I got errors for member2 in member1's messages file.
There is no problem with member2.

Is there a patch for this (for the server panic)?

Regards,
Shashi
Honored Contributor

Re: panic (cpu 1): Cluster member crashed

What patch kit is installed?
I think I had some problems similar to yours, but they were fixed with PK6.
In vino veritas, in VMS cluster
Advisor

Re: panic (cpu 1): Cluster member crashed

Installed patch kits are ...


Patches installed on the system came from following patch kits:
--------------------------------------------------------------

- T64V51AB03AS0003-20020827 OSF520
- T64V51AB03AS0003-20020827 TCR520
- T64V51AB21AS0004-20030206 OSF520
- T64V51AB21AS0004-20030206 TCR520



Regards,
shashi
Advisor

Re: panic (cpu 1): Cluster member crashed

Does patch kit 6 require a cluster reboot?

Regards,
shashi
Honored Contributor

Re: panic (cpu 1): Cluster member crashed

Looks like you don't have the latest patch kit for V5.1A (which is PK6). Can you try upgrading to it?
Respected Contributor

Re: panic (cpu 1): Cluster member crashed

shashi,

Indeed, you will have to reboot the cluster during the patch kit installation.
You can choose between a "Rolling" and a "Non-Rolling" upgrade.

More information is available in the Installation Guide or Release Notes of the patch kit.

Joris
To err is human, but to really foul things up requires a computer