TruCluster

Registering CMS Services failed.

SOLVED
Occasional Visitor

Registering CMS Services failed.

We have a two-node cluster (ES40, Tru64 UNIX V5.1A (Rev. 1885)) connected by a Memory Channel interconnect. Because of some P-chip error warnings, we migrated the services to the other server and reseated the PCI cards of the second node. That cleared the P-chip error, but when the OS boots it gets stuck at the following point:

Registering CMS Services
and shows the following error
********************************************
rm_prail_boot_am_i_alone: node at hubslot 0 isn't responding

rm_crash_nodes: reason: code: 1

rm_crash_nodes: caller = 0xfffffc0000898e14, nodes_to_crash = 0x1, time = 0xc4cb000000074

rmerror_kill_i: node = 0, caller 0xfffffc0000897df0

rm_state_change: mchan0 slot 0 offline


************************************
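As a quick way to see which node and slot the console messages above point at, here is a minimal Python sketch that pulls the relevant fields out of those lines. The regular expressions are hypothetical, written only to match the text quoted in this post; they are not an official message format, and the sketch does not interpret the caller addresses or the nodes_to_crash value.

```python
import re

# Console lines copied from the post above.
CONSOLE_LOG = """\
rm_prail_boot_am_i_alone: node at hubslot 0 isn't responding
rm_crash_nodes: reason: code: 1
rm_crash_nodes: caller = 0xfffffc0000898e14, nodes_to_crash = 0x1, time = 0xc4cb000000074
rmerror_kill_i: node = 0, caller 0xfffffc0000897df0
rm_state_change: mchan0 slot 0 offline
"""

# Hypothetical patterns, written only to match the lines quoted above.
PATTERNS = {
    "not_responding": re.compile(r"node at hubslot (\d+) isn't responding"),
    "killed": re.compile(r"rmerror_kill_i: node = (\d+)"),
    "offline": re.compile(r"rm_state_change: (\w+) slot (\d+) offline"),
}


def summarize(log):
    """Return {event name: list of captured field tuples} for each pattern."""
    summary = {name: [] for name in PATTERNS}
    for line in log.splitlines():
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[name].append(match.groups())
    return summary


if __name__ == "__main__":
    for event, hits in summarize(CONSOLE_LOG).items():
        print(event, hits)
```

For the lines quoted here, every event refers to hubslot/node 0 on mchan0, which is consistent with the booting node failing to reach its peer over the Memory Channel.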
The running node then failed as well.
I analyzed the binary error log on the node that had been running (and finally failed) and found the following errors:

***************************************
Problem Found: Memory Channel Link Transmit Error Thu 4 Aug 2005 03:42:33 GMT+06:00

Problem Found: Memory Channel Transmit Too Long Timeout.
*******************************************

Is this a problem with MCA seating or some software problem?

/hsr
5 REPLIES
Honored Contributor

Re: Registering CMS Services failed.

The first thing you should do is verify that the memory channel is working at hardware level.

With both nodes down, use the following commands at the SRM console:

mc_diag
mc_cable

They verify that the Memory Channel adapters are working and can communicate.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Honored Contributor

Re: Registering CMS Services failed.


During the hardware intervention, was the MC cable perhaps disconnected?
Has it been reinserted correctly and secured with the connector screws, or is a pin maybe bent?

You can run >>>mc_cable from console, but >>>mc_diag requires that mc_diag also runs on the other member.

To make any judgement on a software problem, we need to know the patch level and the ERPs and CSPs installed (dupatch -track -type kit).
Anyway, it was working before the hardware intervention, so chances are low that it's software.

__ Johan.

_JB_
Occasional Visitor

Re: Registering CMS Services failed.

Hi Ivan/Johan,
Thanks for the replies. I only reseated the cards in node 2. It is currently down, because whenever it tries to register CMS services the other node goes down. Furthermore, this is a production environment, and when services fail accidentally we have to restart the services (applications) on 14 other servers. I just want to know: if I issue the mc_diag and mc_cable commands on this node, will that cause the other node (node 1) to fail?

/himira.
Honored Contributor

Re: Registering CMS Services failed.

You can run mc_diag at any time, but mc_cable requires that the other node runs the mc_cable command too, or it won't be useful.

Also, if you run mc_cable with the other node up, the other node may hang (it happened to me once).
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?