
TruCluster Crash

Occasional Contributor

Our DS25 TruCluster crashed: one member went down and the other hung. In console mode we typed the crash command to collect a crash dump, but it did not succeed. We then rebooted the system, and cat /var/adm/messages shows the following:
Sep 5 07:58:42 f1phdb1 vmunix: rm_get_errcnt_lock failed on mchan0 ret 2. Clear and restart
Sep 5 07:58:42 f1phdb1 last message repeated 5 times
Sep 5 07:58:42 f1phdb1 vmunix: rmerror_int: mchan0 double failure
Sep 5 07:58:42 f1phdb1 vmunix: rm_write_sync: mchan1 , resets = 2, fatal_nodes = 0x3
Sep 5 07:58:42 f1phdb1 vmunix: rm_crash_nodes: reason: rm_lock_global_error: multiple errors code: 1
Sep 5 07:58:42 f1phdb1 vmunix: rm_crash_nodes: caller = 0xffffffff0071c4d0, nodes_to_crash = 0x1, time = 0x36c8444fcb179
Sep 5 07:58:42 f1phdb1 vmunix: panic (cpu 0): rm_lock_global_error: multiple errors
Sep 5 07:58:42 f1phdb1 vmunix: rm_state_change: mchan0 slot 1 offline.

The last patch session installed the following kits:

- T64KIT0022807-V51BB25-ES-20040630 OSF540
- T64KIT0023170-V51BB25-ES-20040803 OSF540
- T64KIT0023187-V51BB25-E-20040804 OSF540
- T64KIT0023584-V51BB25-E-20040913 OSF540
- T64KIT0024050-V51BB25-ES-20041026 OSF540
- T64KIT0024267-V51BB25-E-20041122 OSF540
- T64KIT0024453-V51BB25-E-20041212 OSF540
- T64KIT0025601-V51BB26-E-20050513 OSF540
- T64KIT0026246-V51BB26-E-20050819 OSF540
- T64KIT0026391-V51BB26-E-20050908 OSF540
- T64KIT0026411-V51BB26-ES-20050912 OSF540
- T64KIT0026447-V51BB26-ES-20050914 OSF540
- T64KIT1000108-V51BB26-E-20051116 OSF540
- T64KIT1000295-V51BB26-E-20060112 OSF540
- T64KIT1000300-V51BB26-E-20060117 OSF540
- T64V51BB25AS0004-20040616 OSF540
- T64V51BB25AS0004-20040616 TCR540
- T64V51BB26AS0005-20050502 OSF540
- T64V51BB26AS0005-20050502 TCR540
- TCRKIT0024182-V51BB25-20041110 TCR540
- TCRKIT0024683-V51BB25-E-20050116 TCR540
- TCRKIT1000351-V51BB26-20060213 OSF540
- TCRKIT1000351-V51BB26-20060213 TCR540
Could you suggest a solution to this problem? (Why did both members go down together?)

Eric
1 REPLY
Honored Contributor

Re: TruCluster Crash

Looks like one of your memory channel cards (or the cable) is having a problem. Bring both systems down and at the P00>>> prompt execute the 'mc_diag' and 'mc_cable' commands. Note: 'mc_cable' needs to run simultaneously on both systems to produce meaningful results.
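
Before taking the systems down, it may help to tally which Memory Channel adapter the kernel messages mention. The following is only a rough sketch in Python, assuming you can copy the log to a machine that has Python; the function name count_mchan_messages is made up for illustration, and the sample lines are taken from the post above. It does not replace the console diagnostics.

```python
import re
from collections import Counter

# Matches Memory Channel adapter names such as mchan0 or mchan1.
MCHAN_RE = re.compile(r"\bmchan\d+\b")


def count_mchan_messages(lines):
    """Count kernel (vmunix) log lines that mention each mchan adapter."""
    counts = Counter()
    for line in lines:
        # Skip syslog lines that do not come from the kernel.
        if "vmunix:" not in line:
            continue
        # Count each adapter at most once per line.
        for adapter in set(MCHAN_RE.findall(line)):
            counts[adapter] += 1
    return counts


# Sample lines copied from the /var/adm/messages excerpt in the question.
SAMPLE = [
    "Sep 5 07:58:42 f1phdb1 vmunix: rm_get_errcnt_lock failed on mchan0 ret 2. Clear and restart",
    "Sep 5 07:58:42 f1phdb1 last message repeated 5 times",
    "Sep 5 07:58:42 f1phdb1 vmunix: rmerror_int: mchan0 double failure",
    "Sep 5 07:58:42 f1phdb1 vmunix: rm_write_sync: mchan1 , resets = 2, fatal_nodes = 0x3",
    "Sep 5 07:58:42 f1phdb1 vmunix: rm_state_change: mchan0 slot 1 offline.",
]

for adapter, n in sorted(count_mchan_messages(SAMPLE).items()):
    print(f"{adapter}: {n} message(s)")
```

To run it against a real log, pass an open file instead of SAMPLE, for example count_mchan_messages(open("messages")). A count that is heavily skewed toward one adapter is only a hint about where to look first; the card and the cable still need to be tested from the console.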