Operating System - Tru64 Unix

GS80 Rebooting frequently

 
subhashpakhare
Advisor

Dear Gurus,
I have a 2-node GS80 cluster, and one of the nodes is rebooting frequently. I am posting the relevant entries from its /var/adm/messages file below. Could anybody point out what exactly the problem is? I am running Tru64 5.1A with 32 GB RAM and 6 CPUs. The messages appear to show that all memory and CPUs are all right.
Jan 3 09:17:27 prodsap1 vmunix: Machine check code = 0x100000202
Jan 3 09:17:27 prodsap1 vmunix: Ibox Status = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: Dcache Status = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: Cbox Address = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: Fill Syndrome 1 = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: Fill Syndrome 0 = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: Cbox Status = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: EV6 captured status of Bcache mode = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: EV6 Exception Address = ffffffff0065902c
Jan 3 09:17:27 prodsap1 vmunix: EV6 Interrupt Enablement and Current Processor mode = 0000007ee0000000
Jan 3 09:17:27 prodsap1 vmunix: EV6 Interrupt Summary Register = 0000002000000000
Jan 3 09:17:27 prodsap1 vmunix: EV6 TBmiss or Fault status = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: EV6 PAL Base Address = 0000000000020000
Jan 3 09:17:27 prodsap1 vmunix: EV6 Ibox control = fffffe000c306396
Jan 3 09:17:27 prodsap1 vmunix: EV6 Ibox Process_context = 0000000000000000
Jan 3 09:17:27 prodsap1 vmunix: panic (cpu 0): System Uncorrectable Machine Check
Jan 3 09:17:27 prodsap1 vmunix: syncing disks...
Jan 3 09:17:27 prodsap1 vmunix: DUMP: A dump found in memory will tie up 111501312 bytes until released.
Jan 3 09:17:27 prodsap1 vmunix: Alpha boot: available memory from 0x161d0000 to 0x13fffa0000
Jan 3 09:17:27 prodsap1 vmunix: Compaq Tru64 UNIX V5.1A (Rev. 1885); Thu Jun 19 12:46:36 EAT 2003
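
For readers who want to see at a glance which registers in a dump like the one above are non-zero, here is a minimal Python sketch. It assumes the "name = hex value" layout shown in this post; the function names and the SAMPLE list are illustrative only, and the script does not interpret what any register value means.

```python
import re

# Matches lines such as:
#   Jan 3 09:17:27 prodsap1 vmunix: Cbox Status = 0000000000000000
# Assumption: every register line has the form "vmunix: <name> = <hex>".
FIELD_RE = re.compile(
    r"vmunix:\s+(?P<name>.+?)\s+=\s+(?P<value>(?:0x)?[0-9a-fA-F]+)\s*$"
)


def parse_machine_check(lines):
    """Return {register name: integer value} for each 'name = hex' line."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group("name")] = int(match.group("value"), 16)
    return fields


def nonzero_fields(fields):
    """Keep only the registers whose value is not zero."""
    return {name: value for name, value in fields.items() if value != 0}


# A few lines copied from the post above.
SAMPLE = [
    "Jan 3 09:17:27 prodsap1 vmunix: Machine check code = 0x100000202",
    "Jan 3 09:17:27 prodsap1 vmunix: Ibox Status = 0000000000000000",
    "Jan 3 09:17:27 prodsap1 vmunix: Cbox Status = 0000000000000000",
    "Jan 3 09:17:27 prodsap1 vmunix: EV6 Exception Address = ffffffff0065902c",
    "Jan 3 09:17:27 prodsap1 vmunix: panic (cpu 0): System Uncorrectable Machine Check",
]

if __name__ == "__main__":
    for name, value in nonzero_fields(parse_machine_check(SAMPLE)).items():
        print(f"{name}: {value:#x}")
```

To run it against a real log, pass the lines of /var/adm/messages instead of SAMPLE. Zero-valued status registers in the messages file do not by themselves prove that the hardware is healthy.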
Ralf Puchner
Honored Contributor

Re: GS80 Rebooting frequently

First of all, what do you mean when you say the messages show that all memory and CPUs are all right? If everything were all right, the machine would not crash, right?

An uncorrectable machine check is typically a hardware problem (CPU/memory/cache). Please have a look at the binary.errlog. The fastest way to solve the problem is to open a call with the HP support center and provide the binary.errlog and crash data (if available). The messages file is not suitable for analyzing the problem!

To analyze a machine check you need special programs that are accessible only within HP.
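
The messages file cannot diagnose the fault, but it can show how often the node panics and which CPU reports it, which may be worth including in the support call. The following Python sketch assumes the syslog line layout seen in this thread ("Mon D HH:MM:SS host vmunix: panic (cpu N): reason"); the function names and sample lines are illustrative only.

```python
import re
from collections import Counter

# Assumption: panic lines look like the one quoted in this thread, e.g.
#   Jan 3 09:17:27 prodsap1 vmunix: panic (cpu 0): System Uncorrectable Machine Check
PANIC_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"vmunix:\s+panic \(cpu (?P<cpu>\d+)\):\s+(?P<reason>.*)$"
)


def find_panics(lines):
    """Return one dict (stamp, host, cpu, reason) per panic line."""
    panics = []
    for line in lines:
        match = PANIC_RE.match(line.strip())
        if match:
            event = match.groupdict()
            event["cpu"] = int(event["cpu"])
            panics.append(event)
    return panics


def summarize(panics):
    """Count panics per (host, cpu, reason) combination."""
    return Counter((p["host"], p["cpu"], p["reason"]) for p in panics)


# Two lines copied from the original post; only the first is a panic.
SAMPLE_LINES = [
    "Jan 3 09:17:27 prodsap1 vmunix: panic (cpu 0): System Uncorrectable Machine Check",
    "Jan 3 09:17:27 prodsap1 vmunix: syncing disks...",
]

if __name__ == "__main__":
    events = find_panics(SAMPLE_LINES)
    for event in events:
        print(event["stamp"], event["host"], "cpu", event["cpu"], event["reason"])
    for key, count in summarize(events).items():
        print(count, key)
```

Running it over the full /var/adm/messages file gives a list of panic timestamps. That list is only a summary for the support call and does not replace the binary.errlog analysis.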
Help() { FirstReadManual(urgently); Go_to_it;; }
Mohamed  K Ahmed
Trusted Contributor

Re: GS80 Rebooting frequently

You have a panic reported by CPU 0, so there is definitely something wrong with the machine.
You can either call HP support and let them check it, or you can try to troubleshoot the machine yourself by swapping the CPU board or memory board and seeing whether it happens again.

I recommend going with the first option.

HTH

Mohamed
Pedro Albuquerque
Frequent Advisor

Re: GS80 Rebooting frequently

If you have WEBES 4.2 installed, you can try running wsea to see detailed information from the binary.errlog.