BL860c I2 machine check (VMS 8.4)

 
Adrian Graham_1
Regular Advisor

Folks,

 

We had a VMS blade reset on us this morning: a complete bounce, and it was back up before HP SIM noticed it had gone. However, EWA0 was to all intents and purposes not working, despite VMS being happy with it and the Cisco switch seeing its MAC address on the correct port. DECnet traffic wasn't passing over it, even after restarting DECnet.

 

I moved DECnet from EWA0 to EWC0 and everything sprang back into life, so I restarted the box again and everything settled down, even with DECnet back on EWA0. I'm confused.

 

Anyone seen something like this before? Out of 8 blades it's the first one to do that. I've attached the relevant system logs if anyone can decode them.

 

Cheers

 

Adrian

Robert_Jewell
Honored Contributor

Re: BL860c I2 machine check (VMS 8.4)

Well, the logs you posted confirm what you already know.  The system crashed with a machine check abort.  

 

What is needed to determine the cause is the output of the tombstone logs.  From the EFI Shell you can run 'errdump mca' to display this data.  Within HP-UX the logs are stored in /var/tombstones, but I am not sure about VMS.

 

This type of event is typically due to a hardware issue, so it's worth looking into.  You can post those logs here and perhaps someone from HP will decode them for you.  Otherwise, open a support call to have them analyzed.

 

BTW, the following event might indicate a memory problem: 

658   SFW  5,0,0,0    2  40901FB101E10502 0000000000000000 MEM_NON_OPTIMAL_CONFIG

 

...but then again, it may be due to how your DIMMs are configured.
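
If you want to pull events like that out of a longer log dump, here is a minimal Python sketch that splits a line of this shape into fields and picks out the memory-related ones. The column names (number, source, location, level, two data words, keyword) are my own guesses from the single line quoted above, not a documented format, and the `MEM_` prefix filter is likewise just an assumption.

```python
from typing import Iterable, List, NamedTuple


class LogEvent(NamedTuple):
    # All field names are assumptions inferred from one sample line.
    number: int    # assumed: event sequence number
    source: str    # assumed: reporting entity, e.g. "SFW"
    location: str  # assumed: location tuple, kept as text ("5,0,0,0")
    level: int     # assumed: alert level
    data0: str     # first hex data word, kept as text
    data1: str     # second hex data word, kept as text
    keyword: str   # event keyword, e.g. "MEM_NON_OPTIMAL_CONFIG"


def parse_event(line: str) -> LogEvent:
    """Split one whitespace-separated event line into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    number, source, location, level, data0, data1, keyword = fields
    return LogEvent(int(number), source, location, int(level),
                    data0, data1, keyword)


def memory_events(lines: Iterable[str]) -> List[LogEvent]:
    """Return events whose keyword starts with 'MEM_', skipping unparseable lines."""
    events = []
    for line in lines:
        try:
            event = parse_event(line)
        except ValueError:
            continue  # header, blank line, or a different layout
        if event.keyword.startswith("MEM_"):
            events.append(event)
    return events
```

Feeding it the lines of a saved log returns only the memory-related entries; anything that doesn't match the seven-column shape is skipped without comment, so check the skipped lines by hand if the result looks too short.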

 

 

-Bob

Adrian Graham_1
Regular Advisor

Re: BL860c I2 machine check (VMS 8.4)

Hi Bob,

Thanks for that. I didn't see the memory message, but that only adds to the oddness, since VMS, the OA and the iLO all report 48 GB with no issues. DIMMs are split evenly across both processors, 24 GB each.

I'll have to wait until I can shut it down again to get at the EFI shell, and that needs to be out of hours. I'm convinced it was hardware that caused this, so I've logged it with support as well. I searched the whole [sys0...] tree for any .log file created or modified today, but there's nothing you wouldn't expect. OPERATOR.LOG just has system entries.

Cheers

Adrian