HPE 9000 and HPE e3000 Servers

Re: N4000 HPMC and not support.... any help?

 
Denver Osborn
Honored Contributor

N4000 HPMC and not support.... any help?

I have a system that has rebooted a few times today and there's no dump and nothing logged to indicate the root cause. The ts99 is about all I've got to go off of.

If anyone has the right tools or experience parsing a ts99, could you help point me to a possible root cause?

thanks!
-denver
7 REPLIES
Tim Nelson
Honored Contributor

Re: N4000 HPMC and not support.... any help?

Once upon a time I had a number of N4000s that rebooted at random with no info whatsoever.

Firmware update fixed it.

If you have not done so in the last 5 years, update both the MP and CPU firmware. It certainly is not going to hurt.
Steven E. Protter
Exalted Contributor

Re: N4000 HPMC and not support.... any help?

Shalom,

The console should show an HPMC?

When it's running, is it missing a CPU? Maybe top can help tie this down.

I have seen this stuff resolved by firmware upgrades on other classes of servers. I agree with that approach.

But firmware won't fix a broken CPU.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Denver Osborn
Honored Contributor

Re: N4000 HPMC and not support.... any help?

Firmware is current, not sure what rev the MP is at...

The ts99 file (attached) showed no valid timestamp for cpu4, so we deconfigured that CPU and brought the box back online. With it deconfigured, we used cstm to exercise the CPUs, and the box 'looks' stable.

Was hoping someone with the "secret decoder ring" might come across this thread and cut/paste the ts99. :)

This production box will be decommissioned in about 10 days, so I hope cpu4 was the culprit.

-denver
Andrew Rutter
Honored Contributor

Re: N4000 HPMC and not support.... any help?

hi,

Your ts99 shows you have the latest PDC firmware installed, but it would be worth checking the GSP firmware too.

Also is there any info in the error logs of the GSP?

It looks like it could be an I/O card or the PCI backplane causing the problem, but I cannot be fully sure; I'm not clever enough.

Have any changes been made recently, or was the box just running along normally?

Any info in any of the logs? Did you check the EMS log, the syslogs, the shutdown log, and root's mail, in case an alert was sent?
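As a rough illustration of that log sweep, here is a minimal Python sketch that searches a set of log files for crash-related keywords. The paths in `CANDIDATE_LOGS` are assumed default HP-UX locations and the keyword list is only a guess at useful search terms; neither comes from this thread, so check both against the actual system. The script can also be run on copies of the logs on another machine.

```python
import os

# Assumed default HP-UX locations (not confirmed in this thread); adjust as needed.
CANDIDATE_LOGS = [
    "/var/adm/syslog/syslog.log",      # syslog
    "/etc/shutdownlog",                # shutdown/reboot/panic history
    "/var/opt/resmon/log/event.log",   # EMS events
    "/var/mail/root",                  # root's mailbox
]

# Hypothetical search terms; extend to suit the failure being chased.
KEYWORDS = ("hpmc", "panic", "machine check")


def scan_logs(paths=CANDIDATE_LOGS, keywords=KEYWORDS):
    """Return (path, line_number, line) for every line containing a keyword."""
    hits = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip logs that do not exist on this system
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                lowered = line.lower()
                if any(k in lowered for k in keywords):
                    hits.append((path, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, lineno, line in scan_logs():
        print(f"{path}:{lineno}: {line}")
```

An empty result only means nothing matched these keywords; it does not rule out a hardware fault, since an HPMC can take the box down before anything is logged.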

Andy
Sameer_Nirmal
Honored Contributor

Re: N4000 HPMC and not support.... any help?

I do think it was CPU 4 that caused the HPMCs resulting in the system reboots. So the bad CPU4 is the root cause, and apparently you have nailed it already.

The "ts99" says that there was a broadcast error on the CPU runway bus, causing the bus check that resulted in the machine check/HPMC. This error was caused by the bad CPU4, since evidently its registers cannot all be zeros, including its hardware address.
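The "all zeros" reasoning can be sketched in a few lines of Python. This does not parse a real ts99 file; it assumes the per-CPU register values have already been extracted by hand into a dictionary, and the register names and values below are purely hypothetical.

```python
def suspect_cpus(register_dump):
    """Return the ids of CPUs whose captured registers are all zero.

    register_dump maps a CPU id to a dict of {register_name: integer_value}.
    A CPU with no captured registers at all is not flagged.
    """
    return sorted(
        cpu
        for cpu, regs in register_dump.items()
        if regs and all(value == 0 for value in regs.values())
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from the attached ts99.
    dump = {
        0: {"hpa": 0xFFFA0000, "timestamp": 0x12345678},
        4: {"hpa": 0x0, "timestamp": 0x0},
    }
    print(suspect_cpus(dump))
```

A CPU flagged this way is only a suspect: an all-zero dump shows the processor did not report valid state, which matches the behaviour described here but is not proof on its own.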

Denver Osborn
Honored Contributor

Re: N4000 HPMC and not support.... any help?

So far the system has been up and stable for about 14 hours with cpu4 deconfigured.

Thanks, everyone, for the feedback.
Denver Osborn
Honored Contributor

Re: N4000 HPMC and not support.... any help?

After the suspect CPU was disabled, the box stayed up and had no problems finishing out its life until we migrated to the newer system.