
K-370 and K380 Server

 
Tan Tian Ho
Occasional Advisor

The server consists of 5 CPUs. What happens if any one of the CPUs fails? Will the server detect it and distribute the load equally across the remaining 4 CPUs, or will it shut down completely, terminating all services on the server?
9 REPLIES
Dietmar Konermann
Honored Contributor

Re: K-370 and K380 Server

The server will panic with an HPMC (High Priority Machine Check) and then reboot. If the processor in question fails its self-test during bootup, it will be disabled, and the system then comes up with 4 CPUs active.

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Jeroen Peereboom
Honored Contributor

Re: K-370 and K380 Server

It will run.
We once found out after a few days (weeks?) that 1 CPU of a D (?) series was broken (red flashing light on the machine itself)....

JP
Robert-Jan Goossens
Honored Contributor

Re: K-370 and K380 Server

Hi,

It depends on which CPU fails. If CPU 0 fails, the system will go down (the kernel is on CPU 0). If one of the other CPUs fails, the system will detect the error and distribute NEW processes across the remaining CPUs. Processes running on the failed CPU will terminate.

Regards,
Robert-Jan
Michael Tully
Honored Contributor

Re: K-370 and K380 Server

On the older K-class systems the box will panic with an HPMC and reboot with the remaining CPUs, as suggested by Dietmar.
Anyone for a Mutiny ?
Dietmar Konermann
Honored Contributor

Re: K-370 and K380 Server

Thanks, Michael. :-)

The scenario I described indeed happens if the CPU "fails"... this means HPMC.

However, for correctable errors (LPMCs, Low Priority Machine Checks) there is a feature called "Dynamic Processor Resilience (DPR)", which is triggered by the diagnostics when a CPU observes at least 3 LPMCs (the threshold depends on the CPU type) in 24 hours. The kernel then deallocates the CPU, which means that the scheduler will not schedule any new threads to run on it, and all runnable threads currently on its run queue will be re-queued onto other processors. It will also be marked to be completely disabled at the next bootup.

Note that a deallocated CPU is *not* completely idle. Consider e.g. interrupt handling...
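
To make the DPR rule above a bit more concrete, here is a minimal sketch of the logic in Python. It is only an illustration, not the HP-UX kernel or diagnostics code: the `Cpu` class, `record_lpmc`, `deallocate`, and the round-robin re-queueing are hypothetical names and choices, and the threshold of 3 LPMCs in 24 hours is simply the figure quoted above (the real one depends on the CPU type).

```python
from collections import deque

# Assumed values taken from the description above; the real threshold
# depends on the CPU type.
WINDOW_SECONDS = 24 * 60 * 60
LPMC_THRESHOLD = 3


class Cpu:
    """Hypothetical model of one processor as the scheduler sees it."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.lpmc_times = deque()          # timestamps of recent LPMCs
        self.accepting_threads = True      # False once deallocated
        self.disable_at_next_boot = False  # full disable happens at reboot
        self.runqueue = []                 # runnable threads queued here


def record_lpmc(cpu, now, others):
    """Log one correctable error and deallocate the CPU at the threshold."""
    cpu.lpmc_times.append(now)
    # Forget LPMCs that are older than the 24-hour window.
    while cpu.lpmc_times and now - cpu.lpmc_times[0] > WINDOW_SECONDS:
        cpu.lpmc_times.popleft()
    if cpu.accepting_threads and len(cpu.lpmc_times) >= LPMC_THRESHOLD:
        deallocate(cpu, others)


def deallocate(cpu, others):
    """Stop scheduling on cpu and move its runnable threads elsewhere."""
    cpu.accepting_threads = False
    cpu.disable_at_next_boot = True
    targets = [c for c in others if c.accepting_threads]
    if not targets:
        return  # nowhere to move the threads in this toy model
    # Round-robin placement is an assumption made for this sketch.
    for i, thread in enumerate(cpu.runqueue):
        targets[i % len(targets)].runqueue.append(thread)
    cpu.runqueue.clear()
```

Calling `record_lpmc` three times within 24 hours for the same CPU flips `accepting_threads` to False and empties its run queue. The sketch deliberately does not model interrupt handling, which is why a deallocated CPU in the real system is not completely idle.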

Best regards...
Dietmar.
"Logic is the beginning of wisdom; not the end." -- Spock (Star Trek VI: The Undiscovered Country)
Michael Tully
Honored Contributor

Re: K-370 and K380 Server

geez Dietmar, you sure pulled that one from the memory banks ...
Anyone for a Mutiny ?
Dave Unverhau_1
Honored Contributor

Re: K-370 and K380 Server

Too bad there's no way for all forum participants (and not just the thread originator) to award points to responses like Dietmar's.

...Of course if that were to happen, guys like Dietmar would have seven digit scores by now... ;^)

Thanks!

Dave
Romans 8:28
Ted Buis
Honored Contributor

Re: K-370 and K380 Server

New systems have DPR, but can Dietmar confirm that the K-class can do that too? Also, I think you have to have support tools installed for DPR to work. Correct?
Mom 6
Bill Hassell
Honored Contributor

Re: K-370 and K380 Server

To deallocate a processor when LPMCs are detected, you need STM/EMS loaded. This tool is so critical that it should be a requirement for all systems, and it must be kept up to date, at least within 6 months. Download the current version from:

http://www.software.hp.com/portal/swdepot/displayInstallInfo.do?productNumber=B7609BA
http://www.software.hp.com/portal/swdepot/displayInstallInfo.do?productNumber=B6191AAE

The K-class boxes haven't been manufactured for many years. DPR may support deallocating a processor on the old hardware, though. NOTE: There are many processor failures that will crash the machine (e.g., cache parity errors). If the failure is permanent, it should be seen by the self-tests, but the old K-class boxes may not deallocate the failed processor and may simply hang on the failed test. You may be able to interact with the processor ROMs and disable the failing processor, but some failures will completely disable the system.


Bill Hassell, sysadmin