Operating System - HP-UX

CPU Failure on a node with dual CPU

 
SOLVED
Kevin Lamb
Frequent Advisor

CPU Failure on a node with dual CPU

Hi,

I have two D370s with dual CPUs running under ServiceGuard. If a single CPU fails on the primary node, will this cause the node to fail over?

I believe that the node will fail over but would like confirmation, as I am a bit of a newbie where SG is concerned. I am fairly sure that losing a single CPU would cause an abend and bring the machine down.

Any help would be appreciated, I just need to clarify the info.

Kev
4 REPLIES
G. Vrijhoeven
Honored Contributor

Re: CPU Failure on a node with dual CPU

Hi,

Not the node itself, but the packages running on the node can be configured to fail over. If you issue:
# cmviewcl

take a look at PKG_SWITCH and check that it is enabled for all packages.

Check the /etc/cmcluster//*.cfg file for the node the package will fail over to (NODE_NAME). The file can contain more than one NODE_NAME entry.



To alter PKG_SWITCH, use the cmmodpkg command.
To alter the node in the cfg file, add it with vi, then run cmcheckconf and cmapplyconf.
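
As a rough sketch of the steps above: the package name pkg1, the node name node2 and the file name pkg1.cfg are made-up placeholders, not values from this cluster, so substitute your own. The script only prints the commands unless DRY_RUN=0 is set, so it can be read through safely before anything is changed.

```shell
#!/bin/sh
# Sketch only. PKG, NODE and PKG_CFG are hypothetical placeholders.
PKG=${PKG:-pkg1}              # package to check (assumed name)
NODE=${NODE:-node2}           # adoptive node (assumed name)
PKG_CFG=${PKG_CFG:-pkg1.cfg}  # package config file (assumed name)
DRY_RUN=${DRY_RUN:-1}         # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Show cluster, node and package status, including package switching.
show_status() {
    run cmviewcl -v
}

# Enable switching for the package, then allow it to run on the adoptive node.
enable_switching() {
    run cmmodpkg -e "$PKG"
    run cmmodpkg -e -n "$NODE" "$PKG"
}

# After adding a NODE_NAME line with an editor: verify, then apply.
apply_pkg_config() {
    run cmcheckconf -P "$PKG_CFG" && run cmapplyconf -P "$PKG_CFG"
}

show_status
enable_switching
apply_pkg_config
```

Running it with DRY_RUN=0 on a cluster node would execute the commands for real, so check the printed commands against your own package and node names first.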

Hope this will help,

Gideon
John Palmer
Honored Contributor
Solution

Re: CPU Failure on a node with dual CPU

Hi Kevin,

I would expect the D class to crash in the event of a CPU failure with some sort of HPMC.

Depending on the type of fault and which CPU failed, it may or may not reboot.

Any packages configured to fail over will do so.

Regards,
John
Kevin Lamb
Frequent Advisor

Re: CPU Failure on a node with dual CPU

John / Gideon,

Between the two replies I think I now have the answer I was looking for. I was almost certain that the D class would fail and the packages would be transferred to the remaining node, but I needed confirmation.

Have a great 2002

Kev
Stephen Doud
Honored Contributor

Re: CPU Failure on a node with dual CPU

A single CPU failure will most certainly cause an HPMC (High Priority Machine Check) on the D-class and force the machine to take a memory dump and reboot. If ServiceGuard is configured to fail the package over, expect it to happen when a CPU fails.

For your information, new products like vPars (Virtual Partitions) allow the administrator to run multiple instances of the operating system on the rp5470 (L3000) and rp7400 (N4000) servers, so that a fault need not take down the entire server. To learn more about such products, check out this page:

http://docs.hp.com/hpux/11i/index.html - near the bottom: Virtual Partitions