Server performs crash dump

Rashid Hamid · ‎12-11-2006

Hi All

I have MCSG running with 2 nodes, hp1 (rp7420) and hp2 (7400). The problem occurred when I pulled out the primary LAN and standby LAN on hp1: all packages failed over to hp2 without any problem, BUT hp1 performed a crash dump.

Thanks

I'm Parit Madirono/Parit Betak Boyz

Patrick Wallek · ‎12-11-2006

Yes. That is perfectly normal.

In the event of a failure like that, the machine will TOC (transfer of control), which creates a crash dump. The intent, I believe, is for the crash dump to let you look into the root cause of the issue.

Also, since there was a problem, the machine that does not get control will TOC to make sure that all resources the packages need are available to the other node.

This is discussed in detail in the MC/SG manuals, available here:

http://docs.hp.com/en/oshpux11iv2.html#Serviceguard

Patrick Wallek · ‎12-11-2006

For more information have a read through the "Responses to Failures" section of Chapter 3 - "Understanding Serviceguard Software Components" of the "Managing Serviceguard" manual. It specifically talks about conditions that can initiate a TOC.

In your case, this quote applies: "A TOC is done if a cluster node cannot communicate with the majority of cluster members for the predetermined time,..." Pulling the LAN cables on hp1 means it could no longer communicate with the other node.

The "Responses to Failures" section is here:

http://docs.hp.com/en/B3936-90100/ch03s07.html

The whole manual is available from the link I gave above.

Rashid Hamid · ‎12-11-2006

Thanks, Patrick, for the explanation.
I have another MCSG cluster running with 2 nodes. There I pulled out the primary and standby network cables in the same way and got no TOC at all.

I'm Parit Madirono/Parit Betak Boyz

Stephen Doud · ‎12-12-2006

Serviceguard uses the concept of heartbeat messages between servers to verify that each member node is active.
If your cluster is configured to send heartbeat on only one LAN and you break that LAN, then Serviceguard has to reform the cluster and identify which nodes in the cluster continue to operate, and which must be rebooted to preserve data integrity.

If other clusters can survive such a test, then they must have multiple heartbeat networks.
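To make the difference between the two clusters concrete, here is a minimal Python sketch. It is only an illustration, not Serviceguard code: the function name and the LAN names (lan0, lan1, lan2) are hypothetical, and it assumes the surviving cluster simply has heartbeat configured on an extra network that was not unplugged.

```python
def heartbeat_reachable(heartbeat_lans, failed_lans):
    """Return True if at least one heartbeat LAN is still up."""
    return any(lan not in failed_lans for lan in heartbeat_lans)

# Both tests pull the primary LAN and its standby (hypothetical names).
pulled = {"lan0", "lan1"}

# Hypothetical cluster A: heartbeat only on lan0, with lan1 as its standby.
single_hb = heartbeat_reachable({"lan0"}, pulled)

# Hypothetical cluster B: heartbeat on lan0 and on a separate lan2.
multi_hb = heartbeat_reachable({"lan0", "lan2"}, pulled)

print(single_hb, multi_hb)
```

In the first case no heartbeat path remains, so the cluster has to reform. In the second case heartbeat still flows over the other network and no node needs to be rebooted.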

cmviewconf will show which networks are configured for heartbeat.

In the case where all heartbeat paths are broken, Serviceguard must use a rule to decide whether a server must crash or continue operation in the cluster.
In a scenario where HB fails between two equal-sized groups of nodes (i.e. 1-1, 2-2, 3-3), Serviceguard requires the use of a cluster lock disk or Quorum Server to arbitrate which half of the cluster is allowed to continue and, consequently, which half must crash.
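The rule described above can be sketched roughly as follows. This is a simplified model in Python, not Serviceguard's actual implementation. The function name, its parameters and the "continue"/"TOC" return values are made up for illustration, and it assumes the usual majority rule: more than half of the nodes may continue, exactly half need the cluster lock or Quorum Server, and fewer than half must crash.

```python
def node_action(reachable_nodes, cluster_size, won_lock=False):
    """Decide what a node does after all heartbeat paths break.

    reachable_nodes counts this node plus the peers it can still hear.
    won_lock is True if this group obtained the cluster lock disk or
    the Quorum Server's vote (only relevant for an even split).
    """
    if 2 * reachable_nodes > cluster_size:
        return "continue"   # clear majority, no arbitration needed
    if 2 * reachable_nodes == cluster_size:
        # Even split (1-1, 2-2, 3-3): the tie-breaker decides.
        return "continue" if won_lock else "TOC"
    return "TOC"            # minority group must crash


# 2-node cluster, heartbeat lost: each node sees only itself (1 of 2).
print(node_action(1, 2, won_lock=True))
print(node_action(1, 2, won_lock=False))
```

In the original poster's test, both nodes were in the 1-1 case, so one of them had to win the arbitration and the other had to TOC.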
