Node failure

Cristi BODNARIUC · ‎07-11-2003

Hi,

I installed a 2-node cluster with SG A.11.14, with one shared external SCSI drive (configured as the lock device) and only one LAN on each node (which is also the heartbeat LAN).

I am trying to test a failover, but things do not go as I expect.

I have 2 cases:

1) I pull the network cable out of node1
2) I power off node1

In both cases the second node reboots.
I expected it to take over all the resources that were previously on node1.

After the reboot, node2 cannot even form the cluster, complaining that it cannot get the OS version of node1. Shouldn't it go on running the cluster?

Do I have to do a special configuration?
What could be misconfigured?

Thanks,
Cristi

melvyn burnard · ‎07-11-2003

Do you not have a standby LAN for the heartbeat? If so, then you may very well see the incorrect node TOC.
How are the nodes connected via LANs, and how is the SCSI connected? What are the disc and controller SCSI addresses? What do the syslogs and OLDsyslogs show on each node?
Read the manuals at http://docs.hp.com/hpux/ha for an idea of how to configure the cluster.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Bernhard Mueller · ‎07-11-2003

Hello,

That is why you should have a physically separate heartbeat LAN.

If there is no LAN communication between the nodes at all, they both race for the lock disk to decide which one has to TOC. The other one will carry on as a one-node cluster. So chances are 50% that the node you expected to TOC will be the one that TOCs.

This is called arbitration. There is a lot of information about it in the manuals at docs.hp.com.

Regards
Bernhard
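The arbitration described above can be pictured with a small toy model. This is only a sketch, not Serviceguard code: the function name `survivors` and its parameters are made up for illustration, and it assumes the usual two-node rule that a node which has lost its peer must obtain the cluster lock before it may carry on.

```python
import random


def survivors(alive, heartbeat_ok, lock_usable, rng=random):
    """Toy model of two-node lock-disk arbitration (hypothetical helper).

    alive        -- names of the nodes that are still powered on
    heartbeat_ok -- True if the nodes can still exchange heartbeats
    lock_usable  -- True if the configured lock disk can actually be claimed
    Returns the set of nodes that keep running; the others TOC.
    """
    alive = list(alive)
    if len(alive) == 2 and heartbeat_ok:
        # Both nodes see each other: no arbitration is needed.
        return set(alive)
    if not alive or not lock_usable:
        # Nobody can claim the lock, so no node is allowed to carry on.
        return set()
    # Every node that is still up races for the lock; exactly one wins.
    # With a single survivor the race is uncontested.
    return {rng.choice(alive)}


if __name__ == "__main__":
    # Case 1: cable pulled, both nodes up, lock disk fine: a coin toss.
    print(survivors(["node1", "node2"], heartbeat_ok=False, lock_usable=True))
    # Case 2: node1 powered off, lock disk fine: node2 should carry on.
    print(survivors(["node2"], heartbeat_ok=False, lock_usable=True))
    # Case 2 with an unusable lock device: node2 goes down as well.
    print(survivors(["node2"], heartbeat_ok=False, lock_usable=False))
```

In this model, the surviving node going down in case 2 only happens when the lock device cannot be claimed, which is why the later questions about whether the lock disc is actually working matter.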

Cristi BODNARIUC · ‎07-11-2003

Hi,

I thought that in the first case (when taking the network cable out of node1) the problem could come from the fact that there is no dedicated heartbeat path.

But when I switch the power off on node1, I think it is no longer a heartbeat problem, and the second node should not TOC.

Maybe I am still missing something. I will keep reading the manuals :)

The 2 nodes are connected to the company network (both are connected to a switch).
The external disk has 2 ends, one connected to node1 and the other to node2. It is powered separately.

After the reboot of node2 I can start the cluster with cmruncl -n node2 and the cluster runs well.

Karthik S S · ‎07-11-2003

Hi,

If you are running short of NICs, you had better configure the heartbeat over RS232. Refer to the SG documentation on how to set this up. A quick summary of the heartbeat requirements can also be found at:

http://www.netsysco.com/pdf/Manuals/Sg/HeartbeatReq.pdf

Regards,
karthik S S

For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn

melvyn burnard · ‎07-11-2003

Again, what do your syslog and OLDsyslog files tell you?
Is your cluster lock disc actually working? What type of disc is it?
And I would not recommend a serial heartbeat unless you really cannot afford at least another LAN card.

My house is the bank's, my money the wife's, But my opinions belong to me, not HP!

Bernhard Mueller · ‎07-11-2003

Cristi,

I believe there could be a problem with the binary cmclconfig file, since your assumptions for case 2 are correct. So delete it and do another cmcheckconf / cmapplyconf.

One other thing to check is that your .rhosts or cmclnodelist includes BOTH nodes on BOTH nodes. That could be the reason why node2 cannot form the cluster but a cmruncl -n node2 works.

Regards,
Bernhard
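The cmclnodelist check suggested above can be scripted. The sketch below is a rough illustration only: it assumes the file holds one "hostname user" entry per line with "#" comments, the helper name `missing_nodes` is made up, and it compares names literally (no short-name versus fully-qualified matching).

```python
def missing_nodes(nodelist_text, required):
    """Return the required node names that do not appear in the node list.

    nodelist_text -- contents of a cmclnodelist-style file (assumed format:
                     "hostname user" per line, '#' starts a comment)
    required      -- node names that must all be listed
    """
    listed = set()
    for line in nodelist_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            listed.add(line.split()[0])       # first field is the hostname
    return sorted(set(required) - listed)


if __name__ == "__main__":
    sample = "node1 root\n# node2 was never added\n"
    print(missing_nodes(sample, ["node1", "node2"]))
```

Run it against the file contents from each node in turn; an empty result on both nodes means both nodes are listed on both nodes.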

Cristi BODNARIUC · ‎07-15-2003

Hi,

Thank you all for your help.

It seems that the problem was in the binary cluster config file, which I did not recompile/redistribute after I replaced the SCSI disk with one that has a different SCSI ID.

Now if I shut down a node, the other takes over all the packages.

Thanks,
Cristi
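For anyone hitting the same thing, the fix discussed in this thread (delete the stale binary, then re-check and re-apply the configuration) can be written down as a dry-run helper. This is a sketch under assumptions: the ASCII file path /etc/cmcluster/cluster.ascii is hypothetical, the binary path is the usual /etc/cmcluster/cmclconfig, and the helper `rebuild_commands` only builds the command lines and does not run them.

```python
def rebuild_commands(ascii_config, binary_config="/etc/cmcluster/cmclconfig"):
    """Build (but do not run) the commands to regenerate the binary config.

    ascii_config  -- path to the cluster ASCII configuration file (assumed)
    binary_config -- path to the stale binary configuration file
    """
    return [
        ["rm", "-f", binary_config],                # remove the stale binary
        ["cmcheckconf", "-v", "-C", ascii_config],  # verify the ASCII file
        ["cmapplyconf", "-v", "-C", ascii_config],  # compile and distribute
    ]


if __name__ == "__main__":
    # Hypothetical path; substitute the ASCII file your cluster was built from.
    for cmd in rebuild_commands("/etc/cmcluster/cluster.ascii"):
        print(" ".join(cmd))
```

Make sure the ASCII file already names the new lock disk before running the printed commands, since the binary is generated from it.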
