Package switch due to lan0 timeout.. How to prevent?

 
Tom Bies
Occasional Advisor

Hello,

I'm running a two-node cluster with one package on each node (active/active). Last weekend our network guys rebooted a piece of network equipment, causing the primary MC/SG LAN (lan0) to time out. When the lan0 timeout occurred, the package on node 1 moved to node 2. Question: how can I prevent package failovers if a primary LAN times out? I'm thinking the package switch should not have happened, because both nodes in the cluster are on the SAME subnet AND have a dedicated X/O cable for heartbeat. I realize I don't have any standby cards configured, but if the heartbeat X/O LAN never timed out, why did the package switch occur? Is this normal behavior?

Excerpt from syslog:

Aug 17 04:40:57 eawwdc02 cmcld: lan0 failed
Aug 17 04:40:57 eawwdc02 cmcld: Subnet 10.2.225.0 down
Aug 17 04:40:57 eawwdc02 cmcld: Subnet 10.2.225.0 in package oracmts_pkg1 is down.
Aug 17 04:40:57 eawwdc02 cmcld: Executing '/etc/cmcluster/oracmts_pkg1/oracmts_pkg1.cntl stop' for package oracmts_pkg1, as service PKG*37634.
Aug 17 04:40:58 eawwdc02 su: + tty?? root-jdeoil
Aug 17 04:40:58 eawwdc02 su: + tty?? root-jdecorn
Aug 17 04:40:59 eawwdc02 su: + tty?? root-jdecin
Aug 17 04:41:00 eawwdc02 su: + tty?? root-cmts
Aug 17 04:41:03 eawwdc02 su: + tty?? root-jdeflm
Aug 17 04:41:03 eawwdc02 su: + tty?? root-copstran
Aug 17 04:41:04 eawwdc02 su: + tty?? root-cmtsgxs
Aug 17 04:41:05 eawwdc02 su: + tty?? root-tibeai
Aug 17 04:41:11 eawwdc02 su: + tty?? root-jdecorn
Aug 17 04:41:05 eawwdc02 su: + tty?? root-cmtsgxs
Aug 17 04:41:12 eawwdc02 su: + tty?? root-oracle
Aug 17 04:42:03 eawwdc02 cmcld: lan0 recovered
Aug 17 04:42:03 eawwdc02 cmcld: Subnet 10.2.225.0 up

Excerpt from the ascii file:

NODE_NAME eawwdc01
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.2.225.25
NETWORK_INTERFACE lan1
HEARTBEAT_IP 10.2.215.14
NETWORK_INTERFACE lan4
HEARTBEAT_IP 1.1.1.1
FIRST_CLUSTER_LOCK_PV /dev/dsk/c7t12d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.
# Warning: There are no standby network interfaces for lan1.
# Warning: There are no standby network interfaces for lan4.

NODE_NAME eawwdc02
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.2.225.56
NETWORK_INTERFACE lan1
HEARTBEAT_IP 10.2.215.15
NETWORK_INTERFACE lan4
HEARTBEAT_IP 1.1.1.2
FIRST_CLUSTER_LOCK_PV /dev/dsk/c7t12d0
# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.
# Warning: There are no standby network interfaces for lan1.
# Warning: There are no standby network interfaces for lan4.

Thanks,
Tom
melvyn burnard
Honored Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

This is exactly what SG is supposed to do. You have no standby for the package, and it is monitoring the subnet, which goes down. This tells the package manager that, as there is no possibility that the network manager could just switch LANs, it must move the package(s).

How to prevent it? Set up standby LANs.
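
For illustration, a minimal sketch of what a standby entry might look like in the cluster ASCII file. It assumes a spare interface, here called lan2 (a hypothetical name, not from the original post), cabled to the same bridged network as lan0. A standby is listed as a NETWORK_INTERFACE with no IP address line under it.

```
NODE_NAME eawwdc01
NETWORK_INTERFACE lan0
HEARTBEAT_IP 10.2.225.25
# Hypothetical standby for lan0: interface only, no HEARTBEAT_IP line.
NETWORK_INTERFACE lan2
```

A standby only protects against the failure it does not share with the primary, so it would ideally be cabled through different network equipment than lan0.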
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Uday_S_Ankolekar
Honored Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

Tom,
Service Guard is doing its job! It defeats the whole purpose of having SG if you want to prevent this!

-USA..
Good Luck..
Krishna Prasad
Trusted Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

I think what he is asking is if all the network cards are attached to the same core and on the same subnet, then failing over a package when the network is not available is useless in his case.

He actually just wants to monitor the serial heartbeat for failovers.

When we ran Service Guard years ago there was a way to do that.
Positive Results requires Positive Thinking
Kent Ostby
Honored Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

Tom --

Check your configuration for the NODE_TIMEOUT value.

A NODE_TIMEOUT value higher than the default of 2 seconds, usually around 8 to 10 seconds, will help prevent some of these problems (i.e. SG will wait longer before timing out).

A subnet failure will generally cause a switchover however.
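
To confirm from the log that a halt was triggered by a subnet failure rather than a node timeout, it can help to look at the cmcld messages on their own. A minimal sketch, assuming the standard HP-UX syslog location; the function name cmcld_events is made up for this example.

```shell
#!/bin/sh
# Print only the Serviceguard cluster daemon (cmcld) lines from a syslog file.
# $1: path to the log, e.g. /var/adm/syslog/syslog.log on HP-UX.
cmcld_events() {
    grep 'cmcld:' "$1"
}
```

Run against the excerpt in the original post, this would leave the "lan0 failed", "Subnet 10.2.225.0 down" and package stop lines and drop the su entries.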

I will give you the link for the document about setting up serial heartbeats which might help you in a situation where all of your lans are down.

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000062950444

I'm away from my desk right now (at home) so I'll have to double check this scenario in the morning.

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
melvyn burnard
Honored Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

Changing the NODE_TIMEOUT value will have no effect here, as that has nothing to do with the package switching. The package switched because a subnet it is configured to use/monitor failed, and in that case the package manager will switch the package if this is possible.
Also, having a serial heartbeat will do nothing for you in the event this happens again, and in fact, with the configuration you show, it is actually NOT recommended.

Again, either set up a standby LAN, or turn off the switching by disabling the package with cmmodpkg -d package_name
This means that you would NOT get an automatic failover in the event of a REAL failure, such as a node crashing.
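
A minimal sketch of the disable/re-enable sequence, for illustration only. The wrapper functions and the DRY_RUN switch are made up for this example; by default the script only prints the commands, and the package name is taken from the syslog excerpt above.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set on a cluster node.
PKG="${PKG:-oracmts_pkg1}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Turn automatic package switching off, then back on again.
disable_switching() { run cmmodpkg -d "$1"; }
enable_switching()  { run cmmodpkg -e "$1"; }
# Show the package state, including whether switching is enabled.
show_package()      { run cmviewcl -v -p "$1"; }

disable_switching "$PKG"
show_package "$PKG"
```

Remember to call enable_switching again afterwards, otherwise the package stays without automatic failover.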
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Krishna Prasad
Trusted Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

The fact that he only has one subnet doesn't help him, even if it is the recommended config.

Since the network will still be down when Service Guard fails over, what's the point?

Why not use a serial heartbeat? This way it will fail over if the other machine dies.

I will agree that the better installation would have two completely separate subnets. But since that's not what he has, I think the serial heartbeat is better in his case.
Positive Results requires Positive Thinking
Krishna Prasad
Trusted Contributor
Solution

Re: Package switch due to lan0 timeout.. How to prevent?

One more note: if the network card fails and you are only monitoring the heartbeat, you won't fail over either. You would then have to force the move. Maybe APA would be better for you as far as network redundancy goes.

Until you get a second subnet and core you will always have a single point of failure.

Positive Results requires Positive Thinking
Stephen Doud
Honored Contributor

Re: Package switch due to lan0 timeout.. How to prevent?

If the heartbeat is on a private network that will not be perturbed when the package network fails, the following will work for you.

To prevent serviceguard from detecting a SUBNET outage, thereby forcing a package failover, comment out your SUBNET entry in the package configuration file, and cmapplyconf the package config file while the package is down.

$ cmhaltpkg
$ cmapplyconf -f -P
$ cmviewconf | more (to verify no "package subnet" entry is listed for that package)
From this time on, ServiceGuard will not associate a network failure with the package, so it will leave the package running.
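
The "comment out your SUBNET entry" step could be sketched as below. This is an illustration, not part of the original procedure: the function name is made up, and it assumes the package ASCII file has SUBNET entries at the start of a line, as in a legacy package configuration. It writes the edited text to standard output so the original file is left untouched.

```shell
#!/bin/sh
# Prefix every active SUBNET line with '#'; lines that are already
# comments do not match and pass through unchanged.
# $1: path to the package ASCII configuration file.
comment_out_subnet() {
    sed 's/^[[:space:]]*SUBNET[[:space:]]/#&/' "$1"
}
```

Redirect the output to a new file, review it, and use that file for the cmapplyconf step above.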

-Stephen Doud
Tom Bies
Occasional Advisor

Re: Package switch due to lan0 timeout.. How to prevent?

Thanks for the responses, all. I just wanted to make sure that I didn't miss something here, as I was having trouble understanding the point of a package switch when the primary subnet is down on both nodes. I will be setting up standby cards soon.

Thanks,
Tom