Operating System - HP-UX

After multiple failovers/tests/etc., cluster is now showing 2 warnings

 
chisle
Advisor

We did a battery of tests yesterday, including the following:

*) panic of active node (basically just halted it from MP)
*) manual move of package from one node to the other
*) manual move of all packages from active node to the other (cmhaltnode followed by cmrunnode to bring services back up, without packages moving back over)

cmviewcl shows everything is normal, except the following unexpected entry from one of the packages: AUTO_RUN=disabled

In the WebUI (WebSMH), we see the following:

In addition to auto_run being disabled for that one package, the nodes now show 'node switching' as disabled.

However, the configuration files for both packages contain:

auto_run yes
node_fail_fast_enabled no
failover_policy CONFIGURED_NODE
failback_policy MANUAL

We don't want the package to fail back automatically when the original node comes back online; that should remain a manual process. We do want the package to fail over automatically to an available node and start there if its current node fails for whatever reason.

Are we misconfigured, or did all the testing get things out of whack? Before we did the tests, all flags were green. The current status is a result of the tests.

How do we get back to how we were (no warnings) and with the expected behavior?
Matti_Kurkela
Honored Contributor
Solution

The auto_run setting in the package configuration file is used only when starting the cluster. After the cluster is running, the auto_run setting is controlled by the cmmodpkg command.

Furthermore, every time you run cmhaltpkg, Serviceguard will automatically switch that package to "auto_run no" state. Otherwise the package would fail over to another node instead of just stopping, leaving the sysadmin to play a game of Whack-a-Mole with the cmhaltpkg command acting as the hammer.

Modern versions of Serviceguard remind you of this each time you use the cmhaltpkg command.

After moving the package manually, you should set auto_run back to "yes" (= fully re-arm the package failover mechanism) by running:

cmmodpkg -e <package_name>
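
The manual move followed by re-arming can be sketched as a small script. This is only an illustration: SG_PACKAGE and SG_NODE are placeholder names, and the RUN variable is a hypothetical dry-run switch that defaults to echo, so the script only prints the commands unless RUN is set to an empty string on a cluster node.

```shell
#!/bin/sh
# Sketch: move a package by hand, then re-enable automatic failover.
# PKG and TARGET_NODE are placeholders; substitute your own names.
PKG="${PKG:-SG_PACKAGE}"
TARGET_NODE="${TARGET_NODE:-SG_NODE}"

# Dry run by default: RUN=echo prints each command instead of executing it.
# Export RUN="" to run the commands for real.
RUN="${RUN-echo}"

move_package() {
    # Halting the package also switches it to the "auto_run no" state.
    $RUN cmhaltpkg "$PKG"
    # Start the package on the node you chose.
    $RUN cmrunpkg -n "$TARGET_NODE" "$PKG"
    # Re-enable auto_run so the package can fail over on its own again.
    $RUN cmmodpkg -e "$PKG"
}

move_package
```

Running it unchanged prints the three commands in order, which is a cheap way to check the names before executing anything.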

The "switching" attributes can be seen with "cmviewcl -v" too. They usually get disabled if a package start-up attempt fails on some nodes. They are sort of "I tried this package on this node, it failed, won't try that again on my own until the sysadmin tells me it's fixed" markers. They are intended to prevent Serviceguard from getting stuck in an infinite loop, trying repeatedly to start a buggy package on each node in succession.

To re-enable the "node switching" parameter for each node, run:

cmmodpkg -e -n <node_name> <package_name>
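
To cover every node in one go, the same command can be looped over a node list and followed by a status check. Again a sketch under assumptions: SG_PACKAGE, SG_NODE1 and SG_NODE2 are placeholder names, and RUN is a hypothetical dry-run switch that defaults to echo.

```shell
#!/bin/sh
# Sketch: re-enable node switching for one package on each listed node.
# PKG and NODES are placeholders; substitute your own names.
PKG="${PKG:-SG_PACKAGE}"
NODES="${NODES:-SG_NODE1 SG_NODE2}"

# Dry run by default: RUN=echo prints each command instead of executing it.
# Export RUN="" to run the commands for real.
RUN="${RUN-echo}"

enable_node_switching() {
    for node in $NODES; do
        # Tell Serviceguard this node may run the package again.
        $RUN cmmodpkg -e -n "$node" "$PKG"
    done
    # Show the switching attributes so the result can be checked.
    $RUN cmviewcl -v
}

enable_node_switching
```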

MK
chisle
Advisor

Thank you for the fuller explanation, along with the commands.

I had been running 'cmmodpkg -v -e SG_PACKAGE', but had neglected to do so after the failure occurred. I also had not been running it with the '-e -n SG_NODE' options.

Your explanation was very helpful. After running the commands, everything went green.