Operating System - Linux

Hostname node changes when a failover package is running in that node

 
SOLVED
jrevuelta2
Occasional Advisor

Hi team,

I have successfully set up Serviceguard for Linux 15 on a pair of VMs running RHEL 8.7 on a two-node SimpliVity hyperconverged system, using another VM as the quorum server, but I have several problems:

1 - When I shut down both nodes and start them again, the cluster does not start automatically. I have to restart cmproxy on both nodes with "systemctl restart cmproxy" and then start the cluster with "cmruncl".

2 - My package, called Customer-Appv2, does not start automatically although I have enabled it with "cmmodpkg -e Customer-Appv2".

3 - My package mounts a VMFS disk using DLS with a volume group and an ext4 file system (that works perfectly!). The package has an IP address that I have added to /etc/hosts on both nodes and also to the DNS server. On the node where the package is running, the hostname changes to the DNS name assigned to the package IP, so the cluster command cmviewcl gives an error:

cmviewcl -c sg-lnx-clx
cmviewcl: Cannot view the cluster configuration: No such file or directory.
Either this node is not configured in a cluster, user doesn't have
access to view the cluster configuration, or there is some obstacle
to viewing the configuration. Check the syslog file for more information.
For a list of possible causes, see the Serviceguard manual for cmviewcl.

uname -a
Linux sg-lnx-customer-app.plds.es 4.18.0-425.13.1.el8_7.x86_64 #1 SMP Thu Feb 2 13:01:45 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

The node's hostname before starting the package is sg-lnx-node2.plds.es; once the package is started, it changes to sg-lnx-customer-app.plds.es. If I run the same command on the other node, everything is OK:

[root@sg-lnx-node1 ~]# cmviewcl

CLUSTER STATUS
sg-lnx-clx up

NODE STATUS STATE
sg-lnx-node1 up running
sg-lnx-node2 up running

PACKAGE STATUS STATE AUTO_RUN NODE
Customer-Appv2 up running enabled sg-lnx-node2

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
Prueba-Paquete down failed disabled unowned
paquete-basico down failed disabled unowned

4 - Moreover, the Serviceguard Manager web GUI only works on the node where the package is not running.

Thank you very much for your help,

Jose

Sush_S
HPE Pro

Re: Hostname node changes when a failover package is running in that node

Hi, 

Could you provide the output of the following commands?

"journalctl -a --unit=cmproxy.service"

"cmgetconf -p Customer_Appv2 | grep auto_run"
Check whether auto_run is set to yes.


"grep AUTOSTART_CMCLD $SGCONF/cmcluster.rc"



I work at HPE
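
The three checks requested above can be gathered in one pass. The following is only a minimal sketch: the function name `collect_sg_diagnostics` is hypothetical, `--no-pager` is added so journalctl does not wait for input, and SGCONF is assumed to be set in the environment already, as the commands above imply.

```shell
#!/bin/sh
# Hypothetical helper: print the three requested diagnostics for one package.
# Each command is allowed to fail so that the remaining ones still run.
collect_sg_diagnostics() {
    pkg="$1"

    echo "== journalctl cmproxy.service =="
    journalctl -a --no-pager --unit=cmproxy.service 2>&1 || true

    echo "== auto_run for package $pkg =="
    cmgetconf -p "$pkg" 2>&1 | grep auto_run || true

    echo "== AUTOSTART_CMCLD =="
    # Assumption: SGCONF is already set in the environment.
    grep AUTOSTART_CMCLD "${SGCONF:-}/cmcluster.rc" 2>&1 || true
}

# Example (package name as reported by cmviewcl; /tmp path is arbitrary):
#   collect_sg_diagnostics Customer-Appv2 > /tmp/sg-diag.txt
```

Note that the package name must be spelled exactly as cmviewcl reports it, including the hyphen.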
jrevuelta2
Occasional Advisor

Re: Hostname node changes when a failover package is running in that node

Here is the information. I shut down and restarted both nodes. I ran cmviewcl and then cmruncl, because cmviewcl showed the cluster as down:

[root@sg-lnx-node2 ~]# cmviewcl

CLUSTER STATUS
sg-lnx-clx down

NODE STATUS STATE
sg-lnx-node1 down unknown
sg-lnx-node2 down unknown

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
Customer-Appv2 down halted enabled unowned
Prueba-Paquete down halted enabled unowned
paquete-basico down halted enabled unowned

[root@sg-lnx-node2 ~]# cmruncl -v
cmruncl: Validating network configuration...
Gathering network information
Beginning network probing (this may take a while)
Completed network probing
cmruncl: Network validation complete
Checking for license.........
Waiting for cluster to form .... done
Cluster successfully formed.
Check the syslog files on all nodes in the cluster to verify that no warnings occurred during startup.

After that:

[root@sg-lnx-node1 ~]# cmviewcl

CLUSTER STATUS
sg-lnx-clx up

NODE STATUS STATE
sg-lnx-node1 up running
sg-lnx-node2 up running

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
Customer-Appv2 down failed disabled unowned
Prueba-Paquete down failed disabled unowned
paquete-basico down failed disabled unowned

If I try to start the package manually:

cmrunpkg Customer-Appv2

Checking license requirement for Package Customer-Appv2.
Found Valid License.
Running package Customer-Appv2 on node sg-lnx-node1
The package script for Customer-Appv2 failed with no restart. Customer-Appv2 should not be restarted
Unable to run package Customer-Appv2 on node sg-lnx-node1
Check the syslog and pkg log files for more detailed information
cmrunpkg: Unable to start some package or package instances.

Here is the output of journalctl:

[root@sg-lnx-node1 ~]# journalctl -a --unit=cmproxy.service
-- Logs begin at Thu 2023-03-23 16:50:00 CET, end at Thu 2023-03-23 17:01:01 CET. --
Mar 23 16:50:14 localhost.localdomain systemd[1]: Starting init script for Serviceguard Command Proxy Daemon...
Mar 23 16:50:14 localhost.localdomain cmproxy_prestart[1426]: Starting :
Mar 23 16:50:14 localhost.localdomain cmproxyd[1508]: Initializing
Mar 23 16:50:14 localhost.localdomain systemd[1]: Started init script for Serviceguard Command Proxy Daemon.
Mar 23 16:50:16 localhost.localdomain cmproxyd[1508]: Executing command: rm -f /usr/local/cmcluster/run/.cmproxyd.*.socket
Mar 23 16:50:16 localhost.localdomain cmproxyd[1508]: Ready

Then I restarted cmproxy with systemctl restart cmproxy:

[root@sg-lnx-node1 ~]# journalctl -a --unit=cmproxy.service
-- Logs begin at Thu 2023-03-23 16:50:00 CET, end at Thu 2023-03-23 17:03:32 CET. --
Mar 23 16:50:14 localhost.localdomain systemd[1]: Starting init script for Serviceguard Command Proxy Daemon...
Mar 23 16:50:14 localhost.localdomain cmproxy_prestart[1426]: Starting :
Mar 23 16:50:14 localhost.localdomain cmproxyd[1508]: Initializing
Mar 23 16:50:14 localhost.localdomain systemd[1]: Started init script for Serviceguard Command Proxy Daemon.
Mar 23 16:50:16 localhost.localdomain cmproxyd[1508]: Executing command: rm -f /usr/local/cmcluster/run/.cmproxyd.*.socket
Mar 23 16:50:16 localhost.localdomain cmproxyd[1508]: Ready
Mar 23 17:02:57 sg-lnx-node1.plds.es systemd[1]: Stopping init script for Serviceguard Command Proxy Daemon...
Mar 23 17:02:57 sg-lnx-node1.plds.es systemd[1]: cmproxy.service: Succeeded.
Mar 23 17:02:57 sg-lnx-node1.plds.es systemd[1]: Stopped init script for Serviceguard Command Proxy Daemon.
Mar 23 17:02:57 sg-lnx-node1.plds.es systemd[1]: Starting init script for Serviceguard Command Proxy Daemon...
Mar 23 17:02:57 sg-lnx-node1.plds.es cmproxy_prestart[8812]: Starting :
Mar 23 17:02:57 sg-lnx-node1.plds.es systemd[1]: Started init script for Serviceguard Command Proxy Daemon.
Mar 23 17:02:57 sg-lnx-node1.plds.es cmproxyd[8817]: Initializing
Mar 23 17:02:57 sg-lnx-node1.plds.es cmproxyd[8817]: Executing command: rm -f /usr/local/cmcluster/run/.cmproxyd.*.socket
Mar 23 17:02:57 sg-lnx-node1.plds.es cmproxyd[8817]: Ready

and started the package again:

[root@sg-lnx-node1 ~]# cmrunpkg Customer-Appv2

Checking license requirement for Package Customer-Appv2.
Found Valid License.
Running package Customer-Appv2 on node sg-lnx-node1
Successfully started package Customer-Appv2 on node sg-lnx-node1
cmrunpkg: All specified packages are running

[root@sg-lnx-node2 ~]# cmviewcl

CLUSTER STATUS
sg-lnx-clx up

NODE STATUS STATE
sg-lnx-node1 up running

PACKAGE STATUS STATE AUTO_RUN NODE
Customer-Appv2 up running disabled sg-lnx-node1

NODE STATUS STATE
sg-lnx-node2 up running

UNOWNED_PACKAGES

PACKAGE STATUS STATE AUTO_RUN NODE
Prueba-Paquete down failed disabled unowned
paquete-basico down failed disabled unowned


[root@sg-lnx-node2 ~]# cmgetconf -p Customer_Appv2 | grep auto_run
cmgetconf: Unable to get package configuration information : package Customer_Appv2 is not configured.

On the node where the package is running:

[root@sg-lnx-node1 ~]# cmgetconf -p Customer_Appv2 | grep auto_run
cmgetconf: Unable to get local cluster configuration information: No such file or directory.
Either cluster is not configured, or the user doesn't
have access to get the cluster configuration.

[root@sg-lnx-node1 ~]# hostname -A
sg-lnx-node1.plds.es sg-lnx-node1.plds.es sg-lnx-customer-app.plds.es
[root@sg-lnx-node1 ~]# hostname
sg-lnx-customer-app.plds.es

[root@sg-lnx-node1 ~]# cat $SGCONF/cmcluster.rc | grep AUTOSTART_CMCLD
# AUTOSTART_CMCLD
AUTOSTART_CMCLD=1

 

jrevuelta2
Occasional Advisor

Re: Hostname node changes when a failover package is running in that node

There was a typo in the package name (Customer_Appv2 instead of Customer-Appv2). With the correct name:

[root@sg-lnx-node2 ~]# cmgetconf -p Customer-Appv2 | grep auto_run
# Both "node_fail_fast_enabled" and "auto_run"
# "auto_run" defines whether the package is to be started when the
# The default for "auto_run" is "yes", meaning that the package will be
# If "auto_run is "no", the package is not started when the cluster
# "auto_run" replaces "pkg_switching_enabled".
# Legal values for auto_run: yes, no.
auto_run yes
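
Because a plain `grep auto_run` also matches the comment lines of the configuration template, a slightly stricter filter makes the effective value easier to read. This is a small sketch; the function name `effective_auto_run` is hypothetical, and it only assumes that the setting appears as `auto_run <value>` at the start of a line, as in the output above.

```shell
#!/bin/sh
# Read package configuration text on stdin and print only the effective
# auto_run value, skipping the "#" comment lines that also mention it.
effective_auto_run() {
    awk '$1 == "auto_run" { print $2 }'
}

# Example on a cluster node:
#   cmgetconf -p Customer-Appv2 | effective_auto_run
```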

 

jrevuelta2
Occasional Advisor

Re: Hostname node changes when a failover package is running in that node

If I reboot both nodes and run systemctl restart cmproxy on both nodes before running cmruncl, the package starts automatically, but the hostname of the node where the package is running changes and I cannot run cluster commands on that node.

jrevuelta2
Occasional Advisor

Re: Hostname node changes when a failover package is running in that node

I have found that the ServiceGuard Cluster Startup script runs before NetworkManager has set the hostname to sg-lnx-node1.plds.es or sg-lnx-node2.plds.es, so the service fails. That is why I have to restart cmproxy manually and then run the cmruncl command. At the moment the SG cluster startup script runs, the hostname is still "localhost.localdomain".

Is it possible to run the SG cluster startup script after NetworkManager has set the correct hostname? Or to include a delay of XX seconds?
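
For illustration only, the "delay" idea could be sketched as a small wait loop that returns once the hostname is no longer a placeholder. The function name `wait_for_real_hostname` is hypothetical and is not part of Serviceguard; treating any name that starts with "localhost" as a placeholder is an assumption based on the journal output earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: wait up to TIMEOUT seconds (default 60) until
# `hostname` reports something other than a localhost placeholder.
# Returns 0 as soon as a real name is seen, 1 if the timeout expires.
wait_for_real_hostname() {
    timeout_s="${1:-60}"
    elapsed=0
    while :; do
        case "$(hostname)" in
            ""|localhost|localhost.*) ;;   # still a placeholder, keep waiting
            *) return 0 ;;                 # a real hostname has been set
        esac
        [ "$elapsed" -ge "$timeout_s" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example: wait_for_real_hostname 30 && systemctl restart cmproxy
```

Whether hooking such a wait into the boot sequence is supported by Serviceguard is not established in this thread, so it should be treated as a diagnostic aid rather than a fix.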

jrevuelta2
Occasional Advisor
Solution

Re: Hostname node changes when a failover package is running in that node

I have changed the hostname from the default localhost.localdomain to sg-lnx-nodeX.plds.es (X = 1 or 2) and everything runs as expected: no hostname changes, the cluster starts automatically, etc.
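
The post does not say which command was used for the change. On RHEL 8, one common way is `hostnamectl set-hostname`, so the sketch below assumes that tool; the function names are hypothetical, and nothing is executed until a function is called as root.

```shell
#!/bin/sh
# Return 0 if the given name is a placeholder such as localhost.localdomain.
is_placeholder_hostname() {
    case "$1" in
        ""|localhost|localhost.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Set a persistent (static) hostname unless the requested name is a
# placeholder. Assumption: hostnamectl is the tool used; it requires root.
set_static_hostname() {
    if is_placeholder_hostname "$1"; then
        echo "refusing to set placeholder hostname: '$1'" >&2
        return 1
    fi
    hostnamectl set-hostname "$1"
}

# Example (as root, on each node, with the names from this thread):
#   set_static_hostname sg-lnx-node1.plds.es
#   hostnamectl --static    # prints the static hostname now configured
```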

Mike_Chisholm
HPE Pro

Re: Hostname node changes when a failover package is running in that node

>  "ServiceGuard Cluster Startup script run before NetworkManager"

What script exactly are you referring to here? $SGCONF/cmcluster_service does not set the hostname. I'm not aware of any Serviceguard script that explicitly sets the hostname, but it might be there.

I do know that the Serviceguard command subsystem requires the NODE_NAME defined in the cluster.ascii file to match the hostname and uname -n of the system as it is known at the Linux layer; otherwise SGLX commands will not run properly. See https://support.hpe.com/hpesc/public/docDisplay?docId=sd00002308en_us&docLocale=en_US&page=GUID-66A0B0BE-A139-433C-95C2-4614F39FD4F7.html

**EDIT To be completely honest, I guess uname -n does not HAVE to match the hostname, but in almost every case it does and should. The hostname does have to match the NODE_NAME as defined in the cluster for SGLX commands to work reliably.
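
As a rough way to check this requirement on a node, the current hostname can be compared against the NODE_NAME entries in a saved copy of the cluster configuration file. This is only a sketch: the function name is hypothetical, the file is assumed to contain lines of the form `NODE_NAME <name>`, and accepting the short name (the part before the first dot) is an assumption based on this thread, where node sg-lnx-node1 has the hostname sg-lnx-node1.plds.es.

```shell
#!/bin/sh
# Usage: hostname_matches_node_name CLUSTER_ASCII_FILE [HOSTNAME]
# Returns 0 if HOSTNAME (default: output of `hostname`) equals one of the
# NODE_NAME entries, either in full or by its short name.
hostname_matches_node_name() {
    file="$1"
    host="${2:-$(hostname)}"
    awk -v full="$host" -v short="${host%%.*}" '
        toupper($1) == "NODE_NAME" && ($2 == full || $2 == short) { found = 1 }
        END { exit !found }
    ' "$file"
}

# Example (the path is hypothetical):
#   hostname_matches_node_name /root/cluster.ascii || echo "hostname mismatch"
```

With the names from this thread, sg-lnx-node1.plds.es would match the entry sg-lnx-node1, while sg-lnx-customer-app.plds.es would not match any node.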

I work for HPE.