
SGLX A.12.80 TimeOut for cmrunpkg when second node failed.

 
SOLVED
yilmazaydin
Valued Contributor


Hello.

We are seeing a very long hang when starting a package. What happens:
1. The package failed on the first node and was not restarted on the second node. The first node then went down and became unreachable.

2. We tried to start the package on the second node, but the cmrunpkg command hung for a long time, with no messages in the package log file or the system journal. We could not wait more than 5 minutes, so we restarted the second node.

3. After restarting the second node, we were able to start a single-node cluster with the command cmruncl -f -n node2. The package also started automatically.

Question: why does cmrunpkg hang when one node of a 2-node cluster is unreachable?

 

 

Additional info:

We have tested and reproduced this behavior on a cluster node while the other node is unavailable.

Test case
The cluster has two nodes (node1 and node2) and one package (testpkg)
Stop testpkg
Power off node2 with poweroff
Check deadman (lsof | grep deadman)
Start testpkg on the node that is still running

 

Short results (timeline)

Normal time to move the package
date && cmhaltpkg testpkg && cmrunpkg -n node1 testpkg && date ##Fri Nov 11 12:49:09 UTC 2022 - Fri Nov 11 12:50:22 UTC 2022
date && cmhaltpkg testpkg && cmrunpkg -n node2 testpkg && date ##Fri Nov 11 12:51:15 UTC 2022 - Fri Nov 11 12:52:12 UTC 2022
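
For reference, the durations of the two normal moves above can be worked out from the recorded timestamps. This is only a minimal sketch: it assumes GNU date (the -d option is a GNU extension), and elapsed is a hypothetical helper name, not a Serviceguard command.

```shell
# Print the number of seconds between two timestamps.
# Assumes GNU date; "elapsed" is a hypothetical helper for illustration.
elapsed() {
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Move to node1 (halt + run): prints 73
elapsed "Fri Nov 11 12:49:09 UTC 2022" "Fri Nov 11 12:50:22 UTC 2022"
# Move to node2 (halt + run): prints 57
elapsed "Fri Nov 11 12:51:15 UTC 2022" "Fri Nov 11 12:52:12 UTC 2022"
```

So a normal halt and restart of testpkg takes roughly one minute in this setup.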

Abnormal time
date && cmviewcl ##Fri Nov 11 12:56:31 UTC 2022
date && cmhaltpkg testpkg ##Fri Nov 11 12:56:40 UTC 2022
date && poweroff ##Fri Nov 11 12:57:07 UTC 2022
date && cmviewcl ##Fri Nov 11 12:57:20 UTC 2022
date && lsof |grep deadman ##Fri Nov 11 12:57:29 UTC 2022
date && tail -500 /var/log/messages | grep cmcld ##Fri Nov 11 12:57:37 UTC 2022
date && time cmrunpkg testpkg ##Fri Nov 11 12:58:27 UTC 2022 - waited 18 minutes, then aborted it
date && cmhaltnode -f && date ##Fri Nov 11 13:17:41 UTC 2022
date && cmruncl -f -n node1 && date ##Fri Nov 11 13:18:59 UTC 2022
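
When repeating a test like this, the hanging step could be bounded with a time limit instead of being aborted by hand. The sketch below assumes the coreutils timeout utility is available; run_with_limit is a hypothetical helper name. Whether interrupting cmrunpkg part-way is safe for the cluster state is not something this test confirms, so treat it as a way to measure the hang, not as a workaround.

```shell
# Run a command with a time limit and print UTC timestamps before and after.
# "run_with_limit" is a hypothetical helper; it relies on coreutils timeout,
# which exits with status 124 when the limit is reached.
run_with_limit() {
  limit=$1
  shift
  date -u
  timeout "$limit" "$@"
  rc=$?
  date -u
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s: $*" >&2
  fi
  return "$rc"
}

# Example (commented out because it needs a Serviceguard cluster):
# run_with_limit 300 cmrunpkg testpkg
```

A return status of 124 means the limit was reached; any other status is the exit status of the command itself.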

 

Has anyone encountered a similar situation?

 

Sush_S
HPE Pro
Solution

Re: SGLX A.12.80 TimeOut for cmrunpkg when second node failed.

Hi,

You are hitting a known problem, which should be fixed in the next patch release (I am not sure of the ETA). Please reach out to the support team for a possible workaround.

Thank you!


I am an HPE Employee
Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise
