SGLX A.12.80 TimeOut for cmrunpkg when second node failed.

yilmazaydin · 11-10-2022

Hello.

We are seeing a very long timeout when starting a package. Here is what happens:
1. The package failed on the first node and was not restarted on the second node. The first node went down and became unreachable.

2. We tried to start the package on the second node, but the cmrunpkg command hung for a long time, with no messages in the package log file or the system journal. We could not wait more than 5 minutes, so we restarted the second node.

3. After restarting the second node, we were able to start a single-node cluster with the command cmruncl -f -n node2. The package started automatically as well.

Question: why does the cmrunpkg command hang when one node of a two-node cluster is unreachable?

Additional info:

We have tested and confirmed how a cluster node behaves while the other node is unavailable.

Test case
The cluster has two nodes (node1 and node2) and one package (testpkg).
1. Stop testpkg.
2. Stop node2 with poweroff.
3. Check the deadman device (lsof | grep deadman).
4. Start testpkg on the node that is still running.

Short results (timeline)

Normal time to move the package
date && cmhaltpkg testpkg && cmrunpkg -n node1 testpkg && date ##Fri Nov 11 12:49:09 UTC 2022 - Fri Nov 11 12:50:22 UTC 2022
date && cmhaltpkg testpkg && cmrunpkg -n node2 testpkg && date ##Fri Nov 11 12:51:15 UTC 2022 - Fri Nov 11 12:52:12 UTC 2022

Abnormal time
date && cmviewcl ##Fri Nov 11 12:56:31 UTC 2022
date && cmhaltpkg testpkg ##Fri Nov 11 12:56:40 UTC 2022
date && poweroff ##Fri Nov 11 12:57:07 UTC 2022
date && cmviewcl ##Fri Nov 11 12:57:20 UTC 2022
date && lsof |grep deadman ##Fri Nov 11 12:57:29 UTC 2022
date && tail -500 /var/log/messages | grep cmcld ##Fri Nov 11 12:57:37 UTC 2022
date && time cmrunpkg testpkg ##Fri Nov 11 12:58:27 UTC 2022 - waited 18 minutes, then aborted it
date && cmhaltnode -f && date ##Fri Nov 11 13:17:41 UTC 2022
date && cmruncl -f -n node1 && date ##Fri Nov 11 13:18:59 UTC 2022
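
As a stopgap while testing, the cmrunpkg call in the timeline above could be wrapped so that it gives up after a fixed deadline instead of hanging indefinitely. The sketch below is only an illustration under stated assumptions: run_with_deadline is a hypothetical helper name (not part of Serviceguard), it relies on the GNU coreutils timeout command, and the 300-second limit in the example simply mirrors the 5 minutes mentioned earlier. Whether interrupting cmrunpkg mid-run is safe for the package state is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Serviceguard): run a command with a
# deadline and report how long it took.
run_with_deadline() {
    local limit="$1"   # deadline in seconds
    shift
    local start end rc=0
    start=$(date +%s)
    # GNU coreutils `timeout` sends SIGTERM at the deadline and exits with 124.
    timeout "$limit" "$@" || rc=$?
    end=$(date +%s)
    # Report on stderr so the wrapped command's stdout stays untouched.
    echo "elapsed=$((end - start))s exit=$rc" >&2
    return "$rc"
}

# Example (node and package names as used in this thread):
#   run_with_deadline 300 cmrunpkg -n node1 testpkg
# An exit status of 124 means the deadline was hit.
```

With this wrapper, a run that hits the deadline is distinguishable from an ordinary cmrunpkg failure by the exit status, and the elapsed time is printed either way.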

Has anyone encountered a similar situation?

Sush_S · 11-15-2022

Hi,

You are hitting a known problem, which should be fixed in the next patch release (I am not sure of the ETA). Please reach out to the support team for a possible workaround.

Thank you!

I work at HPE
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
