Operating System - OpenVMS

Two node cluster, but only one at a time is up

 
VMSCheck
Advisor

Hello, we have a two-node Integrity cluster, but only one server at a time comes up. What might be missing, or what needs to be checked? Please help!

Thank you!

31 replies
VMSCheck
Advisor

One of the nodes in the cluster was down, so we tried to bring it up, but only one server at a time comes up. I am sure some setting is missing or misconfigured somewhere. Can anyone please help?

These are the settings I can see on the server:

$ pipe mcr sysgen show /all | search sys$input MSCP
MSCP_LOAD       1          0          0     16384      Coded-valu
TMSCP_LOAD      1          0          0     3          Coded-valu
MSCP_SERVE_ALL  7          4          0     -1         Bit-Encode
TMSCP_SERVE_ALL 1          0          0     -1         Bit-Encode
MSCP_BUFFER     1024       1024       256   -1         Coded-valu
MSCP_CREDITS    32         32         2     1024       Coded-valu
MSCP_CMD_TMO    0          0          0     2147483647 Seconds    D
$

$ mcr sysgen show /all

Parameters in use: Active
Parameter Name  Current    Default    Min.  Max.       Unit       Dynamic
--------------  -------    -------    ----  ----       ----       -------
WBM_MSG_UPPER   80         80         0     -1         msgs/int   D
WBM_MSG_LOWER   20         20         0     -1         msgs/int   D
WBM_OPCOM_LVL   0          0          0     2          mode       D
AUTO_DLIGHT_SAV 1          0          0     1          Boolean    D
DELPRC_EXIT     5          5          0     7          Coded-valu D
SHADOW_REC_DLY  20         20         20    65535      Seconds    D
SHADOW_HBMM_RTC 150        150        60    65535      Seconds    D
MULTITHREAD     8          1          0     256        KThreads   D
SHADOW_PSM_RDLY 30         30         0     65535      Seconds    D
EXECSTACKPAGES  3          3          2     768        Pages      D
GB_CACHEALLMAX  50000      50000      100   -1         Blocks     D
GB_DEFPERCENT   35         35         0     1000       Percent    D
CPU_POWER_MGMT  2          2          0     -1         Coded-valu D
CPU_POWER_THRSH 50         50         0     100        Percent    D
IO_PRCPU_BITMAP 0-1023     0-1023     0     1023       CPU bitmap D
LOCKRMWT        5          5          0     10         Pure-numbe D
SSIO_SYNC_INTVL 30         30         5     65535      Seconds    D
SCH_SOFT_OFFLD  (none set) (none set) 0     1023       CPU bitmap D
SCH_HARD_OFFLD  (none set) (none set) 0     1023       CPU bitmap D
PAGED_LAL_SIZE  0          512        0     2560       Bytes      D
$

 

VMSCheck
Advisor

SYSMAN> PARAMETERS SHOW/LGI
%SYSMAN-I-USEACTNOD, a USE ACTIVE has been defaulted on node XXXX
Node XXXX: Parameters in use: ACTIVE
Parameter Name  Current Default Minimum Maximum    Unit     Dynamic
--------------  ------- ------- ------- -------    ----     -------
LGI_CALLOUTS    0       0       0       255        Count    D
LGI_BRK_TERM    1       1       0       1          Boolean  D
LGI_BRK_DISUSER 0       0       0       1          Boolean  D
LGI_PWD_TMO     30      30      0       255        Seconds  D
LGI_RETRY_LIM   3       3       0       255        Tries    D
LGI_RETRY_TMO   20      20      2       255        Seconds  D
LGI_BRK_LIM     5       5       1       255        Failures D
LGI_BRK_TMO     300     300     0       5184000    Seconds  D
LGI_HID_TIM     300     300     0       1261440000 Seconds  D

SYSMAN>

VMSCheck
Advisor

Does anyone have any suggestions? Please help!

Steven Schweda
Honored Contributor

> [...] we have integrity 2 node cluster [...]

   Hardware model(s)?  VMS version(s)?  Cluster interconnect?

> [...] and one server at a time is coming up. [...]

   When you do what, exactly?

   Console output?


> Following settings I can see from server end: [...]

   _Which_ "server"?

   Copy+paste with white space works better here with the "</>"
("Insert/Edit code sample") tool.

Volker Halle
Honored Contributor

Has this been working before?

What has been changed?

What has happened?

And what happens when you do what?

The parameters you've shown have nothing to do with basic clustering. Please post the values of:

VAXCLUSTER, VOTES, EXPECTED_VOTES, DISK_QUORUM, QDSKVOTES, SCSSYSTEMID, SCSNODE

from BOTH nodes.
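For orientation, here is how these values interact, as a sketch assuming the common layout for a two-node cluster with a quorum disk (VOTES=1 on each node, QDSKVOTES=1, EXPECTED_VOTES=3); the real values are unknown until they are posted. OpenVMS derives quorum as (EXPECTED_VOTES + 2)/2, rounded down:

```latex
\begin{aligned}
Q &= \left\lfloor \frac{\text{EXPECTED\_VOTES} + 2}{2} \right\rfloor
   = \left\lfloor \frac{3 + 2}{2} \right\rfloor = 2 \\
\text{one node + quorum disk} &= 1 + 1 = 2 \;\ge\; Q \\
\text{both nodes + quorum disk} &= 1 + 1 + 1 = 3 \;\ge\; Q
\end{aligned}
```

With these assumed values either node can form the cluster on its own together with the quorum disk, so quorum by itself would not keep the second node out; to join the node that is already running, it also needs working cluster communication with it.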

Console messages?

If it's urgent, consider logging a call with your OpenVMS support organization. This is just a forum...

Volker.

VMSCheck
Advisor

Yes, it was working before. The admin who maintained this server left, and I was asked to look into it. I am more of a UNIX person, and we have no OpenVMS support to call. One change I am aware of: the quorum disk was changed from thick to thin provisioning. Later, when one of the cluster nodes went down and could not be brought back up, we asked the SAN admin to revert it, and it was changed back to thick. We still can't bring the server up. If we shut down the node that is running, the node that was hanging comes up instead. In a nutshell, only one server at a time can be up. I checked the network connections, and they are working fine with no issues. I think some configuration setting needs to be checked, and I need your help.
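Since the quorum disk was re-provisioned, it may be worth confirming from the node that is up that it still sees the quorum disk and its QUORUM.DAT file. A minimal DCL sketch, assuming F$GETSYI returns the DISK_QUORUM system parameter as the device name (any colon is stripped just in case); QDISK is only a local symbol name chosen for illustration:

```dcl
$ ! Run on the node that is currently up.
$ ! Read the quorum disk name from the DISK_QUORUM system parameter.
$ QDISK = F$ELEMENT(0, ":", F$EDIT(F$GETSYI("DISK_QUORUM"), "TRIM"))
$ WRITE SYS$OUTPUT "Quorum disk: ", QDISK
$ ! Device status, mount state and error count of the quorum disk.
$ SHOW DEVICE/FULL 'QDISK':
$ ! QUORUM.DAT lives in the master file directory [000000] of the quorum disk.
$ DIRECTORY/SIZE/DATE 'QDISK':[000000]QUORUM.DAT
```

If DISK_QUORUM turns out to be blank, no quorum disk is configured on that node and the last two commands will simply fail.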

Console Log:

ogical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 18:39:53.38
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
2,1,2,0 5404006349E10000 0000000000000000 EVN_BOOT_START
***********************************************************
* ROM Version : 01.98
* ROM Date : Fri Sep 11 00:56:00 PDT 2015
***********************************************************
2,0,2,0 3404083709E10000 000000000002000C EVN_BOOT_CELL_JOINED_PD
2,1,2,0 340400B149E10000 000000480205000C EVN_MEM_DISCOVERY
2,0,2,0 340400B109E10000 000000080205000C EVN_MEM_DISCOVERY
2,0,2,0 Start memory test ...... 0/100
.......
2,0,2,0 Memory test progress.... 33/100
.......


CL:hpiLO (+, -, <CR>, C, D, F, L, ?, Q or Ctrl-B to Quit){Pg 1 of 93}->

====================================================================

Volker Halle
Honored Contributor

The values of the cluster system parameters are crucial here. Post them from both nodes, as asked above.

$ MC SYSGEN

SYSGEN> USE CURRENT

SYSGEN> SHOW <...>
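Spelled out for the parameters listed earlier, the session could look like the sketch below, run on each node. USE CURRENT loads the values stored in the parameter file (used at the next boot), while USE ACTIVE would show the values in memory. Note that the node-name parameter is SCSNODE; SHOW /CLUSTER should also list the cluster-related parameters in one go.

```dcl
$ ! Show the cluster-related system parameters of this node.
$ MC SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW VAXCLUSTER
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> SHOW DISK_QUORUM
SYSGEN> SHOW QDSKVOTES
SYSGEN> SHOW SCSSYSTEMID
SYSGEN> SHOW SCSNODE
SYSGEN> EXIT
```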

Also both systems MUST be able to communicate directly via the LAN.

Is the console output you've posted from the 2nd node? Did you wait long enough (say, 5 minutes)? Is the 1st node up and running? Are there any messages on the console of the 1st node when you start (or stop) the 2nd node? If not, it looks like the LAN communication may not be working correctly.
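A quick way to see what the 1st node currently believes about cluster membership and quorum, sketched with standard F$GETSYI items (the exact output layout may differ):

```dcl
$ ! Run on the node that is up, while the 2nd node is trying to boot.
$ SHOW CLUSTER
$ WRITE SYS$OUTPUT "Cluster nodes:  ", F$STRING(F$GETSYI("CLUSTER_NODES"))
$ WRITE SYS$OUTPUT "Cluster votes:  ", F$STRING(F$GETSYI("CLUSTER_VOTES"))
$ WRITE SYS$OUTPUT "Cluster quorum: ", F$STRING(F$GETSYI("CLUSTER_QUORUM"))
```

If the 2nd node never shows up in SHOW CLUSTER while its console sits at "waiting to form or join an OpenVMS Cluster", that is consistent with the LAN communication problem described here.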

The systems are using LAN failover (at least 3 LAN failover sets: LLA, LLB, LLC). If none of the physical devices in a LAN failover set is connected to a LAN segment that allows cluster communication with the other node, the local system cannot join the cluster. After some time, it may report the other node (based on the data in QUORUM.DAT), but it won't be able to join the cluster without cluster communication over the LAN.
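A sketch of the LAN-side checks on the node that is up, assuming the SCACP utility is available (it is on current OpenVMS versions). SHOW DEVICE LL and SHOW DEVICE EW list the logical failover sets and the physical adapters behind them; SCACP shows what the cluster port driver actually uses:

```dcl
$ ! Logical LAN failover sets (LLA0, LLB0, LLC0, ...) and physical adapters.
$ SHOW DEVICE LL
$ SHOW DEVICE EW
$ ! Cluster (PEdriver) view: LAN devices in use, channels and
$ ! virtual circuits to other nodes.
$ MC SCACP
SCACP> SHOW LAN_DEVICE
SCACP> SHOW CHANNEL
SCACP> SHOW VC
SCACP> EXIT
```

If no channel to the 2nd node appears while it is booting, its failover-set ports are probably not on a LAN segment the 1st node can reach.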

Volker.

VMSCheck
Advisor

Hi, can you please tell me the exact commands I should type to get the system parameters? I am not sure.

$ MC SYSGEN

SYSGEN> USE CURRENT

SYSGEN> SHOW <...>

Thank you..
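The stored parameters of the node that will not join can also be read from the node that is up, assuming both boot from the same system disk. The root of the down node (SYS1 below) is a hypothetical example; list the roots first to find the right one. On Integrity the parameter file is IA64VMSSYS.PAR.

```dcl
$ ! List the parameter file in every system root on the system disk.
$ DIRECTORY SYS$SYSDEVICE:[SYS*.SYSEXE]IA64VMSSYS.PAR
$ ! SYS1 is hypothetical; use the root that belongs to the other node.
$ MC SYSGEN
SYSGEN> USE SYS$SYSDEVICE:[SYS1.SYSEXE]IA64VMSSYS.PAR
SYSGEN> SHOW VOTES
SYSGEN> SHOW EXPECTED_VOTES
SYSGEN> SHOW SCSNODE
SYSGEN> EXIT
```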
