Operating System - OpenVMS
VMSCheck
Advisor

Two node cluster, but only one at a time is up

Hello, we have a two-node Integrity cluster, and only one server at a time is coming up. What might be missing? What needs to be checked? Please help!

Thank you!

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

One of the nodes in the cluster was down, so I tried to bring it up, but only one server at a time will come up. I am sure some setting is missing or misconfigured somewhere, so can anyone please help!

These are the settings I can see from the server end:

$ pipe mcr sysgen show /all | search sys$input MSCP
MSCP_LOAD             1        0      0      16384  Coded-valu
TMSCP_LOAD            1        0      0          3  Coded-valu
MSCP_SERVE_ALL        7        4      0         -1  Bit-Encode
TMSCP_SERVE_ALL       1        0      0         -1  Bit-Encode
MSCP_BUFFER        1024     1024    256         -1  Coded-valu
MSCP_CREDITS         32       32      2       1024  Coded-valu
MSCP_CMD_TMO          0        0      0 2147483647  Seconds    D
$
 
mcr sysgen show /all
    
Parameters in use: Active
Parameter Name      Current     Default  Min.   Max.   Unit        Dynamic
--------------      -------     -------  ----   ----   ----        -------
WBM_MSG_UPPER            80          80     0     -1   msgs/int    D
WBM_MSG_LOWER            20          20     0     -1   msgs/int    D
WBM_OPCOM_LVL             0           0     0      2   mode        D
AUTO_DLIGHT_SAV           1           0     0      1   Boolean     D
DELPRC_EXIT               5           5     0      7   Coded-valu  D
SHADOW_REC_DLY           20          20    20  65535   Seconds     D
SHADOW_HBMM_RTC         150         150    60  65535   Seconds     D
MULTITHREAD               8           1     0    256   KThreads    D
SHADOW_PSM_RDLY          30          30     0  65535   Seconds     D
EXECSTACKPAGES            3           3     2    768   Pages       D
GB_CACHEALLMAX        50000       50000   100     -1   Blocks      D
GB_DEFPERCENT            35          35     0   1000   Percent     D
CPU_POWER_MGMT            2           2     0     -1   Coded-valu  D
CPU_POWER_THRSH          50          50     0    100   Percent     D
IO_PRCPU_BITMAP      0-1023      0-1023     0   1023   CPU bitmap  D
LOCKRMWT                  5           5     0     10   Pure-numbe  D
SSIO_SYNC_INTVL          30          30     5  65535   Seconds     D
SCH_SOFT_OFFLD   (none set)  (none set)     0   1023   CPU bitmap  D
SCH_HARD_OFFLD   (none set)  (none set)     0   1023   CPU bitmap  D
PAGED_LAL_SIZE            0         512     0   2560   Bytes       D
$

 

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

SYSMAN> PARAMETERS SHOW/LGI
%SYSMAN-I-USEACTNOD, a USE ACTIVE has been defaulted on node XXXX
Node XXXX: Parameters in use: ACTIVE
Parameter Name     Current  Default  Minimum     Maximum  Unit      Dynamic
--------------     -------  -------  -------     -------  ----      -------
LGI_CALLOUTS             0        0        0         255  Count     D
LGI_BRK_TERM             1        1        0           1  Boolean   D
LGI_BRK_DISUSER          0        0        0           1  Boolean   D
LGI_PWD_TMO             30       30        0         255  Seconds   D
LGI_RETRY_LIM            3        3        0         255  Tries     D
LGI_RETRY_TMO           20       20        2         255  Seconds   D
LGI_BRK_LIM              5        5        1         255  Failures  D
LGI_BRK_TMO            300      300        0     5184000  Seconds   D
LGI_HID_TIM            300      300        0  1261440000  Seconds   D

SYSMAN>

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

Does anyone have any suggestions? Please help!

Steven Schweda
Honored Contributor

Re: Two node cluster, but only one at a time is up

> [...] we have integrity 2 node cluster [...]

   Hardware model(s)?  VMS version(s)?  Cluster interconnect?

> [...] and one server at a time is coming up. [...]

   When you do what, exactly?

   Console output?


> Following settings I can see from server end: [...]

   _Which_ "server"?

   Copy+paste with white space works better here with the "</>"
("Insert/Edit code sample") tool.

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

Has this been working before?

What has been changed?

What has happened?

And what happens when you do what?

The parameters you've shown have nothing to do with basic clustering. Consider showing the values for:

VAXCLUSTER, VOTES, EXPECTED_VOTES, DISK_QUORUM, QDSKVOTES, SCSSYSTEMID, SCSNODENAME

from BOTH nodes.

Console messages?

If it's urgent, consider logging a call with your OpenVMS support organization. This is just a forum...

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

Yes, it was working before. The admin who maintained this server left, and I was asked to look into it. I am more of a UNIX person, and we don't have support. One change I am aware of: the quorum disk was converted from thick to thin provisioning. Later, when one of the nodes in the cluster went down and couldn't be brought back up, we asked the SAN admin to move it back the way it was, and it was converted back to thick. I still can't bring the server up. And if we shut down the other node, the one that hangs will come up. So, in a nutshell, only one server at a time can be up. I checked the network connections; they are working fine, no issues there. I am thinking some configuration setting needs to be checked, and I need your help.

Console Log:

%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 18:39:53.38
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
2,1,2,0 5404006349E10000 0000000000000000 EVN_BOOT_START
***********************************************************
* ROM Version : 01.98
* ROM Date : Fri Sep 11 00:56:00 PDT 2015
***********************************************************
2,0,2,0 3404083709E10000 000000000002000C EVN_BOOT_CELL_JOINED_PD
2,1,2,0 340400B149E10000 000000480205000C EVN_MEM_DISCOVERY
2,0,2,0 340400B109E10000 000000080205000C EVN_MEM_DISCOVERY
2,0,2,0 Start memory test ...... 0/100
.......
2,0,2,0 Memory test progress.... 33/100
.......


CL:hpiLO (+, -, <CR>, C, D, F, L, ?, Q or Ctrl-B to Quit){Pg 1 of 93}->

====================================================================

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

The values of the cluster system parameters are crucial here. Post them from both nodes, as asked above.

$ MC SYSGEN

SYSGEN> USE CURRENT

SYSGEN> SHOW <...>

Also both systems MUST be able to communicate directly via the LAN.

The console output you've posted is from the 2nd node? Did you wait long enough (let's say 5 minutes)? Is the 1st node up and running? Are there any messages on the console of the 1st node when you start (or stop) the 2nd node? If not, it looks like the LAN communication may not be working correctly.

The systems are using LAN failover (at least 3 LAN failover sets: LLA, LLB, LLC). If none of the physical devices in a LAN failover set is connected to a LAN segment which allows cluster communication with the other node, the local system cannot join the cluster. After some time, it may report the other node - based on the data in QUORUM.DAT - but it won't be able to join the cluster without cluster communication via the LAN.
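To illustrate this failure mode, here is a tiny hypothetical model (plain Python, nothing on the VMS side; the device names come from the logs above, but the "wrong segment" flag is purely an assumption for illustration): a failset connects to a member whose carrier is up, yet "link up" says nothing about whether that port is cabled to the segment the other cluster node is on.

```python
# Hypothetical model of a LAN failover set: the failset connects to a
# member whose carrier (link) is up, but a live carrier does not prove
# the port reaches the LAN segment the other cluster node is on.
failset_llc0 = [
    {"dev": "EWD0", "link_up": True, "reaches_peer": False},  # assumed: wrong segment
    {"dev": "EWK0", "link_up": True, "reaches_peer": True},
]

# Pick the first member with carrier, as a failset might.
connected = next(d for d in failset_llc0 if d["link_up"])
print(connected["dev"])           # the device the failset picked
print(connected["reaches_peer"])  # cluster traffic still cannot flow
```

In this sketch the failset happily connects to EWD0 even though, under the assumed cabling, no cluster traffic can reach the peer through it.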

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

Hi, can you please let me know exactly which commands to type to get the system parameters? I am not sure.

$ MC SYSGEN

SYSGEN> USE CURRENT

SYSGEN> SHOW <...>

Thank you..

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

$ MC SYSGEN

SYSGEN> SHOW VAXCLUSTER

SYSGEN> SHOW EXPECTED

SYSGEN> SHOW VOTES

SYSGEN> SHOW DISK_QUORUM

SYSGEN> SHOW QDSKVOTES

SYSGEN> SHOW NISCS_LOAD_PEA0

SYSGEN> EXIT

And please post this data from BOTH nodes - if possible.

Under normal circumstances (2 nodes with a quorum disk), you should have: VOTES=1, QDSKVOTES=1, EXPECTED_VOTES=3, and of course VAXCLUSTER=2 and NISCS_LOAD_PEA0=1.

If you wait long enough (2-5 minutes!) and the 2nd node can't see the 1st running node, you should get 'Have connection to...' messages if cluster communication via the LAN works but the new node is not allowed to join the cluster. Please carefully check the LAN cabling between the 2 nodes. Consider that LAN failover may automatically switch to a physical device which presents a carrier signal but is not correctly connected to the other node.
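The arithmetic behind these recommended values can be checked directly. A short Python sketch (the formula is the standard OpenVMS quorum calculation, quorum = (EXPECTED_VOTES + 2) // 2; everything else is an illustrative model, not VMS code):

```python
def quorum(expected_votes: int) -> int:
    # Standard OpenVMS quorum formula: (EXPECTED_VOTES + 2) // 2
    return (expected_votes + 2) // 2

def has_quorum(votes_present: int, expected_votes: int = 3) -> bool:
    # A cluster (or a surviving partition of it) keeps running only
    # while the votes it can count meet or exceed the quorum value.
    return votes_present >= quorum(expected_votes)

print(quorum(3))          # with EXPECTED_VOTES=3, quorum is 2
print(has_quorum(1 + 1))  # one node (VOTES=1) + quorum disk (QDSKVOTES=1): can run
print(has_quorum(1))      # one node that cannot count the quorum disk: hangs
```

This matches the symptom in the thread: either node alone can reach quorum through the quorum disk, so each boots fine by itself; joining an existing cluster additionally requires working cluster communication over the LAN, which is the part that appears to be failing here.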

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

From the node which is up:

 

$ mc sysgen
SYSGEN> SHOW VAXCLUSTER
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
VAXCLUSTER              2             1     0       2  Coded-valu
SYSGEN> SHOW EXPECTED
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
EXPECTED_VOTES          3             1     1     127  Votes
SYSGEN> SHOW VOTES
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
VOTES                   1             1     0     127  Votes
SYSGEN> SHOW DISK_QUORUM
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
DISK_QUORUM  "$1$DGA299 "           " "   " "  "ZZZZ"  Ascii
SYSGEN> SHOW QDSKVOTES
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
QDSKVOTES               1             1     0     127  Votes
SYSGEN> SHOW NISCS_LOAD_PEA0
Parameter Name    Current       Default  Min.    Max.  Unit        Dynamic
--------------    -------       -------  ----    ----  ----        -------
NISCS_LOAD_PEA0         1             0     0       1  Boolean
SYSGEN>

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

As you can see, these are actually the values I expected for a 2-node OpenVMS SAN cluster with a quorum disk!

Now boot the 2nd node and wait for 5 minutes, then post the complete OpenVMS console output - starting at the OpenVMS banner message:

HP OpenVMS Industry Standard 64 Operating System, Version ...

of the 2nd node and also the console output of the 1st node, if there is any during these 5 minutes.

Volker.

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

The Itanium console has a large output buffer. Consider scrolling back to the most recent successful boot of those nodes and carefully checking which physical LAN interfaces were used in the various LAN failover sets the last time it worked, e.g.:

%LLC0, Logical LAN failset device connected to physical device EWD0

Note the physical device for each of the LAN failover sets LLA0, LLB0 and LLC0 and compare them to the physical devices used now.

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

From Console, I see the following from iLOM:

%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 18:39:53.38
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
===================================================================

And from Console Log I see the following:

%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 18:39:53.38
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
2,1,2,0 5404006349E10000 0000000000000000 EVN_BOOT_START
***********************************************************
* ROM Version : 01.98
* ROM Date : Fri Sep 11 00:56:00 PDT 2015
***********************************************************
2,0,2,0 3404083709E10000 000000000002000C EVN_BOOT_CELL_JOINED_PD
2,1,2,0 340400B149E10000 000000480205000C EVN_MEM_DISCOVERY
2,0,2,0 340400B109E10000 000000080205000C EVN_MEM_DISCOVERY
2,0,2,0 Start memory test ...... 0/100
.......
2,0,2,0 Memory test progress.... 33/100
.......


CL:hpiLO (+, -, <CR>, C, D, F, L, ?, Q or Ctrl-B to Quit){Pg 1 of 93}->

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

Using '+' or '-', you can scroll back and forward through the console log. The OpenVMS boot starts at the OpenVMS banner message - as shown above...

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

With +, I see the following:

2,0,2,0 Memory test progress.... 66/100
.......
2,0,2,0 Memory test progress.... 100/100
2,0,2,0 1404002609E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,3,0 140400260DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,2,0 1404002649E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,3,0 140400264DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,3,1 140400260FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,2,1 140400264BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,3,1 140400264FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,2,1 140400260BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,2,0 5404020709E10000 000000000011000C EVN_EFI_START

Press Ctrl-C now to bypass loading option ROM UEFI drivers.

2,0,2,0 3404008109E10000 000000000007000C EVN_IO_DISCOVERY_START
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected


CL:hpiLO (+, -, <CR>, C, D, F, L, ?, Q or Ctrl-B to Quit){Pg 2 of 93}->

============================================================================

 

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

How can I boot from here?

Volker Halle
Honored Contributor

Re: Two node cluster, but only one at a time is up

CL:hpiLO (+, -, <CR>, C, D, F, L, ?, Q or Ctrl-B to Quit){Pg 2 of 93}->

As you can see, there are lots of pages in the console log. You need to find and post the most relevant data. I'm sorry to say this may be hard if you don't have enough OpenVMS and Itanium knowledge...

Volker.

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

This is what I see when I reboot the node which is not up:

SYSBOOT> set STARTUP_P2 "YES"

SYSBOOT> continue

%RAD-I-ENABLED, RAD Support is enabled for 2 RADs


HP OpenVMS Industry Standard 64 Operating System, Version V8.4
© Copyright 1976-2019 Hewlett-Packard Development Company, L.P.


PGQBT-I-INIT-UNIT, boot driver, PCI device ID 0x2532, FW 4.04.04
PGQBT-I-BUILT, version X-33, built on Jul 19 2011 @ 16:12:20
PGQBT-I-LINK_WAIT, waiting for link to come up
PGQBT-I-TOPO_WAIT, waiting for topology ID
%DECnet-I-LOADED, network base image loaded, version = 05.17.02

%CNXMAN, Using remote access method for quorum disk
%SMP-I-CPUTRN, CPU #1 has joined the active set.
%SMP-I-CPUTRN, CPU #5 has joined the active set.
%SMP-I-CPUTRN, CPU #2 has joined the active set.
%SMP-I-CPUTRN, CPU #3 has joined the active set.
%SMP-I-CPUTRN, CPU #7 has joined the active set.
%SMP-I-CPUTRN, CPU #6 has joined the active set.
%SMP-I-CPUTRN, CPU #4 has joined the active set.
%VMScluster-I-LOADSECDB, loading
the cluster security database
%EWA0, Link up: 10 gbit, full duplex, flow control disabled
%EWE0, Function is disabled

%EWF0, Function is disabled

%EWC0, Link up: 10 gbit, full duplex, flow control disabled
%EWG0, Function is disabled

%EWH0, Function is disabled

%EWB0, Link up: 10 gbit, full duplex, flow control disabled
%EWD0, Link up: 10 gbit, full duplex, flow control disabled
%EWI0, Link up: 10 gbit, full duplex, flow control disabled
%EWM0, Function is disabled

%EWN0, Function is disabled

%EWO0, Function is disabled

%EWP0, Function is disabled

%EWK0, Link up: 10 gbit, full duplex, flow control disabled
%EWJ0, Link up: 10 gbit, full duplex, flow control disabled
%EWL0, Link up: 10 gbit, full duplex, flow control disabled
%EWA0, Jumbo frames enabled
%EWJ0, Jumbo frames enabled
%LLA0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLA0, Logical LAN failset device created
%LLA0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLA0, Logical LAN failover device added to failset, EWC0

%LLA0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLA0, Logical LAN failover device added to failset, EWL0
%LLA0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLA0, Logical LAN failset device connected to physical device EWL0
%LLB0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLB0, Logical LAN failset device created
%LLB0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLB0, Logical LAN failover device added to failset, EWB0
%LLB0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLB0, Logical LAN failover device added to failset, EWI0
%LLB0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLB0, Logical LAN failset device connected to physical device EWI0
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failset device created
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failover device added to failset, EWD0
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failset device connected to physical device EWD0

%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
%CNXMAN, Have "connection" to quorum disk

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

I just tried to reset this node; the detailed log of how I did the reset follows, and it is stuck at the same point as mentioned before:


- - - - - - - - - - Prior Console Output - - - - - - - - - -
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 20-APR-2021 19:02:02.30
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
%CNXMAN, Have "connection" to quorum disk
- - - - - - - - - - - - Live Console - - - - - - - - - - - -


MP MAIN MENU:

CO: Console
VFP: Virtual Front Panel
CM: Command Menu
CL: Console Log
SL: Show Event Logs
HE: Main Help Menu
X: Exit Connection

[ilo-xxxx-04]</> hpiLO-> cm

 

(Use Ctrl-B to return to MP main menu.)

 

[ilo-xxxx-04] CM:hpiLO-> rs


RS

Execution of this command irrecoverably halts all system processing and
I/O activity and restarts the computer system.

Type Y to confirm your intention to restart the system: (Y/[N]) y
y
-> SPU hardware was successfully issued a reset.

 


[ilo-xxxx-04] CM:hpiLO->

MP MAIN MENU:

CO: Console
VFP: Virtual Front Panel
CM: Command Menu
CL: Console Log
SL: Show Event Logs
HE: Main Help Menu
X: Exit Connection

[ilo-xxxx-04]</> hpiLO-> co

 

[Use Ctrl-B or ESC-( to return to MP main menu.]

 

- - - - - - - - - - Prior Console Output - - - - - - - - - -
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
2,1,2,0 5404006349E10000 0000000000000000 EVN_BOOT_START
***********************************************************
* ROM Version : 01.98
* ROM Date : Fri Sep 11 00:56:00 PDT 2015
***********************************************************
2,0,2,0 3404083709E10000 000000000002000C EVN_BOOT_CELL_JOINED_PD

- - - - - - - - - - - - Live Console - - - - - - - - - - - -
2,1,2,0 340400B149E10000 000000480205000C EVN_MEM_DISCOVERY
2,0,2,0 340400B109E10000 000000080205000C EVN_MEM_DISCOVERY
2,0,2,0 Start memory test ...... 0/100
.......
2,0,2,0 Memory test progress.... 33/100
.......
2,0,2,0 Memory test progress.... 66/100
.......
2,0,2,0 Memory test progress.... 100/100
2,0,2,0 1404002609E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,3,0 140400260DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,2,0 1404002649E10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,3,0 140400264DE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,3,1 140400260FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,3,1 140400264FE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,1,2,1 140400264BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,2,1 140400260BE10000 000000000006000C EVN_BOOT_CPU_LATE_TEST_START
2,0,2,0 5404020709E10000 000000000011000C EVN_EFI_START

Press Ctrl-C now to bypass loading option ROM UEFI drivers.

2,0,2,0 3404008109E10000 000000000007000C EVN_IO_DISCOVERY_START
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
Dual Port Flex10 10GbE BL8XXc i2 Embedded CNIC is detected
HP PCIe 2Port 8Gb Fibre Channel Adapter (driver 2.27, firmware 5.06.006)
HP PCIe 2Port 8Gb Fibre Channel Adapter (driver 2.27, firmware 5.06.006)
2,0,2,0 5404020B09E10000 0000000000000006 EVN_EFI_LAUNCH_BOOT_MANAGER
(C) Copyright 1996-2010 Hewlett-Packard Development Company, L.P.

Note, menu interfaces might only display on the primary console device.
The current primary console device is:
Serial PcieRoot(0x30304352)/Pci(0x1,0x0)/Pci(0x0,0x5)
The primary console can be changed via the 'conconfig' UEFI shell command.

Press: ENTER - Start boot entry execution
B / b - Launch Boot Manager (menu interface)
D / d - Launch Device Manager (menu interface)
M / m - Launch Boot Maintenance Manager (menu interface)
S / s - Launch UEFI Shell (command line interface)
I / i - Launch iLO Setup Tool (command line interface)

*** User input can now be provided ***

Automatic boot entry execution will start in 1 second(s).
Booting xxxx Normal Boot $1$DGA300: FGB0.2012-0002-AC00-3D42

PGQBT-I-INIT-UNIT, IPB, PCI device ID 0x2532, FW 4.04.04
PGQBT-I-BUILT, version X-33, built on Jan 16 2015 @ 12:02:52
PGQBT-I-LINK_WAIT, waiting for link to come up
PGQBT-I-TOPO_WAIT, waiting for topology ID

%RAD-I-ENABLED, RAD Support is enabled for 2 RADs


HP OpenVMS Industry Standard 64 Operating System, Version V8.4
© Copyright 1976-2019 Hewlett-Packard Development Company, L.P.


PGQBT-I-INIT-UNIT, boot driver, PCI device ID 0x2532, FW 4.04.04
PGQBT-I-BUILT, version X-33, built on Jul 19 2011 @ 16:12:20
PGQBT-I-LINK_WAIT, waiting for link to come up
PGQBT-I-TOPO_WAIT, waiting for topology ID
%DECnet-I-LOADED, network base image loaded, version = 05.17.02

%CNXMAN, Using remote access method for quorum disk
%SMP-I-CPUTRN, CPU #1 has joined the active set.
%SMP-I-CPUTRN, CPU #5 has joined the active set.
%SMP-I-CPUTRN, CPU #2 has joined the active set.
%SMP-I-CPUTRN, CPU #6 has joined the active set.
%SMP-I-CPUTRN, CPU #4 has joined the active set.
%SMP-I-CPUTRN, CPU #3 has joined the active set.
%SMP-I-CPUTRN, CPU #7 has joined the active set.
%VMScluster-I-LOADSECDB, loading
the cluster security database
%EWA0, Link up: 10 gbit, full duplex, flow control disabled
%EWE0, Function is disabled

%EWF0, Function is disabled

%EWG0, Function is disabled

%EWC0, Link up: 10 gbit, full duplex, flow control disabled
%EWH0, Function is disabled

%EWD0, Link up: 10 gbit, full duplex, flow control disabled
%EWB0, Link up: 10 gbit, full duplex, flow control disabled
%EWI0, Link up: 10 gbit, full duplex, flow control disabled
%EWM0, Function is disabled

%EWN0, Function is disabled

%EWO0, Function is disabled

%EWP0, Function is disabled

%EWK0, Link up: 10 gbit, full duplex, flow control disabled
%EWJ0, Link up: 10 gbit, full duplex, flow control disabled
%EWL0, Link up: 10 gbit, full duplex, flow control disabled
%EWA0, Jumbo frames enabled
%EWJ0, Jumbo frames enabled
%LLA0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLA0, Logical LAN failset device created
%LLA0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLA0, Logical LAN failover device added to failset, EWC0
%LLA0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLA0, Logical LAN failover device added to failset, EWL0
%LLA0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLA0, Logical LAN failset device connected to physical device EWL0
%LLB0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLB0, Logical LAN failset device created
%LLB0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLB0, Logical LAN failover device added to failset, EWB0
%LLB0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLB0, Logical LAN failover device added to failset, EWI0
%LLB0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLB0, Logical LAN failset device connected to physical device EWI0
%LLC0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLC0, Logical LAN failset device created
%LLC0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLC0, Logical LAN failover device added to failset, EWD0
%LLC0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLC0, Logical LAN failover device added to failset, EWK0
%LLC0, Logical LAN event at 22-APR-2021 16:44:10.94
%LLC0, Logical LAN failset device connected to physical device EWD0
%SYSINIT-I- found a valid OpenVMS Cluster quorum disk
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%MSCPLOAD-I-CONFIGSCAN, enabled automatic disk serving
%CNXMAN, Using local access method for quorum disk
%CNXMAN, Established "connection" to quorum disk
%CNXMAN, Have "connection" to quorum disk

Hein_vdHeuvel_d
Advisor

Re: Two node cluster, but only one at a time is up

>> Yes, was working before. Admin who was maintaining this server left and I was asked to look into it. I am more UNIX person and we don't have support . 

It's very gracious of Volker to try to help, but I would urge you to go back to your management and tell them this is beyond basic operations and a simple Google search. You have done a good job reaching out and finding this forum, but now the time has come to pay up.

Let them hire a consultant for good money and use the opportunity to learn about the system. They got away with it this far; now let them pay for their sins, so to speak. They probably saved tens of thousands in maintenance and/or in hiring/training a person with the right skill set. Now it is time to pay a few thousand to a deserving consultant (not me!).

Good luck,

Hein

 

 

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

Management is in the process of doing that, but renewal takes weeks; it's just bad timing that it broke now. But I thought this was the forum to discuss things and help each other, as I am interested in debugging and fixing it. Seeing this will be discouraging for UNIX people who want to learn OpenVMS.

Dave Lennon
Advisor

Re: Two node cluster, but only one at a time is up

Hi, is it safe to say it is "hanging" right after it mentions the quorum disk? You did give it several minutes (maybe up to 5) to re-establish quorum, right?

I believe the quorum disk needs to be initialized as a VMS volume before it is used by the clustering software -- I know the software will re-create the QUORUM.DAT file in the top-level directory if someone accidentally deletes it (very early on in their career).

Again, not a simple fix for someone not well-versed in VMS, but I'd suggest booting one node "conversationally", or perhaps off the install DVD (both nodes are not up running VMS now, right?), and making sure that quorum disk unit is okay, i.e. that it can be mounted as a VMS (ODS-2 or ODS-5) volume.

It should not matter if the disk unit is thin or thick provisioned on the SAN storage array, VMS really doesn't care (or know).

Did you say your site will have a position for VMS system manager? Where is it located?

VMSCheck
Advisor

Re: Two node cluster, but only one at a time is up

Yes, I waited long enough; in fact it has been stuck at that stage for a day. I did a reset, and it came back and hung there again. Would it be possible that when the SAN was converted between thick and thin, that change broke something at the system level and caused the issue we are seeing?