
Re: OpenVMS network down after reboot

 
smsc_1
Regular Advisor

A LAN Failover device (e.g. LLA0 in OpenVMS, named LE0 in TCPIP) is a 'virtual LAN device'. It consists of one (or more) PHYSICAL LAN interfaces. If any of those PHYSICAL LAN interfaces is up (Link Up), the LAN traffic is sent/received via that PHYSICAL LAN interface. If that interface/link fails, the 'other' LAN interface will be selected for send/receive. If you only have ONE PHYSICAL interface in a LAN Failover set and that interface is 'Link down', so is your LAN Failover device.

Ok, so:

LE0/LLA0 ==> EID0 (PHYSICAL LAN interfaces)

So currently I have only one PHYSICAL device associated with the logical device.

Because EID0 is down, LE0 is down too and the node is not reachable.

Do I have it right now?

I still don't understand whether the issue is related to OpenVMS or to hardware such as the network card. I bet on hardware because, as I wrote, no one changed the configuration before or after the HP intervention.

./ Lucas
Volker Halle
Honored Contributor

Lucas,

you've got it !

If you have another working cluster member with the same LAN config, compare LANCP LIST DEV/CHAR from both systems.

How many network cables are plugged into those systems? 6 or 8? You have 8 LAN interfaces, and 6 have reported 'Link up'. Are 2 LAN interfaces without a cable? If not, you have to check and follow the cables for the 2 interfaces which do NOT have the 'Link' LED on.

If you're not onsite, have someone send you pictures from the back of those systems...

Volker.

smsc_1
Regular Advisor

Hi Volker, thank you again for the explanation.

On both nodes I have the failover configuration shown below (the first node works perfectly and shows exactly the same output).

I don't understand what it means. As I understand it, LLA0 has the following failover devices:

"EID0" Failover device
"EIJ0" Failover device

But EIJ0 doesn't exist, which would mean that the network on OpenVMS is misconfigured: in this case, if EID0 fails, the whole connection goes down because no other device is active.

NODE82 Device Characteristics, permanent database, EIA0 (9-SEP-2021 12:58:51.28):

NODE82 Device Characteristics, permanent database, EIB0 (9-SEP-2021 12:58:51.28):

NODE82 Device Characteristics, permanent database, EIC0 (9-SEP-2021 12:58:51.28):
Value Characteristic
----- --------------
20 Failover priority

NODE82 Device Characteristics, permanent database, EID0 (9-SEP-2021 12:58:51.29):
Value Characteristic
----- --------------
20 Failover priority

NODE82 Device Characteristics, permanent database, EIE0 (9-SEP-2021 12:58:51.29):

NODE82 Device Characteristics, permanent database, EIF0 (9-SEP-2021 12:58:51.29):

NODE82 Device Characteristics, permanent database, EIG0 (9-SEP-2021 12:58:51.29):

NODE82 Device Characteristics, permanent database, EIH0 (9-SEP-2021 12:58:51.30):

NODE82 Device Characteristics, permanent database, EII0 (9-SEP-2021 12:58:51.30):
Value Characteristic
----- --------------
10 Failover priority

NODE82 Device Characteristics, permanent database, EIJ0 (9-SEP-2021 12:58:51.30):
Value Characteristic
----- --------------
10 Failover priority

NODE82 Device Characteristics, permanent database, LLA0 (9-SEP-2021 12:58:51.30):
Value Characteristic
----- --------------
Yes Logical LAN enabled for startup
"EID0" Failover device
"EIJ0" Failover device

NODE82 Device Characteristics, permanent database, LLB0 (9-SEP-2021 12:58:51.31):
Value Characteristic
----- --------------
Yes Logical LAN enabled for startup
"EIC0" Failover device
"EII0" Failover device

 

./ Lucas
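
To compare the stored failover configuration of two cluster members, it can help to reduce the LANCP LIST DEV/CHAR listing to just the failover sets and priorities. The following is only a minimal Python sketch, assuming the listing has been saved as text in the layout pasted above; the function name `parse_failover_config` and the regular expressions are illustrative and not part of LANCP.

```python
import re

# Header line of each device section in the pasted listing.
SECTION = re.compile(r"Device Characteristics, permanent database, (\w+) \(")
# A failover member line looks like: "EID0" Failover device
MEMBER = re.compile(r'"(\w+)"\s+Failover device')
# A priority line looks like: 20 Failover priority
PRIORITY = re.compile(r"(\d+)\s+Failover priority")


def parse_failover_config(listing):
    """Return (sets, priorities) parsed from the listing text."""
    sets = {}        # logical LAN device -> list of member devices
    priorities = {}  # physical device -> failover priority
    current = None   # device whose section we are currently reading
    for line in listing.splitlines():
        m = SECTION.search(line)
        if m:
            current = m.group(1)
            continue
        if current is None:
            continue
        m = MEMBER.search(line)
        if m:
            sets.setdefault(current, []).append(m.group(1))
            continue
        m = PRIORITY.search(line)
        if m:
            priorities[current] = int(m.group(1))
    return sets, priorities


# Excerpt of the listing posted in this thread.
SAMPLE = """\
NODE82 Device Characteristics, permanent database, EID0 (9-SEP-2021 12:58:51.29):
Value Characteristic
----- --------------
20 Failover priority

NODE82 Device Characteristics, permanent database, EIJ0 (9-SEP-2021 12:58:51.30):
Value Characteristic
----- --------------
10 Failover priority

NODE82 Device Characteristics, permanent database, LLA0 (9-SEP-2021 12:58:51.30):
Value Characteristic
----- --------------
Yes Logical LAN enabled for startup
"EID0" Failover device
"EIJ0" Failover device
"""

sets, priorities = parse_failover_config(SAMPLE)
print(sets)
print(priorities)
```

Running the same function on the listing from each node and comparing the two results shows whether the stored configuration really is identical. It says nothing about which devices are physically present or have link.
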
Volker Halle
Honored Contributor

Lucas,

check the system and network documentation !!!

If you believe the stored LAN Failover config is correct, you may also be missing/have lost a complete LAN card (with 2 ports, i.e. EII and EIJ) !

Count the LAN cables and look at the pictures of the LAN interface ports of BOTH systems.

Volker.

smsc_1
Regular Advisor

You're right Volker, EII and EIJ are missing from the second node (compared with the first one) due to a hardware fault. I missed that when I compared the information from both cluster nodes.

And HP was finally convinced that there was a hardware issue. After the faulty Ethernet adapter was replaced, everything works as expected. The node is reachable again.

Anyway, thank you Volker for your time. Thanks to the log I sent to HP, gathered using your commands, we were able to address the issue.

Now I have another issue to address on this system.

On both cluster nodes I have the following:


SYSTEM> SHOW DEVICES/REBUILD_STATUS
Device Name          Rebuild needed?
NODE82$DKA100:       Yes
$1$DGA100:           Yes
$1$DGA200:           Yes
$1$DGA300:           Yes
$1$DGA301:           Yes
$1$DGA302:           Yes
$1$DGA303:           Yes
$1$DGA400:           Yes
$1$DGA401:           Yes
$1$DGA500:           Yes
$1$DGA501:           Yes
$1$DGA502:           Yes
$1$DGA503:           Yes
$1$DGA600:           Yes

A rebuild is needed on all volumes (they are volumes connected via 3PAR). Honestly, I don't know what kind of rebuild is needed. For example, NODE82$DKA100 is an internal server disk, and it is not even used during OpenVMS startup (we use it only as an application log repository).

During boot, OpenVMS gives me an error like "cannot rebuild due to missing space", but there is sufficient space on all volumes.

**bleep**! Never ending story!

./ Lucas
Volker Halle
Honored Contributor

Lucas,

please open a new topic for the 'rebuild needed' problem and then we'll see...

Please report the exact error message seen during OpenVMS boot when mounting the disks.

Volker.

smsc_1
Regular Advisor

Thank you Volker, I solved it using this documentation, without taking up more forum space:

https://www0.mi.infn.it/~calcolo/OpenVMS/ssb71/6015/6017p018.htm

./ Lucas
Volker Halle
Honored Contributor

Lucas,

now that you've learned something about LAN Failover, it's time to review your system network configuration - as a first step, compare the setup with the other cluster member.

You have 2 LAN Failover devices (LLA consisting of EID,EIJ and LLB consisting of EIC,EII). By losing EIJ and EII through a LAN card failure, both of your LAN Failover devices failed. And that's because EID and EIC are not correctly connected to your network switches! With your 10 LAN ports, you should have 10 network cables connected to this system.

LAN Failover is used to increase the redundancy of the LAN connection against network component failures. So it is correct to distribute the members of a LAN Failover set across different LAN interface cards, as you've now learned. But you also need to make sure that the 'other' member of each LAN Failover set works correctly. After you've made sure that the 'other' LAN interfaces (EID and EIC) have a network 'Link up', you can use LANCP> SET DEVICE/SWITCH LLAx to test whether the 'other' member of the LAN Failover set is functioning correctly.

And for the future: regularly monitor all members of your LAN Failover sets to make sure they all have 'Link up'.

Volker.
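
The rule described above (a LAN Failover device is up as long as at least one member has link, and it is only redundant when more than one does) can be written down as a small check. This is a minimal Python sketch, not an OpenVMS tool: the function name `failover_status` and the state labels are made up for illustration, and how the set of interfaces with 'Link up' is collected is left out and assumed to be supplied by hand.

```python
def failover_status(failover_sets, link_up):
    """Map each logical LAN device to (state, members that have link)."""
    report = {}
    for logical, members in failover_sets.items():
        up = [m for m in members if m in link_up]
        if not up:
            state = "DOWN"            # no member has link: the virtual device is down
        elif len(up) == 1:
            state = "UP, no standby"  # works, but the next failure takes it down
        else:
            state = "UP, redundant"
        report[logical] = (state, up)
    return report


# Failover sets as stored in the permanent database shown in this thread.
FAILOVER_SETS = {"LLA0": ["EID0", "EIJ0"], "LLB0": ["EIC0", "EII0"]}

# During the outage: EII/EIJ lost with the LAN card, EID/EIC not cabled.
during_outage = failover_status(FAILOVER_SETS, link_up=set())

# After the card was replaced, with EID/EIC still without cables.
after_repair = failover_status(FAILOVER_SETS, link_up={"EII0", "EIJ0"})

for name, (state, up) in after_repair.items():
    print(name, state, up)
```

With the states described in this thread, both devices are reported as down during the outage, and as up but without a standby member after the repair. That second state is the one regular monitoring is meant to catch.
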

smsc_1
Regular Advisor

Sure, this will be my next step, and I'm already working on it.

According to the HP engineer who was onsite, there are no cables on EID and EIC at all. That's why, with the other failover members down due to the hardware failure, connectivity went down too. This is clearly a design error, all the more so because the services on these servers are critical.

Maybe it's time for me to travel, even though I don't know whether my company will allow it given the COVID situation.

./ Lucas
Volker Halle
Honored Contributor

Lucas,

regarding REBUILD: you've probably received - and misinterpreted - the following message during MOUNT:

%MOUNT-I-REBLDREQD, rebuild not performed; some free space unavailable; diskquota usage stale

When a disk is not cleanly dismounted before a node shuts down (e.g. through a system crash), some blocks may still be marked as used in the volume bitmap and therefore not available for allocation. A SET VOLUME/REBUILD resolves this.

Volker.
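
For a long list of volumes, the SET VOLUME/REBUILD commands can be generated from the SHOW DEVICES/REBUILD_STATUS output instead of being typed by hand. The following is only a minimal Python sketch, assuming the output has been captured as text in the two-column layout posted earlier in this thread; the function name `volumes_needing_rebuild` is illustrative, and the generated lines are meant to be reviewed before they are run on the system.

```python
def volumes_needing_rebuild(output):
    """Return the device names whose 'Rebuild needed?' column says Yes."""
    volumes = []
    for line in output.splitlines():
        parts = line.split()
        # Data lines have the form "<device>: Yes" or "<device>: No";
        # the header line splits into more fields and is skipped.
        if len(parts) == 2 and parts[0].endswith(":") and parts[1] == "Yes":
            volumes.append(parts[0])
    return volumes


# Excerpt of the output posted in this thread.
SAMPLE = """\
Device Name Rebuild needed?
NODE82$DKA100: Yes
$1$DGA100: Yes
$1$DGA200: Yes
"""

commands = ["$ SET VOLUME/REBUILD " + v for v in volumes_needing_rebuild(SAMPLE)]
for command in commands:
    print(command)
```

Afterwards, SHOW DEVICES/REBUILD_STATUS can be run again to confirm that the volumes no longer report that a rebuild is needed.
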