Operating System - OpenVMS

Problem at bootphase to build LLx device

Lachnitt_1
Frequent Advisor


Hello World!

A few weekends ago, I tried to activate the LAN failover devices in our production cluster of four ES40s and one DS20.

On four nodes it worked fine.
But one ES40 looped during the boot phase while it was building the failover devices:

"...
%DECnet-I-LOADED, network base image loaded, version = 05.10.00

%SMP-I-SECMSG, CPU #01 message: P01>>>START
%SMP-I-SECMSG, CPU #02 message: P02>>>START
%SMP-I-CPUTRN, CPU #01 has joined the active set.
%SMP-I-CPUTRN, CPU #02 has joined the active set.
%SYSINIT-I- waiting to form or join an OpenVMS Cluster
%VMScluster-I-LOADSECDB, loading the cluster security database

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN driver loaded. New device is LLA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN failset device is unavailable, LLA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN fail over device added to failset

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN failset device is unavailable, LLA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN fail over device added to failset

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.02
%LLDRIVER, Logical LAN failset devices are all unavailable for LLA0

%EWA0, Link state change - link up: 1000 mbit, full duplex
%PKA0, Copyright (c) 1998 IntraServer Technology Inc. PKW V2.1.21 ROM V2.0
%PKA0, SCSI Chip is SYM53C895, Operating mode is LVD Ultra2 SCSI

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.23
%LLDRIVER, Logical LAN failset device is available, EWA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:38.23
%LLDRIVER, Logical LAN connected to physical port EWA0

%EWB0, Link state change - link up: 1000 mbit, full duplex

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:39.02
%LLDRIVER, Logical LAN failset device is available, EWB0

%EWA0, Link state change - link down

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.23
%LLDRIVER, Logical LAN disconnected from physical port EWA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.23
%LLDRIVER, Logical LAN failset device is unavailable, EWA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.23
%LLDRIVER, Logical LAN connected to physical port EWB0

%EWB0, Link state change - link down

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.96
%LLDRIVER, Logical LAN disconnected from physical port EWB0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.96
%LLDRIVER, Logical LAN failset device is unavailable, EWB0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:40.96
%LLDRIVER, Logical LAN failset devices are all unavailable for LLA0

%EWA0, Link state change - link up: 1000 mbit, full duplex

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:41.97
%LLDRIVER, Logical LAN failset device is available, EWA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:41.97
%LLDRIVER, Logical LAN connected to physical port EWA0

%EWB0, Link state change - link up: 1000 mbit, full duplex

%EWA0, Link state change - link down

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:36:43.23
%LLDRIVER, Logical LAN disconnected from physical port EWA0

..."
and so on.

The only way I could fix this was to unplug the network cable from EWA. Then the system could build LLA0 and carried on with the boot:

"...
%LLDRIVER, Logical LAN event at 25-SEP-2004 15:42:42.45
%LLDRIVER, Logical LAN failset devices are all unavailable for LLA0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:42:43.46
%LLDRIVER, Logical LAN failset device is available, EWB0

%LLDRIVER, Logical LAN event at 25-SEP-2004 15:42:43.46
%LLDRIVER, Logical LAN connected to physical port EWB0

%CNXMAN, Discovered system EZK24

%CNXMAN, Established connection to system EZK24

..."

Next weekend we'll try again, because we think a defective piece of network equipment disturbed the machine.

But a defect like that could hit us in the same way next time.

Also, EWA and EWB are plugged into different routing switches at different locations (about 500 meters apart).

So, why has the LL-Driver not chosen the EWB without my help?

Regards Kuddel
9 REPLIES
John Gillings
Honored Contributor

Re: Problem at bootphase to build LLx device

Kuddel,
> So, why has the LL-Driver not chosen the EWB without my help?

As you've found by "helping", if the device goes down and stays down, LL-Driver will find the other, working device and stick with it. The problem here seems to have been EWA bouncing up and down.

This kind of scenario is one of the hardest to deal with in designing recovery algorithms. TANDEM NonStop does it by implementing a "Fail Fast" strategy. Even the smallest suspicion of a fault and the component is considered failed and cut off. The disadvantage is that transition states (like reboots) can generate spurious or transient fault conditions, so it's not always a good idea.

Each time there is a change of state LLDRIVER needs to take note and decide what to do about it. It also looks like EWB was bouncing a bit. Perhaps there's a network configuration issue that was causing them to interfere with each other?
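
To see how often each port bounced, the link state messages in the console output can be tallied. The following is a minimal Python sketch, assuming the message format shown in the boot log earlier in this thread; the function name `count_link_changes` is hypothetical, and the excerpt contains only the link state lines from that log.

```python
import re
from collections import Counter

# Matches console lines such as "%EWA0, Link state change - link up: ..."
LINK_RE = re.compile(r"%(\w+), Link state change - link (up|down)")


def count_link_changes(log_text):
    """Count link up/down messages per device in a console log excerpt."""
    counts = Counter()
    for device, state in LINK_RE.findall(log_text):
        counts[(device, state)] += 1
    return counts


# Link state lines copied from the boot log posted in this thread.
LOG_EXCERPT = """
%EWA0, Link state change - link up: 1000 mbit, full duplex
%EWB0, Link state change - link up: 1000 mbit, full duplex
%EWA0, Link state change - link down
%EWB0, Link state change - link down
%EWA0, Link state change - link up: 1000 mbit, full duplex
%EWB0, Link state change - link up: 1000 mbit, full duplex
%EWA0, Link state change - link down
"""

for (device, state), n in sorted(count_link_changes(LOG_EXCERPT).items()):
    print(f"{device} link {state}: {n}")
```

For the excerpt above, this tallies EWA0 going up twice and down twice, and EWB0 going up twice and down once, all within about five seconds.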

What happens if you reconnect the EWA cable now that the system is stable?
A crucible of informative mistakes
Ian Miller.
Honored Contributor

Re: Problem at bootphase to build LLx device

Perhaps LLDRIVER should apply hysteresis and delay failing over for a short (adjustable) period.
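
The hysteresis idea can be sketched as a simple debounce: a link state change is accepted only if the new state then holds for a minimum time. This Python sketch is purely illustrative and says nothing about how LLDRIVER is actually implemented; the function name `debounce` and the `hold_time` parameter are assumptions. The sample timestamps are the EWA0 link changes from the boot log in the first post, written as seconds after 15:36:00.

```python
def debounce(events, hold_time, end_time):
    """Return the state changes that stayed stable for at least hold_time.

    events    -- list of (time_in_seconds, link_is_up), ordered by time
    hold_time -- hypothetical adjustable hysteresis period, in seconds
    end_time  -- time at which observation stops
    The link is assumed to start in the "down" state.
    """
    accepted = []
    state = False
    for i, (t, up) in enumerate(events):
        # The new state lasts until the next raw event (or until end_time).
        next_t = events[i + 1][0] if i + 1 < len(events) else end_time
        if up != state and next_t - t >= hold_time:
            # Stable long enough: accept it once the hold time has elapsed.
            accepted.append((t + hold_time, up))
            state = up
    return accepted


# EWA0 link changes from the looping boot: up, down, up, down.
ewa_events = [(38.23, True), (40.23, False), (41.97, True), (43.23, False)]
print(debounce(ewa_events, hold_time=3.0, end_time=50.0))  # prints []

# A link that comes up once and stays up is accepted after the hold time.
print(debounce([(10.0, True)], hold_time=3.0, end_time=60.0))
```

With a hold time of 3 seconds, none of the short-lived EWA0 states would be acted on, so a driver using such a filter would not keep switching between ports. The price is that a genuine failover is also delayed by the hold time.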
____________________
Purely Personal Opinion
Lachnitt_1
Frequent Advisor

Re: Problem at bootphase to build LLx device

@John & Ian

Once the LLx device had been created, I could plug EWA back in straight away.

Unplugging EWB after that was no problem either: the driver switched to EWA and worked fine!

The problem occurs only during the LLx build phase.

Last weekend we chose a new port on the routing switch and the ES40 was fine.

Just as we were about to open the champagne, I rebooted the other ES40 in the same server room, and now that box has the problem!!!

So next weekend we'll try to find out what is wrong with that bloody switch!

HP recommended that we switch off "SpanningTree", but we are a little afraid of that!!!

Our first attempt will be to switch on "Faststart".

Or does anybody have a brand new idea?

Thx Kuddel
Volker Halle
Honored Contributor

Re: Problem at bootphase to build LLx device

Just for info:

this problem has been escalated to HP OpenVMS engineering.

The best workaround found so far is:

CTRL-P

>>> CONT

This has ALWAYS allowed both EW device links to come up and the LLA device to be established successfully.

Volker.
Lachnitt_1
Frequent Advisor

Re: Problem at bootphase to build LLx device

@all

Volker and I found out that it is a timing problem with the LL driver.

We had the problem on two ES40s that have only these two Gigabit interfaces built in.

At the other site, the two ES40s have an additional 100 Mbit interface.
On those boxes the drivers for the Gigabit interfaces are loaded first, then the driver for the 100 Mbit interface. Only after that, once enough time has passed, does the LL driver load, and 99% of the time everything is fine.

But on the boxes with no additional interfaces, no time passes between loading the interface drivers and loading the LL driver.

We only found this out by coincidence: when we pressed CTRL+P and then entered CONTINUE, we worked out step by step that timing was the problem!

Today all ES40s and the DS20 got two 100 Mbit cards, and all of them booted without this problem.

Bye Kuddel

Jan van den Ende
Honored Contributor

Re: Problem at bootphase to build LLx device

Kuddel,

Do you really think Volker merits only 1 point? Methinks that is a bit meagre!
He was instrumental in solving your problem!
And did Ian not give the right (admittedly, unproven) answer?

If we want this forum to have a high "solution rate", at least "solved problems" deserve an 8+ recognition!

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Lachnitt_1
Frequent Advisor

Re: Problem at bootphase to build LLx device

@Jan

If you had read the thread carefully,
you would also have read that Volker and I found it out together!!!!

And if you had also looked at how many points I gave Volker in this and other threads, you wouldn't have given me such a bad reply!!!!

Better to look at the other threads, where you, Volker and the other wise guys shared your knowledge with rookies like me, and they honored that with nothing!!!!

good night.
Jan van den Ende
Honored Contributor

Re: Problem at bootphase to build LLx device

Kuddel,

In no way was it my intent to be unpleasant to you.
On the contrary: as for most of the "old crew", one of the things I wish for most is welcoming and helping rookie VMSers, be they total IT newbies or IT old-timers new to VMS!!

Please, if I wrote anything that made an unpleasant impression on you, believe me, it was TOTALLY unintentional; I deeply regret it and apologise for it!

I (we) want you (as in you personally, and any newbie as well) to feel WELCOME here, and we certainly do NOT wish to offend!!
You know, EVERY old-timer was once a newbie as well!!

So, please, no hard feelings??

Cheers.

Have one on me.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Lachnitt_1
Frequent Advisor

Re: Problem at bootphase to build LLx device

@Jan

After that, there's no problem anymore!

Bye Kuddel