HPE EVA Storage

inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

Chris Warne_1
Advisor

We have a problem, which we don't really understand, on ES40s with dual KGPSAs attached to MSA1000s.

bootdef_dev is set to dga104.1001.0.1.0 dgb104.1002.0.2.0 dga204.1004.0.1.0 dgb204.1003.0.2.0
(i.e. a system disk shadow set in which each member disk is dual-pathed, once via each controller)

Now, whenever the system boots, the following is displayed:

P00>>>b
(boot dga104.1001.0.1.0 -flags 0,0)
dga104.1001.0.1.0 is not connected
failed to open dga104.1001.0.1.0
(boot dgb104.1002.0.2.0 -flags 0,0)
block 0 of dgb104.1002.0.2.0 is a valid boot block

The system then boots OK via the second path.

However, today we noticed that explicitly specifying the boot device allowed the system to boot from it:

P00>>>b dga104.1001.0.1.0
(boot dga104.1001.0.1.0 -flags 0,0)
block 0 of dga104.1001.0.1.0 is a valid boot block

and, from then on, the first device/path is always available for boot when using bootdef_dev, even after a power cycle:

P00>>>b
(boot dga104.1001.0.1.0 -flags 0,0)
block 0 of dga104.1001.0.1.0 is a valid boot block

However, the system will now no longer boot from the dgb path.
I should also mention that ffauto and ffnext are both OFF.

What I'd like to know is what determines whether the disk/path is "connected" or not. It seems rather inconsistent at the moment, and it would seem that if one of the two KGPSA cards fails, the system will not necessarily boot via the other card.

regards,
Chris
Uwe Zessin
Honored Contributor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

The MSA1000 (and many other storage arrays) does not work in an 'active/active' mode. One controller manages the I/O to a given LUN [1]; this is the path you can boot from and the one that shows as 'connected'.

The other controller presents the internal logical disk on the same LUN address, so that the server is aware that there is an alternate path it can switch to if the current path fails. An I/O request sent via this path is rejected with a 'NOT READY' status; WWIDMGR displays this as 'not connected', and you normally cannot boot over this path.

A server _can_ request a failover of this LUN [2] by sending a SCSI 'START UNIT' command via the 'unconnected' path. I have never tried or noticed this, but my guess is that a BOOT command with an explicitly specified device sends a 'START UNIT'.

You should be able to see this failover by watching the LED displays of both MSA1000 controllers: the second LED from the right in the top row indicates the active [1] controller.


[1] The MSA1000 is a special case, because _all_ logical disks are managed by the 'active controller' - the other one is called the 'standby controller'.

[2] Watch out, as the failover of a single LUN will implicitly cause a failover of *ALL* LUNs, due to [1].
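
The behaviour described above can be sketched as a toy Python model. It is a simplified illustration under stated assumptions, not MSA1000 firmware or SRM console logic: the mapping of each console path to controller A or B is hypothetical, and the START UNIT step on an explicit boot encodes the guess made above rather than confirmed behaviour. The names `Msa1000`, `probe` and `explicit_boot` are invented for this sketch.

```python
# Hypothetical mapping: which MSA1000 controller each console path reaches.
PATH_TO_CONTROLLER = {
    "dga104.1001.0.1.0": "A",
    "dgb104.1002.0.2.0": "B",
}


class Msa1000:
    """Toy active/standby array: one controller owns every logical disk."""

    def __init__(self, active):
        self.active = active  # "A" or "B"

    def read(self, controller):
        # The standby controller presents the LUN but rejects I/O.
        return "OK" if controller == self.active else "NOT READY"

    def start_unit(self, controller):
        # START UNIT through the standby path forces a failover; because
        # one controller owns everything, ALL LUNs move with it.
        self.active = controller


def probe(array, device):
    """Summarise a path the way the console messages in this thread do."""
    status = array.read(PATH_TO_CONTROLLER[device])
    return "valid boot block" if status == "OK" else "not connected"


def explicit_boot(array, device):
    """Model the guess above: an explicit 'b <device>' sends START UNIT first."""
    array.start_unit(PATH_TO_CONTROLLER[device])
    return probe(array, device)


if __name__ == "__main__":
    msa = Msa1000(active="B")
    for dev in PATH_TO_CONTROLLER:
        print(dev, probe(msa, dev))
    print("b dga104.1001.0.1.0 ->", explicit_boot(msa, "dga104.1001.0.1.0"))
    for dev in PATH_TO_CONTROLLER:
        print(dev, probe(msa, dev))
```

Run as a script, it reproduces the sequence from the original post: the dga path first reports 'not connected', the explicit boot moves every LUN to controller A, and afterwards the dgb path is the one that reports 'not connected'.
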
Chris Warne_1
Advisor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

Thanks for the response, Uwe.
So this implies that, despite having two controllers etc., the system is not truly redundant?
Our MSAs are connected to the KGPSAs via a SAN switch, and I thought the reason for this was to make everything visible to each KGPSA, so that if one KGPSA goes down, or a cable fails, everything is still accessible via the other KGPSA. But it looks like the "redundancy" does not include failover from one KGPSA to the other without a system outage ...
Chris
Uwe Zessin
Honored Contributor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

Define "truly redundant" ;-)

I have done a few OpenVMS/MSA1000 implementations, and OpenVMS did tell the MSA1000 to do a failover when a cable was cut or a switch went down. It is my understanding that an adapter failure usually results in an operating system crash anyway.
Chris Warne_1
Advisor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

It seems I can overcome this "redundancy" issue simply by setting FFAUTO to ON - then it will always boot regardless of whether one or other of the KGPSAs is down/disconnected.

I'm not sure why this parameter isn't set anyway, as I can't see any situation where you wouldn't want it to try to boot from each device listed in bootdef_dev.
Chris
Uwe Zessin
Honored Contributor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

I think it is not the default because it can cause unintended failovers. Fortunately, FFAUTO=ON only kicks in when all boot paths have been tried and none of them was in a 'connected' state.
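
The FFAUTO behaviour as described here can likewise be sketched in Python. This is a hypothetical model of the thread's description, not the actual SRM boot algorithm; the `auto_boot` function, its parameters and the path-to-controller mapping are assumptions made for illustration. A "failed" path stands for a dead KGPSA or cable.

```python
# Hypothetical mapping: which MSA1000 controller each console path reaches.
PATH_TO_CONTROLLER = {
    "dga104.1001.0.1.0": "A",
    "dgb104.1002.0.2.0": "B",
}


def auto_boot(bootdef_dev, active, failed=(), ffauto=False):
    """Return (device booted from or None, active controller afterwards).

    active -- controller that currently owns all LUNs ("A" or "B")
    failed -- paths that are physically dead (e.g. failed KGPSA or cable)
    """
    # Pass 1: only paths reaching the active controller are 'connected'.
    for dev in bootdef_dev:
        if dev not in failed and PATH_TO_CONTROLLER[dev] == active:
            return dev, active
    # Pass 2: only when no path was 'connected' and FFAUTO is on, try the
    # remaining live paths anyway, which forces a failover of ALL LUNs.
    if ffauto:
        for dev in bootdef_dev:
            if dev not in failed:
                return dev, PATH_TO_CONTROLLER[dev]
    return None, active


if __name__ == "__main__":
    boot_list = ["dga104.1001.0.1.0", "dgb104.1002.0.2.0"]
    dead = {"dga104.1001.0.1.0"}
    print(auto_boot(boot_list, active="B"))               # ('dgb104.1002.0.2.0', 'B')
    print(auto_boot(boot_list, active="A", failed=dead))  # (None, 'A')
    print(auto_boot(boot_list, active="A", failed=dead, ffauto=True))  # ('dgb104.1002.0.2.0', 'B')
```

In the last two scenarios the KGPSA behind the dga path has failed while controller A owns the LUNs: with FFAUTO off nothing boots, and with FFAUTO on the boot succeeds over dgb, but only by forcing a failover of every LUN to controller B, which is the kind of unintended failover mentioned above.
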

The MSA1000 is a workgroup array with only one host port per controller. Such a configuration cannot cope with certain combinations of multiple failures.
Chris Warne_1
Advisor

Re: inconsistency using BOOTDEF_DEV list with dual KGPSA controllers

Thanks for your comments. It makes a lot more sense to me now than it did 24 hours ago!
Chris