Array Setup and Networking

VMware ESXi w/ iSCSI boot - controller failover behavior

 
SOLVED
jjohnston112782
Occasional Visitor

VMware ESXi w/ iSCSI boot - controller failover behavior

Hi,

I have run into many occurrences over time where ESXi hosts raise an alarm stating they lost connectivity to the datastore backing the boot filesystem when they were booted via iSCSI.  I have experienced it on every array that supports iSCSI boot, and the only fix is to restart the management agents on the hosts. It usually happens during a storage controller failover, which typically takes anywhere from 15-30 seconds.

I was doing some research and found a software iSCSI adapter setting named Recovery Timeout that, according to Cormac Hogan, is the number of seconds before an active path is marked dead.  It is currently set to 10 seconds.  I was wondering whether changing the setting to something like 60 seconds would have any adverse effects anyone could think of.  My thought is that within 60 seconds the controller failover process should have completed, and in the rare case that both controllers are dead, it wouldn't matter if 60 seconds went by before the hosts started freaking out.
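For anyone else reading along, the setting can be changed per adapter from the ESXi shell. This is just a sketch - the adapter name vmhba64 is a placeholder, so confirm your software iSCSI adapter name and the exact parameter key with the get command first:

```shell
# List current parameters on the software iSCSI adapter to confirm the key name
esxcli iscsi adapter param get --adapter=vmhba64

# Raise RecoveryTimeout from the default 10s to 60s (vmhba64 is a placeholder)
esxcli iscsi adapter param set --adapter=vmhba64 --key=RecoveryTimeout --value=60
```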


Thoughts?

11 REPLIES
ndyer39
HPE Blogger

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

Hi,

Out of interest, do you have the Nimble NCM for VMware PSP installed on your ESXi hosts? I'm curious whether this would resolve the path timeout issue you've observed.

Nick Dyer
Nimble Field CTO & Evangelist

twitter: @nick_dyer_
jjohnston112782
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

Yes, it is installed. Oddly enough, two of the eight hosts did not display the warning, but the rest did.

khlosey25
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

I'm interested in this also - have you found the correct fix? I get the warning on all my hosts, and NCM (NCS and PSP) is installed. I am manually failing over controllers for preproduction testing.

I changed the Recovery Timeout to 60 and still get the warnings.

jjohnston1127, is this post yours? I am not finding much info online about this.

iSCSI boot hosts - lose access to boot file system upon controller failover

jjohnston112782
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

I have not found a fix.  I have multiple targets for the boot LUN: the original static target that VMware sets at boot, pointing to TG1, plus one static target to each of the discovery IPs for the LUN.  When I look at the paths for the datastore inside ESXi, all three show active (two going to TG1 and one going to the TG2 IP), but it seems that if the path the server booted from goes down, VMware freaks out even though there is still an active path to the datastore.  I did post that in the VMware communities, hoping someone would have run into this before or could tell me I have something configured wrong, but no luck.  Nothing obvious stands out to me.
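For reference, the extra static targets and the resulting path state can be checked from the CLI. Sketch only - the adapter name vmhba64, the target IQN, and the discovery IP below are placeholders for whatever your array actually presents:

```shell
# Add a static target for the boot LUN against a discovery IP (all values are placeholders)
esxcli iscsi adapter discovery statictarget add --adapter=vmhba64 \
    --address=192.168.10.10:3260 --name=iqn.2007-11.com.nimblestorage:boot-lun

# List all storage paths to confirm more than one shows as active for the boot LUN
esxcli storage core path list
```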

khlosey25
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

In my testing, manual failover does work: no loss of datastores or VM corruption, I only receive the warning on the hosts.

I have been running FC-attached storage with VMware since 2002, but I'm new to iSCSI. I too believe my iSCSI config is good.

I would be willing to compare configs sometime.

Kevin

jjohnston112782
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

Yeah, I've been doing iSCSI for a long time and iSCSI boot for quite some time. In my experience, any datastore that has multiple paths fails over fine, but the boot LUN for whatever reason does not - or at least not quickly enough to avoid the warning message and having to restart the host management agents.

My iSCSI setup is pretty standard, per the recommended configuration: two iSCSI VMkernel ports, each bound to one host vmnic with no standby adapters; both VMkernels selected for network port binding; and dynamic discovery set up to both iSCSI discovery IPs.
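For comparison, that setup maps to roughly the following CLI steps. This is a sketch only - vmhba64, vmk1/vmk2, and the discovery IPs are placeholders for your own adapter, VMkernel ports, and array addresses:

```shell
# Bind each iSCSI VMkernel port (one active vmnic, no standby) to the software adapter
esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba64 --nic=vmk2

# Point dynamic (send targets) discovery at both array discovery IPs
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba64 --address=192.168.10.10:3260
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba64 --address=192.168.20.10:3260
```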

khlosey25
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

Similar configuration. I am going to hit up my local Nimble engineer, will post any findings. Thanks

brad_fluit
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

I've been doing iSCSI boot for some time as well, also right in line with best practices.  Not until yesterday did I understand why the lost-connectivity error comes up when there is an iSCSI interruption on the boot volume.  I'll try to summarize.

- I am usually working in Cisco UCS environments where we create two iSCSI boot options.  It's important to note that this doesn't equal multipathing, since the BIOS only chooses one of those paths to boot from.  Once booted, that remains the only NIC/path used for the boot LUN.

- MPIO is not supported for iSCSI boot LUNs/volumes, so the path chosen at boot time remains the only path at runtime. This explains why the error is only seen on the boot volume, while all other volumes continue to run fine after a failover.

- If the path for the boot LUN/volume is interrupted, vSphere throws the warning we are all familiar with and never clears it. After a controller failover, my experience has been that the host is able to see its boot volume again, but the error remains.

- A reboot of the host or, less disruptively, a restart of the management agents on the host will clear the error.

Hope that helps you out a little.

khlosey25
Occasional Visitor

Re: VMware ESXi w/ iSCSI boot - controller failover behavior

Thanks for the reply. I understand why we get the warning; I was hoping there was a setting to eliminate it.

Thanks guys