SAN/IQ upgrade performance degradation caused VMFS iSCSI timeouts

We recently (finally) upgraded from SAN/IQ 8.5 to 9.0. We've been a LeftHand customer for years, since well before the acquisition, so we're very familiar with the upgrade process and have done it a number of times since 7.0.

 

It appears that the "parallelization" in the new 9.0 upgrade tool may have taken more performance from our P4500 and P4500G2 nodes than they could actually spare, causing our ESXi servers to report the following, via syslog, to the mothership:

 

vmc004 iscsid: Kernel reported iSCSI connection 5:0 (iqn.2003-10.com.lefthandnetworks:san-nj1-mg:68:vmfs001 if=default addr=10.16.56.10:3260 (TPGT:1 ISID:0x1) (T4 C0)) error (1011) state (2)

 

Now, repeat that for about 8 VMFS volumes and a dozen VMC hosts, and you've got the underlying Linux filesystems on every single one of our VMs dropping into read-only mode and requiring a reboot to come back to sanity.
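For anyone wanting to check how widespread these errors are on their own hosts, here is a rough sketch of tallying them from collected syslog lines. It assumes the lines look exactly like the one quoted above (host name, then "iscsid:", then the kernel message); real syslog output may carry a timestamp or PID that needs a different pattern. The function name tally_iscsi_errors is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted in this post (an assumption:
# other iscsid versions or syslog formats may differ).
LINE_RE = re.compile(
    r"(?P<host>\S+) iscsid: Kernel reported iSCSI connection \S+ "
    r"\((?P<target>iqn\.\S+) .*?addr=(?P<addr>[\d.]+:\d+).*\) "
    r"error \((?P<error>\d+)\) state \((?P<state>\d+)\)"
)


def tally_iscsi_errors(lines):
    """Count iSCSI connection errors per (host, volume, error code)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an iscsid connection-error line
        # Treat the last colon-separated field of the IQN as the volume name.
        volume = match.group("target").rsplit(":", 1)[-1]
        counts[(match.group("host"), volume, match.group("error"))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "vmc004 iscsid: Kernel reported iSCSI connection 5:0 "
        "(iqn.2003-10.com.lefthandnetworks:san-nj1-mg:68:vmfs001 "
        "if=default addr=10.16.56.10:3260 (TPGT:1 ISID:0x1) (T4 C0)) "
        "error (1011) state (2)"
    )
    for (host, volume, error), n in sorted(tally_iscsi_errors([sample]).items()):
        print(f"{host} {volume} error={error} count={n}")
```

Feeding it the central syslog file (for example, the lines of an open file object) gives a per-host, per-volume count, which makes it easier to see whether every volume timed out or only some of them.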

 

We've never experienced performance problems this bad in any previous upgrade, but those upgrades all went one node at a time in a slow, methodical manner. My conjecture is that the new parallel process was eating up cycles on all the nodes at once (even if only one node at a time was physically "out of the cluster" for reboot/resync/whatever), and with a cluster our size, it could easily have been bouncing/resyncing two nodes at the same time, putting a further crimp on performance.

 

(a) Has anyone else seen this happen?

(b) Has anyone come up with a way to prevent it from happening in the future?