Array Performance and Data Protection

Can iSCSI multipath balance be easily monitored from Nimble storage?

 


Is there an easy way to monitor the iSCSI network activity on interfaces "tg1" and "tg2" (or the 1 Gbit data ports) on a single chart?

Is there a way to then drill down and identify which hosts are associated with the activity on the interface?

If Nimble Storage InfoSight or the Nimble System Management web interface cannot provide the information, might such statistics be available via SNMP, the Nimble API, or network sFlow statistics?  I could write a script to gather the information if I knew that it was available, and how to access it.
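
On the SNMP option, a minimal sketch of the arithmetic such a script would need. It assumes the array's SNMP agent exposes the standard IF-MIB interface counters, which is not confirmed anywhere in this thread. The two OIDs are the standard IF-MIB 64-bit octet counters; the helper name counter_delta is made up for illustration.

```python
# Standard IF-MIB counter columns (append the interface index to each OID).
# Whether the array's SNMP agent actually serves them is an assumption.
IF_HC_IN_OCTETS = "1.3.6.1.2.1.31.1.1.1.6"
IF_HC_OUT_OCTETS = "1.3.6.1.2.1.31.1.1.1.10"


def counter_delta(previous, current, bits=64):
    """Difference between two SNMP counter readings, allowing for one wrap."""
    if current >= previous:
        return current - previous
    # The counter rolled over between the two polls.
    return (1 << bits) - previous + current
```

Polling both data interfaces at a fixed interval and feeding successive readings through counter_delta gives bytes per interval for each interface, which can then be plotted on one chart.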

For the VMware environments, would InfoSight's VMVision facility provide the drill-down details to identify multipath imbalance?  We are currently not using VMVision, and were somewhat concerned about VMVision's added overhead, especially on our small CS240 and CS1000 arrays.

At the "volume" level, you can see which hosts are using which volumes, and their respective throughput and latency levels ... aggregated for all data interfaces.  We would like to view similar information, but at the network interface level ... to help identify how well the hosts are balancing their respective multipath iSCSI traffic.

If your host-level multipathing was working "well", you would expect that the activity on both interfaces would be very similar. 

In our case, they are not. This is not a Nimble problem per se, but a sub-optimal host configuration issue. The gross host-side multipath imbalance is not yet a significant issue because our activity levels are low. As our IO levels increase, however, high performance will be harder to achieve than it needs to be if the iSCSI traffic stays clumped on one Nimble interface while the other interface is starved for activity.

I would like to use the Nimble performance statistics to identify the worst host offenders, and then work on re-configuring those hosts for better multipath balance. An out-of-balance host that is doing a trivial amount of IO may not be worth re-configuring at this time.
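
A rough sketch of how that ranking could work, assuming per-host byte counts for each array interface can be collected somehow (the array does not appear to offer this view, so the numbers would probably have to come from the hosts). The host names, byte counts, and function name are invented for illustration.

```python
def rank_offenders(host_bytes, min_total_bytes):
    """Rank hosts by how many bytes would have to move to reach a 50/50 split.

    host_bytes maps host name -> (bytes via tg1, bytes via tg2).
    Hosts whose total is below min_total_bytes are skipped as not worth fixing.
    """
    ranked = []
    for host, (tg1, tg2) in host_bytes.items():
        total = tg1 + tg2
        if total < min_total_bytes:
            continue
        misplaced = abs(tg1 - tg2) / 2  # bytes sitting on the busier path
        ranked.append((host, misplaced, max(tg1, tg2) / total))
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Hypothetical per-host byte counts; host names and numbers are invented.
sample = {
    "esx01": (900_000_000, 100_000_000),
    "linux-db": (5_000_000_000, 4_000_000_000),
    "win-file": (1_000, 0),
}
for host, misplaced, busiest in rank_offenders(sample, min_total_bytes=1_000_000):
    print(f"{host}: {misplaced / 1e6:.0f} MB misplaced, busiest path carries {busiest:.0%}")
```

Ranking by misplaced bytes rather than by percentage puts a busy, mildly unbalanced host ahead of an idle, badly unbalanced one, which matches the goal of fixing the hosts that matter first.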

Using the Nimble system management web interface (for a single storage array), you can display activity at the interface level, but one independent chart is generated per interface, each with its own scale.  This makes it a bit tricky to compare the "tg1" chart with the "tg2" chart, as they often have different scales.
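
One way around the scale problem is to put both interfaces on a common footing: convert each to a rate and to a share of the total. A minimal sketch, assuming two byte-counter samples per interface taken a known number of seconds apart; the sample values and function names are hypothetical, not real array data.

```python
def throughput_mbps(bytes_start, bytes_end, seconds):
    """Average throughput in megabits per second between two counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def path_share(rates):
    """Return each interface's fraction of the total traffic (0.0 to 1.0)."""
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    return {name: rate / total for name, rate in rates.items()}


# Hypothetical counter samples taken 60 seconds apart (not real array data).
rates = {
    "tg1": throughput_mbps(1_000_000_000, 7_000_000_000, 60),
    "tg2": throughput_mbps(2_000_000_000, 3_500_000_000, 60),
}
shares = path_share(rates)
for name in sorted(rates):
    print(f"{name}: {rates[name]:8.1f} Mbit/s  {shares[name]:6.1%} of total")
```

With well-balanced multipathing, each of two interfaces should sit near a 50% share.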

If I monitor the performance at the "interface" level on the individual storage array, I can see that the activity levels on the "tg1" and "tg2" interfaces are very different, even though most of the server systems are dual-path-connected to the Nimble storage.

We now know that these host VMs and the ESXi systems themselves may not be properly configured for dual-pathing AND configured to do a reasonable job of balancing the traffic across both server 10GbE NICs.

We are NOT using the Nimble connection manager software, as it does not apply in many of our configurations.

We have found several VMware, Linux, and Windows knowledge-base articles with the details on how to achieve not just dual-path connections, but pseudo-balanced dual-path connections. In reviewing these articles, we have confirmed that in many cases we had not properly set up the multipath configuration to achieve reasonable balance.

Thank you for your help.

Dave B

4 REPLIES
anofsinger41
Occasional Visitor

Re: Can iSCSI multipath balance be easily monitored from Nimble storage?

Sorry to answer with a question, but when you say:

"We have found several VMware, Linux, and Windows knowledge base type articles with the details on how to achieve not just dual-path connections, but pseudo-balanced dual path connections."

Could you point me toward these articles, at least for VMware / Hyper-V?

Re: Can iSCSI multipath balance be easily monitored from Nimble storage?

Unfortunately, I can't easily find the articles because they were spread across multiple sources ... VMware, the guest OS vendor, and the storage vendor ... and the right answer depends heavily on the overall design of the iSCSI network topology.  If you Google "vmware guest multipath" you get over 59,000 hits.

 

A second good/bad point is that Nimble allows you to "bend the rules" with regard to multipathing, and if you take advantage of that, it may become more difficult in higher levels of the networking stack to keep multiple paths separate and reasonably balanced.

 

A third source of confusion is that, from many viewpoints, the primary interest in multipathing is high availability: tolerating the failure of a path, and path failover issues.  The discussions on how to "properly" set up multipathing from this viewpoint are focused on redundancy as a means for near-continuous availability ... not performance.

 

In general, don't assume that VMware or some other software will "do the right thing" if the dual path-ness is not very visible and straightforward.  You often need to provide some type of "hint" to the configuration to keep the paths separate.

 

Also ... iSCSI storage IO is multipath-aware for recent versions of virtualization software, recent Windows guest VMs, and recent Linux guest VMs.  GENERAL TCP NETWORKING IS NOT MULTIPATH-AWARE.

 

Whenever you combine multipath-aware iSCSI storage IO with non-multipath-aware general TCP networking, you are asking for problems. Yes, it can be done. Yes, there are ways to add "hints" (like subnets or VLANs) to help keep separate things separate ... but it takes more configuration effort.

 

So take advantage of VLANs, subnets, separate interfaces, separate switches, separate VIRTUAL switches, separate IP addresses ... wherever possible to increase the separate-ness of multipath IO.

 

For example, you have a physical ESXi server with dual 10GbE interfaces. You have a dual-path iSCSI volume from Nimble storage that you want to connect as an iSCSI volume to a Linux guest VM.

 

One configuration has both 10GbE interfaces connected to a single VMware vswitch, with a single vmxnet3 virtual network interface allocated to the guest Linux VM.  You properly set up dual-path multipath under the Linux VM, but when you perform an IO test, you find that all the iSCSI IO traffic is using a single 10GbE interface in VMware, and the other interface is idle.  If you ran an additional Linux VM configured the same way ... there would be a 50/50 chance that this other VM would use the same 10GbE interface ... generating congestion ... with the other interface being idle.

 

Second example ... same configuration, but this time there are TWO VMware vswitches, with one physical interface connected to each vswitch.  The guest VM also has TWO vmxnet3 virtual NICs, with each guest VM network interface connected to a different vswitch.  Now you have a very visible and explicit dual set of paths from the VM guest to the Nimble storage.  In general, in this scenario, VMware will do what you expect ... IO from VM interface 1 goes to vswitch 1 and physical interface 1.  IO from VM interface 2 goes to vswitch 2 and physical interface 2.

 

If the Nimble IP address assignments were in different subnets, especially non-routable subnets, the separation would be even more explicit.  Many customers also use VLANs to separate iSCSI storage traffic.

 

Using a second virtual network interface on the VM guest is significant.  VMware takes this as a very big "hint" that you want separate IO streams ... as much as possible.  The same concept applies to using multiple vswitches.  If you connect both physical interfaces to a single vswitch, it is more challenging to keep the streams separate.  Many of these issues are related to "multi-homing", which can be challenging to do ... especially in a virtualized environment.

 

Multipathing techniques and methods are also different for VMware-managed storage, such as VMDKs within a VMware storage pool, than for iSCSI volumes that are passed through to the guest VM as raw iSCSI volumes.  Much of my interest has been in the iSCSI pass-through volumes and their issues.  However, for many customers, the VMware-managed storage represents the bulk of their IO activity.  For this case, you need to refer to the VMware-specific rules and configuration options for multipath. 

 

Many of these issues may be blurred and masked if you are using the Nimble connection manager for ESXi, and/or the Guest VMs. That can be both good and bad. 

 

So ... in summary ... dual interfaces, connected to dual vswitches, with dual virtual NICs per VM is the minimum starting point, ideally with Nimble IP addresses on separate subnets.

 

If you are using a single subnet and/or a single vswitch, good multi-path balance can be achieved ... if you have dual virtual NICs assigned to the Guest VM.  This topology will need extra "hints" to keep the paths separate. The techniques are beyond the scope of this posting.

 

Dave B

swilson120
Occasional Advisor

Re: Can iSCSI multipath balance be easily monitored from Nimble storage?

This would be a nice addition on the volume page.  We come from EqualLogic, and it would show you the number of connections to a volume as well as the amount of data sent over each connection and the uptime of the connection.  This info was very valuable for indicating whether an iSCSI session, NIC, or host was "flapping".  EqualLogic would show that one of the paths had an uptime of 0 minutes and low bytes, while the rest had been up for many days if not months.  We could then take that information and go look at the switch to see if there was an error on the network interface of that host or throttling at the switch.

You can monitor the multipath balance on the host, though only for brief periods of time, using a tool like iptraf on Linux, or Windows Resource Monitor or Sysinternals on Windows, and check whether the iSCSI network interfaces carry about the same number of bytes over a period of time.
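
For a Linux host, the same check can be scripted against the kernel's own counters instead of watching a tool. A minimal sketch, assuming the standard /proc/net/dev layout (receive bytes in the first column after the interface name, transmit bytes in the ninth); the function names and the interface names in the usage note are hypothetical.

```python
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_deltas(before, after, interfaces):
    """Bytes received plus transmitted on each interface between two snapshots."""
    return {
        nic: (after[nic][0] - before[nic][0]) + (after[nic][1] - before[nic][1])
        for nic in interfaces
    }


def sample_host(interfaces, seconds=10, path="/proc/net/dev"):
    """Take two snapshots on a Linux host and return the per-interface deltas."""
    with open(path) as handle:
        before = parse_net_dev(handle.read())
    time.sleep(seconds)
    with open(path) as handle:
        after = parse_net_dev(handle.read())
    return byte_deltas(before, after, interfaces)
```

Calling sample_host(["eth1", "eth2"]) on a host whose iSCSI NICs happen to be named eth1 and eth2 returns the bytes moved on each during the interval; roughly equal numbers suggest the paths are balanced. This only sees one host at a time, so it complements rather than replaces an array-side view.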

Scott

chris24
Respected Contributor

Re: Can iSCSI multipath balance be easily monitored from Nimble storage?

Hello Scott,

If you wish to file an RFE, the best way is to create a case. The point you raised has merit, and I have filed the suggestion. Monitoring your switch for rising CRC error or retransmit counts would likely also pick this up.

To monitor connections, you can use Monitor > Connections; to view throughput, use Monitor > Interfaces.

Many thanks,

Chris