HPE StoreVirtual Storage / LeftHand

Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Andrew Steel
Advisor


Hi All,
I have an issue which hopefully someone else has come across. Setup is:
- 8-node failover cluster (2008 R2 Datacenter), HP DL380 G6s, all running Server Core
- each node has 6 NICs as follows:
  - NIC 1: Management (VLAN 100)
  - NIC 2: Failover (VLAN 103)
  - NICs 3 & 5: iSCSI configured with MPIO (Lefthand DSM also running) (VLAN 102)
  - NICs 4 & 6: teamed for VMs (with multiple VLANs; the host is not configured to connect to any of these VLANs in the Hyper-V virtual network setup)
- SAN is HP Lefthand iSCSI as follows:
  - NICs bonded using ALB (Adaptive Load Balancing)
  - SAS cluster of 4 nodes
  - SATA cluster of 2 nodes
- Network: the relevant bits are 2 Cisco 4948 switches with 10Gb uplinks/trunk
  - All iSCSI ports configured for jumbo frames, as is the uplink/trunk between the switches (server NICs, SAN NICs etc. as well)
  - All iSCSI ports configured for flow control (server NICs etc. as well)
Think that covers the main bits for now - let me know if you want any more detail.

2 volumes were created on the SAS SAN cluster (a 1GB volume for the witness disk and a 2TB volume to be used for CSV), brought online, NTFS volumes created (1 simple volume on each), then taken offline again. These were presented to all the nodes.
Next I successfully configured a failover cluster (all validation tests passed; the only warning was for NICs 3 & 5 being on the same network, as MPIO is in use).
I enabled CSV. At this stage the 2 disks are online and happy. I then tried to add the 2TB disk to CSV ("add storage"): the disk fails, each cluster node in turn attempts to bring it online, and once they have all tried the disk shows as "Failed". OK - start again - same problem...

Steps I've tried to solve the problem:
- Tried different sized volumes and also GPT rather than MBR - no change
- Tried without MPIO and other variations with the iSCSI settings - no change
- Looked at network issues - nothing obvious to report
- Created a volume on the 2 node SATA cluster - this can be added to CSV and stays online and happy - so seems to be something to do with the 4 node SAS SAN (hmmm - more connections via MPIO and iSCSI to blame?)
- The SAS volumes are quite happy until you add them to CSV


I have had a hunt around and either there isn't a lot of info on the web that covers what I'm seeing, or I've missed something blindingly obvious. Hopefully someone else has come across a similar problem.

Update:

OK - Some more trial and error findings:

If I enable MPIO for the 2 node SATA cluster iSCSI connections I get the same issue, i.e. after adding the disk to CSV it fails and goes offline. By disconnecting the iSCSI sessions and reconnecting without multi-path from each of the 8 failover cluster nodes I can then get the disk back online.

Attempting the same config with the SAS cluster (which has 4 nodes) I still get the problem.

So my reasoning is that it must be something to do with the number of active iSCSI sessions:
- SATA cluster iSCSI connection without MPIO generates 3 active iSCSI sessions (an initial connection + connection to each node from 1 NIC) - This works
- SATA cluster iSCSI connection with MPIO generates 5 active sessions (an initial connection + connection to each node from each NIC)- This fails
- SAS cluster iSCSI connection without MPIO generates 5 active sessions (an initial connection + connection to each node from 1 NIC) - This fails
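The three counts above follow one pattern: one initial connection plus one session from each iSCSI NIC to each storage node. A minimal sketch of that arithmetic, assuming this description of how the Lefthand DSM opens sessions is accurate (`sessions_per_host` is a hypothetical helper, not part of any HP or Microsoft tool):

```python
def sessions_per_host(storage_nodes: int, iscsi_nics: int) -> int:
    """Active iSCSI sessions one Windows host holds for a single volume.

    Assumes the pattern described in the post: one initial connection,
    plus one session from each iSCSI NIC to each storage node.
    """
    return 1 + storage_nodes * iscsi_nics


# (description, storage nodes, iSCSI NICs in use, observed outcome)
cases = [
    ("SATA cluster, no MPIO", 2, 1, "works"),
    ("SATA cluster, MPIO", 2, 2, "fails"),
    ("SAS cluster, no MPIO", 4, 1, "fails"),
]

for label, nodes, nics, outcome in cases:
    print(f"{label}: {sessions_per_host(nodes, nics)} sessions ({outcome})")
```

Both failing cases land on the same per-host count (5), which is what points towards a session-count limit rather than MPIO itself.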

I know CSV does some tricky stuff to enable multiple nodes to read/write to the same volume - so could this be an incompatibility between iSCSI MPIO and CSV?
Is anyone using iSCSI MPIO successfully with CSV?
What are your thoughts on this being a Microsoft issue or an HP Lefthand DSM issue?

Thanks for any pointers...
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Another update/question:
Have tried a lot of different configs/options and still running into the same problem. Reverted to using a single NIC to see if that was an issue.

2 node SAN cluster without MPIO works (the DSM still creates multiple connections though). Anything more than that fails.

I'm beginning to think it doesn't work - so my new question is:

Does anyone have a P4300 Lefthand SAN with more than 2 nodes working successfully with CSV (Cluster Shared Volumes for Hyper-V)?

Cheers
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

OK - brain has almost stopped working, but here are my final observations on CSV, 2008 R2 and Lefthand SANs with their DSM for today...
- 8 node cluster to 2 node san without MPIO (single NIC connection) is OK
- 8 (and 7) node cluster to 4 node SAN does not work no matter what
- 6 node cluster (or less) to 4 node SAN without MPIO is OK
- 4 node cluster to 2 node SAN with MPIO is OK
- 4 node cluster to 4 node SAN with MPIO does not work

For the ones that do work I'm not convinced of the stability. Also it made no difference whether I had 2 NICs or one for iSCSI when going from an 8 node cluster to a 4 node SAN etc. - it still failed.

If anybody has a good theory then I'm all ears. My current theory is that there is some limitation on the number of possible iSCSI connections for a CSV volume - though this is purely based on observation and not on any real understanding of why.
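One way to test the connection-limit theory against the observations above is to total the sessions every cluster node holds to one CSV volume. A rough sketch, assuming the per-host pattern from the first post (one initial connection plus one session per NIC per storage node) and entering the 8- and 7-node failures with a single NIC, the lowest-count configuration reported as failing. It only shows whether the data are consistent with a threshold; six data points prove nothing about the cause.

```python
def sessions_per_host(storage_nodes: int, iscsi_nics: int) -> int:
    # One initial connection plus one session per NIC per storage node
    # (the pattern described earlier in the thread; an assumption here).
    return 1 + storage_nodes * iscsi_nics


def cluster_sessions(cluster_nodes: int, storage_nodes: int, iscsi_nics: int) -> int:
    # Total sessions to one CSV volume across every Windows cluster node.
    return cluster_nodes * sessions_per_host(storage_nodes, iscsi_nics)


# (Windows nodes, SAN nodes, iSCSI NICs per host, worked?)
observations = [
    (8, 2, 1, True),
    (8, 4, 1, False),
    (7, 4, 1, False),
    (6, 4, 1, True),
    (4, 2, 2, True),
    (4, 4, 2, False),
]

working = [cluster_sessions(c, s, n) for c, s, n, ok in observations if ok]
failing = [cluster_sessions(c, s, n) for c, s, n, ok in observations if not ok]

print("working totals:", working)  # [24, 30, 20]
print("failing totals:", failing)  # [40, 35, 36]
print("largest working:", max(working), "smallest failing:", min(failing))
```

Every working total is at or below 30 and every failing total is 35 or more, so the observations are at least consistent with a per-volume limit somewhere in that gap.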

'Night (in Australia anyway)
teledata
Respected Contributor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

I personally haven't done any CSV with 2008 R2, but a few things I'm spitballing:

1) Have you tried without Jumbo Frames? (simply try disabling jumbo frames on the SAN)

2) Wondering if perhaps the DSM from Lefthand has a problem with the newer R2 version of 2008. (I know there have been significant changes to the API requirements for R2; this is partly why there is such a long wait for Citrix XenApp on R2.) Does the LeftHand DSM specifically state support for R2?

3) Is this problem only present with CSV? What about traditional NTFS volumes on the same R2 server?
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,

I haven't tried without jumbo frames; I'll give it a shot, though I'm dubious it will make a difference.

The latest MPIO DSM compatibility may be an issue; I'm also using the patch for MPIO 1.23 from the HP software update site for the P4300 - I'll dig a bit deeper. There has been the occasional blue screen after installing it, so it is likely to be a problem.

Normal NTFS volumes work fine - it's only when they are added to CSV that they fail (and only when the number of iSCSI connections gets to a certain point - I'm yet to work this out thoroughly, but you can get the gist from the previous messages).

Cheers
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

OK - just checked the MPIO compatibility and it says it's OK for 2008 R2.

From the release notes:
"The DSM for MPIO is updated to support the Windows Server 2008 R2 release.
• DSM for MPIO version 8.1.0.80"

Updated version from Dec 09 is 8.1.0.85 which I have installed.
andreasjan
Occasional Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi Andrew,
I have recently configured similar setup:
- 4 nodes of DL380 G6, Windows Server 2008 R2 with Hyper-V (FCM and CSV) and LHN DSM MPIO enabled (with latest patch vers 8.1.0.85.1). 2 NICs per server connect to the iSCSI network.
- 2 nodes of P4300 (using ALB)
- HP ProCurve switch

CSV configuration was successful. Live migration was also tested.
So far the setup is OK, only there were intermittent blue screens due to HP DSM MPIO driver and I have an open case with HP support.

I just want to check with you: in FCM --> [windows cluster name] --> Networks --> [iSCSI network], I saw only 1 NIC per server, although I configured 2 NICs per server for the iSCSI network. Do you also see only 1 NIC per server used for the iSCSI network?
TIA

Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,

Yes - only 1 NIC shows up. It is the first NIC found during FCM setup, which is effectively random.
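The behaviour described here (and the validation warning about NICs 3 & 5 in the first post) can be modelled as grouping interfaces into cluster networks by subnet and keeping one interface per node. A minimal sketch of that model, not Failover Cluster Manager's actual algorithm; the addresses and adapter names are hypothetical, and "first found" is simply list order here. Presumably the second NIC is still used by the iSCSI initiator and MPIO; it just isn't listed as a cluster network interface.

```python
import ipaddress
from collections import defaultdict

# Hypothetical addressing: two iSCSI NICs per node on the same /24.
interfaces = [
    ("node1", "iSCSI-A", "10.0.102.11/24"),
    ("node1", "iSCSI-B", "10.0.102.12/24"),
    ("node2", "iSCSI-A", "10.0.102.21/24"),
    ("node2", "iSCSI-B", "10.0.102.22/24"),
]


def cluster_networks(ifaces):
    """Group interfaces by subnet, keeping only one NIC per node per subnet."""
    networks = defaultdict(dict)
    for node, name, cidr in ifaces:
        subnet = ipaddress.ip_interface(cidr).network
        # setdefault keeps the first NIC seen for this node on this subnet.
        networks[subnet].setdefault(node, name)
    return dict(networks)


for subnet, members in cluster_networks(interfaces).items():
    print(subnet, members)
```

With this input the single iSCSI cluster network lists only iSCSI-A for each node, matching the "only 1 NIC shows up" observation.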

Glad to hear things are running OK. I'd be keen to hear your feedback if HP come up with a solution for the blue screen as I'm having the same issue (with and without FCM).

I've done a complete redesign after having the issues. I was OK at the same size as your setup (i.e. 4 Windows nodes and 2 Lefthand nodes); it started to come unstuck with more connections (e.g. 6 Windows nodes to 4 Lefthand nodes).

So now I have:
1 NIC dedicated to iSCSI (jumbo frames and flow control)
1 NIC dedicated to internal cluster communication (CSV and heartbeat) - change the network metric so the cluster prefers it (jumbo and FC)
1 NIC dedicated to Live Migration (jumbo and FC)
1 NIC for cluster management
2 NICs teamed for VMs (but not available to the host - these have 5 VLANs configured for placement of servers in various VLANs, e.g. DMZ or internal or...)

My current theory is that the 2 NICs with MPIO weren't giving any greater bandwidth (the 2nd NIC is apparently failover only). It also wasn't stable: CSV disks would not come online once you got to a certain number of connections. And on further reading about CSV, redirected I/O could provide the fault tolerance I needed (though maybe not great for performance - but if an entire switch has failed and everything is still running then I guess you've got to be happy).
Just for clarity:
Cluster internal NICs are all into switch 1
Live Migration NICs are all into switch 2
Storage NICs are split between the 2 switches.
So a switch failure should allow 4 of the nodes to still access the SAN, and CSV redirected I/O will keep all 8 nodes up and running (though with reduced I/O performance).
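A toy model of this layout shows one case worth checking: if switch 1 fails, the cluster-internal network goes with it, so redirected I/O only survives if some other network on switch 2 is allowed to carry cluster traffic (the point andreasjan raises further down). It assumes nodes 1-4 have their storage NIC on switch 1, that the SAN stays reachable via the surviving switch, and that the cluster uses the lowest-metric surviving network it is allowed to use for CSV. The node split, network names and metric values are all hypothetical.

```python
# Hypothetical model: 8 Hyper-V nodes, storage NICs split across two
# switches, cluster-internal network on switch1, live migration on switch2.
# Metric values are made up; the lowest surviving metric carries CSV traffic.
NODES = list(range(1, 9))
STORAGE_SWITCH = {n: "switch1" if n <= 4 else "switch2" for n in NODES}
NETWORKS = {
    # name: (switch, metric)
    "cluster-internal": ("switch1", 1000),
    "live-migration": ("switch2", 1100),
}


def after_switch_failure(failed_switch, csv_allowed):
    """Return (direct, csv_network, redirected, stranded) after one switch dies."""
    # Nodes whose storage NIC is on the surviving switch keep direct access.
    direct = [n for n in NODES if STORAGE_SWITCH[n] != failed_switch]
    cut_off = [n for n in NODES if n not in direct]
    # Pick the lowest-metric network that survived and may carry CSV traffic.
    candidates = sorted(
        (NETWORKS[name][1], name)
        for name in csv_allowed
        if NETWORKS[name][0] != failed_switch
    )
    if not candidates:
        return direct, None, [], cut_off
    return direct, candidates[0][1], cut_off, []


# CSV traffic only allowed on the cluster-internal network (on switch1):
print(after_switch_failure("switch1", ["cluster-internal"]))
# Cluster communication also allowed on the live migration network:
print(after_switch_failure("switch1", ["cluster-internal", "live-migration"]))
```

In the first scenario nodes 1-4 end up with neither direct nor redirected access; in the second they fall back to redirected I/O over the live migration network.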

I'm still building and testing but so far so good.

Cheers
andreasjan
Occasional Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,
Glad also to hear that your system goes well.

The HP support told me the next release of LHN DSM 8.5 should solve the blue screen issue. The blue screen happens intermittently when I unplug the iSCSI connection or reboot one of the switches in the iSCSI network. How did you encounter the issue? Have you done an investigation on the crash dump files?

There is another blue screen issue solved by Microsoft hotfix KB976443. This blue screen happens randomly due to Msiscsi.sys.

Other than blue screen, initially the DL380 G6 had issue with auto reboot (no blue screen, no hint at all), solved by upgrading the system PLD (on the system board). I heard that the initial shipment of DL380 G6 was plagued with this issue.

As the issue is related to the LHN DSM, my workaround is not to enable MPIO. The workaround (no MPIO) relies on redirected I/O in case one of the switches fails or there is a NIC/cable issue.

In my case, the networking settings of each server:
- 2 NICs to the iSCSI network (no jumbo at the moment). One connects to switch1 and the other to switch2 (there is no requirement that MPIO paths connect to the same switch)
- 1 NIC for Live migration/cluster comm to switch1. I called it cluster1
- 1 NIC for Live migration/cluster comm to switch2. It is called cluster2
- 2 teamed NICs (TLB) to the public network for the VMs' external network. One NIC connects to switch3 and the other to switch4.

In your case, I guess you have also enabled cluster comm on the live migration network. So that if your switch1 fails, the cluster comm can still go through the live migration network.

So far the Windows and LHN clusters (as well as the VMs) are stable when I reboot one of the switches in the iSCSI network.

Cheers
Darren Speer
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi Guys,

I have a similar setup.

3 x DL380
2 x P4300 SAS Starter Kits (4 nodes)
2 x 2910AL Switches

I have not even got as far as getting the clustering working.

I have followed the guide http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01750150/c01750150.pdf

I have 6 NICs in the servers, of which I was going to use two for the iSCSI network, one to each switch. As per the recommendation I have not teamed the server NICs, and I have ALB-bonded the SAN NICs.

If I only plug one of the NICs into one switch they seem to operate fine. The minute I plug the gear into both switches I either A) cannot ping the SAN at all, or B) can only ping it with small packets (<1400 bytes). I do have jumbo frames enabled on the server, switch and SAN.

Does anyone out there have a config for two 2910AL in a similar setup that is working?

I have installed the latest MPIO DSM, but I am now at a bit of a loss.

Cheers
Pandurang
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

We have the same issue too. Have you found a solution for this, or is it a limitation?

Thanks
Pandurang
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi Darren,

My first thought is to check the uplink (or trunk or whatever you want to call it) between the switches - make sure it is passing the iSCSI VLAN.

I can't see your message while replying (which is slightly annoying) - I think you said you had the NICs bonded and ALB running. If they are patched into separate switches then the switches also need to see each other (and pass the iSCSI traffic).
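Darren's symptom (pings only succeed with small packets) can be checked end to end with don't-fragment pings sized to each MTU. A small sketch of the arithmetic, assuming IPv4 without options and an IP MTU of 9000 for jumbo frames (some NIC drivers quote 9014, which includes the Ethernet header). The printed commands use the Windows `ping` flags `-f` (don't fragment) and `-l` (payload size); `<SAN VIP>` is a placeholder for your own target address.

```python
IPV4_HEADER = 20  # bytes, no IP options
ICMP_HEADER = 8   # bytes, echo request header


def max_df_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    size = max_df_ping_payload(mtu)
    print(f"MTU {mtu}: ping -f -l {size} <SAN VIP>")
```

If the 8972-byte ping fails but the 1472-byte one works, something on the path (for example the inter-switch trunk) is still at a standard 1500-byte MTU, which would fit failures just below that size.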

If you verify that's all OK then I'll need to think a bit harder...

Cheers
Sergio Alves
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,
I was having exactly the same problem... It was the trunk between the two iSCSI switches.
The symptoms I had were related to communication between the servers and the storage, like pings.

Cheers!
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Another update on the original issue:

After much playing around I settled on using 4 node windows clusters with the 4 node san. I removed the 2nd iSCSI nic and allocated it to a dedicated live migration subnet.

The problem persists: if you have too many iSCSI connections the CSV volumes fail. I have tracked it down to a problem with the persistent reservations - I don't know what the solution is except to use a smaller number of iSCSI connections (fewer cluster nodes, fewer NICs, no MPIO etc.).
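To illustrate how a persistent reservation problem could scale with connection count: under SCSI-3 persistent reservations each I_T nexus registers a key before the cluster takes its reservation, and SPC-3 defines an INSUFFICIENT REGISTRATION RESOURCES condition for a target that has run out of room. The toy model below assumes each iSCSI session registers separately and that the target has a registration cap; the cap (`max_registrations`) is hypothetical, since nothing here documents such a limit for SAN/iQ. The value 32 is chosen only so that 6 hosts × 5 sessions (30) fit and 7 hosts (35) do not.

```python
class ToyTarget:
    """Toy model of one volume's SCSI-3 persistent reservation registrations.

    max_registrations is a hypothetical cap used for illustration only.
    """

    def __init__(self, max_registrations: int):
        self.max_registrations = max_registrations
        self.registrations = {}  # I_T nexus -> reservation key

    def register(self, nexus, key) -> bool:
        """Model a REGISTER; False stands in for 'out of registration resources'."""
        full = len(self.registrations) >= self.max_registrations
        if nexus not in self.registrations and full:
            return False
        self.registrations[nexus] = key
        return True


def hosts_that_fail(target, hosts: int, sessions_per_host: int):
    """Register every session of every host; list hosts with any rejection."""
    failed = []
    for h in range(1, hosts + 1):
        key = 0x1000 + h  # hypothetical per-node reservation key
        results = [target.register((f"host{h}", s), key) for s in range(sessions_per_host)]
        if not all(results):
            failed.append(f"host{h}")
    return failed


print(hosts_that_fail(ToyTarget(32), hosts=6, sessions_per_host=5))  # []
print(hosts_that_fail(ToyTarget(32), hosts=8, sessions_per_host=5))  # ['host7', 'host8']
```

Under this model the failure is all-or-nothing per host only in the sense that any rejected registration breaks that host; the first hosts to connect keep working, which would look like disks failing only once enough nodes join.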

If anyone else comes across the issue and finds a solution I would be keen to know.

Cheers
jim.hendo
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

I have a similar issue with the Lefthand DSM installed.

Test Lab as follows:
2x DL360 G6
P4300 SAS Starter SAN
Server 2008 R2 Core
6x NIC

NIC 1 - Management
NIC 2 - Live Migration
NIC 3,5 - HP Team for VMs
NIC 4,6 - iSCSI MPIO

At first I installed MPIO with the MS iSCSI DSM and set up multipath with Round Robin. The CSV cluster worked and the CSV came online; everything was running fine.

After researching more into Lefthand SANs I found the Lefthand DSM. I installed the DSM on the first node and everything was still ok, I was able to live migrate my VMs to this node. After installing on the second node both my Quorum drive and CSV fail to start.

Thought this might be a problem with the MS DSM and Lefthand DSM both installed on the nodes and causing conflicts. So rebuilt both nodes this afternoon.

Installed OS
Installed Hyper-V Role
Installed Windows Updates
Installed PSP Pack and Teamed NICs
Installed latest Lefthand DSM and configured iSCSI Multipath
Set a 1GB LUN for Quorum and a 500GB LUN for CSV

Cluster Validation now fails because it says my shared storage does not support persistent reservations......

That's as far as I have got today, the failed cluster validation made me give up and go home... lol

So now I'm thinking of rebuilding the nodes again and giving up on the Lefthand DSM; my SAN is mirroring over the 2 nodes, so I do not think I'll get much of an advantage out of using it.

Can you see any reason why the Windows DSM shouldn't be used?


Thanks
jim.hendo
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Ok more testing...

I have set up 2x Core R2 VMs and set up the OS until the point where MPIO is about to be installed. Took snapshots of the VMs.
Installed the Lefthand DSM MPIO software and setup iSCSI network.

Cluster validation fails because the storage does not support persistent reservations!!!!!

Reverted snapshots and installed the Microsoft MPIO DSM, then setup the iSCSI network.

Cluster validates and builds perfectly fine!


What gives?

http://www.microsoft.com/presspass/events/virtualization/docs/LefthandHyper-VBrief.pdf

This document from Microsoft says:

Beyond the key features mentioned above, SAN/iQ has a rich set of
additional storage management features, including:
iSCSI network load balancing (via Microsoft MPIO)

So Microsoft seem to be saying it's OK to use their MPIO DSM and not Lefthand's.....

AgdataIT
Occasional Visitor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

I'm having similar issues. I have a 2 node setup right now with 2 connections on each node to the isolated iSCSI network. The HP LH Support Team says that iSCSI with the HP DSM can only support failover. This is a serious problem in my eyes. How can you create a device that can't increase throughput with MPIO? I am now limited to one 1Gb connection from my VM hosts and guests to their data source. This makes no sense. I attached the doc that I got from them.

I see someone else has experimented with not using the HP DSM and just sticking with the MS module.

Will this work for a 2 node 08 R2 CSV Hyper-V cluster? Any caveats?

Thanks in advance
IainS
Frequent Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

AgdataIT, this is the issue I've been asking about in a recent post too, and I've been in touch with LeftHand support.

The current 8.5 version of the DSM apparently only does failover and not load balancing across more than 1 server NIC. This is despite the manual suggesting it does do load balancing and caused me much frustration.

You can use just the Microsoft DSM and it does load balance very nicely. However it seems to only talk to one node at a time since it isn't node aware like the LeftHand DSM.

So with the Microsoft DSM you get both NICs used (assuming 2), so better peak bandwidth; however, there is a noticeable reduction in write performance and IOPS for random I/O compared with the LH DSM. This is just with 2 LH nodes, so I imagine it gets worse with more nodes.

I've reluctantly decided to go back to the LH DSM since I think we will get better performance with multiple host Hyper V nodes doing things at the same time, and database i/o etc and reluctantly accept the peak bandwidth reduction from about 190-200 MBps to about 110-115 MBps.
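A toy model of the two path-selection behaviours discussed here, loosely named after the Microsoft DSM's Failover Only and Round Robin policies: it counts which NIC each I/O goes down and derives a naive peak-bandwidth ceiling. `per_link` is taken from the ~110-115 MBps single-NIC figure above; the path names are hypothetical. The model ignores the node-aware routing credited to the LH DSM for random I/O, and its 230 MBps ceiling overshoots the 190-200 MBps actually measured, so treat it as an upper bound only.

```python
from itertools import cycle


def failover_only(paths, n_ios):
    """Every I/O goes down the first path; the others are standby only."""
    counts = {p: 0 for p in paths}
    counts[paths[0]] = n_ios
    return counts


def round_robin(paths, n_ios):
    """I/Os are handed to each path in turn."""
    counts = {p: 0 for p in paths}
    rotation = cycle(paths)
    for _ in range(n_ios):
        counts[next(rotation)] += 1
    return counts


def peak_mb_per_s(counts, per_link_mb_per_s):
    """Naive ceiling: each path that carries traffic adds one link's worth."""
    return per_link_mb_per_s * sum(1 for c in counts.values() if c > 0)


paths = ["NIC-A", "NIC-B"]
per_link = 115  # rough single-GbE throughput, from the post above
for policy in (failover_only, round_robin):
    counts = policy(paths, 1000)
    print(policy.__name__, counts, peak_mb_per_s(counts, per_link))
```

Failover Only leaves NIC-B idle (ceiling 115), while Round Robin splits I/Os evenly (ceiling 230), which is the peak-bandwidth difference described here.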

Apparently V9 will support load balancing and timing for it is late this year to early next (what LH support told me).

Also there are a couple of patches you may want to look into getting if you have not already. LH support was particularly keen for me to install patch 10078-02 (an update to the network drivers on the LH nodes), and patch 10085-00 seems to address the issue other people in this thread are having with lots of hosts and nodes causing CSV to go offline. I installed them both on our LH nodes yesterday and so far all seems well.