StoreVirtual Storage

Lefthand P4300 - MPIO with Hyper-V CSV - Problem

 
Andrew Steel
Advisor


Hi All,
I have an issue which hopefully someone else has come across. Setup is:
- 8-node failover cluster (2008 R2 Datacenter), HP DL380 G6s, all running Server Core
- each node has 6 Nics as follows:
- Nic 1: Management (VLAN 100)
- Nic 2: Failover (VLAN 103)
- Nic 3 & 5: iSCSI configured with MPIO (Lefthand DSM also running) (VLAN 102)
- Nic 4 & 6: Teamed for VMs (with multiple VLANs; the host is not configured to connect to any of these VLANs within the Hyper-V virtual network setup)
- SAN is HP Lefthand iSCSI as follows:
- NICs bonded using ALB (Adaptive Load Balancing)
- SAS Cluster of 4 nodes
- SATA Cluster of 2 nodes
- Network: relevant bits are 2 Cisco 4948 switches with 10Gb uplinks/trunk
- All iSCSI ports configured for jumbo frames, as is the uplink/trunk between the switches (server NICs, SAN NICs etc. as well)
- All iSCSI ports configured for flow control (as well as server Nics etc)
Think that covers the main bits for now - let me know if you want any more detail.

2 volumes were created on the SAS SAN cluster (a 1GB volume for the witness disk and a 2TB volume to be used for CSV), brought online, formatted as NTFS (1 simple volume on each) and then taken offline again. These were presented to all the nodes.
Next I successfully configured a failover cluster (all validation tests passed - the only warning was for NICs 3 & 5 being on the same network, as MPIO is in use).
I enabled CSV. At this stage the 2 disks are online and happy. I then tried to add the 2TB disk to CSV ("add storage") - the disk proceeds to fail: each cluster node attempts to bring the disk online, and once they have all tried, the disk shows as "Failed". OK - start again - same problem...

Steps I've tried to solve the problem:
- Tried different sized volumes and also GPT rather than MBR - no change
- Tried without MPIO and other variations with the iSCSI settings - no change
- Looked at network issues - nothing obvious to report
- Created a volume on the 2 node SATA cluster - this can be added to CSV and stays online and happy - so it seems to be something to do with the 4 node SAS SAN (hmm - are the extra connections via MPIO and iSCSI to blame?)
- The SAS volumes are quite happy until you add them to CSV


I have had a hunt around and there isn't a lot of info on the web that covers what I'm seeing, or I've missed something blindingly obvious.
- hopefully someone else has come across a similar problem

Update:

OK - Some more trial and error findings:

If I enable MPIO for the 2 node SATA cluster iSCSI connections I get the same issue i.e. add to CSV then disk Fails and offline. By disconnecting the iSCSI sessions and reconnecting without Multi-Path from each of the 8 failover cluster nodes I can then get the disk back online.

Attempting the same config with the SAS cluster (which has 4 nodes) I still get the problem.

So my reasoning is that it must be something to do with the number of active iSCSI sessions:
- SATA cluster iSCSI connection without MPIO generates 3 active iSCSI sessions (an initial connection + a connection to each node from 1 NIC) - this works
- SATA cluster iSCSI connection with MPIO generates 5 active sessions (an initial connection + a connection to each node from each NIC) - this fails
- SAS cluster iSCSI connection without MPIO generates 5 active sessions (an initial connection + a connection to each node from 1 NIC) - this fails
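To make the arithmetic above explicit, here is a small Python sketch. It assumes, as described above, that each host opens one initial session plus one session from each iSCSI NIC to each storage node; `sessions_per_host` is a hypothetical helper, not part of any HP or Microsoft tool.

```python
def sessions_per_host(san_nodes: int, initiator_nics: int) -> int:
    """Estimate the active iSCSI sessions one host opens to a single volume.

    Assumption (from the observations above): one initial session plus one
    session from every iSCSI NIC to every storage node in the SAN cluster.
    """
    return 1 + san_nodes * initiator_nics


# (description, SAN nodes, iSCSI NICs in use, outcome reported above)
observations = [
    ("SATA cluster, no MPIO", 2, 1, "works"),
    ("SATA cluster, MPIO", 2, 2, "fails"),
    ("SAS cluster, no MPIO", 4, 1, "fails"),
]

for label, nodes, nics, outcome in observations:
    print(f"{label}: {sessions_per_host(nodes, nics)} sessions -> {outcome}")
```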

I know CSV does some tricky stuff to enable multiple nodes to read/write to the same volume - so could this be an incompatibility between iSCSI MPIO and CSV?
Is anyone using iSCSI MPIO successfully with CSV?
What are your thoughts on this being a Microsoft issue or an HP Lefthand DSM issue?

Thanks for any pointers...
17 REPLIES 17
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Another update/question:
Have tried a lot of different configs/options and still running into the same problem. Reverted to using a single NIC to see if that was an issue.

2 node SAN cluster without MPIO works (the DSM still creates multiple connections though). Anything more than that fails.

I'm beginning to think it doesn't work - so my new question is:

Does anyone have a P4300 Lefthand SAN with more than 2 nodes working successfully with CSV (Cluster Shared Volumes for Hyper-V)?

Cheers
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

OK - brain has almost stopped working, but here are my final observations on CSV, 2008 R2 and Lefthand SANs with their DSM for today...
- 8 node cluster to 2 node san without MPIO (single NIC connection) is OK
- 8 (and 7) node cluster to 4 node SAN does not work no matter what
- 6 node cluster (or less) to 4 node SAN without MPIO is OK
- 4 node cluster to 2 node SAN with MPIO is OK
- 4 node cluster to 4 node SAN with MPIO does not work

For the ones that do work I'm not convinced of the stability. Also, it made no difference whether I had 2 NICs or one for iSCSI when connecting the 8 node cluster to the 4 node SAN - it still failed.

If anybody has a good theory then I'm all ears. My current theory is that there is some limitation on the number of possible iSCSI connections for a CSV volume - though this is purely based on observation and not on any real understanding of why.
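One way to test that theory against the list above is to multiply the per-host session count by the number of cluster nodes. The sketch below assumes the same per-host pattern as earlier in the thread (1 initial session + 1 per NIC per SAN node) and uses the smallest configuration (1 NIC, no MPIO) for the "does not work no matter what" cases. It only checks whether a single cut-off is consistent with these observations; it does not prove that such a limit exists, and the helper names are hypothetical.

```python
def sessions_per_host(san_nodes: int, initiator_nics: int) -> int:
    # Assumed pattern: one initial session + one per NIC per storage node.
    return 1 + san_nodes * initiator_nics


def cluster_sessions(hosts: int, san_nodes: int, initiator_nics: int) -> int:
    # Total sessions the whole failover cluster holds against one volume.
    return hosts * sessions_per_host(san_nodes, initiator_nics)


# (cluster nodes, SAN nodes, iSCSI NICs, worked?) taken from the list above;
# "no matter what" cases use the smallest configuration (1 NIC, no MPIO).
observations = [
    (8, 2, 1, True),
    (8, 4, 1, False),
    (7, 4, 1, False),
    (6, 4, 1, True),
    (4, 2, 2, True),
    (4, 4, 2, False),
]

working = [cluster_sessions(h, n, c) for h, n, c, ok in observations if ok]
failing = [cluster_sessions(h, n, c) for h, n, c, ok in observations if not ok]

print("largest working total:", max(working))
print("smallest failing total:", min(failing))
print("single cut-off fits:", max(working) < min(failing))
```

With these inputs the largest working total is 30 and the smallest failing total is 35, so a cut-off somewhere in between would fit every case listed; that is a correlation, not an explanation.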

'Night (in Australia anyway)
teledata
Respected Contributor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

I personally haven't done any CSV with 2008 R2, but a few things I'm spitballing:

1) Have you tried without Jumbo Frames? (simply try disabling jumbo frames on the SAN)

2) Wondering if perhaps the DSM from Lefthand may have a problem with the newer R2 version of 2008 (I know there have been significant changes to the API requirements for R2; this is partly why there is such a long wait for Citrix XenApp on R2). Does the Lefthand DSM specifically state support for R2?

3) Is this problem only present with CSV? What about traditional NTFS volumes on the same R2 server?
http://www.tdonline.com
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,

I haven't tried without jumbo frames; I will give it a shot, though I am dubious it will make a difference.

Compatibility of the latest MPIO DSM may be an issue; I'm also using the patch for MPIO 1.23 from the HP software update site for the P4300 - I'll dig a bit deeper. There has been the occasional blue screen after installing it, so it's likely to be a problem.

Normal NTFS volumes work fine - it's only when they are added to CSV that they fail (and only when the number of iSCSI connections gets to a certain point - I'm yet to work this out thoroughly but you can get the gist from the previous messages).

Cheers
Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

OK - just checked the MPIO compatibility and it says it's OK for 2008 R2.

From the release notes:
"The DSM for MPIO is updated to support the Windows Server 2008 R2 release.
• DSM for MPIO version 8.1.0.80"

Updated version from Dec 09 is 8.1.0.85 which I have installed.
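Checking an installed DSM version against the minimum listed in the release notes comes down to a part-by-part numeric comparison of the dotted version strings (a plain string comparison would get cases like "8.1.0.9" vs "8.1.0.80" wrong). A minimal sketch; the function names are hypothetical, and reading the installed version off a Server Core node is left out:

```python
def parse_version(text: str) -> tuple[int, ...]:
    # "8.1.0.85.1" -> (8, 1, 0, 85, 1); tuples compare element by element.
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    return parse_version(installed) >= parse_version(minimum)


R2_MINIMUM = "8.1.0.80"  # DSM version the release notes list for 2008 R2

# 8.1.0.85 is the Dec 09 update; 8.1.0.85.1 is the patch mentioned later on.
for installed in ("8.1.0.85", "8.1.0.85.1"):
    print(installed, meets_minimum(installed, R2_MINIMUM))
```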
andreasjan
Occasional Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi Andrew,
I have recently configured similar setup:
- 4 nodes of DL380 G6, Windows Server 2008 R2 with Hyper-V (FCM and CSV) and LHN DSM MPIO enabled (with latest patch vers 8.1.0.85.1). 2 NICs per server connect to the iSCSI network.
- 2 nodes of P4300 (using ALB)
- HP ProCurve switch

CSV configuration was successful. Live migration was also tested.
So far the setup is OK; the only problem has been intermittent blue screens caused by the HP DSM MPIO driver, for which I have an open case with HP support.

I just want to check with you: in FCM --> [windows cluster name] --> Networks --> [iSCSI network], I saw there is only 1 NIC per server shown, although I configured 2 NICs per server for the iSCSI network. I wonder whether you also see only 1 NIC per server used for the iSCSI network.
TIA

Andrew Steel
Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,

Yes - only 1 NIC shows up - it is the first NIC it finds when doing the setup of FCM and is random.

Glad to hear things are running OK. I'd be keen to hear your feedback if HP come up with a solution for the blue screen as I'm having the same issue (with and without FCM).

I've done a complete redesign after having the issues. I was OK at the same size as your setup (i.e. 4 Windows nodes and 2 Lefthand nodes), but it started to come unstuck with more connections (e.g. 6 Windows nodes to 4 Lefthand nodes).

So now I have:
1 NIC dedicated to iSCSI (jumbo frames and flow control)
1 NIC dedicated to internal cluster communication (CSV and heartbeat) - change the metric to do this (jumbo and FC)
1 NIC dedicated to Live Migration (jumbo and FC)
1 NIC for cluster management
2 NICs teamed for VMs (but not available to the host - these have 5 VLANs configured for placement of servers in various VLANs, e.g. DMZ or internal or...)
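The "change the metric" step relies on the 2008 R2 cluster sending CSV and internal cluster traffic over the cluster-enabled network with the lowest metric. The simplified Python model below illustrates that selection rule only; the network names and metric values are hypothetical examples, not values from this cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterNetwork:
    name: str
    metric: int          # lower metric = preferred for cluster traffic
    cluster_comm: bool   # "allow cluster network communication" enabled


def csv_network(networks: list[ClusterNetwork]) -> str:
    # Simplified model: among networks enabled for cluster communication,
    # CSV / internal traffic goes over the one with the lowest metric.
    candidates = [n for n in networks if n.cluster_comm]
    if not candidates:
        raise ValueError("no network is enabled for cluster communication")
    return min(candidates, key=lambda n: n.metric).name


# Hypothetical metrics: the cluster-internal network is set lowest by hand.
networks = [
    ClusterNetwork("Cluster Internal", 900, True),
    ClusterNetwork("Live Migration", 1100, True),
    ClusterNetwork("Management", 10100, True),
    ClusterNetwork("iSCSI", 1200, False),  # storage kept off cluster traffic
]

print(csv_network(networks))
print(csv_network(networks[1:]))  # cluster-internal network unavailable
```

Removing the cluster-internal network from the list makes the model fall back to the next-lowest metric, here the live migration network.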

My current theory is that the 2 NICs with MPIO weren't giving any greater bandwidth (the 2nd NIC is failover only, apparently). Also it wasn't stable - CSV disks would not come online once you got to a certain number of connections. And on doing further reading about CSV, redirected IO could provide the fault tolerance I needed (though maybe not too great for performance - but if an entire switch has failed and everything is still running then I guess you've got to be happy).
Just for clarity:
Cluster Internal NIC's are all into switch 1
Live Migration NIC's are all into switch 2
Storage NIC's are split between the 2 switches.
So a switch failure should allow 4 of the nodes to still access the SAN, and CSV redirected IO will keep all 8 nodes up and running (though with reduced IO performance).
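The failure reasoning above can be walked through with a small simulation. It assumes, hypothetically, that nodes 1-4 have their storage NIC on switch 1 and nodes 5-8 on switch 2, and it treats redirected IO as available only if some surviving network is allowed to carry cluster communication; heartbeat and quorum behaviour are ignored.

```python
# Hypothetical wiring for the 8-node design above: storage NICs split
# across the two switches, cluster-internal NICs all on switch 1,
# live migration NICs all on switch 2.
NODES = range(1, 9)
STORAGE_SWITCH = {node: 1 if node <= 4 else 2 for node in NODES}
CLUSTER_INTERNAL_SWITCH = 1
LIVE_MIGRATION_SWITCH = 2


def after_switch_failure(failed: int, lm_carries_cluster_comm: bool) -> dict[str, list[int]]:
    direct = [n for n in NODES if STORAGE_SWITCH[n] != failed]
    stranded = [n for n in NODES if STORAGE_SWITCH[n] == failed]
    # Redirected IO needs a surviving network the cluster may use for CSV
    # traffic, plus at least one node that still reaches the SAN directly.
    cluster_paths = [CLUSTER_INTERNAL_SWITCH]
    if lm_carries_cluster_comm:
        cluster_paths.append(LIVE_MIGRATION_SWITCH)
    redirect_ok = bool(direct) and any(sw != failed for sw in cluster_paths)
    return {
        "direct": direct,
        "redirected": stranded if redirect_ok else [],
        "no_csv_access": [] if redirect_ok else stranded,
    }


for failed in (1, 2):
    for lm in (False, True):
        print(f"switch {failed} down, LM carries cluster comm={lm}:",
              after_switch_failure(failed, lm))
```

In this model a switch 2 failure behaves as described, but a switch 1 failure also takes out the cluster-internal network, so nodes 1-4 only stay up on redirected IO if cluster communication is also allowed on the live migration network.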

I'm still building and testing but so far so good.

Cheers
andreasjan
Occasional Advisor

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi,
Glad to hear that your system is going well too.

The HP support told me the next release of LHN DSM 8.5 should solve the blue screen issue. The blue screen happens intermittently when I unplug the iSCSI connection or reboot one of the switches in the iSCSI network. How did you encounter the issue? Have you done an investigation on the crash dump files?

There is another blue screen issue solved by Microsoft hotfix KB976443. This blue screen happens randomly due to Msiscsi.sys.

Apart from the blue screens, the DL380 G6 initially had an issue with automatic reboots (no blue screen, no hint at all), which was solved by upgrading the system PLD (on the system board). I heard that the initial shipment of DL380 G6s was plagued with this issue.

As the issue is related to the LHN DSM, my workaround is not to enable MPIO. The workaround (no MPIO) relies on redirected IO in case one of the switches fails or there is a NIC/cable issue.

In my case, the networking settings of each server:
- 2 NICs to the iSCSI network (no jumbo frames at the moment). One of them connects to switch1 and the other to switch2 (as there is no requirement that MPIO paths connect to the same switch)
- 1 NIC for Live migration/cluster comm to switch1. I called it cluster1
- 1 NIC for Live migration/cluster comm to switch2. It is called cluster2
- 2 teamed NICs (TLB) to the public network for the VMs' external network. One NIC connects to switch3 and the other to switch4.

In your case, I guess you have also enabled cluster comm on the live migration network. So that if your switch1 fails, the cluster comm can still go through the live migration network.

So far the Windows and LHN clusters (as well as the VMs) are stable when I reboot one of the switches in the iSCSI network.

Cheers
Darren Speer
New Member

Re: Lefthand P4300 - MPIO with Hyper-V CSV - Problem

Hi Guys,

I have a similar setup.

3 x DL380
2 x P4300 SAS Starter Kits (4 nodes)
2 x 2910AL Switches

I have not even got as far as getting the clustering working.

I have followed the guide http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c01750150/c01750150.pdf

I have 6 NICs in the servers, of which I was going to use two for the iSCSI network, one to each switch. As per the recommendation I have not teamed the server NICs, and I have ALB-bonded the SAN NICs.

If I only plug one of the NICs into one switch everything seems to operate fine. The minute I plug the gear into both switches I either A) cannot ping the SAN at all, or B) can only ping it with small packets (<1400 bytes). I do have jumbo frames enabled on the server, switch and SAN.
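When only small pings get through, it helps to test with the largest payload that fits in one unfragmented packet: the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below assumes a 9000-byte jumbo MTU and a hypothetical SAN virtual IP; on Windows, `ping -f` sets Don't Fragment and `-l` sets the payload size.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    # Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
    return mtu - IPV4_HEADER - ICMP_HEADER


SAN_VIP = "10.0.102.10"  # hypothetical SAN virtual IP on the iSCSI VLAN

for mtu in (1500, 9000):
    size = max_ping_payload(mtu)
    # Windows ping: -f = Don't Fragment, -l = payload size in bytes.
    print(f"MTU {mtu}: ping -f -l {size} {SAN_VIP}")
```

If the 1472-byte test succeeds from a NIC but the 8972-byte test fails, some hop on that path (for example the link between the two switches) is most likely not passing jumbo frames.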

Does anyone out there have a config for two 2910AL in a similar setup that is working?

I have installed the latest MPIO DSM, but I am now at a bit of a loss.

Cheers