StoreVirtual Storage

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

 
Stephane-OTG
Occasional Advisor

P4000 VSS Provider with QLogic HBA and DPM 2010 server

Hi,

First, here is a brief description of our current environment:

  • One cluster of 4 Hyper-V hosts, installed with Windows Server 2008 R2 SP1 Enterprise Core.
  • Each host is running on an HP DL360G7 with 8 network cards (1 Management, 1 Cluster network, 1 CSV network, 1 Live Migration network and 4 Hyper-V Guest networks) and a dual-port QLogic iSCSI HBA (QLE4062C) for SAN connectivity.
  • The QLogic HBAs connect to an HP LeftHand iSCSI SAN composed of 4 nodes (2x P4300 and 2x P4300G2).
  • 8 Cluster Shared Volumes host all the guest VMs, with their VHDs on the SAN.
  • DPM 2010 is used to back up all the VMs at the host level (not within the guest).

Now for the issue:

As mentioned above, I am using DPM 2010 to back up all the Hyper-V guests from the hosts that are part of the cluster with Cluster Shared Volumes. I have therefore downloaded and installed the VSS hardware provider supplied by HP LeftHand.
This was working until I installed the QLogic HBA in each host.
Now that the iSCSI connection is established at the HBA level (no TCP/IP connectivity presented to the OS), the LeftHand VSS provider cannot connect to the SAN to create the snapshot when DPM requests a backup of any of the VM guests on the cluster.
So I thought this could be fixed either by adding a separate network card to the host and connecting it to the SAN network, or by simply installing the Ethernet controller driver for the QLogic adapter so that it is also recognised as an Ethernet controller.
After doing the above (I've tested both), the VSS hardware provider can see the SAN (TCP/IP connectivity is established at the OS level) and is able to create a snapshot at the SAN level.
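
One rough way to confirm that the OS itself (rather than the HBA) has TCP/IP connectivity to the SAN is to attempt a plain TCP connection from the host. The sketch below is illustrative only: the function name `can_reach` is made up for this example, the address in the usage note is a placeholder rather than the real SAN VIP, and 3260 is the standard iSCSI target port. The port the VSS provider itself uses to talk to the SAN is not stated in this thread, so treat the port choice as an assumption.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from the OS network stack."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and "no route to host"
        return False
```

For example, `can_reach("10.0.0.10", 3260)` (placeholder address) should return False on a host where only the HBA can see the SAN network, and True once a regular NIC on that network is available to Windows.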

The problem is that, after the VSS hardware provider has created the snapshot on the SAN, the Hyper-V host cannot establish an iSCSI session to this snapshot to start the backup.
The Application event log on the host shows the following error:
Event ID 266 - HP P4000 VSS Provider: "Failed to connect snapshot LUN , 0x80001034 - Volume connection failure."

I can see the actual snapshot volume in the Microsoft iSCSI initiator, but its status stays at "Reconnecting"; eventually the DPM backup fails and the error above is logged in the event log.

I'm sure it is a configuration issue and that I am missing something here, so any suggestions would be much appreciated.

Thank you,
Stephane

6 REPLIES
David_Tocker
Regular Advisor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

Did you ever find an answer to this issue? I have had enough issues without throwing in the QLogic HBAs.

I found there were a number of updates to Hyper-V / Cluster services that needed to be applied just to get reliable VSS hardware provider backups working with the software iSCSI provider built into SP1.

I found this wiki to be invaluable:

 

http://social.technet.microsoft.com/wiki/contents/articles/1349.hyper-v-update-list-for-windows-server-2008-r2.aspx

 

Regards.

David Tocker
Stephane-OTG
Occasional Advisor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

Hi David,

 

I haven't got an answer yet.
I've opened a case with QLogic, which just got escalated to an engineer.

I also have an HP engineer coming out to check our configuration to make sure everything has been installed according to best practices (or as close as possible).

 

The list of hotfixes is indeed invaluable, but not as complete as it could be.

By searching for "Cluster hotfixes" on Google, I have found over 52 hotfixes for Clustering, CSV, HBA and iSCSI, just for Windows 2008 R2 SP1.

 

I'll keep you posted on what QLogic and HP come up with.

 

In the meantime, if you really want to use QLogic HBAs, I would strongly recommend testing them in a lab environment first.

Testing the QLogic HBAs against regular network cards (in my environment) shows a huge difference.

For example, transferring a 1.5GB file with an HBA will go at about 9MB/sec (on a 1Gb iSCSI SAN), while the same file over a regular network card (NC365T with Broadcom chipset) will average around 49MB/sec (transferring from and to the same volume).
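
To put those two rates in perspective, here is a small back-of-the-envelope calculation of how long the 1.5GB copy takes at each speed. It only uses the figures quoted above and assumes 1 GB = 1024 MB; the function name is just for illustration.

```python
def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time in seconds to move size_mb megabytes at a steady rate_mb_per_s."""
    return size_mb / rate_mb_per_s


FILE_MB = 1.5 * 1024  # 1.5 GB file, assuming 1 GB = 1024 MB

hba = transfer_seconds(FILE_MB, 9)   # QLogic HBA, about 9 MB/s
nic = transfer_seconds(FILE_MB, 49)  # regular NIC, about 49 MB/s

print(f"HBA: {hba:.0f} s, NIC: {nic:.0f} s, ratio: {hba / nic:.1f}x")
# prints "HBA: 171 s, NIC: 31 s, ratio: 5.4x"
```

In other words, the same copy takes close to three minutes over the HBA versus about half a minute over the NIC.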

 

 

David_Tocker
Regular Advisor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

Another thought - perhaps the QLogic adaptors do not understand the multi-pathing correctly - anything connected to the 'VIP' should speak p4000 afaik. Another thing to try with the QLogic HBAs would be running them with jumbo frames off and flow control on....
Regards.

David Tocker
SMaximus7
Occasional Contributor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

As I understand it, QLogic HBAs with MPIO are not a supported HP configuration when using the HP P4000 DSM for MPIO. However, if you remove the HP P4000 DSM for MPIO and just use the Microsoft DSM, it is a supported configuration.

 

Let us know what you ended up doing please.  I'm interested in finding out the outcome.  I will be implementing a very similar solution within the next month and this could help a lot.

 

Thanks,

Stephane-OTG
Occasional Advisor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

Hi,

 

Where do I start?!?

 

Here is an overview of all the problems we faced in our environment by using the QLogic HBA with MPIO connecting to a P4000. Although we found a workaround or a solution for most of them, if I had to choose again, I would not go with this combination.
The amount of downtime and the phone conversations with HP, QLogic and Microsoft are not worth the few CPU cycles that an iSCSI HBA will save on the host.
So unless your virtual machines are CPU hungry, and you absolutely need to save the CPU, I would personally go with traditional NICs (Broadcom is my preference).

History of the problems and solution/workaround:

  • Although not related to the QLogic HBAs, the first two problems we encountered were with the P4000 OS version and the DSM. We were running SANIQ 8.0 when we first implemented our Hyper-V cluster and quickly ran out of iSCSI connections (if I recall correctly, that version was limited to 64 connections in total). This was fixed by upgrading to SANIQ 9.5, but we then ran into the DSM problem.
    The DSM would, from time to time, lock one of the volumes, which of course would "fail" the volume until the host that locked it was rebooted.
    Uninstalling the DSM fixed that problem.
  • We then "upgraded" our hosts from Broadcom NICs to QLogic HBAs. The first issue was backup using DPM with the VSS hardware provider.
    When the backup kicks off, DPM requests the P4000 VSS provider to take a snapshot of the volume. The problem here was that the provider requires a TCP/IP connection to the SAN to do so. Since the iSCSI HBAs are only seen as SCSI adapters, the TCP/IP stack is not installed on top of them in Windows, so the OS has no direct IP connectivity to the SAN and the backup would never start.
    The way around this was to re-install one Broadcom NIC (alongside the HBAs) dedicated to backup (which isn't a bad thing anyway), so that Windows has direct TCP/IP access to the SAN and the provider can communicate with it to create the snapshot.
    Although the snapshot could now be created, connecting to it was the next problem.
    I could not get the QLogic HBA to automatically connect to the snapshot (this was later confirmed by QLogic support), so the only workaround here was to use the Microsoft iSCSI initiator.
    The problem there is that the VSS provider wants to use the "Default" iSCSI initiator connection (i.e. the first IP address in the drop-down list when selecting a specific IP for the connection). If this first IP address (the "default") was not the Broadcom backup NIC, the connection could never be established.
    The workaround here was (and still is) to ensure that the backup NIC is the top IP address in the iSCSI initiator list by disabling and re-enabling any other NIC that sits above it (which sends it to the bottom of the list). Ridiculous really!
  • We finally got a successful backup, so I thought everything was fine until we started receiving more and more complaints from our clients that the systems were performing badly.
    After investigation, it appeared that using the QLogic HBA with MPIO set in "Round Robin" mode reduced the transfer rate to the equivalent of a 100Mbps connection! I could not get more than 10MB/s on the disks.
    The workaround here (the case is still open with HP and Microsoft) was to use MPIO in "Weighted path" mode and alternate connectivity between the HBA ports to separate nodes (effectively creating a sort of load balancing), but this can only be done if you have multiple SAN clusters in your environment (like we do). If not, you'll have to use the dual-port QLogic as a 1Gbps card with failover.
    Side note: MPIO in "Round Robin" mode was working perfectly with the Broadcom NICs, and QLogic was also able to confirm that it worked with their HBA connecting to several different SANs, but not when connecting to a P4000.
    This is still an open case with HP and Microsoft.
  • Unfortunately, this wasn't the end of our problems, as the QLogic driver started to cause random blue screens on the hosts (about two to three times per month).
    QLogic was able to identify the problem within their drivers and wrote a fix.
    Since the fix was implemented (about two weeks ago) we haven't had a blue screen (fingers crossed).
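
On the Round Robin figure in the list above: the comparison of 10MB/s to a 100Mbps link can be checked with a simple unit conversion. This is only a rough sketch that ignores protocol overhead and uses decimal units (1 byte = 8 bits); the function names are made up for the example.

```python
def mb_per_s_to_mbps(rate_mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return rate_mb_per_s * 8


def link_ceiling_mb_per_s(link_mbps: float) -> float:
    """Theoretical maximum payload rate of a link, ignoring all overhead."""
    return link_mbps / 8


observed = 10  # MB/s seen on the disks with Round Robin

observed_mbps = mb_per_s_to_mbps(observed)       # 80 Mbps
fast_ethernet = link_ceiling_mb_per_s(100)       # 12.5 MB/s
gigabit = link_ceiling_mb_per_s(1000)            # 125.0 MB/s
utilisation = observed / gigabit                 # 0.08

print(f"{observed_mbps:.0f} Mbps observed, {utilisation:.0%} of a 1Gbps link")
```

So 10MB/s is about 80Mbps, which fits under the 12.5MB/s ceiling of a 100Mbps link and is only around 8% of what a single 1Gbps path could carry in theory.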

It is probably worth mentioning that, when talking to one of the HP P4000 field engineers, he said that, throughout all the installations he's done, he never saw the iSCSI QLogic HBA being used (except as a chipset on modules in a blade system). He said that all his customers (big and small), when using iSCSI over copper, are using Ethernet NICs.

 

I think this is a good "overview" of the problems we had with the HBA.
In conclusion, I'd reiterate that, in hindsight, the CPU gain from the HBA was not worth the effort, work, trouble and revenue loss that it introduced (what we gained in CPU, we lost in disk access anyway because of the MPIO round robin problem).

 

A last word: please understand that the above is our personal experience with the product, and I'm sure it may be working fine in other, similar environments.
I'm just giving out the information, alongside a big warning to do your research and testing (in a lab if possible) before investing the time and money in this combination (QLogic, P4000, MPIO).

 

Hope this helps,

Stephane

 

David_Tocker
Regular Advisor

Re: P4000 VSS Provider with QLogic HBA and DPM 2010 server

Sounds like a fair call to be honest.

 

We are running DPM 2012 - no hardware HBAs (to be honest our quad core hosts hardly ever tap out the CPU, we only run about 6 customer servers per CPU).

 

Still get constant errors in the fail over cluster manager about direct connectivity being lost to the LUN. (error 56

 

I suspect that the various Microsoft teams don't actually talk to each other (let alone to the HP P4000 team).

 

It really makes you wonder when all these great features such as 'hardware snapshots' don't work or throw errors.

 

I am of the opinion that HP P4000 is heavily geared towards VMware...

 

Seems somewhat at odds with their marketing...

 

Regards.


David Tocker.
