
HP P2000 G3 - Large file copy degrades performance/kills SAN

 
TangoTech
Visitor


Hi All,

 

Our current setup consists of an HP P2000 G3 SAN connected to an HP C7000 enclosure with 5 blades via 8 Gb Fibre Channel.

 

1 - BL460c G7 running ESXi 5.1 (Build 1065491)

2 - BL490c G7 running ESXi 5.1 (Build 1065491)

3 - BL490c G7 running ESXi 5.1 (Build 1065491)

4 - BL490c G7 running ESXi 5.1 (Build 1065491)

5 - BL460c G7 running Server 2008 R2 (Bare metal, boot from SAN)

 

There are 3 enclosures in the SAN:

 

Enclosure 1 (vDisk1) - 12 x 600 GB 10K SAS in RAID 6

Enclosure 2 (vDisk2) - 12 x 2 TB 7.2K SATA in RAID 6

Enclosure 3 (vDisk3) - 12 x 2 TB 7.2K SATA in RAID 6
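
For a rough sense of what each vdisk can sustain, the sketch below estimates usable capacity and small random write IOPS for the three RAID 6 sets listed above. The per-disk IOPS figures (130 for 10K SAS, 75 for 7.2K SATA) are rule-of-thumb assumptions, not measurements from this array. The write penalty of 6 applies to small random writes; large sequential writes that fill whole stripes cost less, and controller cache changes the picture further.

```python
from dataclasses import dataclass

RAID6_PARITY_DISKS = 2   # RAID 6 keeps two parity blocks per stripe
RAID6_WRITE_PENALTY = 6  # a small random write costs about 6 back-end I/Os


@dataclass
class VDisk:
    name: str
    disks: int
    disk_size_gb: int
    iops_per_disk: int  # ASSUMPTION: rule-of-thumb value, not measured

    def usable_gb(self) -> int:
        """Capacity left after subtracting the two parity disks."""
        return (self.disks - RAID6_PARITY_DISKS) * self.disk_size_gb

    def random_write_iops(self) -> float:
        """Rough ceiling for small random writes, ignoring cache."""
        return self.disks * self.iops_per_disk / RAID6_WRITE_PENALTY


vdisks = [
    VDisk("vDisk1", 12, 600, 130),   # 10K SAS, assumed 130 IOPS per disk
    VDisk("vDisk2", 12, 2000, 75),   # 7.2K SATA, assumed 75 IOPS per disk
    VDisk("vDisk3", 12, 2000, 75),
]

for v in vdisks:
    print(f"{v.name}: {v.usable_gb()} GB usable, "
          f"~{v.random_write_iops():.0f} random write IOPS")
```

Under these assumptions the SATA vdisks have only a small random write budget to share among all the VMs placed on them, which is one way a single heavy writer could starve its neighbours.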

 

We currently have around 60 VMs running on the 4 ESXi blades, all with their datastores on the SAN. They are spread across the 3 vdisks.

 

Each VM has its own LUN, so we have just over 65 LUNs.

 

Here is our problem. When copying a large amount of data from one VM to another (VM1 running Windows copies files to VM2 running Windows), the copy starts off well. However, after 10-15 minutes the copy starts to slow.
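
To pin down when the slowdown begins, it can help to log the cumulative bytes copied at regular intervals and look for the first interval that falls well below the initial rate. The sketch below is one way to do that. The function names, the 50% drop threshold and the three-interval baseline are all hypothetical choices, and the sample data is synthetic, not taken from this SAN.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, int]  # (elapsed seconds, cumulative bytes copied)


def interval_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (interval end time, MB/s) for each consecutive pair of samples."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip out-of-order or duplicate timestamps
        rates.append((t1, (b1 - b0) / (t1 - t0) / 1_000_000))
    return rates


def degradation_onset(samples: List[Sample],
                      baseline_intervals: int = 3,
                      drop_fraction: float = 0.5) -> Optional[float]:
    """Return the end time of the first interval slower than
    drop_fraction times the average of the first baseline_intervals."""
    rates = interval_rates(samples)
    if len(rates) <= baseline_intervals:
        return None
    baseline = sum(r for _, r in rates[:baseline_intervals]) / baseline_intervals
    for t, r in rates[baseline_intervals:]:
        if r < baseline * drop_fraction:
            return t
    return None


# Synthetic example: 100 MB/s for 10 minutes, then 20 MB/s.
samples: List[Sample] = [(0.0, 0)]
total = 0
for minute in range(1, 16):
    total += (100 if minute <= 10 else 20) * 1_000_000 * 60
    samples.append((minute * 60.0, total))

print(degradation_onset(samples))
```

Comparing the onset time with array and host statistics from the same moment may show whether the drop lines up with something like cache filling or rising latency.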

 

At this point all other VMs on the same enclosures as the two VMs involved start to go unresponsive. For example:

 

VM1 is on enclosure 1 (vDisk1)

VM2 is on enclosure 2 (vDisk2)

VM3 is on enclosure 3 (vDisk3)

 

If we copy files from VM1 to VM2 and wait for the performance to drop, the other VMs on vDisk1 and vDisk2 will go unresponsive. However, VM3 on vDisk3 will continue to work.

 

It appears that the large file copy is overloading the SAN. However, I would not expect something as simple as a file copy to overload it in the first place, and if it does, I would expect the SAN and/or VMware to compensate by slowing the I/O down.

 

If anyone has any suggestions or questions please let me know.

 

Thanks,
Tango

 

 

4 REPLIES 4
Rajiv
HPE Pro

Re: HP P2000 G3 - Large file copy degrades performance/kills SAN

Hi Tango,

 

You mention that the copy starts well from VM1 to VM2 and then drops off.

 

Do these VMs go into an unresponsive state if the copy is left to complete?

 

I'm aware of a memory consumption issue in Windows 2008 with Hyper-V exports, and this looks like a similar case.

 

http://support.microsoft.com/kb/2547551

 

It would help if we knew how the hosts and the array are connected.

 

If they are connected via an 8 Gbps switch, then it is recommended to set the fillword to mode 3 as a starting step.

 

Do the array logs or switch logs indicate any errors?

 

You may also want to refer to the best practices document for the P2000 G3 with vSphere/ESX.

 

http://h20195.www2.hp.com/v2/GetPDF.aspx%2F4AA3-3801ENW.pdf

 

Although the above document is for vSphere 4.1, most of the content may still be applicable to 5.x.

 

Also, I assume there are other hosts sharing this array and SAN. Do they work okay, or do they go unresponsive as well?

 

Thanks & Regards,

Rajiv

 

I work for HPE


tcsmalad
Visitor

Re: HP P2000 G3 - Large file copy degrades performance/kills SAN

Hi Tango,

 

From what I understand, the copy we are doing involves: VM1's vdisk -> SAN -> HBA -> VM1's memory -> network -> VM2's memory -> VM2's HBA -> SAN -> VM2's vdisk.

 

Based on what you described, the network could well be the problem. The unresponsiveness of the VMs could be due to network congestion.

 

We may also need to check whether the ESX hosts are running short of available memory, or whether the VMs themselves are. This is the second possibility. I don't suspect the SAN or storage here.

 

However, to isolate this, we may do the following:

 

  • If possible, present two LUNs to a single VM and copy the files within that VM. This eliminates the network. The VM itself could still be a bottleneck, but at least we can isolate the network.
  • If it is not possible to assign a second vdisk, copy the files to a different directory on the same vdisk.

If we are able to get anywhere between 80 MB/s and 150 MB/s for a single instance of file copy, the SAN or storage is doing fine. Despite having 8 Gbps SAN connectivity, you may only get this much speed; this is usually a limitation of the copying application and not of the SAN.
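
One way to get a comparable number for the in-VM test is to time a single copy and convert it to MB/s. The following is a minimal sketch, not a proper benchmark: `timed_copy` and `classify` are hypothetical helper names, the 4 MiB buffer is an arbitrary choice, and the 80 and 150 MB/s bounds are simply the range quoted above.

```python
import os
import shutil
import time


def timed_copy(src: str, dst: str) -> float:
    """Copy src to dst and return the throughput in MB/s (1 MB = 10**6 bytes)."""
    size = os.path.getsize(src)
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 4 * 1024 * 1024)  # 4 MiB buffer
        fout.flush()
        os.fsync(fout.fileno())  # count the time needed to reach the disk
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return size / elapsed / 1_000_000


def classify(mb_per_s: float, low: float = 80.0, high: float = 150.0) -> str:
    """Compare a measured rate with the range quoted in the post."""
    if mb_per_s < low:
        return "below range"
    if mb_per_s > high:
        return "above range"
    return "within range"


# Example usage (paths are placeholders):
# rate = timed_copy(r"E:\test\big.bin", r"F:\test\big.bin")
# print(f"{rate:.1f} MB/s, {classify(rate)}")
```

The source file should be much larger than the VM's RAM, otherwise reads served from the operating system's file cache can inflate the result.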

 

Regards,

Venkat

TangoTech
Visitor

Re: HP P2000 G3 - Large file copy degrades performance/kills SAN

Thanks Rajiv and tcsmalad.

 

I'll review the HP doc and make sure we are following best practices.

 

To answer some of your questions:

 

The SAN is connected directly to the blades with 8 Gb Fibre Channel using Brocade switches. (HP B-series 8/24c SAN Switch Blade System c-Class PN: 489865-002)

 

The ESXi hosts/VMs use 1 Gb Ethernet switching in the blade enclosure to talk out to the network. (HP GbE2c Layer 2/3 Ethernet Blade Switch PN: 438030-B21)
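
As a quick sanity check on where the ceiling for a VM-to-VM copy might sit, the sketch below converts the two link speeds mentioned here to MB/s and picks the slowest hop on the copy path. The rates are nominal figures assumed for illustration (800 MB/s per direction for 8 Gb Fibre Channel, 125 MB/s for 1 Gb Ethernet before protocol overhead); real throughput will be lower.

```python
from typing import List, Tuple

# Nominal rates in MB/s. ASSUMPTION: illustrative values, not measurements.
LINK_MB_PER_S = {
    "8 Gb Fibre Channel": 800.0,  # nominal 8GFC data rate per direction
    "1 Gb Ethernet": 125.0,       # 10**9 bits/s divided by 8, before overhead
}


def bottleneck(path: List[str]) -> Tuple[str, float]:
    """Return the slowest hop on the path and its nominal rate in MB/s."""
    hop = min(path, key=lambda name: LINK_MB_PER_S[name])
    return hop, LINK_MB_PER_S[hop]


# Read from the SAN, cross the network between the VMs, write to the SAN.
copy_path = ["8 Gb Fibre Channel", "1 Gb Ethernet", "8 Gb Fibre Channel"]
hop, rate = bottleneck(copy_path)
print(f"Slowest hop: {hop} at about {rate:.0f} MB/s")
```

If the two VMs sit on different hosts, the 1 Gb Ethernet hop would cap a single copy at roughly 125 MB/s, which falls inside the 80 to 150 MB/s range mentioned earlier in the thread. If both VMs share a host and virtual switch, the traffic may never reach the physical switch, so this ceiling would not apply.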

 

There are currently only the 5 blades listed above using this array. No other servers are connected to it.

 

If the file copy is left running, the VM doing the copying slows down a little but does not go unresponsive. Once the copy is cancelled or left to complete, the other unresponsive VMs will often start to work again. (Not always, though; sometimes they require a reboot.)

 

I have had HP tech support open a ticket and review the logs; they did not find any problems. They suggested updating the firmware, which I have since done.

 

I have also since enabled Storage I/O Control for all datastores.

 

In response to tcsmalad's suggestion, I will add another LUN to one of the VMs and see if we have the same problem with the network taken out of the path.

 

Thanks

Tango

 

TangoTech
Visitor

Re: HP P2000 G3 - Large file copy degrades performance/kills SAN

Hi All,

 

Sorry for the delay.

 

Since enabling Storage I/O Control in VMware, the problem seems to have gone away or become less obvious. However, my concern is that the problem is still there and Storage I/O Control is merely preventing or masking it.

 

Further troubleshooting I have done:

 

Added an additional LUN to a VM and tested copying. While the copy now completes at a much better speed (it doesn't slow down after 10-15 minutes), the other VMs do seem to slow down while the copy is happening. I suspect that Storage I/O Control is kicking in here and limiting the copy so the VMs don't go unresponsive.

 

Any other suggestions you can offer would be appreciated.

 

Thanks.