Disk Enclosures

Houghtpj
Advisor

disk latencies EVA 8000 VMware

Hi,

We recently changed the layout of our LUNs / VMFS. We are running our VMware environment on one disk group comprising 112 FATA disks. Our initial design was to have a 500 GB VMFS for guest OS vmdks, a 250 GB VMFS for guest pagefiles, and a 500 GB VMFS for data vmdks. We had 8 lots of these sets of LUNs and had approx. 25 VMs in each of the OS LUNs.

We recently changed this design to go with larger 1 TB LUNs and lump all of a VM's vmdks into these shared VMFS volumes. We now have 8 of these with a max of 22 VMs per LUN.

I'm now seeing higher read latencies of 40+ ms on these VMFS volumes and wanted to gain a better understanding of why. As far as I'm concerned, the amount and type of I/O hitting the disk group and the number of disks backing the disk group are the same, even though the LUN layout is different. I'm not seeing any queues at the HBA level or vmkernel level; the latencies do seem to be coming from the disks. One thing we haven't done is load balance the paths to the EVA, but we hadn't done this before we changed the design either and had no latency issues then. I can't monitor the host port stats with EVAperf due to an issue with XCS 6110 and Command View 8 not reporting host ports correctly, so I have no way of seeing if we are overloading the first controller port, but again this would have been the case before we changed things.
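In case it helps, this is roughly how I'm summarising per-device read latency from an esxtop batch export (esxtop -b). It's only a sketch; the column labels matched below are assumptions and would need checking against the actual CSV header for our ESX version.

import csv
from collections import defaultdict

totals = defaultdict(float)
samples = defaultdict(int)

with open("esxtop_batch.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # Keep only columns that look like per-device read latency counters.
    latency_cols = [i for i, name in enumerate(header)
                    if "MilliSec/Read" in name and "SCSI Device" in name]
    for row in reader:
        for i in latency_cols:
            try:
                totals[header[i]] += float(row[i])
                samples[header[i]] += 1
            except (ValueError, IndexError):
                continue

# Worst devices first, by average read latency over the capture.
for col in sorted(totals, key=lambda c: totals[c] / samples[c], reverse=True):
    print(f"{totals[col] / samples[col]:6.1f} ms  {col}")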

The only way I can explain this is that we have now totally randomised the I/O at a LUN level, whereas before we had some kind of I/O similarity between LUNs. I know the FATA disks may be a problem, but we had no issues using these disks before we changed the layout. All LUNs are now vRAID 5, whereas before there was a mixture of vRAID 1 and 5, but as I mentioned the latencies seem to be reads, so the vRAID level should not matter as much.

We are probably going to find a place in between both designs, with some VMs with split vmdks and some on the larger shared LUNs, but I was hoping someone could offer an explanation.
7 REPLIES
Víctor Cespón
Honored Contributor

Re: disk latencies EVA 8000 VMware

FATA disks + VMware + vRAID 5 + many vmdks on one LUN = bad idea.

The FATA disks are NOT designed and NOT recommended for continuous random I/O.

You need to capture EVAperf data to see all the numbers.

XCS 6110 is inactive right now and has sequential read performance problems (although probably no influence here).
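As a rough sanity check of how close you may be to what the spindles can sustain (the figures below are only assumptions for illustration; substitute measured numbers from EVAperf):

disks = 112
iops_per_fata_disk = 75          # assumed sustained random IOPS per FATA spindle
read_fraction = 0.6              # assumed read/write mix
raid5_write_penalty = 4          # roughly 4 back-end I/Os per small random write

group_capacity = disks * iops_per_fata_disk

def backend_iops(host_iops):
    # Host IOPS translated into back-end disk IOPS for a vRAID 5 workload.
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    return reads + writes * raid5_write_penalty

for host_iops in (3000, 5000, 8000):
    load = backend_iops(host_iops)
    print(f"{host_iops} host IOPS -> {load:.0f} back-end IOPS "
          f"({load / group_capacity:.0%} of ~{group_capacity} available)")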
Houghtpj
Advisor

Re: disk latencies EVA 8000 VMware

Hi

Thanks for the reply.

As I mentioned, I'm aware that the FATA disks may be the problem, but as I also mentioned, we have used these disks in exactly the same layout in both of our VMFS / LUN designs with no performance problems. If the FATA disks are now the bottleneck, why weren't they previously?

When we had the LUNs set out with guest vmdks split over separate VMFS volumes, latencies were OK. Also, we now have fewer VMs per LUN (more vmdks though), so SCSI reservations shouldn't be a problem, and I'm not seeing any more SCSI reservation messages in the vmkernel log than seen previously.

Latencies are mainly read latencies, so why would the RAID level matter? Could the writes to the vRAID 5 be affecting the read latencies? If so, I thought the EVA handled RAID 5 writes pretty well!?

So what I really want to understand is why changing our VMFS layout has increased latencies even though we are using the same number and same type of disks and have the same number of VMs doing the same things I/O-wise.
Houghtpj
Advisor

Re: disk latencies EVA 8000 VMware

Sorry, I slightly misread your reply; you mentioned many vmdks per LUN (I thought you said many VMs per LUN).

If this is our issue, why is that so?

What I'm trying to understand is, if we have caused this problem by increasing the I/O density per LUN, why is that the case if all vDisks are striped over all disks in the disk group? We still have the same overall number of vmdks as previously. Surely it doesn't matter how your vDisks are sized and what they contain, as the EVA virtualises them anyway, so there shouldn't be hotspots as such? Am I missing something fundamental here?
Víctor Cespón
Honored Contributor

Re: disk latencies EVA 8000 VMware

Yes, from the hardware point of view the I/Os are spread over all disks in the disk group anyway, but there are other considerations; many operating systems have limits on the number of I/Os pending per LUN.
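As an illustration of why that limit matters (the queue depth and service time below are only assumptions; check your actual HBA/LUN settings): with fewer LUNs there is a lower cap on how many I/Os can be outstanding against the array at once, and by Little's Law that caps the throughput you can push before requests start queuing above the LUN.

queue_depth_per_lun = 32         # assumed per-LUN queue depth (common default)
service_time_ms = 15.0           # assumed per-I/O service time on busy FATA disks

def max_iops(num_luns):
    # Little's Law: throughput ~= outstanding I/Os / service time.
    outstanding = num_luns * queue_depth_per_lun
    return outstanding / (service_time_ms / 1000.0)

for layout, luns in (("old layout (8 sets x 3 LUNs)", 24),
                     ("new layout (8 shared LUNs)", 8)):
    print(f"{layout}: {luns * queue_depth_per_lun} outstanding I/Os max, "
          f"~{max_iops(luns):,.0f} IOPS before queuing above the LUN")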

You need to capture data with EVAperf and get it analyzed to get an answer based on facts.
eddiel_1
New Member

Re: disk latencies EVA 8000 VMware

If I read your post correctly, are you saying you consolidated your system, page file, and data disk VMFS volumes into a single VMFS volume?
Houghtpj
Advisor

Re: disk latencies EVA 8000 VMware

Sorry for the late reply. Yes, that's correct; all of a VM's vmdks are in one shared VMFS.
eddiel_1
New Member

Re: disk latencies EVA 8000 VMware

More than likely you are seeing a combination of the following:

1) Fewer volume resources, not balanced across controllers as effectively as before (through the sheer number of volumes in the previous config, I/O was bound to be better balanced).

2) SCSI reservation issues, with individual VMs vying for shared volume resources.

The earlier design is actually a better solution, isolating OS and data I/O from contention. I would recommend the following read for information on SCSI locks with clustered volume resources on ESX:

http://www.vmware.com/files/pdf/scalable_storage_performance.pdf
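On point 1, a minimal sketch of the balancing idea (the IOPS figures are made up purely for illustration): with only 8 large LUNs of uneven load, even an alternating assignment of preferred ownership can leave one controller much hotter than the other, whereas with 24 smaller LUNs the load tends to even out on its own.

lun_iops = {"VMFS_01": 900, "VMFS_02": 400, "VMFS_03": 1200, "VMFS_04": 300,
            "VMFS_05": 800, "VMFS_06": 350, "VMFS_07": 1100, "VMFS_08": 450}

owners = {}
for i, lun in enumerate(lun_iops):          # naive alternating assignment
    owners[lun] = "Controller A" if i % 2 == 0 else "Controller B"

for ctrl in ("Controller A", "Controller B"):
    total = sum(iops for lun, iops in lun_iops.items() if owners[lun] == ctrl)
    print(f"{ctrl}: {total} IOPS")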