
EVA8400-22GB- Bad performance on CPUs

 
Norbert Moritz
New Member

EVA8400-22GB- Bad performance on CPUs

Hello,
we have a big problem with our brand-new EVA8400 system. It is the version with 22 GB cache, 27 enclosures and 260x 600 GB FC 15k disks (one disk group). Only VMware hosts with vSphere 4.0 are attached.

Every evening at 10 PM, when the backup jobs start, the EVA generates high write latencies and the ESX hosts get path failures. I have studied all available documents about best practice, best performance, etc., but the most telling thing I could see with EVAPerf is this: both CPUs went to 99% and then to 0% for a few minutes. At the same time the write latency of the disk group went to 250 ms on controller A and 350 ms on controller B. The result is the timeout on the ESX hosts.

Now my question: can anybody explain why the CPUs go to idle while they are handling a total of 16000 host requests at 300 MB/s?

Many thanks for some ideas,
Norbert
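P.S. For anyone who wants to pull the same counters: the data above came from an EVAPerf capture along these lines (subcommand and flag names quoted from memory, so treat them as approximate and check the EVAPerf help on your management server first):

evaperf all -cont 5 -dur 600 -csv > eva_backup_window.csv

(i.e. sample all object classes every 5 seconds for 10 minutes around the backup start, written as CSV for later analysis)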
9 REPLIES
Sivaramakrishna_1
Honored Contributor

Re: EVA8400-22GB- Bad performance on CPUs

Hi,

I hope you are talking about VCB... What is the host you are using for VCB? If it is Windows 2008, then this is a known issue: you need to fine-tune the FC HBA driver on the host. The backup is a serial I/O job, and with 2008 it induces larger I/O transfer sizes, which is what causes the issue.

The latest FC HBA driver documentation describes the I/O transfer size setting on the HBA (preferably 128k); with that in place you should be rid of this issue.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSeriesId=3659910&prodTypeId=12169&objectID=c02518189

Based on the HBA model, fine-tune that parameter so that you no longer hit the issue.
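As an illustration only (the advisory above is authoritative for your exact HBA and driver): on a Windows 2008 host with a QLogic Storport miniport, the transfer size cap is commonly applied through the MaximumSGList registry value, where each scatter/gather entry is one 4 KB page, so a value of 33 (32 data pages plus one reserved entry) corresponds to a 128 KB maximum transfer:

reg add "HKLM\SYSTEM\CurrentControlSet\Services\ql2300\Parameters\Device" /v MaximumSGList /t REG_DWORD /d 33

(Reboot the host afterwards so the miniport driver picks up the new value. The service name ql2300 is an example for QLogic 23xx adapters and depends on your driver.)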
Uwe Zessin
Honored Contributor

Re: EVA8400-22GB- Bad performance on CPUs

And for VMware ESX:

- c02697105 - HP StorageWorks Enterprise Virtual Arrays - VMware Hosts Reports Multitude of 'Task Set Full' Error in vmkernel Log

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02697105&printver=true
Norbert Moritz
New Member

Re: EVA8400-22GB- Bad performance on CPUs

Hi, no, we are not talking about VCB backup. It is only the "normal" backup (NetBackup) of roughly 300 virtual servers at the same time. Last week we set the transfer block size on all 16 ESX hosts to 128k. The path failures on the ESX hosts became less frequent, but we still see high write/read latency on the disks. Nobody (incl. HP) is able to explain why both CPUs go to idle at the moment of the highest I/O. With EVAPerf I also see write and read latencies of up to 30 ms on each of the 260 physical disks at that moment.
The EVA system is simply not able to handle and process this amount of data. FYI, we have 20 EVA systems worldwide, from EVA5000, 6000 and 8100 up to EVA8400, and no other machine has such performance problems. It's very strange; maybe it's a bug in the controller firmware (09534000)?
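For anyone who wants to replicate the 128k change: on ESX 4.x it is typically applied per host via the Disk.DiskMaxIOSize advanced setting (the value is in KB), e.g. from the service console:

esxcfg-advcfg -g /Disk/DiskMaxIOSize     (show the current maximum I/O size in KB; default 32767)
esxcfg-advcfg -s 128 /Disk/DiskMaxIOSize (split guest I/Os larger than 128 KB)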
Alzhy
Honored Contributor

Re: EVA8400-22GB- Bad performance on CPUs

We were in exactly the same boat a few months ago. EVA8400 - the whole shebang. But our EVA was used not only by an ESX farm but also by Linux and HP-UX physical servers.

How many vSphere physical servers are hooked up, and what kind of servers? Are they blade-based servers?

Also - how many FC connections do your Linux physicals have in total?

Maybe you are just saturating your EVA?

And lastly - what kind of virtual machines do you run in this virtual environment? Solely Windows, or a mix of Linux, Solaris, etc.?

Hakuna Matata.
Norbert Moritz
New Member

Re: EVA8400-22GB- Bad performance on CPUs

Hi, we have only 16 ESX hosts physically attached to this EVA, nothing else - DL585 G7 and BL680c hardware. vSphere 4.0 and 4.1 are the only O/S versions, and the virtual servers are 100% Windows systems. At the moment it looks like EVAPerf has a problem getting performance data from a CPU that is at 100%, so EVAPerf logs 0%. That makes sense, and the CPU is not actually idle at the highest I/O load. But the question remains why the CPU goes to 100% in the first place. So far we have no explanation for this behaviour.
Thanks, Norbert
Alzhy
Honored Contributor

Re: EVA8400-22GB- Bad performance on CPUs

Well -- 16 *could* be a lot, depending on how you use the EVA and how much concurrent activity comes out of your virtual infrastructure. Are all your VMs Windows, or a mix of Linux, Windows and other, more obscure OSes (i.e. Solaris, BSD, etc.)?

Also - are these physical ESX servers blades? If so -- how many fibre runs go from the blade enclosures to the SAN/EVA?

How many Vdisks are presented as datastores to your VI so far? And what size are those Vdisks?

Do you use RDMs?


Things to check and suggestions:

- Check multipathing of the Datastore LUNs
- Adjust Queue Depths as well
- Make sure your EVA Vdisks alternate between controllers

And finally, identify the most heavily taxed Vdisks via EVAperf -- something should come out of there to guide further tuning and adjustments; see the sketch below.
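A rough sketch of the first two checks from an ESX 4.x service console (QLogic HBA shown; Emulex uses the lpfc module with its own parameter name, and the right queue depth depends on how many hosts share the array ports):

esxcfg-mpath -l                             (list all paths per LUN and their state)
esxcli nmp device list                      (show the path selection policy per device)
esxcfg-module -s ql2xmaxqdepth=32 qla2xxx   (lower the QLogic HBA queue depth; takes effect after a reboot)

Controller alternation is easiest to verify in Command View EVA: check the managing controller of each Vdisk and rebalance if they all sit on one side.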


Hakuna Matata.
Uwe Zessin
Honored Contributor

Re: EVA8400-22GB- Bad performance on CPUs

> That makes sense

Excuse me, but that does NOT make sense!! If EVAperf cannot retrieve the data, it must say so and not give wrong values!!

Competent database systems, for example, offer the NULL value as a means to indicate unknown / missing data.

And no, I don't mean to "shoot the messenger" ;-)
Norbert Moritz
New Member

Re: EVA8400-22GB- Bad performance on CPUs

Hi Uwe, yes, you are right. Don't misunderstand my "makes sense" - I meant it with regard to analyzing our performance problems. It was not clear why EVAPerf says the controller goes idle at the highest I/O; that is impossible. So I think EVAPerf is not able to output the value 100 - maybe only 2 digits are possible in the CPU data field. :-(
Patti T
Advisor

Re: EVA8400-22GB- Bad performance on CPUs

Are you still having issues with your EVA8400? I also have several EVA6400s and EVA8400s that are giving me problems I do not see with my 6100s or 8100s.

We have set the transfer size down to 128KB, which helps, but we still get high latencies at times, and the CPU goes close to 100% on the controllers.

We also had issues with multiple snapshots on a LUN - that almost killed us, and we had to switch to snapclones to get better performance.

I think they changed something in the cache algorithms...