
Nelson Jeppesen
Occasional Contributor

Match LUN to CPU usage

We're having intermittent high CPU usage on the EVA controllers and LUN write latency issues. We have an EVA 8400 with 190 drives (400 GB, 15k RPM) in two disk groups. All the firmware/software is up to date (HBAs, HDDs, controllers, enclosures, MPIO, etc.). One downside of our SAN is that all the LUNs are vRAID5. HP support said that would not cause any issues, but they were vague about when it might.

Typically the SAN runs at about 2k-4k IOPS with 15%-35% CPU usage on the controllers (both about the same).

Many times an hour, controller CPU usage rises along with write latency, but IOPS never go much above 4k. The problem is that I can't track down which LUN is the cause, and support was never any help. This is very frustrating. I've seen write latencies go up to one second, and when issues occur I can't figure out which server is responsible.

I've looked at queue depth, cluster size, read/write latencies and IOPS.
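One way to match a LUN to the CPU spikes is to line up per-LUN statistics with controller CPU samples taken at the same interval and see which LUN moves with the CPU. The sketch below is a minimal, hypothetical example, assuming you have already exported controller CPU % and per-LUN write IOPS (or write latency) as equal-length series, e.g. from EVAperf or host-side counters; the LUN names and numbers are made up. It ranks LUNs by Pearson correlation with the CPU series. A high correlation only points at candidates, it doesn't prove cause.

```python
# Minimal sketch: rank LUNs by how closely a per-LUN metric tracks controller CPU.
# Assumes both series were sampled at the same interval; data below is hypothetical.
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_luns(cpu: list[float], lun_series: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (LUN, correlation) pairs, most CPU-correlated first."""
    scores = [(lun, pearson(cpu, series)) for lun, series in lun_series.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    controller_cpu = [20, 22, 60, 25, 21, 70, 23]            # % CPU per interval (hypothetical)
    write_iops = {
        "LUN_SQL_LOG": [50, 60, 900, 70, 55, 1100, 65],      # bursts line up with CPU spikes
        "LUN_VMFS_01": [300, 290, 300, 310, 300, 300, 300],  # steady load
    }
    for lun, r in rank_luns(controller_cpu, write_iops):
        print(f"{lun}: r = {r:.2f}")
```

In practice you would feed it one series per LUN from the same time window in which the CPU spikes occurred.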
chris huys_4
Honored Contributor

Re: Match LUN to CPU usage

Hi Nelson,

Going on a tangent here, because I'm not a storage engineer at all. ;)

What hosts are connected to the system? (Linux, Windows, HP-UX, VMware?)
What is the FC HBA speed on the hosts? I suppose the front-end FC ports of the 8400 run at 4 Gb/s?
Can you also see the latencies on the hosts? On all hosts, or only on certain ones?
Do you have output equivalent to HP-UX "sar -d 1 100" (i.e. disk statistics every second for 100 seconds) from one of the hosts, captured during the problem? If so, can you attach it?
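If you capture that output, a quick way to spot the worst devices is to pull out the per-second service times. A minimal sketch, assuming the HP-UX sar -d column layout ([time] device %busy avque r+w/s blks/s avwait avserv, with avwait and avserv in milliseconds). Linux sar and Windows perfmon use different columns, and the sample data and the 20 ms threshold are hypothetical.

```python
# Minimal sketch: find the worst per-device service time in HP-UX-style "sar -d" output.
# Assumed columns: [time] device %busy avque r+w/s blks/s avwait avserv (ms).
from collections import defaultdict

SAMPLE = """\
10:00:01   device   %busy   avque   r+w/s  blks/s  avwait  avserv
10:00:02   c1t0d0    12.00    0.50      20     320    0.00    6.10
           c2t0d0    95.00   12.00     400    6400   35.00  250.00
10:00:03   c1t0d0    10.00    0.50      18     288    0.00    5.90
           c2t0d0    90.00   10.00     380    6080   30.00  180.00
Average    c1t0d0    11.00    0.50      19     304    0.00    6.00
"""


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def worst_service_times(sar_text: str) -> dict[str, float]:
    """Return the worst avserv (ms) seen per device in the per-interval samples."""
    worst: dict[str, float] = defaultdict(float)
    for line in sar_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[0] == "Average":
            continue  # blank, banner or summary line
        device, *numbers = fields[-7:]
        if not all(_is_number(n) for n in numbers):
            continue  # column header line
        worst[device] = max(worst[device], float(numbers[-1]))
    return dict(worst)


def slow_devices(sar_text: str, threshold_ms: float = 20.0) -> list[str]:
    """Devices whose worst avserv exceeded a (hypothetical) threshold."""
    return sorted(d for d, ms in worst_service_times(sar_text).items() if ms > threshold_ms)


if __name__ == "__main__":
    print(worst_service_times(SAMPLE))
    print(slow_devices(SAMPLE))
```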

Greetz,
Chris
Nelson Jeppesen
Occasional Contributor

Re: Match LUN to CPU usage

We're running 4 Gb FC everywhere, but throughput never goes much above 200 MB/s and is usually in the 10-40 MB/s range.
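For scale: a 4 Gb FC link carries roughly 400 MB/s in each direction, so 10-40 MB/s leaves the links mostly idle. Dividing throughput by IOPS gives the average IO size. A rough sketch using the ranges quoted in this thread; which throughput figure goes with which IOPS figure is an assumption.

```python
# Rough check: average IO size = throughput / IOPS.
# Pairings of throughput and IOPS below are illustrative, taken from the ranges in the thread.

def avg_io_size_kb(throughput_mb_s: float, iops: float) -> float:
    """Average IO size in KB (1 MB = 1024 KB) for a given throughput and IO rate."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return throughput_mb_s * 1024 / iops


if __name__ == "__main__":
    for mb_s, iops in [(10, 4000), (40, 4000), (40, 2000)]:
        print(f"{mb_s} MB/s at {iops} IOPS -> {avg_io_size_kb(mb_s, iops):.1f} KB per IO")
```

With 10-40 MB/s at 2k-4k IOPS, the average IO is roughly 2.5-20 KB, so the load is dominated by small IOs rather than bandwidth.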

10 Windows Server 2003 servers
10 Windows Server 2008 servers
12 VMware hosts

Sometimes the write latencies can be seen on all hosts when it gets bad, but usually it's limited to a few SQL servers and a few VMware hosts.
chris huys_4
Honored Contributor

Re: Match LUN to CPU usage

Hi,

> Sometimes the write latencies can be seen on
> all hosts if it gets bad but usually its
> limited to a few SQL servers and a few VMWare
> hosts
SQL Server has the problem of issuing write bursts every time a checkpoint is called (a checkpoint flushes dirty data pages from memory to disk).

If the write bursts are bad, that means a few thousand IOs, each with a small IO size.

With SQLIO you can check how many small (random/sequential) IOs an EVA can handle, depending on the EVA disk configuration.
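SQLIO gives the real number, but a back-of-envelope estimate shows why small random writes on vRAID5 are expensive: each small host write costs about four back-end IOs (read data, read parity, write data, write parity) versus two on a mirror. A rough sketch assuming ~180 random IOPS per 15k spindle (a common rule of thumb, not an HP figure), all 190 drives counted together even though they are split across two disk groups, and no benefit from the write cache.

```python
# Back-of-envelope front-end IOPS estimate, ignoring cache.
# Assumptions (rules of thumb, not EVA specifications):
#   - roughly 180 random IOPS per 15k RPM spindle
#   - small random write penalty: 4 back-end IOs for RAID 5, 2 for RAID 1

WRITE_PENALTY = {"vraid5": 4, "vraid1": 2}


def frontend_iops(spindles: int, iops_per_spindle: float, write_fraction: float, raid: str) -> float:
    """Host IOPS the spindles can sustain for a given read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    backend = spindles * iops_per_spindle
    read_fraction = 1.0 - write_fraction
    # Each host read costs 1 back-end IO; each host write costs WRITE_PENALTY[raid] IOs.
    return backend / (read_fraction + write_fraction * WRITE_PENALTY[raid])


if __name__ == "__main__":
    for raid in ("vraid5", "vraid1"):
        print(raid, round(frontend_iops(190, 180, 1.0, raid)))
```

With pure small random writes this gives roughly 8,550 IOPS for vRAID5 versus 17,100 for vRAID1; real results depend on the cache, the disk group layout and the IO pattern.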

But I suppose if the SQL server exceeded that, it would bring the whole EVA to its knees.

A few articles about SQL Server storage performance:

http://sqlblog.com/blogs/joe_chang/archive/2008/03/04/storage-performance-for-sql-server.aspx

http://www.simple-talk.com/sql/performance/high-performance-storage-systems-for-sql-server/

http://webcache.googleusercontent.com/search?q=cache:a7gmdOfX75kJ:download.microsoft.com/download/B/E/1/BE1AABB3-6ED8-4C3C-AF91-448AB733B1AF/Analyzing%2520Characterizing%2520and%2520IO%2520Size%2520Considerations.docx+sql+log+write+bursts&cd=6&hl=nl&ct=clnk&gl=be

As mentioned in the last article, I would tune the SCSI queue depth of the SQL servers' FC HBAs down to the minimum, i.e. 2. Then at least the IOs will queue up on the host instead of on the EVA, which means the other, non-SQL hosts will still be able to send IOs and won't be impacted as much as they are now.
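The reasoning behind lowering the queue depth can be illustrated with Little's Law (L = λW): if the array completes IOs at rate λ and L IOs are already queued ahead of you, you wait roughly W = L/λ. A minimal sketch, assuming the array behaves like one shared FIFO queue (a real EVA controller does not) and that the HBA queue depth applies per LUN (depending on the driver it can be per target, and multipathing can multiply it). The 4 LUNs are hypothetical; 4000 IOPS is the rate mentioned earlier in the thread.

```python
# Little's Law sketch: L = lambda * W, so W = L / lambda.
# Simplifying assumption: one shared FIFO queue at the array draining at a fixed rate.

def queueing_delay_ms(outstanding_ios: int, array_iops: float) -> float:
    """Approximate wait for an IO that arrives behind `outstanding_ios` others."""
    if array_iops <= 0:
        raise ValueError("array_iops must be positive")
    return outstanding_ios / array_iops * 1000.0


def host_outstanding(queue_depth: int, luns: int) -> int:
    """Upper bound on IOs one host can have in flight, assuming a per-LUN queue depth."""
    return queue_depth * luns


if __name__ == "__main__":
    # Hypothetical SQL host with 4 LUNs; array draining 4000 IOPS.
    for qd in (32, 2):
        in_flight = host_outstanding(qd, luns=4)
        print(f"queue depth {qd}: {in_flight} IOs in flight, "
              f"~{queueing_delay_ms(in_flight, 4000):.0f} ms added for other hosts")
```

Under these assumptions, a queue depth of 32 lets the SQL host park 128 IOs in front of everyone else (about 32 ms of added wait), while a queue depth of 2 caps it at 8 IOs (about 2 ms).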

Beyond that, maybe disable write caching for the transaction log LUNs so that the write cache doesn't get monopolized by transaction log IO requests, but I'm not sure whether that's a good idea.

Greetz,
Chris