How to detect IO bottleneck?

Guillou_2 · ‎10-25-2004

Hi,

I have an AlphaServer 1000 4/200 (VMS 6.2) with 64 MB of memory, 6 RZ28 disks mounted in 3 shadowsets, and 1 TZ86 tape drive.

I extracted statistics with ECP. The CPU seems to be OK and the server needs more memory (the free page list, FPL, is often under freegoal*2 and sometimes near freegoal), but I am not able to say whether all is OK with the disks.

What I can see is spikes of IO, in particular during the backup of each shadowset, but I don't know if that is normal:

DSA0 (system) 75 IO/s
DSA1 (data) 100 IO/s
DSA2 (data) 80 IO/s

Is it acceptable for RZ28?

I hope that someone can give me some rules for detecting an IO bottleneck.

Thanks in advance for your help

Steph

Hein van den Heuvel · ‎10-25-2004

>> What I can see is spikes of IO in particular during the backup of each shadowset

Backup will push the IO to the max. This is normally perfectly acceptable and unavoidable. The data on those disks has to be read. You cannot speed that up.

>> DSA0 (system) 75 IO/s
>> DSA1 (data) 100 IO/s
>> DSA2 (data) 80 IO/s

Is that during backup? If so, fine.
Is that peak production? If so, OK.
Is that the average over a long production window? Hmmm, then you may have an IO bottleneck. Then you would need to reduce the IO some (RDB or RMS buffering?) or add disks to offload and spread the load (simple disks for temp/sort/work files?).

You would need to know the read/write ratio, or the underlying DKAxxx, DKBxxx IO rates, to see if it is really high. If it is 100% read, then it is OK: high but readily sustainable, as 'the other disk' can help if the first one is busy.
However, if it is mostly write, and each disk has to run at those rates, then you are in bottleneck range.
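
To make the read/write point concrete, here is a rough Python sketch of how a shadowset (DSAn) rate might translate into a per-member (DKxnnn) rate. It assumes reads are spread evenly over the members while every write goes to all members; the function name and the even-split assumption are illustrative, not something measured on this system.

```python
def per_member_io_rate(virtual_unit_rate, read_fraction, members=2):
    """Estimate the IO/s each shadowset member sees.

    Assumption (illustrative): reads are shared evenly between members,
    while each write must be performed on every member.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    reads = virtual_unit_rate * read_fraction
    writes = virtual_unit_rate - reads
    return reads / members + writes


# DSA1 at 100 IO/s on a two-member shadowset:
print(per_member_io_rate(100, 1.0))  # all reads
print(per_member_io_rate(100, 0.5))  # half reads, half writes
print(per_member_io_rate(100, 0.0))  # all writes
```

With these assumptions an all-read load halves the rate each disk sees, while an all-write load leaves every member running at the full shadowset rate.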

Very roughly, for the RZ28: max = 100 IO/sec; busy, but OK: 50 IO/sec; easy going: less than 20 IO/sec.
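
As a small sketch, these rules of thumb could be turned into a classifier for a per-disk rate. The band edges come from the figures above; the label for the 20 to 50 IO/sec gap ("moderate") and the function name are my own assumptions, since the post does not name that range.

```python
def classify_rz28_rate(io_per_sec):
    """Map a per-disk IO rate to the rough RZ28 bands quoted above."""
    if io_per_sec >= 100:
        return "at or beyond max"
    if io_per_sec >= 50:
        return "busy but OK"
    if io_per_sec < 20:
        return "easy going"
    # 20..50 IO/sec is not named in the post; label is an assumption.
    return "moderate"


for rate in (15, 35, 75, 100):
    print(rate, classify_rz28_rate(rate))
```

Note that the bands apply to a physical RZ28, so a DSAn figure would first need to be converted to a per-member rate.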

hth,
Hein.

labadie_1 · ‎10-26-2004

$ monitor disk/item=queue

When this shows more than 1, you are asking for more than the disk can do.

The queue length is a good indicator.

75 I/Os per second does not mean a lot if we do not know the size of the I/Os. Usually, most I/Os are 16 KB or less under VMS.
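
To illustrate why the I/O size matters, here is a minimal Python sketch that converts an I/O rate and an I/O size into throughput. It assumes the usual 512-byte VMS disk block; the function name is just for illustration.

```python
BLOCK_BYTES = 512  # size of one VMS disk block


def throughput_kb_per_sec(io_per_sec, blocks_per_io):
    """Throughput in KB/s for a given I/O rate and I/O size in blocks."""
    return io_per_sec * blocks_per_io * BLOCK_BYTES / 1024


# The same 75 I/O per second at two very different I/O sizes:
print(throughput_kb_per_sec(75, 1))   # single-block I/Os
print(throughput_kb_per_sec(75, 32))  # 16 KB I/Os
```

The same 75 I/Os per second can therefore mean anything from a trickle of data to more than 1 MB/s.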

Michelle Popejoy · ‎10-26-2004

Steph,

>I extracted statistics with ECP...

I haven't used ECP lately, but in Unicenter TNG Performance Manager v2.2 you can look at graphs by Top Disk Device or Volume. I'd particularly look at Queue Length (as mentioned previously) and Busy. You could also do a Top Users Disk I/O Rate. If you had more than 3 disks, a custom graph might be useful to isolate the disks you care about. It might also be interesting/instructive to get a stacked or pie graph of read I/O and write I/O for each disk.

Cheers,
Michelle

"I have not failed. I've just found 10,000 ways that won't work." - Thomas Alva Edison (1847-1931)

Hein van den Heuvel · ‎10-26-2004

Beware, queue depth is a fine indicator except...

- Backup (and Oracle checkpoints) WILL issue many IOs at the same time, so no matter how fast your disk is, you WILL see a queue in those cases.

- The classic batch job (a loop with read, process) will never show a queue but is completely defined by the IO speed.

Tell us more about your typical usage and the details of your observation period, and we can help you better...

Hein.

Ian Miller. · ‎10-26-2004

The peaks during backups are normal (as explained already). With the vintage hardware you have (I've got some of those too :-) 100 IO/s is going well. You should have the tape drive on a separate bus from the disks.

____________________
Purely Personal Opinion

Pablo Itté · ‎10-26-2004

Steph,
I wonder whether you can improve performance by following some tips.

Disk fragmentation leads to poor-quality IOs, especially with the BACKUP command or RMU/BACKUP. That is because the data being retrieved is not contiguous, so the disk drive has to seek to the next file record.
An image backup followed by a restore will defragment the volume.

Another tip is to use the largest block size in the BACKUP command. That way there will be fewer physical IOs to the tape, fewer interrupts to the CPU, and less SCSI channel arbitration. Bear in mind that a larger block size increases the virtual size of the backup process, which may increase IO through process page faults. Perhaps increasing your RAM would be good.

Consider not sharing a SCSI channel between tape drives and disks. It not only affects performance: commonly, when a tape drive has a hardware problem, it may keep control of the channel (because it is a sequential device and does not finish the pending IO), leading the host to issue a bus reset, with disk IO errors as a consequence.

Consider that 100 good-quality IOs are better than 50 bad-quality IOs.
The average queue depth in monitor disk/item=queue will indicate how busy the disk really is.
I hope this helps.

Guillou_2 · ‎10-29-2004

Thanks all for your help and your advice.

More questions:
Is there a limit of free space below which problems could appear?

I want to know the size of an IO. How can I find this information?

I want to know the "normal" IO rate for RA92 disks mounted in a shadowset, and when this rate should be considered heavy.

Hein van den Heuvel · ‎10-29-2004

RA92: Capacity = 1.5 GB
Peak transfer rate = 1.72 MB/sec (same as RA90)
Average seek time = 16 ms
The RA92 has more sectors per track than the RA90 (73 vs. 69) but it spins slower (3400 RPM vs. 3600 RPM), hence the same peak transfer rate. This results in an 8.8 ms average rotational latency vs. 8.3 ms for the RA90, but because of the faster actuator (16 ms average seek vs. 18.5 ms) the overall average access time is better.

In general you have to add the rotational latency to the seek time to get the time it takes to do an IO. Normally, nowadays, the transfer time plays no role, but here the transfer rate is about 2 MB/sec = 2 KB/ms, so a typical 8 KB (16 block) buffer adds another 4 ms, for a grand total of about 28 ms.
So for random smallish IOs the max is 1000/28, which is less than 40 IO/sec. If you can stay on track, and ask quickly enough (next, next, next), then you may see 100 IO/sec or so.
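
The arithmetic above can be sketched in a few lines of Python. The drive figures are the RA92 numbers quoted in this post; the function names are illustrative, and the model (seek + half a rotation + transfer, no queuing or controller overhead) is a deliberate simplification.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in ms."""
    return 60_000.0 / rpm / 2


def max_random_io_per_sec(seek_ms, latency_ms, io_kb, kb_per_ms):
    """Upper bound on random IO/s for one drive under the simple model."""
    service_ms = seek_ms + latency_ms + io_kb / kb_per_ms
    return 1000.0 / service_ms


ra92_latency = avg_rotational_latency_ms(3400)  # about 8.8 ms
print(round(ra92_latency, 1))

# 16 ms seek, 8.8 ms latency, 8 KB transfer at roughly 2 KB/ms
print(round(max_random_io_per_sec(16, 8.8, 8, 2), 1))
```

Under these assumptions the service time comes out just under 29 ms, which agrees with the "less than 40 IO/sec" estimate for random small IOs.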

- MB/sec ... hard to get with standard tools. This is an old VMS version, I suppose? With the XFC comes a GREAT IO size histogram. For old versions check out general performance tools like SPM / DECps.

- Free space. The potential problem with a lack of free space is that it tends to come hand in hand with severe fragmentation. Many folks believe that less than 10% free, for generic usage, is asking for trouble.

Given performance, support and electricity costs, you should probably seriously look at upgrading the whole lot... like to a PC running CHARON-VAX?

Cheers,
Hein.

faris_3 · ‎10-29-2004

Hi,

A very good freeware tool for taking baseline I/O performance measurements and making comparisons is diskperf, developed by JF Pieronne:
http://www.pi-net.dyndns.org/jfp/french/

hth,
HF
