Operating System - HP-UX

high disk utilization + long queue at low transfers

 
Zbigniew Kłos
Occasional Advisor

high disk utilization + long queue at low transfers

Hello,

I have a serious performance problem with the following configuration: an rp7400 server connected to an IBM FastT 900 array via Brocade FC switches (technology similar to the VA), running a Progress database.

The database filesystem is 300GB in size and sits on a logical volume distributed over two 150GB physical volumes (two array LUNs owned by two different array controllers). This allows static load distribution between the two array processors.

Database users complain about poor database/application performance.
Disk statistics from sar -d or Glance show the disks used for the database filesystem at 100% utilization with a disk queue length of 50-60, while total disk transfer is only about 1MB/s (!). Average queue service time is about 10ms and average wait time for the disk is about 5ms.
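As a sanity check on request sizes: on HP-UX, sar -d reports blks/s in 512-byte blocks, so the average request size is blks/s * 512 / (r+w/s). A quick awk sketch with illustrative numbers (the field order is assumed from the usual sar -d layout, and the device name and figures below are made up, not from this system):

```shell
# Assumed sar -d field order: device %busy avque r+w/s blks/s avwait avserv
# 512 requests/s moving 4096 half-KB blocks/s = 2MB/s => 4096 bytes/request
echo "c4t0d1 100 55 512 4096 5.0 10.0" | awk '
    { printf "%s: avg request = %d bytes\n", $1, ($5 * 512) / $4 }'
```

A very small average request size at 100% utilization would point at random, small-block database I/O rather than a raw bandwidth limit.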

However, I am able to transfer about 30MB/s at 80% disk utilization using: dd if=/dev/vg<>/lvol of=/dev/null bs=8k. With smaller block sizes the transfer rate drops, but even at bs=1k it achieves 2MB/s at 30% utilization.
dd on individual files in the filesystem (dd if=/filesystem/file of=/dev/null) achieves 5MB/s at about 20% utilization, but launching multiple dd's on different files sustains 25MB/s at 80%.
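For reference, the sequential-read tests above can be sketched as a loop over block sizes. This version runs against a scratch file so it is safe to try anywhere (the real tests used /dev/vg<>/lvol and files on the database filesystem, which are site-specific):

```shell
# Hedged sketch of the sequential-read tests above, against a scratch file.
dd if=/dev/zero of=/tmp/seqtest bs=8k count=1024 2>/dev/null   # 8MB scratch
for bs in 1k 8k 64k; do
    # dd reports "N+0 records in"; smaller bs means more, smaller requests
    recs=$(dd if=/tmp/seqtest of=/dev/null bs=$bs 2>&1 | awk -F+ 'NR==1 {print $1}')
    echo "bs=$bs requests=$recs"
done
rm -f /tmp/seqtest
```

The record counts make the trade-off visible: the same amount of data costs 8x as many requests at bs=1k as at bs=8k, which is exactly why throughput collapses when each request is small.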

I get very similar dd results for logical volumes/filesystems placed on individual disks in an FC10 box (similar configuration, but the LVM physical volumes are individual disks, not array LUNs).
However, the same database placed on FC10 filesystems drives the FC10 disks at only 10-20% utilization at similar transfer rates.

I suspect the database itself is the problem (very random access patterns across different regions of the filesystem, lack of proper optimisation, indexing, etc.), but I have no hard proof of that.

Do you have any ideas ?

Best regards,
Zbigniew
4 REPLIES
Steven E. Protter
Exalted Contributor

Re: high disk utilization + long queue at low transfers

A couple of ideas:

Monitor performance over time and see if there is a pattern. Attaching a script:

There are a couple of possible issues:

Contention on the disk array with other servers' data striped on the same physical disks. The answer here is murky. We think the problem can be resolved by isolating high-intensity systems and not striping them across all disks; instead of being striped across 30 disks, we may reduce to 18. The disk array people insist that is a bad idea.

This could be a kernel performance tuning problem. I'm linking a very useful document on the subject by one of HP's top people in the field.

http://www1.itrc.hp.com/service/cki/search.do?category=c0&mode=id&searchString=UPERFKBAN00000726&search.x=28&search.y=8&searchCrit=allwords&docType=Security&docType=Patch&docType=EngineerNotes&docType=BugReports&docType=Hardware&docType=ReferenceMaterials&docType=ThirdParty

Of course tune the database as best you can. Oracle has a stats pack that lets the dba find and correct performance issues. Hopefully Progress has something similar.

I feel for you, we're going through issues here as well.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Sridhar Bhaskarla
Honored Contributor

Re: high disk utilization + long queue at low transfers

Hi,

This is the problem when you use a few big LUNs. The system can only queue up a certain number of requests, so your throughput depends on the I/O size: you can fill the queue with x small requests, making your transfer rate low, or with the same number of bigger requests, making it high. The buffer cache can really play a role here, since it can combine multiple requests into one, unless the system is doing too much random I/O.
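The arithmetic behind this: with the queue saturated, the array serves a roughly fixed number of requests per second, so throughput scales linearly with request size. An illustrative sketch (500 IOPS is an assumed figure, not measured in this thread):

```shell
# throughput = (requests/s) * (bytes/request); IOPS held fixed (illustrative)
for kb in 2 8 64; do
    awk -v kb=$kb 'BEGIN { printf "%2dKB requests at 500 IOPS -> %.2f MB/s\n", kb, 500 * kb / 1024 }'
done
```

At the same request rate, 2KB requests deliver about 1MB/s while 64KB requests deliver over 30MB/s, which matches the gap the original poster sees between database load and large-block dd.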

Your immediate solution may be to increase the queue depth on these LUNs. Find out the maximum queue depth your IBM storage system can support and divide it by the number of LUNs configured on it. Say it comes to 25. Then adjust the queue depth using the command

scsictl -m queue_depth=25 /dev/rdsk/cxtydz

Default is 8.
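The division step can be sketched as below. MAX_DEPTH and NUM_LUNS are assumed values here; check the FastT documentation for the real controller limit, and inspect the current setting with scsictl -a before changing anything:

```shell
# Hedged sketch: split the controller's maximum queue depth across the LUNs.
MAX_DEPTH=512        # assumed controller limit; verify against storage docs
NUM_LUNS=20          # assumed number of LUNs configured on the array
DEPTH=$((MAX_DEPTH / NUM_LUNS))
echo "per-LUN queue_depth = $DEPTH"
# Then apply it to each database LUN, e.g.:
#   scsictl -m queue_depth=$DEPTH /dev/rdsk/cXtYdZ
```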

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Zbigniew Kłos
Occasional Advisor

Re: high disk utilization + long queue at low transfers

Hello,

Thank you for your answers. I would like to clarify a few issues.

First of all, I was not completely right in saying that the FastT storage array is similar to HP's VA; it is more advanced and probably more like the EVA (but I do not know the EVA, so I am not sure).
On the IBM FastT I can create separate virtual arrays from a selected set of physical disks, and this database server uses physical disks of its own, so there is no additional load from other servers on those disks. The array has 30 disks in total, 14 of which are dedicated to this database server (they form a virtual array protected by RAID 0+1).

Other servers do add load on the storage processors, but at the time of the described problems and tests this additional load was very low.

Regarding the buffer cache: it is set to range from 5% to 20% of RAM (which is 12GB at the moment). Is that OK, or should I rather go for a setting like 20% to 20% ;-) ?
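For reference, the 5-20% range maps to these absolute sizes on 12GB of RAM (a quick sketch; dbc_min_pct/dbc_max_pct are the standard HP-UX buffer cache tunables, set via SAM or kmtune, and the percentages below are the ones quoted above):

```shell
# Buffer cache bounds implied by dbc_min_pct=5 / dbc_max_pct=20 on 12GB RAM
RAM_MB=12288
echo "min=$((RAM_MB * 5 / 100))MB max=$((RAM_MB * 20 / 100))MB"
```

Pinning min and max to the same percentage fixes the cache size instead of letting it float between the two bounds.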

I tried to observe the size of the I/Os when testing transfer with dd on a file in the filesystem (dd if=/filesystem/file of=/dev/null), and it is about 25KB per I/O. It is not a fixed value, as it is when using dd on a raw logical volume with bs=xx set directly.

The database activity itself (on the same filesystem) generates very small I/Os (about 2KB per I/O), and array performance for such I/Os (dd if=/dev/vgxx/rvol of=/dev/null bs=2k) is much lower (3MB/s) than with 8KB I/Os (5MB/s), but it is still better than the 1MB/s the database achieves at 100% disk load.
This is strange to me, because the vxfs block size is set to the default for such a large filesystem, which is 8KB. So why are the filesystem-generated I/Os so small?

The disk array flies at full speed (30MB/s) with I/Os of 16KB or more, which is strange because the block size for these LUNs is set to 8KB (the same as the filesystem's block size).

The large array LUNs (2x 150GB for this filesystem) might be the problem; however, my tests with dd (both on the filesystem and on the raw logical volume) show similar performance with other LUNs that are 30GB in size. But I could not test database performance on such a filesystem, because it is much smaller (60GB) and cannot hold this database.

I'll try to experiment with scsictl command.

If you have any ideas or suggestions after these explanations, please reply to this message or contact me at zbigniew.klos@telekomunikacja.pl

Best regards,
Zbigniew
stone_3
New Member

Re: high disk utilization + long queue at low transfers

hi

In your last message you are looking for the answer in the system. In my opinion you should also check the allocation of the tables; maybe there are some hot tables.

regards,
wei lei