

 
SOLVED
Karl oliver
Frequent Advisor

disk bottle neck

Hello folks,
Looking for some advice on whether we have a disk bottleneck on our system.
One of our disks shows as very busy, over 90% at times. It is SAN-attached, which means it is a volume backed by probably 5 physical disks.
I think we do have a bottleneck, yet avwait and avserv are not high. Our sysadmin also says it is not a problem because the data will be cached by the large SAN cache. Yet sar and Glance indicate it is a problem: Glance regularly shows a critical disk bottleneck, and the disk is > 90% busy for over an hour at times.
See the portion of the sar report attached.
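For reference, the attached numbers come from sar's disk report; a typical invocation looks something like this (the interval, count and file path are just example values, not necessarily the exact options used for the attached report):

    # live disk activity: 30-second samples, 10 of them (example values)
    sar -d 30 10

    # or pull disk stats back out of an existing daily sadc file (example path)
    sar -d -f /var/adm/sa/sa15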

12 REPLIES
Bill Hassell
Honored Contributor

Re: disk bottle neck

The sar stats look quite good (performance is not an issue). The only way to improve the busy disks is to analyze what is being accessed, that is, how are the lvols in the VG being used? Track down the major directories and see what can be moved to another volume group.
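A rough way to track that down (vg01 and the path below are placeholders for your busy volume group and one of its filesystems):

    # see which filesystems live in the busy volume group
    bdf | grep vg01

    # then find the biggest directory trees on a suspect filesystem
    du -kx /busy/fs | sort -n | tail -20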


Bill Hassell, sysadmin
Steven E. Protter
Exalted Contributor

Re: disk bottle neck

Shalom,

Being > 90% utilization is not by itself a problem. Glance going red is something I often saw when batch jobs were hitting my databases. It went away when the application stress did.

The question I would ask, based on the sar data, is what is going on with the applications on the system when this occurs. If you have 4 batch jobs running in the same hour, maybe they can be scheduled differently.

Is there a user response problem when Glance is red? Or do users still see their data in an acceptable period of time?

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Karl oliver
Frequent Advisor

Re: disk bottle neck

More information:
We have an incremental backup running at this time. The find to locate changed files has to go through 142 GB of files, millions of small files 1-3 KB in size!
Backup software is CommVault.
Users are complaining of slow response, hence my question.
This machine runs Apache web servers.

Our first thought was to stop the backup for one night to see if this makes a difference. Then, a week later, we plan to move the data to a faster disk system.

The database is on a different machine.

I attach some sar stats for the database server (dsk_132). Some of these appear to have a high queue (avque); what is considered too high?
Zinky
Honored Contributor

Re: disk bottle neck

Yowsa!

Bruddah -- you have zeerious I/O issues per thine SAR stats.

You have massive "queuing" on yer disks!

c0t6d0 even shows I/O queuing in the 500+ range!
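For calibration, the usual rough reading of those sar -d columns (general rules of thumb, not numbers from your attachment):

    # sar -d columns and rough rules of thumb:
    #   %busy   - sustained 80-90%+ is worth investigating, but not automatically a problem
    #   avque   - average number of requests outstanding; values that sit well above 1-2
    #             on a single LUN usually mean the back end cannot keep up
    #   avwait  - time a request spends waiting in the queue
    #   avserv  - time the device takes to service a request
    # avwait consistently larger than avserv means requests spend more time queued
    # than being serviced, i.e. a real queuing problem on that device
    sar -d 10 30        # 10-second samples, 30 of them (example values)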

The culprits are those gazillion tiny files, good sir. Backing those up as files with *ANY* backup software is NOT going to do the trick. The best is to back it up raw or even split-mirror it...
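A minimal sketch of the raw approach (the volume group, lvol and output path are made-up names; a split mirror would be synced, split, and then imaged the same way):

    # image the logical volume as one big sequential read instead of millions of small file opens;
    # the filesystem should be unmounted (or be a split-mirror copy) so the image is consistent
    dd if=/dev/vg01/rlvol3 of=/backup/lvol3.img bs=1024k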

My few cents.
Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler
Karl oliver
Frequent Advisor

Re: disk bottle neck

Thanks for the comments so far, people.
Note: with my two messages there are 2 attachments.
My first message's attachment shows the web server machine - this has the high %busy and the millions of small files.

My 2nd message's attachment shows the database server machine - this has the high avque on its disks.
Steven E. Protter
Exalted Contributor

Re: disk bottle neck

Shalom,

I recommend you look at relocating your database to faster storage. Index, data and Oracle redo should be on RAID 1, not RAID 5, storage. RAID 5 parity maintenance costs roughly 4-5 disk I/Os for every write operation, so queues build up fast.
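A rough back-of-the-envelope illustration of that write penalty (the 500 IOPS figure below is made up, purely for illustration):

    # classic small-write penalty: RAID 5 costs ~4 back-end I/Os per host write
    # (read data, read parity, write new data, write new parity); RAID 1 costs 2 (one per mirror)
    HOST_WRITE_IOPS=500                                        # hypothetical host write rate
    echo "RAID 5 back-end I/Os: $(( HOST_WRITE_IOPS * 4 ))"    # 2000
    echo "RAID 1 back-end I/Os: $(( HOST_WRITE_IOPS * 2 ))"    # 1000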

I would then work with the DBA to see if there is inefficient SQL running on the database.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Zinky
Honored Contributor

Re: disk bottle neck

Karl,

What kind of "SAN" storage do you have, sir? I assume both your web server (with the gazillionitis) and the DB server use the same "SAN" storage?

The "high" utilization pctgs on your WebServer's disk means that particular disk is servicing a fairly high IOPS (io's per second) - which is NOT bad per se IF your disk can take it or IF there is no marked performance issues. If there is -- then look at maybe reorganizing your webserver tree -- maybe separate DocRoot, Logger and whatever subtrees so those are spread out amongst many disks and not just one disk.

The massive queuing on your DB server means the disk in question is severely degraded on the SAN. It may additionally mean that the disk group or array pool on your SAN storage array from which that disk was "carved" is itself heavily loaded, and it is possible that some other server connected to your SAN is "hogging" I/O out of that array group.

So checking your SAN (array) is also one area you need to look at.

HTH Sir.
Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler
Karl oliver
Frequent Advisor

Re: disk bottle neck

Thanks for the tips. This part of the SAN is dedicated to just this server. However, we have discovered that we are connected to the slower system, what they call Tier-2 (SATA), so we are moving the disks to Tier-1 (FC). I have asked for an explanation of the differences.
Our SAN is managed by a different company.
Also, the file system in question has 20440162 used inodes.
Is there a limit where performance suddenly degrades substantially once a certain inode threshold has been reached?
We are looking at splitting this file system up as you suggested.
Should our SAN management people be able to monitor performance of the SAN and advise us of bottlenecks?
Bill Hassell
Honored Contributor
Solution

Re: disk bottle neck

> Tier-2 (SATA), so we are moving the disks to Tier-1 (FC). I have asked for an explanation of the differences.

SATA is an inexpensive interface, and typically very large disks (hundreds of GB) are set up as lower-performance (Tier 2) storage since fewer disks are needed for the same amount of storage. Fewer disks means fewer parallel paths for random access to the storage.

FC is typically fibre channel and is much faster than SATA. Coupled with a lot of smaller disks, many more paths can exist for the same amount of storage.

> Also, the file system in question has 20440162 used inodes. Is there a limit where performance suddenly degrades substantially once a certain inode threshold has been reached?

inodes are simply pointers to chunks of disk space. The inode count itself is unimportant -- the same amount of data could be stored as millions of small files or as just a few very large ones. Decades ago, inode limits were important. Today (with VxFS filesystems), how you organize the filesystem matters much more. One directory with a million files will be very difficult to use.
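If it helps, a quick way to spot an overcrowded directory (the mount point is a placeholder):

    # count files per top-level directory of the busy filesystem
    cd /busy/fs                     # placeholder mount point
    for d in */ ; do
        n=$(find "$d" -xdev -type f | wc -l)
        echo "$n $d"
    done | sort -n | tail -10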

As with any performance question, you have to characterize the usage. Are files constantly opened, closed, created and removed, or are the files being read sequentially? Are there specific files that are very busy? The key is to spread the disk activity across different paths.

> Should our SAN management people be able to monitor performance of the SAN and advise us of bottlenecks?

Unfortunately, they can only tell you about the busy disks or LUNs. Translating a set of LUNs into specific filesystems and directories will require Glance. Note that Tier 1 storage can help, but it costs more per megabyte and is usually limited in size. It does make a lot of sense to map a very busy volume group to premium (Tier 1) storage and the rest to slower storage.

And finally, Glance's "bottleneck" alerts are simply annoying. I turn off the alerts and then look at I/O rates on each filesystem. When I see a continuous high access rate, I'll switch to I/O rates for the device files. This will indeed map to LUNs in the storage array.
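One possible way to do that mapping by hand on HP-UX (vg01 is just an example name):

    # map a busy device file from sar -d (e.g. c0t6d0) back to its volume group and filesystems
    strings /etc/lvmtab                        # lvmtab is binary; strings shows VG -> /dev/dsk/cXtYdZ mapping
    vgdisplay -v /dev/vg01 | grep "LV Name"    # logical volumes in that VG
    mount -v | grep vg01                       # filesystems mounted on those lvols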


Bill Hassell, sysadmin