Critical I/O Bottleneck

 
SOLVED
Adrian Sobers2
Super Advisor

Critical I/O Bottleneck

I am using Quest Spotlight on UNIX (in freeware mode) to monitor our HP-UX production server. I am getting an alert which says:

Critical I/O bottleneck: CPU wait time is consistently high (93%) indicating a disk bottleneck. Check Disk Utilization and Service Time to determine which disk(s) are contributing.

My question is how do I go about this? From the bdf command, the disk(s) have plenty of free space, so what else should I look for?

Looking forward to your assistance as usual.

Thank You.
27 REPLIES
Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

Try iostat

Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

ummm thanks, the output from iostat is:

device bps sps msps

c1t2d0 0 0.0 1.0
c1t0d0 0 0.0 1.0
c2t0d0 0 0.0 1.0
c2t2d0 0 0.0 1.0


and this indicates what exactly?
Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

You need to run iostat over a period of time

iostat t n

where t = number of seconds to wait between snapshots, n = number of snapshots to take

e.g.

iostat 60 10

Will give you a snapshot every minute for 10 minutes.

Look at bps in particular to see which disk is busy. See man iostat for further details.
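As a rough sketch of how the snapshots could be compared, the helper below keeps each device's peak bps across all samples and lists the busiest first. The function names peak_bps and sample_iostat are made up for illustration, and the numbers in the sample are invented, not taken from the attachment in this thread. It assumes the four-column layout (device, bps, sps, msps) shown earlier.

```shell
#!/bin/sh
# peak_bps: read iostat output on stdin and print each device's peak
# bps across all snapshots, busiest device first.
peak_bps() {
    awk '
        NF == 4 && $1 != "device" {      # data rows only: device bps sps msps
            if (!($1 in peak) || $2 + 0 > peak[$1]) peak[$1] = $2 + 0
        }
        END { for (d in peak) print peak[d], d }
    ' | sort -rn
}

# Illustrative data only (hypothetical numbers).
sample_iostat() {
cat <<'EOF'
  device    bps     sps    msps

  c1t2d0    120    14.0     1.0
  c1t0d0      0     0.0     1.0

  device    bps     sps    msps

  c1t2d0    840    97.5     1.0
  c1t0d0     16     2.0     1.0
EOF
}

sample_iostat | peak_bps
```

On the live system the same filter would be fed from the real command, e.g. iostat 60 10 | peak_bps.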

Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

I have attached the output from:

iostat 10

Now the monitoring program is indicating that there is no longer a disk bottleneck. So it seems to be sporadic? Is there anything else I can/should look at before leaving this issue? I have other things to look at today...
Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

You really need to run iostat while Quest Spotlight is raising the alert. Looking at your output, though, you are getting the odd spike on c1t2d0. Is this your root/swap disk?
Alzhy
Honored Contributor

Re: Critical I/O Bottleneck

Adrian,

The correct command to determine if the system indeed has a serious I/O bottleneck is to use sar - specifically the syntax:

sar -d 5 10

This collects per-disk statistics every 5 seconds, 10 times.

Look at avque (anything greater than 0 is bad), and at avwait and avserv (anything consistently above 20 ms is bad).
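To make "consistently" a bit more concrete, here is a minimal sketch that counts, per device, how many samples had avserv above a limit (20 by default, per the rule of thumb above). The names slow_disks and sample_sar are hypothetical, the sample figures are invented, and the parsing assumes the usual sar -d column order ending in avwait and avserv, with the timestamp present only on the first row of each interval.

```shell
#!/bin/sh
# slow_disks [LIMIT]: read sar -d output on stdin and print
# "device over/total" for every device whose avserv exceeded LIMIT
# in at least one sample.
slow_disks() {
    limit=${1:-20}
    awk -v limit="$limit" '
        # Fields are counted from the end because only the first row of
        # each interval carries a timestamp. Header and banner lines are
        # skipped because their last field is not a number.
        NF >= 7 && $1 != "Average" && $NF ~ /^[0-9.]+$/ {
            dev = $(NF-6)
            total[dev]++
            if ($NF + 0 > limit) over[dev]++
        }
        END {
            for (d in total)
                if (over[d] > 0) print d, over[d] "/" total[d]
        }
    ' | sort
}

# Illustrative data only (hypothetical numbers).
sample_sar() {
cat <<'EOF'
12:00:00   device   %busy   avque   r+w/s  blks/s  avwait  avserv
12:00:05   c1t2d0   91.00    3.50     140    2240   18.20   27.40
           c1t0d0    2.00    0.50       3      48    0.00    6.10
12:00:10   c1t2d0   88.00    2.90     131    2096   15.70   24.90
           c1t0d0    1.00    0.50       2      32    0.00    5.80
Average    c1t2d0   89.50    3.20     135    2168   16.95   26.15
Average    c1t0d0    1.50    0.50       2      40    0.00    5.95
EOF
}

sample_sar | slow_disks 20
```

Against the live system this would be run as sar -d 5 10 | slow_disks 20. A device that shows up as over the limit in most of its samples is the one worth chasing.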

What kind of storage are you using? Do you use LVM/VxVM, and do you stripe your Oracle storage?
Hakuna Matata.
Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

Nelson,

Here is the output from the command you suggested.
Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

Still looks like c1t2d0 is the busy disk. :)
Alzhy
Honored Contributor

Re: Critical I/O Bottleneck

Adrian,

What is on c1t2d0? I suspect that's your culprit disk.

You may want to collect sar data over time.

mkdir /var/adm/sa
and add to root's crontab:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/lbin/sa/sa1

Over time, simply do a sar -d (to get the current day's disk sar stats) or sar -d -f /var/adm/sa/saNN (for day NN's sar stats).
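A small sketch for picking the right daily file, assuming the /var/adm/sa/saNN naming above (NN being the two-digit day of the month). The helper name sa_file is made up for illustration.

```shell
#!/bin/sh
# sa_file [DAY]: print the path of the sar data file for the given
# day of the month (defaults to today).
sa_file() {
    day=${1:-$(date +%d)}
    day=${day#0}                      # drop a leading zero so printf does not read 08/09 as octal
    printf '/var/adm/sa/sa%02d\n' "$day"
}

sa_file 7
```

For example, sar -d -f "$(sa_file 14)" would report the disk statistics collected on the 14th, provided the cron job was already running that day.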

If you could give us your LVM or VxVM config, so much the better.

Hakuna Matata.
Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

We use LVM, attached is some documentation on the server. It might answer most of your questions.


Alzhy
Honored Contributor
Solution

Re: Critical I/O Bottleneck

Adrian..

I think you simply have a server whose disk subsystem is very slow for your kind of application. Either get more controllers in and more disks and stripe/mirror your Oracle storage ...
Hakuna Matata.
Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

How do I find out what is on c1t2d0?

I do not see any indication of it in bdf or ioscan commands?

Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

Do you have access to the sqldba application? If so can you run

SQLDBA> MON SYSTEM IO
Alzhy
Honored Contributor

Re: Critical I/O Bottleneck

The long term sar stats should tell you more.

One more thing - this is a production server and yet your OS subsystem and most other filesystems are not even mirrored. Only Oracle archive log filesystem is mirrored - I suppose your environment can take downtime since you're simply protecting Oracle archive logs?
Hakuna Matata.
Alzhy
Honored Contributor

Re: Critical I/O Bottleneck

To find out what is on c1t2d0:

Do this:

pvdisplay -v /dev/dsk/c1t2d0 2>/dev/null|grep "current"|awk '{print $3}'|awk -F\/ '{print $NF}'|sort|uniq
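For anyone unsure what that pipeline does, here is a sketch of the same filter run against sample text. It assumes the "Physical extents" section of pvdisplay -v lists one row per extent with the status in column 2 and the LV path in column 3. The names lvs_on_pv and sample_pvdisplay are hypothetical, and the sample rows are invented.

```shell
#!/bin/sh
# lvs_on_pv: read pvdisplay -v output on stdin and print the unique
# logical volume names that have extents in use on that disk.
lvs_on_pv() {
    grep "current" |                  # keep only allocated extents
        awk '{print $3}' |            # third column: LV device path
        awk -F/ '{print $NF}' |       # last path component: LV name
        sort -u                       # one line per LV
}

# Illustrative data only (hypothetical extent map).
sample_pvdisplay() {
cat <<'EOF'
   --- Physical extents ---
   PE   Status   LV                 LE
   0000 current  /dev/vg00/lvol1    0000
   0001 current  /dev/vg00/lvol1    0001
   0002 current  /dev/vg00/lvol2    0000
   0003 free                        0000
EOF
}

sample_pvdisplay | lvs_on_pv
```

On the real server the input would come from pvdisplay -v /dev/dsk/c1t2d0 2>/dev/null, and the resulting LV names can then be matched against the device column of bdf to see which file systems live on the disk.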
Hakuna Matata.
Stephen Keane
Honored Contributor

Re: Critical I/O Bottleneck

If you have root access you can issue the command ioscan -Cdisk -fn

This will match the hardware path to the device file showing you what the H/W path for c1t2d0 is, in your case 0/0/1/1.2.0 which in your document maps to VG00.

If you do pvdisplay /dev/dsk/c1t2d0
you should see /dev/vg00

Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

Nelson,

Assuming that I cannot get more disks or controllers, how do I go about mirroring the other data, such as the OS?

Alzhy
Honored Contributor

Re: Critical I/O Bottleneck

There's an abundance of topics for howto's on this forum.

If this environment is really not a big one, you can try to configure your current 4-disk environment as follows:

controller 1: c1t0d0,c1t1d0
controller 2: c2t0d0,c2t1d0

OS Disk+swap+some oracle stuff (ie. orasoftware and archive...)
VG00: - c1t0d0 + c2t0d0
Oracle Data:
VG01: c1t1d0 + c2t1d0

Or if you've deeper pockets, get a FibreChannel disk enclosure system, e.g. a DS2405.
Hakuna Matata.
Victor BERRIDGE
Honored Contributor

Re: Critical I/O Bottleneck

Hi Adrian,
In order to understand your configuration, I suppose we would need also to know the output of vgdisplay -v vg00|grep dsk and vgdisplay -v vg01|grep dsk
and also your swap usage:
swapinfo -tam
and the output of pvdisplay -v for all disks, without the physical extents section, would be helpful


All the best
Victor
Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

Victor,

attached is the information you required (except the last bit). I'm not sure about the command.

Victor BERRIDGE
Honored Contributor

Re: Critical I/O Bottleneck

Hi Adrian,
If you are unable to mirror the system disk (disk space being scarce...), I would put Oracle and the other software on one disk that is not connected to the same controller as the vg00 system disk, AND put the 2 others, each on a different controller, also in vg01, AND use LVM STRIPING for your Oracle DATA...
In this way you reduce the risk of contention on one disk by balancing the load across 2 controllers.

All the best
Victor
Victor BERRIDGE
Honored Contributor

Re: Critical I/O Bottleneck

Just saw your zip file,
well, pv was at 0, but never mind that for now,
I vote:
vg00 c1t2d0
vg01 c2t2d0 for software like oracle etc NO DATA
s2vg03 c1t0d0 AND c2t0d0 # s2 for striping on 2disks...

What do you think?
Adrian Sobers2
Super Advisor

Re: Critical I/O Bottleneck

Victor,

I appreciate your help but I would have to do some serious reading up on what this would involve. Remember we do not have Online JFS. Also I am still relatively new to both Oracle and UNIX and since I was thrown in the deep end for both products, I'm still learning.

How would you go about this process if you were new?
Victor BERRIDGE
Honored Contributor

Re: Critical I/O Bottleneck

I meant s2vg02...
Now, looking at your config in more depth:
1) On non-RAID SCSI I would never create logical volumes the size of lvDATA; this may be one I/O issue. For performance and recovery (it can happen...), unless on a SAN I wouldn't create a file system greater than 30GB.
Once s2vg02 is created with the disks c1t0d0 and c2t0d0 in it,
start creating your first striped lv:
lvcreate -i 2 -I 64 -L 30000 -n lvol1 /dev/s2vg02
then:
newfs -F vxfs /dev/s2vg02/rlvol1

etc...
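Before running the lvcreate, it may help to check how much room a striped volume needs on each disk. The sketch below does that arithmetic for -L 30000 with -i 2. It assumes a 4 MB physical extent size (I believe that is the vgcreate default, but check vgdisplay) and that the extent count gets rounded up to a multiple of the stripe count. The helper name stripe_layout is made up.

```shell
#!/bin/sh
# stripe_layout SIZE_MB STRIPES [PE_MB]: estimate how a striped LV of
# SIZE_MB is spread across STRIPES disks, given the extent size in MB.
stripe_layout() {
    size_mb=$1
    stripes=$2
    pe_mb=${3:-4}                                   # assumed extent size
    extents=$(( (size_mb + pe_mb - 1) / pe_mb ))    # round up to whole extents
    rem=$(( extents % stripes ))
    if [ "$rem" -ne 0 ]; then                       # round up to a multiple of the stripe count
        extents=$(( extents + stripes - rem ))
    fi
    per_disk=$(( extents / stripes ))
    echo "total_extents=$extents per_disk_extents=$per_disk per_disk_mb=$(( per_disk * pe_mb ))"
}

stripe_layout 30000 2 4
```

Under those assumptions each of c1t0d0 and c2t0d0 needs about 15000 MB free for the 30000 MB volume.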

Good luck
Victor