Operating System - HP-UX

Yazan Yacoub
Regular Advisor

high avgservice time and avque value

Hello everybody,

I am having an issue with disk performance on one of our database servers. The system is connected to an EVA5K, and avque and service time are very high. I cannot figure out whether the issue is related to the storage or to the OS itself. I have attached sar output from the system.

I would also appreciate it if anyone could explain the relationship between avque, service time, and avwait.

Many thanks
Jeeshan
Honored Contributor

Re: high avgservice time and avque value

Hmm, this is fairly complicated.

Your sar -d output says disk busy is 100% but avwait is 0, which means the disk is highly utilized.

The sar -u output shows that the % wait is higher than the usage.

and sar -d output is very good.


BTW, what are your dbc_max_pct and dbc_min_pct settings?


Also, please check with the vendor about the storage performance.
a warrior never quits
Steven E. Protter
Exalted Contributor

Re: high avgservice time and avque value

Shalom,

Let's look at the disk side. What application is using the disk?

A very common error is to put a write-intensive application on a RAID 5 LUN. This is not good for performance: there are too many places on disk to write the data.

RAID 1 is better, for example for Oracle data, index, and redo logs.

avque refers to the volume of transactions waiting to be written. Service time is what its name says: how long it took to get the work done. avwait is how long transactions had to wait before being serviced. High avwait times are bad. Think about changing the disk configuration, the application data layout, and the load. Laying out the data intelligently can save a lot of wear and tear on the disks and the sysadmin.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Hein van den Heuvel
Honored Contributor

Re: high avgservice time and avque value


Please study the sar -d output carefully and try to put them in the context of the expected use and configuration capabilities.

The system appears to be in an I/O performance zone where a serious understanding is warranted.
Read up in texts like: http://docs.hp.com/en/11iv3IOPerf/IOPerformanceWhitePaper.pdf

Here is what the raw numbers show:
r+w/s : 600+. Typically that's just fine.
blks/s: 380,000+, for almost 200 MB/sec.

That exceeds the capacity of a single 2 Gb fibre connection, and would put significant stress on a 4 Gb connection, with deep queues and poor average service times to be expected.
It would also require a good few (20+) spindles to comfortably sink or source that data.
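
The arithmetic behind those two figures can be sketched as follows. This is only an illustration: it assumes sar reports blks/s in 512-byte units, and it uses a rough rule of thumb of about 100 MB/s of usable bandwidth per Gb of Fibre Channel line rate, which is an assumption and not a measured value for this system. The function names are made up for the example.

```python
BLOCK_BYTES = 512            # assumed size of one sar "block"
MB_PER_GBIT_RULE = 100.0     # assumed rule of thumb: ~100 MB/s usable per Gb of FC


def throughput_mb_per_s(blocks_per_s: float) -> float:
    """Convert a sar blks/s figure into MB/s (decimal megabytes)."""
    return blocks_per_s * BLOCK_BYTES / 1_000_000


def link_utilisation(mb_per_s: float, link_gbit: float) -> float:
    """Fraction of one link's assumed usable bandwidth consumed by the load."""
    return mb_per_s / (link_gbit * MB_PER_GBIT_RULE)


if __name__ == "__main__":
    load = throughput_mb_per_s(380_000)
    print(f"load: {load:.2f} MB/s")
    for gbit in (1, 2, 4):
        print(f"{gbit} Gb link: {link_utilisation(load, gbit):.0%} of assumed capacity")
```

With 380,000 blks/s as a floor, a single 2 Gb link is already at or beyond the assumed limit, and a 4 Gb link is about half full.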

You should ask the EVA for details of how it perceives the load (throughput, response times, balance, read-write ratio, ...).

To understand the problem we need to know not only that it is an EVA5K, but many more details:
- What exact HP-UX version + extras?
- How many ACTIVE fibre connections?
- What is the speed of those connections?
- Is multipathing in use?
- How many drives are behind the disk group?

If we work the numbers some more, and if they are correct, then they suggest 300+ KB/IO. That may just be fine, but it is a high number which needs to be explained and understood from an application perspective.
What is the software application? What is it (roughly) trying to do?
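
For anyone who wants to redo that per-I/O calculation on their own sar -d samples, here is a minimal sketch. It assumes 512-byte blocks, and the function name is invented for the example.

```python
def avg_io_size_kib(blocks_per_s: float, ios_per_s: float, block_bytes: int = 512) -> float:
    """Average transfer size per I/O in KiB, from sar -d blks/s and r+w/s."""
    if ios_per_s <= 0:
        raise ValueError("ios_per_s must be positive")
    return blocks_per_s * block_bytes / ios_per_s / 1024


if __name__ == "__main__":
    # Figures quoted above: 380,000+ blks/s at 600+ r+w/s
    print(f"{avg_io_size_kib(380_000, 600):.1f} KiB per I/O")
```

Both inputs are lower bounds ("600+", "380,000+"), so the result is only a ballpark figure.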

Finally, while there is plenty of idle time, there appears to be a relatively high SYSTEM component to the CPU consumption.

Hope this helps some,
Hein van den Heuvel
HvdH Performance Consulting.
skt_skt
Honored Contributor

Re: high avgservice time and avque value

Ideally, an avserv below 20 ms is acceptable.

The service time collected by sar is blended/exaggerated for metas, so sar output should be taken as a trend of the I/O happening rather than as an exact service time.
Yazan Yacoub
Regular Advisor

Re: high avgservice time and avque value

Thanks everybody,

The server is hosting an Oracle database, and the database is heavily used.

The OS is Solaris, and we are using Sun Stor Traffic Manager and multipathing software.

The server has two HBAs at 1 Gb speed. I have monitored the HBAs, and the following are the maximum I/O values seen from the HBA side:
HBA1:
Received 152MB/Sec
Transmitted 12.14MB/Sec
HBA2:
Received 35.1MB/Sec
Transmitted 7.54MB/Sec

Regards,
Hein van den Heuvel
Honored Contributor
Solution

Re: high avgservice time and avque value

So you maxed out your somewhat unbalanced storage connections.
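
To put a number on the imbalance, here is a small sketch using the peak HBA figures posted above. It assumes the receive and transmit peaks can simply be added per HBA, which may overstate things because the maxima need not have occurred at the same moment. The names are made up for the example.

```python
# Peak values in MB/s as posted: (received, transmitted)
hba_peaks = {
    "HBA1": (152.0, 12.14),
    "HBA2": (35.1, 7.54),
}


def hba_shares(peaks: dict) -> dict:
    """Return each HBA's share of the combined peak traffic (0.0 to 1.0)."""
    totals = {name: rx + tx for name, (rx, tx) in peaks.items()}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


if __name__ == "__main__":
    for name, share in hba_shares(hba_peaks).items():
        print(f"{name}: {share:.1%} of combined peak traffic")
```

On these numbers HBA1 carries roughly four fifths of the traffic.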

Either find out how to need less (Oracle or application tuning: all those reads are a waste of time :-), or buy more connectivity and/or a bigger, better (HP!?) box.
IMHO it was wrong of you to post here. The level of the problem suggests one needs a good understanding of the system (the OS and storage details) and/or a good understanding of Oracle.

Best regards,
Hein van den Heuvel
HvdH performance consulting.

Pal_5
Advisor

Re: high avgservice time and avque value

Your sar output shows:
1. there are many (20-30) parallel I/O worker processes
2. the average I/O request size is ~256 kB/IO
3. avwait is 0; the I/O workers are probably reading from the EVA
4. sar -b shows ~0 bread/s (physical reads);
this statistic probably has little value because of Solaris's different buffering mechanism
5. slightly high system CPU usage
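
An estimate like point 1 can be sanity-checked with Little's Law: the average number of requests in flight is roughly the I/O rate multiplied by the average response time. The sketch below is an illustration only. The 40 ms response time is a hypothetical value chosen for the example and is not read from the attached sar output, and the function name is invented.

```python
def outstanding_ios(ios_per_s: float, avg_response_ms: float) -> float:
    """Little's Law: mean requests in flight = arrival rate x mean response time."""
    return ios_per_s * avg_response_ms / 1000.0


if __name__ == "__main__":
    # 600 I/O per second (r+w/s) with a hypothetical 40 ms avwait + avserv
    print(outstanding_ios(600, 40))
```

If avwait is 0, the response time is just avserv, so the result approximates how many requests are being serviced concurrently.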

My theory is that you have many runaway Oracle full table scans using direct I/O (e.g. a forcedirectio mount or discovered direct I/O).
These processes contend for some IPC (semaphore) resource, which also increases system CPU usage.

The EVA seems healthy; ~200 MB/s sustained average throughput is not bad.

What you (or the DBA) can do:
- identify the runaway Oracle processes (e.g. with OEM or v$session)
- stop them (these processes have an adverse effect on overall performance)
- have the developers review the corresponding SQL for bugs (e.g. infinite loops)
- sometimes (re)indexing can help
Yazan Yacoub
Regular Advisor

Re: high avgservice time and avque value

Thanks, Pal, for your response. I am still struggling with this case and trying to find a solution.

Could you explain how you came up with points 1, 2, and 3?

Currently my target is to decrease the number of I/Os going to physical disk, and I am checking the SGA size. How can I tell how much of the Oracle buffer cache I am using, and what is the relationship between the buffer hit % and the buffer cache? Does the hit % tell me whether I have enough buffer cache?
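
For reference, the classic buffer cache hit ratio is computed from three v$sysstat statistics: 'physical reads', 'db block gets', and 'consistent gets'. A minimal sketch of the calculation, with made-up sample values and an invented function name:

```python
def buffer_cache_hit_ratio(physical_reads: int, db_block_gets: int, consistent_gets: int) -> float:
    """Classic ratio: 1 - physical reads / (db block gets + consistent gets)."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads <= 0:
        raise ValueError("no logical reads recorded")
    return 1.0 - physical_reads / logical_reads


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only
    ratio = buffer_cache_hit_ratio(physical_reads=1_000, db_block_gets=2_000, consistent_gets=8_000)
    print(f"hit ratio: {ratio:.1%}")
```

The counters are cumulative since instance startup, so it is usually more telling to compute the ratio from the difference between two samples taken during the busy period.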

Thanks