High IO Wait and High Service Time

 
Marcelo Bazan
New Member

sar -u reports high %wio (>30%), but when I use sar -d to look at avwait and avserv, avwait is 0.00 while avserv is greater than 100.00 for some disks.

I have read many posts about I/O problems, but none says whether this could be a patch problem.

How can I tell whether this is a patch problem or a real I/O problem?

#sar -u
00:00:00 %usr %sys %wio %idle
06:25:01 53 4 32 11
06:30:03 52 2 33 13
06:35:00 53 2 32 13
06:40:01 53 4 29 15
06:45:02 53 2 33 12
06:50:00 52 3 33 12


#sar -d

          device     %busy   avque   r+w/s  blks/s  avwait  avserv
06:00:01  c2t0d0      5.11    1.24      13      93    2.15    7.14
          c2t1d0      3.56    1.07      11      64    1.56    4.93
          c12t0d0     4.27    0.50       0       3    0.00  201.65
          c12t0d1     9.80    0.50       1      23    0.00  133.82
          c12t0d2    10.06    0.50       1      15    0.00  106.96
          c12t0d3    11.25    0.50       1      10    0.00  176.19
          c12t0d5    12.40    0.50       1      10    0.00  203.25
          c12t2d1     0.43    0.50       1       6    0.00    8.22
          c12t5d5     0.07    0.50       0       0    0.00  198.48
          c12t2d0     6.30    0.96       5     240    3.56   46.76
          c12t0d6     0.35    0.50       2      40    0.00    1.42
          c12t0d7    10.75    0.50       1      10    0.00  177.14
          c12t1d0     0.53    0.50       0       1    0.00   99.87
          c12t1d1     5.32    0.50       2      31    0.00   58.06
          c12t1d2     1.55    0.50       0      10    0.00   31.17
          c12t1d3    10.22    0.50       1      11    0.00  148.73
          c12t1d4     5.37    0.50       3      60    0.00   45.00
Steven E. Protter
Exalted Contributor

Re: High IO Wait and High Service Time

Shalom,

Generally, the response is to bring your system to a recent bi-annual patch set and then take the same measure again.

If the problem was purely from patches not installed, it should improve. If it persists, then other factors such as applications and disk layout are then pursued.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
skt_skt
Honored Contributor

Re: High IO Wait and High Service Time

Watch the output in sar_disk_out.

dedd13n:root [/home/kumarts/scripts] cat sar_scr
# Sample CPU, load, memory and disk statistics every 15 minutes.
LOGDIR=/home/kumarts/perf/SAR_out/logs
SEP="-----------------------------"
while true
do
    # CPU utilisation: 20 one-second samples
    sar 1 20 >> $LOGDIR/sar_out
    echo "$SEP" >> $LOGDIR/sar_out
    # Load average
    uptime >> $LOGDIR/uptime_out
    echo "$SEP" >> $LOGDIR/uptime_out
    # Virtual memory statistics
    date >> $LOGDIR/vmstat_out
    vmstat 1 20 >> $LOGDIR/vmstat_out
    echo "$SEP" >> $LOGDIR/vmstat_out
    # Disk activity: keep only the per-device averages
    date >> $LOGDIR/sar_disk_out
    sar -d 1 20 | grep -i average >> $LOGDIR/sar_disk_out
    echo "$SEP" >> $LOGDIR/sar_disk_out
    sleep 900
done


I also use the extract command to generate the data for a particular period and calculate the average service time for each job.

Marcelo Bazan
New Member

Re: High IO Wait and High Service Time


Thanks for the answers!

I want to find out whether I have a bug or a performance problem. Is it possible to restart the daemon that collects performance information while the machine is running (without downtime)? How can I do this?

I read in some posts that the performance commands get their statistics through pstat, but I can't find a specific daemon for it. I found statdaemon, but I'm not sure whether this is the daemon responsible for the performance statistics.

Re: High IO Wait and High Service Time

sar gets its data straight from the OS - there is no intermediate daemon to terminate/restart. Only a reboot will reset any errors in data structures that might account for this being a bug.

But why do you think this is a bug? What are the disks in question? Maybe you are overloading them or they have a problem?

HTH

Duncan


I am an HPE Employee
Tim Nelson
Honored Contributor

Re: High IO Wait and High Service Time

Marcelo,

Your output does look strange. The I/O rate is only about 1-5 requests per second, so I find it odd that service time would be in the hundreds.
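
To pick out the devices that show this pattern (very few requests per second but a long average service time), a small awk filter over the sar -d output can help. This is only a sketch: the function name flag_idle_slow and both thresholds are made up for illustration, and it assumes the usual HP-UX sar -d column order (device, %busy, avque, r+w/s, blks/s, avwait, avserv), with a timestamp or "Average" in front of some rows.

```shell
# Flag devices that look nearly idle yet report a long average service time.
# Assumed column order: device %busy avque r+w/s blks/s avwait avserv
# Rows that start with a timestamp or "Average" have 8 fields instead of 7.
flag_idle_slow() {
    # $1 = highest r+w/s still counted as "nearly idle" (hypothetical default 5)
    # $2 = lowest avserv in ms counted as "slow"        (hypothetical default 100)
    awk -v max_io="${1:-5}" -v min_serv="${2:-100}" '
        NF == 8 { off = 1 }            # leading timestamp or "Average"
        NF == 7 { off = 0 }            # continuation row, device comes first
        NF != 7 && NF != 8 { next }    # blank lines, separators, anything else
        {
            dev = $(1 + off); rw = $(4 + off); serv = $(7 + off)
            # Skip the header row and any other non-numeric line.
            if (rw !~ /^[0-9.]+$/ || serv !~ /^[0-9.]+$/) next
            if (rw + 0 <= max_io && serv + 0 >= min_serv)
                printf "%s r+w/s=%s avserv=%s\n", dev, rw, serv
        }'
}

# Possible use on the live system:
#   sar -d 1 20 | flag_idle_slow 5 100
```

Run against the sar -d figures in the original post, this would list LUNs such as c12t0d0 (0 requests per second, avserv 201.65) while leaving out the busier c2t0d0 and c12t2d0.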

What OS version and patch level are you currently on? If your patch bundles are more than a year old, you should probably update anyway.

kenj_2
Advisor

Re: High IO Wait and High Service Time

The long service times on essentially idle LUNs may be due to a new LVM polling feature introduced with:

PHKL_34518 11.11 LVM Cumulative Patch; LVM OLR; SLVM 16 Node
PHKL_34094 11.23 LVM Cumulative Patch

The problem has been observed on XP and Hitachi arrays - it could occur on other arrays. It is a cosmetic problem and does not slow down user IO requests. New patches have added a switch to the pvchange command to disable the feature.

The following patches were released to address the problem on HP-UX 11.23:

PHCO_35524 - 11.23 LVM commands patch
PHKL_36244 - 11.23 LVM Cumulative Patch

The following GR patches were released to address the problem on HP-UX 11.11:

PHKL_35970 11.11 LVM Cumulative Patch
PHCO_35955 11.11 LVM commands cumulative patch
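
A quick way to see which of these patches are present is to search the installed patch list. The helper below is a sketch, not an official tool: has_patches is a made-up name, and it only does a plain text match on whatever listing it is given, so it says nothing about whether a patch has been superseded by a later one.

```shell
# Report which of the named patches appear in a patch listing read from stdin.
# On HP-UX the listing would normally come from:  swlist -l patch
has_patches() {
    listing=$(cat)
    for p in "$@"; do
        if printf '%s\n' "$listing" | grep -q "$p"; then
            echo "$p installed"
        else
            echo "$p not found"
        fi
    done
}

# Possible use on an 11.11 system, with the patch IDs named above:
#   swlist -l patch | has_patches PHKL_34518 PHKL_35970 PHCO_35955
```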

===============================

In your original post your sar -u output showed 32% wio. %wio is a CPU metric. It is a subset of idle CPU. When the CPU state is sampled, if it is idle but there are *any* processes waiting for a disk IO to complete, then the CPU state is counted as wio.

On an otherwise idle system, if there is batch or backup activity, or a short-term burst of IO, then this metric will increase. If the CPU is active then %wio will go down, not because there are fewer processes waiting for IO but because there is less idle time.
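
Since %wio is counted out of idle time, it can help to read sar -u as "busy" (%usr + %sys) versus "not busy" (%wio + %idle). The sketch below does that arithmetic; cpu_split is a made-up name, and it assumes the five-column layout shown in the original post (time, %usr, %sys, %wio, %idle).

```shell
# Split each sar -u sample into busy CPU and total non-busy CPU.
# Assumed columns: time %usr %sys %wio %idle
cpu_split() {
    # Only rows whose second field is a number are samples; the header is skipped.
    awk 'NF == 5 && $2 ~ /^[0-9]+$/ {
        printf "%s busy=%d idle_total=%d (wio=%d)\n", $1, $2 + $3, $4 + $5, $4
    }'
}

# Possible use:
#   sar -u 1 20 | cpu_split
```

For the 06:25:01 sample in the original post (53 4 32 11) this gives busy=57 and idle_total=43, of which 32 is time when the CPU had nothing to run while at least one process was waiting on disk IO.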

The real question is whether your disk subsystem is performing well and whether your critical processes are waiting unnecessarily long for IO to complete. I would look at things like the logical IO response times for the critical processes, since this is what matters to the application. Do your applications record IO response times? Using the Glance Process screens (and a calculator) you can figure out logical response times.

Ken Johnson
Marcelo Bazan
New Member

Re: High IO Wait and High Service Time

Thanks for ALL the answers!

I looked at the details of patch PHKL_35970 and found this description:

"( SR:8606469034 CR:JAGag24285 )
On systems with patch PHKL_34518 or PHKL_34986 installed, measurement utilities such as sar(1m) may report high average I/O service times for idle multi-pathed disks.
Disks in use show no increase in average service time."

I don't have PHKL_34518 or PHKL_34986 installed, nor the recommended PHKL_35970, so I don't know whether this issue could be the cause of the high average I/O service times reported.

I searched the patches for avserv and found PHKL_33858, but I already have that one installed on my machine.

I'll look into how to compute the logical response times with Glance and a calculator.
I think this machine has not had any patches applied for a long time, but before applying any patch I want to be certain that it will resolve the reported problem.

Bazan