Operating System - HP-UX

Re: interpretation of 'sar -d'

 
Johan Barelds
Frequent Advisor

interpretation of 'sar -d'

Hi all,

Can someone give me some reference on the numbers below:
--
0:00:01  device     %busy  avque  r+w/s blks/s avwait avserv
Average  c1t2d0     12.42   1.12     44    391   5.91    3.9
Average  c2t2d0      7.66   1.59     20    196   6.93   5.31
Average  c7t9d0      9.24   2.26     19    313  10.53  11.39
Average  c7t11d0     7.28   1.61     35    808   8.09   4.66
Average  c5t9d0      6.22   3.44     11    201  13.57  15.55
Average  c5t11d0     3.35   3.89     11    398  13.93  10.75
-----------
For example, are the 'avserv' and 'avwait' times long compared to 'normal' disks?
I see the numbers, but I find it quite hard to interpret them in terms of load on the device.

Thanks!

Grz. Johan
Make my day..:-)
9 REPLIES 9
Sanjay_6
Honored Contributor

Re: interpretation of 'sar -d'

Hi Johan,

Here is a step-by-step guide on how to identify the cause of poor system performance:

http://www1.itrc.hp.com/service/cki/docDisplay.do?docLocale=en_US&docId=200000073548851

The itrc doc id is KBRC00014838.

Hope this helps.

Regds
Johan Barelds
Frequent Advisor

Re: interpretation of 'sar -d'

Hi Sanjay,

Thanks for the quick response.
Unfortunately I can't follow the link (I am a European user), and I can't find the doc on the European site either. Can you point me in the right direction?

Thanks.
Grz. Johan
Make my day..:-)
Sanjay_6
Honored Contributor
Solution

Re: interpretation of 'sar -d'

Hi Johan,

Try and see if you can open the doc using the doc id.

In the left-hand bar, go to:

Maintenance and support for HP products -> Search technical knowledge base -> Search by Doc ID, and enter the doc ID mentioned above.

You may need an HP support contract linked to your ID to view this doc.

You can also try this link from the HP docs site:

http://docs.hp.com/en/5990-8172/ch07s04.html

hope this helps.

regds

Chris Vail
Honored Contributor

Re: interpretation of 'sar -d'

In your sample, c5t11d0 has twice the average wait of c2t2d0. But what does this mean? For example, if c5t11d0 is twice the size of c2t2d0 (or larger), then double the average wait time might be reasonable. Further, are these SCSI-1, 2, 3, FW or any of the variants, or are they attached to a SAN over fiber or copper? All of these issues can account for the speed differences. Further, how fast are the disks? If they're all the same, and they're all attached the same way, and all the same size, and all have the same amount of data stored, and the application is requesting data from them equally, then indeed you might need to do some additional investigation. But without this information we can't tell you what the numbers themselves mean.

We need to put the sar data in context. We need to know what the application is, how it's laid out, the relative sizes and technologies, and the O/S version and patch levels.

The blocks per second differences between the disks are significant. The fastest is no more than twice as fast as the slowest. This tells me that the load is fairly well balanced, but that observation is made in a vacuum of real knowledge of the system. This may or may not be acceptable.

Get us some more information and we'll try to help you understand the readings.


Chris
Johan Barelds
Frequent Advisor

Re: interpretation of 'sar -d'

Hi Chris,

The disks are in use with LVM.
The setting is as follows:

Filesystem          kbytes      used    avail %used  Mounted on
/dev/vg00/lvol3     311296    159178   142627   53%  /
/dev/vg00/lvol1      94384     27968    56976   33%  /stand
/dev/vg00/lvol8    2048000   1539731   479991   76%  /var
/dev/vg00/lvol7    2048000   1265344   733796   63%  /usr
/dev/vg00/lvol6     131072     59782    67317   47%  /tmp
/dev/vg00/lvol5    1024000    432105   554973   44%  /opt
/dev/vg00/lvol4      40960     33431     7529   82%  /home
/dev/vg01/lvol1  143351808 116046000 27260304   81%  /data

The disks c1t2d0 and c2t2d0 belong to the vg00 group. The others to the vg01 group.
The vg01 group is a mirrored group.
The vg01 group has the following PV groups:
--- Physical volume groups ---
PVG Name PVG0
PV Name /dev/dsk/c7t9d0
PV Name /dev/dsk/c7t11d0

PVG Name PVG1
PV Name /dev/dsk/c5t9d0
PV Name /dev/dsk/c5t11d0

The disks are all in a disk cabinet and attached via SCSI (not sure which type).
See a diskinfo of one of them below:
--> diskinfo -v /dev/rdsk/c5t9d0
SCSI describe of /dev/rdsk/c5t9d0:
vendor: HP 73.4G
product id: ST373307LC
type: direct access
size: 71687369 Kbytes
bytes per sector: 512
rev level: HPC3
blocks per disk: 143374738
ISO version: 0
ECMA version: 0
ANSI version: 3
removable media: no
response format: 2
--
OS is HP-UX 11.00.
The application is put on the /data/ volume. vg00 is only used for system stuff.
Homedirs are also placed on /data.

Hope this gives you some more insight.
Grz. Johan
Make my day..:-)
Chris Vail
Honored Contributor

Re: interpretation of 'sar -d'

You mention in your later posting that the disks on c7 are mirrored, whereas the disks on c1 & c2 aren't. Note that the average wait time on the c7 disks is about double that of the disks on c1 & c2. This is reasonable for mirrored disks.

The tipoff is that the average blocks per second is higher on the mirrored disks than on the unmirrored ones. Again, this is probably due to the mirroring, but it really says that all the drives in your system are operating at about the same speed within a reasonable tolerance. I don't see a technical problem here.
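
As a rough check on that comparison, the averages from the sar output earlier in the thread can be grouped by volume group. This is only a sketch: the device-to-group mapping follows Johan's description (c1t2d0 and c2t2d0 in vg00, the other four disks in the mirrored vg01), and the helper name `mean_by_group` is made up for illustration.

```python
from statistics import mean

# (device, avwait in ms, blks/s) copied from the "Average" rows of the sar -d output
SAR_AVERAGES = [
    ("c1t2d0", 5.91, 391),
    ("c2t2d0", 6.93, 196),
    ("c7t9d0", 10.53, 313),
    ("c7t11d0", 8.09, 808),
    ("c5t9d0", 13.57, 201),
    ("c5t11d0", 13.93, 398),
]

# Volume-group membership as described in the thread (vg01 is the mirrored group)
VOLUME_GROUP = {
    "c1t2d0": "vg00", "c2t2d0": "vg00",
    "c7t9d0": "vg01", "c7t11d0": "vg01",
    "c5t9d0": "vg01", "c5t11d0": "vg01",
}

def mean_by_group(rows, column):
    """Average one column of the rows per volume group (hypothetical helper)."""
    grouped = {}
    for row in rows:
        grouped.setdefault(VOLUME_GROUP[row[0]], []).append(row[column])
    return {vg: mean(values) for vg, values in grouped.items()}

avwait = mean_by_group(SAR_AVERAGES, 1)
blks = mean_by_group(SAR_AVERAGES, 2)
for vg in sorted(avwait):
    print(f"{vg}: avwait {avwait[vg]:.2f} ms, {blks[vg]:.1f} blks/s per disk")
print(f"avwait ratio vg01/vg00: {avwait['vg01'] / avwait['vg00']:.2f}")
```

With these six rows the mirrored group averages roughly 11.5 ms of wait against roughly 6.4 ms for vg00, a ratio of about 1.8.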

Now, if your users are complaining that the system is too slow, this is another thing altogether. The problem might be disk speed, but there are other issues to look at first.

Re-run the sar command when the users are complaining and compare with the same output when they're not. If there are no significant differences, then the speed problem is NOT disk speed, but somewhere else.
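
One way to make that comparison less of an eyeball exercise is to put the two sets of averages side by side and flag only the metrics that grew noticeably. The sketch below assumes you have already copied the per-device averages into dictionaries. The function name `flag_changes`, the 1.5x threshold, and the sample numbers are all hypothetical and not taken from this system.

```python
def flag_changes(baseline, busy, threshold=1.5):
    """Return (device, metric, ratio) for each metric that grew by at least `threshold` times."""
    flagged = []
    for device, base_metrics in baseline.items():
        busy_metrics = busy.get(device, {})
        for metric, base_value in base_metrics.items():
            busy_value = busy_metrics.get(metric)
            # Skip metrics missing from the busy sample or with no usable baseline
            if busy_value is None or base_value <= 0:
                continue
            ratio = busy_value / base_value
            if ratio >= threshold:
                flagged.append((device, metric, round(ratio, 2)))
    return flagged

# Hypothetical averages for illustration only (NOT measurements from the thread)
quiet = {"c7t9d0": {"%busy": 9.0, "avwait": 10.0, "avserv": 11.0}}
complaining = {"c7t9d0": {"%busy": 10.0, "avwait": 11.0, "avserv": 12.0}}

changes = flag_changes(quiet, complaining)
print(changes if changes else "no significant disk differences; look elsewhere")
```

If nothing is flagged between the quiet and the complaining sample, the disks are an unlikely culprit.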

I've usually found that speed complaints are due to poor application design rather than poor systems design. If the application is designed poorly, then no amount of hardware thrown at the problem will ever speed it up significantly. OTOH: I've seen at least one instance where a database re-org resulted in a 20x increase in speed, without changing any hardware at all.

So the next questions are:
Are the users complaining?
What is the application?
Have you tried glance, vmstat, or other data collected by sar? What do these show? Compare heavily laden with unladen performance numbers.

But as for the core of your question: I don't see a problem.


Chris
Johan Barelds
Frequent Advisor

Re: interpretation of 'sar -d'

Hi Chris,

Thanks again for your clear interpretation.
I spent all of yesterday investigating and came to the conclusion that we have a CPU problem rather than an I/O problem.
Thanks to the excellent guide pointed out by Sanjay I was able to do some good troubleshooting, and after a whole day with sar, iostat and vmstat it is quite obvious that we need more processing power or (if possible) to optimize the queries from our application.

It was a good exercise, and I would like to thank you all for contributing your experiences and information.

Grz. Johan
Make my day..:-)
Chris Vail
Honored Contributor

Re: interpretation of 'sar -d'

Before purchasing additional or faster CPUs, I urge you to investigate tuning your application. Even a slow PA-RISC processor is still pretty fast.
I've been doing this long enough (20+ years) that whenever anyone complains about speed, my first line of inquiry is application tuning. I have very, very rarely seen a long-running application suddenly get slow just because the CPU is too slow.
Just today I had a complaint about an elderly IBM machine that was too slow. After spending the morning investigating, I found that they had just put in a new SQL query that was taking too long. They blamed it on the computer rather than their query. I watched it as it was running, and it ran out of memory, not out of CPU ticks. The system is so old that the OS is EOL, and parts are scarce. I told them they had to either fix their query or buy new hardware. They fixed the query. Now the system is running fine.
I say that 90%+ of all speed issues are application issues rather than hardware or O/S issues. It's a lot easier to blame the computer/OS than it is to write efficient code. It's not that the computer/OS is NEVER at fault, but rather that it's so infrequent that it's almost not worth bothering with.

Chris
Florian Heigl (new acc)
Honored Contributor

Re: interpretation of 'sar -d'

Hi,

maybe you could use some numbers to compare...

I've simply run
time find / 2>/dev/null 1>/dev/null & sar -d 30 1
for explanation:
- my system has two stone-aged 4.3GB SCSI drives; only one of them gets searched during those 30 seconds.
- I ran this right after boot, so that the buffer cache is not too much an issue.
- please understand that even a blazingly fast 15k drive would show 75% busy in that situation, but it would be done with the task in 1/10 of the time. Your disks don't show high busy rates or extreme queue lengths, so I wouldn't point at them first.
Those disks don't have much to do at all.
(see below)

HP-UX snowwhit B.11.11 U 9000/800 01/13/05

23:35:57 device     %busy  avque  r+w/s blks/s avwait avserv
23:36:07 c0t6d0     69.73   1.01    101    978  10.66  17.33

high busy, low read ratio, few blocks/sec -> this disk barely keeps up with its workload.
That's not the case with yours, it seems to me.
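
To put a number on "few blocks/sec", two values can be derived from a single sar -d row: the average size of each I/O and the total time a request spends waiting plus being serviced. This is a sketch under one assumption: that blks/s is reported in 512-byte units, which is how I understand the HP-UX sar man page (check it on your release). The function name `derive` is made up for illustration.

```python
BLOCK_BYTES = 512  # ASSUMPTION: sar -d counts blks/s in 512-byte units

def derive(io_per_s, blks_per_s, avwait_ms, avserv_ms):
    """Derive average I/O size (KiB) and total response time (ms) from one sar -d row."""
    return {
        "kib_per_io": blks_per_s * BLOCK_BYTES / io_per_s / 1024,
        "response_ms": avwait_ms + avserv_ms,
    }

# c0t6d0 during the find run above, and Johan's busiest disk (c1t2d0) for comparison
mine = derive(io_per_s=101, blks_per_s=978, avwait_ms=10.66, avserv_ms=17.33)
johan = derive(io_per_s=44, blks_per_s=391, avwait_ms=5.91, avserv_ms=3.9)

for name, d in (("c0t6d0", mine), ("c1t2d0", johan)):
    print(f"{name}: {d['kib_per_io']:.1f} KiB per I/O, {d['response_ms']:.2f} ms per request")
```

Both disks are doing small I/Os of a few KiB each, but a request on c0t6d0 takes close to three times as long from queue to completion.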

to get the raw disk throughput i often just use the below.
snowwhite:/var/adm/syslog# time dd if=/dev/rdsk/c0t6d0 of=/dev/null bs=1024k >
100+0 records in
100+0 records out

real 0m21.39s
that's 5MB/s; the disk isn't exactly fast, but it's within its specs.

@work current drives spit out 75MB/s, unless 'someone' connected the tape/cdrom to the fast U160 hba and the disks to UW.

With real applications the transfer rate is not that much of an issue, but it still means that a current disk would have needed only 1/15 of the time mine did. As simple as that.
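
The arithmetic behind those figures, as a small sketch. It assumes that bs=1024k means 1024 * 1024 bytes per record and takes the 100 records and the 21.39 s "real" time from the dd output above. The function name is made up for illustration.

```python
def throughput_mb_per_s(records, block_bytes, seconds):
    """Sequential throughput in MB/s (1 MB = 1,000,000 bytes) for a timed dd run."""
    return records * block_bytes / seconds / 1_000_000

# 100 records of 1024k each, read in 21.39 s of wall-clock time
old_disk = throughput_mb_per_s(records=100, block_bytes=1024 * 1024, seconds=21.39)
current_disk = 75.0  # MB/s figure quoted for current drives

print(f"old disk: {old_disk:.1f} MB/s")
print(f"a current disk is about {current_disk / old_disk:.0f}x faster")
```

This gives about 4.9 MB/s for the old drive and a factor of about 15 against a 75MB/s drive.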

anyway, like it's already been said - look at the application (maybe even think about using tusc to gather some low-level data), the disks may be a bottleneck, but in Your sar output they don't even get the chance to prove it. :)
yesterday I stood at the edge. Today I'm one step ahead.