Re: SAR vs Glance

 
Bhaskar Luitel
Advisor

SAR vs Glance

I am running sar as well as glance (both at a 5 sec interval) against some XP256 LUNs, and I see a drastic difference in the outputs.

sar shows 'avserv' (avg service time) in the range of 40-50,
whereas glance shows it in the 3-4 range. Which one should I believe?

I have heard some people say that Glance takes some proprietary HP stuff into account. Should that make such a huge difference? Please let me know if anyone has encountered this issue before or knows the answer.

thanks in advance.
14 REPLIES
Andy Monks
Honored Contributor

Re: SAR vs Glance

Typically, I'd always believe glance. It's had some special hooks written into the kernel, whereas sar uses the pstat() system call. At the end of the day, they 'should' both get the same information, and from previous experience looking at where they get the info, they are usually pretty close. However, I do believe that the averages may be worked out differently.

The other thing might be that the two use different start times. For an array, averages of 30-40 seem a bit high unless it's being hammered. Could you post some of the sar and glance data so I can have a look?
Bhaskar Luitel
Advisor

Re: SAR vs Glance

The array was hammered until yesterday. We made some changes to the array config (created a LUSE out of 6 LUNs, with FS/LVM on top of that) and now glance shows a huge difference in 'avserv'.

But somehow sar (so far) is not able to capture it. The following is the output of sar -d 5 5 (the averages):
14:39:57   device    %busy  avque  r+w/s  blks/s  avwait  avserv

           c7t9d4   100.00   1.40    229    3818    7.12   39.93
           c9t10d2  100.00   1.20    260    4272    9.54   52.68


thanks for your help

Alan Riggs
Honored Contributor

Re: SAR vs Glance

I also would tend to favor glance, but I would check to make sure that you have the latest glance patches applied to your system. It would be helpful to look at some side-by-side data comparisons. Are your users experiencing any noticeable delays in I/O response?
Andy Monks
Honored Contributor

Re: SAR vs Glance

Looks like sar needs to go back to school :-

c7t9d4 100.00 1.40 229 3818 7.12 39.93
c9t10d2 100.00 1.20 260 4272 9.54 52.68

the avserv is meant to be avwait / avq, which isn't 39.93 or 52.68. It should be 5 and 7.95.

Either that, or they have changed the formula since I last looked at the source code.

Therefore believe GLANCE :-) and look for sar patches
Bhaskar Luitel
Advisor

Re: SAR vs Glance

User response time is normal, as before, but most of the night jobs did not finish. This is where we had the problem earlier too.
And to us this is boolean: finished or not finished. We have no way of telling which stage they are in.

The 'avserv' time shown by sar is the same as before. Even when the array was being hammered, online users did not notice; only many of the batch jobs could not complete.
Bhaskar Luitel
Advisor

Re: SAR vs Glance

What would be the difference that only glance could see and not sar? Does any sar patch address this issue? I checked the definition in sar (man pages) as well as in glance (help pages); it is the same.

thanks for your help
Andy Monks
Honored Contributor

Re: SAR vs Glance

While getting a beer from the fridge, I suddenly realized I divided when I should have multiplied.

avserv = avq * avwait. Still, sar is way out!
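
The arithmetic in the two attempts above can be checked with a minimal Python sketch that applies both guessed formulas (avwait / avque and avque * avwait) to the two sar rows posted earlier. The helper and variable names are made up for illustration; only the numbers come from the posted sar output.

```python
# Rows copied from the posted `sar -d 5 5` averages:
# (device, avque, avwait, avserv)
rows = [
    ("c7t9d4", 1.40, 7.12, 39.93),
    ("c9t10d2", 1.20, 9.54, 52.68),
]

def candidates(avque, avwait):
    """Return the two guessed values for avserv (illustrative helper)."""
    return avwait / avque, avque * avwait

for device, avque, avwait, avserv in rows:
    divided, multiplied = candidates(avque, avwait)
    print(f"{device}: avwait/avque={divided:.2f}  "
          f"avque*avwait={multiplied:.2f}  reported avserv={avserv:.2f}")
```

Dividing gives about 5.09 and 7.95, multiplying gives about 9.97 and 11.45, and neither is close to the reported 39.93 and 52.68.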
Anthony deRito
Respected Contributor

Re: SAR vs Glance

This information may not be of help to you, but I have just compared the disk service times of several disks on my system and the values are almost equal for both sar and glance. We are using:

Glance/Plus B3692A version C.02.15.00

sar:
/usr/bin/sar:
$Revision: 76.3.1.7 $
PATCH_10_20: sar.o 98/04/23

What versions are you using?
Alan Riggs
Honored Contributor

Re: SAR vs Glance

I don't believe you have the correct information on sar, Andy. My understanding is that avwait refers to the time a request waits in queue before gaining access to the disk. Avserv refers to the time it takes to service each request once it reaches the disk controller (seek, rotation, data transfer).

So . . . sar may still be misreporting, but it is not possible to determine that just from those two lines. Can you post the glance data and sar data side-by-side for the LUNs in question?
Andy Monks
Honored Contributor

Re: SAR vs Glance

I definitely hadn't looked at the source for a while.

Alan is right. avserv is completely different and can't be calculated as I did.

So, it's likely that it's related to the different ways they obtain their data.
Alan Riggs
Honored Contributor

Re: SAR vs Glance

Actually, now that I think of it, what are you looking at in glance to determine the average service time? I do not recall that metric being available in glance.
Bhaskar Luitel
Advisor

Re: SAR vs Glance

There is a metric available called 'dsk avg svctime', the help page of which describes it in the same way as sar -d's. However, this metric wasn't enabled by default.


The following are some numbers from glance and sar:

device name  dsk_avg_svctime  util%  phys/i/o

0/2/0/0 ...             4.64   100%     215.4  19.2
1/0 .....               4.15   100%     220.2  16.2




sar -d

device    %busy  avque  r+w/s  blks/s  avwait  avserv
c7t9d4   100.00   3.68    247    4062   20.82   63.24
c9t10d2   99.20   3.50    202    3372   11.13   30.45


These device files map to the HW addresses above (in glance).
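
One rough way to compare the two sets of numbers is Little's law: the average number of requests in service equals the request rate multiplied by the average time each request spends in service. The sketch below applies it to the figures above. It assumes that the glance column after util% is physical I/Os per second and that both tools report service time in milliseconds; both are assumptions, as is every name in the code.

```python
# Little's law: requests in service = rate (per second) * service time (seconds).
# Figures copied from the post above; service times are assumed to be milliseconds.
samples = [
    # (source, device, ios_per_sec, svc_time_ms)
    ("sar",    "c7t9d4",      247.0, 63.24),
    ("sar",    "c9t10d2",     202.0, 30.45),
    ("glance", "0/2/0/0 ...", 215.4, 4.64),  # rate column assumed to be I/Os per second
    ("glance", "1/0 .....",   220.2, 4.15),
]

def in_service(ios_per_sec, svc_time_ms):
    """Average number of requests in service implied by Little's law."""
    return ios_per_sec * svc_time_ms / 1000.0

for source, device, rate, svc in samples:
    print(f"{source:6} {device:12} implied requests in service: "
          f"{in_service(rate, svc):.2f}")
```

Under those assumptions the glance figures imply roughly one request in service (about 1.00 and 0.91), while the sar figures imply about 15.6 and 6.2 requests in service at once. That would be consistent with the two tools measuring different things, though it does not by itself say which one is right.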

Bhaskar Luitel
Advisor

Re: SAR vs Glance

Glance C.02.30.000 HP GlancePlus/UX
sar:

-r-xr-xr-x 1 bin bin 36864 May 12 1998 sar
ftwhpn3:/usr/sbin# what sar
sar:
$Revision: 82.1.1.3 $
PATCH_11_00: sar.o 98/05/12

thanks for your help
Alan Riggs
Honored Contributor

Re: SAR vs Glance

Ah, thanks for the info. I have never investigated non-default glance monitors before. From a quick look, I think that the metrics might actually be measuring different things. Glance uses the midaemon, which pulls information from the KI trace buffers. Sar uses pstat, which queries kernel counters.

The glance help page for BYDSK_AVG_SERVICE_TIME states: "service time is measured from the perspective of the kernel, not the disk device itself. For example, if a disk device can find the requested data in its cache, the average service time could be quicker than the speed of the physical disk hardware." Sar, I believe, reports only physical disk queries in the sar -d output. If my understanding is correct, then what you are seeing represents the difference between service time for all requests, including those that hit cache, and service time for requests that go to disk.

Which raises the question: how are your cache hit rates, especially for the period of your large nightly batch jobs?
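
To illustrate why the cache hit rate matters here, this is a minimal sketch of how cache hits pull down an average taken over all requests. The hit ratios and the two service times are hypothetical values chosen only for illustration, not measurements from this system, and the function name is made up.

```python
def blended_service_time(hit_ratio, cache_ms, disk_ms):
    """Average service time over all requests, given the fraction served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms

# Hypothetical inputs: 1 ms for a cache hit, 40 ms for a request that goes to disk.
for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    avg = blended_service_time(hit_ratio, cache_ms=1.0, disk_ms=40.0)
    print(f"hit ratio {hit_ratio:.2f}: average service time {avg:.2f} ms")
```

With these made-up inputs, a 90% hit ratio gives an average of about 4.9 ms, so an average over all requests can sit far below the time taken by requests that actually reach the disk.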