bogus Sar output

 
Theresa Patrie
Regular Advisor


Hi All,
I am writing a script to analyze the last two 15 minute increments for CPU usage. I have been stripping out the last two timestamped lines from a sar command and all has been working, or so I thought. I check at the top of the hour, quarter past, half past and then three-quarters past. At every three-quarters past, some of my machines report a bogus entry and others do not. Here is an example of what I am seeing:

edlhp123-root $ remsh edlhp119 sar
GOOD-----------------------------------------
HP-UX edlhp119 B.11.11 U 9000/785 10/25/02

12:00:00    %usr    %sys    %wio   %idle
12:15:00       0       0       0     100
12:30:00       0       0       0     100
12:45:00       0       0       0     100

Average        0       0       0     100
edlhp123-root $ remsh edlhp120 sar
BAD!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
HP-UX edlhp120 B.11.11 U 9000/785 10/25/02

12:00:00    %usr    %sys    %wio   %idle
12:15:00       0       2       9      90
12:30:00       0       0       0      99
12:45:00       0       0       0     100
12:00:00     100     100     100     100

Average        0       0       0       0
----------------------------------------------
The results for edlhp119 are as I would expect, but the results for edlhp120 show an entry at the end for 12:00 with 100% everywhere and the average is wrong too. Anybody know what could be happening here?? It only happens after the XX:45 entry and on some machines some of the time. I'll be leaving soon, but look forward to reading the replies on Monday.
Thanks in advance!
Theresa
This is my easy job!
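
The approach described in this post, taking the last two timestamped lines from sar, can be sketched as a small filter. This is only an illustration and not Theresa's actual script: the function name last_two_valid and the 98 to 102 tolerance are assumptions, chosen so that rounded rows such as "0 2 9 90" pass while rows such as "100 100 100 100" are dropped.

```shell
#!/bin/sh
# last_two_valid (hypothetical name): read `sar` CPU output on stdin and
# print the last two timestamped samples whose four percentages add up
# to roughly 100.
last_two_valid() {
    awk '
        # A sample line is HH:MM:SS followed by four numeric fields.
        # The header line is skipped because %usr is not numeric.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF == 5 && $2 ~ /^[0-9]+$/ {
            sum = $2 + $3 + $4 + $5
            # Rounding lets a good row total 99 or 101, so allow some slack
            # (the 98..102 window is an assumption).
            if (sum >= 98 && sum <= 102) {
                prev = last
                last = $0
            }
        }
        END {
            if (prev != "") print prev
            if (last != "") print last
        }
    '
}
```

It could be used as `remsh edlhp120 sar | last_two_valid`. Filtering by row total only hides the bogus row; it does not repair the Average line that sar prints.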
Theresa Patrie
Regular Advisor

Re: bogus Sar output

I am surprised that I haven't gotten any responses. Hasn't anybody else seen bad sar data?? I have been digging a little and have found that the bogus data seems to follow a reboot. I am running sar two different ways and both catch the errors. My crontab looks like this:
0 6-18 * * 1-5 /usr/lib/sa/sa1 900 4
5 18 * * 1-5 /usr/lib/sa/sa2 -s 7:00 -e 18:01 -i 900 -A

I have attached the sar29 output from October 29. I cannot figure out where these values are coming from. Note the data at 14:15, 16:00 and 18:00. One other interesting point is the reboot status from shutdownlog. Notice the time of the reboots.

13:53 Tue Oct 29, 2002. Reboot: (by mrl-ws5!root)
14:56 Tue Oct 29, 2002. Reboot: (by mrl-ws5!root)
15:23 Tue Oct 29, 2002. Reboot: (by mrl-ws5!root)
16:46 Tue Oct 29, 2002. Reboot: (by mrl-ws5!root)
16:59 Tue Oct 29, 2002. Reboot: (by mrl-ws5!root)

(These are valid reboots being done by another sysadmin). Obviously, a reboot that leaves the workstation down at the top of the hour will prevent sar from running for the following hour, but it doesn't seem like a reboot during the previous hour should corrupt the sar data for the next cycle. Anybody have any ideas??
This is my easy job!
John Poff
Honored Contributor

Re: bogus Sar output

Hi Theresa,

I saw your original post and I didn't respond because I didn't have anything worthwhile to add.

We had several systems down on Saturday for maintenance, so I looked at one of them (rp8400 running 11.11) and I found the same symptoms in 'sar' after a reboot. I found a patch (PHCO_24477) for sar for 11.11 but it doesn't seem to address the issue. I've attached the output from sar from our box, which was down for a few hours and rebooted in the afternoon. Note the entries for 13:00. Weird!

Maybe time to call HP?

JP
John Poff
Honored Contributor

Re: bogus Sar output

Hi again,

Oops. Forgot to post my 'sar' attachment. Here it is.

JP
James Murtagh
Honored Contributor

Re: bogus Sar output

Hi Theresa,

I have seen a few occurrences of these things in the past, mostly fixed with patches.

There is a patch on 11i for bad values of avque and avwait which you are seeing in your results. Whether they are causing the other results to be out I don't know.

PHKL_27200 s700_800 11.11 sar shows incorrect values for avwait, avque

Also, the results consistently show high disk I/O and filesystem I/O values, with buffer cache read and write hit ratios of 0% in every case where the high activity is seen.

I have set up your cron jobs on my 11i workstation to see if I can reproduce this.

Regards,

James.
Theresa Patrie
Regular Advisor

Re: bogus Sar output

James & John,
Thanks for your replies...they were both helpful. I think I will apply the PHKL patch first and see if it clears up. If not, I will apply the PHCO patch as well.
James...good find on PHKL_27200. I couldn't find that on a patch search, but it is on the FTP site. Also, if you are trying to reproduce this, you'd have to do a few reboots because I think that is the only time it happens.
I'll post again when I find out how the patches work.
Thanks,
Theresa
This is my easy job!
Theresa Patrie
Regular Advisor

Re: bogus Sar output

Well, we already had PHCO_24477 installed as part of a bundle, so that does not fix it. I installed PHKL_27200 and still see the bogus data in sar after a reboot. Time to call HP!
This is my easy job!
steven Burgess_2
Honored Contributor

Re: bogus Sar output

Theresa

James does in fact work for HP

Regards

Steve
take your time and think things through
James Murtagh
Honored Contributor

Re: bogus Sar output

Ah, rumbled again.

I've managed to reproduce this on my system, after a reboot as predicted. I'll look into this a bit more now.

Also, I'll check tomorrow at work if this problem has been reported and a fix is on the way, or will raise the change request if not.
Vincente Fernandes
Valued Contributor

Re: bogus Sar output

Check the /var/adm/sa directory. Are there files like "sa30" for Oct 30? Are these files being created on a daily basis? You can use the scripts from /usr/lbin/sa in your cron table and they will automatically create the files for you every day.
I have seen the problem before. The immediate solution is to remove or copy the bad sar file and run the script /usr/lbin/sa/sa1, which will create a new sar data file under the /var/adm/sa directory.
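
The step suggested in this reply, setting the bad data file aside before running sa1 again, might look like the sketch below. The helper name set_aside_sar_file and the ".bad" suffix are made up for illustration; only the /var/adm/sa and /usr/lbin/sa/sa1 paths come from the reply.

```shell
#!/bin/sh
# set_aside_sar_file DIR DAY (hypothetical helper): rename DIR/saDAY to
# DIR/saDAY.bad so the next collector run starts a fresh data file.
# Does nothing if the file is not there.
set_aside_sar_file() {
    dir=$1
    day=$2
    file="$dir/sa$day"
    if [ -f "$file" ]; then
        mv "$file" "$file.bad"
    fi
}

# Example, run as root on the affected machine:
#   set_aside_sar_file /var/adm/sa 30 && /usr/lbin/sa/sa1
```

Renaming rather than deleting keeps the suspect data available if HP support asks for it.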
Theresa Patrie
Regular Advisor

Re: bogus Sar output

Good work Steven...I was tempted to give you 10 points for that one!! James...you should show that sought after HP logo with pride!! Thanks for checking into it. I have entered a call to the support center.
Vincente...I have just started the sar reporting, so all the files in /var/adm/sa are newly created files. Something else is going on here.
Thanks All. I'll post any info I get from the support center.
This is my easy job!
James Murtagh
Honored Contributor

Re: bogus Sar output

Hi Theresa,

I've done a bit of checking....looks like the problem lies with the restart record, something that has happened in all other major hpux releases too. See PHCO_25174 for hpux 11.00 for a full description.

Think the hp logo is only for permanent staff! :-(

Cheers,

James.
James Murtagh_1
New Member
Solution

Re: bogus Sar output

Hi Theresa,

No doubt you will already have an answer from HP, but here is the workaround:

Create a startup script in /sbin/init.d with the following input, or tailor it as you see fit :

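#!/sbin/sh
# Append a restart record to today's sar data file at boot.

case "$1" in
start_msg)
    echo "Start sar"
    ;;

stop_msg)
    echo "Stop sar"
    ;;

start)
    # Count who -r lines showing run level 2, 3 or 4 entered from S or 1,
    # which is what a fresh boot looks like.
    MATCH=`/usr/bin/who -r | /usr/bin/grep -c "[234][ ]*0[ ]*[S1]"`
    if [ "${MATCH}" -eq 1 ]
    then
        # With only a file argument, sadc writes a restart record.
        /usr/lbin/sa/sadc /var/adm/sa/sa`date +%d`
    fi
    ;;
esac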
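
To see what the grep pattern in the start) branch accepts, it can be tried against sample who -r lines. The sketch below is a check under assumptions: the function name is_fresh_boot is made up, and the two sample lines are a guess at the usual who -r layout (run level, date, time, then current state, change count and previous state), not output captured from these machines.

```shell
#!/bin/sh
# is_fresh_boot (hypothetical name): read `who -r` output on stdin and
# succeed if exactly one line matches the pattern from the startup script.
is_fresh_boot() {
    [ "$(grep -c '[234][ ]*0[ ]*[S1]')" -eq 1 ]
}

# Assumed sample lines, for illustration only.
boot_line='.       run-level 3  Nov  8 08:11     3      0    S'
later_line='.       run-level 3  Nov  8 09:30     3      1    2'

if printf '%s\n' "$boot_line" | is_fresh_boot; then
    echo "boot line: would write a restart record"
fi
if ! printf '%s\n' "$later_line" | is_fresh_boot; then
    echo "later run-level change: would do nothing"
fi
```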

Change the permissions to enable it to be executed.

Link it to, say, /sbin/rc2.d/S669perf.

Upon reboot this will now add a restart record to the sar data and you should now see normal values.

The problem had to do with the pstat() function taking the old values from the sar file.
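
The permission and link steps above could be done along these lines. The post does not name the script file, so /sbin/init.d/sar_restart in the example is a placeholder, and install_rc_link is a made-up helper name; only the /sbin/rc2.d/S669perf link name comes from the post.

```shell
#!/bin/sh
# install_rc_link SCRIPT LINK (hypothetical helper): make SCRIPT
# executable, then create LINK as a symbolic link pointing to it.
install_rc_link() {
    script=$1
    link=$2
    chmod 555 "$script" || return 1
    ln -s "$script" "$link"
}

# Example, run as root (the script name is a placeholder):
#   install_rc_link /sbin/init.d/sar_restart /sbin/rc2.d/S669perf
```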

Regards,

James.


Theresa Patrie
Regular Advisor

Re: bogus Sar output

Thanks for the info James. The response from HP was that it is a known issue, JAGaa72902, that is being worked. They will contact me when they have a solution. I will definitely try your workaround because I don't know how long it will take HP to resolve this. Thank you very much!
Theresa
This is my easy job!
Theresa Patrie
Regular Advisor

Re: bogus Sar output

Hi James,
I've implemented your workaround on one of our test stations. I am not sure which data I prefer, with or without the workaround. I guess it is a toss-up because both seem to produce bad data. Neither of these machines has anybody logged in or much of anything running, yet I still see weird numbers after a reboot. I will cross my fingers for a fast fix from HP. Just thought you might like to see the data. Thanks again, Theresa

-------------WITH WORKAROUND IMPLEMENTED---------------------
mrl-ws5-root $ sar

HP-UX mrl-ws5 B.11.11 U 9000/715 11/08/02

06:00:00    %usr    %sys    %wio   %idle
06:15:01       0       0       0     100
06:30:00       0       0       0     100
06:45:00       0       0       0     100
07:00:00       0       0       0     100
07:15:00       0       0       0     100
07:30:01       0       0       0     100
07:45:00       0       0       0     100
08:00:00       0       0       0      99
08:11:01 HP-UX restarts
              21      30      23      26
08:15:00      71      23       2       4
08:30:00      23       4       0      72
08:45:00       0       0       0     100

Average        5       2       0      93
mrl-ws5-root $ ll /etc/rc.log
-rw-r--r-- 1 root root 18301 Nov 8 08:11 /etc/rc.log
mrl-ws5-root $ tail -n 1 /etc/shutdownlog
08:07 Fri Nov 8, 2002. Reboot: (by mrl-ws5!root)

-------------WITHOUT WORKAROUND IMPLEMENTED---------------------
# sar

HP-UX mrl-ws1 B.11.11 U 9000/715 11/08/02

06:00:16    %usr    %sys    %wio   %idle
06:15:15       0       0       0     100
06:30:16       0       0       0     100
06:45:15       2       2       0      96
07:00:16       0       0       0     100
07:15:15       0       0       0     100
07:30:15       0       1       0      99
07:45:15       2       2       0      96
08:00:16       0       0       0      99
08:15:16     201     100     100     100
08:30:16      68      12       5      15
08:45:15       0       0       0     100

Average      201     100     100     100
# ll /etc/rc.log
-rw-r--r-- 1 root root 62362 Nov 8 08:17 /etc/rc.log
# tail -n 1 /etc/shutdownlog
08:07 Fri Nov 8, 2002. Reboot: (by mrl-ws1!root)
This is my easy job!
James Murtagh
Honored Contributor

Re: bogus Sar output

Hi again Theresa,

To be honest, the figures you reported after the work-around look OK. As the system starts up there will be a lot of IO and system calls as all the daemons and rc scripts are started.

You may want to move the startup script that we created to a higher run level, possibly after CDE is started in rc3, which may give you lower results as most of the system activity will already have taken place.

Regards,

James.
Theresa Patrie
Regular Advisor

Re: bogus Sar output

Hi James,
Yes, I realize that what I am seeing is the system startup, but what I am really after is how much the machines are being used by the user community. Those startup values skew my data because I am basically focusing on the %idle parameter.
Thanks again!
Theresa
This is my easy job!
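
One possible way to keep startup activity out of a %idle figure, assuming the workaround is in place so that sar prints a "restarts" line, is to skip a few samples after each restart when averaging. This is a sketch only: the function name idle_average and the default of skipping one sample are assumptions, not something proposed in the thread.

```shell
#!/bin/sh
# idle_average [SKIP] (hypothetical name): read `sar` CPU output on stdin
# and print the mean %idle, ignoring SKIP timestamped samples after each
# "restarts" line. SKIP defaults to 1.
idle_average() {
    awk -v skip="${1:-1}" '
        # A restart record: start skipping the samples that follow it.
        /restarts/ { pending = skip; next }
        # A sample line is HH:MM:SS plus four fields with a numeric %idle.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && NF == 5 && $5 ~ /^[0-9]+$/ {
            if (pending > 0) { pending--; next }
            total += $5
            n++
        }
        END { if (n > 0) printf "%.1f\n", total / n }
    '
}
```

For example, `sar | idle_average 2` would leave out the two 15-minute samples that follow each reboot. Without the restart record there is nothing for the filter to key on, so it does not help on machines that lack the workaround.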