
Paula J Frazer-Campbell
Honored Contributor

Monitoring systems

Hi to all

I am writing a script to monitor system resources and mail me warnings if certain parameters are exceeded.

My list, limit and collection method so far is :-

= 0% idle on CPU (sar -u)
> 80% user on CPU (sar -u)
> 15% wio (sar -u)
> 50% busy (sar -d)
avwait > avserv (sar -d)
>= 90% rcache (sar -b)
>= 70% wcache (sar -b)
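
The CPU limits in the list above could be checked along these lines. This is only a minimal Python sketch, not Paula's script: it assumes a `sar -u` data line has the columns time, %usr, %sys, %wio, %idle, and the name `check_cpu`, the `CPU_LIMITS` table and the sample line are made up for illustration.

```python
# Hypothetical limits copied from the list above; adjust to suit your systems.
CPU_LIMITS = {"usr": 80, "wio": 15}  # warn when the value is above the limit


def check_cpu(sar_line):
    """Return a list of warnings for one data line of `sar -u` output.

    Assumes the columns are: time %usr %sys %wio %idle.
    """
    fields = sar_line.split()
    usr, sys_, wio, idle = (float(x) for x in fields[1:5])
    values = {"usr": usr, "sys": sys_, "wio": wio, "idle": idle}
    warnings = []
    for name, limit in CPU_LIMITS.items():
        if values[name] > limit:
            warnings.append(f"%{name} is {values[name]:g}, above {limit}")
    if idle == 0:
        warnings.append("%idle is 0")
    return warnings


# Illustrative sample line, not real measurements.
print(check_cpu("10:00:05 85 5 10 0"))
```

The returned list can be joined into the body of the warning mail; an empty list means nothing needs to be sent.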

Does anything else come to mind?

Any suggestion gratefully received.

If you would like the completed script please Email me:-
paula@avro.co.uk

;^)

Paula

If you can spell SysAdmin then you is one - anon
8 REPLIES
Stefan Farrelly
Honored Contributor

Re: Monitoring systems


You should set up your script to be similar to the alarm thresholds in MeasureWare. HP has spent some time working out good thresholds for alarms.

Here's the alarmdef file from that as an example (the one we use on all our servers):

[attached]
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Thierry Poels_1
Honored Contributor

Re: Monitoring systems

hi,

we also check "sar -v" and monitor nfile and nproc usage.
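
A check on table usage might look like the following Python sketch. It assumes that `sar -v` reports the proc-sz and file-sz columns as "used/max" pairs such as `120/276`; the function names and the 90% limit are hypothetical.

```python
def table_usage(entry):
    """Convert a 'used/max' entry such as '120/276' into a percentage."""
    used, maximum = entry.split("/")
    return 100.0 * int(used) / int(maximum)


def check_tables(proc_sz, file_sz, limit=90.0):
    """Return warnings for the process and file tables (limit is an assumption)."""
    warnings = []
    for name, entry in (("proc", proc_sz), ("file", file_sz)):
        pct = table_usage(entry)
        if pct >= limit:
            warnings.append(f"{name} table {pct:.1f}% full")
    return warnings


# Illustrative values, not real measurements.
print(check_tables("270/276", "300/920"))
```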

regards,
Thierry.
All unix flavours are exactly the same . . . . . . . . . . for end users anyway.
Ralph Grothe
Honored Contributor

Re: Monitoring systems

Hello Paula,

I am also interested in writing scripts to do my own sort of performance monitoring, or to collect system metrics.
That's why I would be interested in your scripting efforts too, since I'm not very experienced with HP-UX systems.
Unfortunately I haven't yet found time to get familiar with the adviser's syntax, but I think that if you already have MWA running you could benefit from its logging and alarm triggering mechanisms.
Have a look at the extract manpage to find out how to retrieve data from the MWA logfiles.
At the moment I am trying to get the CPAN module Perf::ARM installed, but I still get errors during the make test.
If you are into Perl you may find this module useful.
You may reach the module's author under

Regards
Ralph
ralph.grothe@lit.verwalt-berlin.de
Madness, thy name is system administration
Bill McNAMARA_1
Honored Contributor

Re: Monitoring systems

Why don't you consider using EMS, which can do the same and is a free download?

MeasureWare (OV performance agent) has a very sophisticated alarming capability that
allows you to configure not just thresholds but also alarms based on
multiple metrics, symptoms and duration. This makes it very effective in
only producing alarms when there really are alarms, or to proactively alarm
before a real problem exists.

Later,
Bill
It works for me (tm)
James R. Ferguson
Acclaimed Contributor
Solution

Re: Monitoring systems

Hi Paula:

If you want to develop something yourself (and that's fun, informative, and a skill-builder) then I'd add high-water marks for critical system tables ('sar -v' as Thierry suggested, too) and certainly for filesystem utilization ('bdf').
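
For the filesystem part, a rough Python sketch of a high-water check is shown here. It assumes `bdf` prints a header line followed by six columns (filesystem, kbytes, used, avail, %used, mount point) and wraps a long device name onto a line of its own; the sample text, the name `parse_bdf` and the 90% mark are illustrative only, and mount points containing spaces are not handled.

```python
# Illustrative bdf-style output, not real measurements.
SAMPLE = """\
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3     204800  102400  102400   50% /
/dev/vg01/lvol_with_a_long_name
                    512000  486400   25600   95% /data
"""


def parse_bdf(text):
    """Yield (mount_point, percent_used) pairs from bdf-style output."""
    tokens = []
    for line in text.splitlines()[1:]:  # skip the header line
        tokens.extend(line.split())
        if len(tokens) < 6:
            continue  # device name was wrapped; the rest is on the next line
        yield tokens[5], int(tokens[4].rstrip("%"))
        tokens = []


HIGH_WATER = 90  # hypothetical high-water mark, in percent
full = [mount for mount, pct in parse_bdf(SAMPLE) if pct >= HIGH_WATER]
print(full)
```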

You'll want to record thresholds and alert as the threshold you define is exceeded. Give consideration to how you will send a second or third alert if the "water" continues to rise and/or if the level stays constant but the situation persists.
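
The repeat-alert idea could be sketched as follows in Python. Everything here is an assumption made for illustration: the class name `AlertTracker`, the `step` that counts as "rising", and the number of samples after which a persistent condition is reported again.

```python
class AlertTracker:
    """Decide when to alert, and when to alert again, for one metric."""

    def __init__(self, threshold, step=5, repeat_after=3):
        self.threshold = threshold
        self.step = step                  # re-alert if the value rises this much
        self.repeat_after = repeat_after  # re-alert after this many samples
        self.last_alerted = None          # value reported in the last alert
        self.quiet = 0                    # samples over threshold since then

    def update(self, value):
        """Return a reason string if an alert should be sent, else None."""
        if value <= self.threshold:
            # Back under the threshold: forget the previous alert.
            self.last_alerted = None
            self.quiet = 0
            return None
        if self.last_alerted is None:
            reason = "threshold exceeded"
        elif value >= self.last_alerted + self.step:
            reason = "still rising"
        elif self.quiet + 1 >= self.repeat_after:
            reason = "still over threshold"
        else:
            self.quiet += 1
            return None
        self.last_alerted = value
        self.quiet = 0
        return reason


tracker = AlertTracker(threshold=90)
for pct in (85, 91, 92, 97, 97, 97, 97, 80):
    print(pct, tracker.update(pct))
```

Keeping one tracker per filesystem or metric avoids a flood of identical mails while still reporting a situation that gets worse or does not go away.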

With my regards!

...JRF...
Paula J Frazer-Campbell
Honored Contributor

Re: Monitoring systems

Hi James

You hit the nail on the head "fun, informative, and a skill-builder".

Plus I can build it to suit me and my systems.

;-)

Paula
If you can spell SysAdmin then you is one - anon
Alan Riggs
Honored Contributor

Re: Monitoring systems

Well, since we are building for fun and profit let me suggest:

CPU thresholds should not be single-event alarms. I hope you are using the average over an extended monitoring period (1-5 minutes).

Run queue thresholds should be averaged over an even longer period (2-10 minutes).
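
Averaging over a monitoring window, rather than alarming on a single sample, might be sketched like this in Python. The class name `RollingAverage`, the window of three samples and the limit of 80 are hypothetical; with one sample every 5 seconds, a 1-minute window would be 12 samples.

```python
from collections import deque


class RollingAverage:
    """Average of the most recent `size` samples."""

    def __init__(self, size):
        self.samples = deque(maxlen=size)  # old samples drop off automatically

    def add(self, value):
        """Record a sample and return the current average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

    def full(self):
        """True once a whole window of samples has been collected."""
        return len(self.samples) == self.samples.maxlen


LIMIT = 80  # hypothetical limit for the averaged value
window = RollingAverage(size=3)
alerts = []
for sample in (100, 20, 30, 95, 95, 95):  # illustrative values
    average = window.add(sample)
    # Only alarm on a complete window, so a single spike is ignored.
    alerts.append(window.full() and average > LIMIT)
print(alerts)
```

A longer window for the run queue only needs a second `RollingAverage` with a larger `size`.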

vmstat is your friend: look for the free page list getting too small and the paging rate getting too high.

top directed to a file will let you grep out memory usage and high CPU processes.

If you have any problems with memory leaks, use ps with the XPG4 environment to capture memory sizes greater than the expected values.
Wodisch
Honored Contributor

Re: Monitoring systems

Hello Paula,

to add something more to your script (and yes, I am
interested in it, of course ;-) I would include even more
"sar": "sar -a" and check for excessive directory I/O
(but it seems like the "dirb/s" have been dropped in 11i)
and even some "find", to search for huge directories
(even in 11i that's still a resource hog), c/b devices
outside of "/dev", sticky bits outside your fixed list
of "official" programs.
Then, how about checking "nfsstat" if you are using NFS?
And even "grep" on the "/etc/mnttab" to find NFS-mounts
using silly blocksizes, and such?

Hopefully this thread goes on for a while!

My first €0.02 on this,
Wodisch
PS: please mail to wodisch@wodisch.de, thanks!