Re: Monitoring systems

Paula J Frazer-Campbell · ‎08-30-2001

Hi to all

I am writing a script to monitor system resources and mail me warnings if certain parameters are exceeded.

My list of limits and collection methods so far is :-

= 0% idle on CPU (sar -u)
> 80% user on CPU (sar -u)
> 15% wio (sar -u)
> 50% busy (sar -d)
avwait > avserv (sar -d)
>= 90% rcache (sar -b)
>= 70% wcache (sar -b)
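A minimal sketch of how the CPU limits in the list above might be checked in Python, assuming each `sar -u` data line has the form `time %usr %sys %wio %idle`. The names `CPU_LIMITS` and `check_cpu` are illustrative and not taken from any existing script.

```python
# Limits taken from the list above; the dictionary and function
# names are hypothetical.
CPU_LIMITS = {"usr": 80.0, "wio": 15.0}


def check_cpu(sample_line):
    """Return warning strings for one 'sar -u' data line.

    Assumed column order: time %usr %sys %wio %idle.
    """
    fields = sample_line.split()
    usr, _sys, wio, idle = (float(f) for f in fields[1:5])
    warnings = []
    if idle == 0:
        warnings.append("CPU idle is 0%")
    if usr > CPU_LIMITS["usr"]:
        warnings.append("CPU user %.0f%% is above %.0f%%" % (usr, CPU_LIMITS["usr"]))
    if wio > CPU_LIMITS["wio"]:
        warnings.append("CPU wio %.0f%% is above %.0f%%" % (wio, CPU_LIMITS["wio"]))
    return warnings
```

The returned list can be joined into the body of the warning mail; an empty list means nothing was exceeded for that sample.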

Does anything else come to mind?

Any suggestions gratefully received.

If you would like the completed script please Email me:-
paula@avro.co.uk

;^)

Paula

If you can spell SysAdmin then you is one - anon

Stefan Farrelly · ‎08-30-2001

You should set up your script to be similar to the alarm thresholds in MeasureWare. HP have spent some time working out good thresholds for alarms.

Here's the alarmdef file from that as an example (the one we use on all our servers):

[attached]

I'm from Palmerston North, New Zealand, but somehow ended up in London...

Thierry Poels_1 · ‎08-30-2001

hi,

we also check "sar -v" and monitor nfile and nproc usage.
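A rough sketch of turning one `sar -v` table cell into a usage percentage, assuming the cells look like `used/size` (for example `245/2068`) and that unavailable values appear as `N/A`. The function name `table_usage` is hypothetical.

```python
def table_usage(cell):
    """Convert a 'used/size' cell (e.g. proc-sz or file-sz) to percent used.

    Returns None when the cell cannot be parsed, such as 'N/A'.
    """
    try:
        used, size = (int(part) for part in cell.split("/"))
    except ValueError:
        # Covers non-numeric parts and cells without exactly one '/'.
        return None
    if size == 0:
        return None
    return 100.0 * used / size
```

The result can then be compared against whatever high-water mark suits the system, for instance warning when nproc usage passes a chosen percentage.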

regards,
Thierry.

All unix flavours are exactly the same . . . . . . . . . . for end users anyway.

Ralph Grothe · ‎08-30-2001

Hello Paula,

I am also interested in writing scripts to do my own sort of performance monitoring, or collect system metrics.
That's why I would be interested in your scripting efforts too since I'm not very experienced with HP-UX systems.
Unfortunately I haven't yet found time to get familiar with the adviser's syntax, but I think that if you already have MWA running you could benefit from its logging and alarm-triggering mechanisms.
Have a look at the extract manpage to find out how to retrieve data from the MWA logfiles.
At the moment I am trying to get the CPAN module Perf::ARM installed, but I still get errors during the make test.
If you are into Perl you may find this module useful.
You may reach the module's author under

Regards
Ralph
ralph.grothe@lit.verwalt-berlin.de

Madness, thy name is system administration

Bill McNAMARA_1 · ‎08-30-2001

Why don't you consider using EMS, which can do the same and is a free download?

MeasureWare (OV Performance Agent) has a very sophisticated alarming capability that
allows you to configure not just thresholds but also alarms based on
multiple metrics, symptoms and duration. This makes it very effective at
producing alarms only when there really is a problem, or at alarming proactively
before a real problem exists.

Later,
Bill

It works for me (tm)

James R. Ferguson · ‎08-30-2001

Hi Paula:

If you want to develop something yourself (and that's fun, informative, and a skill-builder) then I'd add high-water marks for critical system tables ('sar -v' as Thierry suggested, too) and certainly for filesystem utilization ('bdf').
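A sketch of a filesystem check on captured `bdf` output, assuming the usual six columns (Filesystem, kbytes, used, avail, %used, Mounted on) and that long device names may wrap onto a second line. The 90% default and the name `full_filesystems` are assumptions for illustration only.

```python
def full_filesystems(bdf_output, limit=90):
    """Return (mount_point, percent_used) pairs at or above `limit`.

    Tokens are regrouped in sixes so that an entry whose device name
    wrapped onto its own line is still read as one record.
    """
    lines = bdf_output.strip().splitlines()[1:]  # drop the header line
    tokens = " ".join(lines).split()
    over = []
    for i in range(0, len(tokens) - 5, 6):
        percent = int(tokens[i + 4].rstrip("%"))
        mount = tokens[i + 5]
        if percent >= limit:
            over.append((mount, percent))
    return over
```

This regrouping assumes mount points contain no spaces; a mount point with a space would shift the columns.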

You'll want to record thresholds and alert when a threshold you define is exceeded. Give consideration to how you will send a second or third alert if the "water" continues to rise and/or if the level stays constant but the situation persists.
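One possible way to handle the second and third alerts, sketched in Python. The `state` dictionary is assumed to be kept between runs of the script; the `step` and `repeat_after` parameters and the function name `next_alert` are hypothetical choices, not an established method.

```python
def next_alert(state, value, threshold, step=5.0, repeat_after=6):
    """Decide whether to alert for one metric on this run.

    Returns "first", "rising", "persisting", or None.
    `state` is a dict preserved between runs for this metric.
    """
    if value <= threshold:
        state.clear()  # back to normal: forget earlier alerts
        return None
    if "level" not in state:
        state.update(level=value, quiet=0)
        return "first"
    if value >= state["level"] + step:
        # The "water" has risen by at least `step` since the last alert.
        state.update(level=value, quiet=0)
        return "rising"
    state["quiet"] += 1
    if state["quiet"] >= repeat_after:
        # Level is roughly constant but the situation persists.
        state["quiet"] = 0
        return "persisting"
    return None
```

With a check every ten minutes, `repeat_after=6` would repeat a reminder roughly once an hour while the condition lasts.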

With my regards!

...JRF...

Paula J Frazer-Campbell · ‎08-30-2001

Hi James

You hit the nail on the head "fun, informative, and a skill-builder".

Plus I can build it to suit me and my systems.

;-)

Paula

If you can spell SysAdmin then you is one - anon

Alan Riggs · ‎08-30-2001

Well, since we are building for fun and profit let me suggest:

CPU thresholds should not be single-event alarms. I hope you are using the average over an extended monitoring period (1-5 minutes).

Run queue thresholds should be averaged over an even longer period (2-10 minutes).
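Averaging over a monitoring period could be sketched like this, assuming samples arrive at a fixed interval so that a window of N samples corresponds to the chosen number of minutes. The class name `RollingAverage` is illustrative.

```python
from collections import deque


class RollingAverage:
    """Keep the last `window` samples and report their mean."""

    def __init__(self, window):
        self.samples = deque(maxlen=window)  # old samples drop off automatically

    def add(self, value):
        """Record a sample and return the current average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

    def full(self):
        """True once a whole window of samples has been collected."""
        return len(self.samples) == self.samples.maxlen
```

An alarm would then be raised only when `full()` is true and the average is over the limit, so a single busy sample does not trigger a mail.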

vmstat is your friend: look for the free page list getting too small and the paging rate getting too high.

top directed to a file will let you grep out memory usage and high CPU processes.

If you have any problems with memory leaks, use ps with the XPG4 environment to capture memory sizes greater than the expected values.

Wodisch · ‎08-30-2001

Hello Paula,

to add something more to your script (and yes, I am
interested in it, of course ;-) I would include even more
"sar": "sar -a" and check for excessive directory I/O
(but it seems like the "dirb/s" have been dropped in 11i)
and even some "find", to search for huge directories
(even in 11i that's still a resource hog), c/b devices
outside of "/dev", sticky bits outside your fixed list
of "official" programs.
Then, how about checking "nfsstat" if you are using NFS?
And even "grep" on the "/etc/mnttab" to find NFS-mounts
using silly blocksizes, and such?
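The search for huge directories could also be sketched in Python rather than with "find". The entry limit of 10000 and the name `huge_directories` are assumptions chosen only for illustration.

```python
import os


def huge_directories(root, max_entries=10000):
    """Return (path, entry_count) for directories with more than max_entries entries."""
    found = []
    for path, dirs, files in os.walk(root):
        count = len(dirs) + len(files)
        if count > max_entries:
            found.append((path, count))
    return found
```

Like the "find" approach, walking a large tree is itself I/O-heavy, so this is better run off-peak than on every monitoring cycle.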

Hopefully this thread goes on for a while!

My first €0.02 on this,
Wodisch
PS: please mail to wodisch@wodisch.de, thanks!
