monitor SG with nagios
08-02-2007 08:33 AM
Working with Nagios 2.9 on HP-UX 11.23 IA systems.
I am configuring Nagios for monitoring. Question: how do I keep tabs on Serviceguard? When it fails over, I need to watch the other node.
I have a couple of ideas, but I am looking for more. Is anybody out there doing this?
Many thanks!
08-02-2007 09:52 AM
Solution: nagios scripts for hpux
http://tinyurl.com/2oz9hw
The check_heartbeat script, in combination with the Nagios log file script (check_logfiles), could be a start.
Hope this helps a bit,
Robert-Jan
P.S. I have some other Nagios links I will check tomorrow. If I find something useful I will add a message.
08-02-2007 12:30 PM
Re: monitor SG with nagios
I personally would consider using cmviewcl output to monitor the status of Serviceguard.
Whether it is up or down can be monitored with a simple grep script, as can failovers.
Failovers are also noted in the /var/adm/syslog/syslog.log file.
I'm sure Nagios can do it, but you may need to write your own Nagios monitor script. I'd look at two things: the Serviceguard daemon and the status from cmviewcl.
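For illustration, a minimal sketch of such a grep-based check (the script name and the match patterns are made up; adjust them to the cmviewcl output on your cluster):

#!/usr/bin/sh
# check_sg_pkg.sh - hypothetical minimal Nagios check built on cmviewcl
# Usage: check_sg_pkg.sh <package_name>
PKG=$1
# Restrict cmviewcl output to the given package and grab its status line
LINE=$(/usr/sbin/cmviewcl -p "$PKG" | grep "^$PKG")
case "$LINE" in
*up*running*) echo "OK - $LINE"; exit 0 ;;        # package up and running
*down*)       echo "CRITICAL - $LINE"; exit 2 ;;  # package down
*)            echo "UNKNOWN - $LINE"; exit 3 ;;   # no or unparsable output
esac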
SEP
back after a week off the net.
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
08-03-2007 01:20 AM
Re: monitor SG with nagios
08-03-2007 02:43 AM
Re: monitor SG with nagios
There are basically two ways you could monitor the packages' state: by active or by passive checks.
Are you familiar with NRPE (Nagios Remote Plugin Executor) and NSCA (Nagios Service Check Acceptor)?
The former are actively executed by your Nagios server through an nrpe daemon (usually spawned by inetd) on the monitored remote host, while the latter are (comparable to SNMP traps) executed on demand on the monitored host (or on another, distributed Nagios server, e.g. to bridge a firewalled LAN) and sent via send_nsca to an nsca daemon that lingers (usually inetd-spawned) on the central Nagios server and pipes the results of the passive check commands into the server's command FIFO.
I tried both with our HP-UX MC/SG clusters, and both work well.
For NRPE checks I have written a small Nagios plugin that I attached to this posting.
It needs to be placed in the directory where your Nagios check commands reside on the SG cluster node, and an appropriate check command needs to be defined in the nrpe.cfg file of that NRPE host, which could look like this:
# grep check_SG_PKG_STATE /usr/local/nagios/etc/nrpe.cfg
command[check_SG_PKG_STATE]=/usr/local/nagios/libexec/check_sg_pkg_state.pl -i sms
Note that in this definition I "ignored" a test cluster package, sms, which I had only set up to play and experiment with SMS notifications.
I wrote my plugin so that it parses cmviewconf output to build a hash holding the primary node of every package in the cluster.
The check itself simply parses cmviewcl to compare the configured package distribution with the current one (except for "ignored" packages, like sms in the example above).
If anything deviates, a critical state is signalled.
For this to work, you need to make the user that the nrpe daemon executes under part of the monitor role in your cluster config, so that it may execute the non-destructive Serviceguard commands cmviewconf and cmviewcl.
e.g.
# grep nrpe /etc/inetd.conf /etc/services
/etc/inetd.conf:nrpe stream tcp nowait tivoli /usr/local/nagios/sbin/nrpe nrpe -c /usr/local/nagios/etc/nrpe.cfg -i
/etc/services:nrpe 5666/tcp # Nagios Remote Plug-in Executor
It may look pretty daft that the user here is called tivoli.
Yes, we originally started with Tivoli monitoring but shifted to Nagios for obvious reasons ;-)
# cmviewconf | sed -n '/Access Policy/,/role:/p'
Cluster Access Policy Information:
user name: tivoli
user host: CLUSTER_MEMBER_NODE
user role: monitor
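For reference, the matching access-control entries in the cluster ASCII configuration file (applied with cmapplyconf) would look roughly like this; the exact keywords may vary with your Serviceguard version:

# Access control policy granting read-only monitoring rights
USER_NAME tivoli
USER_HOST CLUSTER_MEMBER_NODE
USER_ROLE MONITOR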
When you have set everything up correctly, you can simply execute from your Nagios server, e.g.
$ check_nrpe -H samoa -c check_SG_PKG_STATE
SG_PKG_STATE OK - pkg1 up vaila enabled running, pkg2 up vaila enabled running,
pkg3 up samoa enabled running, pkg4 up lanai enabled running
On your Nagios server you could then define a service similar to this:
define service {
    use                     generic-service
    service_description     SG_CLUSTER_PKGs_UP
    servicegroups           sg_services
    hostgroup_name          sg_clusters
    normal_check_interval   15
    check_command           check-nrpe!check_SG_PKG_STATE
    contact_groups          sg_admins
}
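(The check-nrpe command referenced above is not shown in this posting; the usual definition, assuming the check_nrpe binary lives under $USER1$, is along these lines:)

define command {
    command_name    check-nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}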
Now, for passive checks you would best place the send_nsca call, as Stephen suggested, in each SG package's start/stop script, inside the customer_defined_*_cmds function bodies, e.g.
NSCA_CLIENT=/usr/local/nagios/libexec/send_nsca_with_mcrypt
NSCA_CONF=/usr/local/nagios/etc/send_nsca.cfg
NSCA_SERVER=123.123.123.123
NSCA_PORT=5667

customer_defined_halt_cmds() {
    # Fields: host;service;return_code;plugin_output (delimiter ';' as per -d)
    printf "%s;%s;%u;CRITICAL - MC/SG Package %s halting on %s\n" \
        $PACKAGE sms_pkg_state 2 $PACKAGE $(uname -n) \
    | LD_LIBRARY_PATH=/usr/local/lib \
      $NSCA_CLIENT -H $NSCA_SERVER -p $NSCA_PORT -d ';' -c $NSCA_CONF
}
Note that the LD_LIBRARY_PATH is just a kludge to satisfy a badly compiled send_nsca binary.
For this approach you need to configure your Nagios server to accept passive checks, which mainly means
accept_passive_service_checks=1
in your main nagios.cfg.
Also, you need to set up nsca in the inetd of your Nagios server.
$ grep nsca /etc/inetd.conf
#nsca stream tcp nowait nagios /opt/sw/nagios/bin/nsca nsca -c /opt/sw/nagios/etc/nsca.cfg --inetd
Note that my Nagios server used to run on an AIX box, which is why the syntax may deviate slightly from HP-UX's inetd.conf.
For passive checks you could define a service similar to this:
define service {
    use                         generic-service
    service_description         sms_pkg_state
    servicegroups               sg_clusters
    host_name                   sms
    ;notification_options       c,r,u
    notifications_enabled       0
    ;contact_groups             nagiosadmin,admin_mobile,service_center
    contact_groups              nagiosadmin,admin_mobile
    max_check_attempts          1
    is_volatile                 1
    active_checks_enabled       0
    passive_checks_enabled      1
    check_freshness             0
    check_period                never
    check_command               passive-check-pad
}
Here it is important to set is_volatile and max_check_attempts to 1.
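(The passive-check-pad command is not shown in this posting either; a plausible stand-in built on the stock check_dummy plugin, with a made-up message text, could be:)

define command {
    command_name    passive-check-pad
    command_line    $USER1$/check_dummy 3 "No passive check result received"
}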
Happy Checking
Ralph
08-03-2007 03:35 AM
Re: monitor SG with nagios
Usually you would add host definitions to your Nagios server config for every single package, each reachable via its relocatable or virtual IP address (VIP for short).
This is because you will (I hope) already have defined host checks which simply run the check_host command.
Note that check_host is a hard link to the check_icmp command, the latter of which must be owned by root and have the suid bit set, because only root may emit ICMP packets.
Now, you must know that check_icmp behaves quite differently when invoked as check_host; just run both invocations with --help to see the differences.
The main difference, however, is that check_host regards the check as OK as soon as the first ICMP packet has returned, whereas check_icmp waits for every packet to return.
This can be a big performance boost.
Also: never, ever define a check_interval in your host definitions, as this could impair performance significantly.
This is what the Nagios doc says about it:
check_interval: NOTE: Do NOT enable regularly scheduled checks of a host unless you absolutely need to! Host checks are already performed on-demand when necessary, so there are few times when regularly scheduled checks would be needed. Regularly scheduled host checks can negatively impact performance - see the performance tuning tips for more information. This directive is used to define the number of "time units" between regularly scheduled checks of the host. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
So I have defined this host template, which I simply "use" with every new host definition and in which I only override those directives that need overriding (a bit OO-like):
define host {
    name                            generic-host
    alias                           Host Class Definition
    register                        0
    max_check_attempts              5
    active_checks_enabled           1
    passive_checks_enabled          0
    check_period                    24x7
    check_command                   check-host-alive
    obsess_over_host                0
    check_freshness                 1
    freshness_threshold             1800
    event_handler                   passive-check-pad
    event_handler_enabled           0
    flap_detection_enabled          0
    process_perf_data               0
    retain_status_information       1
    retain_nonstatus_information    0
    contact_groups                  pb22_unix
    notification_interval           30
    notification_period             24x7
    notification_options            d,u,r
    notifications_enabled           1
}
And specifically, this is what my check-host-alive definition looks like:
define command {
    command_name    check-host-alive
    command_line    $USER1$/check_host -H $HOSTADDRESS$ -t 15 -c 10000
}
Of course, for our contacts we usually suppress all host notifications with a "host_notification_options n" in the contacts_template.cfg.
(Note that pre-3.x Nagios versions lacked a configuration directive to keep you from being flooded with service alerts for a failed host, where a single host alert would suffice to alert the admin.)
Also, one has to consider that an SG package may fail over much more quickly than Nagios can verify a hard state change to CRITICAL.
Therefore, I would rather concentrate on service checks for the services that run in your SG packages.
Often you have a database or some similar service running under a certain VIP.
Then it is much better to run specific checks, like e.g. check_oracle, via NRPE (as it requires an sqlplus binary).
Alternatively, you can download the so-called Instant Client for free from Oracle, which contains a working sqlplus binary without the need for a full-blown Oracle installation.
Place this on your Nagios server and you can even do without NRPE, by running checks directly from your Nagios server against any Oracle DBMS.
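As a sketch of such a direct check (the command name and SID are made up, and check_oracle's options vary between plugin versions, so consult its --help):

define command {
    command_name    check_oracle_login
    command_line    $USER1$/check_oracle --login $ARG1$
}

A service would then pass the SID, e.g. check_command check_oracle_login!MYSID.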
So if any of the package-bound services fails, you will be notified about it (given a max_check_attempts or retry_check_interval that results in a CRITICAL hard state change sooner than the package can fail over ;-)
Also consider service dependency definitions, like the sketch below, for all those services that form a common cluster package.
This will reduce notifications to a sane amount.
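A service dependency sketch (host and service names are only illustrative) might look like:

define servicedependency {
    host_name                       sms
    service_description             sms_pkg_state
    dependent_host_name             sms
    dependent_service_description   oracle_login
    execution_failure_criteria      c,u
    notification_failure_criteria   c,u
}

With this in place, while the package-state check is CRITICAL or UNKNOWN, Nagios suppresses both execution of and notifications for the dependent service.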
08-03-2007 06:20 AM
Re: monitor SG with nagios
Many thanks!
My ideas were based on the output of cmviewcl, but I also like the heartbeat idea.
I am using NRPE for the remote checking. I see that HP includes Nagios and NRPE with Internet Express, but those versions are at 2.0, so I downloaded the source and compiled everything (plugins & NRPE) myself. Using gcc I built the latest stable release with no trouble. I found lots of references to problems with check_swap not working, but I am having no problems.
Nagios itself runs on a CentOS 5 server (an HP ProLiant DL360), and I have NRPE and the plugins built for Linux, Solaris, and HP-UX. The HP-UX build is compiled for PA-RISC 2.x; I am not going to bother with anything older, since those systems are unsupported and scheduled for decommissioning.
Ralph - much appreciated on the script and config.
SEP - hope you are on vacation. Have a great time off the net.
Again, many thanks to all of you!