10-12-2009 03:36 AM
Need help on output interpretation for writing a Nagios quorum server check plug-in
I am about to script a custom Nagios plug-in to monitor the availability of a quorum server from those cluster nodes that might require it as a tie breaker during cluster reformation.
A colleague of mine set up a quorum server on an RHEL Linux box
# rpm -q qs
qs-A.04.00.00-0.rhel5
Though I could merely check for the existence of the qsc process(es) on the quorum server host and for the port their listening socket is bound to,
e.g.
$ /usr/lib64/nagios/plugins/check_procs -c 1:4 -C qsc
PROCS OK: 2 processes with command name 'qsc'
# /usr/sbin/lsof -nP -c qsc -a -i4 -i tcp|grep LISTEN
qsc 20800 root 5u IPv4 1172935 TCP *:1238 (LISTEN)
# /usr/lib64/nagios/plugins/check_tcp -H localhost -p 1238
TCP OK - 0.000 second response time on port 1238|time=0.000085s;;;0.000000;10.000000
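A minimal sketch of wrapping such a TCP probe into a Nagios-style plug-in, assuming bash (for its /dev/tcp redirection) and coreutils' timeout; the host and port are just the ones from the transcripts above:

```shell
#!/bin/bash
# Minimal Nagios-style TCP probe of the quorum server port.
# Uses bash's /dev/tcp pseudo-device; 1238 is the port from the lsof output.
check_qs_tcp() {
    local host=$1 port=$2
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "QS OK - port $port on $host accepts connections"
        return 0    # Nagios OK
    else
        echo "QS CRITICAL - cannot connect to $host:$port"
        return 2    # Nagios CRITICAL
    fi
}

# Usage from nrpe would be e.g.: check_qs_tcp asterix 1238
```

The exit status maps directly onto Nagios states (0 = OK, 2 = CRITICAL), which is all nrpe needs.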
I thought that it would be a more realistic check if I ran the cmquerycl command from each potential quorum requesting cluster node instead.
(probably after having set USER_NAME to the uid that inetd spawns the nrpe daemon under and USER_ROLE to "monitor")
However, it isn't yet clear to me how to interpret the 195 seconds displayed, for instance.
And what does it want to convey by outputting "Replacing Quorum Server..." to stderr?
# cmquerycl -w none -l net -q asterix -c nbr02 -n $(uname -n)|tail -1
Replacing Quorum Server asterix with asterix
Quorum Server: asterix 195 seconds
If I look up the cluster config, I rather see a 120-second polling interval and a 2-second timeout extension.
# cmviewconf|grep -E 'qs (host:|polling|timeout)'
qs host: asterix
qs polling interval: 120.00 (seconds)
qs timeout extension: 2.00 (seconds)
How does this break down to 195 seconds?
Also, I would have to check how the cmquerycl output changes if the quorum server is down or unreachable, and whether the command might hang (in which case I would need to set an alarm timer).
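A sketch of that alarm-timer idea, assuming coreutils' timeout is available on the nodes; the command to wrap (cmquerycl in this case) is passed in as arguments:

```shell
#!/bin/bash
# Run a possibly-hanging command under a hard deadline and map the result
# onto Nagios exit codes. coreutils' timeout returns 124 on expiry.
run_with_deadline() {
    local deadline=$1; shift
    local out
    out=$(timeout "$deadline" "$@" 2>&1)
    case $? in
        0)   echo "QS OK - ${out##*$'\n'}"; return 0 ;;   # last line, like |tail -1
        124) echo "QS CRITICAL - command timed out after ${deadline}s"; return 2 ;;
        *)   echo "QS CRITICAL - $out"; return 2 ;;
    esac
}

# On a node: run_with_deadline 30 cmquerycl -w none -l net -q asterix -c nbr02 -n "$(uname -n)"
```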
I think it would not hurt if I stopped the quorum server process for these tests, since the quorum server would only be required and queried by the cluster nodes if a cluster reformation were needed (e.g. owing to a split brain or similar).
Is this correct, or could I inadvertently imperil running cluster states?
Rgds
Ralph
10-12-2009 06:16 AM
Re: Need help on output interpretation for writing a Nagios quorum server check plug-in
cmquerycl is used once, to build the initial cluster template file, cluster.ascii. In MC/SG it is found under /etc/cmcluster/cluster_name/cluster.ascii.
Use 'cmgetconf -c cluster_name /tmp/TEMP_cluster.ascii'
and perform any parsing on this TEMP file.
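A sketch of that parsing step, assuming the QS_* keywords of the ServiceGuard ascii config format (the dump would come from something like cmgetconf -c cluster_name /tmp/TEMP_cluster.ascii):

```shell
#!/bin/bash
# Pull the quorum-server settings out of a ServiceGuard ascii config dump.
parse_qs_settings() {
    grep -E '^QS_(HOST|POLLING_INTERVAL|TIMEOUT_EXTENSION)' "$1"
}
```

A plug-in could then compare the QS_HOST value against what the check actually probed.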
############################
This comment confuses me and I'll need more explanation.
"...I am about to script a custom Nagios plug-in to monitor the availability of a quorum server from those cluster nodes that might require it as a tie breaker during cluster reformation...."
a) Are you saying you have a nagios application, and you want to place the responsibility of failing over in the Nagios application?
If so, then this is also not how MC/SG works. MC/SG is a compiled application dependent upon the compiled binary running in the kernel.
level One : Physical Resource
level Two : MC/SG
level Three : HP-UX
level Four : Application / Nagios
The execution of a quorum is performed with the heartbeat mechanism and, as you stated, polled by the MC/SG binary every few seconds. If one node in the cluster fails to receive a heartbeat, it fails over.
How, exactly, are you going to fit a third application server, outside the cluster, to initiate a failover?
And why would you want to tamper with something created by the Manufacturer to perform this function?
I guarantee you would be making yourself NON-SUPPORTED by HP, and forced to go back to MC/SG if you ever needed their help. And since your company is probably paying a billion a year in support, I can't believe for a second that you have management approval.
Finally, there would be absolutely no one coming after you, should you leave, who could support this.
10-12-2009 06:18 AM
Re: Need help on output interpretation for writing a Nagios quorum server check plug-in
You are using the term 'quorum' in the same way as failover and have confused me and yourself.
10-12-2009 07:27 AM
Re: Need help on output interpretation for writing a Nagios quorum server check plug-in
You have confused me too with your reply.
Nagios is a monitoring system that runs on a server of its own and has nothing to do with ServiceGuard or any other HA clustering software (though you can replicate the Nagios server itself for HA, but that's not what I mean here).
One could compare Nagios to proprietary monitoring solutions such as HP OpenView,
although OV relies more or less completely on SNMP, I assume (which Nagios can be configured to use as well, but usually isn't; the protocol of the checks is entirely up to the user, or rather to the implementation of the plug-ins used).
Also, OV is more of a "pushing" system for checks (e.g. SNMP traps), whereas Nagios is usually set up to behave like a "polling" system (although one can also configure Nagios to rely mostly on so-called passive checks, which would make it more similar to OV in this respect).
One could probably best think of Nagios as a construction kit for scheduling check plug-ins that can be extended ad lib.
But sorry for digressing.
All I want is to check by such a custom plug-in that the MC/SG quorum server really is up and available.
I never remotely thought about tampering with MC/SG cluster logic.
The reason why I want these checks to be run from the cluster nodes (via the Nagios Remote Plug-in Executor, nrpe) is, first, that I simply lack (or don't want to install) an MC/SG "client" on my RHEL Nagios server that can talk to the MC/SG quorum server in its own protocol to query its state, and second, that such querying would be more realistic if performed from the cluster nodes rather than from the Nagios server.
I know that the cmquerycl command is usually only run to create an MC/SG configuration template dump to be edited.
However, the qs manpage of the quorum server software cites cmquerycl in a usage example, which led me to believe that cmquerycl is the only (scriptable) user-space command that can "talk" to the QS to obtain some sort of status information, which would show my Nagios plug-in that the QS is alive and servicing.
I would gladly use another command to this end if you could tell me which.
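A sketch of the check described here: treat the QS as alive iff cmquerycl's output contains a "Quorum Server:" line, as in the transcript above. Only the (self-contained) parsing part is shown; the cmquerycl invocation itself would have to run on the cluster nodes:

```shell
#!/bin/bash
# Decide "QS alive" from cmquerycl output: the transcript above ends with
# a line like "Quorum Server: asterix 195 seconds" when the QS answers.
qs_line_present() {
    grep -q '^Quorum Server:' <<<"$1"
}

# On a cluster node the input would come from something like:
#   out=$(cmquerycl -w none -l net -q asterix -c nbr02 -n "$(uname -n)" 2>/dev/null)
#   qs_line_present "$out" && echo "QS OK" || echo "QS CRITICAL"
```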
10-12-2009 08:08 AM
Re: Need help on output interpretation for writing a Nagios quorum server check plug-in
LVM uses 'vgchange' in the same way with MC/SG
to check for quorum-not-present error messages. But again, you are confusing the purpose of the heartbeat.
If a subnet that connects the Quorum Server to a cluster is also used for the cluster heartbeat, configure the heartbeat on at least one other network, so that the Quorum Server and heartbeat communication are not likely to fail at the same time.
10-12-2009 08:41 AM
Re: Need help on output interpretation for writing a Nagios quorum server check plug-in
I know (btw., for host checks I use check_host, which is a hard link to check_icmp), but I consider a mere check of whether the host that runs the QS service is up or down insufficient
(besides, this is checked automatically as soon as I integrate a new host into my Nagios config).
>LVM uses 'vgchange' in the same way with MC/S
>to check for quorum not present error
>messages. But again, you are confusing the
>purpose of the heartbeat
I can't follow: what has vgchange to do with the QS?
Here we aren't using a quorum disk (the only case where vgchange activation/deactivation would make any sense to me).
What has heartbeat to do with QS?
As far as I have understood, a QS is only needed to fulfill a quorum when it is due, during a cluster reformation.
Or am I completely wrong?
>a subnet that connects the Quorum Server to
>a cluster is also used for the cluster
>heartbeat, configure the heartbeat on at
>least one other network, so that both Quorum
>Server and heartbeat communication are not
>likely to fail at the same time.
The QS resides in a completely separate LAN and can only be reached from the clustered nodes via the default gw but not through a NIC in the same segment.
So how should this be achieved?
Btw., this cluster's design and configuration weren't done by me.
I am only the guy who was asked to include this thing in his monitoring.
Even if this cluster had LAN connections that could provide what you are suggesting, I am by no means entitled to tamper with its configuration in such a way, I am sure.
I apologize, Michael; though it may sound so,
I am not trying to be rude.
I appreciate your help very much, but I can't follow you, probably as much as you can't follow what I am driveling about.