Insight Control for Linux

ICE for Linux 6.3 Nagios problem

SOLVED
Allen012
Advisor


I have installed and setup HP ICE for Linux on a Red Hat 5.6 system.  Everything seems to be working except for Nagios.

 

The system name (icelx1) is "sm701".  Nagios insists that the host is called "nh", and fails on connecting to it. 

 

I have tried adding "nh" as an alias in both the host file and in DNS, with no change.  Because Nagios cannot find "nh" it blocks access to all of the other servers.

 

How can I change the definition of icelx1 from "nh" to "sm701"?

36 REPLIES
Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Nagios always uses "nh" for the Linux CMS and this cannot be changed.  Can you say more about the error you're seeing?  For example, does the Nagios UI not come up?

 

Can you run "shownode info" on your Linux CMS and send me the output.


Can you run "/etc/init.d/nagios status" on your Linux CMS and send me the output.


Thanks,

Donna

 

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

Nagios is trying to connect to server nh - server nh does not exist!  Server sm701 exists!

 

Nagios shows node nh as DOWN, and that is blocking all the others.

Status is "Failure to lookup host nh"

 

"shownode info" gives a table with all nodes

icelx1        | [ipaddress of node]  |  [fqdn of node]     | [ip of ilo]   | ILO3

. . .

 

/etc/init.d/nagios status

Checking for nagios

NAGIOS OK: 1 process, status log updated 15 seconds ago

supermon (pid 31394) is running...

mond (pid 31389) is running...

gathering status for nrpe ... icelx[1-8]

             Ok NRPE v2.12 - sm701 [ and all of the other nodes]

Nagios nsca:

sm701: [ ssh banner]

sm701: 0 data packet(s) sent to host successfully

 

 

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

Please note:

 

I have stood up a second instance of ICE for Linux in another network, AND HAVE THE IDENTICAL PROBLEM!

 

This strikes me as broken software.  It is not working as advertised.  It is not working as noted in the Installation and configuration Manual, and it is not working as outlined in the User's Guide.

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Can you please run the following command on your CMS and send me the output.

 

# /opt/hptc/nagios/libexec/db_get_node_status nh

 

Thanks,

Donna

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

BTW, I've never seen this problem before.  When you click on "nh" in the Nagios UI does the IP address for "nh" match the IP address in the "shownode info" output for icelx1?

 

How many NICs do you have on your CMS?  I'm wondering if you have multiple NICs and for some reason that's confusing Nagios.


I'm bringing in another developer to look at this.  We'll figure this out.

 

Donna

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

/opt/hptc/nagios/libexec/db_get_node_status nh

 

OK - sm701: rta 0.012ms, lost 0%|rta=0.012ms;150.000;200.000;0; pl=0%;10;80;;

 

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

I have two NICs, but the "nh" address is the same as the "shownode info" address. It is also the default route, and the subnet where the other (client) servers are.

 

I get the same failure message for all of the hosts (CMS and clients) "Failure to lookup host xxx"

 

DNS is working right - forward and reverse

/etc/hosts is correct, and agrees with the DNS

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

I'll work with the developer tomorrow to figure out next steps for troubleshooting this issue.  In the meantime, can you email me the contents of /opt/hptc/etc/sysconfig/nagios (donna.firkser@hp.com)?  Please also send me a screenshot of the Nagios UI showing this error.

 

Thanks,

Donna

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Was thinking we might find the problem if you run nagios in debug mode.  Use these steps to stop nagios and restart nagios in debug mode.   Please forward along any messages you see after starting nagios.

 

 /etc/init.d/nagios stop_nagios

cd /opt/hptc/nagios/bin

NAGIOS_DEBUG=1 ./nagios ../etc/nagios_local.cfg

 

Thanks,

Donna

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

/opt/hptc/etc/sysconfig/nagios:

 

NAGIOS_MONITOR_HOST=icelx1

NAGIOS_SUBMIT_HOST=[ip address of sm701]

NRPE_HOST=icelx1

NSCA_HOST=[ip address of sm701]

HTTPD_HOST=icelx1

NAGIOS_MASTER=icelx1

NAGIOS_MASTER_IP=[ip address of sm701]

CPACCESS_HOST=icelx1

NAGIOS_USER=nagios

NAGIOS_GROUP=hpadm

HTTP_GROUP=apache

 

 

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

Did the debug run. Found that I was getting permission issues. Changed permissions on some directories and files which were owned by root. Started getting connections.

It appears that there is a significant issue when installing ICE into a hardened Linux environment. In that environment the umask for root is set to 0077. That means that unless the install script sets permissions explicitly, any directory created by root during the install ends up with mode 700 (rwx------) and any regular file with at most 700 (typically 600, rw-------), so nothing is accessible to group or other.
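To make the effect of the umask concrete, here is a minimal sketch that creates a directory and a file under a given umask in a throwaway temp directory and prints the resulting modes. The function name demo_umask is made up for illustration, and `stat -c` assumes GNU coreutils, as on Red Hat.

```shell
#!/bin/sh
# Show which modes new directories and files get under a given umask.
# Everything is created in a temporary directory and removed afterwards.

demo_umask() {
    # $1: umask to apply, e.g. 0077 (hardened) or 0022 (common default)
    (
        umask "$1"
        tmp=$(mktemp -d) || exit 1
        mkdir "$tmp/dir"
        : > "$tmp/file"
        # stat -c is GNU coreutils syntax (assumption: a Linux system)
        echo "umask $1: dir=$(stat -c %a "$tmp/dir") file=$(stat -c %a "$tmp/file")"
        rm -rf "$tmp"
    )
}

demo_umask 0077   # prints: umask 0077: dir=700 file=600
demo_umask 0022   # prints: umask 0022: dir=755 file=644
```

The subshell keeps the umask change from leaking into the calling shell.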

Nagios runs as the nagios user, and therefore does not have permission to access the directories and files that it needs.

The install procedure and scripts need to be changed to check and/or correct the permissions on all of the directories and files created.
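
For anyone hunting for the affected paths, a rough sketch along these lines may help. It lists entries that "other" cannot read, plus directories that "other" cannot traverse. It assumes the nagios user reaches root-owned files through the "other" permission bits, which may not match every setup, and the function name find_locked_down is hypothetical.

```shell
#!/bin/sh
# List files and directories under a tree that are closed to "other":
# anything without the other-read bit, and any directory without other-execute.

find_locked_down() {
    # $1: directory tree to scan, e.g. /opt/hptc
    find "$1" \( ! -perm -0004 -o \( -type d ! -perm -0001 \) \) -print
}

# Example (run as root so the whole tree can be read):
# find_locked_down /opt/hptc
```

Review the list before changing any modes; some files may be restricted on purpose.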
Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Glad to hear you found the issue.  At a minimum, for the next IC-Linux release we're working on (i.e. 7.0), we'll document that users with a "hardened Linux environment" may need to change the permissions on the directories used by Nagios. 

 

Is Nagios now working as expected?

 

Thanks,

Donna

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

DO NOT document this as "need to change permissions". There are too many of them and they are too hard to find; I am still looking. Either emphasize that users MUST set the umask for root to 0022, or else (preferred) have the install script check it and/or change it for the duration of the install.
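
A sketch of what "change it for the duration of the install" could look like from the shell, assuming the install steps inherit the umask of the process that launches them. The wrapper name with_install_umask and the installer path in the usage comment are placeholders, not part of ICE.

```shell
#!/bin/sh
# Run one command with umask 0022 without touching the caller's umask.

with_install_umask() {
    (
        # The subshell confines the umask change to this one command.
        umask 0022
        "$@"
    )
}

# Hypothetical usage; the path is a placeholder, not a real ICE installer path:
# with_install_umask /path/to/installer
```

After the command returns, root's hardened umask of 0077 is still in effect for everything else.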

 

I am still having some problems - trying to run them down.  There is also an issue in the way that the configuration scripts (try to) modify the sudoers file.  My hardening prevents it, and that causes some of the sensor scripts that rely on sudo to fail.  It would be better if the changes were specified up front, or otherwise made more visible in the documentation.

 

I am getting "uninitialized value" errors in the /opt/hptc/supermon/bin/storeMetrics perl script (line 109), and multiple

      'Argument "NRPE" isn't numeric in division (/) at /opt/hptc/nagios/libexec/check_node_config line 128.'

errors, among others.

 

If I thought that I could do a total "clean" uninstall I would do it and try again, but it has been my experience that the uninstall scripts do not delete all of the affected directories.

 

Still working the issue.

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

There's an uninstall.sh procedure in the Install Guide which outlines which directories you need to manually clean up.  We create a copy of the entries Nagios adds to /etc/sudoers in case you need to update this file on your own.  Does this help?

 

 # cat /etc/sudoers.icelx.proto
# IC-Linux requires Default requiretty to be disabled # HP-HPTC-defaultrequiretty
Cmnd_Alias CHECKALLSSHKEYS = /opt/hptc/nagios/libexec/check_keys # HP-HPTC-KeySync
Cmnd_Alias CHECKSYSLOGALERTS = /opt/hptc/nagios/libexec/check_syslogalerts # HP-HPTC-SysLog
Cmnd_Alias CHECKSFS = /opt/hptc/nagios/libexec/check_sfs # HP-HPTC-SysLog
Cmnd_Alias CHECKLSF = /opt/hptc/nagios/libexec/check_lsf # HP-HPTC-CheckLSF
Cmnd_Alias CHECKICMP = /opt/hptc/nagios/libexec/check_icmp # HP-HPTC-CheckICMP
Cmnd_Alias CHECKSEL = /opt/hptc/nagios/libexec/check_sel # HP-HPTC-CheckSEL
Cmnd_Alias CHECKSELMON = /opt/hptc/nagios/libexec/check_selmon # HP-HPTC-CheckSELMON
Cmnd_Alias CHECKLVS = /opt/hptc/nagios/libexec/check_lvs # HP-HPTC-CheckLVS
Cmnd_Alias SENSORS = /opt/hptc/supermon/bin/sensors # HP-HPTC-Sensors
Cmnd_Alias CHECKHOSTS = /opt/hptc/nagios/libexec/check_node_status # HP-HPTC-CheckNodeStatus
Cmnd_Alias RRDSWSETUP = /opt/hptc/cacti/rrd_switch_setup # HP-HPTC-RrdSwitchSetup
Cmnd_Alias SWITCHPOLLER = /opt/hptc/nagios/libexec/switch_poller # HP-HPTC-SwitchPoller
Cmnd_Alias SCONTROL = /opt/hptc/bin/scontrol # HP-HPTC-scontrol
Cmnd_Alias POWEROFF = /opt/hptc/sbin/power # HP-HPTC-power
Cmnd_Alias HPASMCLI = /sbin/hpasmcli # HP-HPTC-hpasmcli
Cmnd_Alias MDADM = /sbin/mdadm # HP-HPTC-mdadm
Cmnd_Alias MCELOG = /usr/sbin/mcelog # HP-HPTC-mcelog
nagios ALL = NOPASSWD:  CHECKALLSSHKEYS,CHECKSYSLOGALERTS,CHECKSFS,CHECKLSF,CHECKICMP,CHECKSEL,CHECKSELMON,CHECKLVS,SENSORS,CHECKHOSTS,RRDSWSETUP,SWITCHPOLLER,SCONTROL,POWEROFF,HPASMCLI,MDADM,MCELOG # HP-HPTC-Nagios


 

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

The "Configure Management Services" process does not put anywhere close to that many entries into the sudoers file, just the following:

Cmnd_Alias HPASMCLI = /sbin/hpasmcli # HP-HPTC-hpasmcli
Cmnd_Alias MDADM = /sbin/mdadm # HP-HPTC-mdadm
Cmnd_Alias MCELOG = /usr/sbin/mcelog # HP-HPTC-mcelog
nagios ALL = NOPASSWD: HPASMCLI,MDADM,MCELOG # HP-ICELX-mond
Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

I don't find the /etc/sudoers.icelx.proto file in /etc, or anywhere else on my CMS.
Donna Firkser
Regular Advisor
Solution

Re: ICE for Linux 6.3 Nagios problem

/etc/sudoers on the managed nodes has only a few entries, i.e.:

 

# HP-ICELX-mond: This is required for monitoring to function when run as the nagios user
Cmnd_Alias HPASMCLI = /sbin/hpasmcli # HP-ICELX-mond
Cmnd_Alias MDADM = /sbin/mdadm # HP-ICELX-mond
Cmnd_Alias MCELOG = /usr/sbin/mcelog # HP-ICELX-mond
nagios ALL = NOPASSWD: HPASMCLI,MDADM,MCELOG # HP-ICELX-mond

 

/etc/sudoers on the CMS must have the following entries for Nagios to work properly.

 

# IC-Linux requires Default requiretty to be disabled # HP-HPTC-defaultrequiretty
Cmnd_Alias CHECKALLSSHKEYS = /opt/hptc/nagios/libexec/check_keys # HP-HPTC-KeySync
Cmnd_Alias CHECKSYSLOGALERTS = /opt/hptc/nagios/libexec/check_syslogalerts # HP-HPTC-SysLog
Cmnd_Alias CHECKSFS = /opt/hptc/nagios/libexec/check_sfs # HP-HPTC-SysLog
Cmnd_Alias CHECKLSF = /opt/hptc/nagios/libexec/check_lsf # HP-HPTC-CheckLSF
Cmnd_Alias CHECKICMP = /opt/hptc/nagios/libexec/check_icmp # HP-HPTC-CheckICMP
Cmnd_Alias CHECKSEL = /opt/hptc/nagios/libexec/check_sel # HP-HPTC-CheckSEL
Cmnd_Alias CHECKSELMON = /opt/hptc/nagios/libexec/check_selmon # HP-HPTC-CheckSELMON
Cmnd_Alias CHECKLVS = /opt/hptc/nagios/libexec/check_lvs # HP-HPTC-CheckLVS
Cmnd_Alias SENSORS = /opt/hptc/supermon/bin/sensors # HP-HPTC-Sensors
Cmnd_Alias CHECKHOSTS = /opt/hptc/nagios/libexec/check_node_status # HP-HPTC-CheckNodeStatus
Cmnd_Alias RRDSWSETUP = /opt/hptc/cacti/rrd_switch_setup # HP-HPTC-RrdSwitchSetup
Cmnd_Alias SWITCHPOLLER = /opt/hptc/nagios/libexec/switch_poller # HP-HPTC-SwitchPoller
Cmnd_Alias SCONTROL = /opt/hptc/bin/scontrol # HP-HPTC-scontrol
Cmnd_Alias POWEROFF = /opt/hptc/sbin/power # HP-HPTC-power
Cmnd_Alias HPASMCLI = /sbin/hpasmcli # HP-HPTC-hpasmcli
Cmnd_Alias MDADM = /sbin/mdadm # HP-HPTC-mdadm
Cmnd_Alias MCELOG = /usr/sbin/mcelog # HP-HPTC-mcelog
nagios ALL = NOPASSWD:  CHECKALLSSHKEYS,CHECKSYSLOGALERTS,CHECKSFS,CHECKLSF,CHECKICMP,CHECKSEL,CHECKSELMON,CHECKLVS,SENSORS,CHECKHOSTS,RRDSWSETUP,SWITCHPOLLER,SCONTROL,POWEROFF,HPASMCLI,MDADM,MCELOG # HP-HPTC-Nagios

 

Donna
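
If you are adding these entries by hand, a quick check like the following can show which of the command aliases listed above are still missing from a sudoers file. This is only a sketch: the function name missing_aliases is made up, and it looks for the Cmnd_Alias definitions only, not the final "nagios ALL = NOPASSWD:" line.

```shell
#!/bin/sh
# Report which CMS command aliases from the list above are not defined
# in a given sudoers file.

REQUIRED_ALIASES="CHECKALLSSHKEYS CHECKSYSLOGALERTS CHECKSFS CHECKLSF CHECKICMP CHECKSEL CHECKSELMON CHECKLVS SENSORS CHECKHOSTS RRDSWSETUP SWITCHPOLLER SCONTROL POWEROFF HPASMCLI MDADM MCELOG"

missing_aliases() {
    # $1: sudoers file to inspect (reading /etc/sudoers normally needs root)
    for name in $REQUIRED_ALIASES; do
        # Require "=" right after the name so CHECKSEL does not match CHECKSELMON.
        grep -Eq "^[[:space:]]*Cmnd_Alias[[:space:]]+$name[[:space:]]*=" "$1" \
            || echo "$name"
    done
}

# Example: missing_aliases /etc/sudoers
```

An empty result means every alias is defined. After editing the file, `visudo -c` checks it for syntax errors.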

 

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

/etc/sudoers.icelx.proto gets created as part of the %post when we install the nagios RPM on the CMS. My guess is that it's not on your system because the "hardening" on your CMS prevented our code from updating /etc/sudoers and /etc/sudoers.icelx.proto is created based on what we put into /etc/sudoers.
i.e.
/bin/grep '# HP-HPTC-' /etc/sudoers > /etc/sudoers.icelx.proto
Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Just curious.  Are you using SELinux to do the "hardening" on your Linux CMS?

 

Thanks,

Donna

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

No SELinux.

 

Our customer provides a set of "tests" that we run against the systems, and we correct any "findings".

 

 

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Thanks.   Let us know how things are progressing with Nagios and what you had to do to make it work in your "hardened" environment.   This will help us fix the issue in a future IC-Linux release.


Donna

Donna Firkser
Regular Advisor

Re: ICE for Linux 6.3 Nagios problem

Hi,

 

Any update on your Nagios issues?  Have you been able to get things working?

 

Thanks,
Donna

Allen012
Advisor

Re: ICE for Linux 6.3 Nagios problem

The manual addition of the entries to the sudoers file seems to have fixed most of my problems getting Nagios to run. 

 

Now I need to get it to shut up.  It sends hundreds of emails a day that are all of little or no use.  I want to talk only through ICE.  How do I shut it up?