HP Script to test a Linux server's status

 
Steven E. Protter
Exalted Contributor


Production is a Linux AS 3 box with OpenSSH installed and public keys exchanged.

Emergency backup is an HP-9000 D320 with a lot of miles on it but the stability of a Sherman tank. Rock solid. Secure Shell installed, public keys exchanged.

Production is a web server and it locked up for no reason on Saturday, potentially causing a severe loss of service.

I'm leaving the country in a few days, and although my backup person is good, it could take him critical hours to get on site.

So I wrote a script which I plan to run every 15 minutes.
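HP-UX cron does not document the "*/15" step syntax that Vixie cron on Linux accepts, so a 15-minute schedule is usually written as an explicit minute list. A sketch of the crontab entry, assuming the script is saved as the hypothetical /usr/local/bin/check_prod.sh and its output goes to a hypothetical /var/adm/check_prod.log:

```
0,15,30,45 * * * * /usr/local/bin/check_prod.sh >> /var/adm/check_prod.log 2>&1
```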

So far I have two tests, one of web connectivity, one of ssh connectivity.

My intent is that if any of the front-line tests fail, the script then checks whether the server answers ping. If the server won't answer ping requests on the internal network, where they are allowed, then the server is dead and the 9000 box will go into production mode and take over service.

If there are problems but service can't be taken over, then someone is going to get an email and an urgent page.

Here is the current code:

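PROD_HOST=dns1.investmenttool.com   # production box (same host as the ssh test)
SVR_FAIL="N"
export SVR_FAIL

# ssh test; BatchMode makes ssh fail instead of hanging on a prompt under cron
/usr/bin/ssh -o BatchMode=yes root@$PROD_HOST "cat /etc/issue"
rc=$?
echo "ssh test code: $rc"
if [ $rc -ne 0 ]
then
    SVR_FAIL="Y"
fi

# web test; -O /dev/null stops index.shtml.1, .2, ... piling up in /tmp,
# -T 60 -t 2 bounds the run so it cannot overlap the next cron run
/usr/local/bin/wget -q -O /dev/null -T 60 -t 2 http://www.investmenttool.com/index.shtml
rc=$?
echo "web test code: $rc"
if [ $rc -ne 0 ]
then
    SVR_FAIL="Y"
fi

rp=0    # initialise, otherwise the test below fails when both checks passed
if [ "$SVR_FAIL" = "Y" ]
then
    # run the ping test, see if service can be taken over
    /usr/sbin/ping $PROD_HOST -n 1 -m 180
    rp=$?
fi

if [ $rp -ne 0 ]
then
    /usr/contrib/bin/takeoverservice
fi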

It always takes me so long to get to the question.

What can I do to enhance the script? Should I add more tests? What should they be? Or does the KISS principle apply here: the wget test should work, or else the ping test should be run.

I know, get Serviceguard, but I don't have the time or the money.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Jeroen Peereboom
Honored Contributor

Re: HP Script to test a Linux server's status

SEP,

It looks fine, but do you know whether ping was successful the last time your server locked up? If the web service is really what you must provide, maybe the ping test is not needed, and it may not even be desirable.

Is it possible that the web test fails and the ping succeeds? Then you won't run takeoverservice, but is your website available? Maybe run a few wget tests over a few minutes and, if all of them fail, run takeoverservice.
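The "several wget tests over a few minutes" idea could look like the sketch below. The function name all_attempts_fail is made up for illustration; the wget path and URL are the ones from the original script. Three tries one minute apart keep a whole run well inside the 15-minute cron interval.

```shell
#!/bin/sh
# Sketch: declare the web server down only if several checks in a row fail.
# all_attempts_fail TRIES DELAY CMD [ARGS...]
#   Runs CMD up to TRIES times, sleeping DELAY seconds between attempts.
#   Exit status 0 means "every attempt failed"; 1 means at least one succeeded.
all_attempts_fail() {
    aaf_tries=$1
    aaf_delay=$2
    shift 2
    aaf_n=1
    while [ "$aaf_n" -le "$aaf_tries" ]
    do
        if "$@" > /dev/null 2>&1
        then
            return 1                      # one success is enough: no takeover
        fi
        if [ "$aaf_n" -lt "$aaf_tries" ]
        then
            sleep "$aaf_delay"            # wait before the next attempt
        fi
        aaf_n=$((aaf_n + 1))
    done
    return 0                              # every attempt failed
}

# Example (not run here): three wget attempts, one minute apart.
# if all_attempts_fail 3 60 /usr/local/bin/wget -q -O /dev/null \
#        http://www.investmenttool.com/index.shtml
# then
#     /usr/contrib/bin/takeoverservice
# fi
```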

HtH

JP
Hoefnix
Honored Contributor

Re: HP Script to test a Linux server's status

SEP,

It looks OK.
Just one thought: when only the web service fails on production and you are still able to ssh to the box, why not force a restart of that service on its original box?
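A minimal sketch of that "restart before takeover" step, assuming the two return codes are kept in separate variables (the names web_rc and ssh_rc are hypothetical; the original script reuses a single rc) and that the web server is the Red Hat "httpd" service on the host used for the ssh test:

```shell
#!/bin/sh
# web_action WEB_RC SSH_RC -> prints what to do next
web_action() {
    if [ "$1" -eq 0 ]
    then
        echo none          # web answered, nothing to do
    elif [ "$2" -eq 0 ]
    then
        echo restart       # web down but box reachable: restart the service remotely
    else
        echo escalate      # web and ssh both down: go on to the ping/takeover logic
    fi
}

# restart_web HOST: restart Apache on the production box over ssh.
# BatchMode makes ssh fail instead of prompting when run from cron.
restart_web() {
    /usr/bin/ssh -o BatchMode=yes "root@$1" "/sbin/service httpd restart"
}

# Example (not run here):
# case $(web_action "$web_rc" "$ssh_rc") in
#     restart)  restart_web dns1.investmenttool.com ;;
#     escalate) : ;;   # continue with the ping test and takeoverservice
# esac
```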


Regards,
Peter
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

The last time the box locked up (once in two years isn't bad), I didn't bother pinging it. Production was down and the box was non-responsive, so I power-cycled it to get it back into production.

I was hoping for some suggestions on further tests.

Peter, very good idea. If the web fails the first thing I should do is ssh onto the box and start the web server.

The reason I don't is that there is already a cron job running every 15 minutes on the Linux server that checks services. Linux has a nice command called service that checks service status very efficiently.

service named status
tells me whether DNS is running. The same works for httpd. I partially ported this command to HP-UX, but I think it would be a nice addition to HP-UX from HP.

The box checks itself on web, mail, DNS and other vital services. It's been quite efficient at bringing those services back up when I have taken them down for maintenance.
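A self-check of that kind might look like the sketch below. It assumes Red Hat style init scripts whose "status" action exits non-zero when the daemon is not running (not every init script of that era does this), and the service list (named, httpd, sendmail) is only an example. The service command is passed in as a parameter so the loop can be exercised without touching real daemons.

```shell
#!/bin/sh
# Services to keep alive: DNS, web, mail (example list, adjust to the box).
SERVICES="named httpd sendmail"

# check_services CMD: for each service run "CMD svc status"; if that fails,
# report it and run "CMD svc start". CMD is normally /sbin/service.
check_services() {
    cs_cmd=$1
    for cs_svc in $SERVICES
    do
        if "$cs_cmd" "$cs_svc" status > /dev/null 2>&1
        then
            :                                  # running, nothing to do
        else
            echo "$cs_svc down, restarting"
            "$cs_cmd" "$cs_svc" start
        fi
    done
}

# Example cron use on the Linux box (not run here):
# check_services /sbin/service
```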

The problem I probably ran into was that, for unknown reasons, network connectivity was lost. If ping is down, take over service and then email me. If ping is responding, email me and get someone on site to reboot the server.
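The escalation rule just described could be sketched as follows. ALERT_TO is a hypothetical address, rp is the ping return code from the original script, and mailx is used only in the commented example because it is the stock mailer on HP-UX.

```shell
#!/bin/sh
ALERT_TO="oncall@example.com"     # hypothetical alert address

# escalate PING_RC -> "takeover" if ping failed, "onsite" if the box still answers
escalate() {
    if [ "$1" -ne 0 ]
    then
        echo takeover     # box unreachable: HP-9000 takes over, then email
    else
        echo onsite       # box answers ping but services are hung: email, send someone
    fi
}

# notify SUBJECT BODY: mail the on-call address
notify() {
    echo "$2" | mailx -s "$1" "$ALERT_TO"
}

# Example (not run here):
# case $(escalate "$rp") in
#     takeover) /usr/contrib/bin/takeoverservice
#               notify "prod down: HP-9000 took over" "ping failed" ;;
#     onsite)   notify "prod hung: reboot needed" "ping OK, web/ssh failing" ;;
# esac
```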

SEP
RAC_1
Honored Contributor

Re: HP Script to test a Linux server's status

I wouldn't trust ping. If inetd (xinetd on Linux) dies, you can still ping the server. You should use some other method to check it.

Also, if there are firewalls in between and ping gets disabled in your absence, your script will fail and give a false alarm.

Anil
There is no substitute to HARDWORK
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

Okay RAC,

Ping is a basic test of functionality.

This script is running on an internal network. ICMP has explicitly been left on to ease diagnosis on that network.

Is ping still a problem under those circumstances?

SEP
Geoff Wild
Honored Contributor

Re: HP Script to test a Linux server's status

Script looks good, but here is a suggestion for another way to do this...

Why not install Big Brother?

http://www.bb4.org/

You can use that to test http...etc...

I use BB as a backup to OVO...

Either that - modify your script to telnet LINUXSERVER 80 and read the result.

Rgds...Geoff

Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
generic_1
Respected Contributor

Re: HP Script to test a Linux server's status

My suggestion would be to set up VNC, because you can use its web tool to connect to your network remotely. Its relative TightVNC is a little faster. You could also tunnel it over ssh for extra security if you like. Although it's not the fastest, the web access feature would be a nice way to connect in an emergency from almost anywhere.

As far as ping goes, I would simply search the net for a free utility that makes a beep or special noise when ping fails. That way you do not have to hope people are reading their email when the server fails. Have fun on your trip.
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

Geoff, the telnet command was in there. It requires manual input and hangs, making it inappropriate. I guess I could give it the input with a

<< EOF
^]
EOF

string, but the wget command works pretty nicely and essentially does the same thing.

SEP
Geoff Wild
Honored Contributor

Re: HP Script to test a Linux server's status

Steven - sorry - missed the wget command you had.

Now, as far as the network goes... do you have a script on the Linux box as well? That is, one that stops services if it loses its network connection?

The reason I ask: if your dev (HP) box takes over and then the network is suddenly restored to your Prod (Linux) box, won't you have two Prod web servers running?
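One way to build the guard Geoff describes, run from cron on the Linux box: if the box cannot reach its own default gateway, assume it has been cut off (and possibly replaced by the HP-9000) and stop the web server. This is a sketch only; GATEWAY is a hypothetical address, and "ping -c 3 -w 10" is Linux iputils syntax, not HP-UX syntax.

```shell
#!/bin/sh
GATEWAY=192.168.1.1     # hypothetical: the production box's default gateway

# net_lost CMD [ARGS...]: true (exit 0) if the reachability test CMD fails
net_lost() {
    if "$@" > /dev/null 2>&1
    then
        return 1        # test passed: network is fine
    fi
    return 0            # test failed: treat the network as lost
}

# Example cron use on the Linux box (not run here):
# if net_lost ping -c 3 -w 10 "$GATEWAY"
# then
#     /sbin/service httpd stop    # avoid two production web servers later
# fi
```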

Rgds...Geoff
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

Geoff,

Another good point. While I was doing the Linux OS install on the production server, I had the hp box running the websites.

I had some minor issues and tried to put the Linux box into production three or four times. I noticed that if you try to bring up the HP box on all IP addresses while production is up, HP-UX notes the error and fails to bring up the IP addresses.

If such a situation occurs, I believe one server or the other will drop the IP addresses. If not, I can get in on the private network and correct the situation.

In any situation where conflict can occur I should have been notified by email and will be checking on things. I have three different devices on the firewall, any of which will let me in to do work.

SEP
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

Jeff,

VNC?

Brain Lock. Please define and explain.

SEP
Edgar Arroyo
Regular Advisor

Re: HP Script to test a Linux server's status

SEP,

I'm new to HP, but I know some networking. If possible, add a second network card to both machines (primary and backup) and add a third machine that checks both; you can even move the testing onto this third machine to see whether either of the two has a heartbeat. If both are active, your primary machine is back up, so turn off the backup machine. Create a class C network (192.168.x) on the second set of network cards and do the testing through them. If one is dead, activate the other and notify yourself that the dead machine is down. The same works in reverse. But then you're stuck if the third (test) machine goes down... :) You could then have the other two test the third machine as well, making it somewhat redundant.
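The third machine's decision table from that reply can be written as a small function. This is a sketch: how each heartbeat is measured over the private network is left open, and the inputs are simply 1 (yes) or 0 (no).

```shell
#!/bin/sh
# arbiter PRIMARY_UP BACKUP_SERVING -> action for the third (test) machine
#   1 = yes, 0 = no, as measured over the private 192.168.x heartbeat network
arbiter() {
    if [ "$1" -eq 1 ] && [ "$2" -eq 1 ]
    then
        echo stop_backup        # primary is back: release service from the backup
    elif [ "$1" -eq 1 ]
    then
        echo ok                 # normal state: primary serving, backup idle
    elif [ "$2" -eq 1 ]
    then
        echo ok_on_backup       # already failed over: keep notifying
    else
        echo activate_backup    # primary dead, backup idle: take over and notify
    fi
}
```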
jason meech
Occasional Advisor

Re: HP Script to test a Linux server's status

I actually wrote a similar script to monitor a friend's web server (he's a bit of a Linux newbie, and it saves me waiting for him to call and tell me it's broken).
Since I was only concerned with the web, I only used wget to test its availability,

except that I ran wget once directly and once through a proxy that connects out via a different ISP. As it turned out, frequently it was just routing problems.
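The direct-versus-proxy comparison could be sketched like this. wget honours the http_proxy environment variable, so the two fetches differ only in that variable; the proxy address is hypothetical, and the URL is the one from the original script.

```shell
#!/bin/sh
URL=http://www.investmenttool.com/index.shtml
ALT_PROXY=http://proxy.example.net:3128/      # hypothetical proxy via another ISP

fetch_direct() { (unset http_proxy; /usr/local/bin/wget -q -O /dev/null "$URL"); }
fetch_proxy()  { http_proxy=$ALT_PROXY /usr/local/bin/wget -q -O /dev/null "$URL"; }

# web_status DIRECT_RC PROXY_RC -> up | routing | down
web_status() {
    if [ "$1" -eq 0 ]
    then
        echo up          # reachable directly
    elif [ "$2" -eq 0 ]
    then
        echo routing     # only reachable via the other ISP: likely a routing problem
    else
        echo down        # unreachable both ways: the server itself is probably down
    fi
}

# Example (not run here):
# fetch_direct; d=$?; fetch_proxy; p=$?; web_status "$d" "$p"
```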

But you're on an internal network.

My advice: if the web is the only service that matters, that's the only one worth testing for an alarm.

Although it would be good practice to keep the ping and ssh tests and just have them log their results, to help you diagnose future crashes (I like using Perl because it interfaces easily with telnet).
I would also make a basic telnet script to grab a process list at the last known good time.
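A shell version of the logging idea might look like the sketch below. Because ssh keys are already exchanged in this setup, it uses ssh rather than telnet for the process-list snapshot; the log and snapshot file names are hypothetical.

```shell
#!/bin/sh
LOG=/var/tmp/prod_checks.log                 # hypothetical log file

# log_result TEST RC: append a timestamped line such as "2004-06-01 10:15:00 ssh rc=0"
log_result() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 rc=$2" >> "$LOG"
}

# snapshot_ps HOST: after a fully successful round, keep the last known-good
# process list so it can be compared with the state after a crash.
snapshot_ps() {
    /usr/bin/ssh -o BatchMode=yes "root@$1" "ps -ef" > /var/tmp/prod_ps.last_good
}

# Example (not run here):
# log_result ssh "$ssh_rc"; log_result web "$web_rc"
# [ "$ssh_rc" -eq 0 ] && [ "$web_rc" -eq 0 ] && snapshot_ps dns1.investmenttool.com
```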

But that's just me.
Volker Borowski
Honored Contributor
Solution

Re: HP Script to test a Linux server's status

Hi SEP,

I do not like to depend on a "single" ping, which can get lost somewhere for any reason.
Why not send 3 or 5 and do the switch only if ALL of them die?

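HOST=dns1.investmenttool.com    # production box (the ssh target in the original script)
rp=0
# -n 1 sends a single packet; without it HP-UX ping keeps going until interrupted
if /usr/sbin/ping $HOST -n 1
then
    true
elif /usr/sbin/ping $HOST -n 1
then
    true
elif /usr/sbin/ping $HOST -n 1
then
    true
else
    echo Three pings died !
    rp=3
fi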

I did not check how ping defines its return codes; maybe even a "ping -n 5" gives a useful rc if a majority of packets survive.
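Instead of chaining elif branches, the pings can be counted, which also sidesteps the question of what a multi-packet ping returns. A sketch, using the host and the HP-UX "-n count -m timeout" options from the original script; the function name count_ok is made up for illustration.

```shell
#!/bin/sh
# count_ok N CMD [ARGS...]: run CMD N times and print how many runs succeeded
count_ok() {
    co_n=$1
    shift
    co_ok=0
    co_i=0
    while [ "$co_i" -lt "$co_n" ]
    do
        if "$@" > /dev/null 2>&1
        then
            co_ok=$((co_ok + 1))
        fi
        co_i=$((co_i + 1))
    done
    echo "$co_ok"
}

# Example (not run here): switch only if all five single-packet pings die.
# rp=0
# if [ "$(count_ok 5 /usr/sbin/ping dns1.investmenttool.com -n 1 -m 5)" -eq 0 ]
# then
#     echo "Five pings died !"
#     rp=5
# fi
```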

Is there a web application involved? In that case, fetching index.html only shows that the web server is up, while your application database might be down... I would tend to wget-query some PHP info or server-info script that ensures a database connection is also valid and data can be fetched from the database. index.html might also be cached somewhere.
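A sketch of such a content check: fetch a page that only prints a marker after a successful database query, and look for the marker rather than trusting wget's exit code alone. The page dbcheck.php and the marker DB_OK are hypothetical; the page would have to be written to print DB_OK only when its query succeeds.

```shell
#!/bin/sh
DB_CHECK_URL=http://www.investmenttool.com/dbcheck.php   # hypothetical page

# fetch_check_page: print the page body on stdout
fetch_check_page() {
    /usr/local/bin/wget -q -O - "$DB_CHECK_URL"
}

# page_ok MARKER: read a page body on stdin, succeed only if MARKER appears
page_ok() {
    grep -q -e "$1"
}

# Example (not run here):
# fetch_check_page | page_ok DB_OK || echo "application/database check failed"
```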

Volker

Victor Fridyev
Honored Contributor

Re: HP Script to test a Linux server's status

Hi,

I'd like to add my penny.
The sequence of tests which I use is the following:
if ping ; then
if rcp (or scp); then
if remsh date( or ssh date ); then
3rdparty apps status (Oracle or Informix)
fi
fi
fi
With rcp or scp I copy test scripts for the 3rd-party applications and run them via remsh or ssh. Such a method requires standardized computers and a management machine with appropriate access.
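The layered sequence above can be made runnable with ssh/scp in place of remsh/rcp. In this sketch HOST and APP_CHECK are hypothetical (APP_CHECK stands for a local script that tests the 3rd-party application and exits non-zero on error), the ping line uses HP-UX syntax, and the first failing layer is printed so the alert says where things broke.

```shell
#!/bin/sh
HOST=dns1.investmenttool.com               # hypothetical target
APP_CHECK=/usr/local/bin/app_check.sh      # hypothetical application test script

# One function per layer, so each can be swapped or stubbed independently.
layer_ping()  { /usr/sbin/ping "$HOST" -n 1 > /dev/null 2>&1; }
layer_copy()  { scp -B "$APP_CHECK" "root@$HOST:/tmp/app_check.sh" > /dev/null 2>&1; }
layer_login() { ssh -o BatchMode=yes "root@$HOST" date > /dev/null 2>&1; }
layer_app()   { ssh -o BatchMode=yes "root@$HOST" /bin/sh /tmp/app_check.sh; }

# chain_check: run the layers in order; print the first failing one, or "ok"
chain_check() {
    for cc_layer in ping copy login app
    do
        if "layer_$cc_layer"
        then
            :
        else
            echo "$cc_layer"
            return 1
        fi
    done
    echo ok
}
```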

HTH
Entities are not to be multiplied beyond necessity - RTFM
Naveej.K.A
Honored Contributor

Re: HP Script to test a Linux server's status

hi,

VNC: Virtual Network Computing

http://www.realvnc.com/

regds
Naveej
practice makes a man perfect!!!
Craig Gilmore
Trusted Contributor

Re: HP Script to test a Linux server's status

Hi,

I'm sure you've already thought of it, and eliminated the possibility of a nameservice problem?

Reading the flow... if SSH or the wget fails, then check with a ping. If ping works then do nothing. I agree that you should use more than just 1 ping to check. Or use nmap to scan the port?

You've already had a couple of suggestions to ssh in and start the web service if the ping works. It seems to me that a name service failure would cause the same problem.
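One way to tell a name service failure from a dead server is to run the same web test by name and by IP address. A sketch: PROD_IP is a hypothetical address, and the Host header keeps a name-based virtual host answering when the request goes to the bare IP.

```shell
#!/bin/sh
PROD_IP=192.0.2.10     # hypothetical address of the production web server

by_name() { /usr/local/bin/wget -q -O /dev/null http://www.investmenttool.com/index.shtml; }
by_ip()   { /usr/local/bin/wget -q -O /dev/null --header="Host: www.investmenttool.com" "http://$PROD_IP/index.shtml"; }

# name_vs_ip BY_NAME_RC BY_IP_RC -> ok | nameservice | server
name_vs_ip() {
    if [ "$1" -eq 0 ]
    then
        echo ok            # everything works
    elif [ "$2" -eq 0 ]
    then
        echo nameservice   # server answers by address: fix DNS, do not take over
    else
        echo server        # fails by address too: the server itself is in trouble
    fi
}

# Example (not run here):
# by_name; n=$?; by_ip; i=$?; name_vs_ip "$n" "$i"
```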

Serviceguard uses a "heartbeat", which is really just a ping... You've got that design.

I'd build the whole thing into a short C program for speed and safety. But I think you have the major sections covered.

BTW: I'd get netdump configured on that Linux box and track down that mystery hang. Yes, easier said than done. :)

Good Luck!
Steven E. Protter
Exalted Contributor

Re: HP Script to test a Linux server's status

The "service named status" command is a good test on the server itself for maintaining its own health.

Although none of you had any way of knowing this, it is run by cron 4 times per hour. If many network services fail, the script tries to bring them back.

That can be a pain when I bring things down for diagnostics.

I left town without this script running the takeoverservice command (I didn't have time to test it), so when the production server halted while I was in Israel, I had to send someone to the server for a manual reset.

SEP