
MC/ServiceGuard

 
Subramanian_2
Occasional Advisor

MC/ServiceGuard

Hi,
I am new to MC/Serviceguard. I have an application which I need to configure on a failover cluster using MC/Serviceguard.

I read the documentation and found that there are two basic files that I need to configure:-
Package Configuration file and
Package Control File

However, is it necessary to configure the 'Package Control File'?

My application has the following basic needs
(It is a database application):-
1) If the application fails on a node because of hardware errors, it needs to restart on a designated alternate node
2) If the application fails on a node because of software errors, it needs to restart on the same node

From what I read, I understood that Condition 1 can be taken care of by modifying the Package Configuration file
But I am not sure if condition 2 can be taken care of by modifying the Package Configuration file.

I read about restarting services by modifying the package control script. Can someone please explain the difference between a package and a service?
Also I could gather that I need to write two scripts - for 'run' and 'halt'
Do we have any scripts for monitoring, probing, updating etc. (these are provided by Sun Clustering software)

Please help.

Regards,
Subramanian S
11 REPLIES 11
Geoff Wild
Honored Contributor

Re: MC/ServiceGuard

A package is a set of applications and/or services. And yes, you need a package control file.

A good doc is:

http://docs.hp.com/hpux/onlinedocs/B3936-90073/B3936-90073.html

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Kent Ostby
Honored Contributor

Re: MC/ServiceGuard

The first place to look, of course, is the Managing ServiceGuard manual.

http://docs.hp.com/hpux/ha/index.html#ServiceGuard

See also the attached document on setting up a ServiceGuard package.

Some of the links on the attached doc may not work since they are internal to HP, but the basic document will give you a lot of what you need.

Also, welcome to the HP-UX forums.

Best regards,

Kent M. Ostby
"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"
Jeff Schussele
Honored Contributor

Re: MC/ServiceGuard

Hi,

A package is used to start & stop the software.

A service is used to monitor critical processes of the SW & then perform actions (restart, failover, notify, etc.) if any of them disappear for any reason. Or a service can monitor the OS for resources or abnormal events as well and perform defined actions.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Rita C Workman
Honored Contributor

Re: MC/ServiceGuard

Recommend reading the threads the other folks pointed you to and, if possible, attend MC/SG training given by HP. It is very helpful.

With that said....
Your configuration file is your .ascii file, and this controls the basics: which node the package will fail over to, whether failover is auto or manual, and whether failback is auto or manual.
Your control file (.cntl) is a lengthy script that you modify to include your volume groups/lvols/mount points and the start/stop scripts for the application you have selected. It can also include other commands you run before or after your application.

For failover due to hardware, well, that is pretty standard. But your point about what happens if the application fails is another thing. Your application (Oracle/SAP/Informix, whatever) could fail and the package will stay right where it is, unless you code something that keeps checking whether the app is up. If you don't, the package will still be up, and you (or whoever) will have to clean up the hung processes, clear resources and restart the application. Hopefully after fixing the reason it fell over in the first place.

Rgrds,
Rita
Subramanian_2
Occasional Advisor

Re: MC/ServiceGuard

Hi,
Thanks for all your prompt responses. Now I have a better idea of things. I still have questions regarding restarting a failed package. My cluster application has already been configured on Sun Cluster 3.1 for failover. Sun Cluster provides the following methods for monitoring and restart:-
monitor_start
monitor_stop
monitor_check

What these basically do is implement a timeout function and a user-defined decision-making function that first checks whether the application is running and then decides whether to restart locally or fail over.

Do we have similar methods in MC/Serviceguard wherein I can fill in my own probe functions?

From what I know, MC/Serviceguard only provides methods for Run and Halt.

Help please

Re: MC/ServiceGuard

I have to say that the manuals, which are pretty good for most of Serviceguard, are pretty poor when it comes to describing how to actually set up an application within a package.

The thing to remember is that for the package component of Serviceguard, it's all scripts, so you can pretty much customise it to your heart's content. The other important point is that YOU can change it! (not like Sun Cluster where Sun PS have to get involved - I once remember a Sun Cluster where it took 2 weeks to get Sun PS to change an IP address!).

Anyway, I'm assuming you've had a look at a skeleton package control file as created by running:

cmmakepkg -s /etc/cmcluster/pkg1/pkg1.cntl

...or whatever.

The two important parts in this are the functions customer_defined_run_cmds() and customer_defined_halt_cmds(). This is where you start and stop your app, as you seem to have already worked out... straightforward so far.
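
A minimal sketch of how those two functions might be filled in, assuming hypothetical start and stop scripts (the /opt/myapp paths are placeholders, not anything Serviceguard ships). The test_return stub below is only a simplified stand-in so the sketch runs on its own; the generated control script already has its own helper of that name, and the 51/52 codes are as I recall them from the template, so check your own copy.

```shell
#!/bin/sh
# Hypothetical start/stop scripts for the application; substitute your own.
APP_START=${APP_START:-/opt/myapp/bin/start_db.sh}
APP_STOP=${APP_STOP:-/opt/myapp/bin/stop_db.sh}

# Simplified stand-in for the helper of the same name in the generated
# control script, defined here only so the sketch runs on its own.
# It reports a failure if the command just before it exited non-zero.
test_return() {
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: step $1 failed with exit code $rc"
        return 1
    fi
    return 0
}

# Called when the package starts: bring the application up.
customer_defined_run_cmds() {
    $APP_START
    test_return 51
}

# Called when the package halts: shut the application down.
customer_defined_halt_cmds() {
    $APP_STOP
    test_return 52
}
```

In a real control script you would only edit the bodies of the two customer_defined functions and leave the rest of the generated file alone.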

What about monitoring the status of an application? Well, you were already pretty close to the answer to this when asking about the difference between packages and services. A package is a group of data (logical volumes/filesystems), processes and IP address(es) that make up an application. A service is part of a package in that it is started and stopped by the package - but Serviceguard can take special actions based on what happens to services.

What most people use services for is exactly what you want - monitoring applications. So how do we set one up?

First of all, a service needs to be defined in the package configuration file AND the package control script. Let's assume we are creating a simple process monitor for your application (does the database process exist in the process table, i.e. 'ps -ef | grep ...', you get the idea!). To start with, we need some lines in the pkg1.conf file:

SERVICE_NAME db-mon
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 300

That's a name for our service, whether failure of the service should immediately cause this system to reboot (often referred to as transfer-of-control or TOC in HP-UX speak), and how long the service should be given to stop on a SIGTERM (kill -15).

Now we need to add some lines to our control script, you'll find a commented out example already in there, but we will add something like this:

SERVICE_NAME[0]=db-mon
SERVICE_CMD[0]="/etc/cmcluster/pkg1/db-mon.sh"
SERVICE_RESTART[0]="-r 2"

So that's the same name as in the conf file, the name of a script which actually contains your monitor commands, and an indication of what to do if the script exits. The '-r 2' means attempt to restart the monitor twice before issuing a failover; alternatively, '-R' means just keep restarting the service infinitely.

I am an HPE Employee

Re: MC/ServiceGuard


Now the hard part! You need to write a monitor script that does something useful! The typical bones of a monitor script would do something like this:

while true
do
Is process active? If so continue
if not attempt to restart, and then exit
done
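
Fleshed out in shell, the outline above might look roughly like this. It is only a sketch under assumptions: the process pattern, start script path and check interval are hypothetical placeholders, and the loop only starts when the script is called with a "run" argument (so SERVICE_CMD would need that argument appended). That guard is just a convenience so the functions can be loaded and tried without looping forever.

```shell
#!/bin/sh
# Sketch of a service monitor such as db-mon.sh. The pattern, start command
# and interval below are hypothetical placeholders, not Serviceguard settings.
PROCESS_PATTERN=${PROCESS_PATTERN:-my_db_daemon}
START_CMD=${START_CMD:-/opt/myapp/bin/start_db.sh}
CHECK_INTERVAL=${CHECK_INTERVAL:-30}

# Succeeds if a process matching the pattern is in the process table.
# "grep -v grep" drops the grep command itself from the listing.
process_is_running() {
    ps -ef | grep -v grep | grep "$1" >/dev/null
}

# Poll while the process is alive; once it is gone, try one local restart
# and return non-zero so that Serviceguard sees the service exit.
monitor_loop() {
    while process_is_running "$PROCESS_PATTERN"
    do
        sleep "$CHECK_INTERVAL"
    done
    echo "db-mon: $PROCESS_PATTERN not running, attempting restart"
    $START_CMD
    return 1
}

# Start monitoring only when called as "db-mon.sh run".
case "${1:-}" in
    run)
        monitor_loop
        exit 1
        ;;
esac
```

Because the script exits after each restart attempt, every application failure uses up one of the restarts allowed by SERVICE_RESTART.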

Of course they can get a lot more complex than that if you want to monitor more things apart from the fact that a bunch of processes are running - I've written monitor scripts which attempt to update critical tablespaces in an oracle database, and time out if not updated in 30 seconds. I've attached a generic script I sometimes use as a starting point for this sort of thing.
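
The timeout idea can be sketched in plain shell as well. This is not the attached script, just one possible approach: run_with_timeout is a hypothetical helper name, and the probe it runs is whatever check command you write yourself (for example, a script that updates a test table).

```shell
#!/bin/sh
# Run a command, but give up if it takes longer than LIMIT seconds.
# Usage: run_with_timeout LIMIT command [args...]
# Returns the command's exit status, or non-zero if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    probe_pid=$!
    # Watchdog: kill the probe if it is still running after $limit seconds.
    ( sleep "$limit"; kill "$probe_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    wait "$probe_pid"
    probe_status=$?
    # The probe has finished (or was killed), so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    return "$probe_status"
}
```

Inside a monitor loop this could be used along the lines of `run_with_timeout 30 /path/to/your_probe.sh || exit 1`, where the probe path is a placeholder.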

So what happens is, you set all this up, and then if your software fails the script detects this, restarts the software and then exits. When Serviceguard detects that the service has exited, it will restart the script the defined number of times before initiating a failover. So the upshot of all this is pretty much the same as with the Sun Cluster option (only in my opinion a lot more configurable!)

That said, I can't emphasise enough how important it is to read the Serviceguard manual and attend a course. It amazes me that people will spend such huge quantities of money on creating a Serviceguard environment and then not train their people to run it. A more illustrious member of the forums than me recently made an analogy between sysadmins and pilots: you wouldn't want someone with no training piloting your $50M jumbo jet, would you?

Anyway, this has probably raised as many questions as it has answered, but hopefully it's been useful.

HTH

Duncan

I am an HPE Employee

Re: MC/ServiceGuard

Weird... I had to submit that message in two parts - kept getting a message 'error while submitting reply', not sure why...


I am an HPE Employee
Subramanian_2
Occasional Advisor

Re: MC/ServiceGuard

Hi Duncan,
Your reply as well as the attached script were very helpful. It would be great if you could also attach your package control script, configuration file and related scripts.

Thanks in advance,
Subramanian S