Operating System - HP-UX

MC/SERVICE GUARD Problems

 
khilari
Regular Advisor

Hi people, I would like to know what kinds of problems you run into with MC/ServiceGuard, in an environment where Ignite-UX and ServiceGuard are key components. What problems can be anticipated on a daily basis?
HGN
Honored Contributor

Re: MC/SERVICE GUARD Problems

Hi

ServiceGuard is pretty stable, and you should not see any issues if all the prerequisites are in place. Keep looking at the package logs daily to make sure things are fine until the environment is stable.

Try the failover once in a while to make sure everything is working fine.

There are quite a few ServiceGuard threads you can take a look at. A few of them are:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=969824
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=965665

Rgds

HGN
Geoff Wild
Honored Contributor

Re: MC/SERVICE GUARD Problems

I don't have any problems. :)

As long as you plan well, and build to spec, you should have a cluster that runs and runs.

From an Ignite perspective, make sure that your lan0 interfaces are all on the same subnet and "visible" to the Ignite server; otherwise you will need to set up boot helpers if you ever need to re-ignite a server.
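
A quick way to sanity-check the subnet point is to compare each node's lan0 address against the Ignite server's network. This is only a minimal sketch: the node names and addresses are made up for illustration, and the helper name nodes_outside_subnet is hypothetical, not part of Ignite-UX or ServiceGuard.

```python
import ipaddress


def nodes_outside_subnet(server_cidr, node_addrs):
    """Return the names of nodes whose lan0 address is outside the server's subnet.

    server_cidr: the Ignite server's address with prefix length, e.g. "192.168.10.5/24".
    node_addrs:  mapping of node name -> lan0 IP address (as strings).
    """
    network = ipaddress.ip_interface(server_cidr).network
    return sorted(
        name
        for name, addr in node_addrs.items()
        if ipaddress.ip_address(addr) not in network
    )


if __name__ == "__main__":
    # Hypothetical addresses; substitute the real lan0 addresses of your nodes.
    lan0 = {
        "node1": "192.168.10.21",
        "node2": "192.168.10.22",
        "node3": "192.168.20.5",
    }
    for name in nodes_outside_subnet("192.168.10.5/24", lan0):
        print(name + ": lan0 is not on the Ignite server's subnet, a boot helper would be needed")
```

Any node it reports would need a boot helper on its own subnet before it could be re-ignited over the network.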

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
Tiziano Contorno _
Valued Contributor

Re: MC/SERVICE GUARD Problems

The things I tend to forget, or that need extra attention:

Same UID and GID for every user on every node;
rcp each config file and start/stop script to the other nodes whenever it is modified;
Just to be consistent, the same minor number for the /dev//group files;
Master the vgexport/vgimport procedure when extending or creating VGs, paying attention to the order of the primary (PRI) and alternate (ALT) links in the imported volume group (adjust it with vgreduce/vgextend if needed).
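
The UID/GID point is easy to check mechanically. As a rough sketch, assuming you have copied /etc/passwd from two nodes into strings (the user names and IDs below are invented, and the function names are hypothetical), something like this lists the users whose IDs differ:

```python
def parse_ids(passwd_text):
    """Map user name -> (uid, gid) from passwd-format text."""
    ids = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a regular passwd entry
        try:
            ids[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            continue  # non-numeric UID/GID (e.g. NIS compat entries), skip it
    return ids


def id_mismatches(passwd_a, passwd_b):
    """Return {user: ((uid_a, gid_a), (uid_b, gid_b))} for users whose IDs differ."""
    a, b = parse_ids(passwd_a), parse_ids(passwd_b)
    return {
        user: (a[user], b[user])
        for user in sorted(a.keys() & b.keys())
        if a[user] != b[user]
    }


if __name__ == "__main__":
    # Invented sample data standing in for /etc/passwd from two cluster nodes.
    node1 = "oracle:x:201:200::/home/oracle:/sbin/sh\nappadm:x:300:300::/home/appadm:/sbin/sh\n"
    node2 = "oracle:x:201:200::/home/oracle:/sbin/sh\nappadm:x:301:300::/home/appadm:/sbin/sh\n"
    for user, (on_a, on_b) in id_mismatches(node1, node2).items():
        print(user, "node1:", on_a, "node2:", on_b)
```

Users that exist on only one node are not reported here; that would be a separate check.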

Regards.
Julio Yamawaki
Esteemed Contributor

Re: MC/SERVICE GUARD Problems

Hi,

I have been using MC/SG for about 10 years and have had no big problems with it.
I think what you need is a procedure to verify that the cluster is healthy (and to anticipate any problems that can occur):
1. Test failover at least once every 6 months to confirm that the package runs properly on the failover node.
2. Check the filesystem mount points daily (don't forget crontab...).
3. I have written some scripts that check cluster conditions daily: whether a package is not on its primary node, whether any node is unable to run the package, whether package switching is enabled, and so on.
4. I customized some scripts from HP to send e-mail every time a condition changes, such as a package starting or running.
5. I also check daily for other configuration problems, such as printers, users, and groups, and kernel configuration too, of course. I use crontab, but if you have a scheduler, you can use that.
6. Check syslog every day for hardware problems, such as mirrors, LAN cards, and anything else that can affect cluster operation.
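
As an illustration of the daily mount point check in item 2, here is a minimal sketch. It is not one of the scripts mentioned above; the mount point paths are hypothetical, and it assumes a Python interpreter is available on the node.

```python
import os


def missing_mounts(expected, is_mounted=os.path.ismount):
    """Return the expected mount points that are not currently mounted.

    is_mounted is injectable so the check can be exercised without real filesystems.
    """
    return [path for path in expected if not is_mounted(path)]


if __name__ == "__main__":
    # Hypothetical package filesystems; list the ones your package should have mounted.
    expected = ["/oracle/data", "/oracle/arch"]
    for path in missing_mounts(expected):
        print("WARNING: " + path + " is not mounted")
```

Run from crontab once a day, with the output mailed to the administrators, this covers the basic case; the real list of mount points would come from the package configuration.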

Regards,
A. Clay Stephenson
Acclaimed Contributor

Re: MC/SERVICE GUARD Problems

First, you need a unified system of UIDs, GIDs, host name resolution, and service names (e.g. /etc/services). There is a Mickey Mouse approach to doing this (copying files across hosts), and there is an elegant way (NIS, NIS+, or LDAP). DNS can be used for host name resolution, or you may incorporate that into NIS, NIS+, or LDAP as well. NIS is the easiest to implement but the least secure; NIS+ is secure but will be obsoleted in the not too distant future; LDAP is both secure and will be around for a long time to come.

Now for the surprise: there are essentially zero problems with MC/SG. In fact, by the time you are ready to deploy MC/SG, you should have your hardware, OS, and applications so robust that MC/SG very seldom comes into play. After years of running MC/SG, I have literally never had a package fail over other than those that were deliberately induced. During that same period, I've had tens of disk failures, network failures, cabling problems, etc., but those were handled by LVM, for example, long before MC/SG needed to come to the rescue. It's ironic that when you do MC/SG "right" you very seldom need it, and that's why you buy MC/SG and put all the work into building clusters.
If it ain't broke, I can fix that.