General Planning for Active/Passive failover

Occasional Contributor

I have a pair of N-class servers connected via Fibre Channel to an EMC.
I have done a vgexport/vgimport to copy the volume groups from one server to the other.
The primary server runs its own IP address and a secondary failover IP address.

What are some of the things I need to be concerned about when I fail over from one to the other? I am thinking primarily of:
1) Unix file systems
2) Raw database file systems
3) Creating the spoof IP address
4) Anything else that anyone could point out.

Any pointers or just general advice would be appreciated. - keith
6 REPLIES
Honored Contributor

Re: General Planning for Active/Passive failover

Hi Keith,

This is exactly what MC/ServiceGuard (MC/SG) is designed to do. It handles the VG move, asserts the virtual IP, and can be scripted to do many other things. Trying to do this sort of thing without MC/SG can be quite daunting.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Valued Contributor

Re: General Planning for Active/Passive failover

One bit of info:
automounters can use replicated automount entries, i.e. you can specify more than one location, and the automounter will mount whichever is closest, fastest, etc.

Use AutoFS instead of the standard automounter, because its benefits (multithreading, parallelism, etc.) are badly needed in your case.
The sufficiency of my merit is to know that my merit is NOT sufficient
Occasional Contributor

Re: General Planning for Active/Passive failover

Actually, we disabled MC/ServiceGuard because it caused more outages than it prevented.

My take on ServiceGuard, and pretty much any other high-availability software, is that it is very fragile.
Honored Contributor

Re: General Planning for Active/Passive failover

Hi (again) Keith,

Well, just like a house, it's only going to be as solid as the foundation it's built upon.

By "damage" I assume you mean that the packages failed over a lot, causing downtime.
Or that packages did not start/stop cleanly.
These are symptoms of the cluster config not being as stout as it could be. For example, if you ran your heartbeats on the public network, the traffic was always high, AND your heartbeat timeouts were low, then YES, it's going to fail over strictly due to network congestion. Solution? Put the heartbeats on private nets or increase the timeout values.
Now the package startup/shutdown is a tougher nut to crack, but basically you have to know just how the environment needs to be set and what dependencies, if any, need to be fulfilled for the SW to start/stop cleanly, and then meet those needs.
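To illustrate the timeout point, here is a minimal Python sketch. It is not how MC/SG implements its heartbeat; the function name `count_timeouts` and the arrival times are made up for the example. It only counts how many gaps between heartbeats would exceed a given timeout.

```python
def count_timeouts(arrival_times, timeout_s):
    """Count gaps between consecutive heartbeats that exceed timeout_s."""
    gaps = (later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:]))
    return sum(1 for gap in gaps if gap > timeout_s)


# Hypothetical heartbeat arrival times (seconds) on a congested public network:
# the peer never went down, but two heartbeats arrived late.
congested = [0, 1, 2, 5, 6, 7, 12, 13]

print(count_timeouts(congested, timeout_s=2))  # a low timeout trips on the late ones
print(count_timeouts(congested, timeout_s=6))  # a more generous timeout rides them out
```

With the same traffic, the low timeout would have triggered failovers on a healthy peer, while the higher one would not. Moving the heartbeat to a private network attacks the other side of the problem by removing the late arrivals.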
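One way to think about those start/stop dependencies is as a small dependency graph. The sketch below is only an illustration in Python (3.9 or later for `graphlib`): the resource names and their dependencies are hypothetical and would have to be replaced by whatever the real package needs.

```python
from graphlib import TopologicalSorter

# Hypothetical package resources: each name maps to what must be up first.
DEPENDS_ON = {
    "volume_group": [],
    "filesystems": ["volume_group"],
    "virtual_ip": ["filesystems"],
    "database": ["filesystems", "virtual_ip"],
}


def start_order(depends_on):
    """Return resources in an order where dependencies come up first."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Shut down in the reverse of the start order."""
    return list(reversed(start_order(depends_on)))


print(start_order(DEPENDS_ON))
print(stop_order(DEPENDS_ON))
```

Writing the dependencies down explicitly, in any form, makes it easier to see why a package fails to start or stop cleanly when one step is run out of order.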

It's not easy setting up a cluster from scratch, but a well configured cluster requires very little admin after implementation.

I really believe that if you thought MC/SG caused "damage" then you're going to have a real eye-opener trying to do this without using clustering SW.

My 2 cents,
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Acclaimed Contributor

Re: General Planning for Active/Passive failover

Actually, MC/ServiceGuard is very robust and certainly does not cause more problems than it fixes --- but you do have to do your homework. Two of the biggest problems you will face in trying to build a "homegrown" solution are 1) detecting an actual failure and 2) preventing more than one system from accessing the shared data. UNIX will happily allow multiple systems to mount the same disks and filesystems with absolutely disastrous results.
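A minimal Python sketch of the decision a homegrown script would have to make before touching the shared disks. The function and its three inputs are hypothetical, and how each input is actually obtained (a second network path, a power switch, a lock that both nodes contest) is the hard part that this sketch leaves out.

```python
def may_activate_shared_storage(peer_heartbeat_lost,
                                peer_confirmed_down,
                                holds_tiebreaker):
    """Allow takeover only when the peer can no longer be using the disks."""
    if not peer_heartbeat_lost:
        # The peer is alive and still owns the data.
        return False
    # A lost heartbeat alone is ambiguous: the peer may be up but unreachable.
    # Require independent confirmation, or an exclusive tiebreaker that the
    # peer cannot also hold (the loser is assumed to halt itself).
    return peer_confirmed_down or holds_tiebreaker


print(may_activate_shared_storage(True, False, False))  # unreachable peer, no proof
print(may_activate_shared_storage(True, True, False))   # peer confirmed down
```

The first case is the dangerous one: if the standby activates the volume groups on a missing heartbeat alone, both systems can end up mounting the same filesystems.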


I humbly (well, maybe not so humbly) suggest that you go back and spend more time with MC/SG and find your problems there. Remember, thousands of man-hours have already gone into fixing the same problems that you are trying to address.

Now for confession time: I have never had a 'real' package switch on my current cluster and I'm at over 4 years of zero unplanned production downtime. During that time, I have had many disk failures, network failures, power failures, HVAC failures --- but no downtime. MC/SG's greatest feature is the discipline that it demands. Many problems (like failed disks) are addressed long before MC/SG comes into play.


If it ain't broke, I can fix that.
Occasional Contributor

Re: General Planning for Active/Passive failover

Well, not to beat a dead horse, but with MC/ServiceGuard, installed by HP, we had 8 outages over a 3-year period caused by SG and none caused by hardware or the OS.
Since I say that ServiceGuard is not an option, does anyone actually have any useful advice?