Operating System - HP-UX

Scenario with MC Service Guard and disk array

 
Stephen_23
Occasional Contributor

I have 2 servers in a cluster with SGuard. They are both connected to a shared disk array.
We are running some hardware failover tests.
The test:
Turn off the disk array and see what happens.

I did this and nothing happened from a SGuard point of view.

I received syslog messages and email with critical events.

What do you folks out there believe should happen from a SGuard point of view?

Cheers,
Stephen
4 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: Scenario with MC Service Guard and disk array

I believe that the cluster should have responded in whatever way the monitor scripts tell it to. It's up to you to monitor whatever resources you like. Bear in mind that this is not the kind of failure MC/SG is designed to handle. In fact, MC/SG doesn't even enter into this situation - this is strictly an LVM (or VxVM) problem. It is expected that the array be robust enough to handle failure of a data path and thus LVM will switch to an alternate.

If this is a single shared array and it crashes you are dead - period. Pull the plug on one of your hosts, pull network cables, ... - that's the stuff MC/SG is designed to handle. In your case, LVM should have been able to get to a mirrored array but you don't have one. In most cases, arrays should be able to handle the expected failures without any help from outside.

If it ain't broke, I can fix that.
Pierce Byrne_1
Frequent Advisor

Re: Scenario with MC Service Guard and disk array

You need to force a read or write to the shared disk. It may take up to 60 secs by the time it checks all i/o paths. Once this fails your packages should start to fail. Hopefully there will be no loss of data...
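As a rough illustration of forcing I/O, here is a minimal Python sketch of a probe that could be run as a package service, so that the process exiting is what the cluster sees as a service failure. The function names, the polling interval and the device path in the usage note are all assumptions made up for this example, not part of ServiceGuard. The sketch has no timeout of its own, so a read against a dead array may block until the underlying I/O times out.

```python
import os
import time


def probe_read(path, nbytes=512):
    """Return True if nbytes can be read from the start of path, else False."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        # Device or file could not be opened at all.
        return False
    try:
        # A short read or an I/O error both count as a failed probe.
        return len(os.read(fd, nbytes)) == nbytes
    except OSError:
        return False
    finally:
        os.close(fd)


def monitor(path, interval=30, max_checks=None):
    """Probe path every interval seconds; return 1 on the first failed probe.

    max_checks is only there so the loop can be bounded for testing;
    with the default of None it runs until a probe fails.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not probe_read(path):
            return 1
        checks += 1
        time.sleep(interval)
    return 0
```

A wrapper would call something like sys.exit(monitor("/dev/vg01/rlvol1")), where the path is a hypothetical raw logical volume on the shared array. Reading the raw device rather than the block device is meant to keep the buffer cache from answering the read.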
Christopher Caldwell
Honored Contributor

Re: Scenario with MC Service Guard and disk array

As Clay mentioned - things depend on what you tell ServiceGuard to do.

Think of it this way:
-you're running an array with RAID5
-you get a disk failure
-in SG, you can monitor for such an event and take an action based on the event
-should you fail to another host? Probably not, the array should recover because you're running RAID5.

What if you get failures all the way through hot spare?
-you're toast, because the drives aren't working regardless of which host you're running

In practice, we don't configure many hardware events to trigger a failure, because
1) we've covered the failure scenario with redundancy (multiple NIC cards, multiple FC cards)
or
2) the failure would cause the box to TOC anyway, so I don't need to monitor an event.

Stephen Doud
Honored Contributor

Re: Scenario with MC Service Guard and disk array

Hi Stephen,

ServiceGuard can only monitor and react in the following 3 arenas:

1- node failure
2- lan NIC failure
3- package SERVICE failure

That's ALL!

To get SG to be able to monitor and react to other resource failures, employ and configure EMS HA Monitors via the package configuration file.

As stated previously, the underlying LVM mirroring subsystem is available to deal with an array outage.

-s.