Operating System - HP-UX

Possible Q with MC/SG and VxVM

 
Andrew Underhill
Occasional Advisor

Hi Folks, got a question about some tests that I performed on a pair of Clustered SD3200s.

The servers have HP-UX 11i, VxVM 3.5 with the latest patch set, MC/SG 11.14, and an ASL for the HDS9980Vs that they are both connected to via a SAN.

The tests that have caused me problems are where I disable the primary FC to prove that DMP uses the secondary FC, which it does. If I then re-enable the primary and disable the secondary without waiting out the 180 seconds of PFTO that the disks have been set to, I get all kinds of problems: I/O errors, failed/offlined disks, mounts that cannot be unmounted, and so on.

My Q is this: when both cards are "failed", shouldn't MC/SG have failed the package over?

I have no resources defined. How do I make MC/SG fail the package over, and how do I tidy up afterwards?

Any help gratefully pointed :=)

Re: Possible Q with MC/SG and VxVM

Andrew,

The scenario you describe is basically 2 failures - Serviceguard is designed to handle single points of failure, not multiple - so it wouldn't usually handle well the situation you described. This is exactly why we use products like DMP or LVM PVlinks.

That said, it is fairly typical for people to want to protect against this situation, and the standard way to do this is to use the EMS HA Monitors product to set up a disk monitor - ITRC Document UMCSGKBRC00012483 gives a good overview of how to achieve this, and for more background on EMS HA Monitors, see the manual here:

http://docs.hp.com/hpux/onlinedocs/B5736-90046/B5736-90046.html

HOWEVER... as far as I am aware, EMS HA Monitors doesn't support VxVM - it's LVM only. So I guess you're looking at writing your own monitor. This shouldn't be too complex - all you basically need is a script which loops round interrogating VxVM every n seconds and checking that at least one path to every disk is accessible. If any disk cannot be reached via any path then exit the script (I'm afraid I don't know VxVM well enough to give you any syntax). Then you just need to define the script as a service in your ServiceGuard package, for example, in your pkg.conf file (assuming your script is called /etc/cmcluster/pkg/dmp-monitor.sh):

SERVICE_NAME DMP-SERVICE
SERVICE_FAIL_FAST_ENABLED YES
SERVICE_HALT_TIMEOUT 300

And then in your pkg.cntl file:

SERVICE_NAME[0]=“DMP-SERVICE”
SERVICE_CMD[0]=“/etc/cmcluster/pkg/dmp-monitor.sh”
SERVICE_RESTART[0]=“”

Notice that you MUST set SERVICE_FAIL_FAST_ENABLED to YES - this is because if some of the disks can't be accessed then there's every chance that ServiceGuard would stop responding during the file system umounts if it tried to do a clean shutdown. The only way to fail over effectively in this scenario is to TOC the system and start the app on the other node (which SERVICE_FAIL_FAST_ENABLED will do). Of course this means that anything else running on this node is toast - but there's no other way around I/O that is stuck on failed paths.
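For what it's worth, the monitor loop described above could be sketched roughly as follows. Treat it as a starting point under stated assumptions, not a tested VxVM recipe: the listing command, the enclosure name enc0, and the column layout (enabled-path count in column 5) are all assumptions that need checking against the real output on your VxVM release. The names LIST_CMD, INTERVAL, all_nodes_reachable and monitor_loop are made up for the illustration.

```shell
#!/usr/bin/sh
# Sketch of a DMP path monitor for use as a Serviceguard package service.
# ASSUMPTION: LIST_CMD prints one row per DMP node with the number of enabled
# paths in column 5 (NAME STATE ENCLR-TYPE PATHS ENBL DSBL ENCLR-NAME).
# Check this against the real output on your VxVM release before using it.
LIST_CMD=${LIST_CMD:-"vxdmpadm getdmpnode enclosure=enc0"}  # enc0 is hypothetical
INTERVAL=${INTERVAL:-30}                                    # seconds between polls

# Read the listing on stdin; return 0 only if at least one node was listed
# and every listed node still has one or more enabled paths.
all_nodes_reachable() {
    awk '
        $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {   # data rows only, skips headers
            seen++
            if ($5 == 0) dead++
        }
        END { exit (seen > 0 && dead == 0) ? 0 : 1 }
    '
}

# Poll until some node has lost its last path (or nothing could be listed),
# then exit non-zero so that Serviceguard sees the service fail.
monitor_loop() {
    while $LIST_CMD 2>/dev/null | all_nodes_reachable; do
        sleep "$INTERVAL"
    done
    exit 1
}

# Run the loop only when invoked as "dmp-monitor.sh run", so the check
# function can also be sourced and tried against captured output.
if [ "${1:-}" = "run" ]; then
    monitor_loop
fi
```

With this guard the SERVICE_CMD entry would need the extra "run" argument. Because a failed service TOCs the node when SERVICE_FAIL_FAST_ENABLED is YES, it is probably worth requiring several consecutive bad polls before exiting, and bearing in mind that the listing command itself could block if I/O is stuck.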

Hope this is


I am an HPE Employee

Re: Possible Q with MC/SG and VxVM

Not sure what happened to my quotation marks there, must be something to do with pasting from Word - they should look more like this:

SERVICE_NAME[0]="DMP-SERVICE"
SERVICE_CMD[0]="/etc/cmcluster/pkg/dmp-monitor.sh"
SERVICE_RESTART[0]=""



HTH

Duncan

Andrew Underhill
Occasional Advisor

Re: Possible Q with MC/SG and VxVM

...and the points go to the person in the pointy hat!

This is just what I had thought was the case. People keep trying to view the building as a single point of failure, when it really is multiple.

I'd have to experiment with the import of the DG on the other node (in another building), as doing it manually recently after this self-made disaster was lugubrious to say the least.

Thanks for your help. It's much appreciated.