
Volume Shadowing Copy After Reboot

 
Allan Large
Frequent Advisor

Volume Shadowing Copy After Reboot

I am trying to get an understanding of software Volume Shadowing.

In reading the documentation (which really needs to be updated), I found and have used the MOUNT/POLICY and DISMOUNT/POLICY commands to help ensure that only a minicopy is performed when a shadow member is removed and then added back to the set .... when the system is not rebooted.

However, I cannot find where I can do this type of thing after a system reboot. Am I missing something or is a full copy always required after a reboot ?

Thanks in advance ....

24 REPLIES
Bill Hall
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

Do you mean full merge rather than full copy? If your DSA devices are properly DISMOUNTed before the shutdown (you should probably do so in SYS$MANAGER:SYSHUTDWN.COM), they can be mounted at boot time without a merge.

I can't think of a reason why you would have a full shadow copy on boot. The shadow generation number would have to have been removed from the member being copied to.

Bill

Bill Hall
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Bill:

I am just getting a grasp on the commands to setup the volume shadowing correctly.

What we have are two itaniums, each with their own disks. One disk on each system is shadowed together. When either system reboots, the corresponding shadow member goes into a full copy.

I am reviewing the SET SHADOW/POLICY command now and feel that is going to be the answer, but I have yet to fully understand it. Am I headed in the right direction ?
Mike Kier
Valued Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

I highly recommend you poke around Keith Parris' < http://www2.openvms.org/kparris/ > page for various white papers and presentations he's done around volume shadowing and around high availability and DR in general.
Practice Random Acts of VMS Marketing
Bill Hall
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

Please clarify the description of your two systems. Are they clustered? Are both systems being shut down together?
Attaching the output from a $show shadow/full on both systems might help us understand your environment better.

Bill
Bill Hall
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

OK .. I thought I had this figured out but apparently not, so I am reopening this thread.

I have a couple of questions so I will start with this one and will progress to others later.

QUESTION: Is there a way to configure volume shadowing such that when a node in a cluster crashes and reboots, the disk on that node that is a member of a shadow set doesn't have to go through a full copy ? Or does the fact that it was a "dirty" dismount always force the full copy ?
The Brit
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,
When you say "2 Itaniums with their own disks.." are you referring to internal disks or SAN disks?

To press an earlier question! Are these two systems clustered??

If a system is subjected to an orderly shutdown then there should never be a need to perform a shadow copy at startup. At worst, there may be need to "rebuild" the volume (usually because when the system shutdown and dismounted the disks, there may have been files open, or installed, e.g. pagefiles on some disks).
If a cluster node "crashes", then when it reboots it will have to perform a "merge" operation. This could be a full merge (which can take a significant amount of time, and will normally affect I/O performance), or, if you have it enabled, the system will perform a "mini-merge", which takes seconds and is normally completed by the time the system is fully booted.

The volume shadowing qualifiers that you are referring to really are not part of this discussion (since volume shadowing pre-dates them by many years).

Give us more information about your configuration.

1. Cluster configuration.
2. Storage information.
3. Blades or Rack

Dave.
Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

Please read the following thread (to the end).

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1118643

If your shadow sets contain members that are only available to other cluster members via MSCP serving, shadowing will work, but there are many limitations. Shadowing works best when every node has direct (non-MSCP served) access to each member. If your two nodes don't have direct access to your members, then you won't be able to do what you want in the general case. Also, if you have only two nodes, and no shared storage, then you won't be able to use a quorum disk, so your cluster will stall if the loss of a node causes the cluster quorum to be lost.

Please cut and paste the output of the following into a text file (for example, a file created with notepad), save it to a .txt file, and attach the file to your next reply. The output will fill several pages, and saving it to a text file will make it much easier to read than trying to read the ITRC mangled output. (Please do NOT attach a WORD .doc file, or a WORDPAD .rtf file). Then we can determine a bit about your configuration:

Replace DSAxxx with one of the shadow set virtual unit devices that you are having problems with.

$ mcr sysman set environment/cluster
SYSMAN> do show device/full DSAxxx

Also, can you please show us the exact commands you are using to mount and dismount your disks?

Jon
it depends
Steve Reece_3
Trusted Contributor

Re: Volume Shadowing Copy After Reboot

"QUESTION: Is there a way to configure volume shadowing such that when a node in a cluster crashes and reboots, the disk on that node that is a member of a shadow set doesn't have to go through a full copy ? Or does the fact that it was a "dirty" dismount always force the full copy ?"



If the systems crash, I'd expect a shadow copy to result.
Cluster design is something that isn't really covered very well in any of the material that I've seen as far as disk storage is concerned. In general, I'd always suggest shared storage between cluster members (whether it be a shared SCSI bus or a shared disk array with SAN connectivity). A disk on the array can then be used as a quorum disk in the case of a two-node cluster, and no shadow copies occur when a node reboots after a crash.

I've been party to clusters formed with SmartArray storage and local SAN-connected storage (when the client had insufficient funds to join two storage networks together cross-site), and they're not something I'd ever like to do again. If one node crashes and reboots, and the shadow copy starts but then the second node crashes, you're up to your armpits in disks that have partial shadow copies and partial updates, and it's back to tape to recover the situation.

Steve
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

The following is a test scenario that I have set up in order to demonstrate the issue we are having with volume shadowing.

We have 2 rx2620's with 32mg internal drives running OpenVMS 8.3 that are clustered together in a 2 node cluster. For the purpose of this demonstration, one node is setup with a quorum disk so that it will remain "up" when the 2nd node is shutdown.

In the attached text file, you will see the drives $2$dka100 and $3$dka100 that are the shadow members.

You will then see the formation of the shadow disks ... starting with the initialization of the disks, setting the shadow policy and the mounting of the shadow set. The MOUNT command is issued on both systems.

After the shadow set is formed, you will see the output of the SHOW DEV/FULL DSA0 as well as the SHOW SHADOW command.

Then in preparation for a reboot of one node, the DISMOUNT command is issued and then the system is rebooted.

While the one node is rebooting, a change is made on DSA0 (a directory is created).

NOW ... when the rebooted node comes up and the shadow disk is mounted, it goes into a FULL COPY. It is my understanding that it should do either a mini-copy ....

What am I doing wrong ?



Bill Hall
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

Change the startup procedure on each of your systems that mount DSA0 to:

$ mount/system dsa0:/shad=($2$dka100,$3$dka100)/policy=minicopy=optional users

Change the syshutdwn.com procedure to dismount the shadowset on the local system; don't dismount the local member of the shadowset:

$ dism DSA0:/policy=minicopy=optional

That should help. But as someone else mentioned, you really should invest in shared storage. Even a low-end SCSI or SAS storage shelf will help tremendously. There is at least one StorageWorks MSA shelf that should work for you.

Bill
Bill Hall
Zeni B. Schleter
Regular Advisor

Re: Volume Shadowing Copy After Reboot

I use Dismount/Policy=Minicopy=Optional. Look at the HELP on that subject and see if that is what you need.
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Bill:

Shutting down one node without dismounting the local disk causes the shadow set to go into a mount verification which renders it useless for quite a long time.

Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Zeni:

If you will look at the attachment in my earlier post, you will see that I am doing:

$ dism/policy=minicopy=optional

Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

Let's reduce the requirements for a test to the subset case of an orderly shutdown of the essentially non-voting cluster member. I will assume that RADI64 is the node that is going to be rebooted, and that its votes are not needed to maintain quorum. (Side note: In a 2 node cluster, if your quorum disk is not on shared storage, then there is no point in having one. Just set your required node's votes to 1, the other node's votes to 0, and set expected votes to 1 on both nodes)
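As a rough illustration of the side note above, the vote settings could look like the following MODPARAMS.DAT fragments. This is only a sketch: it assumes OLDMOE is the node that must stay up, it leaves out any existing quorum disk settings that would also have to change, and the values would still need to be applied with AUTOGEN and a reboot.

```
! SYS$SYSTEM:MODPARAMS.DAT on the node that must stay up (assumed to be OLDMOE)
VOTES = 1
EXPECTED_VOTES = 1

! SYS$SYSTEM:MODPARAMS.DAT on the node that is allowed to come and go
VOTES = 0
EXPECTED_VOTES = 1
```

With EXPECTED_VOTES = 1, quorum works out to (EXPECTED_VOTES + 2) / 2 = 1 after truncation, so the single voting node holds quorum on its own and the non-voting node can leave the cluster without affecting it.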

See chapter 7 of the 7.3-2 Shadowing manual, section "Minicopy Restrictions". In the pdf version of the manual, this starts on page 117. On page 118, a bit more than halfway down the page, see the bullet

"If a node with one or more master bitmaps shuts down or crashes, the bitmaps on the node are deleted. Therefore, the shadow sets whose master bitmaps were deleted will not be able to use a minicopy operation. Instead, a full copy will be performed."

Clue #1: Make sure the master bitmap is on the primary node, not the node that will reboot.

See chapter 7, section "Master and Local Write Bitmaps". In the pdf version of the manual, this is on page 120.

"In an OpenVMS Cluster system, a master write bitmap is created on the node that issues the DISMOUNT or MOUNT command that creates the write bitmap. When a master write bitmap is created, a local write bitmap is automatically created on all other nodes in the cluster on which the shadow set is mounted, provided the nodes have sufficient memory."

Clue #2: Where the dismount must occur is on a node that will not reboot. In your case, that is OLDMOE.

Bitmaps created at mount time are to be used to add additional members later that were there at the time that the shadowset VU (DSAxxx) was dismounted. The time you would use that would be after a cluster shutdown, and the subsequent reboot of OLDMOE. This is not the case you are interested in when RADI64 is going to reboot.

When RADI64 is going to reboot, you want to remove the member that is being MSCP served by RADI64, and you want to do this dismount from the OLDMOE node, since you want the master bitmap to be on OLDMOE.

The thread "Mounting of HBVS disks in sylogicals.com fails on a node" discusses a technique I have used, starting with the comment dated Oct 29, 2007 15:55:00 GMT

http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1172934

Also, since you are running V8.3, you should investigate multiuse bitmaps, which allow the same write bitmaps to be used for both minicopy and minimerge. The documentation is sparse, and what is available is in the V8.3 new features manual. The best documentation I am aware of is the following: http://h71000.www7.hp.com/openvms/journal/v11/hbmm_amcvp_openvms_shadowing.html or in pdf http://h71000.www7.hp.com/openvms/journal/v11/HBMM_AMCVP_OpenVMS_shadowing.pdf

Jon
it depends
Steve Reece_3
Trusted Contributor

Re: Volume Shadowing Copy After Reboot

I've just flicked through the "Guidelines for OpenVMS Cluster Configurations" manual on the HP website and, specifically, the interconnects section on http://www.openvms.compaq.com/doc/82final/6318/6318pro_002.html#bottom_002

As I suspected, the SCSI interconnect hasn't been taken across to Integrity and the use of shared SCSI buses isn't supported. The cheapest solution to shared storage that's supported will be an MSA2000 I guess? Pretty rubbish and pretty expensive for a low-end cluster, but then the clustering licenses on Integrity aren't exactly cheap either I guess so YMMV.
The Brit
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Referring back to Bill Hall's last reply,

On the node which is shutting down, you should dismount the VOLUME, i.e. DSA0, not just the local disk. Dismounting the Volume, on the system which is shutting down, will automatically dismount the local and remote member disks **on THAT SYSTEM**, i.e. the "shutting down" system.

When the system reboots, it will remount the volume (using both units) and it should not require a copy since the Volume is already in a consistent state, (having been maintained that way by the system which stayed up).

Dave
Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Dave,

Please read Allan's attachment showing the configuration of the shadowset (posted Aug 31, 2009 13:10:45 GMT)

The problem is that dismounting the DSA0 VU on the system that is shutting down (RADIA64) does not cause the member device $3$DKA100:(RADIA64) to no longer be MSCP served by RADIA64 to the cluster, so the $3$DKA100: device remains in the shadowset and continues to be modified until RADIA64 completes the shutdown. At that time, $3$DKA100: goes into mount verify, shadowing software stalls all I/O to the DSA0 and after SHADOW_MBR_TMO expires, $3$DKA100: is ejected from the shadowset. If multiuse bitmaps are in effect, then the $3$DKA100: device will not have to go through a full copy when it is reintroduced into the shadowset, but the cost is that all I/O activity to the VU will be stalled for SHADOW_MBR_TMO seconds whenever RADIA64 shuts down.

Allan states that effect in his post from Aug 31, 2009 18:19:42 GMT. "Shutting down one node without dismounting the local disk causes the shadow set to go into a mount verification which renders it useless for quite a long time."

Jon
it depends
Jon Pinkley
Honored Contributor
Solution

Re: Volume Shadowing Copy After Reboot

Allan,

From your attachment:

--------------------------------------------------------------------------
~~~~~~~~~~~~~~~ On node RADI64 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ dism/cluster $3$dka100/policy=minicopy=optional
$ @SYS$SYSTEM:SHUTDOWN

(with automatic reboot)


~~~~~~~~~~~~~~~ After reboot on node RADI64 ~~~~~~~~~~~~~~~~~~~~~~~~~

$ mount/cluster dsa0:/shad=($2$dka100,$3$dka100)/policy=minicopy=optional users
%MOUNT-I-MOUNTED, USERS mounted on _DSA0:
%MOUNT-I-ISAMBR, _$2$DKA100: (OLDMOE) is a member of the shadow set
%MOUNT-I-SHDWNOMCPY, _$3$DKA100: (RADI64) added to the shadow set with a copy operation (unable to use minicopy)
--------------------------------------------------------------------------

If you want to see why minicopy is not being used, do the following interactively on RADI64 from a privileged account.

$ set prompt="RADI64$ "
RADI64$ mcr sysman set environment/cluster
SYSMAN> do show device/bitmap DSA0: ! this will show the minimerge bitmaps if they exist
SYSMAN> do show device DSA0: ! this should show 2 members
SYSMAN> exit
RADI64$ dismount $3$dka100:/policy=minicopy ! note: /cluster does nothing in this case, you should remove it.
RADI64$ mcr sysman set environment/cluster
SYSMAN> do show device/bitmap DSA0: ! others + minicopy bitmap with mastership on RADI64
SYSMAN> do show device DSA0: ! this should show only the member from OLDMOE
SYSMAN> exit
RADI64$ reply/enable=disk ! so you can see the shadowcopy
RADI64$ mount/system DSA0: /shadow=$3$dka100: users ! we don't need to specify /cluster or /policy here

This will result in the $3$dka100: member being added back into the shadowset with a minicopy

So why does it work here, but not after RADIA64 reboots?

It works here because the minicopy mastering node is still up. However, the master bitmap is on RADIA64. And as the shadowing documentation states, when a node crashes or reboots, any write bitmaps that it is mastering are deleted. Once the bitmap is created, there is currently no way to move the bitmap master role to another node (that I am aware of).

However, you can control where the master bitmap is created.

Did you try my suggestion? For the people that don't want to follow links, here it is in a nutshell:

From the node that is shutting down, use SYSMAN to execute the dismount on the other node, for the member that is served only by the node being shut down.

For your case, something like this:

Contents of exe_on_oldmoe.sysmanini
------------------------
set environment/node=oldmoe
set profile/privilege=log_io ! needed to create bitmaps
------------------------

Now on node RADI64

$ define/user sysmanini exe_on_oldmoe.sysmanini
$ mcr sysman do dismount $3$dka100:/policy=minicopy
$ @SYS$SYSTEM:SHUTDOWN

The advantage of this technique is that your DSA0 VU will not stall when the MSCP serving node stops.

The combination of this technique plus multiuse bitmaps (to handle the case of crashes) is about as close to what you are looking for as is (currently) possible with your configuration.

When all member devices can be accessed by all nodes without the node that is being shutdown, then these problems don't exist, since no members will need to be ejected from the shadowset.

If you are not going to have a quorum disk on a shared bus, you may as well put all your member devices on OLDMOE and MSCP serve them to RADIA64. Then RADIA64 can come and go as it pleases without affecting the state of the shadowset. When OLDMOE shuts down, it is going to stall the cluster anyway.

Jon
it depends
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Jon:

Your answer was dead on. The situation I described here was a test case. I had sole access to these systems where I could reboot at will and I wouldn't mess anything up.

I have already come to the same conclusion as you. I will post more on my final outcome later.

Thanks !!!
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

I have found somewhat of a solution ...

As Jon stated above, the issue is related to the location of the bitmap. I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node. Maybe this is something that could be put on a wish list ;-)

Anyway, the solution we came up with was to add coding to the SYSHUTDWN.COM procedure to:

1) Make sure another node was currently up and running the SMISERVER.
2) Look for all disks that are currently shadow set members on the local node.
3) Make sure those disks are not currently in a COPY or MERGE state.

The procedure will then use SYSMAN (SET ENVIRONMENT/NODE=node) to execute DISMOUNT/POLICY=MINICOPY=OPTIONAL on the remote node for each shadow set member served by the local node.

Assuming the remote node remains up, the rebooting system then has the remote node re-MOUNT the disks during its startup. Because the bitmap was created on the remote node by the dismount, the local shadow set member can be added back with a mini-copy rather than a full copy.

If the node crashes, we are just out of luck. A full copy will always happen.
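A minimal sketch of what such a SYSHUTDWN.COM fragment might look like follows. It is not the actual procedure described above: the node name, the device name, and the location of the SYSMANINI file (the one Jon described earlier in the thread) are assumptions, it handles a single member rather than searching for all of them, and the check that SMISERVER is running on the remote node is not shown.

```
$! Hypothetical fragment for SYS$MANAGER:SYSHUTDWN.COM (sketch only).
$! Node and device names below are assumptions for this two-node example.
$ remote_node  = "OLDMOE"        ! node expected to stay up
$ local_member = "$3$DKA100:"    ! member served only by this node
$!
$! Any warning or error (for example an unknown node name) skips the dismount.
$ on warning then goto no_dismount
$!
$! 1) The other node must currently be a cluster member.
$ if .not. f$getsyi("CLUSTER_MEMBER", remote_node) then goto no_dismount
$!
$! 2) The device must be a shadow set member ...
$ if .not. f$getdvi(local_member, "SHDW_MEMBER") then goto no_dismount
$!
$! 3) ... and must not be in a copy or merge state.
$ if f$getdvi(local_member, "SHDW_CATCHUP_COPYING") then goto no_dismount
$ if f$getdvi(local_member, "SHDW_MERGE_COPYING") then goto no_dismount
$!
$! Run the DISMOUNT on the remote node so the master bitmap is created there.
$ define/user sysmanini sys$manager:exe_on_oldmoe.sysmanini
$ mcr sysman do dismount 'local_member'/policy=minicopy=optional
$!
$no_dismount:
```

The point of routing the DISMOUNT through SYSMAN is only to control where the master bitmap is created; if any of the checks fail, the fragment simply falls through and the member is left in the shadow set.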

Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

The important thing is where the master bitmap is created, as the master has to exist on a node that will be around for the duration of the time the member is out of the shadowset.

It is not a requirement that the mount is done from the node that has the master bitmap. In fact, a node that isn't the bitmap master can do the copy.

You can verify this by dismounting the $3$DKA100: member on OLDMOE with /policy=minicopy, making some changes to the DSA0: VU and then on RADIA64, issuing the mount command to add $3$DKA100: back into DSA0:. The actual copy doesn't necessarily have to occur on the node where the master bitmap was, or the node where the mount command executed. See this thread: http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1176993

If a valid master bitmap exists in the cluster, then when the member is added back into the shadowset, it will use a minicopy; you don't even need to ask mount to use the bitmap. So your wish "I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node." is already true.

What isn't as obvious is that you have to do the dismount on a node that will be up for the duration of time that the member will be absent from the shadow set. Also, minicopy bitmaps created by a dismount are single mastership, so you have to choose where you want the master to be. If you know you are going to reboot, and you are in a two-node cluster, then the bitmap should be created on the "other" node, but if you were removing a member for a static snapshot for backup, then it isn't as clear where it should be. In that case you would want to dismount the member that was being locally served by the node with the backup tape, and from a performance point of view, it may be better to have the master bitmap on the node that has the other member of the shadowset (that is still in the DSA VU), so in that case you may also want to have the master on the other node that isn't performing the backup. Better would be to have an additional disk drive available on the node with the backup tape drive and to dismount that drive for the backup. But if you are going to have three-member shadowsets, you should have at least two members on the "master" node that maintains quorum.

Your last paragraph "If the node crashes, we are just out of luck. A full copy will always happen." does not have to be true if you are at 8.3 or above.

Try this:

$ set shadow dsa0:/policy=hbmm=( -
_$ (master_list= (oldmoe), count=1, multiuse=1), -
_$ (master_list= (radia64), count=1, multiuse=1))
$ show device/bitmap/full dsa0:

Then force a crash on RADIA64. This will cause DSA0 to stall for SHADOW_MBR_TMO seconds, after which the $3$DKA100: member being MSCP served by RADIA64 will be ejected from the DSA0 shadowset. AMCVP (Automatic MiniCopy on Volume Processing) will take effect, and the minimerge bitmap on OLDMOE will be converted to a multiuse bitmap. It acts like a minicopy bitmap, but it knows about all 127-block segments on DSA0 that have been written to since the last time it was zeroed, which means since some time before RADIA64 crashed. It will probably have to copy more blocks than if the member had been dismounted, but that is still preferable to a full copy.

This is all discussed in http://h71000.www7.hp.com/openvms/journal/v11/HBMM_AMCVP_OpenVMS_shadowing.pdf which I referenced at the end of my comment posted Aug 31, 2009 19:59:10 GMT. The last page of the article has a state diagram showing the events that trigger the conversion of the HBMM bitmap to a multiuse bitmap and back to HBMM once the VU has completed the minicopy.

I haven't played with the multiuse feature, and we have SAN storage (directly visible to our nodes), so it isn't as beneficial to us as it would be for you. I would still force the member to be dismounted when you are in a normal shutdown, as you then don't have to experience the stall for SHADOW_MBR_TMO. But you should be able to simultaneously use both multiuse bitmaps and minicopy bitmaps created by explicitly dismounting the member that will no longer be MSCP served when the node reboots.
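Before forcing a test crash, it may help to confirm what is actually in effect. The following is only a sketch, assuming the DSA0: virtual unit from this thread; it displays the HBMM policy, the current bitmaps, and the member timeout that determines how long the stall can last.

```
$! Sketch only: check the HBMM policy and the member timeout before a test crash.
$ show shadow dsa0:/policy=hbmm          ! policy currently assigned to DSA0
$ show device/bitmap/full dsa0:          ! bitmaps and the node mastering each one
$ mbr_tmo = f$getsyi("SHADOW_MBR_TMO")   ! timeout before an unreachable member is ejected
$ write sys$output "SHADOW_MBR_TMO on this node is ''mbr_tmo' seconds"
```

Running the same commands again after the crash and the member's return should show whether the bitmap was converted and whether a minicopy was used.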

Jon
it depends
Allan Large
Frequent Advisor

Re: Volume Shadowing Copy After Reboot

Jon:

Thanks for the reply. I have indeed discovered what you said.

As for :

"If a valid master bitmap exists in the cluster, then when the member is added back into the shadowset, it will use a minicopy; you don't even need to ask mount to use the bitmap. So your wish "I had hoped that the cluster/volume shadowing software was smart enough so that when a cluster member rebooted, it could find the bitmap for the shadow set on another node." is already true."

What I meant was .... that I had hoped the node that was rebooting would be able to "see" the bitmap on the other node. This would allow it to mount its own disk(s) and the rebuild would happen on that node. In this scenario, the node must send a command to another node to do the mount.

It would just make life a little more simple.

Thanks again for taking the time to respond.

Bill Hall
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

The real solution to this problem is adding inexpensive shared storage, assuming the two servers are within a few meters of each other. That means the addition of about $6500 of hardware (rough list price). At the age of your rx2620s, used or HP refurbished should cut the cost in half. You'll then have a cluster that is more than a lab exercise and actually can provide a level of higher availability with probably one "single point" of failure, that being the four-port I/O module of the MSA30.

One 359645-B21, MSA30MI Multi Initiator dual bus shelf, two A7131A, PCI-X Dual Channel U320 SCSI HBA, for the shared scsi buses. You would need a minimum of three universal drives to move the quorum disk and the shadowed USERS volume to the MSA30. Two more disks shadowed and you can move to a shared system/boot disk and closer to a real homogeneous cluster.

Bill

Bill Hall
Jon Pinkley
Honored Contributor

Re: Volume Shadowing Copy After Reboot

Allan,

I will try to make this as explicit as possible, since I was not clear enough:

The mount can occur on the newly booted node and it will find the bitmap that was created on OLDMOE. You don't need to get the SMISERVER involved to force the mount to happen on OLDMOE where the master bitmap was created. The mounting part really works like you want it to.

This should work and result in a minicopy:

~~~~~~~~~~~~~~~ On node RADI64 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$ define/user sysmanini exe_on_oldmoe.sysmanini
$ mcr sysman do dismount $3$dka100:/policy=minicopy
$ @SYS$SYSTEM:SHUTDOWN

(with automatic reboot)


~~~~~~~~~~~~~~~ After reboot on node RADI64 ~~~~~~~~~~~~~~~~~~~~~~~~~

$ mount/system dsa0:/shad=($3$dka100) users

Once this mount happens (and the mount can happen anywhere in the cluster, including the just-booted node), a minicopy will start somewhere in the cluster (controlled by the SHADOW* system parameters, specifically SHADOW_MAX_COPY, and possibly SHADOW_SITE_ID). The $3$DKA100 will become a member of every node's DSA0: VU; the use of /cluster is not relevant when adding a member to a DSA VU that is currently mounted somewhere in the cluster. If you have more than two nodes (for example, a third satellite node that was being served the shadowset members), then the use of /cluster would be significant: if the mount was instantiating the DSA0 VU, the satellite would need to be notified to mount the new DSA0: Virtual Unit.

The minicopy write bitmap that gets created by dismount $3$dka100:/policy=minicopy is the only part where you have to force it to happen on a node other than the one that is shutting down.

Try it.

Jon

P.S. I have to agree with Bill Hall about getting some shared storage. It makes the systems much more robust and allows for rebooting either system without lengthy cluster stalls. See http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1018008 for some ideas.
it depends