Heinz W Genhart
Honored Contributor

Disaster tolerance

Hi Community

I have a customer who is building a second datacenter. The distance between the two datacenters is 100 km. They have built a layer 3 (not layer 2) network between the sites.
Our developers have worked out a concept for an existing application. They do not extend the existing cluster across the two sites (one reason being the layer 3 network).

At site A we have a cluster with two GS1280s and an XP24000 SAN storage array.
At site B we have a single-node GS1280 cluster, also with an XP24000, which is the read-only system for the application (replication is done at the Rdb level).

At site A we are using three-member shadow sets. Two of the members are located in the SAN storage at site A and one member is located in the SAN storage at site B. (Example: DSA0 contains $1$DKA0, $1$DKA1 and $1$DKA100; DKA0 and DKA1 are at site A, DKA100 at site B.)
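
For illustration, the mount looks something like this (the volume label DATA1 and the logical name DISK$DATA1 are invented here):

  $ ! Three-member shadow set: two members at site A, one at site B
  $ MOUNT/SYSTEM DSA0: /SHADOW=($1$DKA0:, $1$DKA1:, $1$DKA100:) DATA1 DISK$DATA1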

In case of a disaster, let's assume that site A is gone, the plan is to mount the shadow set member $1$DKA100 on the read-only system at site B and to start the application there, which then becomes the master application system (without a read-only system).
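
Roughly like this (an untested sketch; the label is invented, and the exact qualifiers needed to mount a former shadow set member should be checked against the HBVS documentation):

  $ ! At site B, only after it is certain that site A is down:
  $ ! either bring the surviving member up as a one-member shadow set ...
  $ MOUNT/SYSTEM DSA0: /SHADOW=($1$DKA100:) DATA1 DISK$DATA1
  $ ! ... or mount the former member directly as a plain volume:
  $ ! MOUNT/SYSTEM/OVERRIDE=SHADOW_MEMBERSHIP $1$DKA100: DATA1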

This situation is extremely dangerous, because it is possible to mount $1$DKA100 on the cluster at site A and also on the cluster at site B. If somebody mounts this disk in an uncontrolled way on both sites, data loss will occur. (This could also happen when the cluster at site B is the active system and site A comes back after a disaster and somebody just boots the cluster at site A. The systems at site A will then mount the disks during startup... and in this case DKA100 would be mounted on two separate clusters!)

Does somebody have an idea how we can solve this problem (unfortunately, extending the cluster across the two sites is not open for discussion)?
CA (Continuous Access) on the XPs is also not open for discussion, because we measured 2800 I/Os on a three-member shadow set but only 800 I/Os on a two-member shadow set with CA on the XP replicating the data to the third member.

We could solve it with zoning on the XPs, but in case of a disaster I don't trust the reliability of the SAN team, because in a disaster they will have other problems too. We have 100 OpenVMS systems, 400 Sun Solaris systems, 300 Windows servers, 150 Linux servers and 90 Tru64 systems, and in a disaster all of those systems may be involved.

Thanks in advance

Geni
Robert Gezelter
Honored Contributor

Re: Disaster tolerance

Geni,

This deserves the usual disclaimer: this is a deeper question than it is safe to answer quickly.

My first thought is to use a file on this volume as an interlock to prevent the situation. I would also consider implementing an interlocking scheme at layer 3 to ensure that the problem situation (both sites coming up as master) cannot occur.
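
As a very rough sketch of the file-based interlock (the file name and the SITE_NAME logical are invented for illustration; note that this check is not atomic across two independent clusters, so it narrows the window rather than closing it):

  $ ! Refuse to start as master if another site holds the role file.
  $ lockfile = "DSA0:[000000]MASTER_SITE.DAT"
  $ this_site = F$TRNLNM("SITE_NAME")
  $ IF F$SEARCH(lockfile) .NES. ""
  $ THEN
  $     OPEN/READ lck 'lockfile'
  $     READ lck owner
  $     CLOSE lck
  $     IF owner .NES. this_site
  $     THEN
  $         WRITE SYS$OUTPUT "Master role held by ''owner' - not starting"
  $         EXIT
  $     ENDIF
  $ ELSE
  $     ! Claim the master role for this site.
  $     OPEN/WRITE lck 'lockfile'
  $     WRITE lck this_site
  $     CLOSE lck
  $ ENDIF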

I cannot advise enough caution in this regard. An in-depth review and careful planning of this contingency are essential; I would also recommend retaining consulting expertise to review the situation in depth (disclaimer: my firm provides services in this area).

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Disaster tolerance

geni,

I don't know about the XP, but on our HSG80 you can use "set unit x ?" to see the unit options.

Among the options are ENABLE_ACCESS_PATH and DISABLE_ACCESS_PATH. With these you can specify which hosts may use the unit. We use it to separate the two clusters on one SAN.
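
For example (unit and connection names invented):

  HSG80> SHOW D100
  HSG80> SET D100 DISABLE_ACCESS_PATH=ALL
  HSG80> SET D100 ENABLE_ACCESS_PATH=(SITEB_HOST1)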

Fwiw

Wim
Colin Butcher
Esteemed Contributor

Re: Disaster tolerance

Hello Geni,

Do you mean "DK" devices as SAN devices? They're usually $1$DGAnnnn.

You've got direct path devices with FC access to the array controllers (systems connected to SAN switches, then disc array controller pairs also connected to the SAN switches). You can control which HBA sees which devices both by using SAN zoning in the SAN switches and by changing the presentation in the disc array controllers.

If you're using long-distance data replication (with HBVS) but without long-distance clustering, you might want to consider using something like DECdfs access to the remote site through the cluster at the remote site. Move the replication issue up a level and make it more of an "application writes the data to files" approach rather than trying to copy the entire disc content. That would get you away from the risk of multiple systems / clusters mounting the same disc without co-ordinated access to the file system by the lock manager.
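
Very roughly, something like this (the access point and logical names are invented; the exact naming depends on your DECdns / local namespace setup):

  $ ! On a serving node in the remote cluster:
  $ RUN SYS$SYSTEM:DFS$CONTROL
  DFS> ADD ACCESS POINT SITEB_APPDATA DSA0:[APPDATA]
  DFS> EXIT
  $ ! On a client node at the other site:
  $ RUN SYS$SYSTEM:DFS$CONTROL
  DFS> MOUNT SITEB_APPDATA APPDATA
  DFS> EXIT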

Another question worth considering is just how synchronous you need the remote copy to be (I assume you'd like it completely synchronous).

It might just be worth your while using direct path layer 2 long-distance networking and running a split-site cluster. It would get you out of quite a few problems and would reduce the risk of data corruption, but at a cost.

One of the most important things is to design it up front with unique device naming and suitable protection mechanisms. You may not need to mount all discs cluster wide, even system discs. You might choose to have some truly cluster-common discs and then per-site discs mounted locally within the site, yet being copied to the remote site by HBVS (host based volume shadowing).
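
In a split-site cluster that distinction would look something like this (labels and device names invented):

  $ ! Truly cluster-common disc, mounted on every node in the cluster:
  $ MOUNT/CLUSTER $1$DGA10: COMMON1 DISK$COMMON1
  $ ! Per-site disc, mounted system-wide on the local node only:
  $ MOUNT/SYSTEM $1$DGA20: SITEA1 DISK$SITEA1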

I think that given the potential scale of the problem you would be well advised to get some help to discuss and think through all the possible failure scenarios and recovery strategies. You usually can't solve every problem, but you can usually minimise the risks and lock things down as best you can.

The biggest thing is to avoid automating too much decision making - by all means automate the actions using DCL and other scripts once you've decided what to do in terms of failing over between sites, but have well-tested and documented processes and procedures to be followed very carefully - and make sure you have sufficient monitoring in place so that you know what state things are in and what's happening out there.
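
As a trivial illustration of automating the actions but not the decision (the procedure name is invented):

  $ ! A human makes the failover decision; DCL then does the work.
  $ INQUIRE answer "Site A confirmed down and failover approved? (YES/NO)"
  $ IF answer .NES. "YES" THEN EXIT
  $ @SYS$MANAGER:FAILOVER_TO_SITE_B.COM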

HP have such a group (the DTCS group co-incidentally also based here in Bristol in the UK) and there are a number of people who contribute here who would be able to offer similar assistance should you need it (myself included).

It's an interesting area to work in. I'm designing a similar configuration at the moment - and we're going with a split-site cluster and accepting that the cost of long-distance layer 2 networking to run split-site clusters is worth the result in terms of minimising the risk of data loss / corruption and ensuring synchronous data replication by means of HBVS.

You might find this useful, although it's not down at the level of detail you're probably hoping for now (it's from the talk I gave at the OpenVMS TUD held in Utrecht back in October 2007): http://www.downloads.xdelta.co.uk/hp_nl_vmstud_2007/2007_10_01-nl_vmstud_hadt-colin_butcher-v1_0.pdf

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Martin Hughes
Regular Advisor

Re: Disaster tolerance

I agree with Colin when he says to avoid unnecessary automation. You need to ask yourself what you gain by automating disk mounting etc. during the startup sequence. Systems of this magnitude are typically 24x7 environments, where a system administrator will be available in any circumstance where a node is booted. So if you are going to be sitting in front of your laptop anyway, what do you gain by giving up control of when and how your disks are mounted by automating the process in system startup procedures?
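
In other words, the startup procedure can deliberately stop short of the dangerous mount and leave it to the administrator (a sketch; the mount procedure name is invented):

  $ ! In SYSTARTUP_VMS.COM: never mount the cross-site member automatically.
  $ IF F$GETDVI("$1$DKA100:", "EXISTS")
  $ THEN
  $     WRITE SYS$OUTPUT "Reminder: $1$DKA100 NOT mounted at startup."
  $     WRITE SYS$OUTPUT "Check site B status, then run MOUNT_SHARED.COM manually."
  $ ENDIF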
For the fashion of Minas Tirith was such that it was built on seven levels, each delved into a hill, and about each was set a wall, and in each wall was a gate. (J.R.R. Tolkien). Quote stolen from VAX/VMS IDSM 5.2
John Gillings
Honored Contributor

Re: Disaster tolerance

Geni,

If you're serious enough about disaster tolerance to build two sites and populate them with systems as large as GS1280s (and all your other stuff), why would you formulate policies for recovery by asking questions in a web forum?

PLEASE go to the real experts. Yes, you'll have to PAY for the advice, but then you also pay for insurance policies against things like floods and fires, right?

HP has a consulting service called DTCS, "Disaster Tolerant Cluster Services". They will send people who deal with these kinds of issues every day to your site. They will examine your infrastructure, discuss what kinds of disasters and failures you want to consider, and produce a site-specific document which clearly lists the steps required to recover from each failure scenario. The DTCS guys will consider things that you never thought of.

The readership of this forum is good, BUT "free advice is worth every cent" is NOT a prudent way to run your business-critical systems!
A crucible of informative mistakes
Colin Butcher
Esteemed Contributor

Re: Disaster tolerance

On re-reading the beginning of the request and looking at Geni's profile, I think it's GFR Software Solutions AG's end customer who needs the overall advice and strategy assistance, which of course also needs to involve Geni and his colleagues. This kind of work is usually quite complex and has to be a collaborative exercise so that nothing gets missed - including all the stuff that other people with different experiences will see.

Cheers, Colin (http://www.xdelta.co.uk).
Entia non sunt multiplicanda praeter necessitatem (Occam's razor).
Heinz W Genhart
Honored Contributor

Re: Disaster tolerance

Hi community

Colin is right: it's an end customer who has these plans.

We have the local OpenVMS Ambassador on site and we had consulting from HP.
But the project leaders just ignore the recommendations coming from there.
They built a layer 3 network and they will not change this.

To John: What I'm looking for are arguments to convince the management to rethink their decision. I think ITRC is also the place to discuss things like this, because it reaches many experienced OpenVMS specialists.

All of your answers were helpful and they confirm our considerations. We know that what we are going to do is very dangerous and must be planned very well.

So thanks a lot for your effort, and in case you are coming to the next bootcamp in Nashua, let's drink a beer together.


Best regards and thanks

Geni

Heinz W Genhart
Honored Contributor

Re: Disaster tolerance

Hi Bob, Wim, Colin, Martin and John

I think I'll close this thread. We can continue further discussions bilaterally. I'm reachable at geni@hgenhart.ch

Best regards

Geni