
Initial Copy & Fail Overs with DR Groups

 
ffej1979
New Member


I have been getting some conflicting information regarding how certain aspects of CA function and was hoping to clear them up here.

We have an EVA6400 in our primary site and an EVA4400 in our backup site. Both are running 09501000 firmware (we just got them a few months ago). CommandView is running 9.0.

We have a DS3 running between our sites and use two MPX110 devices to create the FCIP routes between them. When I initially set up our DR Groups, I set them to Async mode.

Now, I was initially told that when a DR group does its first copy-out, it does it in sync mode. That would definitely explain the performance hit we took on our source EVA during the copy-out (our round trip delay is quite high, which we expect until we get new circuits, and which is why I love Async). Once the copy-out had finished and the group was writing to the log (async mode), the slowness was gone.

But now I am told that it can do the copy-out in async mode. Is that true? And in that case, where did our performance hit come from on our source EVA?

This leads to my other question about doing a failover in a DR Group.

My question with fail overs is: why on earth does it need to do another full copy-out back to the source EVA after the failover is activated? Isn't the data on both the source and destination EVAs identical, or close enough if running Async mode? And why does it need to do it when running Sync mode as well, when the data IS identical on both sides?

I have been told by one source at HP that it will always do a full copy when you fail over. I have been told by another source that it will only do a full copy if the data is inconsistent (such as data left in the log during a fail-over). When I tried a fail-over on a small vdisk, with the log empty, it still wanted to do another full copy-out. I even tried it in sync mode for giggles, and it still wanted to do the copy-out.

We have several 500GB vdisks that we are replicating to our backup EVA. Now, I want to do some failover testing so we can get the bugs worked out of the failover process. It took me days (even weeks) to finish replicating these vdisks to our backup site without affecting our users during business hours. I would really hate to just do a test and then have to replicate everything back (plus have our users go to our backup site for their data while it is copied back to our source EVA).
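
For a rough sense of scale, here is a minimal Python sketch of the best-case time to push one 500GB vdisk across a DS3. It assumes the nominal DS3 line rate of 44.736 Mbit/s and decimal gigabytes; the `efficiency` parameter is a hypothetical knob for FCIP/TCP overhead and high round trip delay, not a measured value from this setup.

```python
# Best-case copy time over a WAN link. A sketch under stated assumptions,
# not a model of how the EVA actually schedules its copy-out.

DS3_BITS_PER_SECOND = 44.736e6  # nominal DS3 line rate


def min_copy_hours(size_gb, link_bps=DS3_BITS_PER_SECOND, efficiency=1.0):
    """Hours needed to move size_gb (decimal GB) over a link.

    efficiency is an assumed fraction of the line rate that is usable
    after protocol overhead and latency effects (1.0 = full line rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1e9 * 8
    return bits / (link_bps * efficiency) / 3600


if __name__ == "__main__":
    for eff in (1.0, 0.5):
        hours = min_copy_hours(500, efficiency=eff)
        print(f"500 GB at {eff:.0%} of DS3 line rate: {hours:.1f} hours")
```

Even at full line rate a single 500GB vdisk needs roughly a day, so several of them, throttled to stay out of the way during business hours, can plausibly take days or weeks.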

Why blow away the source when it clearly has the proper data, instead of writing to the log on the destination after failing over? Just swap the roles.

I know I can do snapshots/clones and present those for testing, but I think that is beside the point. I want to do a full-out test, and having to erase the source just to recopy data it already had seems a little silly to me.


Thanks for any insights!

Jeff
5 REPLIES
SDrake_1
Advisor

Re: Initial Copy & Fail Overs with DR Groups

Jeff

I can see that you did not get any replies to this. Did you find out the answers to your questions yourself?

I am planning a DR test with a setup similar to yours, and I am trying to find out more information about the 'full copy after failover'.

I tested a 120GB DR group a couple of weeks ago and this initiated a full copy which took a few hours to complete and took me by surprise. I had assumed that the failover was like a switch.

Maybe during testing a failover you could unplug the FC/IP gateway and break the ISL to simulate a site failure?
ffej1979
New Member

Re: Initial Copy & Fail Overs with DR Groups

After talking to a number of HP reps, I finally got the answer. If you switch the DR Group to Synchronous Mode, you can fail over and fail back without a full copy. Then just switch back to Async when you are done.

I was skeptical at first, but I tested it with a 5GB LUN and it worked fine. I kind of wish that CommandView would tell you that when you fail over, though.

The HP rep who finally gave me the skinny on failover sent me the "CA Implementation Guide". It has a TON of useful CA info; for me the document really tied together the CA information I already knew. It's too large to attach, so I can email it to you if you would like.

I kind of wish that I had gotten that guide initially; it would have saved a lot of headache.
Víctor Cespón
Honored Contributor

Re: Initial Copy & Fail Overs with DR Groups

CA implementation guide here:

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c01800459/c01800459.pdf

If you do a failover while in async mode, it's treated as an unplanned failover, as some of the data is still on the log disk and has not been copied to the destination array.

You must switch to synchronous mode and let the log buffer empty. Then all the information on the source array will be on the destination array and you can reverse roles.

When performing a full copy, two DR I/Os are performed for each host I/O, so that the DR group log can still be emptied.

The latest firmware versions (095xxxxxx / 6220) have a fast normalization mode: when a vdisk is marked for full copy, only the blocks modified since the last full copy are transferred. Of course, when you create a DR group, there's nothing on the other side, so all the data must be copied.
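
The planned-failover sequence described in this thread (switch to sync, let the log drain, swap roles, fail back, return to async) can be sketched as a toy Python model. This is only an illustration of the logic as posted here, not CommandView or any HP API; `DrGroup`, `drain_log`, `failover` and `planned_round_trip` are all hypothetical names, and the rule for when a full copy is triggered is an assumption taken from the replies above.

```python
from dataclasses import dataclass


@dataclass
class DrGroup:
    """Toy stand-in for a DR group; all names here are hypothetical."""
    mode: str = "async"        # "sync" or "async"
    log_pending_mb: int = 0    # writes still sitting in the DR group log
    source: str = "EVA6400"
    destination: str = "EVA4400"

    def set_mode(self, mode):
        if mode not in ("sync", "async"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def drain_log(self):
        # In reality you wait for the log to empty; here it is instantaneous.
        self.log_pending_mb = 0

    def failover(self):
        """Swap roles. Returns True if a full copy would be triggered.

        Assumed rule (from the thread): async mode or a non-empty log
        means the failover is treated as unplanned.
        """
        full_copy = self.mode == "async" or self.log_pending_mb > 0
        self.source, self.destination = self.destination, self.source
        return full_copy


def planned_round_trip(group):
    """Fail over and fail back in sync mode, then return to async."""
    group.set_mode("sync")
    group.drain_log()
    results = [group.failover(), group.failover()]
    group.set_mode("async")
    return results


if __name__ == "__main__":
    print("async failover full copy:", DrGroup(log_pending_mb=200).failover())
    print("planned round trip full copies:", planned_round_trip(DrGroup(log_pending_mb=200)))
```

In this model a failover straight from async mode reports a full copy, while the sync-first round trip reports none and leaves the group back in async mode with its original roles.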
SDrake_1
Advisor

Re: Initial Copy & Fail Overs with DR Groups

Thanks for the responses.

I think that during my '120GB' test the write mode must have been left at asynchronous. I can't think of any other reason for a full copy to have been initiated.

Simon
Edorta Sáez
Advisor

Re: Initial Copy & Fail Overs with DR Groups

Hello,

I have been testing unplanned failover and I have 2 questions about this scenario:

1. I think that Asynchronous mode doesn't guarantee that data will be consistent at the other site. What would happen if a database was corrupted at the secondary site and a full copy was then completed? I suppose that this is the reason why replication is suspended by default.

2. About fast normalization mode: I have been testing unplanned failover with a 30 GB vdisk. The process took 10 minutes to complete. I think that this is the same time as a normal full copy. Is this a real improvement for this scenario?

Best Regards.