Re: Slow EVA 4400 DR group replication with MXP 110

Charlie llewellyn · ‎04-08-2010

We have two EVA 4400s at separate sites with a 1Gbps ISL dedicated to CA replication.

The logs are showing the following errors.

CA informational: MUL-EVA-SAN-201: Copy resources on the inter site link have been reduced. at Wed 7 Apr 2010 05:34:07 GMT+01:00

CA warning: MUL-EVA-SAN-201: Excessive Vdisk response time at the DR Destination has been detected. at Wed 7 Apr 2010 08:16:37 GMT+01:00

The data transfer then slows to an unacceptable level. The replication does eventually complete and the DR group finally merges.

Our network team have tested the ISL and say there are no latency issues. They have also completed a data transfer at speeds far exceeding those the EVA is throttling to.

Another issue we are experiencing is with manually specifying the DR group log size. For example, when we create a new DR group and specify a DR log size of half the vdisk size (380GB), the task fails with "attribute mismatch"; however, if we leave the default log size, the DR group is created successfully. There is plenty of space available in the disk group.

Even when the traffic is limited I still have not seen the EVA utilise the DR log, so I do not understand the implication of having a smaller log. Is the log only used if the ISL fails?

Could anyone point me in the right direction to try and resolve this issue if the ISL is not to blame?

EVA 4400 XCS: 09522000

Marcus Schack · ‎04-08-2010

Charlie, how far apart are the two sites? Do you have the proper SFPs in the ports for the ISL? How did you configure your fabrics for CA? Are the ISL ports set to a fixed speed or did you allow AUTO NEGOTIATE? This may take a while, but we can start here.

Nader Qaid A. · ‎04-14-2010

Do you have any zoning in your SAN?
Do your users see any performance degradation when you're replicating while they are working?
Did you try to test the QoS of your connection?
When you transfer a normal 1GB file, can you send me the time it takes to transfer?

Meanwhile, try powering off the DR EVA controllers after you stop the log transfer, then power the controllers back on and resume the DR log transfer.

Enterprise Servers and Storages Engineer

Charlie llewellyn · ‎04-16-2010

Hi Marcus and Nader, thank you for the replies.
@Marcus
The distance between sites is 120 miles.
We have two sites with two switches at each site, which are connected to the SAN and ESX hosts. These form fabrics A and B. We then have two MPX units which hang off one port on each switch at each site. These create the resilient paths to the ISL. An in-depth configuration review of the FC and FCIP switches has now been undertaken by an HP level 2 engineer and the only recommendation is to alter the minimum window scaling. This should be updated by COP.
We do have proper SFPs.
The link speed is statically set to 1Gbps. Additionally we have limited the bandwidth on the MPX switches to 512 to see if that helped but it made no difference.

@Nader
We do have SAN zoning.
When we replicate the LUN we do not have any servers running on the LUN initially so no users have noticed degraded performance.
We do not have any QoS on the link; however, we are monitoring the bandwidth and it is nowhere near saturation, usually only around 40 Mbit/s.
As for link stats our network team has provided me with this:
RTT 4ms (average over 24 hours) max 84ms, packet loss < 0.0001%
I am sorry I do not understand the last point about resetting the controllers.
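
For reference, the link figures above can be turned into a rough bandwidth-delay product, i.e. how much data has to be in flight at once to keep the ISL full. This is only a back-of-the-envelope sketch in Python: the helper name `bdp_bytes` is made up for illustration, and it assumes the full 1Gbps is usable and that the quoted RTT values are representative.

```python
def bdp_bytes(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes.

    bits/s * (ms / 1000) gives bits in flight; dividing by 8 gives bytes,
    so the combined divisor is 8000. Integer arithmetic avoids float noise.
    """
    return link_bits_per_s * rtt_ms // 8000


LINK_BPS = 1_000_000_000  # 1Gbps ISL (from the thread; assumed fully usable)
RTT_AVG_MS = 4            # average RTT reported by the network team
RTT_MAX_MS = 84           # maximum RTT reported by the network team

for label, rtt in (("average", RTT_AVG_MS), ("maximum", RTT_MAX_MS)):
    print(f"{label} RTT {rtt} ms: {bdp_bytes(LINK_BPS, rtt):,} bytes in flight to fill the link")
```

At the 4ms average this comes to 500,000 bytes, and at the 84ms maximum to 10,500,000 bytes, so any window or buffer smaller than that would cap throughput below line rate during those periods.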

Marcus Schack · ‎04-16-2010

Charlie, do you have CA properly zoned as per HP Best Practice? Does CA have its own controller port to work over? Let me know, I have a few docs that might help. Some are really large so I can't post them here, so you might need to send me an email address that I can send them to if you want them.

Marcus Schack · ‎04-16-2010

Found another doc that might help.

Nader Qaid A. · ‎04-16-2010

If you have the MPX110, please upgrade the firmware to 2.4.3.2; it fixes some of the link issues.
Is your replication sync or async?

Enterprise Servers and Storages Engineer

Charlie llewellyn · ‎04-19-2010

Thank you once again for your responses.

Marcus, a level 2 and a level 3 engineer have reviewed the zoning and have signed it off as being set up correctly, so as far as we are aware all best practices have been followed.

CA does not have its own controller port. As it is a 4400 we only have two ports per controller, and as such the port is shared with the ESX hosts.

Nader, unfortunately we have already updated the firmware to 2.4.4.1.

Both, we have just tried updating the window scaling to 2 on all MPXs but the problem is still there. I welcome any further ideas.

Thanks Charlie
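
As a rough sanity check on the window scaling change, a single TCP connection cannot move more than one window of data per round trip. The Python sketch below estimates that ceiling. It assumes, without confirmation from the thread, that the MPX "window scaling" value is a TCP window scale shift count (so the window is 65535 bytes shifted left by that value) and that one connection carries the replication traffic; the function name is illustrative only.

```python
BASE_WINDOW_BYTES = 65535  # largest TCP window without the window scale option


def max_throughput_mbit(scale_shift: int, rtt_ms: float) -> float:
    """Upper bound on single-connection throughput in Mbit/s: window / RTT.

    ASSUMPTION: scale_shift is a TCP window scale shift count. The thread
    does not say how the MPX setting maps onto the actual TCP window.
    """
    window_bytes = BASE_WINDOW_BYTES << scale_shift
    # bytes * 8 = bits; rtt_ms * 1000 converts (bits per ms) to Mbit/s
    return window_bytes * 8 / (rtt_ms * 1000)


# RTT values are the average and maximum quoted earlier in the thread.
for shift in (0, 2):
    for rtt in (4, 84):
        print(f"shift {shift}, RTT {rtt} ms: {max_throughput_mbit(shift, rtt):.1f} Mbit/s ceiling")
```

Under these assumptions a setting of 2 allows roughly 524 Mbit/s at a 4ms RTT but only about 25 Mbit/s at 84ms, so RTT spikes alone could pull throughput below the 40 Mbit/s we normally see. Real throughput also depends on packet loss, buffering and how the EVA itself throttles, none of which this models.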

Heather O'Neil · ‎11-01-2010

Charlie,

Did you ever find a solution to your problem? We've been experiencing the same issue for quite a while. Our controllers are running XCS 09534000. Aside from the errors you're getting, we also get an excessive ping error in the controller logs.

Thanks,
-Heather

Nader Qaid A. · ‎11-01-2010

New firmware for the controllers fixes this issue; it did for my customer.
I advise you to buy a firmware upgrade service and do the upgrade.

Nader

Enterprise Servers and Storages Engineer
