HPE EVA Storage

Re: Eva 4400 (XCS 09522000). DR group issues

 
Dja_1
Frequent Advisor

Eva 4400 (XCS 09522000). DR group issues

We have seen this problem several times now since the 09522000 upgrade, with varying levels of disruption to a clustered file server.

What appears to happen is that after periods of high I/O involving any DR groups, the 2 x Sync DRGs attached to a file server cause the server to hang, and all clients attempting to connect to shares either fail to connect or have their desktop OS hang. The workaround in each case has been to suspend and re-enable the Sync DRGs or to disable and re-enable the FC switch ports that replication traffic traverses. Behaviour returns to normal at that point. XCS on the 4400 is 09522000.

I have opened a new HP case but am wondering whether anyone else has experienced similar behaviour.
8 REPLIES
Del_3
Trusted Contributor

Re: Eva 4400 (XCS 09522000). DR group issues

Sounds a bit like a case of "host port blocking". What kind of interconnect do you have between sites?
Dja_1
Frequent Advisor

Re: Eva 4400 (XCS 09522000). DR group issues

FCIP gateways - 1GbE for the ISL... little other traffic traverses this link during the occasions we have seen this behaviour.
Del_3
Trusted Contributor

Re: Eva 4400 (XCS 09522000). DR group issues

Assuming the line has low latency, a GigE ISL should be fine. Have you checked for line errors or tried suspending one DR group to see if the error persists? Is it possible there is a bad SFP or similar on your ISL?
Dja_1
Frequent Advisor

Re: Eva 4400 (XCS 09522000). DR group issues

We have looked at all the interconnects along the ISL and at L2/3 issues; nothing apparent. I have seen the ppt you attached to another forum response, and the host port blocking issue still looks possible with 4Gb FC vs GbE ISL?
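To put rough numbers on the 4Gb FC vs GbE mismatch, here is a minimal sketch using nominal line rates only (about 400 MB/s for 4Gb FC, 125 MB/s for 1GbE before TCP/IP overhead). The function name and the 200 MB/s burst figure are made up for illustration; it ignores FCIP compression, latency and buffer credits, so treat it as back-of-envelope arithmetic rather than a model of the array or the MPX110.

```python
# Nominal figures only; real throughput depends on protocol overhead,
# FCIP compression and latency, none of which are modelled here.
FC_4G_MB_PER_S = 400.0   # nominal 4Gb FC payload rate
GBE_MB_PER_S = 125.0     # 1 Gbit/s divided by 8, before TCP/IP overhead


def backlog_after(seconds: float, offered_mb_per_s: float,
                  link_mb_per_s: float = GBE_MB_PER_S) -> float:
    """Return MB of replication writes still queued after `seconds`
    when `offered_mb_per_s` is pushed at a link that drains at
    `link_mb_per_s`. Hypothetical helper for illustration."""
    surplus = max(0.0, offered_mb_per_s - link_mb_per_s)
    return surplus * seconds


ratio = FC_4G_MB_PER_S / GBE_MB_PER_S
print(f"FC port can offer about {ratio:.1f}x what the ISL can drain")
# Assumed burst of 200 MB/s of replicated writes for 10 seconds.
print(f"Backlog after 10 s at 200 MB/s: {backlog_after(10, 200.0):.0f} MB")
```

Any sustained write burst above what the ISL can drain has to queue somewhere, which is at least consistent with the problem only showing up after periods of high I/O.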

HP have been through the CELs, FC switch logs, MPX110 logs and config, and a substantial capture from the primary cluster node. No joy. We are now running a 12 hour evaperf capture starting tomorrow a.m. It is unlikely, though, that we will be lucky enough to capture the "event".
Del_3
Trusted Contributor

Re: Eva 4400 (XCS 09522000). DR group issues

The MPX110 FC gateway has a bit of a history of host port blocking I am afraid. Good luck!
Dja_1
Frequent Advisor

Re: Eva 4400 (XCS 09522000). DR group issues

Thanks for the responses. Coincidentally, I just saw the issue again 5 mins ago for a few mins. The only evidence I have right now is a higher than normal drive queue depth (Windows PM with EVA counters) for the FATA DG at the remote array whilst it was all happening. The destination vDisks for all DRGs are in the FATA DG. Normally I see the queue depth bounce along the bottom of the chart. I disabled and re-enabled the FC switch ports for the MPX110s a couple of times. This normally clears everything. However, I continued to see the high queue depth on the remote FATA DG, and the DRG logs had difficulty clearing down. It wasn't until the 3rd disable/enable that the queue depth dropped away and the DRG logs merged. About to update HP with the latest saga.
Dja_1
Frequent Advisor

Re: Eva 4400 (XCS 09522000). DR group issues

Not long after my last post I upgraded the firmware on the 4 x MPX110s to 2.4.3.2 (2.4.4.1 broke the FC port!) and amazingly, after 10 or so days of normal production, all the issues seem to have been resolved.
Dja_1
Frequent Advisor

Re: Eva 4400 (XCS 09522000). DR group issues

Hi Del

Well... you were right. I am sure you were thinking that the FW upgrade to the MPX110s was an unlikely resolution. We had a major recurrence yesterday. I am interested in the detail behind the host port blocking issue. For us it is an (apparently) random issue. Is the randomness generated by vDisks moving between controllers/ports under normal conditions while replication is more statically assigned to controllers/ports? I.e. do vDisks and replication traffic randomly end up being managed by the same controllers/ports, and hence we get the performance degradation? It can be weeks between these events and the arrays are used quite intensively.