HPE EVA Storage

FCIP issues with SAN replication

Chandler Bing FA
New Member


Greetings all,

My company is attempting to replicate from one HP EVA SAN array to another HP EVA SAN array across the WAN. We have a metro Ethernet connection between the two sites with 1 Gbps of shared bandwidth. We share that bandwidth with our other business units and have no QoS in place, but we have been told that the pipe has never been completely saturated and that we’re not rate-limited.

The SAN arrays are on 4 Gbps Brocade Fibre Channel switches. Two devices called MPX110s carry the data from Fibre Channel to Ethernet. Each MPX handles replication for its own redundancy groups, and although each one has two Ethernet ports and two Fibre Channel ports, we only use one of each. Each MPX110 replicates over its own path to its counterpart on the other side; as I understand it, each pair negotiates a Fibre Channel over IP (FCIP) tunnel between them. Each MPX is on its own 6509. The 6509s have uplinks to a 3750, which goes across the metro Ethernet to a 3560 on the other side, then up to a 3560 acting as the core, and out to two 3560s with an MPX on each one.

Now the problem: although we have 1 Gbps of bandwidth, each MPX will only use about 13 Mbps of it, which we’ve verified with iperf. Each connection only gets 13 Mbps, and parallel tests show that every connection gets 13 Mbps. The HP engineer told us that at >5 Mbps we get approximately 1.3 Mbps of actual data, which would mean FCIP has 80% overhead. Can that be right? The much bigger problem is that after running for several hours they eventually just die and have to be rebooted to start replicating again. They’re already on the latest firmware. The only error we get from the MPXs’ statistics screen is TCP timeouts.
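The 80% figure can be sanity-checked with a back-of-the-envelope framing calculation. The sketch below is not HP’s method; it assumes the MPX110 uses standard FCIP framing (RFC 3821 with the 28-byte RFC 3643 encapsulation header), a 1500-byte MTU, IPv4 and TCP with no options, no compression, and FC frames packed back to back into one TCP stream. None of these are confirmed in the thread, and the function name is made up for illustration.

```python
"""Rough FCIP wire-efficiency estimate (a sketch under stated assumptions)."""

# Assumed header sizes in bytes: standard FCIP framing (RFC 3821 / RFC 3643)
# over plain Ethernet/IPv4/TCP. The MPX110 may frame things differently.
FCIP_ENCAP_HEADER = 28          # FC frame encapsulation header
FC_SOF_EOF = 4 + 4              # encoded start-of-frame and end-of-frame words
FC_HEADER = 24                  # Fibre Channel frame header
FC_CRC = 4                      # Fibre Channel CRC
TCP_IP_HEADERS = 20 + 20        # IPv4 + TCP, no options
ETH_ON_WIRE = 14 + 4 + 8 + 12   # Ethernet header + FCS + preamble/SFD + inter-frame gap


def fcip_efficiency(fc_payload: int = 2048, mtu: int = 1500) -> float:
    """Fraction of wire bytes that are FC payload (the replicated data),
    assuming FC frames are packed back to back into one TCP byte stream."""
    fcip_bytes = FCIP_ENCAP_HEADER + FC_SOF_EOF + FC_HEADER + fc_payload + FC_CRC
    mss = mtu - TCP_IP_HEADERS        # TCP payload bytes per segment
    wire_per_segment = mtu + ETH_ON_WIRE
    # Spread the per-segment Ethernet/IP/TCP cost over the bytes each FC frame occupies.
    wire_bytes = fcip_bytes * wire_per_segment / mss
    return fc_payload / wire_bytes


if __name__ == "__main__":
    for payload in (2048, 512):
        eff = fcip_efficiency(payload)
        print(f"{payload}-byte FC payload: {eff:.1%} efficient, {1 - eff:.1%} overhead")
```

Under those assumptions, header overhead comes to roughly 8% for full-size 2048-byte data frames and about 16% for 512-byte frames. If only ~1.3 Mbps of data is actually getting through, most of the shortfall would have to come from something other than encapsulation, such as retransmissions, TCP window limits, or idle time.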

I’ve performed captures on the MPXs at both sites, and the errors I see in a 60-second sample are FCP malformed packets (~4300), duplicate ACKs (~41), previous segment lost (~3), and fast retransmissions (~3). When we asked HP about the malformed FCP packets, they said the MPXs use a proprietary protocol that Wireshark can’t decode. I’ve since searched for this protocol but can’t find any references to it. The other errors are so few that it’s hard to believe they’re impacting the data stream much, if at all.

I’ll include a small sample of the captures, if it lets me.

Thanks in advance for your assistance.

Chandler Bing

Patrick Terlisten
Honored Contributor

Re: FCIP issues with SAN replication

Hello Chandler,

Have you set up your switches correctly for FCIP and the IP Distance Gateway?

If you have an HP B-Series/Brocade FC switch, do this:

aptpolicy 1
portCfgISLMode (set for all FCIP switch ports)

Best regards,

Chandler Bing FA
New Member

Re: FCIP issues with SAN replication

Thanks for your help; I’ve passed that on to our SAN engineer to implement. Can you explain a little about what it does and which problem it specifically addresses?

Now the update:

I discovered that if I disable the FCP decode, Wireshark correctly decodes the traffic as FCIP.

We applied a QoS config that marks SAN replication traffic as DSCP EF. Since then we’ve seen consistent ping times of ~36 ms between sites, and bandwidth has climbed as high as 45 Mbps on the 1 Gbps link. The MPXs still fail after replicating for a few hours; the last time, we watched them replicate for 12 hours before failing. The TCP timer exceeded counter suggests that this is the problem, but I have nothing significant in the Wireshark captures to support it.
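A single TCP connection can carry at most one window of data per round trip, so the ~36 ms ping time gives a quick ceiling check (the bandwidth-delay product). The sketch below assumes a 64 KiB maximum window, i.e. TCP window scaling not in use on the MPX110 or in the iperf tests, and assumes the RTT was similar before the QoS change. Neither is confirmed in the thread, and the helper names are hypothetical.

```python
"""Window-limited TCP throughput (bandwidth-delay product), a sketch."""


def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP connection's rate: at most one window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def window_needed_bytes(target_mbps: float, rtt_s: float) -> float:
    """Window (bandwidth-delay product) needed to sustain target_mbps at rtt_s."""
    return target_mbps * 1e6 * rtt_s / 8


if __name__ == "__main__":
    rtt = 0.036  # ~36 ms ping time between sites, as reported in the thread
    # Assumed 64 KiB window (65535 bytes), i.e. no TCP window scaling.
    print(f"64 KiB window ceiling: {max_throughput_mbps(65535, rtt):.1f} Mbps")
    print(f"Window needed for 1 Gbps: {window_needed_bytes(1000, rtt) / 1e6:.1f} MB")
```

With those assumptions the ceiling is about 14.6 Mbps per connection, close to the ~13 Mbps per connection seen with iperf, while filling 1 Gbps at 36 ms would need a window of roughly 4.5 MB. This would not explain the failures after several hours, but it is one plausible reason each connection tops out where it does.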

HP has decided that the MPX110 on the far side needs to be replaced. I'll post an update after that's done.