HPE EVA Storage

Mismatched deskew (cable length) causes I/O latencies & timeouts in our SAN 5300s v6.4.0b?

 
DCESstorage
Occasional Advisor


We are experiencing I/O latencies and timeouts in our SAN fabric: two Brocade 5300s (FOS v6.4.0b) connected via a trunked ISL (port#0 and port#1 @ 8 Gbps).

 

If we match the cable lengths so the deskew numbers match, can we expect fewer I/O latency errors?

Alternatively, would dropping to 4 Gbps be a better way to achieve balanced throughput between the ISL ports in this trunk?

 

Summary:

On our core SAN switches' ISL ports, there appears to be a discrepancy in the length of the fiber cables used to form the ISL trunk group between the core switches (connecting the 3rd- and 4th-floor core switches). Our goal is to make sure the deskew numbers match on both ISL ports in this trunk group.

 

Analysis:

FL4R11-SWF-FA: > trunkshow -perf

  1:  0->  0 10:00:00:05:1e:ee:cf:7e  2 deskew 28 MASTER -> 13 unit difference compared to next ISL in Trunk

       1->  1 10:00:00:05:1e:ee:cf:7e   2 deskew 15

    Tx: Bandwidth 16.00Gbps, Throughput 231.11Mbps (1.68%) 

    Rx: Bandwidth 16.00Gbps, Throughput 210.08Mbps (1.53%)  

    Tx+Rx: Bandwidth 32.00Gbps, Throughput 441.19Mbps (1.61%)

 

Run portperfshow to show the current ISL/trunk throughput on the switches:

FL4R11-SWF-FA: > portperfshow 0-1 -t 10

  0      1       Total

========================

   1.6m  55.6m  57.3m   -> Notice the significant difference in throughput between port#0 and port#1 of this trunk

   1.8m  25.0m  26.8m  -> Same as above

   1.7m  39.3m  41.0m  -> Same as above

   4.7m  44.7m  49.4m  -> Same as above
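To put a number on the imbalance, each port's share of trunk traffic can be computed from the portperfshow samples above. A minimal sketch (sample values copied from the output; assumes the columns are MB/s per sampling interval):

```python
# Per-interval portperfshow samples (MB/s) for trunk ports 0 and 1,
# copied from the output above.
samples = [
    (1.6, 55.6),
    (1.8, 25.0),
    (1.7, 39.3),
    (4.7, 44.7),
]

total_p0 = sum(p0 for p0, _ in samples)
total_p1 = sum(p1 for _, p1 in samples)
share_p0 = total_p0 / (total_p0 + total_p1)

print(f"port#0 carried {share_p0:.0%} of trunk traffic")  # → port#0 carried 6% of trunk traffic
```

With port#0 carrying only about 6% of the traffic, the trunk is clearly not load-balancing frames evenly across its members.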

 

 

SAN Configuration:

 

Each of our two floors (3rd and 4th) is served by a Brocade 5300 handling that floor's local hosts and storage, and the
two switches are connected by TWO 8 Gbps ISLs (trunked on port#0 and port#1). All hosts (mostly Windows) in our environment access storage on either floor, so the ISLs are essential to our design.

 

We never come close to the throughput limit on the ISL links; instead we end up with latency bottlenecks.


For example, we see a high tim_txcrd_z count and a high number of Class 3 frames received (261498201) on the ISL trunk port.


FL2_BCD5300
tim_rdy_pri                        1042        Time R_RDY high priority
tim_txcrd_z                        3508284216  Time TX Credit Zero (2.5Us ticks)

 

FL1_BCD5300
tim_rdy_pri                        1041        Time R_RDY high priority
tim_txcrd_z                        4259587257  Time TX Credit Zero (2.5Us ticks)

 

Obviously, we see I/O delays and/or timeouts accessing SAN storage (two EVA 8400s, one on each floor). The EVA 8400s don't appear to show any performance/latency bottleneck, so we suspect it is caused by the SAN and/or the ISLs.

 

Oct 24 2011 09:23:52 GMT   Warning  "Severe latency bottleneck detected at slot 0 port 0".  Switch  259976  1  AN-1010  FL1_BCD5300_SWF-FA

 

ISL fiber length is about 100-150 m; single-mode (long-distance) fiber with long-haul SFPs is used.

Johan Guldmyr
Honored Contributor

Re: Mismatched deskew (cable length) causes I/O latencies & timeouts in our SAN 5300s v6.4.0b?

Are you running exchange or port based routing?
If it's port-based that could explain why one ISL is used a lot more.
Are these tim_txcrd_z increasing too? A lot?
Could also be that you have a slow drain device.

Brocade BCFA training material says:
* Deskew reflects distance and link quality.
* A 2 m length difference corresponds to approximately 1 deskew unit.
* A difference of ~30 m could cause performance degradation.
* The ISL with the least latency is set to deskew 15.
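Applying those rules of thumb to the trunkshow output in the question (deskew 28 on the master vs 15 on the other member) gives a rough estimate of the cable-length mismatch. The 2 m-per-unit figure is the training-material approximation quoted above, not an exact spec:

```python
METERS_PER_DESKEW_UNIT = 2  # approximation from the Brocade BCFA material

deskew_master = 28  # port 0, trunk master (from the trunkshow output)
deskew_other = 15   # port 1; the lowest-latency ISL is pinned at deskew 15

diff_units = deskew_master - deskew_other
diff_meters = diff_units * METERS_PER_DESKEW_UNIT
print(f"~{diff_meters} m estimated cable-length difference ({diff_units} deskew units)")
```

At roughly 26 m of estimated mismatch, the links are below the ~30 m degradation threshold mentioned above, so the routing policy or a slow-drain device may be the bigger factor here.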