11-16-2011 06:58 AM - edited 11-16-2011 06:59 AM
Mismatch deskew (cable length) causes I/O Latencies & Timeouts in our SAN 5300s v6.4.0b ?
We are experiencing I/O latencies and timeouts in our SAN fabric, which consists of two Brocade 5300s (v6.4.0b) connected via a trunked ISL (port#0 and port#1 @ 8 Gbps).
If we match the cable lengths to get matching deskew numbers, should we expect fewer I/O latency errors?
Alternatively, is dropping to 4 Gbps a better way to achieve balanced throughput between the ISL ports in this trunk?
On our core SAN switches, there appears to be a discrepancy in the length of the fiber cables that form the ISL trunk group between the core switches (connecting the 3rd- and 4th-floor core switches). Our goal is to make sure that the deskew numbers match on both ISL ports in this trunk group.
FL4R11-SWF-FA: > trunkshow -perf
1: 0-> 0 10:00:00:05:1e:ee:cf:7e 2 deskew 28 MASTER -> 13 unit difference compared to next ISL in Trunk
1-> 1 10:00:00:05:1e:ee:cf:7e 2 deskew 15
Tx: Bandwidth 16.00Gbps, Throughput 231.11Mbps (1.68%)
Rx: Bandwidth 16.00Gbps, Throughput 210.08Mbps (1.53%)
Tx+Rx: Bandwidth 32.00Gbps, Throughput 441.19Mbps (1.61%)
Running portperfshow shows the current ISL/trunk throughput on the switches:
FL4R11-SWF-FA: > portperfshow 0-1 -t 10
0 1 Total
1.6m 55.6m 57.3m -> Notice the significant difference in throughput between port#0 and port#1 of this trunk
1.8m 25.0m 26.8m -> Same as above
1.7m 39.3m 41.0m -> Same as above
4.7m 44.7m 49.4m -> Same as above
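To put a number on that imbalance, here is a rough sketch that computes port#0's share of the combined traffic for each sample. The figures are copied from the portperfshow output above; the function name is just something made up for illustration, and the result is a unit-free ratio.

```python
# Samples copied from the portperfshow output above: (port#0, port#1).
samples = [(1.6, 55.6), (1.8, 25.0), (1.7, 39.3), (4.7, 44.7)]

def port0_shares(samples):
    """Return port#0's share of the combined throughput for each sample."""
    return [p0 / (p0 + p1) for p0, p1 in samples]

shares = port0_shares(samples)
for (p0, p1), share in zip(samples, shares):
    print(f"port0={p0:5.1f}m  port1={p1:5.1f}m  port0 share={share:6.1%}")
```

In every sample port#0 carries less than 10% of the traffic on the trunk.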
Each of our two floors (3 and 4) is served by a Brocade 5300 that connects the floor's local hosts and storage, and these
two switches are connected by two 8 Gbps ISLs (trunked on port#0 and port#1). All hosts (mostly Windows) in our environment access storage on either floor, so the ISLs are essential in our design.
We never get close to the throughput limit on the ISL links, but we end up with latency bottlenecks instead.
For example, we see a high "tim_txcrd_z" count and a high number of C3 frames received (261498201) on the ISL trunk port.
tim_rdy_pri 1042 Time R_RDY high priority
tim_txcrd_z 3508284216 Time TX Credit Zero (2.5Us ticks)
tim_rdy_pri 1041 Time R_RDY high priority
tim_txcrd_z 4259587257 Time TX Credit Zero (2.5Us ticks)
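For a sense of scale, the tick counts can be converted to seconds using the 2.5 µs tick length printed in the counter label. This is only a sketch: the function name is made up, and since we don't know how long the counters have been accumulating, the absolute totals say little on their own.

```python
TICK_SECONDS = 2.5e-6  # tick length taken from the counter label "2.5Us ticks"

def txcrd_zero_seconds(ticks):
    """Convert a tim_txcrd_z tick count into seconds spent at zero TX credit."""
    return ticks * TICK_SECONDS

# The two tim_txcrd_z values shown above.
for ticks in (3508284216, 4259587257):
    print(f"{ticks} ticks ~ {txcrd_zero_seconds(ticks):.0f} s at zero TX credit")
```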
As a result, we see I/O delays and/or timeouts when accessing SAN storage (two EVA 8400s, one on each floor). The EVA 8400s don't appear to show any performance or latency bottleneck, so we suspect the cause is the SAN and/or the ISLs.
Oct 24 2011 09:23:52 GMT Warning "Severe latency bottleneck detected at slot 0 port 0". Switch 259976 1 AN-1010 FL1_BCD5300_SWF-FA
The ISL fiber length is about 100-150 m. It is single-mode (long-distance) fiber with long-haul SFPs.
11-17-2011 12:50 AM
Re: Mismatch deskew (cable length) causes I/O Latencies & Timeouts in our SAN 5300s v6.4.0b ?
If it's port-based that could explain why one ISL is used a lot more.
Are these tim_txcrd_z increasing too? A lot?
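One way to answer that is to read the counter twice and work out what fraction of the interval the port sat at zero TX credit. A minimal sketch, assuming the 2.5 µs tick from the counter label; the function name and the sample readings are hypothetical, not values from this fabric.

```python
TICK_SECONDS = 2.5e-6  # tick length taken from the counter label "2.5Us ticks"

def zero_credit_fraction(ticks_before, ticks_after, interval_seconds):
    """Fraction of the sampling interval the port spent with zero TX credit."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if ticks_after < ticks_before:
        # Counter was cleared or wrapped between readings; take a new pair.
        raise ValueError("counter went backwards; resample")
    return (ticks_after - ticks_before) * TICK_SECONDS / interval_seconds

# Hypothetical readings taken 60 s apart.
fraction = zero_credit_fraction(1_000_000, 5_800_000, 60.0)
print(f"{fraction:.1%} of the interval at zero TX credit")
```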
Could also be that you have a slow drain device.
Brocade BCFA training material says:
* Deskew reflects distance and link quality.
* A 2 m difference in cable length is approximately 1 deskew unit.
* A difference of ~30 m could cause performance degradation.
* The ISL with the least latency is set to 15 deskew.
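Applying those rules of thumb to the deskew values in your trunkshow output (28 and 15) gives a rough estimate of the cable length difference. This is just a sketch: the constant and function names are made up, and since deskew also reflects link quality, the result is an estimate rather than a measurement.

```python
METERS_PER_DESKEW_UNIT = 2.0    # "2m difference is approximately 1 deskew"
BASELINE_DESKEW = 15            # value given to the ISL with the least latency
DEGRADATION_THRESHOLD_M = 30.0  # "~30m could cause performance degradation"

def estimated_length_difference_m(deskew, baseline=BASELINE_DESKEW):
    """Estimate the extra cable length (in meters) implied by a deskew value."""
    return (deskew - baseline) * METERS_PER_DESKEW_UNIT

diff = estimated_length_difference_m(28)  # deskew of the MASTER ISL on port#0
print(f"~{diff:.0f} m difference, over threshold: {diff >= DEGRADATION_THRESHOLD_M}")
```

That works out to roughly 26 m, which is under the ~30 m figure but not by much.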