HPE EVA Storage

I/O Latencies & Timeouts in our SAN Fabric with HP 8/80 switches (Brocade 5300) v6.4.0b

Occasional Advisor


We have a SAN fabric configuration with HP 8/80 switches (Brocade 5300) running FOS v6.4.0b.
Each of our two floors (1 & 2) is served by a Brocade 5300 that connects that floor's local hosts and storage,
and the two switches are connected by TWO 8 Gbps ISLs (trunked on port 0 and port 1). All hosts (mostly Windows) in our
environment access storage on either floor, so the ISLs are essential in our design.


We never get close to the throughput limit on the ISL links, but we do run into latency bottlenecks.

For example, we see a high number of "tim_txcrd_z" ticks and high values for C3 frames received (261498201) on the ISL trunk ports.


tim_rdy_pri                        1042        Time R_RDY high priority
tim_txcrd_z                        3508284216  Time TX Credit Zero (2.5Us ticks)
tim_rdy_pri                        1041        Time R_RDY high priority
tim_txcrd_z                        4259587257  Time TX Credit Zero (2.5Us ticks)
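For context, tim_txcrd_z counts 2.5 µs ticks during which the port had zero transmit credits, so the raw counter can be converted into actual time the port spent unable to send. A minimal sketch (the tick duration comes from the counter description above; the counter values are the ones shown):

```python
# Convert Brocade tim_txcrd_z tick counters (2.5 us per tick)
# into the time a port spent at zero Tx credit.

TICK_SECONDS = 2.5e-6  # 2.5 microseconds per tick, per the counter description

def zero_credit_seconds(ticks: int) -> float:
    """Total time (seconds) the port had no transmit credits."""
    return ticks * TICK_SECONDS

for ticks in (3508284216, 4259587257):
    secs = zero_credit_seconds(ticks)
    print(f"{ticks} ticks -> {secs:.0f} s (~{secs / 3600:.1f} h) at zero Tx credit")
```

Since the counter wraps and accumulates over the port's uptime, the absolute value matters less than how fast it grows, but several thousand seconds at zero credit is consistent with the bottleneck warnings below.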


Predictably, we see I/O delays and/or timeouts accessing SAN storage (two EVA 8400s, one on each floor).
The EVA 8400s don't appear to show any performance/latency bottleneck themselves, so we suspect the cause
lies in the SAN fabric and/or the ISLs.

Oct 24 2011 09:23:52 GMT   Warning  "Severe latency bottleneck detected at slot 0 port 0".  Switch  259976  1  AN-1010  FL1_BCD5300_SWF-FA 


When checking via Storage Essentials, we do not see throughput anywhere near or above 8% on the ISLs.

Would configuring additional ISLs on a different port group (ASIC) help by providing an alternate route for
SAN traffic, and also provide additional BB credits?


From the documentation we understand that you can connect additional ISLs on a different port group. In that case
those ISLs won't be trunked, but will still be used via DPS (Dynamic Path Selection) for ISL traffic. Is that correct?


Any other ideas for chasing this problem, or possible underlying issues?
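Not part of the original post, but for readers following along, these are the FOS 6.x CLI checks commonly used when chasing credit starvation on ISLs (the port numbers are examples matching the trunk ports above):

```shell
# Routing policy in use (exchange-based vs port-based, relevant to the DPS question)
aptpolicy

# Buffer credit allocation per port
portbuffershow

# Error counter summary (enc out, crc, disc c3) across all ports
porterrshow

# Detailed counters for the ISL trunk ports, including tim_txcrd_z
portstatsshow 0
portstatsshow 1

# Optics health / Rx-Tx power on the ISL ports
sfpshow 0

# Bottleneck monitor status (the source of the AN-1010 messages)
bottleneckmon --status
```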



Regular Visitor

Re: I/O Latencies & Timeouts in our SAN Fabric with HP 8/80 switches (Brocade 5300) v6.4.0b



First, some basic questions. On floor 1 you have one 5300 switch, which is connected to another 5300 on floor 2 with two ISLs with trunking enabled (license needed), correct? So the EVA, with all its ports, and the hosts are connected to one switch on each floor. Is CA (Continuous Access) also running?





Occasional Advisor

Re: I/O Latencies & Timeouts in our SAN Fabric with HP 8/80 switches (Brocade 5300) v6.4.0b

All ports of each EVA are connected to a single 5300 switch on that floor, correct. Only the switches are connected via ISLs.


Thanks for the quick suggestions. We do not have CA, but we have something similar: HP SVSP (SAN Virtualization Services Platform), through which we mirror some volumes across the EVAs on both floors (about 10 TB of data; updates/writes are moderate, not overly write-intensive). However, we experienced these I/O timeouts even before SVSP (and the mirroring across the two EVAs) was put in place. Before SVSP came into the picture the configuration was still mostly the same: hosts on either floor accessed data from the EVAs on either floor (we understand this setup is not optimal, but this is where we are today). The data mirrors do not split because of the timeouts experienced by hosts, but SVSP's own setup disks, which are also mirrored across the EVAs, occasionally do break and then re-sync a short while later.


Most of the hosts are set to 8 Gb (auto), while the storage runs at 4 Gbps (the EVAs and SVSP can both only do 4 Gbps).


The distance between the floors (length of the long-haul cable) is approximately 150 meters. We did not change the BB credits; I think they are set at the default value (26?) on both switches.
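For what it's worth, 150 m by itself should not exhaust the default credits: a rough rule of thumb is one credit per full-size frame in flight over the round trip. A sketch of that estimate (the ~5 µs/km propagation delay in fiber is a textbook figure, and the frame size is the approximate FC maximum; 8GFC line-rate overhead is ignored, so this is only a ballpark):

```python
# Rough estimate of BB credits needed to keep an FC link busy over distance.

LIGHT_US_PER_KM = 5.0   # ~5 microseconds per km one way in optical fiber
FRAME_BYTES = 2112      # approximate maximum FC frame payload size

def credits_needed(link_km: float, gbps: float) -> float:
    """Credits required so the sender never stalls waiting for R_RDY."""
    round_trip_us = 2 * link_km * LIGHT_US_PER_KM
    # serialization time of one full frame at the link rate (Gbps -> bits/us)
    frame_time_us = (FRAME_BYTES * 8) / (gbps * 1000)
    return round_trip_us / frame_time_us + 1  # +1 for the frame on the wire

print(credits_needed(0.15, 8))  # 150 m at 8 Gbps -> roughly 2 credits
```

So with ~2 credits sufficient for the distance and 26 allocated by default, a climbing tim_txcrd_z points less at link length and more at credits being held downstream, i.e. a slow-draining device behind the remote switch.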


For example, SQL applications report errors such as:

"SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [G:\MSSQL.1\MSSQL\Data\Accolade_Data_1.mdf:MSSQL_DBCC117] in database [Accolade] (117).  The OS file handle is 0x0000000000000FFC.  The offset of the latest long I/O is: 0x00000297fe0000"


And these correspond to Perfmon latencies (we are not able to catch every event through it, but we do see occasional longer latencies on these hosts).




Johan Guldmyr
Honored Contributor

Re: I/O Latencies & Timeouts in our SAN Fabric with HP 8/80 switches (Brocade 5300) v6.4.0b

DPS - this depends a bit on whether you have exchange-based routing or port-based routing. Exchange-based generally uses more bandwidth.
CLI command: aptpolicy

Are the 4G ports in the SAN fixed at 4G?

What about attenuation on the ports? (sfpshow)

Are those two error counters the only ones increasing on the ports?

150 m ISL links - is that over multi-mode cable? Is it OK to have links that long without increasing buffers?