
Re: RX error rate on 3400cl 10G interfaces

 
André Beck
Honored Contributor

RX error rate on 3400cl 10G interfaces

Hi,

I've deployed three 3400cl switches connected in a triangle through their 10G module ports, each fitted with the 10G CX4 module and an InfiniBand cable. I'm seeing an abnormally high receive error rate on these 10G ports, averaging 0.4%, which clearly seems too high. Additionally, the errors are not classified: they are neither FCS nor alignment errors, runts, or giants; only the "Total RX errors" counter keeps climbing.

An example, taken roughly one hour after clearing the interface statistics (the interface was up the whole time):

Status and Counters - Port Counters for port 50

Name : ISL to rtr-N1-2:26

Link Status : Up

Bytes Rx        : 544,627,227        Bytes Tx        : 1,292,842,677
Unicast Rx      : 2,252,725          Unicast Tx      : 2,321,017
Bcast/Mcast Rx  : 79,368             Bcast/Mcast Tx  : 22,431

FCS Rx          : 0                  Drops Rx        : 4
Alignment Rx    : 0                  Collisions Tx   : 0
Runts Rx        : 0                  Late Colln Tx   : 0
Giants Rx       : 0                  Excessive Colln : 0
Total Rx Errors : 7906               Deferred Tx     : 0

(potential hosing by unwanted line breaks courtesy the web forum).
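
As a rough cross-check of the quoted rate, the counters above can be plugged into a few lines of Python. This is only a sketch: it assumes the number of received frames is the sum of Unicast Rx and Bcast/Mcast Rx, which the switch output does not state, and the function name `rx_error_rate` is made up for illustration.

```python
def rx_error_rate(total_rx_errors, unicast_rx, bcast_mcast_rx):
    """Return RX errors as a fraction of received frames.

    Assumption: received frames = unicast + broadcast/multicast frames.
    """
    frames_rx = unicast_rx + bcast_mcast_rx
    if frames_rx == 0:
        raise ValueError("no frames received, rate is undefined")
    return total_rx_errors / frames_rx


# Values copied from the port 50 counters shown above.
rate = rx_error_rate(7906, 2_252_725, 79_368)
print(f"{rate:.2%}")  # prints 0.34%
```

For this one-hour sample the result is a bit below the 0.4% average, but of the same order.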

I'm trying to track down so far unexplained occasional packet loss in that network, so this counter caught my attention. Does anyone have an idea what's going on here?

TIA,
Andre.
9 REPLIES
Ralph Bean_2
Trusted Contributor

Re: RX error rate on 3400cl 10G interfaces

Hello Andre -

I suggest that you walk the SNMP MIB for port 50. Most likely you will then see a specific receive error indicated.
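
One possible way to script such a walk is sketched below in Python, assuming the Net-SNMP `snmpwalk` tool, SNMPv2c, and numeric output (`-On`). The host address, community string, ifIndex 50, and the sample output are placeholders and were not captured from a real switch. The OIDs are the standard IF-MIB and EtherLike-MIB ones.

```python
import re

# Standard numeric OIDs (IF-MIB columns and the EtherLike-MIB stats table).
OIDS = {
    "ifInDiscards": "1.3.6.1.2.1.2.2.1.13",
    "ifInErrors": "1.3.6.1.2.1.2.2.1.14",
    "dot3StatsTable": "1.3.6.1.2.1.10.7.2",
}


def snmpwalk_cmd(host, community, oid):
    """Build an argv list for Net-SNMP's snmpwalk (v2c, numeric OIDs)."""
    return ["snmpwalk", "-v2c", "-c", community, "-On", host, oid]


# Matches lines such as ".1.3.6.1.2.1.2.2.1.14.50 = Counter32: 7906".
LINE_RE = re.compile(r"^\.?([\d.]+)\s*=\s*(\w+):\s*(\d+)\s*$")


def parse_counters(output):
    """Map each numeric OID to its value, keeping only Counter types."""
    counters = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2).startswith("Counter"):
            counters[match.group(1)] = int(match.group(3))
    return counters


# Hypothetical host and community; pass the list to subprocess.run() to execute.
cmd = snmpwalk_cmd("192.0.2.10", "public", OIDS["ifInErrors"])

# Illustrative output only, not taken from a real device.
sample = (
    ".1.3.6.1.2.1.2.2.1.13.50 = Counter32: 4\n"
    ".1.3.6.1.2.1.2.2.1.14.50 = Counter32: 7906\n"
)
counters = parse_counters(sample)
```

Walking `dot3StatsTable` the same way lists the per-cause Ethernet error counters for each interface.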

Ralph
Mike Satterfield
Occasional Advisor

Re: RX error rate on 3400cl 10G interfaces

Andre,

I have an open case with HP that includes these symptoms. So far, swapping flex modules, optics, and chassis on both sides has not resolved it. HP has been working diligently for a couple of weeks now to determine the problem. If anyone else has these symptoms, I recommend contacting HP.

The problems have been visible in firmware versions 8.62 and 8.69.
André Beck
Honored Contributor

Re: RX error rate on 3400cl 10G interfaces

Ralph:

Thanks for the SNMP hint. Neither the EtherLike-MIB dot3Stats counters nor the RMON-MIB counters attribute these drops clearly to a failure reason; they just show up as RX discards. This could, however, mean that they really are classic RX discards caused by bursty traffic overrunning the input queue of the interface. Then again, in a lightly loaded 10G-to-10G setup with RX and TX flow control active, I would not expect to see a large number of input drops, certainly not in the per-mille range, ticking constantly upwards like the national debt counter. It should be more, hmm, bursty.
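
The "steady versus bursty" distinction could be checked roughly as follows. This is a hedged Python sketch: it assumes the discard counter is polled at a fixed interval and is a 32-bit SNMP counter, and both sample series are invented purely to show the two shapes. The helper names are hypothetical.

```python
from statistics import mean, pstdev

COUNTER32_MOD = 2 ** 32  # Counter32 values wrap around at 2^32


def deltas(samples):
    """Per-interval increments of a Counter32, allowing for wrap-around."""
    return [(b - a) % COUNTER32_MOD for a, b in zip(samples, samples[1:])]


def burstiness(samples):
    """Coefficient of variation of the increments (0 means perfectly steady)."""
    increments = deltas(samples)
    if not increments:
        raise ValueError("need at least two samples")
    average = mean(increments)
    if average == 0:
        return 0.0
    return pstdev(increments) / average


# Made-up counter readings at equal polling intervals.
steady = [0, 130, 262, 391, 522, 650]   # ticks up at a near-constant rate
bursty = [0, 0, 0, 640, 640, 650]       # mostly idle, one large jump

print(burstiness(steady) < 0.1)  # prints True
print(burstiness(bursty) > 1.0)  # prints True
```

A value near zero corresponds to the constant ticking described above, while queue overruns from traffic bursts would be expected to give a much larger value.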

BTW, thanks to your hint, I found out that dot3Stats has a counter for PAUSE frames, which partially answers one of my earlier threads. Counting PAUSE frames as multicasts is then probably just an RMON-MIB glitch.

Mike:

Yep, seems like we have the same problem. Any chance you could tell me the case number you got from HP? Now that we've pre-qualified it this far, I'd really like to attach my case to yours so they know it's linked.

BTW, did you really experience *drops* or just the counter increasing? I'm still not sure whether packets are indeed being discarded or whether it's just "counter fun", as so often.

Thanks,
Andre.
Mike Satterfield
Occasional Advisor

Re: RX error rate on 3400cl 10G interfaces

Andre,

My apologies for taking so long to reply, I've been on vacation. It probably wasn't the best time for me to leave, but no one here would have paid my cancellation fees, so...

I alerted HP support to your message the same day I replied. They (HP) were aware of an existing case in Europe. At that time, they were not able to link it to you. So, it's not clear whether there are two or three similar cases. My case number is 3211106919.

I never got to the point of confirming drops. The drop counter increments, as do the Rx errors, though not at the same rate. Swapping hardware did not correct the situation.

I hope to know more this afternoon as my case was put on hold until my return. I'm waiting for a progress update.
André Beck
Honored Contributor

Re: RX error rate on 3400cl 10G interfaces

Re,

> My apologies for taking so long to reply,

No problem, this is a public forum after all, not slavery (and I spent the whole last week hopping between customers anyway) ;)

> I've been on vacation. It probably wasn't
> the best time for me to leave but, no one
> here would have paid my cancellation fees
> so...

;)

> I alerted HP support to your message the
> same day I replied. They (HP) were aware
> of an existing case in Europe. At that
> time, they were not able to link it to you.

I hadn't actually opened a case until a few hours ago, after I had verified, at least partially, that my packet loss might be related to the 10G link. I routed a particularly noisy Insight Manager client (sometimes multiple unreachables in one hour) around the 10G link. Most interestingly, IM has been silent about it since, and extensive pinging hasn't turned up any further lost packets. The traffic is still routed on a 3400cl and still goes to and from the 4100gl with the client over a trunk; the only difference is that the other 3400cl and the 10G link between them (which was another L2 hop before) are bypassed for these packets. I won't trust the results until 24 hours have passed, but it looks "promising".

> So, it's not clear whether there are two
> or three similar cases. My case number
> is 3211106919.

Thanks, I'll try to link mine to it. I've also given them the URL to this thread so additional info like this will merge in smoothly.

> I never got to the point of confirming
> drops. They as well as Rx errors
> increment though not at the same rate.

"Drops RX"? Yeah, same thing here. Rate is approx. 1/10th of the "Total RX errors".

> Swapping hardware did not correct the
> situation.

Great, so I've got a reason to refuse useless device swapping orgies until there is reason to believe that it could actually help, i.e. a fix.

> I hope to know more this afternoon as my
> case was put on hold until my return. I'm
> waiting for a progress update.

Ok, let me hear about it. When I get my case details, I'll post here.

Thanks,
Andre.
Johan Högberg
New Member

Re: RX error rate on 3400cl 10G interfaces

Hi!

We also have the same problems with the 3400cl and the 10G interface, and HP support (Europe) seems to be clueless in this matter.

What to do?

// Johan


André Beck
Honored Contributor

Re: RX error rate on 3400cl 10G interfaces

Johan,

> We also have the same problems with
> the 3400cl and 10g if, and hp support
> (europe) seems to stand clueless in this
> matter.

Argh. Which firmware level are you running? Just so I know it isn't fixed up to that one. My attempt to open a call by email bounced, and I hate opening calls by phone *that* much (especially after having written multiple kilobytes of descriptive text and getting back just a "please call XYZ there-and-there"), so it finally got lost. The counters have been increasing ever since, but it isn't much of a problem thanks to the robustness of the upper-layer protocols. I just hoped it would be fixed in a future version, but seemingly it isn't...
Johan Högberg
New Member

Re: RX error rate on 3400cl 10G interfaces

We have just upgraded to M.10.06 and the problem is still there...
André Beck
Honored Contributor

Re: RX error rate on 3400cl 10G interfaces

Johan,

> We have just upgraded to M.10.06 and the
> problem is still there...

Strange. Here is what I have in the meantime:

3400cl-24G@M.08.66: RX Errors increase
3400cl-48G@M.08.62: RX Errors increase
3400cl-48G@M.08.75: No RX Errors
3400cl-48G@M.10.06: No RX Errors
6400cl-6XG@M.08.102: No RX Errors

To me this looks like a fix is present at least from M.08.75 and M.10.06 on, but to be sure I'll have to wait for one of the older boxes to be upgraded. Indeed, neither my brand-new 6400cl with a total of 8x10G nor my two later-installed 3400cl-48G show any RX errors. If it's not software, it may be a hardware revision issue.
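
The reasoning over the version list can be sketched in Python. It assumes that firmware strings of the form M.xx.yy order numerically component by component, which is a guess about the numbering scheme. It covers only the five observations listed above and ignores the model differences. The helper name `parse_fw` is made up.

```python
def parse_fw(version):
    """Turn 'M.08.75' into (8, 75) so versions compare numerically."""
    _, major, minor = version.split(".")
    return int(major), int(minor)


# True means RX errors were seen to increase on that firmware.
observations = {
    "M.08.66": True,
    "M.08.62": True,
    "M.08.75": False,
    "M.10.06": False,
    "M.08.102": False,
}

affected = [v for v, errors in observations.items() if errors]
clean = [v for v, errors in observations.items() if not errors]

newest_affected = max(affected, key=parse_fw)
oldest_clean = min(clean, key=parse_fw)
print(newest_affected, oldest_clean)  # prints M.08.66 M.08.75

# A fix somewhere between the two is plausible only if every affected
# version is older than every clean one.
consistent = parse_fw(newest_affected) < parse_fw(oldest_clean)
print(consistent)  # prints True
```

Comparing parsed tuples matters here because a plain string sort would place M.08.102 before M.08.75.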

BTW, "Drops RX" simply increments on every frame received on an STP-blocked port, so it is not an error indication of any sort, just noise.

Andre.