PAUSE frame counting

André Beck
Honored Contributor

PAUSE frame counting

Hi,

in a recent lab test (involving 2626Bs, 2650Bs and 3400cls) I was surprised when, during a TCP throughput test, the port with the receiving end of the test showed up in PCM+ as severely overloaded with multicasts. Further investigation made clear that this was due to the test box's NIC driver (Linux b44), which issued roughly two PAUSE frames per received TCP segment. At first I thought that counting PAUSEs as multicasts is an exceptionally silly idea: technically they are multicasts, but they won't flood (at least not through any sane device). Thinking about it further left me a bit ambivalent, though. My reasoning:

Con#1: PAUSEs will not flood. Alarming on them the same way as on flooding multicasts is a false positive. A network admin seeing a >10k Mcast/s rate on his management station will probably have a coronary before he finds out it is just a misinterpretation of PAUSEs.

But:

Pro#1: The enormous PAUSE rate is for sure a bug in the b44 driver. If something like this happens, a good network admin likes to be informed about it, so dropping information about PAUSEs is not good, either.

Pro#2: BPDUs bump this counter as well ;)

So I see a dilemma: I for sure don't like PAUSEs to be counted undifferentiated from "real" multicast, but then again, they should be counted somehow. While counting BPDUs (non-flooding multicasts as well) never gave me too bad a feeling (they are L2 after all), having the rather L1-ish PAUSEs land on the same counter somehow feels incorrect. Ideally, there would be at least two counters, one for data plane multicasts (those which flood) and one for control plane ones (like BPDUs and PAUSEs). Even better, PAUSEs would have a counter entirely of their own.

What do you think?
7 REPLIES
Jeff Brownell
Valued Contributor

Re: PAUSE frame counting

Andre,
please correct me if i have misinterpreted. Are you stating that PAUSE frames from link partner (linux b44 driver) are incrementing multicast Rx counters on the 2600/3400's as viewed with PCM+? What about the 'show interfaces ...' command? Is it the same thing? Are you certain the test you are running is isolated from actual multicast traffic?

Here at division there are some things in the works with respect to flow control and the 3400/6400 platforms. If you feel the behavior you have noted is accurate and needs to be modified, please open a support call with as much detail as possible and reference this forums post. let me know the case id and i'll watch for it and assume ownership (if need be).
-Jeff
André Beck
Honored Contributor

Re: PAUSE frame counting

Hi Jeff,

> please correct me if i have misinterpreted.
> Are you stating that PAUSE frames from link
> partner (linux b44 driver) are incrementing
> multicast Rx counters on the 2600/3400's as
> viewed with PCM+?

Yes, exactly. In the given case, PCM+ shows a deep red multicast traffic bar and the dial is at the far right, denoting an insane 14 kpps of multicast.

> What about the 'show interfaces ...'
> command? Is it the same thing?

Yep, the counter "Bcast/Mcast Rx" goes a-racing upwards at an enormous rate, the same rate as displayed by PCM+.

> Are you certain the test you are running
> is isolated from actual multicast traffic?

Yes, at least from any such that happens at a significant rate. Of course STP and occasional ARP requests are still there, but they are a trickle compared to what happens when TCP traffic starts. Additionally:

- The counter stops racing and goes back to normal as soon as I stop the TCP load test.
- The counter never races even in case of a running TCP load test when I force PAUSE frame transmission to off on the linux b44 side using ethtool.

The problem is that I can't easily sniff PAUSE frames due to their nature, and I have no access to hardware that could. IMO there is a certain chance the b44 driver actually generates *broken* PAUSE frames, ones the switches don't recognize as such.

> Here at division there are some things in
> the works with respect to flow control and
> the 3400/6400 platforms. If you feel the
> behavior you have noted is accurate and
> needs to be modified, please open a support
> call with as much detail as possible and
> reference this forums post. let me know
> the case id and i'll watch for it and
> assume ownership (if need be).

My posting here was a trial balloon to see what others think or know about it. Now that I know the RMON MIB has a dedicated PAUSE counter, the question whether PAUSE frames should also sum up on the generic multicast counter is IMO clear - it shouldn't, unlike any multicast that actually is infrastructure, but L2 related, like BPDUs or CDP. Then again, it is anyway a bug in b44 (have mailed the developers, but never got anything back), not that much in HP ProCurves, as normally no one would expect a PAUSE storm as in my case. So I didn't want to create a case unless there is real necessity for it. If you think my observations are sufficient to do so, I'll go and open one when time allows.

Thanks,
Andre.
Jeff Brownell
Valued Contributor

Re: PAUSE frame counting

Andre,
It boils down to bandwidth. If we have a case with a customer to pursue, then time could be set aside to investigate. Since you are not certain whether the b44 driver is sending "broken" or fragmented PAUSE frames or not (or maybe I misinterpreted your statement?), I would not be certain I was addressing your question if I were to run with this issue via this forums post.

The questions and data to gather would be:

1) Are we incrementing broadcast/multicast Rx on "garbage" packets ("broken" PAUSEs)?
2) Or are we incrementing broadcast/multicast Rx on good PAUSE frames?

Maybe we increment mcast Rx with both. If so, then we need to find the developer and see whether this is by design and why, and then we need to run through the gamut of getting the behavior modified. So it is not a trivial amount of effort to modify this (assuming it is determined it needs to be modified) based on the (sound) logic you present below.

> Now that I know the RMON MIB has a
> dedicated PAUSE counter, the question
> whether PAUSE frames should also sum up
> on the generic multicast counter is IMO
> clear - it shouldn't, unlike
> any multicast that actually is
> infrastructure, but L2 related, like
> BPDUs or CDP. Then again, it is anyway a
> bug in b44 (have mailed the developers,
> but never got anything back), not that
> much in HP ProCurves, as normally no one
> would expect a PAUSE storm as in my case.
> So I didn't want to create a case unless
> there is real necessity for it. If you
> think my observations are sufficient to
> do so, I'll go and open one when time
> allows.

Yes please open a case so that we can dedicate the time needed to address such an issue...
-Jeff
Jeff Brownell
Valued Contributor

Re: PAUSE frame counting

Andre,
I spent a little time today and educated myself as to the format of PAUSE frames (http://www.techfest.com/networking/lan/ethernet3.htm) and found that they can be either unicast or mcast. I am willing to bet that the link partner (linux b44 NIC driver) is sending to the globally assigned multicast address. Therefore for the switch to increment ifInMcast is what we should do.

If you want to take it a step further please verify that the PAUSE frames from the linux box are actually unicast before asking that we increment ifInUcast rather than ifInMcast. If the PAUSE frames are indeed unicast and we are incrementing the mcast rx counters, this is certainly something we would look to address.
-Jeff
André Beck
Honored Contributor

Re: PAUSE frame counting

Re Jeff,

> I spent a little time today and educated
> myself as to the format of PAUSE frames
> (http://www.techfest.com/networking/lan/ethernet3.htm)

This link just gives me a 404?

> and found that they can be either unicast
> or mcast.

That's strange. Up to now I was sure they are always mcast. AFAIK there is only one destination MAC they are supposed to be sent to as per 802.3-latest, and that is a special mcast address from the range reserved by IEEE for control plane traffic in the vicinity of 802.1D.

> I am willing to bet that the link partner
> (linux b44 NIC driver) is sending to the
> globally assigned multicast address.

Yep, I assumed that from start.

> Therefore for the switch to increment
> ifInMcast is what we should do.

Is it? This is the whole question I started this thread with. Of course it is theoretically OK to count it as received mcast since, technically, it is. Then again, the use of an mcast MAC is just pure IEEE 802.3 pragmatism; the frame is not actually an mcast frame in the real sense of the word, as with any non-broken hardware (such that negotiates flow control before actually sending PAUSE frames) these frames will *never* flood. (I once heard of broken hardware that sent PAUSEs without negotiation, connected to switches that didn't know about PAUSEs and flooded them, which in turn were connected to some that did react properly - consider the effect devastating.) IMO that is the sound reason why they got an extra counter in RMON. They are fundamentally different from BPDUs or CDP, as they are an L1 mechanism that just needed some way of signaling, and IEEE oddly chose a special frame for it instead of something out of band like, say, modulated fast link pulses.

I perfectly understand that some ASICs cannot tell the difference, as their mcast counter might be just hardwired to the serialized first bit of the destination MAC being one.
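A minimal sketch of that distinction, assuming the usual 802.3x values (destination 01-80-C2-00-00-01, MAC control EtherType 0x8808, PAUSE opcode 0x0001). The function names are hypothetical and only illustrate the two ways a receiver could classify the frame: by the group bit alone, or by actually looking at EtherType and opcode.

```python
import struct

PAUSE_DA = bytes.fromhex("0180c2000001")  # reserved MAC control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def is_group_address(dst_mac: bytes) -> bool:
    """True if the I/G bit (least significant bit of the first octet) is set.

    Ethernet sends each octet least significant bit first, so this is the
    first bit of the destination MAC on the wire.
    """
    return bool(dst_mac[0] & 0x01)


def is_pause_frame(frame: bytes) -> bool:
    """True if the frame carries the MAC control EtherType and PAUSE opcode."""
    if len(frame) < 16:
        return False
    ethertype, opcode = struct.unpack("!HH", frame[12:16])
    return ethertype == MAC_CONTROL_ETHERTYPE and opcode == PAUSE_OPCODE


def build_pause(src_mac: bytes, quanta: int, dst_mac: bytes = PAUSE_DA) -> bytes:
    """Build a PAUSE frame (without FCS), zero-padded to the 60-byte minimum."""
    header = dst_mac + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta
    )
    return header.ljust(60, b"\x00")


if __name__ == "__main__":
    frame = build_pause(bytes.fromhex("020000000001"), quanta=0xFFFF)
    # A counter keyed only on the group bit sees "multicast";
    # only a parser that inspects EtherType and opcode sees "PAUSE".
    print(is_group_address(frame[:6]), is_pause_frame(frame))
```

A counter that only implements `is_group_address` has no way of separating PAUSEs from any other frame sent to a group address.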

It's just that I discovered - in the lab case of stumbling over a supposedly broken NIC driver - that an intense rate of PAUSE frames causes a certain counter to increment and that PCM+ makes this into a catastrophe which it really isn't. There is no multicast flooding my network at an insane rate like in a storm; it's just PAUSEs, which get sunk at the receiving port as part of L1 mechanisms anyway. Maybe it was a bit compact, but I thought I already described this dilemma in my initial post ;)

> If you want to take it a step further
> please verify that the PAUSE frames from
> the linux box are actually unicast before
> asking that we increment ifInUcast rather
> than ifInMcast.

I have and had absolutely no doubt that they are multicast. I also don't want them to be counted on either ifInUcast or ifInMcast. I wanted to discuss whether they, being an L1 flow control mechanism rather than some normal multicast frame, should be counted on one of these counters at all or demand their own independent counter. Now that I know they have one in the RMON world, things get really interesting. IMO the decision what to do with these frames should be made by the MIB2 designers and the IEEE, as only they know what they meant the IF-MIB and dot3 counters to be and whether the existence of an RMON counter might modify this behavior (given that RMON is not a mandatory part of MIB2).

> If the PAUSE frames are indeed unicast
> and we are incrementing the mcast rx
> counters, this is certainly something we
> would look to address.

I never assumed something like that. I just have a bad feeling that PAUSEs might be misinterpreted as multicast data plane traffic and that this leads to wrong decisions by the network management staff. The question is what should be changed: The SNMP counters or PCM's behavior in counting multicast? Ideally it would just count *flooded* multicast, even better it would give a differentiated view of received vs. flooded mcast. If the status quo is not about to change (counters stay as they are), the easiest solution would be to let PCM subtract the PAUSEs (read from RMON) from the total received mcasts and display only the remainder as real mcast. This is mcast which is more likely going to flood and thus deserves special attention when coming in at 10kpps and beyond.
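The subtraction could look roughly like the following Python sketch. It is an illustration only: `PortSample` and its field names are hypothetical, not real MIB object names or anything PCM actually does, and it assumes every PAUSE frame was also counted as multicast and that no counter wrapped between the two polls.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One poll of a port. Field names are hypothetical, not MIB object names."""
    timestamp: float  # seconds
    mcast_rx: int     # cumulative received multicast frames
    pause_rx: int     # cumulative received PAUSE frames (dedicated counter)


def real_mcast_rate(prev: PortSample, curr: PortSample) -> float:
    """Multicast frames per second with PAUSE frames subtracted out.

    Assumes all PAUSE frames also bumped the multicast counter and that
    neither counter wrapped between the two polls.
    """
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in chronological order")
    mcast_delta = curr.mcast_rx - prev.mcast_rx
    pause_delta = curr.pause_rx - prev.pause_rx
    # Clamp at zero in case the two counters were read at slightly
    # different instants.
    return max(mcast_delta - pause_delta, 0) / interval


if __name__ == "__main__":
    prev = PortSample(timestamp=0.0, mcast_rx=1_000, pause_rx=900)
    curr = PortSample(timestamp=10.0, mcast_rx=141_050, pause_rx=140_900)
    print(real_mcast_rate(prev, curr))  # 5.0 frames/s instead of 14005.0
```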

Finally, IMO this is a design question to be answered, not a hardware bug to be fixed.

Thanks & hope that made things more clear,
Andre.
André Beck
Honored Contributor

Re: PAUSE frame counting

Re,

>> (http://www.techfest.com/networking/lan/ethernet3.htm)
>
> This link just gives me a 404?

Silly me, I overlooked that closing parenthesis at the end. They say one can use the individual station MAC as DA, and that makes sense. But here the question is the same in another color: Should these frames be counted as received unicasts? When both are possible, having a single counter in the RMON MIB breaks my subtraction idea: you cannot just subtract PAUSEs from the mcast or ucast counters to correct them, as you don't know which kind they were when they bumped the counter.
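With a single PAUSE counter, the best a management tool could do is bracket the non-PAUSE multicast count rather than compute it exactly. A minimal Python sketch under that assumption (the function name and the per-interval deltas are hypothetical):

```python
from typing import Tuple


def real_mcast_bounds(mcast_delta: int, pause_delta: int) -> Tuple[int, int]:
    """Return (low, high) bounds for non-PAUSE multicast frames in an interval.

    low:  every PAUSE frame was sent to the multicast address and was
          therefore counted as multicast.
    high: every PAUSE frame was sent to the station's unicast address and
          none of them touched the multicast counter.
    """
    if mcast_delta < 0 or pause_delta < 0:
        raise ValueError("deltas must be non-negative")
    return max(mcast_delta - pause_delta, 0), mcast_delta


if __name__ == "__main__":
    # 140050 multicast frames and 140000 PAUSE frames seen in one interval
    print(real_mcast_bounds(140_050, 140_000))  # (50, 140050)
```

The interval is only useful when it is narrow; in a PAUSE storm like the one described it spans almost the whole range, which is exactly the problem.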

Getting complicated...

Andre.
Jeff Brownell
Valued Contributor

Re: PAUSE frame counting

Andre,
yes, thx for the clarification.

> I have and had absolutely no doubt that
> they are multicast. I also don't want them
> to be counted on either ifInUcast or
> ifInMcast. I wanted to discuss whether
> they, being an L1 flow control mechanism
> rather than some normal multicast frame,
> should be counted on one of these counters
> at all or demand their own independent
> counter. Now that I know they have one in
> the RMON world, things get really
> interesting.

I tend to think of MAC control frames as part of L2 as opposed to L1 and consider the incrementing of the applicable Rx SNMP counter as appropriate, but am certainly open to suggestions. And like you say, the chip itself may be the limiting factor in how to count (let alone receive) the MAC control (specifically the PAUSE) frame.

> The question is what should be changed:
> The SNMP counters or PCM's behavior in
> counting multicast? Ideally it would just
> count *flooded* multicast, even better it
> would give a differentiated view of
> received vs. flooded mcast. If the status
> quo is not about to change (counters stay
> as they are), the easiest solution would
> be to let PCM subtract the PAUSEs (read
> from RMON) from the total received mcasts
> and display only the remainder as real
> mcast. This is mcast which is more likely
> going to flood and thus deserves special
> attention when coming in at 10kpps and
> beyond.

Since our switches are hard set to store-and-forward, it would not be that important what DA the sender sends to; only how we react to the received PAUSE frame (are we coded to look at the MAC to react, or at the MAC control opcode? I don't know off the top of my head). This would be important in how PCM is coded to handle PAUSE frames.

It may be a minor request to have PCM subtract the PAUSE frames from total mcast frames, but this certainly won't happen without an enhancement request. So if you are up for it, please submit the enhancement. I cannot say one way or the other if it will be incorporated, but it will certainly be evaluated.
-Jeff