Re: RRPP over bridge aggregation - big loop on my network

vincentmunier · ‎02-13-2015

Hi,

I would like to know whether RRPP over bridge aggregation is supported. In my case, I have a ring between several switches (HP10500) using RRPP.

One of the RRPP links is a bridge aggregation (2 ports at 1 Gbps). Last week, one of the links failed and caused a loop on my network. I suspect this was due to a design problem. I think the RRPP Hello packets could have been lost (because of the failure of one of the 2 links of the bridge aggregation), so the blocked port of the master node transitioned to the forwarding state, but since the other link was still operational, a loop was created. Is that right?

Thanks for your advice.

Vincent

Sietze Reitsma · ‎02-15-2015

Bridge Aggregation is supported in RRPP. One link down in a BAGG should not impact the ring.

There is, however, a writeup about disabling STP on both the BAGG and the underlying ports.

http://h20565.www2.hp.com/hpsc/doc/public/display?sp4ts.oid=4218344&docId=emr_na-c03056623&docLocale=en_US

Also check whether you have DLDP running on your links.

vincentmunier · ‎03-04-2015

Hi,

Thanks for your answer. I had opened a case and HP told me the same thing, but I had to investigate why it does not work in my environment.

First, I can confirm that in a lab it works well. When I disconnect one of the aggregation links, if it was the link on which the RRPP Hello packet was sent, the LACP mechanism immediately detects the failure and the next RRPP Hello packet is sent over the second link. So the RRPP master always sees the ring as "complete" and keeps its secondary port "blocked".

But in my case there is a small difference: the 2 links in my aggregation are not directly connected between my HP switches. The HP switches are connected to "transparent switches" (operator equipment) with a copper link (in effect, it acts like a transceiver). The difference is important because, when one of the WAN links fails, the local copper connection remains UP. So LACP does not detect the link failure immediately, and I think the RRPP transit switch keeps sending the RRPP Hello packets on the same port. The RRPP master then no longer receives RRPP Hello packets and considers the ring status as "failed". The secondary port changes to "UP": a loop is formed because the second link of the aggregation is still working...

After a long time (more than 1 minute), the LACP timers expire and the RRPP transit switch forwards RRPP Hello packets over the second link. The RRPP master now sees the ring as "complete", as expected.
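The sequence above can be put into a rough estimate of how long the loop may last. This is only a sketch in Python: `loop_window_s` is a made-up helper, and the 90 s link-detection time and 3 s RRPP fail timer are example values, not measurements from this network.

```python
def loop_window_s(link_detect_s: float, rrpp_fail_s: float) -> float:
    """Rough seconds the ring may loop: from the master unblocking its
    secondary port until the aggregation drops the dead member link."""
    return max(0.0, link_detect_s - rrpp_fail_s)

# Example values only (assumptions): slow link detection vs. fast link detection.
print(loop_window_s(link_detect_s=90.0, rrpp_fail_s=3.0))  # 87.0
print(loop_window_s(link_detect_s=1.0, rrpp_fail_s=3.0))   # 0.0
```

The model ignores the extra Hello interval the master needs before it blocks the secondary port again, so it should be read as an order of magnitude only.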

Conclusion: in this case, we can't say that RRPP is compatible with link aggregation. A failure of a backup link can cause a loop lasting more than 1 minute. I would like to discuss this directly with HP...

Peter_Debruyne · ‎03-10-2015

Hi,

By default LACP uses a 30-second timer (which also acts as the LACP hello) with a dead count of 3, which means up to 90 seconds of downtime before the link is removed from the link-agg.
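As a quick illustration of that arithmetic, here is a minimal Python sketch. The helper name `lacp_detection_time` is made up for this example; the 30-second and 1-second periods and the dead count of 3 are the values quoted in this post.

```python
# Hypothetical helper: worst-case time before LACP gives up on a member link.
LACP_LONG_PERIOD_S = 30   # default LACP hello interval, as stated in the post
LACP_SHORT_PERIOD_S = 1   # interval with the LACP short timer
DEAD_COUNT = 3            # missed hellos before the link is removed

def lacp_detection_time(period_s: int, dead_count: int = DEAD_COUNT) -> int:
    """Return the worst-case seconds before a silent member link is removed."""
    return period_s * dead_count

print(lacp_detection_time(LACP_LONG_PERIOD_S))   # 90
print(lacp_detection_time(LACP_SHORT_PERIOD_S))  # 3
```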

You will need a faster control plane protocol to perform a logical heartbeat between the 2 switches. Make sure the RRPP timer is set to a higher value than the detection time of whichever option you select below, to ensure you do not get false detection of ring failures.

2 options here for Comware 5:

1/ Make the link aggregation an LACP one and set the LACP short timer on the interface. This will reduce the LACP hello from 30 seconds to 1 second, so after 3 seconds the interface will transition to the unselected state. RRPP hello+dead should be configured to be above 3 seconds.

2/ Use Ethernet OAM, which allows for faster detection. OAM uses a keepalive packet to check remote switch reachability. Timers can be configured at the global level. When the timer expires, the link will be reported as DOWN (even when physically UP), so the link-agg application will immediately place the interface in the unselected state, ensuring faster failover.

Example (must be done on both sides of the link)

[SW]int g1/0/1
[SW-GigabitEthernet1/0/1]oam enable

# review default timers

[SW]dis oam configuration
Configuration of the link event window/threshold :
--------------------------------------------------------------------------------
Errored-symbol Event period(in seconds)           :     1
Errored-symbol Event threshold                    :     1
Errored-frame Event period(in seconds)            :     1
Errored-frame Event threshold                     :     1
Errored-frame-period Event period(in ms)          :     1000
Errored-frame-period Event threshold              :     1
Errored-frame-seconds Event period(in seconds)    :     60
Errored-frame-seconds Event threshold             :     1

Configuration of the timer :
--------------------------------------------------------------------------------
Hello timer(in ms)                                :     1000
Keepalive timer(in ms)                            :     5000

[SW]

# change timers (example from a 5120, possible ranges vary per platform, use ? to see valid ranges)

[SW]oam timer hello 500
[SW]oam timer keepalive 1000
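
To sanity-check the advice about keeping the RRPP timer above the link-detection time, a small Python sketch may help. Everything here is illustrative: `rrpp_fail_timer_is_safe` is a made-up helper, and the 3-second RRPP fail timer is an assumed value that should be checked against the actual switch configuration.

```python
def rrpp_fail_timer_is_safe(rrpp_fail_s: float, link_detect_s: float) -> bool:
    """True if the RRPP fail timer is strictly longer than link detection."""
    return rrpp_fail_s > link_detect_s

# Detection times in seconds, taken from the figures quoted in this thread.
detection = {
    "lacp_long": 90.0,            # 30 s hello x dead count 3
    "lacp_short": 3.0,            # 1 s hello x dead count 3
    "oam_keepalive_1000ms": 1.0,  # OAM keepalive timer from the example above
}

ASSUMED_RRPP_FAIL_S = 3.0  # assumption, not read from a real device

for name, detect_s in detection.items():
    print(name, rrpp_fail_timer_is_safe(ASSUMED_RRPP_FAIL_S, detect_s))
```

With these numbers only the OAM case passes, which matches the remark that RRPP hello+dead has to be set above 3 seconds when relying on the LACP short timer.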

Hope this helps,

Best regards,
Peter

vincentmunier · ‎03-20-2015

Hi Peter,

Thanks a lot for your help. I didn't know about the OAM functionality. I have tried it and it works as expected.

In my case, it is more efficient than using short LACP timers because, with the minimum OAM timers, link failure detection occurs in 1 s (so better than the 3 s of LACP fast detection).

Vincent
