Re: IRF Domain purpose and IRF in L2 stacks

pattap · ‎09-23-2020

Hi all

This is a two-part question regarding IRF.

We have multiple sets of IRF stacks, ranging from old 5800s to new 5900s and 5710s.

In our IRF fabrics we always use BFD MAD. According to the documentation, an IRF domain ID is not required with BFD MAD, as per the excerpt below from the 5700 IRF configuration guide:

If LACP MAD, ARP MAD, or ND MAD runs between two IRF fabrics, assign each fabric a
unique IRF domain ID. (For BFD MAD, this task is optional.)

I can't find more detail on exactly what purpose the domain ID serves, and in which situations.

The second question: is there any point in running any of the MAD mechanisms on Layer 2 access switches? With L3 IRF devices, a split brain would cause issues on the network by having two gateways responding to client queries. What is the benefit of running MAD on L2?

Ivan_B · ‎09-23-2020

Hi @pattap !

The IRF domain ID is an identifier of a fabric. It is extremely useful in situations where you have multiple IRF fabrics that exchange MAD packets (either as TLVs in LACP PDUs, or via ARP or ND) over common channels, so that Fabric A may hear a MAD packet from Fabric B. In this situation the domain ID is the tag that instructs a fabric either to process the packet (if the domain ID in the packet matches the local ID) or to ignore it (if the packet contains a domain ID that does not match the fabric's domain ID).
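
To make the matching rule concrete, here is a minimal Python sketch of the process-or-ignore decision described above. It is only an illustration, not how Comware implements it: the `MadPacket` class and the `should_process` function are hypothetical names, and the domain ID values are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MadPacket:
    """Hypothetical, simplified MAD packet: only the domain ID is modelled."""
    domain_id: int


def should_process(local_domain_id: int, packet: MadPacket) -> bool:
    """Process the packet only when its domain ID matches the local fabric's ID."""
    return packet.domain_id == local_domain_id


# Fabric A uses domain ID 10 (arbitrary value chosen for the example).
fabric_a_domain = 10

print(should_process(fabric_a_domain, MadPacket(domain_id=10)))  # True: own fabric, process
print(should_process(fabric_a_domain, MadPacket(domain_id=20)))  # False: foreign fabric, ignore
```

If both fabrics used the same domain ID, the second call would also return True, and Fabric A would treat Fabric B's MAD packets as its own.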

Let me rephrase your question, because in practice you are asking "how may a split brain in an L2 stack affect traffic flows?" You are completely right that a split brain between L3 switches causes complete havoc in the network segment; a split of an L2 stack is not as severe. One issue I can think of right now: when the fabric splits, single-connected hosts that were reachable over the IRF ports are no longer reachable, and traffic sent between them is flooded over all ports in the respective VLAN as unknown unicast. This lasts for as long as the hosts retain their ARP cache entries and until the MAC tables on both devices adapt to the new situation. If IRF was the only connection between the hosts, the MAC tables will never re-learn these addresses, but the hosts will keep sending unicast and the switches will keep flooding it, following normal Ethernet logic. Once the ARP cache entries on the hosts expire and their new ARP requests go unanswered, the hosts stop sending unicast traffic to the unreachable hosts and the flooding stops.
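
The flooding behaviour can be sketched with a toy MAC-learning switch in Python. This is a deliberately simplified model under several assumptions: a single VLAN, no MAC aging, and the IRF link treated as an ordinary port named `irf` as seen from one stack member. The `ToySwitch` class, the port names and the MAC strings are all hypothetical.

```python
class ToySwitch:
    """Toy single-VLAN learning switch (no aging, no STP)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was learned on

    def receive(self, src_mac, dst_mac, in_port):
        """Learn the source MAC, then return the list of egress ports."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown unicast: flood to every port except the ingress port.
            return sorted(self.ports - {in_port})
        return [out_port]

    def port_down(self, port):
        """Remove the port and flush the MAC entries learned on it."""
        self.ports.discard(port)
        self.mac_table = {m: p for m, p in self.mac_table.items() if p != port}


sw = ToySwitch(["p1", "p2", "p3", "irf"])
print(sw.receive("host-a", "host-b", "p1"))   # ['irf', 'p2', 'p3'] first frame is flooded
print(sw.receive("host-b", "host-a", "irf"))  # ['p1'] reply teaches the switch where host-b is
print(sw.receive("host-a", "host-b", "p1"))   # ['irf'] now forwarded as known unicast

sw.port_down("irf")                           # the fabric splits
print(sw.receive("host-a", "host-b", "p1"))   # ['p2', 'p3'] flooded again, host-b never answers
```

Because host-b can no longer send anything to this switch, its address is never re-learned, so every frame host-a sends keeps being flooded until host-a stops trying.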

However, even if this sounds insignificant, keep in mind that the management IP of both devices will still be the same, even if your switches are all L2 (I hope they have at least one IP for management, right?). The split stack members will retain the same IP, so in practice you will have unstable and unpredictable access over SSH/Telnet/Web.

I am an HPE employee

pattap · ‎09-23-2020

Hello @Ivan_B

Thanks for the extensive response.

So by common channels you mean any of the MAD mechanisms apart from BFD MAD, which doesn't require a domain ID to be configured?

The reason for the question is that I'm trying to understand whether there's a risk of things going wrong in the following scenario:

- two separate IRF stacks are connected to each other via bridge aggregation which is used to transport user traffic only - not used in any part of IRF configuration.

- both stacks have the same (default) domain ID

It would look like the below; I hope it makes sense.

Let's use the same topology for the L2 split-brain situation. So imagine L2 stack 2 lost its IRF links, with no MAD configured. I would hope that clients connected to both master and slave would still be able to communicate with one another via L3 IRF stack 1, which acts as the gateway. I agree on the management side of things, where both switches would have the same management IP address, but from the client perspective would there be no service disruption? I'm just not certain how the bridge aggregation would behave in this situation.

Ivan_B · ‎09-23-2020

Hi @pattap !

By common channels I mean physical links where MAD messages from different IRF domains are visible, like BR1 in your example. Let's not reinvent the wheel and refer to the documentation; I think it explains this quite extensively:

IRF domain ID
One IRF fabric forms one IRF domain. IRF uses IRF domain IDs to uniquely identify IRF fabrics and
prevent IRF fabrics from interfering with one another. As shown in Figure 2, IRF fabric 1 contains Device A and Device B, and IRF fabric 2 contains Device C and Device D. Both fabrics use the LACP aggregate links between them for MAD. When a member device receives an extended LACPDU for MAD, it checks the domain ID to see whether the packet is from the local IRF fabric. Then, the device can handle the packet correctly.

As we all know, the IRF domain ID is required for all kinds of MAD except BFD MAD. I emphasize this - required; it's not an option, and there is no room for "do I really need it?" in this case. You simply must have it if you follow the documentation. And this requirement is there for a reason - if both stacks use the same domain ID, then in LACP, ARP and ND scenarios both stacks will exchange messages that are indistinguishable from one another, e.g. Stack1 will think Stack2's MAD messages belong to the Stack1 IRF. Here is what the guide says about detection of the multi-active situation:

MAD identifies each IRF fabric with a domain ID and an active ID (the member ID of the primary). If multiple active IDs are detected in a domain, MAD determines that an IRF collision or split has occurred.

And the outcome is very obvious - each stack will think there is a split brain, because there will be two primaries. You will end up in a situation where one of your stacks goes down completely, or the stacks flap, fighting for the primary role.
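
The detection rule quoted from the guide can be illustrated with a small Python sketch. It only models the rule "more than one active ID seen in the same domain means a collision or split"; the `detect_multi_active` function is a hypothetical name, and the domain IDs and member IDs in the example are assumed values, not taken from any real configuration.

```python
from collections import defaultdict


def detect_multi_active(observations):
    """Return the set of domain IDs in which more than one active ID was seen.

    observations: iterable of (domain_id, active_id) pairs, where active_id
    is the member ID of the primary that sent the MAD message.
    """
    active_ids = defaultdict(set)
    for domain_id, active_id in observations:
        active_ids[domain_id].add(active_id)
    return {domain for domain, ids in active_ids.items() if len(ids) > 1}


# Two healthy stacks that share one domain ID; their primaries are assumed
# to have member IDs 1 and 2.
print(detect_multi_active([(0, 1), (0, 2)]))  # {0} a false multi-active alarm

# The same two stacks after each was given its own domain ID.
print(detect_multi_active([(1, 1), (2, 2)]))  # set() nothing to report
```

With a shared domain ID, two perfectly healthy stacks look exactly like one fabric that has split in two, which is why unique domain IDs matter on links where both stacks' MAD messages are visible.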

And why doesn't BFD MAD require an IRF domain ID if it's so important? Very simple - BFD MAD uses dedicated "channels" for each IRF stack. By "channels" in this case I mean BFD sessions. Since the BFD sessions are unique for each stack (uniqueness is achieved by using unique IP addresses), there is no scenario in which BFD MAD messages from Stack1 become visible to Stack2.

Now let's get back to the scenario where there is no MAD configured in the L2 stack: "Let's use the same topology for the L2 split brain situation. So imagine L2 stack 2 lost its IRF links - no MAD configured. I would hope that clients connected to both primary and secondary would still be able to communicate one with another via L3 IRF stack 1 - acting as a gateway"

Hmm, it's not that simple. When the L2 stack splits, BR1 is technically not a BAGG anymore - each L2 switch is now a separate device, and since you can't really have a LAGG between one device on one end and many devices on the other, LACP on the L3 stack will bring one of BR1's physical ports down. Thus one of your L2 switches will become isolated. If you don't have LACP on BR1 it's even more dangerous - there will be a loop over the L3 stack inside VLAN 3 (and the native VLAN if you have it allowed on BR1). In practice you have three scenarios in case of an L2 split brain:

1. Configure MAD in the L2 stack, so that when the IRF link goes down one of your L2 switches becomes isolated in a controlled and predictable way.
2. Do not configure MAD. Use LACP on BR1. When the IRF link goes down, hope that LACP engages correctly and disables one of the L3 stack's BR1 ports. You can control which port goes down by tweaking LACP port priorities. It's a less controlled way than the first one with MAD, and less desirable, because the host-facing ports on the L2 switch that loses its uplink stay physically up, so the hosts cannot tell that their link is not working anymore.
3. Do not configure MAD. Use static aggregation on BR1 (no LACP). When the IRF link between the L2 switches goes down, keep your phone nearby to call HPE support and raise a Severity 1 (network down) or Severity 2 (critically degraded) ticket, depending on how lucky you are and how far the broadcast storm propagates. You may say that having loop detection and/or STP in the network should help to avoid the disaster, and that is absolutely correct, but all consequences should be carefully assessed.
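
The three scenarios above boil down to two yes/no questions, which can be written as a tiny Python decision helper. This is just a restatement of the list for quick reference; the `l2_split_outcome` function and its return strings are hypothetical and do not correspond to any switch command or output.

```python
def l2_split_outcome(mad_configured: bool, lacp_on_uplink: bool) -> str:
    """Summarise what to expect when the L2 stack's IRF link fails."""
    if mad_configured:
        # Scenario 1: MAD isolates one member in a predictable way.
        return "controlled: MAD isolates one L2 member"
    if lacp_on_uplink:
        # Scenario 2: rely on LACP to disable one BR1 port on the L3 stack.
        return "less controlled: LACP should disable one BR1 port, host ports stay up"
    # Scenario 3: static aggregation, nothing breaks the loop by itself.
    return "dangerous: loop and broadcast storm unless STP or loop detection steps in"


print(l2_split_outcome(mad_configured=True, lacp_on_uplink=True))
print(l2_split_outcome(mad_configured=False, lacp_on_uplink=True))
print(l2_split_outcome(mad_configured=False, lacp_on_uplink=False))
```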

Summary: an IRF domain ID is required for LACP, ARP and ND MAD. You need to use it, and each IRF fabric should have a unique domain ID. If you don't want to change the IRF domain ID of the L2 fabric, use BFD MAD in that fabric, as BFD MAD doesn't require unique domain IDs.

I am an HPE employee
