Comware Based

Transient Loops in the topology

Occasional Contributor


Dear Esteemed members, 

We recently upgraded our network by adding physical loops from the edge switches back to the core switch, using RSTP to provide link redundancy (default timers: 2 s hello time, 15 s forward delay, 20 max hops).
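A quick sanity check on the timers quoted above is to test the two relationships IEEE 802.1D requires between Hello Time, Forward Delay and Max Age. This is a minimal sketch that assumes Max Age is at its 20 s default. The post lists "20 max hops", which is a separate MSTP parameter and is not checked here. The function name `check_stp_timers` is illustrative only.

```python
from typing import List

# Check the IEEE 802.1D relationships between the STP/RSTP timers.
# Assumption: Max Age is at its 20 s default; "max hops" (MSTP) is not checked.


def check_stp_timers(hello: float, forward_delay: float, max_age: float) -> List[str]:
    """Return the violated constraints; an empty list means the timers are consistent."""
    problems = []
    # Constraint 1: 2 x (Forward Delay - 1 s) >= Max Age
    if 2 * (forward_delay - 1) < max_age:
        problems.append("2 * (forward_delay - 1) < max_age")
    # Constraint 2: Max Age >= 2 x (Hello Time + 1 s)
    if max_age < 2 * (hello + 1):
        problems.append("max_age < 2 * (hello + 1)")
    return problems


if __name__ == "__main__":
    # Hello 2 s and forward delay 15 s from the post; max age 20 s assumed.
    print(check_stp_timers(hello=2, forward_delay=15, max_age=20))  # → []
```

With these values both constraints hold (28 >= 20 and 20 >= 6), so the timer combination itself is consistent.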

Please see attached network diagram. 

There are about five or six such loops, one from each switching closet back to the two-member IRF cluster.

Since then, we have experienced occasional network outages lasting about 1 to 2 minutes each, up to once or twice a day. On a good day, none.

Looking at the logs on the core and edge switches, each outage seems to follow this sequence:

1) Core switch C2 stops sending BPDUs, for some unknown reason.

2) Link L2 goes into the discarding state due to Loop Protection.

3) Link L3 goes into the forwarding state.

4) After some time, C2 starts sending BPDUs again.

5) L2 goes back into the forwarding state.

6) L3 goes back into the discarding/blocked state.

Assuming the events happen in the above order, is it possible for a loop to form between events 5 and 6 that could have caused the transient network outages?
If so, is there any way to make L3 go into the blocked state before L2 goes into the forwarding state?
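The question above comes down to whether the two redundant links are ever forwarding at the same time. Here is a minimal sketch, assuming a simplified two-link model where a loop exists exactly when both L2 and L3 forward. It replays the port-state changes in order and flags any step where that happens. The event lists are hypothetical orderings, not data from the actual logs, and `loop_windows` is an illustrative name.

```python
from typing import Dict, List, Tuple

# (link, new_state) where new_state is "forwarding" or "discarding"
Event = Tuple[str, str]


def loop_windows(initial: Dict[str, str], events: List[Event],
                 redundant: Tuple[str, str] = ("L2", "L3")) -> List[int]:
    """Return the indices of events after which every redundant link is forwarding."""
    states = dict(initial)  # copy so the caller's dict is not modified
    windows = []
    for i, (link, state) in enumerate(events):
        states[link] = state
        if all(states[name] == "forwarding" for name in redundant):
            windows.append(i)
    return windows


if __name__ == "__main__":
    # State after event 4 in the post: L2 discarding (Loop Protection), L3 forwarding.
    start = {"L2": "discarding", "L3": "forwarding"}

    # Hypothetical order as described: L2 forwards (event 5) before L3 blocks (event 6).
    as_described = [("L2", "forwarding"), ("L3", "discarding")]
    # Hypothetical alternative: L3 blocks first, then L2 forwards.
    block_first = [("L3", "discarding"), ("L2", "forwarding")]

    print(loop_windows(start, as_described))  # → [0]
    print(loop_windows(start, block_first))   # → []
```

In this model, the order described in the post leaves a one-step window where both links forward, while blocking L3 first leaves none.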

Any idea what could have caused the core switch to stop sending BPDUs?

BTW, the devices were using manually configured system time. No NTP server was in use.

Thanks for reading ! 

Occasional Contributor

Re: Transient Loops in the Network

Upon in-depth investigation and further reading, we do not think that transient loops were formed.
RSTP sync takes care of that.
The devices in the network were probably briefly unreachable because of STP convergence and the rebuilding of the ARP cache after it was flushed on the core switch upon receiving a TCN from E2.
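The sync behaviour mentioned above can be roughly sketched as follows. This is a simplified single-bridge model. It assumes that when a bridge adopts a new root port, it first moves every other non-edge port that is forwarding (the former root port and its designated ports) to discarding, and only then lets the new root port forward. "Edge" here means an RSTP edge port (host-facing), not an edge switch. The `Port` class, the `reselect_root` function and the port names are hypothetical. Proposal/agreement BPDUs, the recent-root timer and the later re-opening of designated ports are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Port:
    name: str
    edge: bool = False        # RSTP edge port (host-facing); not affected by sync
    forwarding: bool = False


def reselect_root(ports: List[Port], new_root: str) -> List[Tuple[str, str]]:
    """Apply a root-port change and return the state changes in the order they are made."""
    actions = []
    # Step 1 (sync, simplified): block every other non-edge port that is forwarding.
    for p in ports:
        if p.name != new_root and not p.edge and p.forwarding:
            p.forwarding = False
            actions.append((p.name, "discarding"))
    # Step 2: only after that, let the new root port forward.
    for p in ports:
        if p.name == new_root and not p.forwarding:
            p.forwarding = True
            actions.append((p.name, "forwarding"))
    return actions


if __name__ == "__main__":
    ports = [
        Port("uplink_a"),                             # hypothetical: now receives superior BPDUs
        Port("uplink_b", forwarding=True),            # hypothetical: current root port
        Port("host_1", edge=True, forwarding=True),   # hypothetical host-facing edge port
    ]
    print(reselect_root(ports, "uplink_a"))
    # → [('uplink_b', 'discarding'), ('uplink_a', 'forwarding')]
```

The design point is simply the ordering: the old path is cut before the new one opens, which is why, under this model, the two uplinks never forward at the same time.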

Still, the question remains: why did the STP information age out on the edge (E) switches?
Does STP information age out after the Forward Delay or after Max Age?

According to this article, it should be 3 × Hello Time, which would mean 6 s, but testing seems to suggest otherwise.
Can someone point me to an HP article on this "rcvdinfowhile" timer?
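The sketch below contrasts two aging behaviours. The first follows the IEEE 802.1D-2004 `updtRcvdInfoWhile()` rule: received information is kept for 3 × Hello Time, as long as Message Age + 1 s (rounded to the nearest second) does not exceed Max Age. The second is the legacy 802.1D-1998 behaviour, where stored information ages out when its Message Age reaches Max Age. The `timeout_factor` parameter is a hypothetical knob standing in for vendor multipliers. Comware documents an `stp timer-factor` setting that scales this timeout, so it may be worth checking its value and behaviour on your release if testing shows a timeout longer than 6 s. The message age of 1 s and the factor of 3 below are assumed example values.

```python
import math


def rcvd_info_while(hello: float, message_age: float, max_age: float,
                    timeout_factor: float = 1.0) -> float:
    """RSTP (802.1D-2004) lifetime of received BPDU information, in seconds.

    timeout_factor is a hypothetical vendor multiplier; the standard rule is factor 1.
    """
    # Message Age + 1 s, rounded to the nearest whole second, must not exceed Max Age.
    if math.floor(message_age + 1 + 0.5) <= max_age:
        return float(3 * hello * timeout_factor)
    return 0.0  # information is already too old to be used


def legacy_info_lifetime(message_age: float, max_age: float) -> float:
    """Legacy 802.1D-1998 STP: stored info ages out when Message Age reaches Max Age."""
    return float(max(max_age - message_age, 0))


if __name__ == "__main__":
    # Default hello 2 s and max age 20 s; message age 1 s is an assumed example value.
    print(rcvd_info_while(hello=2, message_age=1, max_age=20))                    # → 6.0
    print(rcvd_info_while(hello=2, message_age=1, max_age=20, timeout_factor=3))  # → 18.0
    print(legacy_info_lifetime(message_age=1, max_age=20))                        # → 19.0
```

Under the standard RSTP rule the answer to "forward delay or max age?" is neither: it is 3 × Hello Time. Max Age only governs aging in legacy STP mode.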