VRRP Problem on clustered backbone switches

SOLVED

Hi,
I am looking for a solution to a VRRP problem on our clustered 8212zl switches. Although our two backbone switches are configured as active/passive with a single-instance spanning tree, from time to time the second switch acts as master on several VLANs for a few seconds and then returns to backup.

Our environment, briefly:

1- HP ProCurve 8212zl cluster with two identical nodes, 8212-AO1 (master) and 8212-AO2 (backup).
2- Both nodes currently run firmware K.15.02.0005.
3- The two nodes were connected by a 10 Gig coaxial link, which we replaced with a 10 Gig fiber link a few days ago (this did not solve the problem either). We also tried a Gigabit port on a different module; that failed too.
4- The VRRP configuration is as follows:

***************************************************
VRRP Configuration on AO1
...
router vrrp
router vrrp virtual-ip-ping
...
spanning-tree
spanning-tree priority 0
vlan 1
ip pim-dense
ip-addr any
exit
vrrp vrid 1
owner
virtual-ip-address 10.0.0.254 255.255.0.0
advertise-interval 3
priority 255
preempt-delay-time 60
enable
exit
exit
vlan 30
ip pim-dense
ip-addr any
exit
vrrp vrid 30
owner
virtual-ip-address 10.0.30.1 255.255.255.0
advertise-interval 3
priority 255
preempt-delay-time 60
enable
exit
exit
....


********************************************************
VRRP Configuration on AO2

...
router vrrp
router vrrp virtual-ip-ping
...
spanning-tree
spanning-tree priority 1
vlan 1
ip pim-dense
ip-addr any
exit
vrrp vrid 1
backup
virtual-ip-address 10.0.0.254 255.255.0.0
primary-ip-address 10.0.0.253
advertise-interval 3
preempt-delay-time 60
enable
exit
exit

vlan 30
ip pim-dense
ip-addr any
exit
vrrp vrid 30
backup
virtual-ip-address 10.0.30.1 255.255.255.0
primary-ip-address 10.0.30.2
advertise-interval 3
preempt-delay-time 60
enable
exit
exit

We have already tried the following:
* Upgraded to the latest firmware, K.15.02.0005.
* Restarted the nodes one after the other. Node 8212-AO1 is master for all VLANs, whereas node 8212-AO2 keeps switching from backup to master and back to backup on some VLANs.
* When we debug VRRP on 8212-AO1, we can see that 8212-AO1 always sends its hello (advertisement) packets, but 8212-AO2 does not receive all of them, for instance on VLANs 1 and 30. The "Advertise Pkts Rx" counter on AO2 increases continuously when we run "sh vrrp vlan 1" or "sh vrrp vlan 30" every second, yet AO2 still makes itself master on these VLANs.
* We increased advertise-interval to 3 seconds, but the problem occurred again (a longer interval does not solve the problem; it only makes it occur less often).
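
For reference, here is a rough sketch of why a longer advertise-interval only reduces the number of flaps. It assumes standard VRRPv2 timing as defined in RFC 3768 and a backup priority of 100 (the usual default when none is configured); neither assumption is confirmed by the outputs in this thread. Under those assumptions the backup only takes over after roughly three consecutive advertisements are lost.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds per RFC 3768: (256 - Priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advertise_interval_s: int, priority: int) -> float:
    """Master_Down_Interval in seconds: 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advertise_interval_s + skew_time(priority)


if __name__ == "__main__":
    # Priority 100 is an assumed default for the backup (AO2); it is not
    # shown in the posted configuration.
    for interval in (1, 3):
        timeout = master_down_interval(interval, priority=100)
        print(f"advertise-interval {interval}s -> backup waits {timeout:.3f}s")
```

With a 3-second interval the backup waits a little under 10 seconds before declaring the master down, so each transition to master would mean advertisements were lost for that long, not just a single dropped packet.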

Some console outputs that you may want to see are below:

=========================================================================
*******8212-AO1 outputs at 16:09 on 04.Jan.2011*************

8212-AO1(vlan-1-vrid-1)# sh vrrp vlan 1

VRRP Virtual Router Statistics Information

Vlan ID : 1
Virtual Router ID : 1
State : Master
Up Time : 108 mins
Virtual MAC Address : 00005e-000101
Master's IP Address : 10.0.0.254
Associated IP Addr Count : 1 Near Failovers : 0
Advertise Pkts Rx : 21572 Become Master : 2
Zero Priority Rx : 0 Zero Priority Tx : 1
Bad Length Pkts : 0 Bad Type Pkts : 0
Mismatched Interval Pkts : 0 Mismatched Addr List Pkts : 0
Mismatched IP TTL Pkts : 0 Mismatched Auth Type Pkts : 0

8212-AO1(vlan-1-vrid-1)# sh vrrp vlan 30

VRRP Virtual Router Statistics Information

Vlan ID : 30
Virtual Router ID : 30
State : Master
Up Time : 106 mins
Virtual MAC Address : 00005e-00011e
Master's IP Address : 10.0.30.1
Associated IP Addr Count : 1 Near Failovers : 0
Advertise Pkts Rx : 21493 Become Master : 2
Zero Priority Rx : 0 Zero Priority Tx : 1
Bad Length Pkts : 0 Bad Type Pkts : 0
Mismatched Interval Pkts : 0 Mismatched Addr List Pkts : 0
Mismatched IP TTL Pkts : 0 Mismatched Auth Type Pkts : 0

8212-AO1(vlan-1-vrid-1)#



**********The followings are from 8212-AO2 at 16:06 on 04.Jan.2011********

8212-AO2(vlan-1-vrid-1)# sh vrrp vlan 1

VRRP Virtual Router Statistics Information

Vlan ID : 1
Virtual Router ID : 1
State : Backup
Up Time : 99 mins
Virtual MAC Address : 00005e-000101
Master's IP Address : 10.0.0.254
Associated IP Addr Count : 1 Near Failovers : 594
Advertise Pkts Rx : 8874 Become Master : 50
Zero Priority Rx : 0 Zero Priority Tx : 0
Bad Length Pkts : 0 Bad Type Pkts : 0
Mismatched Interval Pkts : 6 Mismatched Addr List Pkts : 0
Mismatched IP TTL Pkts : 0 Mismatched Auth Type Pkts : 0

8212-AO2(vlan-1-vrid-1)# sh vrrp vlan 30

VRRP Virtual Router Statistics Information

Vlan ID : 30
Virtual Router ID : 30
State : Backup
Up Time : 101 mins
Virtual MAC Address : 00005e-00011e
Master's IP Address : 10.0.30.1
Associated IP Addr Count : 1 Near Failovers : 600
Advertise Pkts Rx : 8998 Become Master : 54
Zero Priority Rx : 0 Zero Priority Tx : 0
Bad Length Pkts : 0 Bad Type Pkts : 0
Mismatched Interval Pkts : 42 Mismatched Addr List Pkts : 0
Mismatched IP TTL Pkts : 0 Mismatched Auth Type Pkts : 0
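
As a sanity check on the outputs above, the virtual MAC addresses can be derived from the VRIDs. This is a minimal sketch assuming the standard IPv4 VRRP virtual MAC format 00-00-5E-00-01-{VRID}; the function name is made up for illustration, and the formatting simply mimics the ProCurve display.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Return the IPv4 VRRP virtual MAC for a VRID, in ProCurve-style notation."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    # Fixed prefix 00-00-5E-00-01 followed by the VRID as the last octet.
    hex_digits = bytes([0x00, 0x00, 0x5E, 0x00, 0x01, vrid]).hex()
    return f"{hex_digits[:6]}-{hex_digits[6:]}"


if __name__ == "__main__":
    for vrid in (1, 30):
        print(vrid, vrrp_virtual_mac(vrid))
```

Both switches report the same virtual MAC for each VRID, and the values match the expected format, so the VRID assignment itself looks consistent on the two nodes.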


8 REPLIES
Antonio Milanese
Trusted Contributor

Re: VRRP Problem on clustered backbone switches

Hi Candan,

Since there are a lot of near failovers on the backup router, there is definitely a communication problem between the two switches.

I'm inclined to think of:

a) an interface issue: post a "show int"
b) an STP issue: post a "show span"
c) a VLAN IP config issue: is the VRRP VIP "bound" to the correct real VLAN interface IP?

Could you post the pertinent config lines (STP, VLAN, LACP, etc.) and a "show logging"?


Regards,

Antonio

Re: VRRP Problem on clustered backbone switches

Hi Antonio, thanks for your interest in our problem. The corresponding outputs are attached.

A few things to note:
1- The two switches are connected over port G1 (10Gb copper port).
2- There is no LACP trunking between the two switches.

thanks in advance.
PCurver
Advisor

Re: VRRP Problem on clustered backbone switches

Make sure the Spanning Tree Root is AO1 (since AO1 is the owner of the VRIDs).

Re: VRRP Problem on clustered backbone switches

Thanks a lot for your reply, but we have already configured the owner as the spanning tree root, and the root port of the backup switch is G1, which is the port connecting the two switches.

8212-AO1(config)# SH SPanning-tree

Multiple Spanning Tree (MST) Information

STP Enabled : Yes
Force Version : MSTP-operation
IST Mapped VLANs : 1-4094
Switch MAC Address : 0021f7-03ce00
Switch Priority : 0
Max Age : 20
Max Hops : 20
Forward Delay : 15

Topology Change Count : 185,444
Time Since Last Change : 49 mins

CST Root MAC Address : 0021f7-03ce00
CST Root Priority : 0
CST Root Path Cost : 0
CST Root Port : This switch is root

----------------------------------------------------

8212-AO2(vlan-103-vrid-103)# sh spanning-tree

Multiple Spanning Tree (MST) Information

STP Enabled : Yes
Force Version : MSTP-operation
IST Mapped VLANs : 1-4094
Switch MAC Address : 0021f7-bce200
Switch Priority : 4096
Max Age : 20
Max Hops : 20
Forward Delay : 15

Topology Change Count : 3459
Time Since Last Change : 49 mins

CST Root MAC Address : 0021f7-03ce00
CST Root Priority : 0
CST Root Path Cost : 2000
CST Root Port : G1
Antonio Milanese
Trusted Contributor
Solution

Re: VRRP Problem on clustered backbone switches

Hi Candan,

There must indeed be a problem with STP operation, since you have a lot of topology changes:

Topology Change Count : 185,444
Time Since Last Change : 49 mins

Could you provide a little diagram of your topology?
Do you have Cisco gear with PVST active?
PVST will interfere with MSTP operation, and you should filter those BPDUs.
And since you use the default IST instance, your topology is more susceptible to interactions with other switches via RSTP or STP BPDUs.
I suggest defining edge ports explicitly too.
Is some dynamic routing protocol (RIP/OSPF) active on the switches?

And last but not least =) could you add some more details, namely:

"show span all detail" "show span root-history ist" "show logging -r stp"

Regards,

Antonio

Re: VRRP Problem on clustered backbone switches

Hi Antonio, thanks for your reply. You are right, it seems that we have an STP problem.

Our topology consists purely of HP products.
1- The root switches are the 8212zl pair, clustered active/passive with VRRP and connected over port G1 (10Gb copper).
2- There are 56xx, 25xx, 26xx and 29xx series switches. The edge switches are heterogeneous: some are 25xx, some are 26xx, and some servers are connected directly to the backbone switches.
3- There are no Cisco devices in the network.
4- No dynamic routing, neither RIP nor OSPF.

You suggested defining edge ports explicitly, but should we define them on the backbone switches only or on all switches? Defining edge ports on all switches would be very difficult, since we have nearly 200 switches in total.

Another question:
The Topology Change Count is 186,518 in 45 days. Is this count unusually high for 45 days?

Some more detailed outputs are attached.

Regards,

Re: VRRP Problem on clustered backbone switches

attachments :)
Antonio Milanese
Trusted Contributor

Re: VRRP Problem on clustered backbone switches

Hi Candan,

>You have suggested to define edge ports explicitly but should we define in backbone switches or in all switches?

Strictly speaking, at least on the edge switches. But if you have a "stable and controlled environment" where no one plugs in a cable without calling you =) you could use other mechanisms to control TC propagation/flooding and speed up convergence: BPDU filtering, loop protection, BPDU protection, BPDU guards, etc. Read the ProCurve Advanced Traffic Management Guide for the gory details =)

>If you suggest to define edge port in all switches it is too difficult since we have nearly 200 switches at all.

well if I were you I would sleep better with an accurate diagram of my physical uplink / spanning tree topology under the pillow =)

>Topology Change Count is 186,518 in 45 days. Is this topology change count is extraordinary in 45 days?
Well yes, that is a lot of TCs.
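
To put that number in perspective, here is a quick back-of-the-envelope sketch. It uses only the two figures quoted above (186,518 topology changes in 45 days) and assumes the changes were spread evenly over the period, which is almost certainly a simplification.

```python
SECONDS_PER_DAY = 86_400


def changes_per_hour(changes: int, days: float) -> float:
    """Average number of topology changes per hour over the period."""
    return changes / (days * 24)


def seconds_per_change(changes: int, days: float) -> float:
    """Average number of seconds between two topology changes."""
    return days * SECONDS_PER_DAY / changes


if __name__ == "__main__":
    changes, days = 186_518, 45
    print(f"{changes_per_hour(changes, days):.1f} topology changes per hour")
    print(f"one topology change every {seconds_per_change(changes, days):.1f} s")
```

On average that is one topology change roughly every 21 seconds, which is far from what a stable spanning tree should produce.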

From the attached outputs I think you have a conflict between the CST and multiple regions/ISTs.
I suspect that practically all of your switches are configured:
a) in MSTP-operation compatibility mode (the default on the latest firmware)
b) with the default per-switch MSTP config / region
c) with only instance 0 (the IST) in the region
d) with different priorities on the CST

I don't see a lot of TCNs in the CST span statistics for legacy STP BPDUs floating around (you have MSTP running, don't you =) but the IST root-history simply points to different regional roots, and I suspect those elections are flapping with CST perturbations of the force =)
Anyway, I bet that the following commands will show more interesting facts:

"sh span instance ist" + "sh span debug-counters" vs "sh span debug-counters ports all instance 0"

because we should find most of the TCNs coming from downlink ports (i.e. boundary ports to other regions).

I suggest you define some common regions based on your physical or, even better, logical topology (e.g. 2-layer or 3-layer, but at least one common region for AO1/AO2) to control STP calculation/convergence and the CST-to-IST topology interactions.
Another thing to pay attention to is VLAN membership on uplink/LACP ports, since MSTP takes all VLANs in the region into account when it builds each instance's topology.
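
To illustrate the region point: two MSTP switches are in the same region only if their configuration name, revision level, and VLAN-to-instance mapping all match. The sketch below models that rule in a simplified way (the real protocol compares a digest of the mapping rather than the mapping itself). The class and helper names are made up for illustration, the region name "BACKBONE" is hypothetical, and the idea that each switch's default configuration name is its own MAC address is an assumption consistent with point b) above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MstConfig:
    """Simplified MSTP region identifier (illustrative model only)."""
    name: str
    revision: int
    vlan_to_instance: frozenset  # (vlan, instance) pairs; unmapped VLANs stay in the IST


def make_config(name: str, revision: int, mapping: dict) -> MstConfig:
    """Build a config from a plain {vlan: instance} dictionary."""
    return MstConfig(name, revision, frozenset(mapping.items()))


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """Two switches share a region only if all three attributes match."""
    return a == b


if __name__ == "__main__":
    # Assumed defaults: config name equals the switch MAC, revision 0, all VLANs in the IST.
    ao1 = make_config("0021f703ce00", 0, {})
    ao2 = make_config("0021f7bce200", 0, {})
    print("defaults share a region:", same_region(ao1, ao2))

    # Hypothetical common region configured on both backbone switches.
    ao1 = make_config("BACKBONE", 1, {})
    ao2 = make_config("BACKBONE", 1, {})
    print("common config shares a region:", same_region(ao1, ao2))
```

With per-switch defaults every switch ends up as its own single-switch region, so every inter-switch link is a region boundary; giving AO1 and AO2 an identical name, revision, and mapping is what puts them in one region.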

Ah, and here is one of the best concise articles for taming MSTP's inner logic, from a REAL guru:

http://blog.internetworkexpert.com/2010/02/22/understanding-mstp/

Regards,
Antonio