Switches, Hubs, and Modems

Better load balancing on trunk lines

SOLVED
Willman Lee
Occasional Advisor

Better load balancing on trunk lines

I have two identical 5300xl switches connected to each other by three gigabit lines in one trunk (trk1) on ports A1, B1, C1. We are running into an issue where one of the trunk lines (A1) carries 50-60% of the traffic while the second (B1) carries about 20% and the third (C1) about 15%. Due to the high amount of traffic between the switches, we are getting a lot of dropped packets on A1. The trunk is configured as a static trunk group, and I have tried both the LACP and Trunk protocols with no difference in the dropped packets.
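For reference, a minimal sketch of the two trunk configurations tried (standard 5300xl CLI; port names as above):

   trunk a1,b1,c1 trk1 trunk

and, after removing the static trunk, the LACP variant:

   no trunk a1,b1,c1
   trunk a1,b1,c1 trk1 lacp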

We have already tested the cables, modules and switches for faults and everything is clean. We have about 20 VLANs currently configured, and the two switches are running IP routing. Each VLAN uses approximately the same amount of bandwidth; no single VLAN uses excessively more than another.

I need to figure out a way to properly balance out the traffic to eliminate the errors.

Would OSPF help balance the traffic?
Would adding a 4th trunk line help?
Would going to a MSTP design with three trunks (with 2 gigabit lines per trunk) and assigning 1/3 of the VLANs to each MSTP instance/trunk be possible?
Is there any difference between removing the static trunk configuration and using dynamic LACP to configure the trunks instead?

Any other suggestions would be greatly appreciated, as we are really struggling to eliminate the dropped packets caused by the high utilization. This is a video network, not a data network, which is why eliminating dropped packets is critical.
17 REPLIES
Matt Hobbs
Honored Contributor
Solution

Re: Better load balancing on trunk lines

The trunking uses a basic algorithm which hashes the SA/DA mac-addresses to randomly assign conversations to a given link. Usually this results in good distribution when many SA/DA pairs are involved.

With the 5300 series you can add up to 8 ports in a trunk group, so I would see if adding a 4th port gave you better distribution.

Don't bother with Dynamic LACP.

MSTP could work, but you'd need to spend a fair bit of time analysing your traffic patterns first to see whether it would help. It's also a large change without a guarantee of success.

OSPF, once again, is unlikely to solve your problem, as it load-balances per destination network only (not per host).

One thing I would definitely try to prevent some of these dropped packets is simply enabling 'flow-control'. I spent a lot of time recently with another customer that was seeing random drops and this miraculously fixed their problem. I didn't even need to enable flow-control per port; I just enabled it globally and, bingo, problem solved.
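As a minimal sketch (the global command is all I needed, and it's also a prerequisite for any per-port setting):

   ProCurve(config)# flow-control

Only if you later want it on specific ports as well (port name here purely as an example):

   ProCurve(config)# int a1 flow-control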

So I would try enabling flow-control first, followed by adding more links to the trunk (up to 8). If the majority of traffic is being generated by only a few hosts though, the load balancing isn't going to be as effective and you may need to look into 10GbE.
OLARU Dan
Trusted Contributor

Re: Better load balancing on trunk lines

Matt needs 10 points.

What I've done for a 4-gig uplink (4108GL == Cisco 3750) is: one heavy VLAN on one optical circuit (not trunked, Cisco-wise), another heavy VLAN on a second optical circuit (not trunked, Cisco-wise), two lightly used VLANs trunked (Cisco-wise) on a third circuit, and 10 non-critical VLANs trunked (Cisco-wise) on the fourth circuit. Would this be OK for you? (I had NO packet drops on any of the four optical circuits.)
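On the ProCurve side that layout is just VLAN membership per uplink port, something like this (hypothetical VLAN IDs and ports; each circuit is an individual link, not a trunk group, so keep spanning tree in mind):

   vlan 10 untagged a1
   vlan 20 untagged a2
   vlan 30 tagged a3
   vlan 31 tagged a3
   vlan 40 tagged a4
   vlan 41 tagged a4

with the remaining low-priority VLANs also tagged on a4.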

Cheers,
Dan
Willman Lee
Occasional Advisor

Re: Better load balancing on trunk lines

Thanks for the good information, Matt. I will definitely try enabling flow control to see the effect. The problem I can see is that flow control essentially reacts to an overloaded link by sending pause frames back to the senders. In our network 99% of the data is live video streams, which cannot be retransmitted like regular data packets, so pause frames would likely cause dropped video frames.

I'm hoping that just adding one or two more ports to the trunk will be enough to balance the load. We seem to be unlucky with the SA/DA hash right now. In our network we have 700 Mbit of guaranteed, sustained unicast video streams spread across 17 VLANs, a dynamic multicast video stream on another VLAN (potential maximum of 430 Mbit), and three more VLANs with minimal regular data traffic (probably 100 Mbit). There are about 900 hosts generating the majority of the traffic, going to about 30 different destinations.


Could you explain how you have configured the 4-gig uplinks, Dan? We may have to spread out the VLANs manually if nothing else works. I once tried configuring two different trunks between the switches and tagging different VLANs onto each, but STP ended up blocking one of the trunks.

Thanks,
Will
Matt Hobbs
Honored Contributor

Re: Better load balancing on trunk lines

In regards to enabling 'flow-control': I only needed to enable it globally, which I suspect enables it internally between the modules. Since there was no need to enable it on ports, there was no risk of pause frames being sent outside of the switch.

Syntax: [ no ] flow-control
Enables or disables flow-control globally on the switch, and is required before you can enable flow control on specific ports.
To use the no form of the command to disable global flow-control, you must first disable flow-control on all ports on the switch. (Default: Disabled)

Mainly I think it just helped improve buffering.
Willman Lee
Occasional Advisor

Re: Better load balancing on trunk lines

Ok we tried this last night:

Enabled flow control globally = no effect
Enabled flow control on trunked ports = no effect
Disabled LACP on all ports = no effect
Added two more ports to the trunk (for a total of five) = little to no effect
Hard coded port speeds on the trunk = no effect

This is getting really frustrating, as we can't figure out why the SA/DA hash is so bad at distributing conversations across all the trunk ports. The other strange issue is that all the dropped packets are occurring on one switch, whereas the other switch is almost at zero.

I've attached a snapshot of the port utilization on the two switches. The top switch is the one having all the dropped packets. Currently I have ports B15, B16, C15, C16 and D15 assigned to the trunk. As you can see, port B15 is way above the other trunk ports. If I physically remove the patch from B15, the +40% traffic moves to B16. If I remove B16, the traffic moves to C15. It seems to always put the majority of the traffic on the first trunk port, which should rule out hardware problems with the actual module. We have even tried using fibre between the ports and there is no difference. All the cables are tested Cat6, and we have even replaced the switch chassis with a new spare with no change.

To give a little more background on the network layout: there are about 800 cameras, each with a different MAC address, streaming to 17 network video recorders (unicast) and 12 viewing stations. Each recorder is in its own VLAN with a unique MAC address, and there are about 48 cameras in each VLAN streaming to the recorder in that VLAN. Each viewing station can randomly pull up to 34 multicast streams from the cameras (it all depends on what the operator wants to see). We are running IGMP, RIP and PIM on the switches, and there are no errors in the logs other than the dropped packets. We have done the bandwidth calculations, and the theoretical maximum bandwidth that would need to go over a trunk line is about 1 Gbit, of which 700 Mbit is sustained unicast streaming from the cameras to the recorders.
Matt Hobbs
Honored Contributor

Re: Better load balancing on trunk lines

Hi Willman,

A couple of other ideas. First of all, it looks as though you're using the J4907A 16-port gig module which is over-subscribed by design. I would avoid using this module for switch-to-switch links if at all possible. You're much better off using the 4-port Gigabit module for this purpose.

Further to this, I'm assuming that it's Drops TX that are incrementing. Have a look at the reviewers guide and check out the section on Packet Buffer Memory Management - http://www.hp.com/rnd/pdfs/5300_Reviewers_guide_May06.pdf

The 'walkmib ifoutdiscards' and 'walkmib ifindiscards' commands will give you some idea of whether the packets are being dropped inbound or outbound.
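For example, run both on each switch and compare the counters over a fixed interval (the index after the counter name corresponds to the interface):

   ProCurve# walkmib ifoutdiscards
   ProCurve# walkmib ifindiscards

If ifOutDiscards keeps climbing on the trunk ports while ifInDiscards stays flat, the drops are outbound buffer drops.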

What you may be able to do is adjust the size of the outgoing buffers by changing the queue depth with the Guaranteed Minimum Bandwidth commands.

Syntax: [ no ] int < port-list > bandwidth-min output [ < queue1% > < queue2% > < queue3% > < queue4% >]

Alternatively, you could try strict queuing by going into the port that is dropping packets and entering 'no bandwidth-min output'.
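For example (hypothetical percentages that give most of the guaranteed bandwidth to the highest queue):

   ProCurve(config)# int b15 bandwidth-min output 5 10 15 70

or, for strict queuing on the dropping port:

   ProCurve(config)# no int b15 bandwidth-min output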

None of this, of course, will help you get better load balancing between the members of the trunk group. You seem to have enough devices that I'd expect better distribution than you're seeing, but those conversations definitely seem to be hashing onto that first link.

Right now, my strongest recommendation would be to use the 4-port gigabit module for your trunk.

Matt
OLARU Dan
Trusted Contributor

Re: Better load balancing on trunk lines

Willman,
this time I need 10 points ;-)

Please find attached two configs, one for the Cisco and one for the HP ProCurve.

It works brilliantly for me.

Cheers,
Dan
Willman Lee
Occasional Advisor

Re: Better load balancing on trunk lines

Thanks for the sample configs, Dan. I'll look at them and see if I can adapt them for our two core HP switches.
Willman Lee
Occasional Advisor

Re: Better load balancing on trunk lines

Thanks for that document link. I read it and noticed something in it:

================
Trunking in a Layer 3 Environment
Traditional trunking uses MAC (Layer 2) addresses to determine which link in the trunk a particular traffic flow travels over, to avoid the problem of out-of-sequence packets. In a Layer 3 environment between two routing switches this would cause all packets to flow over only one link, because the source and destination MAC addresses for all packets would be the same: the MAC address of the two connected routing switches.

To avoid this situation the ProCurve Switch 5300 Series uses the source and destination IP addresses to determine which link a particular packet flow uses. This will provide a good overall distribution of traffic across the different links in the trunk.
================

That seems to be exactly what's happening on our two core switches (the switches themselves are the SA/DA). Is there a configuration setting I'm not seeing that is causing the switches to use MAC addresses instead of IP addresses for distribution? The two switches are layer 3 and acting as routers.
Matt Hobbs
Honored Contributor

Re: Better load balancing on trunk lines

Nope, you're not missing anything; that happens automatically when you have a point-to-point L3 routed link between two switches. It obviously needs to do this, since only two MAC addresses are involved between the two routers.
Teknisk Drift_1
Occasional Advisor

Re: Better load balancing on trunk lines

hmm.. tricky one, this..

It was said here that the distribution of traffic is random. That isn't quite right, if you read the manual (you can also see this in the manual quote you included in your last post). Rather, the switch calculates, from the SA/DA pair, which port to send traffic through. On other equipment HP uses an XOR of the last three bits of each address; I would guess it's something along those lines here too.

This means that the imbalance in traffic /could/ be down to the distribution of the IP addresses that are communicating. In that case, your only options are to add more links to the trunk or to change which IPs are in use.
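To illustrate with made-up numbers (and assuming the XOR scheme really applies here; that's my guess, not documented for the 5300):

\[ \text{link} = \left( \mathrm{SA}_{[2:0]} \oplus \mathrm{DA}_{[2:0]} \right) \bmod N \]

A source ending in .54 (last three bits 110) talking to a destination ending in .20 (last three bits 100) gives 110 XOR 100 = 010 = 2, so with N = 5 links that conversation always rides the third port. If many of your SA/DA pairs happen to XOR to the same value, they all pile onto the same link no matter how many ports you add.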

You have 800 sources and 30 destinations, but that doesn't mean you have 800x30 SA/DA pairs, because traffic isn't between random hosts. This can result in an uneven load distribution. (Use PCM or another tool to see who is responsible for the majority of the load; there could be clues to better balancing of traffic/addresses there.)


Secondly: what is "a lot" of errors? What percentage of packets fail?
(That is: are you sure you have a problem?)

But, something else strikes me here:

1. Certain HP equipment doesn't distribute all traffic to all ports
2. When you added/removed a link, distribution didn't change, you just moved a whole bunch of traffic from one port to the next.

Could this be because multicast is always forwarded on the same port?

The ProCurve 6400/5300 manual says that non-unicast traffic is spread evenly. But for our GbE2s the manual says: "Multicast, broadcast, and unknown unicast will be forwarded on the lowest port number in the trunk by default".

Could it be that the ProCurve manual is wrong? I'd run this by ProCurve Support.
Matt Hobbs
Honored Contributor

Re: Better load balancing on trunk lines

That's a great point. Multicast/broadcast traffic will only utilise one link. Going by the traffic screenshots attached earlier, this looks to be what's happening.

The only way to load balance that better would be via the MSTP method, with certain VLANs utilising certain links.
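A rough sketch of that approach (hypothetical instance numbers, VLAN ranges and path costs; I'd verify the exact per-instance path-cost syntax for your software release in the Advanced Traffic Management guide):

   spanning-tree config-name "video-core"
   spanning-tree config-revision 1
   spanning-tree instance 1 vlan 101-108
   spanning-tree instance 2 vlan 109-117
   spanning-tree instance 1 trk1 path-cost 10000
   spanning-tree instance 1 trk2 path-cost 50000
   spanning-tree instance 2 trk1 path-cost 50000
   spanning-tree instance 2 trk2 path-cost 10000

Both switches need an identical config-name, revision and instance-to-VLAN mapping for the MST region to form, and you'd build trk1 and trk2 as separate two-port trunks.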

Willman Lee
Occasional Advisor

Re: Better load balancing on trunk lines

Great replies. I finally have an answer for why we have so much traffic on the one port. We do have a ton of multicast traffic on the network, and that would explain the problems we're having.

In regards to the first two paragraphs of Teknisk's reply: all the multicast traffic originates from IP addresses 10.10.101-117.x (VLANs 101-117) and is directed to the 12 video display units at 10.10.20.x (VLAN 200). We have used PCM to look at the traffic and it's fairly even across the VLANs (there are 4 or 5 that carry a bit more than the others, but not excessively). Could the traffic imbalance be due to the fact that all destination addresses for multicast traffic are 10.10.20.x devices? I have already tried adding more links without success, and changing IPs is not an option, as the video display units have to be in the same VLAN for the video system to work properly.

"A lot" of errors is about 1000 every 5-10 minutes on our secondary core switch. I know that HP's acceptable level of errors is zero, but in real-life situations about 5-10 over the course of a week would be ideal. We definitely know we have a problem, as we can see the dropped packets on the video streams. The way our video system works (according to the manufacturers) is that when a multicast stream is called up from a video display, the first packet acts as an initialization packet. If that packet gets dropped, the video display units are not smart enough to know it was dropped, and all that follows is black video. Partly this is a design fault that they are addressing, but at the same time their engineers say that most good networks should have near-zero dropped packets, so black video is normally very rare.

I have Dan's sample configs for steering VLAN traffic onto specific ports between switches, but I'm finding them hard to follow as they're between an HP and a Cisco switch. I have no experience with Cisco equipment, so if anyone has a sample config for directing VLAN traffic to different ports between ProCurve switches, it would be most appreciated.

Thanks.
Matt Hobbs
Honored Contributor

Re: Better load balancing on trunk lines

For those drops, I still highly recommend using the 4-port gigabit module for your switch-to-switch links. The 16-port module is oversubscribed and has to share more buffer space with other ports.

Now that it appears the broadcast/multicast traffic lands on the first port of the trunk: if you could move that module to slot A and use at least one of its ports as part of the trunk, you should be able to get around those drops.
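Something along these lines, assuming the 4-port module ends up in slot A so one of its ports becomes the lowest-numbered trunk member (and therefore carries the multicast/broadcast):

   no trunk b15,b16,c15,c16,d15
   trunk a1,b15,b16,c15,c16,d15 trk1 trunk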
Teknisk Drift_1
Occasional Advisor

Re: Better load balancing on trunk lines

>>would finally explain why we are having our problems.

Excellent. But since the manual says it should distribute multicast evenly, I'd check it out with HP. Maybe there's a fix...

>>Could the traffic imbalance be due to the fact that all destination addresses for multicast traffic is on the 10.10.20.x devices?

Well, theoretically, I guess. But that would require most of your DAs to have the same last three bits in the address, and even then you'd have to be unlucky in the distribution of the SAs.
I'm still more inclined toward the multicast-on-one-port explanation.


>if anyone has a sample config on how to
>configure VLAN traffic directed to
>different ports on Procurve switches

Uh... no expert on this, but I think multiple trunks plus MSTP are your option there. How Dan's config works I have no idea; where is STP in there?

Anders :)