Aruba & ProVision-based

chalowther
Occasional Advisor

5308xl packet loss on routed traffic

I've been trying to track down a packet loss problem with our 5308xls and I am hoping somebody has some insight. We experience packet loss on traffic through our 5308xl when the traffic has to be routed. It shows up as anywhere from 5% to 80% packet loss (or out-of-order packets) in bursts lasting less than a second.

The 5308xls are running the latest firmware, E.11.43.

The tests use iperf to send UDP traffic streams. I have a UDP stream that needs to be routed and a UDP stream that does not need to be routed. The routed traffic is the only stream to see the packet loss.
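For anyone who wants to separate true loss from reordering in a similar test, here is a minimal sketch in Python. It assumes each test datagram carries a 0-based sequence number and that the receiver logs those numbers in arrival order. The function name and the input format are hypothetical and not something iperf exposes directly.

```python
def classify_sequence(received, sent_count):
    """Summarize loss and reordering from received sequence numbers.

    received:   sequence numbers in arrival order (hypothetical log format)
    sent_count: number of datagrams the sender transmitted
    """
    seen = set()
    highest = -1
    out_of_order = 0
    duplicates = 0

    for seq in received:
        if seq in seen:
            # Same datagram delivered twice; do not count it again.
            duplicates += 1
            continue
        seen.add(seq)
        if seq < highest:
            # Arrived after a datagram with a higher sequence number.
            out_of_order += 1
        else:
            highest = seq

    lost = sent_count - len(seen)
    loss_pct = 100.0 * lost / sent_count if sent_count else 0.0
    return {
        "lost": lost,
        "out_of_order": out_of_order,
        "duplicates": duplicates,
        "loss_pct": loss_pct,
    }


if __name__ == "__main__":
    # Datagram 4 never arrives; datagram 2 arrives late, after 3.
    print(classify_sequence([0, 1, 3, 2, 5], sent_count=6))
```

Running the same summary for the routed and the switched stream side by side makes it easier to see whether the routed path is dropping datagrams or only delivering them late.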

This is my attempt to visualize the issue. The cores are both 5308xl layer-3 switches doing switching and routing. When traffic has to go through routing, we see problems.

[Image: visualize.png]

The UDP streams travel across the same physical links for the entire path, including the same link in the LACP trunk, so I do not believe this is an issue with physical cabling.

The packet loss only happens during the day, so it does appear to be load based.

Counters for transmit drops and receive errors do not increase while the packet loss is observed.

The system-information output always indicates low CPU utilization and plenty of free memory and packet buffers.

We have tried moving the LACP trunk to a dedicated module, adding capacity so it went from a 2x to a 4x 1Gb LACP trunk, and updating the firmware to the latest version.

Vince-Whirlwind
Honored Contributor

Re: 5308xl packet loss on routed traffic

In troubleshooting, you should always do the "most likely cause" or "easiest thing to test" first.

Step 1: read the configs to see if anything looks funny.

Step 2: personally, I would disable all LACP members except one, or (less convenient) create a simple non-LACP link before doing any more testing.

Step 3: I would test different protocols, e.g. FTP or file transfers, or something else that gives you a good amount of traffic.

Step 4: I'd eliminate that VLAN from the "A" switch and move that interface address to the "B" switch (and sort out some routing so it works, presumably a new point-to-point VLAN between them) to see if the behaviour is the same, or if it differs depending on which VLAN you route it to.

It would be good to read your configs.

chalowther
Occasional Advisor

Re: 5308xl packet loss on routed traffic

Thanks for the reply, I appreciate the tips.

I've been continuing tests since upgrading the firmware, and the packet loss has decreased. However, I still see packet loss on routed traffic and no packet loss on the switched traffic.

I have been going through the configs closely and have cleared out a lot of cruft.  No config changes made any noticeable difference.

I would like to bring the LACP membership down to one link, but I am worried about affecting production traffic.  Since the issue is only seen during business hours, I haven't felt comfortable enough to do that.

I've tried a few other tests, but I've settled on the UDP streams as the most reliable way to see the issue. For example, when I run a TCP test using iperf, it almost immediately reaches a throughput close to 1 Gbps. Since the packet loss is periodic, I don't want to push that much bandwidth continuously until it happens. Packet loss is also part of TCP's congestion control design, so I would expect to see packet loss frequently anyway as the congestion window grows.
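Because the loss comes in short bursts, it helps to look at per-interval UDP reports (the `-i` option) rather than the end-of-run summary. Below is a rough Python sketch that picks out the lossy intervals from saved iperf server output. The line layout in the regular expression is an assumption based on typical iperf 2 UDP server reports, and the function name and threshold are hypothetical, so adjust it to whatever your version prints.

```python
import re

# Assumed layout of an iperf 2 UDP server interval line, for example:
# [  3]  1.0- 2.0 sec   100 KBytes  0.82 Mbits/sec   0.430 ms   20/   89 (22%)
INTERVAL_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s+sec"  # interval start and end
    r".*?(\d+)\s*/\s*(\d+)\s+\("                    # lost / total datagrams
)


def lossy_intervals(lines, threshold_pct=5.0):
    """Return (start, end, lost, total, loss_pct) for intervals at or above the threshold."""
    hits = []
    for line in lines:
        match = INTERVAL_RE.search(line)
        if not match:
            continue  # header, summary, or unrelated line
        start, end = float(match.group(1)), float(match.group(2))
        lost, total = int(match.group(3)), int(match.group(4))
        if total == 0:
            continue
        loss_pct = 100.0 * lost / total
        if loss_pct >= threshold_pct:
            hits.append((start, end, lost, total, loss_pct))
    return hits


if __name__ == "__main__":
    sample = [
        "[  3]  0.0- 1.0 sec   128 KBytes  1.05 Mbits/sec   0.012 ms    0/   89 (0%)",
        "[  3]  1.0- 2.0 sec   100 KBytes  0.82 Mbits/sec   0.430 ms   20/   89 (22%)",
    ]
    for hit in lossy_intervals(sample):
        print(hit)
```

Comparing the flagged intervals from the routed and switched streams against the time of day should show whether the bursts line up with load.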

I simplified my picture, but I do also have a test running where the packet gets routed in 'B' and then sent through 'A' without routing. This test shows almost 0% packet loss. The cores in 'A' and 'B' both do L3 routing. Most VLANs only exist in one building, and the core in that building has an SVI for the gateway. A few VLANs stretch between buildings for various reasons.

I do not have a dedicated point-to-point VLAN for routing between the cores.  This is one config change I considered, but it would cause more packets to follow the problem path and I was worried about making the issue more noticeable to users.

A sanitized config for the core switch in 'A' is available at https://pastebin.com/1FSaUhbd

The 'B' config is very similar. Traffic arrives in 'A' over Trk11 from 'B'. Most of the routing table is built from RIP advertisements from the cores and a couple of edge routers. VLAN 2 is used for the routing and RIP traffic.