LAN Routing

Random static routing issue on HP 3500/6600

 
stlm
Occasional Advisor

Hello all,

I'm having a very strange and random (static) routing issue from one internal network (LAN A) to two other (local) networks.

"LAN A" has a 6600 HP switch as the router/GW (static routing) for this LAN (VLAN 100). All hosts have a default GW to this router (so they connect to the other LANs via this router).

"LAN B" (a 3500 HP router) has multiple VLANs and connects to "LAN A" via VLAN 1000 (for the link between the 2 routers). All hosts have a default GW pointing to their routers in their VLANs (which are configured in the 3500 HP router).

"LAN C" is in VLAN 200 and connects to "LAN A" through an HP 2810 switch; it uses the same 6600 as the router for reaching VLAN 100. All hosts in "LAN C" have a static route to reach "LAN A" that points to the 6600 router.

While monitoring (pinging) some hosts from LAN A to the other LANs, I occasionally see pings that fail (either "unreachable" or with very high round-trip times), but it is so random that it can take days for the errors to show. And I can't replicate it at will... it just happens. (By the way, network load is usually the same.)

The pings that fail (3 packets each) are not the first ones: usually another host on LAN A has already successfully pinged some host in the other network, and after that one (random) host fails its ping. So I'm ruling out any "cache" or MAC issue.

Also each host in LAN A pings multiple hosts, but only one of them might fail. So that also kind of rules out issues with the host itself. 

The configuration of this router is almost the default one, except for the VLANs and the static routing.

And, again, it might take days until I see this behavior (I run pings every 3 hours).
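For anyone wanting to log the same kind of periodic check, here is a minimal Python sketch of how the ping summary could be parsed and flagged. It assumes the Linux (iputils) `ping` summary format; the names `PingResult`, `parse_ping_summary` and `is_suspect`, and the 100 ms threshold, are hypothetical and not taken from the script described above.

```python
import re
from typing import NamedTuple, Optional


class PingResult(NamedTuple):
    sent: int
    received: int
    loss_pct: float
    rtt_max_ms: Optional[float]  # None when no reply came back


# Assumed format: "3 packets transmitted, 3 received, 0% packet loss, time 2003ms"
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)
# Assumed format: "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms"
_RTT = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)")


def parse_ping_summary(output: str) -> Optional[PingResult]:
    """Extract packet counts, loss and max RTT from ping's summary text."""
    summary = _SUMMARY.search(output)
    if summary is None:
        return None  # output did not look like a ping summary
    rtt = _RTT.search(output)
    return PingResult(
        sent=int(summary.group(1)),
        received=int(summary.group(2)),
        loss_pct=float(summary.group(3)),
        rtt_max_ms=float(rtt.group(3)) if rtt else None,
    )


def is_suspect(result: PingResult, rtt_limit_ms: float = 100.0) -> bool:
    """Flag a run with any loss or a max RTT above the (assumed) limit."""
    if result.loss_pct > 0:
        return True
    return result.rtt_max_ms is not None and result.rtt_max_ms > rtt_limit_ms
```

Logging every suspect result with a timestamp makes it easier to line the failures up with captures or switch counters later.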

I reinstalled the 6600 router from scratch, upgraded the firmware to the latest one from HP and from Aruba... I even tried a different HP router... no luck so far.

Does anyone have a suggestion of what I could try? I'm running out of ideas... 

 

 

-Alex-
HPE Pro

Re: Random static routing issue on HP 3500/6600

Hello  stlm,

I would start with the same test (ping) within a single VLAN, without crossing between VLANs, to check whether the issue also appears when traffic is not routed.

Hope this helps! 

I am an HPE Employee


stlm
Occasional Advisor

Re: Random static routing issue on HP 3500/6600

Hello Alex,

Thanks for the suggestion, but I'm already doing that: the script also pings 2 IPs in the very same VLAN/LAN, with no issues so far.

In fact we identified this issue because we added a script that uses ssh to connect from "LAN A" to the other LANs and from time to time we got some ssh errors (unreachable), while we never saw that in the very same VLAN/LAN.

Regards,

-Alex-
HPE Pro

Re: Random static routing issue on HP 3500/6600

Hello  stlm,

If you do a packet capture on the remote host that is failing, do you see the ICMP requests coming in and the replies going out?

Can you run such a test?

I am an HPE Employee


stlm
Occasional Advisor

Re: Random static routing issue on HP 3500/6600

Hello Alex,

thank you for your reply.

The problem is the randomness of the issue. I did try a sniffer (on some source/destination hosts), but as the problem didn't appear, I had to stop it (too many logs). The same happened when I mirrored the router's interface.

That's why this issue is so annoying, because I can't replicate it at will...  
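One common way around the log volume is a rotating (ring-buffer) capture, so the sniffer can run for days and only the most recent files are kept. The sketch below only builds a `tcpdump` command line using its `-C` (file size, in millions of bytes) and `-W` (file count) options. The helper name `ring_capture_cmd`, the interface name, the path and the sizes are assumptions for illustration, not settings from this setup.

```python
from typing import List


def ring_capture_cmd(
    interface: str,
    out_file: str,
    capture_filter: str,
    file_mb: int = 100,
    file_count: int = 10,
) -> List[str]:
    """Build a tcpdump argv that rotates through a fixed number of files."""
    if file_mb <= 0 or file_count <= 0:
        raise ValueError("file_mb and file_count must be positive")
    return [
        "tcpdump",
        "-n",                    # do not resolve names
        "-i", interface,         # capture interface (assumed name)
        "-w", out_file,          # write raw packets to this file
        "-C", str(file_mb),      # start a new file after this many MB
        "-W", str(file_count),   # keep at most this many files, then overwrite
        capture_filter,          # capture filter expression
    ]
```

The returned list can be handed to `subprocess.Popen`. With the defaults the capture never uses more than roughly 1 GB of disk, and it can be stopped right after the monitoring script reports a failure.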

Regards,

-Alex-
HPE Pro

Re: Random static routing issue on HP 3500/6600

Hello  stlm,

You may try to limit the capture with a filter to ICMP, including only the IP addresses of interest.

This way you will not capture additional traffic.

In Wireshark this is called a capture filter (not a display filter), and it must be set before the capture starts.
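As a rough illustration of such a capture filter, the sketch below assembles a pcap-filter (BPF) expression from a list of hosts. The helper `build_capture_filter` is hypothetical and the addresses in the usage note are documentation placeholders, not addresses from this network.

```python
from ipaddress import ip_address
from typing import Iterable, Sequence


def build_capture_filter(
    hosts: Iterable[str],
    protocols: Sequence[str] = ("icmp",),
) -> str:
    """Return a pcap-filter expression limited to given protocols and hosts."""
    # ip_address() raises ValueError on anything that is not a valid IP.
    addrs = [str(ip_address(h)) for h in hosts]
    if not addrs or not protocols:
        raise ValueError("need at least one host and one protocol")
    proto_expr = " or ".join(protocols)
    host_expr = " or ".join(f"host {a}" for a in addrs)
    return f"({proto_expr}) and ({host_expr})"
```

For example, `build_capture_filter(["192.0.2.10", "192.0.2.20"])` gives `(icmp) and (host 192.0.2.10 or host 192.0.2.20)`. Passing `protocols=("icmp", "arp", "tcp port 22")` would widen the capture to other traffic between the same hosts.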

On the router you may try to use an ACL to count the packets, or mirror the traffic as described here:

HP 5830 Switch Series - Configuring Traffic Mirroring (hpe.com)

You may also check the interface counters on the routers for drops.

Hope this helps!

I am an HPE Employee


stlm
Occasional Advisor

Re: Random static routing issue on HP 3500/6600

Hi Alex,

thank you for the suggestions. 

About the drops: I see 54 "Tx drops". As I'm not sure what those drops imply or when they occurred, I'm now monitoring the statistics for those ports (after clearing the stats, although the clearing is only valid for a single session).
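Since clearing the counters only lasts for one session, an alternative is to leave them alone and compare periodic snapshots. The sketch below assumes the drop counters are already being collected somehow (CLI scrape, SNMP, or similar, which is outside this example) and only shows the delta logic. The function name, port names and timestamps are made up.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[str, Dict[str, int]]  # (timestamp, {port: tx_drops})


def drop_increments(snapshots: List[Snapshot]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, port, increase) for every rise between snapshots.

    Snapshots must be in chronological order. A counter that goes down
    (for example after a reboot or a manual clear) is ignored.
    """
    events = []
    for (_, prev), (ts, cur) in zip(snapshots, snapshots[1:]):
        for port, value in cur.items():
            # A port first seen in this snapshot has no baseline yet.
            delta = value - prev.get(port, value)
            if delta > 0:
                events.append((ts, port, delta))
    return events
```

Comparing the timestamps of the increments with the times of the failed tests would show whether the "Tx drops" coincide with the failures at all.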

Also thank you for the link, I will give that a try, as well as the filtering. I was about to test the filtering, but wasn't sure whether ICMP alone (with the affected IP addresses) would be enough. I wasn't sure if any other protocol is involved, as my first test for each host is to check a TCP port on the remote host, and then a ping; sometimes the ping works after the TCP test fails.

 

cheers,

 

-Alex-
HPE Pro

Re: Random static routing issue on HP 3500/6600

Hello  stlm,

Did you find out whether the ICMP requests are reaching the devices?

A few ICMP drops are not always a problem: many devices handle ping with very low priority, so if the other services are running fine this should not be an issue. As you mentioned, TCP, UDP and the other protocols needed for the services between the hosts are more important. A large percentage of ICMP drops, measured over hundreds of pings, could however also be a sign of a problem. I hope you found out more about the dropped packets. If you have any questions regarding the topic, please let me know.

 

I am an HPE Employee


stlm
Occasional Advisor

Re: Random static routing issue on HP 3500/6600

Hi Alex,

I was about to add some more info and I see you replied to my previous comment, thanks again.

Last week 1 test failed (after 7 days without any issues), and another failed today at 4:00 (it is an internal network, and at that time there is no load at all...).

The test that failed was ssh from one host to another (only the first host, the rest were all ok), and right after that the ping was fine.

I had port mirroring and a sniffer running and I see that host A sent 4 ARP requests (asking for the gateway's MAC)  and after the 4th request I see the pings.

I see a reply between those 4 ARP requests, but that might have been for another host (that also made the same request). So I can't be 100% sure that it was related to host A... (I can't see the destination of the reply)

but, host A sent 4 ARP requests... and that may account for:

req 1 to 3 = tests to ssh port, as there are 3 attempts (seq # 0,1,2) 

req 4 = first ping, which succeeded.

So that would mean that the first 3 requests failed, and then the 4th succeeded (I see an ARP reply a few milliseconds after that last request).

Could that be the problem? That some ARP requests get "lost" (still, I'm sniffing the mirrored router port, so I assume the router actually sees the requests...) and that's why I get this issue? Is there any fix for that?
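To check this from a capture, note that an ARP reply carries a target protocol address in its payload, which identifies the host it answers. Requests and replies can therefore be paired by their sender/target IP fields. Below is a minimal sketch of that pairing. It assumes the ARP packets have already been decoded into simple records; the `ArpEvent` type, the function name and the timeout are hypothetical.

```python
from typing import List, NamedTuple


class ArpEvent(NamedTuple):
    time: float      # capture timestamp in seconds
    op: str          # "request" or "reply"
    sender_ip: str   # ARP sender protocol address
    target_ip: str   # ARP target protocol address


def unanswered_requests(events: List[ArpEvent], timeout: float = 1.0) -> List[ArpEvent]:
    """Return ARP requests with no matching reply within `timeout` seconds.

    A reply matches when it comes from the requested IP, is addressed to
    the requester's IP, and arrives after the request.
    """
    replies = [e for e in events if e.op == "reply"]
    missing = []
    for req in (e for e in events if e.op == "request"):
        answered = any(
            r.sender_ip == req.target_ip
            and r.target_ip == req.sender_ip
            and 0 <= r.time - req.time <= timeout
            for r in replies
        )
        if not answered:
            missing.append(req)
    return missing
```

If host A's first three requests show up as unanswered on the mirrored router port, that would point at the gateway not replying rather than at the requests being lost on the way to it.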

(Although that wouldn't explain why sometimes, after a failure, the first pings that succeed have very high times...)

Since today I'm running scheduled dumps, and I also enabled verbose output for the ARP protocol, in the hope that I can see the destination host of the ARP replies.

Regards,

 

 

 
