
Device does not respond to ping packets

 
FoxtrotOff
Advisor


Hi all

Weird one for you today.

Over the past few weeks, IMC has intermittently decided that it can't see two switches in particular. It throws a critical alarm and we spring into action, only to find that the switches are running perfectly fine.

While IMC is in a critical state saying a switch can't be reached, I can ping the switches with no problem and even SSH to them. Checking the logs on the switches themselves reveals no issues whatsoever.

For some reason, this only happens on 2 of our 43 devices. Both of these devices are connected at 10G, one over Ethernet and one over fibre.

We updated to 7.2 last week, and I have also changed the uplink on the switch side to see if that would resolve the issue. Unfortunately, it has not.

Has anyone else had the issue I am describing, or could anyone point me in the direction of a fix?

Thanks!

19 REPLIES
LindsayHill
Honored Contributor

Re: Device does not respond to ping packets

There is probably some connectivity issue between IMC and those switches. The challenge is in tracking down exactly where.

We had a thread discussing something similar here: http://community.hpe.com/t5/IMC/Polling-false-negatives-on-C7000-modules/td-p/6844948

To isolate the problem, you could start with a Wireshark capture on the IMC server. Look for ping packets to/from the devices in question. When you get an alarm in IMC, check the packet capture. If the ping response was never received by the IMC server, then the problem lies outside IMC, and IMC is alerting you to a genuine issue somewhere in the network. IMC has multiple retries for ping, so it's not sending an alarm because one single ping got dropped. 

That packet capture will at least help in telling you which direction to go next with your troubleshooting.
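
As a rough illustration of that check, here is a minimal Python sketch that pairs echo requests with echo replies and lists the requests that were never answered in time. It assumes the ICMP packets have already been exported from the capture into simple records; the `IcmpRecord` type, its field names, and the 2-second default timeout are hypothetical choices for this example, not anything IMC or Wireshark defines.

```python
from typing import List, NamedTuple


class IcmpRecord(NamedTuple):
    """One ICMP packet exported from a capture (hypothetical layout)."""
    timestamp: float  # seconds since start of capture
    src: str          # source IP address
    dst: str          # destination IP address
    icmp_type: int    # 8 = echo request, 0 = echo reply
    ident: int        # ICMP identifier
    seq: int          # ICMP sequence number


def unanswered_requests(records: List[IcmpRecord],
                        timeout: float = 2.0) -> List[IcmpRecord]:
    """Return echo requests with no matching reply within `timeout` seconds."""
    # Index reply timestamps by (replier, requester, identifier, sequence).
    replies = {}
    for r in records:
        if r.icmp_type == 0:
            replies.setdefault((r.src, r.dst, r.ident, r.seq), []).append(r.timestamp)

    missing = []
    for r in records:
        if r.icmp_type != 8:
            continue
        # A reply travels in the opposite direction to the request.
        key = (r.dst, r.src, r.ident, r.seq)
        answered = any(0 <= t - r.timestamp <= timeout
                       for t in replies.get(key, []))
        if not answered:
            missing.append(r)
    return missing
```

If the requests listed by `unanswered_requests` line up with the alarm times, the replies really are going missing (or arriving late) before they reach the IMC server, which points away from IMC itself.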

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

Thanks for the suggestion.

I checked the firmware on the devices, as newer versions were released on the 6th.

I deployed the new firmware to one switch yesterday and this seems to have resolved the issue: we didn't have a single down alarm on that switch yesterday. I deployed it to the second switch this morning and so far there are no down alarms there either.

I think this is going to be blamed on bad firmware, to be honest, but I'll update this thread in a few days with the results of the firmware deployment.

The switches we're using are 2920-24G-PoE. Suspected offending firmware level: WB_16_01_0004

LindsayHill
Honored Contributor

Re: Device does not respond to ping packets

Hmmm, interesting. I wonder if there was something going on with that firmware that was causing the control plane CPU to be too busy, and unable to respond to ping requests?

Hopefully it settles down now.

RogerKaram
Occasional Advisor

Re: Device does not respond to ping packets

Hello,

When the devices are critical, can you ping them from IMC itself? It could be a spanning-tree or load-balancing issue sending traffic over a link where a specific VLAN is not available.

RK

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

Hi guys

Unfortunately my hunch about the firmware hasn't panned out. Here's what I have found over the last few days of working with IMC and the switches:

Wireshark does show an issue where pings come back as port unreachable. I am not very good with Wireshark, so I'll post a screenshot of the errors from my next capture.

We're now getting many more switches going down with "does not respond to pings". I have increased the timeout from 2 seconds to 15 seconds; the alarms usually persist for about 1 minute.

Looking back through past alarms, some of them start within seconds of each other. Right now, glancing at IMC, I can see six critical alarms that all started within 10 seconds of each other and persisted for approximately 1 minute and 2 seconds.
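
To check whether that clustering is a pattern rather than a coincidence, the alarm start times could be grouped with something like the sketch below. It assumes the alarms have been copied out of IMC by hand as `(start_seconds, device)` pairs; the function name and the 10-second window are just illustrative choices for this example.

```python
from typing import List, Tuple

Alarm = Tuple[float, str]  # (start time in seconds, device name)


def cluster_alarms(alarms: List[Alarm], window: float = 10.0) -> List[List[Alarm]]:
    """Group alarms that start within `window` seconds of the first alarm in a group."""
    clusters: List[List[Alarm]] = []
    for start, device in sorted(alarms):
        # Compare against the first alarm of the current cluster so that a
        # cluster never spans more than `window` seconds in total.
        if clusters and start - clusters[-1][0][0] <= window:
            clusters[-1].append((start, device))
        else:
            clusters.append([(start, device)])
    return clusters
```

Large clusters would suggest one shared event (a common uplink or a burst of traffic) rather than independent faults on each switch.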

I have verified that I can talk to the switches from another machine. The switch event logs show no disruptions and no warnings whatsoever about an issue.

I rebuilt the IMC server yesterday on a new VM and the same behaviour continued, so there is an issue on that network somewhere.

Some info about our network:

All our switches are on a management VLAN. IMC has 2 NICs to allow monitoring from the curriculum network we are on. We haven't had this kind of issue before; it started around a month ago with the original two switches going offline, but it is now getting worse.

One final thing: I'm seeing a few minor "response time of device" alarms, with response times upwards of 100ms.

As mentioned, I will post my next Wireshark capture to imgur and get your opinions on it.

Thanks

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

As promised, here is a Wireshark capture:

http://imgur.com/xvjWRgZ

I did ping the switch from IMC and indeed it couldn't be reached. When I pinged from my machine, I received replies, but the latency was in the range of 1100ms.

 

LindsayHill
Honored Contributor

Re: Device does not respond to ping packets

Seems like you've got a few network issues. Ping times of 1100ms are extremely long. Usually you only see that sort of latency with satellite links. Time to do some investigating into your network.

Start with the usual things. Map out your network. Understand what the interconnections are, and what path your packets take. Look for congestion, errors, duplex mismatches, etc. 

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

Hi again

We're pretty up on our infrastructure: we have network maps, trunk lists, all of the good stuff. The issue we're experiencing just does not make any sense.

Here is a switch experiencing high latency. CPU and memory utilisation are low. I've also provided an interface list of TX and RX bandwidth and, as you can see, there is little to no utilisation of the switch itself.

http://imgur.com/2yqrTEz

This switch is 1 hop from our core.

Also, no errors or drops are being TX'd on ANY of the ports..

I'm at a loss here, sadly, and it's getting more frustrating by the day.

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

Oh and one final thing, all the switches that are having this latency issue are 2510s

LindsayHill
Honored Contributor

Re: Device does not respond to ping packets

You're looking at stats for the device that's reporting high latency/timeouts, but it is probably not the source of the problem, it's just a symptom.

When you were running those pings showing ~1100ms latency, what device were you running that from? Is it in the same subnet as the device being pinged? What L2 path does it take? How many devices? What about pings to intermediate devices - what do they show? You mention the switch is 1 hop from your core, but what about the source device? And what about the core switches, their interfaces, etc?
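
One way to compare hops side by side is to save the ping output for each device along the path and summarise it. The sketch below is only an illustration: it assumes the output contains `time=…ms` or `time<…ms` fields (as the common Windows and Linux ping tools print), and the function name is made up for this example. Output from a switch CLI may be formatted differently and need a different pattern.

```python
import re
import statistics
from typing import Dict, Optional

# Matches "time=1100ms", "time<1ms" and "time=4.5 ms".
_TIME_RE = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)


def summarise_ping(output: str) -> Optional[Dict[str, float]]:
    """Summarise round-trip times (in ms) found in saved ping output."""
    times = [float(m.group(1)) for m in _TIME_RE.finditer(output)]
    if not times:
        return None  # no replies at all, e.g. every request timed out
    return {
        "count": len(times),
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }
```

Running this against output collected at the same moment for the core, any intermediate switch, and the slow switch should show at which hop the latency first jumps.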


FoxtrotOff wrote:

Also, no errors or drops are being TX'd on ANY of the ports..

Errors usually aren't detected at TX. Normally they're seen at RX. You do have a few discards going on there though.

Oh and one final thing, all the switches that are having this latency issue are 2510s

Be wary of over-focusing on a specific device type. It _might_ be the problem, it might not. It might be related to where in your network you use those devices, or the loads on them.

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

I ping from two locations, both of which connect into the core (5412zl). The first ping is from my machine, which plugs into a gig-T port on the core. Traffic leaves the core via the trunk, which runs about 30cm to the switch shown in the pictures. The trunk is 2Gb Ethernet.

The second ping is from the IMC server itself, hosted on a VMware host with an Ethernet connection into the core.

Both locations I'm pinging from are in the same subnet and VLAN as the switch I'm pinging. The L2 path the ping takes is:

(my machine) 172.16.1.99 / (IMC server) 172.16.1.245 - 1Gb -> (core) 172.16.1.11 - 2Gb trunk -> (offending switch) 172.16.1.17

Regarding pings to intermediate devices, this is where it goes a bit weird:

We have 3 switches in parallel behind an uplink switch, which has a 10Gb fibre link to the core. One of the switches behind the uplink will show 1100+ms, while the uplink switch stays in the normal range of 1-5ms.

The core device is a 5412zl with 8 slots filled at the moment: 1 wireless controller, 1 10Gb fibre module, 1 1Gb fibre module, and the rest are Ethernet. We've never done any manual configuring of duplex, so a duplex mismatch is highly unlikely, but obviously it could still happen, I suppose.

Regarding the discards you are seeing in the image, I've been looking at something to do with buffers. I'm not very good with networking as I'm not specialised in it (I work in a school, so we have to be dogsbodies), but what could this indicate?

Regarding focusing on a specific type of device: these are the only devices experiencing this issue, but it isn't all the time; it happens at random times of the day.

 

 

NeilR
Respected Contributor

Re: Device does not respond to ping packets

I'd check the config on the 2gb trunk - feels like some path issue. Have you tried making that a single 1gb connection?

LindsayHill
Honored Contributor

Re: Device does not respond to ping packets


FoxtrotOff wrote:

We have 3 switches in parallel behind an uplink switch, which has a 10Gb fibre link to the core. One of the switches behind the uplink will show 1100+ms, while the uplink switch stays in the normal range of 1-5ms.

Is it only one of those switches that shows 1100ms latency, or does it vary? If it's one, then I would look closely at those interfaces, and see what's going on.

We've never done any manual configuring of duplex, so a duplex mismatch is highly unlikely, but obviously it could still happen, I suppose.

Assuming that everything is Gigabit, then generally duplex mismatches are not a problem, and you are correct to not configure anything. Auto-negotiation is best for Gig. It used to be different with 100Mb.

Regarding the discards you are seeing in the image, I've been looking at something to do with buffers. I'm not very good with networking as I'm not specialised in it (I work in a school, so we have to be dogsbodies), but what could this indicate?

Some discards are normal networking behaviour, and are particularly common when you have a mismatch in interface speeds, or a fan-in problem (e.g. multiple 1G sources hammering a single 1G destination). High levels can indicate congestion. Start by looking for errors before discards though. Do you have any errors on any of your links?


Neil's advice re: checking the trunk link is good too. You can get very odd behaviour when trunks have a problem, especially if it's only on one link in the bundle.

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

The thing is, the example I gave you is one of multiple switches having the same issue. It isn't just one switch, it's 13 switches, all of the 2510 variety.

We have enabled IGMP and STP to see if that would help. Unfortunately it has not helped whatsoever today, and at the moment I'm looking at six 2510s that are down and not responding to pings, though they are still reachable over SSH, just very slowly.

Make that 7 =-) as another has just gone offline.

I have checked the buffers, CPU and memory on all the switches that IMC is currently seeing as "down" and they are all perfectly fine. There is also no high TX or RX on the switch ports themselves or on the uplinks.

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

Secondly, is there a way to get IMC to check devices using something else? Clearly ICMP is being prioritised lower than other traffic, as the rest of the network is working perfectly fine. I have IMC authenticating with the switches via SSH now, which is a massive improvement over telnet.

LindsayHill
Honored Contributor

Re: Device does not respond to ping packets

Why did you change IGMP and STP? What behaviour were you trying to change? What hypothesis were you testing?

Making random configuration changes is not helpful, as you don't know if you're introducing new problems. 

NeilR
Respected Contributor

Re: Device does not respond to ping packets

You show above that you see the same latency from your own workstation, so it's not IMC having a lower priority for ICMP.

But I did get curious about response time as recorded, so I reviewed some of my own data for switches: 2910s connected to a 5412 using 10Gb. See attached for this last Sunday, when the user switches should not be busy. Note the little hump in CPU.

Sometimes, looking back over the past months, I see periods where response time is consistently low. Other times I see some spikes where response time is high, but not consistently like you show, and not failures to respond like you show.

Does running ping in IMC from the tools option for a selected device give the same result? How about when you log into the 5412 CLI and ping from there?

I still wonder about your configuration. Do you have a management VLAN defined? Are the switch addresses the only devices on this VLAN? Have you tried setting up an isolated test with a pair of switches in a different address range?

I know you suspect all of one switch model, but is there something in the switch config common to only those?

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

We were told to enable STP and IGMP by the network engineer who came to have a look last week. He was unable to find an issue either, due to the weird nature of the symptoms.

The pings aren't just high from my workstation; they are high from the core device itself. I can SSH onto it and run a repeated ping 100 times, and it will be over 1100ms when the problem arises. They are also high from switch to switch, but only when the issue arises.

Yes, we have a management VLAN, and other networks are segregated into their own VLANs too.

Management is on 10, curriculum (school-authorised devices only) is on 11, wireless on 16, CCTV on 17 and so forth.

I have scanned the management network and there are no unexpected devices on it. Plus, with the issue only occurring during daily operations on a weekday, I'm guessing the issue lies somewhere within our curriculum network. It is very tough to troubleshoot during the day, though, as we can't interrupt the school's operations to switch off links and try to triangulate the issue.

When the issue appears today, I will attempt the ping from within IMC itself. However, I have run pings from the command line of the IMC server before and they showed straight unreachable, so I'm not very hopeful that IMC will show anything different, considering the alarms.

Also, I have enabled bandwidth and TX/RX rate monitoring across the entire network to see if we're seeing high traffic, as previously suggested. So far we aren't seeing anything really; the busiest I saw was last week, with 18 ports transmitting or receiving in the range of 50Mb maximum. Considering the way our network is made up, the speed of the uplinks and such, I highly doubt that is an issue.

FoxtrotOff
Advisor

Re: Device does not respond to ping packets

It looks as if my initial hunch about the specific switch model was correct. Today I replaced a 2510 with a 2530. We're currently experiencing high latency across all 2510s in school, but the 2530 we just put in is happily pinging away at an average of 22ms.

I'm going to contact ProCurve about getting the 2510s replaced or a firmware update to fix the issue. Once I find out from them what the specific issue is, I'll update this thread, but for now it looks as if it's the 2510s that are at fault.