M and MSM Series


 
Johnb_2
Advisor

MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

We recently added 150 access points to our MSM760 controllers; we were running 50 before the install. With all APs connecting to one or the other of our controllers, we are seeing weird issues and messages in our unfiltered log, such as:

Many of these:

Jan 20 13:59:29 debug maestro_sc [3C:D9:2B:7A:4B:4F] 1 attempt(s) made to apply device radio configuration

This message every so often:

debug monitord CPU in last minutes: 100 99 100 99 100 100 100 100 100 99 99 99 100 99 100 99 100 100 99 87 15 15 17 15 17 15 14 15 15 15 15 15 14 15 14 17 17 16 18 36 15 16 15 15 15 15 16 24 100 100 100 100 100 100 100 100 100 100 100 100

This message even though the configurations did not change:

debug mapconf processing ["MAC ADDRESS"] SYNCCONF_CONFIGCHANGE_LIST
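
For anyone who wants to quantify a line like the monitord one above, here is a minimal Python sketch. It assumes the numbers after "CPU in last minutes:" are one percentage per minute (the log does not say whether the newest sample comes first or last). The function names and the 95% threshold are illustrative choices, not anything from the MSM software.

```python
import re


def parse_cpu_minutes(line):
    """Return the per-minute CPU percentages from a monitord CPU line."""
    match = re.search(r"CPU in last minutes:\s*([\d\s]+)", line)
    if match is None:
        raise ValueError("not a monitord CPU line")
    return [int(value) for value in match.group(1).split()]


def busy_runs(samples, threshold=95):
    """Return (start_index, length) for each consecutive run >= threshold."""
    runs = []
    start = None
    for index, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = index          # a new busy run begins here
        elif start is not None:
            runs.append((start, index - start))
            start = None
    if start is not None:              # the line ended while still busy
        runs.append((start, len(samples) - start))
    return runs


# The monitord line exactly as posted above.
line = (
    "debug monitord CPU in last minutes: "
    "100 99 100 99 100 100 100 100 100 99 99 99 100 99 100 99 100 100 99 87 "
    "15 15 17 15 17 15 14 15 15 15 15 15 14 15 14 17 17 16 18 36 "
    "15 16 15 15 15 15 16 24 "
    "100 100 100 100 100 100 100 100 100 100 100 100"
)
samples = parse_cpu_minutes(line)
print(len(samples), "samples,", sum(v >= 95 for v in samples), "at or above 95%")
print("busy runs (start, length):", busy_runs(samples))
```

On the posted line this shows two long stretches near 100% separated by a quiet period, which may help when lining the busy periods up against other log entries.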

 

The infrastructure is loop-free and the PoE switches are not under any kind of network stress.

 

I did a packet sniff of the AP subnet and see lots of UDP broadcasts from the APs to port 3490, destination 255.255.255.255.

Any help would be greatly appreciated while we wait for info on renewing the hardware warranty.
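
A rough way to see which devices are sending the UDP broadcasts mentioned above, without a full packet capture, is to listen on the port and count datagrams per source address. The sketch below uses only the Python standard library. The function names are made up for illustration, and it assumes it runs on a host in the same VLAN as the APs, that nothing else on that host is bound to UDP 3490, and that the OS delivers 255.255.255.255 broadcasts to a socket bound to all interfaces (Linux normally does).

```python
import socket
import time
from collections import Counter


def count_senders(sock, duration_s):
    """Read datagrams from an already-bound UDP socket for up to duration_s
    seconds and return a Counter of datagrams per source IP address."""
    counts = Counter()
    deadline = time.monotonic() + duration_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)
        try:
            _payload, (source_ip, _source_port) = sock.recvfrom(2048)
        except socket.timeout:
            break                      # nothing more arrived before the deadline
        counts[source_ip] += 1
    return counts


def listen_for_broadcasts(port=3490, duration_s=10.0):
    """Bind to all interfaces on the given UDP port and count senders.
    The default port is the one seen in the capture described above."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        return count_senders(sock, duration_s)
```

Calling `listen_for_broadcasts().most_common()` would then list the busiest senders first. This only counts datagrams; it does not decode them.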

 

8 REPLIES
jguse
HPE Pro

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Hello John,

 

Here are a few things you could check:

 

1) Do the "MSM760 Controllers" (how many exactly, and are they teamed?) actually have licenses to support a total of 200 APs? By default they will each support up to 40, and to support more a license like J9371A would be needed, up to a grand total of 200 APs per Controller.

 

2) Port 3490 is the standard Colubris management port; you're seeing traffic from the APs trying to reach the Controller. If point 1 above was already in place, I'd say to check whether there's a firewall blocking this port or one of the other ports that must be open for the APs and Controllers to communicate, but then again, if it was working before with 50, this does seem unlikely.

 

Let me know if that helps. If the issue persists it may be best to open a support case when you are able to.

Best regards,
Justin

Working @ HPE
Johnb_2
Advisor

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Hi Justin,

1) Do the "MSM760 Controllers" (how many exactly, and are they teamed?) actually have licenses to support a total of 200 APs? By default they will each support up to 40, and to support more a license like J9371A would be needed, up to a grand total of 200 APs per Controller.

 

The controllers are not teamed. One controller has a license for up to 200 APs, the other for 160.

 

2) Port 3490 is the standard Colubris management port; you're seeing traffic from the APs trying to reach the Controller. If point 1 above was already in place, I'd say to check whether there's a firewall blocking this port or one of the other ports that must be open for the APs and Controllers to communicate, but then again, if it was working before with 50, this does seem unlikely.

 

We moved the controllers and APs to their own VLAN. There is nothing in between them.

So if these APs can reach the controllers, I shouldn't be seeing them broadcast to UDP port 3490? This traffic is constant when I do a packet sniff.

 

I was just watching one of the controllers again, and it randomly jumped from 100 APs synchronized to 100 of them pending.

jguse
HPE Pro

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Yes, if the AP cannot reach the Controller (anymore) it seems logical that it would broadcast in order to try to do so. If you say they get synchronized and then drop back to Pending, it sounds like they really can reach the Controllers, but something is causing them to drop back to pending.

 

I've only seen a similar issue with "Radio status pending" once, and unfortunately we never got to the bottom of it: after a few days it disappeared, everything worked fine, and it hasn't been seen since. Due to the environment, it was also not clear whether it was a "cosmetic issue" or whether the APs were actually un-connectable (the only clients seen as connected on the Controller were on synchronized APs). So it doesn't seem like quite the same issue either, since yours all drop to Pending at the same time.

 

I've got a few more questions then:

1) What SW version are you running on the controllers exactly: 5.4.0, 5.4.1, or 5.4.2? Were the new APs also running the same software version when they were deployed? I've seen at least one known issue with APs getting desynchronized if the software is newer, but I think that was on 5.3.x. Might be relevant though.

2) Are the new APs all the same model, and which model are they?

3) Do you know whether the APs are still visible, connectable, etc. when they are in Pending status?

4) You mentioned the switches looking fine in terms of CPU and such. Have you checked the show logging? Are there reports of excessive broadcasts, and any excessive drops to be found?

 

If we don't get much further like this you could also send me a copy of the whole filtered + unfiltered log if you'd like, perhaps it could reveal more than the entries seen here.

Best regards,
Justin

Working @ HPE
Johnb_2
Advisor

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Yes, if the AP cannot reach the Controller (anymore) it seems logical that it would broadcast in order to try to do so. If you say they get synchronized and then drop back to Pending, it sounds like they really can reach the Controllers, but something is causing them to drop back to pending.

 

I've only seen a similar issue with "Radio status pending" once, and unfortunately we never got to the bottom of it: after a few days it disappeared, everything worked fine, and it hasn't been seen since. Due to the environment, it was also not clear whether it was a "cosmetic issue" or whether the APs were actually un-connectable (the only clients seen as connected on the Controller were on synchronized APs). So it doesn't seem like quite the same issue either, since yours all drop to Pending at the same time.

 

*As for all the APs unsyncing: they all started doing this once we added the 73 access points from the first new building and then the next 73 new APs. They all joined the default AP group, and then I had to manually move them to the appropriate group. I know for a fact that the APs see the controller and that there is initial connectivity, because the MSM is also their DHCP server. They get an IP address just fine.*

 

 

I've got a few more questions then:

1) What SW version are you running on the controllers exactly: 5.4.0, 5.4.1, or 5.4.2? Were the new APs also running the same software version when they were deployed? I've seen at least one known issue with APs getting desynchronized if the software is newer, but I think that was on 5.3.x. Might be relevant though.

 

*Running 5.4.2.74-01-10103

 

2) Are the new APs all the same model, and which model are they?

 

*All are model MSM410

 

3) Do you know whether the APs are still visible, connectable, etc. when they are in Pending status?

4) You mentioned the switches looking fine in terms of CPU and such. Have you checked the show logging? Are there reports of excessive broadcasts, and any excessive drops to be found?

 

*I had fault finder set to high and warn. No excessive broadcasts or drops in the logs.

 

 

If we don't get much further like this you could also send me a copy of the whole filtered + unfiltered log if you'd like, perhaps it could reveal more than the entries seen here.

 

*I attached an unfiltered log; hopefully there is something useful in it. My game plan for tomorrow morning is to reset the controllers to factory defaults and then reimport a backed-up config.

jguse
HPE Pro

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Thanks for the info and logs. Here's what I can see so far that looks odd...

 

1) SSH info messages constantly on random ports from 10.0.0.10 - any idea why this might happen? Looks like something is logging in constantly, or the controller tries to retrieve a config from somewhere, though I'm not too familiar with these messages.

There's that message about functional config identifier being retrieved and right after that it processes changes on the APs, as though something has changed in the configuration. Do you have anything connecting to the Controller like SNMP Management SW, and/or anything like a config backup running daily on there?

 

Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 43180 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 55433 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 57382 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 44320 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 52836 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 41463 ssh2
Jan 21 22:42:58 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 46622 ssh2
Jan 21 22:42:59 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 44759 ssh2
Jan 21 22:42:59 debug	mapconf      retrieved functional configuration identifier <0x00374d25>.
Jan 21 22:42:59 info	sshd         Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 35017 ssh2
Jan 21 22:42:59 debug	mapconf      [00:24:A8:87:2B:2F] processing SYNCCONF_CONFIGCHANGE_LIST.
Jan 21 22:42:59 debug	mapconf      [3C:D9:2B:7B:00:32] processing SYNCCONF_CONFIGCHANGE_LIST.
Jan 21 22:42:59 debug	mapconf      [3C:D9:2B:7A:4B:49] processing SYNCCONF_CONFIGCHANGE_LIST.

2) Looks like there's a crash of "sfhandler" related to "sfproxy". Because of all the "spam", the logs don't cover a large timespan, so it's hard to say whether this is a one-off event. My guess would be that this crash is related to sFlow, possibly an sFlow proxy configuration. Are you running sFlow on there?

There were some fixes in 5.4.2.74 related to high CPU usage and the HTTP proxy using all the CPU. Perhaps that was fixed but sFlow also has proxy issues. I'd test without any proxy or sFlow configuration, if that applies.

 

Jan 21 22:42:30 warning	monitord     Unexpected termination for process 'sfhandler' [pid 6289, up for 405 sec(s)]
Jan 21 22:42:30 debug	monitord     Stopping [2,8]: 'sfhandler' [pid 6289, up for 405 sec(s)]
Jan 21 22:42:31 debug	monitord     Starting [3,3]: 'sfhandler' (pid='15388')
Jan 21 22:42:34 debug	monitord     Msg:       <SFPROXY>     <ADD-PROCESS>  (from sfhandler)
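
With this much "spam" in the log, it can help to tally the entries instead of reading them one by one. The Python sketch below counts SSH logins per second and source, SYNCCONF_CONFIGCHANGE_LIST entries per AP MAC address, and unexpected process terminations. The regular expressions are inferred only from the lines quoted in this thread, so the log layout is an assumption, and `summarize` is just an illustrative name.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above: "Mon DD HH:MM:SS level process message".
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"
    r"(?P<level>\w+)\s+(?P<process>\S+)\s+(?P<message>.*)$"
)
SSH_RE = re.compile(r"Accepted \S+ for \S+ from (?P<source>\S+) port \d+")
MAC_RE = re.compile(r"\[(?P<mac>(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\]")
CRASH_RE = re.compile(r"Unexpected termination for process '(?P<name>[^']+)'")


def summarize(lines):
    """Return three Counters: logins keyed by (timestamp, source),
    config-change entries keyed by MAC, and crashes keyed by process name."""
    logins, config_changes, crashes = Counter(), Counter(), Counter()
    for raw in lines:
        parsed = LINE_RE.match(raw.strip())
        if parsed is None:
            continue                   # skip lines that do not fit the layout
        process, message = parsed["process"], parsed["message"]
        if process == "sshd":
            ssh = SSH_RE.search(message)
            if ssh:
                logins[(parsed["stamp"], ssh["source"])] += 1
        elif process == "mapconf" and "SYNCCONF_CONFIGCHANGE_LIST" in message:
            mac = MAC_RE.search(message)
            if mac:
                config_changes[mac["mac"]] += 1
        elif process == "monitord":
            crash = CRASH_RE.search(message)
            if crash:
                crashes[crash["name"]] += 1
    return logins, config_changes, crashes


# A few lines copied from the excerpts above (whitespace normalized).
SAMPLE = """\
Jan 21 22:42:30 warning monitord Unexpected termination for process 'sfhandler' [pid 6289, up for 405 sec(s)]
Jan 21 22:42:58 info sshd Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 43180 ssh2
Jan 21 22:42:58 info sshd Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 55433 ssh2
Jan 21 22:42:59 debug mapconf retrieved functional configuration identifier <0x00374d25>.
Jan 21 22:42:59 info sshd Accepted keyboard-interactive/pam for admin from 10.0.0.10 port 35017 ssh2
Jan 21 22:42:59 debug mapconf [00:24:A8:87:2B:2F] processing SYNCCONF_CONFIGCHANGE_LIST.
Jan 21 22:42:59 debug mapconf [3C:D9:2B:7B:00:32] processing SYNCCONF_CONFIGCHANGE_LIST.
"""

logins, config_changes, crashes = summarize(SAMPLE.splitlines())
for (stamp, source), count in sorted(logins.items()):
    print(stamp, source, "logins:", count)
print("APs with config-change entries:", len(config_changes))
print("unexpected terminations:", dict(crashes))
```

Run against the full unfiltered log, a summary like this could show whether the config-change entries cluster around the login bursts or the process restarts.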

 

Aside from that your plan to reset and reconfig sounds good. Hopefully you can get this resolved soon.

Best regards,
Justin

Working @ HPE
Johnb_2
Advisor

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

1) The SSH is normal. That is our NAC logging in and pulling client associations.

 

2) I turned off sFlow because I am not even sampling data currently.

 

3) I reset to defaults and uploaded the backed up config. Same symptoms that I opened this forum question with. 

Johnb_2
Advisor

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

It turns out the APs have been jumping around because of the network access control (NAC) system that connects via SSH. It sends CLI commands such as mac lockout and no mac lockout. For some reason, the increase in access points is causing a problem with running these commands. I am working with the company we purchased the NAC from to get to the bottom of the issue.

jguse
HPE Pro

Re: MSM760 wireless controller randomly going to pending then synced code on the MSM is 5.4

Hi John,

That's very interesting; I was going to suggest stopping those NAC logins temporarily to see if that helps, too. Do let us know how this works out with the NAC company :)
Best regards,
Justin

Working @ HPE