MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

 
MSM67
Occasional Advisor

We are running an MSM760 with approx 70 APs (MSM 422s & MSM460).  It has been a stable configuration for us for a number of years. However this year (since January) our clients are experiencing regular, severe network interruptions.

 

The primary manifestation of the problem is the following:

- Clients have full signal strength but no access to the network.

 

Initially we thought that all traffic in and out of the machine was failing, but we noticed occasional packets arriving at our firewall. Pings from the client were timing out, but after increasing the ping timeout we discovered that ping times had actually blown out from the usual 1-5 ms to a full 5-10 seconds.
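
A minimal sketch of how such a measurement could be summarised, assuming Windows-style `ping` text output has been captured to a string. The function name `summarize_ping`, the sample addresses and the 1000 ms "slow" threshold are illustrative assumptions, not anything taken from the controller.

```python
import re

# Matches the round-trip time in Windows-style ping replies,
# e.g. "Reply from 10.0.0.1: bytes=32 time=7312ms TTL=64" or "time<1ms".
RTT_PATTERN = re.compile(r"time[=<](\d+)\s*ms")


def summarize_ping(output, slow_ms=1000):
    """Count replies, slow replies and timeouts in captured ping output."""
    rtts = [int(m.group(1)) for m in RTT_PATTERN.finditer(output)]
    return {
        "replies": len(rtts),
        "timeouts": output.count("Request timed out"),
        "slow": sum(1 for r in rtts if r >= slow_ms),  # slow_ms is an arbitrary cut-off
        "max_ms": max(rtts) if rtts else None,
    }


# Hypothetical capture; the address and values are made up for illustration.
sample = """\
Reply from 10.0.0.1: bytes=32 time=3ms TTL=64
Reply from 10.0.0.1: bytes=32 time=7312ms TTL=64
Request timed out.
Reply from 10.0.0.1: bytes=32 time<1ms TTL=64
"""
print(summarize_ping(sample))
```

On Windows the timeout is set with `-w` in milliseconds (for example `ping -n 50 -w 15000 <gateway>`). With the default timeout of 4000 ms, a reply that takes 5-10 seconds is reported as a timeout, which matches what we first saw.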

 

Further detail:

  • The network connectivity loss can occur with a single client or with a cluster of 6-10 clients at the same time.
  • Other clients on the same access point appear to be operating perfectly normally.
  • The issue appears to occur randomly on any access point on the network. (It is not isolated to a faulty WAP)
  • The issue appears on access points that are physically in different rooms and buildings.
  • The issue occurs regularly throughout the day, many times a day, and may affect more than one AP at a time.
  • Access points are connected to multiple switches (and in different physical locations) [not a faulty POE switch]
  • Access points are all connected to HP switches [not a brand interoperability issue]
  • The issue affects all models and manufacturers of laptops that we use. [A range of Toshiba & Fujitsu laptops and a range of models]
  • Access points experience problems with anywhere from 5 to 30 clients connected.

 

Our current software version is 6.4.1.0-17746 (Hardware revision: B:48).

 

 

Troubleshooting attempted so far:

  • We have a second MSM760 controller and we have changed physical controller. (Only one controller plugged in at a time!) [Rules out physical controller issue].
  • We have removed our 802.1X configurations and changed to a pre-shared key so as to eliminate any potential RADIUS or certificate issues.
  • We tried upgrading to the latest 6.5 firmware [with no success], but have rolled back to our 6.4.1.0 version that we were running successfully last year.
  • While running the 6.5 firmware we added some HP560 access points. It was unclear whether these reduced or added to the problems in those locations, but we removed them on returning to 6.4.1.0.
  • We have run Wireshark to see if anything obvious was occurring in the way of packets. There did not appear to be any broadcast storms or excessive ARP traffic.
  • Rolled back firmware and config to our backups from August last year (when the system was stable) -- but to no effect.
  • Rolled back HP firmware on all our switches to our configs from last year.

 

MSM760 settings changes attempted with no success:

  • Access Control On/Off
  • Removal of other SSIDs
  • Band steering On/Off
  • Broadcast filtering On/Off
  • Quality of Service DiffServ/Disabled
  • Turning Radio 1 On/Off
  • Turning Radio 2 On/Off
  • Disabling 2.4GHz
  • Disabling 5 GHz
  • Adjusting Beacon interval to 50,100,200
  • Changing distance between APs (Medium/Large/Small)
  • Guard interval (Short/Long)
  • Limiting Max Clients to 20 per radio
  • Changing Transmit power control to Use max power / Automatic power control
  • Changing multicast Tx Rate to a small number / large number

Our users have found that they can sometimes restore their network connectivity by turning their wireless on/off multiple times. However this can be difficult when the applications are all locking up due to network loss.

 

We were intending to replace our MSM422s with HP560s, but have removed our HP560s for the moment and won't be adding any more until we can resolve our core stability.

 

This issue is absolutely destroying our site stability and my clients are really becoming distraught! Is there anything else that we can try to resolve this urgently?

27 REPLIES 27
MSM67
Occasional Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Further information:

 

Our clients are all running Windows 8.1

Z273
Frequent Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Dear user,

I'm sorry to see a user having a bad experience with MSM.

I went through your post to see if I could help. I see you have experimented with quite a lot of the Wi-Fi configuration, but to no avail.

 

You also mentioned that your system was stable a year ago and has gone downhill since. I suspect a change in the RF environment that is introducing interference and killing your Wi-Fi.

 

One thing you can look at is the Client Wireless page on the controller: check the Wi-Fi rates being used by both your client device (which chipset?) and the AP. The client rate stats usually give some hints about the state of the Wi-Fi.

If both the AP and the client are using low data rates even though the device is near the AP, that is a sign something is wrong over the air.
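
As a rough sketch of that check, assuming the rates and signal levels have been copied by hand from the Client Wireless page into a list: the `ClientStat` fields, the -60 dBm "strong signal" cut-off and the 12 Mbps "low rate" cut-off are all illustrative assumptions, not values defined by the controller.

```python
from dataclasses import dataclass


@dataclass
class ClientStat:
    mac: str
    signal_dbm: int       # client signal as seen by the AP
    tx_rate_mbps: float   # rate the AP uses towards the client
    rx_rate_mbps: float   # rate the client uses towards the AP


def suspicious_clients(stats, strong_dbm=-60, low_rate_mbps=12.0):
    """Return clients with a strong signal but a low rate in both directions."""
    return [
        s for s in stats
        if s.signal_dbm >= strong_dbm
        and s.tx_rate_mbps <= low_rate_mbps
        and s.rx_rate_mbps <= low_rate_mbps
    ]


# Made-up readings: one client near the AP on low rates, one healthy, one far away.
stats = [
    ClientStat("aa:bb:cc:00:00:01", -48, 6.0, 6.0),
    ClientStat("aa:bb:cc:00:00:02", -52, 130.0, 117.0),
    ClientStat("aa:bb:cc:00:00:03", -82, 6.0, 6.0),
]
for s in suspicious_clients(stats):
    print(s.mac, s.signal_dbm, s.tx_rate_mbps, s.rx_rate_mbps)
```

A distant client on a low rate is normal, so only the first entry is flagged here; it is the combination of strong signal and low rate that points at the air rather than at coverage.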

 

Say you have strong interference. In my experience, persistent interference is often seen by the AP as a high noise level, and a high noise level makes it much harder for the AP to schedule a transmission. As you know, Wi-Fi is a listen-before-talk protocol: if the interfering signal is strong enough, it can halt or delay the AP's transmissions while the AP is performing Clear Channel Assessment (CCA). So if you can record the noise seen by the AP, that would be helpful.
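
To make the listen-before-talk idea concrete, here is a toy sketch, assuming you have a series of noise readings in dBm recorded from the AP. The function name, the sample readings and the -62 dBm threshold are illustrative assumptions; the real threshold depends on the radio and the regulatory domain.

```python
def cca_busy_fraction(noise_dbm_samples, threshold_dbm=-62.0):
    """Fraction of samples in which an energy-detect CCA would report 'busy'.

    threshold_dbm is a placeholder value for illustration only.
    """
    if not noise_dbm_samples:
        raise ValueError("need at least one sample")
    busy = sum(1 for level in noise_dbm_samples if level >= threshold_dbm)
    return busy / len(noise_dbm_samples)


# Made-up readings: a quiet channel and one with a strong intermittent interferer.
quiet = [-95, -93, -96, -94]
noisy = [-95, -60, -58, -61]
print(cca_busy_fraction(quiet))
print(cca_busy_fraction(noisy))
```

The higher the fraction, the more often the AP would have to defer its transmissions, which from the client side looks like good signal with very long delays.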

 

Also, if you are in Europe under ETSI regulation, there is a new rule, effective January 2015, that all shipping APs must meet the "adaptivity" requirement:

  • ETSI EN 300 328 V1.8.1 - to improve usage and quality of data transmission equipment operating in the 2.4 GHz ISM band
  • ETSI EN 301 893 V1.7.1 - to prove adaptivity of devices operating in the 5 GHz band to the most appropriate channels

In a nutshell, the rule says the AP must stop transmitting if it sees a signal, Wi-Fi or not, at or above a specific threshold, and may resume transmitting when the energy level drops below the threshold.

 

These rules apply only to the ETSI domain, not to the FCC domain (USA, Canada, ...).

 

Before I overwhelm you with too much info, let me recap my post, focusing on the RF-interference theory.

 

Do you have the problem on both the 2.4 GHz and 5 GHz bands, or more on one particular band?

 

Another possibility is that somebody is mounting a NAV attack on your network. A NAV attack really slows down the Wi-Fi, though not to the point of a full DoS. In any case, if you can take a wireless trace on an affected AP while the problem is occurring, that would also be helpful. If you have bought the IDS license for the controller, you can inspect the log to see whether there are attacks on your Wi-Fi network; if you don't have IDS, a wireless trace can sometimes reveal a lot about the problem.

 

Hang tight. I'm more than happy to help; I do work for HP, but my time on this forum is pretty much pro bono.

 

Cheers.

 

 

Evan_ISS
Frequent Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)


@MSM67 wrote:

We are running an MSM760 with approx 70 APs (MSM 422s & MSM460).  It has been a stable configuration for us for a number of years. However this year (since January) our clients are experiencing regular, severe network interruptions.

 

The primary manifestation of the problem is the following:

- Clients have full signal strength but no access to the network.

 

...

 

This issue is absolutely destroying our site stability and my clients are really becoming distraught! Is there anything else that we can try to resolve this urgently?


Hi there,

 

do you use RRM?

MSM67
Occasional Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

The Auto-Channel (system-wide) is currently selected.

 

Auto-power, Radio-down mitigation and AP load balancing are all currently NOT selected.

 

The schedule automatic analysis is selected to occur Weekly on Sunday at 2AM, but the Automatically apply new analysis is NOT selected.

MSM67
Occasional Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Thank you for some areas to investigate further. We have the Premium license so IDS is included. I have turned it on now, and will monitor the logs. It seems to have gone out to all of our MSM460s, but I don't think the MSM422s support IDS.

 

I will try to catch the wireless rates the next time a problem is reported. One of my struggles is that users have reached the point where they rarely call us while the problem is occurring 'live', and only report later in the day on all the issues they have had.

 

 

We have been seeing quite a lot of this in our messages:

Mar 5 06:46:24 warning eventmgr ALARM[9] <- EVENT[72176] Raised. AP (name='00:24:A8:86:2F:92') is silent, it may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down').
Mar 5 06:46:24 warning eventmgr EVENT[72176] AP (name='00:24:A8:86:2F:92') is silent. It may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down')
Mar 5 05:16:23 warning eventmgr ALARM[8] <- EVENT[54446] Raised. AP (name='00:24:A8:87:E0:5E') is silent, it may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down').
Mar 5 05:16:23 warning eventmgr EVENT[54446] AP (name='00:24:A8:87:E0:5E') is silent. It may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down')

 

But there doesn't seem to be any correlation between the access points reporting that the secure connection went down and the areas experiencing WiFi issues.
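
A small sketch of how these messages could be tallied per AP so the timestamps can be lined up against user reports. It assumes the log has been exported as plain text in exactly the format shown above; the function name `silent_events` is an illustrative choice. Each event is logged twice (once as ALARM, once as EVENT), so the sketch de-duplicates on the event number.

```python
import re
from collections import defaultdict

# Captures timestamp, event number and AP name from the "is silent" messages.
SILENT = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s.*?EVENT\[(\d+)\].*?"
    r"AP \(name='([0-9A-Fa-f:]{17})'\) is silent"
)


def silent_events(lines):
    """Return {ap_name: [timestamps]} with one entry per distinct event number."""
    seen = set()
    by_ap = defaultdict(list)
    for line in lines:
        m = SILENT.search(line)
        if not m:
            continue
        when, event_id, ap = m.groups()
        if event_id in seen:  # same event already counted via its ALARM/EVENT twin
            continue
        seen.add(event_id)
        by_ap[ap].append(when)
    return dict(by_ap)


log = """\
Mar 5 06:46:24 warning eventmgr ALARM[9] <- EVENT[72176] Raised. AP (name='00:24:A8:86:2F:92') is silent, it may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down').
Mar 5 06:46:24 warning eventmgr EVENT[72176] AP (name='00:24:A8:86:2F:92') is silent. It may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down')
Mar 5 05:16:23 warning eventmgr ALARM[8] <- EVENT[54446] Raised. AP (name='00:24:A8:87:E0:5E') is silent, it may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down').
Mar 5 05:16:23 warning eventmgr EVENT[54446] AP (name='00:24:A8:87:E0:5E') is silent. It may still be providing services but is unmanaged. Reason: (value='Secure connection to AP went down')
"""
for ap, times in silent_events(log.splitlines()).items():
    print(ap, len(times), times)
```

With a longer export, the per-AP timestamps can be compared against the times and locations users report, to confirm or rule out a correlation.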

 

 

 

Evan_ISS
Frequent Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)


@MSM67 wrote:

The Auto-Channel (system-wide) is currently selected.

 

Auto-power, Radio-down mitigation and AP load balancing are all currently NOT selected.

 

The schedule automatic analysis is selected to occur Weekly on Sunday at 2AM, but the Automatically apply new analysis is NOT selected.


RRM might be your culprit as it is the one thing that can perform dynamic changes to configuration in an otherwise (perceived as) stable setup.

 

I would turn it off altogether and if possible fall back to the full manual RF survey setup (if you have one).

Z273
Frequent Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Hi MSM67,

 

Just a suggestion: can you inspect the clients that are currently connected OK, especially in the areas where problems have been reported, and note the data rates in use as well as the client signal seen by the AP?

That lets us compare the state when the Wi-Fi is working normally against the state when it is not.

 

Cheers.

Teclatin
Occasional Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Hello,

 

I have a team of MSM765zl controllers and have been experiencing basically the same things. I have already disabled RRM. Initially, one of my issues turned out to be a lack of IP addresses on the VLAN that the users were connecting to. Another issue was that, while the access points can support up to 255 client connections per radio, I had limited it to 60 per radio. This created a bottleneck when my users tried to band steer, so I bumped it up to 200. These two changes resolved most of my issues.
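
For the address-exhaustion check, a back-of-the-envelope sketch, assuming an IPv4 client VLAN. The subnets, the number of reserved addresses and the lease count below are made-up example values, and the function name is illustrative.

```python
import ipaddress


def free_addresses(subnet, reserved, active_leases):
    """Estimate how many client addresses remain on an IPv4 subnet."""
    net = ipaddress.ip_network(subnet)
    usable = net.num_addresses - 2  # exclude network and broadcast addresses
    return usable - reserved - active_leases


# Example values only: 10 addresses reserved for gateways/servers, 240 active leases.
print(free_addresses("192.168.10.0/24", reserved=10, active_leases=240))
print(free_addresses("192.168.10.0/23", reserved=10, active_leases=240))
```

A result near zero or negative means new clients will associate with good signal but never get an address. Note that the DHCP scope may be smaller than the subnet, and long lease times keep addresses tied up after clients have left.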

 

Still, I would get complaints from users unable to connect to the wireless. Their signal and SNR looked good. When I reviewed the logs, I noticed that the users who were complaining were connecting on 2.4 GHz, then would band steer to 5 GHz, but something would happen that made them fall back to 2.4 GHz. I am still looking into the possibility that the signal strength on 5 GHz is too weak, but that is hard to accept at this point, since I have two MSM460 APs in the rooms that are experiencing this issue.

 

As a side note, I noticed that 9 out of 10 users experiencing this were on Mac laptops, which got me looking into wireless issues with Macs. From what I learned, if a Mac has a problem connecting to the wireless network, it will dig through its list of wireless profiles to find a suitable network to connect to. The users I am testing this with had our wireless profile far down the list; one of them had the SSID down at slot 30! I changed this for the test users so that the SSID is listed 1st or 2nd. I also found that it is very common for Macs to experience corruption in their PRAM that can cause wireless connectivity issues, so I reset this on the test users' machines just to cover all bases. This was done yesterday, and I will report back in a week or two.

 

Not really sure if this will help anyone, but I thought I would share what I am dealing with so far...

Z273
Frequent Advisor

Re: MSM760 - Clients loosing network connectivity (ping times 5-10 full seconds!)

Hi Teclatin,

 

Good post. One question: did you re-enable RRM on your system, or is it still disabled?

 

Second, good point about the Mac and its huge list of profiles. In fact, the Mac uses BSD underneath at the kernel level.

 

The wireless tool used to connect to Wi-Fi is the open-source wpa_supplicant; in fact all tablets and phones, Android/Linux or Apple, use wpa_supplicant. You are right about the priority in the profile list: if you inspect the config file used by wpa_supplicant, wpa_supplicant.conf, you can see how each network profile is listed.

 

You can see wpa_supplicant if you open a shell on your Mac and run 'ps' to list all the running processes.

 

Another tool from the wpa_supplicant suite that hardly anybody uses is wpa_cli. With it you can connect to the running wpa_supplicant process and control it. One command I often use disables all the other profiles and leaves only one, which speeds up connecting because the software does not have to scan for all the profiles.
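
A hedged sketch of that idea, assuming a system where wpa_supplicant and wpa_cli are actually present (typically Linux) and that `wpa_cli list_networks` prints tab-separated rows of network id, SSID, BSSID and flags. The interface name `wlan0`, the SSIDs and the helper names are illustrative assumptions. The sketch only builds the commands; it does not run them.

```python
def parse_list_networks(output):
    """Map SSID -> network id from `wpa_cli list_networks` style output."""
    networks = {}
    for line in output.splitlines():
        fields = line.split("\t")
        # Data rows start with a numeric network id; header lines do not.
        if len(fields) >= 2 and fields[0].isdigit():
            networks[fields[1]] = int(fields[0])
    return networks


def disable_others_commands(output, keep_ssid, iface="wlan0"):
    """Build one `disable_network` command per profile other than keep_ssid."""
    networks = parse_list_networks(output)
    if keep_ssid not in networks:
        raise KeyError(f"SSID {keep_ssid!r} not found")
    return [
        ["wpa_cli", "-i", iface, "disable_network", str(net_id)]
        for ssid, net_id in networks.items()
        if ssid != keep_ssid
    ]


# Made-up profile list in the assumed output format.
sample = (
    "network id / ssid / bssid / flags\n"
    "0\tCampusWiFi\tany\t[CURRENT]\n"
    "1\tHomeNet\tany\t\n"
    "2\tCoffeeShop\tany\t[DISABLED]\n"
)
for cmd in disable_others_commands(sample, "CampusWiFi"):
    print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run` once you have checked it. `wpa_cli` also has a `select_network <id>` command that selects one network and disables the rest in a single step.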

 

Cheers.