M and MSM Series

Re: Losing access to network

 
ISoliman
Super Advisor

Re: Losing access to network

Use 6.2.0.1 and let me know. 6.3 has the Bonjour feature for Apple, but there is no stable version yet.
Ryeman
Occasional Contributor

Re: Losing access to network


@ISoliman wrote:
Use 6.2.0.1 and let me know. 6.3 has the Bonjour feature for Apple, but there is no stable version yet.

Hi - I've just upgraded my MSM710 to what I thought was the latest firmware, 5.7.4.0, and if anything it seems to have made the client connection issues worse! Where can I get hold of 6.2.0.1? The Download section in HP My Networking does not show this version.

PS: HP support told me this product is end of life and only begrudgingly gave me a link to their FTP site for version 5.7.4.0. I have a valid Software Care Pack.

 

Evan_ISS
Frequent Advisor

Re: Losing access to network


@Ryeman wrote:

Hi - I've just upgraded my MSM710 ...

PS: HP support told me this product is end of life and only begrudgingly gave me a link to their FTP site for version 5.7.4.0. I have a valid Software Care Pack.

 


If you have a valid Software Maintenance Care Pack (not Hardware; that doesn't qualify), you need to link it to your HPSC account. It should then (hopefully) be linked to the Software and Updates portal, where you will be able to download the latest software for your controller.

 

The MSM710 is indeed EOL, and software development stopped at version 6.0. The latest version for it is 6.0.2, if I am not mistaken.

Ryeman
Occasional Contributor

Re: Losing access to network

Looking at the extended system log for 'Controlled APs', I focused on one particular client who had just lost the network on their Windows Mobile handheld device. Here is the section where I think it starts to go wrong:

 

May  2 10:13:10 debug kernel       TW1419N0FC Sending deauthentication to 00:1a:6b:a6:3a:ed (requested by eapolserver)

 

May  2 10:13:10 debug eapolserver  TW1419N0FC Disassociating wireless client 00:1A:6B:A6:3A:ED

 

Why would the 'eapolserver' suddenly request deauthentication?
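To follow one client through a long extended log, it can help to pull out only the lines that mention its MAC address. This is a minimal sketch that assumes the log has been exported to plain text in the whitespace-separated layout shown above (month, day, time, severity, process, AP serial, message); the function name `client_events` is hypothetical and not part of the MSM software.

```python
import re

# Matches a MAC address such as 00:1a:6b:a6:3a:ed or 00:1A:6B:A6:3A:ED.
MAC_RE = re.compile(r"([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})")

def client_events(lines, mac):
    """Return (process, message) pairs for log lines that mention `mac`.

    Assumes the layout seen in the MSM extended log: month, day, time,
    severity, process, AP serial, then a free-text message.
    """
    target = mac.lower()
    events = []
    for line in lines:
        macs = [m.lower() for m in MAC_RE.findall(line)]
        if target not in macs:
            continue
        parts = line.split(None, 6)  # split off the first six fields
        if len(parts) < 7:
            continue
        _month, _day, _time, _severity, process, _ap, message = parts
        events.append((process, message))
    return events

log = [
    "May  2 10:13:10 debug kernel       TW1419N0FC Sending deauthentication to 00:1a:6b:a6:3a:ed (requested by eapolserver)",
    "May  2 10:13:10 debug eapolserver  TW1419N0FC Disassociating wireless client 00:1A:6B:A6:3A:ED",
]
for process, message in client_events(log, "00:1A:6B:A6:3A:ED"):
    print(process, "->", message)
```

The MAC comparison is case-insensitive because the kernel and eapolserver lines print the same address in different cases.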

 

 

MSM67
Occasional Advisor

Re: Losing access to network

Was firmware the only solution to these issues? I seem to be having the same problem with 6.4.1.0-17746.

 

 

 

Here are my experiences: http://h30499.www3.hp.com/t5/MSM-Series/MSM760-Clients-loosing-network-connectivity-ping-times-5-10-full/m-p/6716088/thread-id/4205#.VPfEmmB0270

 

sjordet
Advisor

Re: Losing access to network

Well, I have found that the Wi-Fi card drivers matter a lot. The MSM infrastructure seems to be more sensitive to bad Wi-Fi drivers than others, though I can't explain why.

 

I've had several versions of Intel Wi-Fi drivers give me a lot of headaches, even quite recent ones...

 

-Stian

Arimo
Respected Contributor

Re: Losing access to network

There are a lot of Intel 7240 Wi-Fi chipsets going around, and this one is known to have problems with any WLAN. Intel has published a fix, and MSM firmware 6.4.2 also includes a fix. That release contains other client connectivity fixes as well, including for DHCP connectivity.

 

So if you're entitled, I'd suggest updating.
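If you have many controllers, a quick way to see which ones are below the 6.4.2 release mentioned here is to compare version strings numerically rather than as text. This is a sketch that assumes MSM versions look like `6.4.1.0-17746` (dotted release, then a dash and a build number); `parse_version` and `at_least` are hypothetical helpers.

```python
def parse_version(s):
    """Turn '6.4.1.0-17746' into a tuple of ints: (6, 4, 1, 0, 17746)."""
    release, _, build = s.partition("-")
    parts = [int(p) for p in release.split(".")]
    parts += [0] * (4 - len(parts))  # pad '6.4.2' to four fields
    if build:
        parts.append(int(build))
    return tuple(parts)

def at_least(installed, required):
    """True if `installed` is the same release as or newer than `required`.

    Only the four dotted release fields are compared; the build number is ignored.
    """
    return parse_version(installed)[:4] >= parse_version(required)[:4]

print(at_least("6.4.1.0-17746", "6.4.2"))  # False
print(at_least("6.6.2.0-22792", "6.4.2"))  # True
```

Comparing tuples of integers avoids the classic string-comparison trap where "6.10" would sort before "6.4".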


HTH,

Arimo
HPE Networking Engineer
Dmitry2012
New Member

Re: Losing access to network

Same issue here, but I'm using one MSM720 with nine MSM410 APs. Does anyone know how to fix this?

ISHR
New Member

Re: Losing access to network

Hello,

We're facing very similar problems here at the International School in Hanover, Germany. We run MSM765 controllers with MSM422 and MSM466 APs, 74 APs in total. Approximately 250 clients use the Wi-Fi, 90% of which are MacBooks.

 

Our case ID is 4762319356

 

With kind regards,

ISHR

Break_dontfix
Occasional Visitor

Re: Losing access to network

I am also having this issue. I have read through the six pages of this thread and found that I am not alone. I have a standalone MSM410 with NO controller. I ran firmware 5.5 for about three years with NO problems (same clients and same hardware connecting to the AP, about 20 devices). After three years, I decided to upgrade the firmware so that I could connect with a browser (SSL v3 is not supported on newer OSes) and manage the device. Bad move! I upgraded about 10 days ago, and every 24-48 hours I get a call that our WiFi is dead. When I log in, it seems to be up and running, but the users have lost connectivity. A restart (from the web panel or by disconnecting the PoE cable) cures the problem and life goes on.

There is only 1 VSC and wireless security is NOT checked.

Has anyone been able to make this work? The firmware is currently at 6.6.2.0-22792.

 

Aarón
Frequent Advisor

Re: Losing access to network

Hello,

I'm having big throughput issues. We recently added a fifth controller to our team along with 100 extra APs, for a total of ~760 APs and peaks of 4,000 concurrent users. On weekends everything seems fine; we have an OpenWrt client taking continuous measurements of throughput, latency and so on. The problem appears as soon as the number of users rises.

When there are about 1,500-2,000 simultaneous users connected, the manager controller's CPU stays at 90-100% continuously until the number of users drops; then the CPU load falls (though it still has some spikes).

From what I've gathered, the culprit seems to be a process called rrdsampler. It hogs the CPU and degrades the service, including authentication: we see far more 802.1X timeouts than before, throughput drops drastically, and ping loss and latency increase. This happens even on an AP without many users, whose total throughput on the Ethernet port is very low.

No significant interference was detected. I went on site with a spectrum analyzer to check whether it could be an RF issue, but I didn't find any problems, just a nearby AP on a different channel, so there was no channel overlap (the two were 5 channels apart).
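The "5 channels apart, so no overlap" reasoning can be checked with simple arithmetic, assuming these APs were on the 2.4 GHz band (the post doesn't say). Channels 1-13 have centers at 2407 + 5·n MHz, so a 5-channel gap is 25 MHz, which clears the roughly 22 MHz-wide spectral mask commonly used for this rule of thumb. The helpers below are a hypothetical sketch of that calculation, not an MSM or RRM function.

```python
def center_mhz(channel):
    """Center frequency of a 2.4 GHz channel; channel 14 is a special case."""
    if channel == 14:
        return 2484
    if not 1 <= channel <= 13:
        raise ValueError("2.4 GHz channels are 1-14")
    return 2407 + 5 * channel

def overlaps(ch_a, ch_b, width_mhz=22):
    """True if two channels of the given width share spectrum."""
    return abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz

print(overlaps(1, 6))  # False: centers 25 MHz apart
print(overlaps(1, 5))  # True: centers 20 MHz apart
```

This is the same arithmetic behind the familiar 1/6/11 channel plan.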

I know that RRDtool is used for graphing and storing statistics, so maybe the issue is that it tries to collect too many statistics for each and every user. With few users it's fine, but when the user count spikes it just stops working.

We are running 6.6.2.0 with three VSCs. Two of them are tunneled through the controllers; the third is not tunneled (the AP sends its traffic directly onto a VLAN tagged on the AP). We are not using the team for access control, only for authentication through an external RADIUS server.

Our configuration is like this:

 - We have the lower data rates disabled (only 11 Mbps or higher is allowed) to ensure a good connection for each user.

 - RRM enabled with auto-channel, auto-power and AP load balancing.

 - Tx protection -> RTS/CTS with an RTS threshold of 1024 to mitigate the hidden node problem (we took measurements to see whether this affected overall throughput, and it didn't seem to affect it much).
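As a rough illustration of what the 1024-byte RTS threshold does: under the usual 802.11 rule, a unicast frame longer than the threshold is preceded by an RTS/CTS exchange, while shorter frames and group-addressed frames are sent without it. That would be consistent with the small throughput impact measured above, since only large frames pay the extra overhead. Vendors word the boundary slightly differently, so the strict `>` below is an assumption, and `needs_rts` is a hypothetical helper.

```python
RTS_THRESHOLD = 1024  # bytes, as configured in the post

def needs_rts(frame_len, unicast=True, threshold=RTS_THRESHOLD):
    """Decide whether a frame would be protected by RTS/CTS.

    Assumes frames strictly longer than the threshold trigger RTS/CTS;
    broadcast/multicast frames are never protected this way.
    """
    return unicast and frame_len > threshold

for size in (64, 1024, 1500):
    print(size, needs_rts(size))
```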

I have already opened a case with support, but I would like to know whether anyone else is experiencing the same issues, mostly the rrdsampler process issue. To check whether the process is hogging the CPU, SSH into the controller/AP and run top.

 

Thanks!

Aarón

CraigStrydom71
Occasional Advisor

Re: Losing access to network

Hi Aaron,

Do you have LLDP enabled?

Disabling LLDP dropped our CPU usage from 90-100% to 27-60%.

I have 4x MSM760 controllers teamed with 400x MSM460 APs on software version 6.6.2.0.

Regards,

Craig.

Aarón
Frequent Advisor

Re: Losing access to network

No, I haven't. I will try that and let you know how it goes, thanks!

Aarón
Frequent Advisor

Re: Losing access to network

I just disabled LLDP, but the CPU is still very high. Here is the output of the top command:

Mem: 2244736K used, 862380K free, 0K shrd, 315404K buff, 722896K cached
Load average: 3.51, 3.73, 3.70    (State: S=sleeping R=running, W=waiting)

  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
25481 root     R        15M   449 91.5  0.5 rrdsampler ---> That is the process that hogs the CPU
 5853 root     R       141M   449 19.1  4.6 rfmgr_sc
  478 root     S       736M   449  9.5 24.2 regng
  815 root     S       3088   449  4.0  0.0 openvpn_master
  728 root     S        36M   449  2.7  1.2 openvpn
  452 root     S       6228   449  2.7  0.2 store-devices
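To watch for the same symptom without eyeballing top, a small parser can pick out the processes above a CPU threshold from captured top output. This is a sketch that assumes the BusyBox-style column layout shown above (PID USER STATUS RSS PPID %CPU %MEM COMMAND); the 50% threshold and the name `cpu_hogs` are arbitrary choices for illustration.

```python
def cpu_hogs(top_text, threshold=50.0):
    """Return (command, %CPU) pairs for processes at or above `threshold`.

    Assumes the column order PID USER STATUS RSS PPID %CPU %MEM COMMAND.
    Header and summary lines are skipped because their first field is not a PID.
    """
    hogs = []
    for line in top_text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        cpu = float(fields[5])
        command = fields[7]
        if cpu >= threshold:
            hogs.append((command, cpu))
    return hogs

sample = """\
  PID USER     STATUS   RSS  PPID %CPU %MEM COMMAND
25481 root     R        15M   449 91.5  0.5 rrdsampler
 5853 root     R       141M   449 19.1  4.6 rfmgr_sc
  478 root     S       736M   449  9.5 24.2 regng
"""
print(cpu_hogs(sample))  # [('rrdsampler', 91.5)]
```

Captured periodically alongside the user count, this makes it easier to line up CPU spikes with the busy weekday periods described below.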

Also, here is a screenshot showing how the number of users increases (top graph), the manager controller's CPU increases as well (middle graph), and the reported bandwidth decreases (bottom graph). It happens every day except weekends, when the bandwidth is far better and more consistent.

I added a second graph with a one-week timespan where you can see the behaviour I mean.

 

Any ideas? When I get more info, I'll keep posting, along with any news from support.

 

Regards,

Aaron

CraigStrydom71
Occasional Advisor

Re: Losing access to network

Another setting to check is IGMP proxy under Home -> Network -> IGMP.
I have seen it mentioned in connection with high CPU.

Also, RADIUS accounting really adds a lot of processing. Disable it on the VSC if you do not use it.

Perhaps disable RRM as a test.

Check whether severe interference checks are enabled on the radios; they also made my CPU usage higher.

Will post again if I think of anything else.

Regards,

Craig.

Aarón
Frequent Advisor

Re: Losing access to network

Hello Craig,

This is how we have it configured:

 - IGMP is disabled

 - We need RADIUS for authentication

 - We need RRM enabled as well so the APs are assigned to channels in a way that makes sense; before using it, we had issues due to RF interference between our APs.

 - Severe interference is disabled. A previous case showed that the APs were constantly hopping between channels, which prevented RRM from running because those APs weren't in a "stable condition".

I still think it's just a graphing problem; let's see what support says about it.

 

Regards,

Aarón

CraigS1971
Valued Contributor

Re: Losing access to network

Hi Aaron,

RADIUS authentication does not need accounting unless you use it for something like bandwidth limits.
I disabled RADIUS accounting and users are still happily authenticating. ;-D

I also enabled RRM for auto channel and auto power, but then switched it off because the environment should not change all the time. Perhaps you can disable it and run it manually once a week or month.

I would like to know what HP support says.

Hope you win.

Regards,

Craig

Aarón
Frequent Advisor

Re: Losing access to network

Hi Craig,

My bad, I misread your earlier post. We only do authentication against a remote RADIUS server, no accounting whatsoever.

We have RRM enabled so that it runs automatically every night at 5:00 AM. Maybe it would be better not to run it every single day, just once a week. I'll wait to see what HP says.

As soon as I get any news on the case I'll post back.

 

Thanks again!

Aarón

PS: I hope I win too :D

Aarón
Frequent Advisor

Re: Losing access to network

Hi guys,

So, I have some news on the case. rrdsampler is a process that only manages the dashboard: it retrieves some information and then feeds the web dashboard.

This process was running at a higher priority than it should, so when many users were connected it was gathering information from all the APs plus the client statistics. When that happened, the controller had no CPU left for authentication, and even if a client could connect properly, the bandwidth was very poor.

The temporary fix is to kill that process; it only affects the dashboard, so there is no harm there.

After killing it, the wireless complaints seem to have decreased, but I'm still troubleshooting a couple of them.

Just out of curiosity, how are your RRM results? I've been checking mine, and some decisions do not make sense, like putting three nearby APs on the same channel even though the neighbouring channels have a low noise floor.

Evan_ISS
Frequent Advisor

Re: Losing access to network

I tend to trust my RF surveys more than RRM and only use it at small sites (fewer than 20 APs).

NCGnet
Advisor

Re: Losing access to network

Interesting post; we are having exactly the same issue with 4 MSM760s and 480 APs. In top I can see the rrdsampler process peaking at 90-100% CPU usage. One question, though: how do you kill the rrdsampler process? I tried from the CLI, but I think I need shell access, which asks for a challenge-response, and I don't know how to work out the response.

CraigS1971
Valued Contributor

Re: Losing access to network

Hi NCGnet,

Did you try disabling RRM, LLDP and IGMP proxy?

I did not need to kill rrdsampler.

 

Aarón
Frequent Advisor

Re: Losing access to network

Hi NCGnet,

This issue is related to firmware version 6.6.2.0 in high-client-volume environments (in our case the problem showed up at approximately 1,500 users), and it seems to be caused by the rrdsampler process running at a higher priority than it should. If the load on your manager controller is very high (you can check this with the top command through the CLI), open a ticket with support to see if they can help you, as you can't kill that process yourself.

 

Cheers,

Aarón

NCGnet
Advisor

Re: Losing access to network

Hi and thanks for the replies,

Every weekday we have between 2,000 and 2,800 concurrent guest connections, and this is when we see the same issues. However, our controllers are on V6.6.3.0-22868, and I still see the same issues as with V6.6.2.0, which we used to run.

I have opened a call with HPE regarding this. We run HP iMC with the WSM module and used to run SolarWinds Orion NPM. We recently switched off Orion, and the snmpd process, which had been hogging a lot of CPU time, dropped drastically. So I thought we had nailed it, but rrdsampler seems to be another culprit. We rarely look at the dashboard on the controllers, so losing it would be no great loss.

I'll try turning off LLDP and CDP first, and if that doesn't help, I will log another call with HPE and see where we get.

Many thanks for the help!

Rob

NCGnet
Advisor

Re: Losing access to network

Hi,

We don't run RRM, as it caused more problems than it solved when we tried it. IGMP proxy is already off too. I have just disabled LLDP and CDP to see how that affects things.

Thanks for your reply

Rob