
Re: Losing access to network

 
JvR_NM
Occasional Visitor

Re: Losing access to network

Hello,

 

Still have the same problem with version 5.5.3.0: "Error while removing vlan id:10 (radius)" errors in the log. We use RADIUS on a Windows 2003 SP1 server. Do other people with the same problem also use RADIUS on a Windows 2003 server?

snaslund
New Member

Re: Losing access to network

We get the same kind of errors, with users unable to connect and getting 169 block addresses. This is with MSM760 teamed controllers and MSM422AP. We have had so many issues getting features to work and so many mysterious log messages that we have pretty much given up and just got used to rebooting controllers regularly. We also have issues with the internal web server dying. This is all on release 5.5.2.1, and we cannot move up to the newer releases because they don't work right with the Active Directory setup we have (the NetBIOS domain name does not match the AD domain name).

 

This product line has us very frustrated in general with its extreme bugginess and with HP support telling us they have never seen a particular problem before when it is all over their own forums. We have also noticed that they ask for more and more data on a problem and then seem to start troubleshooting over from the beginning. This is not a good product line at all.

 

Steven Naslund

Senior Network Engineer

Medline Industries Inc.

Craig-Lyndes
Advisor

Re: Losing access to network

Wow!  This is still going on!

 

I posted here originally on 3/25/2010 and pursued it with HP support. One tech candidly told me that multiple standards (b,g,n) at 2.4 GHz was the problem. I set our 33 MSM410 radios to g only and the problem went away.

 

Now I have some MSM430 radios (1:1 initiative with lots of kids with lots of notebooks) and they can not be set to G only!!! 

 

THE PROBLEM IS BACK! 

 

It would appear HP has not solved this.

 

How can we band together to get HP support to at least acknowledge that this is a problem?

 

Craig Lyndes

St Albans City School

BGraham_1
Frequent Advisor

Re: Losing access to network

We have been working with HP over the past few months. Last week they told us that they have found the underlying issue. We are now waiting for them to come back with the fix. They said it would likely be in the 5.5.3 code. I will post as soon as I hear anything.

 

Bob

ISoliman
Super Advisor

Re: Losing access to network

5.5.3.0 has been released. I had the same issue and updated the firmware to that version, and for 2 days there have been no complaints. I will have to monitor for longer before confirming it 100%.

Craig-Lyndes
Advisor

Re: Losing access to network

Confirmed that problem persists.

 

Software version: 5.5.3.0-01-10326

 

MSM 760 Controller

 

MSM 410 Access Point

 

When set to B,G,N, we experience lockups where clients see SSID and show signal strength but can not reach network or Internet.  Local servers, DHCP, DNS and Default router unreachable.  Access Point is still visible from wired ethernet network.  Responds to Reset command from MSM760.  Reset or powercycling the access point solves the problem.

 

This has happened 3 times.  I am resetting the MSM 410 to G only.  Am unable to do this on the new MSM 430s!

 

5.5.3 seems to not have solved the problem.

 

Craig Lyndes

St Albans City School

ISoliman
Super Advisor

Re: Losing access to network

Please collect the filtered and unfiltered logs from the controller and upload them to the post.

 

Is your issue affecting all types of clients or only specific laptops/devices?

 

Can you please upload the config as well? Also make sure to disable/uncheck the "wireless security filter" option under the VSC, as this restricts traffic to the controller and will reset/stop the connection after a while if it is idle.

Craig-Lyndes
Advisor

Re: Losing access to network

I can't give you the log because the time that the access point quit has already rolled off.  I have reset the access point to B,G,N and will wait for it to happen again and then will capture the log.

 

When it happens it affects all clients connected to that access point. In many months of observing this I have noticed that it most often happens when there is a Mac (airport chip set) added to the large number of Intel, etc. chip sets that are most common in our environment. We are primarily Windows with only a few Macs.

 

I do not have the "wireless security filter" set.  Open, unencrypted, wireless network.  One SSID, no VLANs. 

 

Craig Lyndes

St Albans City School

payne324
Occasional Visitor

Re: Losing access to network

Hi, we are seeing the exact same symptoms and can make a similar correlation to the Mac "airport chip set" as you mentioned. We have 40 MSM710 controllers and 8 MSM760 controllers running firmware version 5.4.1 on the majority of controllers/APs and 5.5.3 on a few, and we see it everywhere. We have over 400 MSM410 access points. Currently we can only run at G only (2.4 GHz) or 5 GHz N. The moment we switch to 2.4 GHz G and N on our APs we see the symptoms listed: "lockups where clients see SSID and show signal strength but can not reach network or Internet.  Local servers, DHCP, DNS and Default router unreachable.  Access Point is still visible from wired ethernet network.  Responds to Reset command from MSM760." We cannot switch all APs over to 5 GHz N, as we support a VSC for public guest access so teachers and students can use their own Wi-Fi devices in the schools. This has become a big pain point. The lockups happen on both an access-controlled VSC and a non-access-controlled VSC. The client never becomes disassociated from its AP or the controller, and the uptime of the APs and controllers is good.

 

Hopefully there will be a firmware fix from HP soon.

 

Steve Payne

Hastings and Prince Edward District School Board

Craig-Lyndes
Advisor

Re: Losing access to network

ISoliman

 

Here are the log files and the configuration file.  The access point is a MSM410.  The controller is a MSM760.  The access point is named Server Room.  You will see that at the time I took the configuration I had reset it back to 802.11G only.  The crash was about 5 minutes previous to the time the log was taken.  Look for the access point starting up.

 

Craig Lyndes

St Albans City School

 

OK, so this forum doesn't want to make uploading easy.... 

ISoliman
Super Advisor

Re: Losing access to network

Without going any further, after checking the logs I found the following:

 

BID check failure

 

This is caused by an issue with the Internet port. Please contact support so that they can fix it for you. The link below is HP's official document on this issue:

 

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c03054362

 

After this is fixed check if the issue is still showing.

ISoliman
Super Advisor

Re: Losing access to network

Also, to add to this: from the latest logs I can see that "N" is still on. Check the logs below:

 

AutoChannelSelection: ifname=wvlan0, selecting best initial channel (1,-89dBm)
Nov 30 12:47:37 debug    kernel       SG9202C0KX 11n protection mode has been enabled due to the presence legacy access points
Nov 30 12:47:37 debug    kernel       SG9202C0KX 11n protection mode has been enabled due to the presence legacy clients

 

See the last two lines regarding the protection mode; this means N is not disabled.

 

One last thing, from the logs below (as you mentioned, the issue showed up approx. 5 mins before resetting the AP):

 

Nov 30 12:29:33 debug    kernel       SG9202C0KX
Nov 30 12:29:33 info    kernel       SG9202C0KX wvlan0: decreasing transmit power to 10
Nov 30 12:32:23 debug    monitord     SG9202C0KX Stopping: <mgetdate>
Nov 30 12:32:24 debug    mgetdate     SG9202C0KX SNTP time synchronization using server secure_msc
Nov 30 12:32:42 debug    openvpn      SG9202C0KX TLS: soft reset sec=0 bytes=10245766/0 pkts=60484/0

 

You can see that before the reset the AP decreased the power to 10. This might cause clients farther away from the AP to lose their connection and be unable to reconnect. I will not recommend anything for now since I haven't checked the config yet, but you can try disabling the auto power control option (after solving the BID issue and calling support) and then see whether it works fine or not.
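For anyone who wants to check their own exported logs for the same two events, here is a minimal sketch in Python. It assumes the log is plain text with one entry per line, laid out like the lines quoted above. The function name `find_events` and the labels are made up for illustration and are not anything the controller provides.

```python
# Substrings copied from the log lines quoted in this post.
# The labels on the right are illustrative names, not controller terminology.
MARKERS = {
    "11n protection mode has been enabled": "n_protection",
    "decreasing transmit power": "tx_power_drop",
}


def find_events(lines):
    """Return (label, line) pairs for every log line that contains a known marker."""
    hits = []
    for line in lines:
        for needle, label in MARKERS.items():
            if needle in line:
                hits.append((label, line.strip()))
    return hits


# Sample input: three of the lines quoted above.
sample = [
    "Nov 30 12:29:33 info    kernel       SG9202C0KX wvlan0: decreasing transmit power to 10",
    "Nov 30 12:32:24 debug    mgetdate     SG9202C0KX SNTP time synchronization using server secure_msc",
    "Nov 30 12:47:37 debug    kernel       SG9202C0KX 11n protection mode has been enabled due to the presence legacy clients",
]

for label, line in find_events(sample):
    print(label, "|", line)
```

To run it against a real export, pass an open file instead of `sample`, e.g. `find_events(open("controller.log"))`, where the file name is a placeholder.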

Craig-Lyndes
Advisor

Re: Losing access to network

ISoliman,

 

The reason it is upset about DHCP on the Internet port is there is nothing connected to the Internet port.  The controller is not routing.  It is only handing out configuration to the access points and gathering statistics.  The wireless network would run without the controller as long as the access points were not power cycled.

 

The only connection to the access point is the LAN (Port 2).

 

Craig Lyndes

St Albans City School

ISoliman
Super Advisor

Re: Losing access to network

Yes, I noticed that you are using the controller for management only, but this error is critical, so you had better fix it. Call support; they will ask for specific things and provide you with the fix. I also noticed you are using firmware version 5.5.3.0. Are you entitled to it? If not, I would recommend using another version, for two reasons: first, because you would not be entitled to it, and second, because this branch is not that stable yet, as it was made to support the new 460, 466 and 430 APs. Also check the config and make sure to disable the auto power control option. One more thing: do a site survey, as the controller might be decreasing the power of the AP to the minimum to avoid interference.
Craig-Lyndes
Advisor

Re: Losing access to network

ISoliman,

 

You are correct.  When the access point was rebooted it was still in b,g,n mode.  I reconfigured it after the reboot, so what you see in the logs would reflect the b,g,n configuration.  After I captured the logs I reconfigured the access point, then I exported the configuration.  Sorry to muddy things up.

 

The variable transmit power is relatively new. I have used it both set at 100% and at auto power. My access points are close together, so I hoped that by using the auto power I could decrease the RX FCS errors, which are very high. This is a separate problem from the clients getting kicked off. (maybe?)

 

The event that I tried to capture affected every device connected to the access point. My notebook was within 5' of the access point with nothing but air between. My notebook was unable to reach DNS, the default gateway, the DHCP server or other servers. The access point was, however, visible to the controller and showed no issues (which is why the logs don't seem to be of much use, at least for this problem).

 

Please keep looking though, there has to be a reason.  And if you see anything at all bring it to my attention.  And if my "rationalizations" seem flawed please point the flaws out to me.  I have spent a lot of time trying to resolve this problem, and it seems to be unsolvable.  Having better minds than mine look at it is very appreciated.

 

I'm very suspicious of the routing inside the access point. This is because the radio side seems to be functioning and the ethernet side also functions. That doesn't fit with the workaround of only using 802.11G, though: what does that have to do with routing?

 

Craig Lyndes

St Albans City School

ISoliman
Super Advisor

Re: Losing access to network

What I know is that the 5.5.3.0 branch had a big issue with clients disconnecting, which is why I recommended using another branch like 5.4.x. The only thing that forces you to use 5.5.3.0 is having the new APs; other than that I wouldn't recommend it. I received a new version from HP that solved the disconnection issue (it might be the same issue you are facing), but to be honest it might also be the interference you have, since you mentioned that the APs are close to each other, and yes, that will also cause clients to disconnect. So first of all, use 5.4.x if you don't have any of the new 460 or 430 APs. Then decrease the power automatically on all the APs to avoid the interference (better to do a site survey). There is also an option under the Radio configuration for the APs where you set the distance between the APs; to learn more about this option, please check the management and configuration guide.
Craig-Lyndes
Advisor

Re: Losing access to network

Again, Thank you ISoliman for your time.

 

I switched to 5.5.3.0 because I added 3 MSM430 radios (replacing MSM410s in classrooms that were having an issue) to accommodate our 1:1 initiative, where every student is given a laptop. 60+ students connecting simultaneously to an MSM410 locked at 802.11G was not pretty. That is also why my access points are relatively close together. I have a situation where I can have a flash mob of up to 60 devices at any time in any classroom (we have teams of 3 teachers with roughly 20 students each, so when they get the team together for a group activity it can be quite a challenge for the wireless system).

 

I have the radios set for Distance between APs: Small and I have the auto power turned on as you noted.

 

I still suspect that the Rx FCS errors and the disconnection from the network are 2 different problems. One point I made earlier is that having a Mac with an Apple Airport wifi chip set come into range of the access point seems to be able to trigger the loss of network, sometimes. This has happened often enough that I believe it is more than a coincidence. This seems to go along with the solution of locking the access points to 802.11G; it probably has to do with guard frames and the Mac jamming. I also suspect that the problem only occurs when there is a lot of network traffic. Again, probably something to do with timing. I'm not an RF engineer though, so some of this stuff isn't very clear to me.

 

Thanks again for looking at this!

 

Craig Lyndes

St Albans City School

ISoliman
Super Advisor

Re: Losing access to network

OK, then since you have 430 APs, ask HP for the latest maintenance release for 5.5.3.0 (not the one you have; there is a newer one).

Also, 5.5.x has a new feature called "Band Steering". You can use it to let one 430 AP support the maximum number of clients if the clients have dual-band wireless cards. It allows the controller to shift clients that can work in 5 GHz over to Radio 1 so that both radios are utilized; in this manner you can support 50-60 users per AP.

For the Airport wifi chip, you can try to reproduce this by getting a Mac and accessing the wireless in a specific area to see if it shows the same result. If you are able to replicate it 100% of the time with the Mac, then capture the wireless traffic from the AP side (from the controller go to groups, click on the AP, then on tools, choose wireless trace/capture and start it). Then replicate the issue, collect the logs and the capture, and log a case with HP stating all the steps you have done so that they can reproduce the issue on their side as well. This is the only way to get this fixed if it is a bug :).

FYI, I have a setup with 460 APs where clients were disconnecting a lot, same as what you have mentioned. I used the latest version provided by HP, and only after that did everything work fine for me :). I remember that the issue was showing even when no Macs were nearby. It might be that some users were using iPhones, I'm not sure about that, but anyway it was solved and I'm glad :) I hope that the latest version will work for you as well.
tschaps
Valued Contributor

Re: Losing access to network

I was experiencing a similar situation with my MSM430s in autonomous mode before we got the controller working. It would be interesting to find out if the little trick I discovered works in your situation: when you have a laptop that has lost connectivity (but still has an IP and a wifi signal), ping the gateway. If it behaves like mine, the ping should time out/fail. Note the laptop's IP address. Then ping this laptop from ANOTHER computer on the same subnet; it should work. Then see if all of a sudden your connectivity is restored.

Note, I don't tout this as a solution, but if it is the same behavior, it may help isolate the problem. 
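If you want to repeat the check the same way each time, the steps can be sketched in Python roughly as below. This is only an illustration, assuming it runs on the affected laptop and that the system `ping` command is available. The names `ping`, `classify` and `run_check` are made up, and the ping exit code is not equally reliable on every OS, so treat the result as a hint.

```python
import platform
import subprocess


def ping(host, timeout_s=5):
    """Send one echo request with the system `ping` command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(before_ok, after_ok):
    """Map the two gateway ping results onto the outcomes described above."""
    if before_ok:
        return "ok"          # gateway reachable, symptom not present right now
    if after_ok:
        return "restored"    # came back after the inbound ping: same behavior
    return "still_down"      # the inbound ping made no difference


def run_check(gateway, wait_for_remote_ping, ping_fn=ping):
    """Ping the gateway, pause for the ping from the other computer, ping again."""
    before = ping_fn(gateway)
    if before:
        return classify(True, True)
    wait_for_remote_ping()   # someone pings this laptop from another machine
    return classify(False, ping_fn(gateway))
```

On the laptop this could be called as `run_check("192.168.1.1", lambda: input("Ping this laptop from another machine, then press Enter: "))`, where the gateway address is a placeholder for your own.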

BGraham_1
Frequent Advisor

Re: Losing access to network

I believe HP finally has the problem fixed. We have been giving them information and working closely with the support rep over the past few months. They sent us an engineering release a couple of weeks ago and I installed it. Since then, we have not been getting any 169.254.x.x addresses (we were only getting these on Radio 2), and clients have not been getting any IP addresses assigned from the APs' untagged (management) VLAN on the MSM460s.
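For anyone checking a client list for the same symptom, here is a small Python sketch that flags 169.254.x.x (IPv4 link-local, self-assigned) addresses using the standard `ipaddress` module. The function name `is_apipa` and the sample addresses are made up for illustration.

```python
import ipaddress


def is_apipa(addr):
    """True if addr is an IPv4 address inside 169.254.0.0/16 (link-local)."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local


# Made-up client addresses, e.g. copied from a client status page or DHCP export.
clients = ["10.1.20.37", "169.254.13.201", "192.168.5.9"]

stuck = [a for a in clients if is_apipa(a)]
print(stuck)  # ['169.254.13.201']
```

A client showing up in `stuck` never got a DHCP lease, which matches the symptom described above.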

They did not say when they are going to release it. I asked if I could share it, but they said no.

You might want to open up a case with support and ask for version 5.5.3.0-fa2-10564.

 

I hope this helps.

Bob

temuri426
Frequent Advisor

Re: Losing access to network

hello, 

 

 

Dec 19 21:41:47 warning maestro_mas SG1383N2R6 Cannot take proper handling action on event from the system message bus, skipping (event='evLocalFirmwareRetrieval')

Dec 19 21:43:33 warning maestro_mas SG1383N2R6 Last message repeated 18 times

Dec 21 16:26:23 err kernel SG1383N2R6 e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang

Dec 21 16:26:40 err kernel SG1383N2R6 Last message repeated 4 times

Dec 21 16:26:40 warning statspoller SG1383N2R6 [44:1E:A1:C2:80:E2]: Transfer of SC statistics information failed: (28/0)

Dec 21 16:54:09 warning maestro_mas SG1383N2R6 Cannot take proper handling action on event from the system message bus, skipping (event='evLocalConfigChanged')

Dec 21 16:57:53 warning mapconf SG1383N2R6 SOAP configuration process <29215>: terminating upon maestro request.

Dec 21 16:59:35 warning maestro_mas SG1383N2R6 Cannot take proper handling action on event from the SC management message bus, skipping (event='evLocalTunnelDown')

Dec 21 17:00:54 err maestro_sla SG1383N2QV Failed to identify discovery port.

Dec 21 17:04:24 warning webs SG1383N2R6 Logout requested by login takeover

Dec 21 17:12:23 warning maestro_mas SG1383N2R6 Cannot take proper handling action on event from the system message bus, skipping (event='evLocalConfigChanged')

Dec 21 17:14:49 err mapconf SG1383N2R6 SOAP FAULT: SOAP-ENV:Client "Invalid configuration"

Dec 21 17:14:49 err mapconf SG1383N2R6 Detail: <error><errorcode>1009</errorcode><errorinfo>Configuration commit timeout.</errorinfo></error>

Dec 21 17:14:49 err mapconf SG1383N2R6 [98:4B:E1:E8:23:9E] CONFIG_full_vsc_security() failed, VSC <Agruni/2/2>.

Dec 21 17:14:58 warning maestro_mas SG1383N2R6 Cannot take proper handling action on event from the SC management message bus, skipping (event='evLocalTunnelDown')

Dec 21 17:16:30 warning monitord SG1383N2QV Watchdog Update after 32 seconds uptime=156855

Dec 21 17:16:30 err maestro_sla SG1383N2QV Failed to identify discovery port.

 

 

many thanks

 

mikeguk
New Member

Re: Losing access to network

We have been having the same issue with our 760.  

 

In case others are interested, 5.7.0 is now out and lists the following as an explicit fix

https://h10145.www1.hp.com/Downloads/SoftwareReleases.aspx?ProductNumber=J9421A&lang=en,en&cc=us,us&prodSeriesId=3963981

 

"47397 Devices become unreachable on the management LAN after 20 to 30 minutes."

 

Installed this morning and now waiting to see if the problem is finally resolved.

temuri426
Frequent Advisor

Re: Losing access to network

hi all, 

After updating to 5.7.0, it looks like the problem is fixed.

 

P.S. After we upgraded the firmware the problem came back once, two days later, but I restarted all devices and now everything works well.

    

                   12 days uptime :))  cool

goAtsy
Occasional Contributor

Re: Losing access to network

mikeguk that fix is in the release notes for Firmware 5.4.0. HP must have posted the wrong document. This page links to the correct release notes for 5.7...

 

https://h10145.www1.hp.com/downloads/SoftwareReleases.aspx?ProductNumber=J9325A

 

 

nogirs
Occasional Contributor

Re: Losing access to network

Hi everybody, I read through this interesting thread, as we are having the same trouble. I've gotten confused by the many solutions mentioned.

 

Is 5.7 the answer, or has 5.3 provided the most stable solution?

 

Also, we're on 5.4.2 - can I jump to 5.7 or should I do the ladder (5.3, 5.5, 5.7)?

 

Thanks