Software Defined Networking

controller lose connection to switch

 
sllow
Occasional Advisor

Hi

 

I have a strange problem: the controller keeps losing its connection to the switch (a 2920-24G).

This is a test environment with minimal load.

 

I am using HP Protector 1.3.13.458, HP SDN controller 2.5.15.1175, and our own custom-built app.

 

Can anyone give me a pointer on how to troubleshoot this?

 

Thanks

 

[2015-07-27 01:36:59.077] ERROR http-bio-8443-exec-508       hp.keystone                                                       Failed to validate token 2250accef6414d17b97de9f6e41be635 due to com.hp.api.auth.AuthenticationException: Validation error code 404
[2015-07-27 01:36:59.092] WARN  of-io-74-thread-6            hp.of.ctl                                                         Datapath REVOKED: 10:00:40:a8:f0:ce:86:40, neg=V_1_3, ip=192.168.10.101
[2015-07-27 01:36:59.092] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I SwitchListener event DATAPATH_REVOKED
[2015-07-27 01:36:59.092] ERROR of-io-74-thread-6            hp.of.ctl                                                         Intercepted unexpected exception: java.lang.IllegalStateException: Main connection already established for dpid 10:00:40:a8:f0:ce:86:40
  com.hp.of.ctl.impl.OpenflowController.panic(OpenflowController.java:1151)
  com.hp.of.ctl.impl.OpenflowController.newMainConnectionReady(OpenflowController.java:859)
  com.hp.of.ctl.impl.OpenflowController.handshakeComplete(OpenflowController.java:848)
  com.hp.of.ctl.impl.OpenflowMessageBuffer.handshakeComplete(OpenflowMessageBuffer.java:234)
  com.hp.of.ctl.impl.OpenflowConnection.inBoundFeaturesReply(OpenflowConnection.java:256)
  ...
[2015-07-27 01:37:00.080] ERROR http-bio-8443-exec-527       hp.keystone                                                       Failed to validate token 2250accef6414d17b97de9f6e41be635 due to com.hp.api.auth.AuthenticationException: Validation error code 404

[2015-07-27 01:37:08.490] WARN  of-idle-timer                hp.of.ctl                                                         Closing unresponsive connection from 192.168.10.101
[2015-07-27 01:37:08.492] INFO  of-idle-timer                hp.of.ctl                                                         Datapath removed: 10:00:40:a8:f0:ce:86:40, neg=V_1_3, ip=192.168.10.101
[2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Device Owner event type: OWNERSHIP_LOST. Datapath ID: 10:00:40:a8:f0:ce:86:40
[2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 New owner: null
[2015-07-27 01:37:08.492] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I SwitchListener event DATAPATH_DISCONNECTED
[2015-07-27 01:37:08.492] INFO  DpQPool-9-thread-6           com.mimos.nc.listeners.SwitchListener                            DE0005I Remove switch 10:00:40:a8:f0:ce:86:40 Clear client lists
[2015-07-27 01:37:08.493] INFO  DpQPool-9-thread-6           com.mimos.nc.impl.NetworkAccessControlSwitchManager              DE0005I --[NetworkAccessControlSwitchManager removeSwitch] Removed switch 10:00:40:a8:f0:ce:86:40
[2015-07-27 01:37:08.493] INFO  MsgDispatcher-5932-thread-1  com.hp.magellan.devicethrottling.ThrottlerServiceImpl            DE0005I Stopping device counter for dpid 10:00:40:a8:f0:ce:86:40.
[2015-07-27 01:37:08.492] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Previous owner: null
[2015-07-27 01:37:08.493] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Is the controller master for the dpid 10:00:40:a8:f0:ce:86:40? true
[2015-07-27 01:37:08.493] INFO  devown-lh-18-thread-4        com.hp.magellan.ha.MagellanSystem                                 Stopping device actor for dpid 10:00:40:a8:f0:ce:86:40.
[2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-24 com.hp.magellan.ha.actor.ControllerActor                          Received device message: {dpid: 10:00:40:a8:f0:ce:86:40, request: STOP}
[2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-24 com.hp.magellan.ha.actor.ControllerActor                          Stoping device actor for dpid: 10:00:40:a8:f0:ce:86:40
[2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-23 com.hp.magellan.bandwidthmonitor.actor.BandwidthControllerActor   Stoping device bandwidth actor for dpid: 10:00:40:a8:f0:ce:86:40
[2015-07-27 01:37:08.494] INFO  .actor.default-dispatcher-23 System.out                                                        [ERROR] [07/27/2015 01:37:08.494] [mysystem-akka.actor.default-dispatcher-23] [akka://mysystem/user/bandwidthcontrolleractor/bandwidthactor10:00:40:a8:f0:ce:86:40] Kill (akka.actor.ActorKilledException)
[2015-07-27 01:37:08.495] INFO  .actor.default-dispatcher-23 System.out                                                        [ERROR] [07/27/2015 01:37:08.494] [mysystem-akka.actor.default-dispatcher-24] [akka://mysystem/user/controlleractor/deviceactor10:00:40:a8:f0:ce:86:40] Kill (akka.actor.ActorKilledException)
[2015-07-27 01:37:08.550] INFO  sdn-topo-12-thread-1         hp.sdn.net.topo                                                  DE0005I Received new topology: DefaultTopology{supplierId=com.hp.sdn.topo.compute, activeAt=2015-07-27T05:37:08.550Z, deviceCount=0, clusterCount=0, linkCount=0, data=DefaultTopologyData{ts=418637434599668, computeTime=28us}} due to: DEVICE_AVAILABILITY_CHANGED:1

Carlos
Frequent Advisor

Re: controller lose connection to switch

sllow,

 

I am not sure whether you have already seen this thread, but the two issues seem to be related:

 

http://h30499.www3.hp.com/t5/SDN-Discussions/Keystone-Validation-Error-Code-404/td-p/6646356#.VbeDafnOd1A

 

If you still have issues then let us know and we will try to help.

 

Best Regards,

 

Carlos

CoE SDN Team

sllow
Occasional Advisor
Solution

Re: controller lose connection to switch

Thanks

In my case, I realized that this error

[2015-07-28 23:10:28.008] ERROR http-bio-8443-exec-1005      hp.keystone                                                       Failed to validate token 0937a63c15484b1abf2152f3cabe1ee1 due to com.hp.api.auth.AuthenticationException: Validation error code 404

appears when you log in to, say, https://172.16.4.8:8443/sdn/ui/ and then leave the page open and idle instead of closing it. Once the page shows "expired", you get the error above. All you need to do is close the page in your web browser and the error goes away.


Anyway, my actual problem is more related to this:

[2015-07-27 01:36:59.092] ERROR of-io-74-thread-6            hp.of.ctl                                                         Intercepted unexpected exception: java.lang.IllegalStateException: Main connection already established for dpid 10:00:40:a8:f0:ce:86:40
  com.hp.of.ctl.impl.OpenflowController.panic(OpenflowController.java:1151)
  com.hp.of.ctl.impl.OpenflowController.newMainConnectionReady(OpenflowController.java:859)
  com.hp.of.ctl.impl.OpenflowController.handshakeComplete(OpenflowController.java:848)
  com.hp.of.ctl.impl.OpenflowMessageBuffer.handshakeComplete(OpenflowMessageBuffer.java:234)
  com.hp.of.ctl.impl.OpenflowConnection.inBoundFeaturesReply(OpenflowConnection.java:256)
 

 

After some investigation, it seems to be related to a random deadlock that occurs in our application's event processing thread.
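
For anyone chasing a similar hang, here is a minimal sketch of how a deadlock like this could be confirmed from inside the application, assuming it runs on a standard JVM. It uses only the JDK's ThreadMXBean; the class name DeadlockProbe is made up for illustration and is not part of the HP SDN controller API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any HP library.
final class DeadlockProbe {

    private DeadlockProbe() {
    }

    /**
     * Returns the names of threads that the JVM currently reports as
     * deadlocked on object monitors or ownable synchronizers.
     * The list is empty when no deadlock is detected.
     */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null means "no deadlock"
        List<String> names = new ArrayList<String>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            // An entry can be null if the thread ended in the meantime.
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }
}
```

One option is to call deadlockedThreadNames() periodically from a watchdog thread and log a non-empty result. A thread dump taken with jstack while the switch is dropping should show the same information.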

Whenever that occurs, it blocks the event processing thread and triggers the problem above. I suspect it causes the controller to get stuck and then panic; once it panics, it re-creates the connection to the switch.
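
A common way to keep a stuck handler from stalling the thread that delivers events is to hand the work to a separate, bounded worker and return from the callback immediately. The following is only a sketch of that general pattern, not necessarily the fix used here and not controller API; EventHandoff and its methods are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: moves event handling off the listener thread.
final class EventHandoff {

    // One worker thread with a bounded queue, so a stuck handler
    // cannot grow memory without limit.
    private final ThreadPoolExecutor worker;

    EventHandoff(int queueCapacity) {
        worker = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /**
     * Intended to be called from the listener callback. Never blocks:
     * returns false if the queue is full or the worker has been shut down,
     * so the caller can log or count the dropped event.
     */
    boolean submit(Runnable task) {
        try {
            worker.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Stops accepting new tasks; tasks already queued still run. */
    void shutdown() {
        worker.shutdown();
    }
}
```

With this shape, a deadlock inside a handler still stalls the worker, but the thread that called submit() is free to return, and a growing count of rejected events becomes an early warning sign.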

 

Once the deadlock issue was solved, the problem went away.

 

Thanks for your help!

 

Carlos
Frequent Advisor

Re: controller lose connection to switch

Sllow,

 

Could you provide us with more information on what application is causing the controller to panic?

 

Best Regards,

 

Carlos