Software Defined Networking

EightBitRob
Occasional Advisor

HP 3800 switches disconnect into Backoff status after controller reboot

So, now that I have an updated deployment of the HP VAN SDN Controller with a couple of switches in place, I'm having some consistent usability problems that I'm wondering if anyone else has experienced.  My deployment goes off without a hitch, I create an aggregate instance, I define my controller, and everything is as happy as can be...

 

Until I reboot the controller.

 

At which point both switches fall into Backoff status (an expected behavior, if I understand all of this correctly).  The problem is, however, that they remain there indefinitely.  I can disable and reenable OF globally or at an instance level, I can remove and re-add the OF instance entirely, I can change the backoff interval (60 seconds by default) to 1 second, but nothing I do will re-engage those switches into the OF topology.  They stay in Backoff status forever.

 

The only solution I have to date that allows me to cleanly demo is to keep a snapshot at the moment that deployment is complete and at its cleanest state...  And then revert to that snapshot if the controller loses power or reboots, at which point the switches re-engage immediately and everything is normal.  Obviously not optimal.

 

Any thoughts?

ScottReeve
Advisor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

I've never had any issues with switches re-connecting smoothly after a controller reboot.

 

Are both switches that are connected to the controller Procurve?

Have you tried with any other switches?

 

Do you have a wireshark trace that shows the full power cycle of the controller?

(a pc into a hub that has the switches and controller would be optimal)

 

You say that removing/inserting the OF instances gets it going again?

When you say "reboot the controller" do you mean restart the controller (service sdnc restart) or restart the entire server?

 

What happens in each case:

1) only restarting the service

2) restarting the whole server

 

Also: can you elaborate on "snapshot"?  Is this a snapshot as in VMware, etc.?  I.e., are you running the controller in a VM? (Not that it should matter one bit; we all do.)

 

Regards,

 

Scott

EightBitRob
Occasional Advisor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

****UPDATE!!!  PLEASE DISREGARD THE BELOW POST AND REFER TO THE FOLLOWUP BELOW!!  ISSUE IS STILL UNRESOLVED****

 

Thanks for the reply.  I've been digging in a little deeper and seem to have found the culprit, but I'll start by answering your questions...  Interestingly enough, the problem was present with the restarting of both the SDNC service as well as the server in its entirety.  Also, removing the OF instance and even the entire OF config from the switch side did NOT resolve the issue.  The only thing that did was reverting to snapshot, which was indeed a VMware snapshot.

 

I concluded that the issue had something to do with some sort of inconsistency that happened somewhere in the server when it rebooted or restarted the process.  Strangely enough, however, it never occurred to me to check the firewall because Ubuntu, by its nature, does NOT apply its own firewall rules in a clean installation, and I never ran into any problems with the web interface, SSH, or any other connectivity.

 

Turns out that the new SDN package writes some firewall rules for Cassandra into an iptables script and applies them.  It also turns out that, for some reason, those rules block OF communication upon connection loss.  Go figure.  I haven't dug too deeply into netstat yet to figure out if there's some sort of outgoing port randomization that's happening with each connection or what, but I find it incredibly strange that the communication is perfect on initial installation, and then breaks the second the connection is terminated unless the firewall rules are purged.
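One way to separate a firewall problem from a controller problem is to test whether the controller's OpenFlow listener accepts new TCP connections at all, from a host on the same VLAN as the switches. The following is a minimal Python sketch, not a definitive diagnostic. It assumes the controller listens on TCP 6633 (the common legacy OpenFlow port) or 6653 (the IANA-assigned one); the function name `port_open` is made up for illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a new TCP connection to host:port succeeds within timeout.

    A False result means the port is closed, filtered, or unreachable;
    this check alone cannot tell those cases apart.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, calling `port_open("172.17.215.240", 6633)` before and after a controller restart would show whether new connections are being refused or dropped at the server, independent of anything the switches are doing. The address here is just a placeholder for the controller's IP.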

 

Either way, the crisis is averted and I can go about my day.  If anyone has any thoughts on what might be causing this, I'd like to clean up the firewall scripts a little bit.  I'm not going to mark a solution yet because, as my old IT mentor used to say, disabling security features is NOT a solution.  :)

EightBitRob
Occasional Advisor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

Turns out that the issue described in the above post remains inexplicably unresolved.  I was able to get the switches to re-engage with the controller by flushing the iptables rules, but that was either a one-time thing or a fluke in timing, because it has not been successful since.
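Rather than flushing everything, it can help to look at how the rules currently in effect treat the OpenFlow port. The sketch below scans text in the format printed by `iptables -S` and reports the INPUT chain's default policy plus any INPUT rules that name a given destination port. It is a simplified illustration: port ranges, multiport matches, and user-defined chains are ignored, the function name `summarize_input_rules` is hypothetical, and the `SAMPLE` rule set is invented for the example rather than copied from the controller.

```python
import shlex


def summarize_input_rules(rules_text, port):
    """Summarize how the INPUT chain in `iptables -S` output treats a TCP port.

    Returns (policy, matching_rules): the INPUT default policy (None if the
    text does not list one) and the INPUT rules whose --dport equals `port`.
    """
    policy = None
    matches = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) >= 3 and tokens[0] == "-P" and tokens[1] == "INPUT":
            policy = tokens[2]
        elif len(tokens) >= 2 and tokens[0] == "-A" and tokens[1] == "INPUT":
            if "--dport" in tokens:
                idx = tokens.index("--dport")
                if idx + 1 < len(tokens) and tokens[idx + 1] == str(port):
                    matches.append(line.strip())
    return policy, matches


# Hypothetical rule set for illustration only (NOT taken from the controller).
SAMPLE = """\
-P INPUT DROP
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 9160 -j ACCEPT
"""
```

With a rule set shaped like `SAMPLE`, `summarize_input_rules(SAMPLE, 6633)` returns a `DROP` policy and no matching rules. That combination would let already-established sessions continue while blocking new ones, which would at least be consistent with the symptom, though that is only a guess until the real rules are inspected.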

 

To further address the questions, both switches are HP ProCurve 3800s with the most recent firmware (KA.15.15.0008), and no amount of OF reconfiguration on the switch side is able to resolve the Backoff status.  The issue appears when the server is rebooted as well as when the SDNC service is restarted.

sdnindia
Trusted Contributor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

Hello EightBitRob,

 

Following up to see whether your problem is solved or you are still facing the issue.

If you are still facing a problem, please let us know the details.

 

Thanks,

HP SDN Team

Gerhard Roets
Esteemed Contributor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

Hi EightBitRob

 

Would you mind posting the switch configuration, please?

 

Thanks in advance

Gerhard

EightBitRob
Occasional Advisor

Re: HP 3800 switches disconnect into Backoff status after controller reboot

Unfortunately, this issue is not resolved.

 

Switch config listed below.  I've simplified it from its original deployment in an attempt to pinpoint the problem, but there's no solution as of yet:

 

Running configuration:

; J9575A Configuration Editor; Created on release #KA.15.15.0008
; Ver #06:08.19.ff.ff.3f.ef:c7

hostname "d-sdn-sw-1"
module 1 type j9575x
timesync sntp
sntp unicast
sntp server priority 1 172.17.215.240
time timezone -8
ip default-gateway 172.17.215.254
snmp-server community "public" unrestricted
openflow
   controller-id 1 ip 172.17.215.240 controller-interface vlan 215
   instance aggregate
      controller-id 1
      enable
      exit
   enable
   exit
oobm
   ip address dhcp-bootp
   exit
vlan 1
   name "DEFAULT_VLAN"
   no untagged 1-26
   no ip address
   exit
vlan 215
   name "management_traffic"
   tagged 24
   ip address 172.17.215.253 255.255.255.0
   exit
vlan 216
   name "sdn_traffic"
   tagged 1-26
   no ip address
   exit
spanning-tree
password manager