03-26-2021 08:16 AM
How to recover from hold time expired for BGP peer link between Aruba switch and Kubernetes speaker?
When "hold time expired" occurs on the peer link, the switch BGP state machine goes back to the IDLE state. It stays stuck in IDLE until the user runs "clear bgp neighbor_IP_address". Even after I restarted the Kubernetes speaker pod, the peer link between the Kubernetes speaker and the Aruba 8320 was still NOT established.
The speaker log showed that it received "connection reset by peer" about every 90 seconds, and the switch BGP debug log showed "A connection's FSM state has deteriorated" about every 15 seconds. I wonder what the correct way to recover from "hold time expired" is. Should the switch BGP agent restart the FSM after a couple of "connection reset" events? Should this be considered a bug in the speaker or in the switch, or is it working as designed? Thanks
The following is from the Kubernetes speaker log:
{"caller":"bgp.go:58","error":"read OPEN from \"10.252.0.2:179\": read tcp 10.252.0.17:51963-\u003e10.252.0.2:179: read: connection reset by peer","localASN":65533,"msg":"failed to connect to peer","op":"connect","peer":"10.252.0.2:179","peerASN":65533,"ts":"2021-03-24T15:32:08.897404559Z"}
{"caller":"bgp.go:58","error":"read OPEN from \"10.252.0.2:179\": read tcp 10.252.0.17:43437-\u003e10.252.0.2:179: read: connection reset by peer","localASN":65533,"msg":"failed to connect to peer","op":"connect","peer":"10.252.0.2:179","peerASN":65533,"ts":"2021-03-24T15:34:08.89886829Z"}
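The spacing of the speaker's failed connection attempts can be measured directly from the `ts` field of these JSON lines. The following is a minimal Python sketch, not part of the original post; the helper names (`parse_ts`, `reset_times`, `gaps_seconds`) are hypothetical, and the only input is the two log lines quoted above. For those two entries the gap comes out at roughly two minutes, so more log lines would be needed to confirm the retry period.

```python
import json
from datetime import datetime, timezone

# The two speaker log lines quoted in the post, copied verbatim (raw strings
# keep the JSON escapes intact).
SPEAKER_LOG = [
    r'{"caller":"bgp.go:58","error":"read OPEN from \"10.252.0.2:179\": read tcp 10.252.0.17:51963-\u003e10.252.0.2:179: read: connection reset by peer","localASN":65533,"msg":"failed to connect to peer","op":"connect","peer":"10.252.0.2:179","peerASN":65533,"ts":"2021-03-24T15:32:08.897404559Z"}',
    r'{"caller":"bgp.go:58","error":"read OPEN from \"10.252.0.2:179\": read tcp 10.252.0.17:43437-\u003e10.252.0.2:179: read: connection reset by peer","localASN":65533,"msg":"failed to connect to peer","op":"connect","peer":"10.252.0.2:179","peerASN":65533,"ts":"2021-03-24T15:34:08.89886829Z"}',
]


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp, truncating sub-microsecond digits."""
    head, _, frac = ts.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]  # pad or truncate to exactly 6 digits
    parsed = datetime.strptime(f"{head}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def reset_times(lines):
    """Return sorted timestamps of entries whose error mentions a TCP reset."""
    times = []
    for line in lines:
        entry = json.loads(line)
        if "connection reset by peer" in entry.get("error", ""):
            times.append(parse_ts(entry["ts"]))
    return sorted(times)


def gaps_seconds(times):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps_seconds(reset_times(SPEAKER_LOG)):
        print(f"{gap:.3f} s between resets")
```

Feeding the full speaker log into `reset_times` instead of the two sample lines would show whether the retry interval is stable.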
The following is from the switch BGP debug log:
2021-03-24:15:01:43.038654|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|FSM Input = 0X08VRF Name = default.
2021-03-24:15:01:43.038630|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|New state = 0X00
2021-03-24:15:01:43.038606|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Previous state = 0X01
2021-03-24:15:01:43.038582|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Incoming? = False
2021-03-24:15:01:43.038559|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Scope ID = 0
2021-03-24:15:01:43.038535|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote port = 0
2021-03-24:15:01:43.038512|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote address = 10.252.0.17
2021-03-24:15:01:43.038487|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local port = 0
2021-03-24:15:01:43.038463|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local address = (none)
2021-03-24:15:01:43.038437|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Entity index = 268763136
2021-03-24:15:01:43.038407|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.
Thank you!
03-28-2021 12:54 AM
Re: How to recover from hold time expired for BGP peer link between Aruba switch and Kubernetes spea
Hello,
It seems the device is not getting a response from its neighbor, which is why the hold timer is expiring.
I have some questions:
1. Was the BGP neighborship established earlier?
2. If yes, were any recent changes made?
3. What is the product number of the device ('JXXXXX'), and which one is the neighbor device?
4. Please share the 'show log' or 'display log' output from the device.
5. Have you tried swapping the physical cables?
6. What is the running software version?
Thanks!
03-29-2021 06:29 AM
Re: How to recover from hold time expired for BGP peer link between Aruba switch and Kubernetes spea
Hi akg7 HPE pro
Thank you very much for the response.
That is correct: the device is not getting a response from its neighbor, which is why the hold time expires.
My question is how to recover from "hold time expired". The symptom is that I left the cluster running overnight and found some of my BGP links in the "idle" state in the morning. The switch BGP debug log showed a "hold time expired" error. I think there is a problem in the cluster, which we are debugging, that caused the Kubernetes speaker pod to stop responding or sending keepalives for 90 seconds, thus causing the "hold time expired". However, the cluster heals itself, but the BGP peer link does not. In the morning I did not need to touch anything on the cluster side; I just ran "clear bgp xxx.xxx.xxx.xxx" (the neighbor IP address) and the link was re-established.
My answers to your questions are the following:
1. Was the BGP neighborship established earlier?
Yes, the BGP neighborship was established earlier.
2. If yes, were any recent changes made?
I think there is a problem in the cluster that caused the Kubernetes speaker pod to hang for a while overnight.
3. What is the product number of the device ('JXXXXX'), and which one is the neighbor device?
Service OS Version : TL.01.05.0003
BIOS Version : TL-01-0013
kibo-mgmt-sw0# show device
Invalid input: devic
kibo-mgmt-sw0# show system
Hostname : kibo-mgmt-sw0
System Description : TL.10.05.0001
System Contact :
System Location :
Vendor : Aruba
Product Name : JL581A 8320
Chassis Serial Nbr : TW04KCW00G
Base MAC Address : 20677c-539fc0
ArubaOS-CX Version : TL.10.05.0001
In the log:
10.252.0.17 is the neighbor device, which is a Kubernetes speaker pod IP address.
10.252.0.2 is the Aruba switch IP address.
4. Please share 'show log' or 'display log' output from the device?
2021-03-24:15:01:43.038654|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|FSM Input = 0X08VRF Name = default.
2021-03-24:15:01:43.038630|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|New state = 0X00
2021-03-24:15:01:43.038606|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Previous state = 0X01
2021-03-24:15:01:43.038582|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Incoming? = False
2021-03-24:15:01:43.038559|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Scope ID = 0
2021-03-24:15:01:43.038535|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote port = 0
2021-03-24:15:01:43.038512|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote address = 10.252.0.17
2021-03-24:15:01:43.038487|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local port = 0
2021-03-24:15:01:43.038463|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local address = (none)
2021-03-24:15:01:43.038437|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Entity index = 268763136
2021-03-24:15:01:43.038407|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.
2021-03-24:15:01:28.036738|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|FSM Input = 0X08VRF Name = default.
2021-03-24:15:01:28.036714|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|New state = 0X00
2021-03-24:15:01:28.036690|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Previous state = 0X01
2021-03-24:15:01:28.036665|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Incoming? = False
2021-03-24:15:01:28.036641|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Scope ID = 0
2021-03-24:15:01:28.036617|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote port = 0
2021-03-24:15:01:28.036594|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote address = 10.252.0.17
2021-03-24:15:01:28.036569|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local port = 0
2021-03-24:15:01:28.036545|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local address = (none)
2021-03-24:15:01:28.036520|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Entity index = 268763136
2021-03-24:15:01:28.036483|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.
2021-03-24:15:01:13.037499|hpe-routing|LOG_DEBUG|AMM|-|BGP|BGP_EVENT|send_bgpBackwardTransition Trap Remote Address= 10.252.0.17 LastError = , status 1, vrf name t
2021-03-24:15:01:13.037468|hpe-routing|LOG_DEBUG|AMM|-|BGP|BGP_EVENT|Peer status - 1, vrfId - 0
2021-03-24:15:01:13.035246|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Entity index: 268763136VRF Name = default.
2021-03-24:15:01:13.035222|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Passive:? False
2021-03-24:15:01:13.035199|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Neg hold time: 90
2021-03-24:15:01:13.035176|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Scope ID: 0
2021-03-24:15:01:13.035153|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote port: 0
2021-03-24:15:01:13.035129|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote address: 10.252.0.17
2021-03-24:15:01:13.035105|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local port: 0
2021-03-24:15:01:13.035081|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local address: (none)
2021-03-24:15:01:13.035056|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection has left Established state.
2021-03-24:15:01:13.034713|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|FSM Input = 0X03VRF Name = default.
2021-03-24:15:01:13.034690|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|New state = 0X00
2021-03-24:15:01:13.034666|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Previous state = 0X06
2021-03-24:15:01:13.034642|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Incoming? = False
2021-03-24:15:01:13.034618|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Scope ID = 0
2021-03-24:15:01:13.034595|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote port = 0
2021-03-24:15:01:13.034572|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Remote address = 10.252.0.17
2021-03-24:15:01:13.034548|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local port = 0
2021-03-24:15:01:13.034525|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Local address = (none)
2021-03-24:15:01:13.034501|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|Entity index = 268763136
2021-03-24:15:01:13.034476|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.
2021-03-24:15:01:13.034121|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Error subcode = Unspecific (0)VRF Name = default.
2021-03-24:15:01:13.034097|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Error code = Hold Timer Expired (4)
2021-03-24:15:01:13.034074|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Remote BGP ID = 10.252.0.17
2021-03-24:15:01:13.034051|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Remote AS number = 65533
2021-03-24:15:01:13.034026|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Scope ID = 0
2021-03-24:15:01:13.033996|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Remote port = 0
2021-03-24:15:01:13.033973|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Remote address = 10.252.0.17
2021-03-24:15:01:13.033950|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Local port = 0
2021-03-24:15:01:13.033927|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|Local address = (none)
2021-03-24:15:01:13.033903|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|NM entity index = 268763136
2021-03-24:15:01:13.033878|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|problem.
2021-03-24:15:01:13.033848|hpe-routing|LOG_ERR|AMM|-|BGP|BGP|A NOTIFICATION message is being sent to a neighbor due to an unexpected
5. Have you tried to swap the physical cables?
I did not swap the physical cables because I do not think this is a cable problem. I can recover with "clear bgp xxx.xxx.xxx.xx" without touching any hardware.
6. What is the running software version?
kibo-mgmt-sw0# show version
-----------------------------------------------------------------------------
ArubaOS-CX
(c) Copyright 2017-2020 Hewlett Packard Enterprise Development LP
-----------------------------------------------------------------------------
Version : TL.10.05.0001
Build Date : 2020-07-09 18:09:54 PDT
Build ID : ArubaOS-CX:TL.10.05.0001:53cb98af4936:202007092355
Build SHA : 53cb98af4936ce4b2e61fb4bb9dde7e7925c5e29
Active Image : secondary
Service OS Version : TL.01.05.0003
BIOS Version : TL-01-0013
Thank you very much for your help.
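The retry cadence visible in the switch debug log above can be checked with a short script. This is a minimal Python sketch, not something from the switch or its documentation; `parse_line` and `event_gaps` are hypothetical helper names, and the pipe-delimited field layout is assumed from the log lines as pasted. It uses the three "A connection's FSM state has deteriorated" entries from that log and prints the gap between them, which comes out at about 15 seconds each.

```python
from datetime import datetime

# The three "deteriorated" entries from the switch debug log in this post,
# copied verbatim (the log is listed newest first).
SWITCH_LOG = [
    "2021-03-24:15:01:43.038407|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.",
    "2021-03-24:15:01:28.036483|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.",
    "2021-03-24:15:01:13.034476|hpe-routing|LOG_INFO|AMM|-|BGP|BGP|A connection's FSM state has deteriorated.",
]

MARKER = "FSM state has deteriorated"


def parse_line(line):
    """Split one log line into (timestamp, severity, message).

    Assumed layout: timestamp|process|severity|...|message, with the
    message as the eighth pipe-delimited field.
    """
    fields = line.split("|", 7)
    ts = datetime.strptime(fields[0], "%Y-%m-%d:%H:%M:%S.%f")
    return ts, fields[2], fields[7]


def event_gaps(lines, marker=MARKER):
    """Return gaps in seconds between consecutive lines containing marker."""
    times = sorted(ts for ts, _, msg in map(parse_line, lines) if marker in msg)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in event_gaps(SWITCH_LOG):
        print(f"{gap:.3f} s between FSM deterioration events")
```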
04-02-2021 02:54 PM
Re: How to recover from hold time expired for BGP peer link between Aruba switch and Kubernetes spea
Hello @RichardYu,
Apologies for delayed response.
I can see here that a NOTIFICATION message was sent and the FSM state deteriorated.
Keepalives ensure that BGP neighbors are still alive; the default keepalive interval is 60 seconds. If a device does not receive a keepalive from a peer within the hold-time period, it declares that peer dead. The default hold time is 180 seconds.
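The hold-timer rule described above can be illustrated with a small model. This is a simplified Python sketch, not the switch's implementation; `first_hold_expiry` is a hypothetical function, times are plain seconds, and the 30-second keepalive spacing in the example is an assumption. The 90-second hold time is the negotiated value shown in the switch log ("Neg hold time: 90").

```python
from typing import Iterable, Optional


def first_hold_expiry(
    arrivals: Iterable[float],
    hold_time: float,
    start: float = 0.0,
    end: Optional[float] = None,
) -> Optional[float]:
    """Return the time at which the hold timer first expires, or None.

    arrivals: times (seconds) at which a KEEPALIVE or UPDATE is received.
    hold_time: negotiated hold time in seconds.
    start: time at which the session reached Established.
    end: end of the observation window, if known.
    """
    deadline = start + hold_time
    for t in sorted(arrivals):
        if t > deadline:
            # The message came too late; the timer had already fired.
            return deadline
        # Each received message restarts the hold timer.
        deadline = t + hold_time
    if end is not None and end > deadline:
        # No further messages arrived before the window closed.
        return deadline
    return None


if __name__ == "__main__":
    # Keepalives at 30 s and 60 s (assumed spacing), then the peer goes silent.
    print(first_hold_expiry([30.0, 60.0], hold_time=90.0, end=300.0))  # 150.0
```

In this model the peer is declared dead 90 seconds after the last keepalive it received, which matches the overnight symptom of a speaker pod that stops sending keepalives for longer than the hold time.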
I don't think recovering from the hold-time expiry alone will help you here. This issue needs more investigation from support.
Please log a case on the HPE Support Center portal for further resolution, using the link: https://support.hpe.com/hpesc/public/home/
Thanks!
04-05-2021 01:20 PM
Re: How to recover from hold time expired for BGP peer link between Aruba switch and Kubernetes spea
Thank you, akg7
I will open a support case.
Thanks again.
Richard.