Help finding cause of core switch crash
03-07-2019 11:58 AM
Recently all four of our core switches (model H3C S5820) crashed without warning, affecting network connectivity for 500 users. These switches are in an IRF stack. When they came back up, they reported the message "invalid version". We discovered that when they rebooted, they tried to update to the firmware that had been added to the switches 3 years ago. The reason for the "invalid version" message is that we had to run a "brand" command to change the brand from h3c to hp.
The downtime lasted 2 1/2 hours. We resolved it with HPE switch support's assistance by setting both the main firmware and the backup firmware to the previous version.
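For anyone hitting the same "invalid version" situation, the fix described above roughly corresponds to pointing both the main and the backup boot image at the previously working file. This is only a minimal sketch, assuming Comware 5 syntax and a switch that is already at the CLI rather than stuck in the BootROM menu. The file name is a hypothetical placeholder, not the one from this case, the image is assumed to be present in the flash of every IRF member, and lines starting with # are annotations rather than commands.

```text
# Show which image each IRF member will load at the next boot
<Sysname> display boot-loader
# List the images present in flash
<Sysname> dir flash:/
# Set the known-good image (hypothetical file name) as the main boot file on all members
<Sysname> boot-loader file flash:/known-good-image.bin slot all main
# Set the same image as the backup boot file on all members
<Sysname> boot-loader file flash:/known-good-image.bin slot all backup
# Confirm the result before any reboot
<Sysname> display boot-loader
```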
I had a case open with HPE switch support, but they could not find a root cause in the switch log files. IMC support also looked at the issue and said the following:
"From the events I notice stack port going down and causing the switch reboot. Generally this occurs due to software exception and hence firmware upgrade was recommended to avoid the same in future"
and "Unfortunately the old logs of syslog will be removed if the database does not have enough space. I have reviewed the logfiles and there are 2 scenarios which are occurring.
1. Switch stack reboot
2. Switch stuck at bootrom menu due to invalid image file".
My manager would still like to know a root cause. Does anyone from the community have any idea of something else to check? With IRF we thought we would be protected from all 4 switches going down at the same time.
03-12-2019 09:44 AM - last edited on 06-29-2021 04:53 AM by Ramya_Heera
Re: Help finding cause of core switch crash
Hi,
Could you please provide the details below:
- What software version was running at the time of the crash?
- Please upload the output of display diagnostic-information (the latest, and also any captured right after recovery from the issue)
- Confirm whether you can see the log file and diag file in flash (run the dir command to check)
It may not be possible to find the root cause (RCA) without the relevant logs. We can check whether there was a known issue or bug in the software version running at the time of the crash. Post an update with these details, and I will check and get back to you on whether the crash was due to a software bug.
I am an HPE Employee

03-14-2019 07:22 AM
Re: Help finding cause of core switch crash
1. H3C Comware Platform Software, Software Version 5.20, Release 1807P022
2. I have already uploaded the logs to the support cases and do not want to upload them here, as they contain the configuration of the core switches.
3. I can see logfile.log and default.diag files in flash. I do not see any other diag file, and the default.diag file has not been updated since 2012.
Could you answer a question about how important it is to configure MAD for IRF? I have been doing some reading on this, and we do not currently have it configured. I think that we should probably implement BFD MAD, but I do not want it to be a disruptive process. We are also planning on replacing our switches within the next year, so I'm not sure that it is worth the trouble.
03-17-2019 03:05 AM - last edited on 06-17-2021 05:28 AM by Parvez_Admin
Hi,
Version 1807P022 is a very old software version and is below the supported level.
5800-5820X_5.20.R1810P16 is the latest version available on the portal; please download the release notes from the link below. According to the release notes, several bugs related to switch reboots or crashes have been fixed.
https://h20628.www2.hp.com/km-ext/kmcsdirect/emr_na-a00061393en_us-1.pdf
How important is it to configure MAD for IRF?
- When you have IRF in place, it is important to configure MAD to avoid a split-brain scenario
- Split-brain: Members of an IRF stack exchange keep-alives on their IRF ports. If for any reason a member misses these keep-alives, it considers the other member to be down. Suppose you have an IRF stack of 2 switches, where member 1 is primary and member 2 is secondary. If keep-alives are missed, member 2 thinks member 1 (the primary) is down and declares itself primary. At this stage your network has two switches with the same IP address, both claiming to be primary. This is called a split-brain scenario.
- To avoid a split-brain scenario, MAD should be configured
- MAD can be configured in different variants, such as BFD MAD, ARP MAD and LACP MAD
- Please read the documentation to choose the most suitable method
- For BFD MAD, you can configure it online and then connect the cable, without any downtime
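As an illustration of the BFD MAD option, here is a minimal sketch for the 2-member example above, assuming Comware 5 syntax. VLAN 3, the 192.168.2.x addresses and the GigabitEthernet port numbers are arbitrary placeholders, not values from this case, and lines starting with # are annotations rather than commands. The VLAN and ports should be dedicated to MAD, and a 4-member stack would need a MAD port and a MAD IP address for every member; please check the IRF configuration guide for your release before applying anything.

```text
# Create a VLAN used only for BFD MAD and add one port from each member
<Sysname> system-view
[Sysname] vlan 3
[Sysname-vlan3] port GigabitEthernet 1/0/1 GigabitEthernet 2/0/1
[Sysname-vlan3] quit
# Enable BFD MAD on the VLAN interface and assign one MAD IP address per member
[Sysname] interface Vlan-interface 3
[Sysname-Vlan-interface3] mad bfd enable
[Sysname-Vlan-interface3] mad ip address 192.168.2.1 24 member 1
[Sysname-Vlan-interface3] mad ip address 192.168.2.2 24 member 2
[Sysname-Vlan-interface3] quit
# Spanning tree should be disabled on the ports used for BFD MAD
[Sysname] interface GigabitEthernet 1/0/1
[Sysname-GigabitEthernet1/0/1] undo stp enable
[Sysname-GigabitEthernet1/0/1] quit
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] undo stp enable
[Sysname-GigabitEthernet2/0/1] quit
# Check the MAD status
[Sysname] display mad verbose
```

The MAD link is connected only after this configuration is in place, which matches the order suggested in the last bullet.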
Recommendation:
- I suggest you plan a maintenance window of about 1 hour
- Upgrade the switch software to the latest version permitted by your organisation's policy, such as the Nth or N-1th version (read the release notes before selecting a version)
- During the downtime, configure BFD MAD and connect the cable
- During the downtime window, you may also test MAD functionality by disconnecting or shutting down IRF ports
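The MAD test in the last step could look roughly like the sketch below, assuming Comware 5 syntax. The Ten-GigabitEthernet port is a hypothetical IRF physical port, and lines starting with # are annotations rather than commands. Expect a short interruption: once the IRF link is restored, the part of the stack that went into Recovery state normally has to reboot to rejoin, and whether that happens automatically depends on the software version.

```text
# Before the test: record the IRF topology and the MAD status
<Sysname> display irf
<Sysname> display mad verbose
# Simulate an IRF link failure by shutting down an IRF physical port
<Sysname> system-view
[Sysname] interface Ten-GigabitEthernet 1/0/25
[Sysname-Ten-GigabitEthernet1/0/25] shutdown
# Expected: one part of the stack stays active, the other enters Recovery state
# Restore the IRF link when the test is finished
[Sysname-Ten-GigabitEthernet1/0/25] undo shutdown
[Sysname-Ten-GigabitEthernet1/0/25] quit
# After the stack has merged again, confirm that all members are back
[Sysname] display irf
```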
I hope this information is helpful!
I am an HPE Employee
