HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot Hang

 
Frequent Collector

Hello "Expert Day" Online HPE Experts!

We have kept the software on our 5400 and 3500 series switches up to date without any problems for years, including several BootROM updates, until just now.

We routinely applied K.16.02.0030 to the Primary Software Image location on an HPE 5406zl and issued the reboot command, but the 5406 could not complete the reboot.

All communication was lost, so we connected a serial console cable and reapplied mains power.

This is what appeared in PuTTY:

IS2S0123
ROM information:
Build directory: /ws/swbuildm/btmrom_t5b_qaoff/rom/build/btmrom(btmrom_t5b_qaoff)
Build date: Nov 12 2012
Build time: 13:42:32
Build version: K.15.30
Build number: 11194

 

Boot Profiles:

0. Monitor ROM Console
1. Primary Software Image [K.16.02.0030]
2. Secondary Software Image [K.16.02.0029]

Select profile (primary):


Booting Primary Software Image...

Decompressing...done.
Uncompressed CRC does not match file CRC
File CRC 167fa419 Calculated f3ace05d


Bad code in FLASH

 


Flash memory needs reprogramming or chassis could be faulty.
Use a PC as the console and perform the update procedure
by serial Xmodem download of the current Switch Image.
If unsuccessful w/ downloading, then try replacing chassis.

HP ProCurve Switch 5406zl (J8697A)
ROM Build Directory: /ws/swbuildm/btmrom_t5b_qaoff/rom/build/btmrom(btmrom_t5b_qaoff)
ROM Version: K.15.30
ROM Build Date: 13:42:32 Nov 12 2012
ROM Build Number: 11194

Now if you select the Secondary Image the 5406 appears to boot up and seems to run OK.

However, once the switch is up and running under K.16.02.0029, the "verify signature flash" command (see below) reports "Signature is valid" for both the primary and secondary flash locations, BUT the BootROM disagrees and says:

Booting Primary Software Image...

Decompressing...done.
Uncompressed CRC does not match file CRC
File CRC 167fa419 Calculated f3ace05d


Bad code in FLASH

Meanwhile the booted K.16.02.0029 firmware contradicts the BootROM by reporting the following in the CLI:

HP-Switch-5406zl# show flash
Image             Size (bytes) Date     Version
----------------- ------------ -------- --------------
Primary Image   :     15740889 05/21/20 K.16.02.0030
Secondary Image :     15741591 04/15/20 K.16.02.0029

Boot ROM Version
----------------
Primary Boot ROM Version : K.15.30

Default Boot Image : Secondary

HP-Switch-5406zl# verify signature flash primary
Signature is valid
HP-Switch-5406zl# verify signature flash secondary
Signature is valid
HP-Switch-5406zl# show version
Image stamp:
/ws/swbuildm/maint_spokane_qaoff/code/build/btm(swbuildm_maint_spokane_qaoff_maint_spokane)
Apr 15 2020 16:38:22
K.16.02.0029
63
Boot Image: Secondary

Boot ROM Version: K.15.30
HP-Switch-5406zl#

If you unplug the Ethernet switching modules and power on the switch, it will boot into K.16.02.0030. You can then hot plug the Ethernet switching modules back in and it all runs, but it fails again on the next reboot.

So how do we get out of this unexpected mess (ironically caused by a minor firmware upgrade), regain the ability to update/upgrade switch firmware, and get back to a smooth unattended boot process, in case a mains power cut outlasts our uninterruptible power supply battery run time?

HPE Pro

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Hello,

Could you please collect the output of show log with the switch booted from the secondary image, and also share the product numbers of the Ethernet modules?

Thanks!

I am an HPE Employee

Frequent Collector

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Thank you for your reply - here is the information you requested:-

J9307A - 24-port 10/100/1000 PoE+ module for zl series switches

J9308A - 20-port 10/100/1000 PoE+ and 4-port mini-GBIC module for HP ProCurve zl series switches

Log as requested is below.......

I 06/16/20 20:18:47 00803 usb: port enabled.
I 06/16/20 20:18:37 00422 chassis: Slot B Ready
I 06/16/20 20:18:37 02612 mgr: chassis subsystem saved the whole running config
to startup config.
I 06/16/20 20:18:37 00422 chassis: Slot A Ready
I 06/16/20 20:18:37 02612 mgr: chassis subsystem saved the whole running config
to startup config.
I 06/16/20 20:18:35 03803 chassis: System Self test completed on Slot B
I 06/16/20 20:18:35 03803 chassis: System Self test completed on Slot A
I 06/16/20 20:18:34 03802 chassis: System Self test started on Slot B
I 06/16/20 20:18:34 03802 chassis: System Self test started on Slot A
I 06/16/20 20:18:26 00179 mgr: SME CONSOLE Session - MANAGER Mode
I 06/16/20 20:18:25 00376 chassis: Slot B Download Complete
I 06/16/20 20:18:24 00376 chassis: Slot A Download Complete
I 06/16/20 20:18:23 00375 chassis: Slot B Downloading
I 06/16/20 20:18:22 00375 chassis: Slot A Downloading
I 06/16/20 20:18:20 04911 ntp: The NTP Server 192.168.1.1 is unreachable.
I 06/16/20 20:18:19 03401 crypto: Function POWER UP passed selftest.
I 06/16/20 20:18:19 04911 ntp: The NTP Server 8.8.8.8 is unreachable.
I 06/16/20 20:18:18 00068 chassis: Slot B Inserted
I 06/16/20 20:18:18 00068 chassis: Slot A Inserted
I 06/16/20 20:18:18 00066 system: System Booted
I 06/16/20 20:18:17 04260 dhcp-server: Conflict-logging is disabled
I 06/16/20 20:18:17 04257 dhcp-server: Ping-check configured with retry count =
2, timeout = 1
I 06/16/20 20:18:17 02012 mtm: A non-multicast client: Non-Mcast client Op, is
registered with client ID: 3
I 06/16/20 20:18:17 00410 SNTP: Client is enabled.
I 06/16/20 20:18:17 02633 SNTP: Client authentication is disabled.
I 06/16/20 20:18:17 00688 lldp: LLDP - enabled
I 06/16/20 20:18:17 00417 cdp: CDP enabled
I 06/16/20 20:18:17 00128 tftp: Enable succeeded
I 06/16/20 20:18:17 04695 auth: Command authorization method set to none.
I 06/16/20 20:18:17 00433 ssh: Ssh server enabled
I 06/16/20 20:18:17 00400 stack: Stack Protocol disabled
I 06/16/20 20:18:17 00110 telnet: telnetd service enabled
I 06/16/20 20:18:17 02638 srcip: RADIUS oper policy for IPv6 is 'outgoing
interface'
I 06/16/20 20:18:17 02637 srcip: RADIUS admin policy for IPv6 is 'outgoing
interface'
I 06/16/20 20:18:17 02638 srcip: SFLOW oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: SFLOW admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: SNTP oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: SNTP admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: TFTP oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: TFTP admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: TELNET oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: TELNET admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: SYSLOG oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: SYSLOG admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: RADIUS oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: RADIUS admin policy is 'outgoing interface'
I 06/16/20 20:18:17 02638 srcip: TACACS oper policy is 'outgoing interface'
I 06/16/20 20:18:17 02637 srcip: TACACS admin policy is 'outgoing interface'
I 06/16/20 20:18:17 00690 udpf: DHCP relay agent feature enabled
I 06/16/20 20:18:17 02604 dhcpv6r: Inclusion of client link-layer address in
DHCPv6 relay message is disabled.
I 06/16/20 20:18:16 05177 ip: Setting IP address 0.0.0.0 as default gateway.
I 06/16/20 20:18:16 00092 dhcp: Enabling Auto Image Config Download via DHCP and
turning off auto-tftp if enabled
I 06/16/20 20:18:16 02012 mtm: A non-multicast client: Non-Mcast client DT, is
registered with client ID: 1
I 06/16/20 20:18:16 02759 chassis: Savepower LED timer is OFF.
M 06/16/20 20:18:16 00064 system: Operator cold reboot from CONSOLE session.
I 06/16/20 20:18:16 00063 system: Mgmt Module 1 went down: 06/16/20 19:30:46
I 06/16/20 20:18:16 00061 system: -----------------------------------------
I 06/16/20 20:18:12 03803 chassis: System Self test completed on Master
I 06/16/20 20:18:12 03802 chassis: System Self test started on Master
I 06/16/20 20:18:12 03803 chassis: System Self test completed on Master
I 06/16/20 20:18:12 03802 chassis: System Self test started on Master
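For anyone who wants to sift a log like this offline, here is a minimal Python sketch. It assumes the line layout seen above (severity letter, mm/dd/yy date, time, five-digit event ID, subsystem, message), that wrapped lines belong to the entry above them, and that the switch lists the newest entry first. The function and field names are only illustrative.

```python
import re
from datetime import datetime

# Severity, timestamp, event ID, subsystem, message (layout assumed from the log above).
LINE = re.compile(
    r"^([A-Z]) (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\d{5}) ([^:]+): ?(.*)$"
)

def parse_log(text):
    """Return entries oldest-first; wrapped lines are joined to the entry above them."""
    entries = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if m:
            sev, stamp, event_id, subsystem, msg = m.groups()
            entries.append({
                "severity": sev,
                "time": datetime.strptime(stamp, "%m/%d/%y %H:%M:%S"),
                "event_id": event_id,
                "subsystem": subsystem,
                "message": msg,
            })
        elif raw.strip() and entries:
            # Continuation of a console-wrapped message.
            entries[-1]["message"] += " " + raw.strip()
    entries.reverse()  # the log is shown newest first
    return entries
```

Filtering the result for entries whose severity is not "I" is a quick way to pull out lines such as the "M" cold-reboot entry.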

HPE Pro

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Hi,

It seems that either the Management Module or the Compact Flash, or both, are corrupted. To isolate the issue, please swap in a spare MM if you have one; otherwise you will need to log a case with support. This is a hardware issue.

Thanks!

I am an HPE Employee

Honored Contributor

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Hi! Since the good firmware image seems to be the one in the Secondary Flash area, have you first tried copying it over the Primary Flash area (command: copy flash flash primary, which copies the booted image, secondary in your case, to primary)?

Then, once Secondary Flash has been copied into Primary Flash (check with the usual show flash), run the signature verification (command: verify signature flash primary) to be sure the simple area-to-area copy worked correctly.

If the check passes, try copying (flashing) the new K.16.02.0030 (do perform a hash check against the downloaded SWI file) to the Primary Flash area as you initially did, and schedule the reboot from the updated Primary Flash area (command: boot system flash primary) as you already did. Cross your fingers... If I were you I would also stay connected directly to the serial console to watch the boot messages.

Frequent Collector

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Long post but it is a long process to work through.

Thank you for your reply. Yes, we are thinking along the same lines and are trying a clean/blank start. This morning we XMODEM-transferred v30 (using HyperTerminal) over the serial console cable, overwriting primary flash. The transfer took about an hour at 115200bps (we calculated over 5 hours to transfer 15MB at 9600bps). It completed without problems and the switch AUTOMATICALLY rebooted itself into v30 with its switching modules installed throughout. All good.
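As a rough cross-check of those transfer times, here is a minimal Python sketch. It assumes 8N1 serial framing (10 line bits per byte) and classic 128-byte XMODEM blocks with 5 bytes of framing each (the CRC variant), and it ignores ACK turnaround and retries, so it only gives a lower bound. The function name is just for illustration.

```python
import math

def xmodem_min_seconds(image_bytes, baud, block=128, overhead=5, bits_per_byte=10):
    """Lower-bound transfer time; ignores ACK turnaround and retransmissions."""
    blocks = math.ceil(image_bytes / block)      # the last block is padded
    wire_bytes = blocks * (block + overhead)     # payload plus per-block framing
    return wire_bytes * bits_per_byte / baud

size = 15740889  # primary image size reported by "show flash"
for baud in (9600, 115200):
    hours = xmodem_min_seconds(size, baud) / 3600
    print(f"{baud} bps: at least {hours:.2f} h")
```

Under these assumptions the floor is roughly 4.7 hours at 9600bps and roughly 24 minutes at 115200bps, which is in line with the "over 5 hours" estimate. The hour actually observed at 115200bps presumably reflects per-block turnaround rather than raw line speed.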

But on a reboot some time later (over an hour), the switch yet again could not finish the boot process, even though the automatic boot straight after the XMODEM transfer had completed successfully. After half an hour of hoping against hope we intervened, plugged in the serial cable and rebooted again, only to see the same error messages from the BootROM. We then booted the secondary image v29 and the boot was successful.

So to that end we copied the working v29 image from secondary to primary, rebooted, verified both the secondary and primary images, and then zeroized the 5406 (via BootROM), so it should now be blank, with v29 in both primary and secondary.

So far we have tried 3 separate downloads of v30, and today we finally checked and compared the file hashes of the downloads. The hashes match, which is both a relief and an annoyance.
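For reference, this kind of comparison can be scripted. The Python sketch below assumes SHA-256 is the digest you want to compare (swap in whichever algorithm the download page lists), and the file names in the usage note are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def all_match(paths, expected_hex=None):
    """True if every file has the same digest (and it equals expected_hex, if given)."""
    digests = {sha256_of(p) for p in paths}
    if len(digests) != 1:
        return False
    return expected_hex is None or digests.pop() == expected_hex.lower()
```

For example, all_match(["download1.swi", "download2.swi", "download3.swi"]) would confirm that three downloads are byte-identical, and passing a published digest as expected_hex would also check them against the vendor's value.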

We also tried swapping the flash memory contents locations over - no difference.

We have erased the v30 flash contents and replaced them with v29, then erased that and replaced it with v30 - no difference.

We have yet to see the "verify signature flash primary/secondary" command say anything other than "Signature is valid", whether run from v30 or v29. We have been verifying every flash image left, right and centre. Does that command actually work? Has anyone ever seen it give any other output? By this point you do begin to wonder, but I digress.

So you would just leave the Serial Console Cable connected - I agree - unless you are actively doing something the serial cable should really make no difference to the boot process and outcomes - serial is essentially only passively listening.

We intend to try another XMODEM transfer of v30 (and will definitely verify the hash again) later today, to overwrite the v29 on the currently zeroized switch. Before starting anything on the switch we will check that the time and date are set correctly (clutching at straws, just in case), since the crypto signing calculations for flash images might not like an illogical time or date.

Obviously the switch will be running only from the BootROM during the XMODEM transfer, so there will be no running firmware image and therefore no file locking issues from running firmware. Hopefully this XMODEM transfer following zeroization will reach the parts that other transfers have yet to reach.

Any suggestions for commands to execute before starting this process are welcome.

Any suggestions for commands to execute AFTER XMODEM - BUT BEFORE a manual reboot - so as to encourage the switch not to boot-hang are even more welcome!

Any suggestions for commands to execute from BootROM?

I.E. to encourage the writing of valid last known good states?

parnassus - I think we have carried out your valuable ideas so far - am happy to try out any other ideas that you have

The question is why does the switch automatically reboot successfully into v30 following XMODEM transfer but then fail on next reboot - is it better to disconnect the mains for a reboot? - wait 10 seconds - and then reconnect mains power?

What is the better way to reboot in this situation?

I think pressing and holding RESET and CLEAR buttons for 15 seconds once got the switch to reboot into v30 successfully.

So how many different ways are there to reboot a switch and why do some of them result in boot failure (hang) and others successfully load v30 firmware?

Honored Contributor

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Hi, just a note about the verify signature flash [primary|secondary] command: it is just a control mechanism to verify that the downloaded SWI file is digitally signed by HPE. If the running firmware is capable of verifying signatures (and 16.02.0027 certainly is), an image without a correct signature should generate this message: "This software image does not contain a digital signature and cannot be validated as originating from HP. You may bypass this validation by using the 'allow-no-signature' option. Please see www.hpe.com/networking/swvalidation for the list of software versions that contain signatures." That is to say, I'm not totally sure it can be used to check that the flashing procedure succeeded correctly across the whole flash area (zeroization would be the way to go...).

Just curious: what is the serial console's output during the reboot (into Primary 16.02.0030)?

Frequent Collector

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Thank you for pointing out that the CRC Check that fails from within the BootROM process is NOT the same as the Verify Signature Check taking place from within up and running Firmware – they are different – which also clarifies how the same firmware can repeatedly pass signature verification but fail CRC.
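To illustrate how two checks over different data can disagree, here is a toy Python sketch. It is not the BootROM's or the firmware's actual algorithm: SHA-256 stands in for the check on the stored image file (a real signature additionally involves a signing key), zlib's CRC-32 stands in for the boot-time CRC over the uncompressed image, and the single flipped bit is an invented fault. It says nothing about what is actually wrong with this switch.

```python
import hashlib
import zlib

payload = b"example firmware payload " * 1000     # stand-in for the uncompressed image
stored = zlib.compress(payload)                    # stand-in for the image file in flash
file_digest = hashlib.sha256(stored).hexdigest()   # check over the stored file
payload_crc = zlib.crc32(payload)                  # check over the uncompressed bytes

def stored_file_ok(blob):
    """Check the stored (compressed) file against its recorded digest."""
    return hashlib.sha256(blob).hexdigest() == file_digest

def uncompressed_ok(data):
    """Check the uncompressed bytes against the recorded CRC."""
    return zlib.crc32(data) == payload_crc

# Simulate a fault that touches the data only after decompression.
unpacked = bytearray(zlib.decompress(stored))
unpacked[100] ^= 0x01

print(stored_file_ok(stored))             # the stored file still checks out
print(uncompressed_ok(bytes(unpacked)))   # the CRC over the uncompressed data does not
```

The point is only that a check over the stored file and a check over the decompressed result can give different answers for the same image.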

Came across the Verify Signature command in the notes on the HPE software download page for v30.

However, all of that does not explain how v30 can pass BootROM CRC check on automatic reboot following an hour long XMODEM transfer of v30 onto switch via Serial Console Cable into BootROM - and then fail the same CRC check on the next reboot?

Perhaps a warm boot is taking place (a Reload instead of a full Reboot), which does not solve the problem but only masks it.

Wonder if Zeroising straight after another XMODEM transfer (would have to first allow the automatic reboot to take place) of v30 would fix the problem?

So, the acid test for fixing this Network Switch is - will it reboot more than once unattended?

We are not in the habit of routinely rebooting network switches (or anything else for that matter) but we do expect and demand that equipment must boot while unattended (like it used to) and go back to a normal operating state without human intervention - in case of a prolonged mains power failure.

Here are the differences between CRC and Code Signing
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
http://en.wikipedia.org/wiki/Code_signing

Will have a look through the commands to see if a CRC check can be triggered from CLI of v30 or v29 running firmware – as curious to see if it passes or fails CRC when firmware is running having booted up.

Pretty sure that at end of XMODEM process verification (probably CRC) of the transferred v30 image automatically takes place and passes – so why does v30 fail CRC next boot time?

parnassus – will try to get a look at the serial console output on reboot of v30 as you suggest – I also want to take a look at it – BUT – will have to guess correctly when to disconnect, change connection speed in HyperTerminal back down to 9600 from 115200 and reconnect.

Google.Com search term :- HP Procurve Corrupt Flash

There are various online articles regarding how to clear this type of CRC Boot Hang Error on HP Network switches but most of them are around 10 years old so deal with older Firmware versions and older BootROM versions for instance: -

http://community.spiceworks.com/how_to/44828-recovering-an-hp-procurve-from-a-corrupt-image - we are trying to figure out the CLI commands for the BootROM shown on that page. It also mentions a file “2910-trash-nebeker-v1” that supposedly gets rid of temp files etc., although it seems to be for a different series of HP switches and is pretty old, so it could have unpredictable effects on firmware that is 7 years newer on a different switch range. Still, the basic concept of clearing out corruption and leftover temporary files is interesting.

http://www.ipuptime.net/2012/07/procurve-corrupted-flash-recovery/ I also got the switch to boot following XMODEM transfer fix – problem being the fix did not last beyond the automatic reboot.

http://robandersonit.wordpress.com/2010/01/25/procurve-corrupt-flash/

Below is an earlier XMODEM transfer - without changing the speed back to 9600 no more text comes onto the screen

[Image: XMODEM just before auto boot of v30]

Honored Contributor

Re: HP 5406zl Minor Update to Firmware K.16.02.0030 is suddenly Unsuccessful and causes Switch Boot

Have you tried the fsck way once into the Monitor ROM Console menu?
