
NVMe disk firmware rollback

 
paoloP
Regular Visitor

Hello all,

We hit a potential issue with a recently released firmware (version HPK4, released September 27) for certain disk models (LO2000KEFJU).

In short, the servers seem to lose disks after the reboot needed to flash the disk firmware (we rebooted 2 servers and lost 2 disks). Unfortunately, we rolled out the firmware to a lot of servers (in our process we first mass-deploy patches, and then schedule the reboots with the application owners). So now I find myself with another 40 servers pending a reboot, with this firmware primed to load on boot.

Does anybody know if there is a way to tell the disk NOT to apply the firmware that is pending install at the next reboot? Can I just override it by forcing deployment of the previous version?

Thanks!

Torsten.
Acclaimed Contributor

Re: NVMe disk firmware rollback

Gen10?

Not sure about this, but I would look in the iLO 5 repository for pending installs.

Hope this helps!
Regards
Torsten.

paoloP
Regular Visitor

Re: NVMe disk firmware rollback

Thanks. G9, unfortunately.

If I look at the BIOS package install logs, the status is listed as below. I need a way to tell the disk NOT to apply this firmware (we tried a third server today, and it lost a disk too).

[ Nov 19 15:08:01 ] 880973568: Sending Resume Background Activity command to device HBA:3692555625
[ Nov 19 15:08:01 ] Device HBA:3692555625 does not support OPERATION_WRITE_BMIC_COMMAND
[ Nov 19 15:08:02 ] Devices [Drive CVMD731500042P0YGN (NVME HBA PCIe Data Center SSD in Slot ATTR_VALUE_SLOT_UNKNOWN)]: 21
[ Nov 19 15:08:02 ] Deferred flashes will be performed on next system reboot
[ Nov 19 15:08:02 ] Devices [Drive CVMD731500242P0YGN (NVME HBA PCIe Data Center SSD in Slot ATTR_VALUE_SLOT_UNKNOWN)]: 21
[ Nov 19 15:08:02 ] Deferred flashes will be performed on next system reboot
[ Nov 19 15:08:02 ] Devices [Drive CVMD7321007J2P0YGN (NVME HBA PCIe Data Center SSD in Slot ATTR_VALUE_SLOT_UNKNOWN)]: 21
[ Nov 19 15:08:02 ] Deferred flashes will be performed on next system reboot
[ Nov 19 15:08:02 ] Internal Exit Status: 21
[ Nov 19 15:08:02 ] See log at /var/cpq/CP036935_2018_11_19_15_07_58.log
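
With 40 servers still waiting for a reboot, it may help to pull the affected drive identifiers out of each log rather than reading them by eye. Below is a minimal sketch, assuming the "Devices [Drive ...]" line format shown in the excerpt above; the function name `list_logged_drives` is made up for illustration, and the sketch does not interpret the status code at the end of each line.

```shell
#!/bin/sh
# Print the identifier that follows "Devices [Drive " on each matching line
# of the given log file. The line format is assumed from the excerpt above;
# the function name is hypothetical.
list_logged_drives() {
    sed -n 's/.*Devices \[Drive \([^ ]*\) .*/\1/p' "$1"
}
```

For example, `list_logged_drives /var/cpq/CP036935_2018_11_19_15_07_58.log` would print one identifier per line for the log referenced above.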

sudhirsingh
HPE Pro

Re: NVMe disk firmware rollback

Hi,

I feel you can avoid "pending for install at the next reboot".

However, I would suggest you report this to HPE (open a case) so they can investigate whether there is an issue with the drive firmware release.

Regards,

Sudhir

While I am an HPE Employee, all of my comments (whether noted or not), are my own and are not any official representation of the company


paoloP
Regular Visitor

Re: NVMe disk firmware rollback

Thanks for your answer! We have 3 cases open, one for each server that lost a disk after the reboot.

Do you know how I could avoid the FW upgrade at the next reboot? Would forcing the disk to load a previous version (HPK3, in this case) via setup /force do it?

Thanks.

 

sudhirsingh
HPE Pro

Re: NVMe disk firmware rollback

Hi,

There is a correction to my last post:

Please read the following line as: "I feel you can't avoid "pending for install at the next reboot"."

However, can you please let us know the drive firmware status before the reboot?

Since you have already submitted cases with HPE, we would ask you to follow up on those cases for resolution.

Regards,

Sudhir

 


paoloP
Regular Visitor

Re: NVMe disk firmware rollback

Hello Sudhir,

We are working with HPE L2 resources on this case.

An indicator of trouble is the disk FW update process exiting with 'Internal exit status: 107' (on Linux it can be checked with something like "egrep 107 /var/cpq/C*"). Once this happens, the disk will fail at the next reboot. We typically see this on disk P/N LO2000KEFJU.
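
A plain `egrep 107` also matches any other line that happens to contain "107", so a slightly narrower check may be easier to work with across many servers. This is only a sketch: the function name `find_status_107` and its optional directory argument are made up for illustration, and the pattern assumes the "Internal Exit Status: N" line format seen in the log excerpt earlier in this thread.

```shell
#!/bin/sh
# List component logs that record an internal exit status of 107.
# The directory defaults to /var/cpq (the path used in this thread);
# the function name and the argument are hypothetical.
find_status_107() {
    log_dir="${1:-/var/cpq}"
    # -l prints only file names, -i ignores case ("Exit Status" vs
    # "exit status"), -E enables the extended pattern syntax.
    grep -liE 'Internal exit status:[[:space:]]*107' "$log_dir"/C* 2>/dev/null
}
```

Running `find_status_107` with no argument checks /var/cpq; an empty result means no log there matched the pattern.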

It is unclear at the moment whether the failure is in the boot loader or in the FW itself; HPE is investigating. We have provided them with disks in both failed and pending-failure status.

I will post an update here once I hear back. If you work for HPE, my recommendation would be to pull that FW package (CP036935) from your site until the root cause has been identified.

Paolo

paoloP
Regular Visitor

Re: NVMe disk firmware rollback

In case it helps anybody, a possible solution for disks reporting status 107 after the FW upgrade seems to be to retry activating the FW through the nvme-cli package:

#nvme fw-activate /dev/nvme0n1 -s 1 -a 1

(repeat for all disks pending FW upgrade)

Once you get a successful activation, reset the NVMe drive:

#nvme reset /dev/nvme0n1

(repeat for all disks)

At this point the disk will either show up with the correct FW version, or will be lost and require replacement.

The process seems to work most of the time, but we still see occasional failures. Having spares around is rather handy.
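
For several disks, the two steps above could be wrapped in a small loop. This is a rough sketch rather than a tested procedure: it reuses the exact `nvme fw-activate` and `nvme reset` invocations quoted above, while the function names and the `DRY_RUN` switch (which defaults to only printing the commands) are made up for illustration.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN is 1 (the default).
# DRY_RUN is a hypothetical safety switch, not an nvme-cli feature.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Retry firmware activation on each device given as an argument and
# reset the device only if the activation command succeeded.
retry_fw_activate() {
    for dev in "$@"; do
        # Same slot (-s 1) and action (-a 1) as in the command above.
        if run nvme fw-activate "$dev" -s 1 -a 1; then
            run nvme reset "$dev"
        else
            echo "activation failed on $dev, skipping reset" >&2
        fi
    done
}
```

Calling `retry_fw_activate /dev/nvme0n1 /dev/nvme1n1` prints the commands it would run; setting `DRY_RUN=0` first makes it execute them (as root), so it is worth reviewing the printed list before doing that.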