03-03-2021 05:19 AM
iLO integrated flash card failure on G8 and G9 servers - Why did they do that?
Many Gen8 and Gen9 servers are dying right now, and the reason is a faulty SD flash chip on the mainboard. Evidently, too many write cycles wore that chip out prematurely. HPE has released new firmware to address the problem, but that will not help if your chip has already died.
Maybe an HP engineer can answer this question: Why did HP solder that chip onto the mainboard instead of putting it on a replaceable module or, even simpler, using a Micro-SD card that can be replaced if it fails?
Everyone in the electronics and IT industry knows that NAND flash chips have a limited write-cycle lifetime. A flash chip is a consumable item; it wears out.
In fact, the SD card interface chip that handles I/O to that flash chip also manages the existing Micro-SD card slot. All that was needed was a second SD slot instead of a soldered chip.
As it stands, hundreds, if not thousands, of Gen8 and Gen9 boards or entire servers are going to be scrapped because of this tiny little chip that fails and cannot be replaced... or can it?
I've found an electronics company that is able to replace those tiny BGA chips on those huge boards. This is not an easy task, because a board of this size has an enormous thermal mass and you need a massive pre-heater to get proper solder joints. Not to mention the BGA pick-and-place machine...
I'm not sure, but even on some Gen10 models this chip is still soldered to the main PCB. I just hope that on Gen11 HP finally changes that. It's a small change but a giant leap for the environment!
"If it seems illogical... you just don't have enough information"
a month ago
Re: iLO integrated flash card failure on G8 and G9 servers - Why did they do that?
Thank you for the post and for highlighting these points.
We do understand your concern here, but the purpose of the NAND is to store various types of server data, configuration information, and programs that may change during the life of the server. NAND technology was selected because it will hold the data during a complete loss of power and can be rewritten multiple times.
HPE has incorporated NAND flash devices in ProLiant servers starting with the Gen8 series, followed by Gen9, Gen10 and Gen10+. The NAND implementation has also been updated, including the use of an eMMC controller dedicated only to the NAND and the use of the latest eMMC protocol, which enables us to obtain more comprehensive NAND health information.
One limitation of NAND technology is that the memory cells have a defined number of program/erase cycles. When the limit is reached for a particular cell, the cell is considered worn out. Reserved or spare blocks are used in place of worn out cells. When the reserved blocks or the number of cell writes reaches a critical limit, the NAND is considered worn out and NAND writes will be disabled.
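The wear-out behaviour described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `ToyNand`, the block counts and the cycle limit are all hypothetical, real NAND tolerates far more cycles, and this is not how HPE's firmware or any eMMC controller is actually implemented. It models only the exhaustion of spare blocks, not a separate limit on total cell writes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNand:
    """Toy wear model. All numbers are illustrative, not HPE's."""
    active_blocks: int = 4
    spare_blocks: int = 2
    pe_limit: int = 3            # program/erase cycles one block survives
    writes_enabled: bool = True
    wear: list = field(default_factory=list)

    def __post_init__(self):
        # One wear counter per active block, all starting fresh.
        self.wear = [0] * self.active_blocks

    def program(self, block: int) -> bool:
        """Run one program/erase cycle; return False once writes are disabled."""
        if not self.writes_enabled:
            return False
        if self.wear[block] >= self.pe_limit:
            # The block is worn out: substitute a spare, if one is left.
            if self.spare_blocks == 0:
                self.writes_enabled = False
                return False
            self.spare_blocks -= 1
            self.wear[block] = 0
        self.wear[block] += 1
        return True


def writes_until_disabled(nand: ToyNand, block: int = 0) -> int:
    """Hammer a single block and count the writes that succeed."""
    count = 0
    while nand.program(block):
        count += 1
    return count
```

With one active block, one spare and a limit of two cycles, the model accepts four writes and then refuses any further ones, which mirrors the "NAND writes will be disabled" state described above.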
HPE has implemented a change to reduce the number of NAND writes by tailoring the AHS (Active Health System) NAND program cycles to the optimal block size for writes to HPE's NAND chips. This change is delivered in the new firmware.
A large number of servers can be updated through the GUI, scripting, or the CLI, regardless of the state of the host server or operating system.
This update also doesn't require a reboot of the host. The best practice is to keep the server up to date regularly. This applies not only to the iLO and NAND but to all components: that way, the enhancements and fixes in the latest firmware are applied and potential issues can be avoided.
HPE will certainly look into feedback like this and always looks forward to further improvements.
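To see why matching write size to the NAND's block size reduces wear, here is a rough arithmetic sketch. It assumes, purely for illustration, that every write programs at least one whole page and that small log records can be buffered until a page is full. The function name `page_programs` and the sizes used are hypothetical and are not taken from HPE's firmware.

```python
import math


def page_programs(record_bytes: int, n_records: int,
                  page_bytes: int, buffered: bool) -> int:
    """Count page programs needed to store n_records of record_bytes each."""
    if buffered:
        # Records are packed together and written one full page at a time.
        return math.ceil(record_bytes * n_records / page_bytes)
    # Each record is written on its own and costs at least one page program.
    return n_records * math.ceil(record_bytes / page_bytes)


# Hypothetical sizes: 256-byte records, 4096-byte pages, 1600 records.
unbuffered = page_programs(256, 1600, 4096, buffered=False)
packed = page_programs(256, 1600, 4096, buffered=True)
```

Under these assumed sizes, packing cuts the number of page programs from 1600 to 100, a sixteenfold reduction in wear for the same amount of logged data.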
We thank you again for the post.
I work for HPE