10-10-2020 06:30 PM
"Critical Temperature Threshold Exceeded" Before Boot in 20C Environment
I am attempting to upgrade some GPUs in a WS460c Expansion Blade and am observing a rather strange error condition. If I install a Tesla T4 card and insert the blade into the chassis, it fails before ever booting, logs a "Critical Temperature Threshold Exceeded" error in the IML, and refuses to power on via iLO, OneView, or the physical button. The red light on the blade itself flashes while the expansion blade piece remains green. If I remove the card and place the blade back in the enclosure with either no cards in the PCIe slots or one or two of the MXM carriers (with GPUs), it works fine and boots as normal. The issue only occurs when the T4 card is installed.
The strange thing is that this card has no external connectors, so I don't know how it would even be communicating temperature to the BIOS, unless something is trying to read a temperature from the card and getting an invalid value back. Maybe something is loading via UEFI? In any event, there are no thermal problems in the environment: the facility keeps the air temperature at a stable 18-20C, and the expansion blade itself reads well below any caution temps in the OA web UI when I check it while the WS460c blade is showing failed. In normal operation, all temps on the WS460c show well below caution in the OA view as well. It is also strange that the IML does not identify the location or ID of the temp sensor that is causing the issue, but it has been a long time since I have seen a blade fail out with this error. (I think the last one was a G6?)
All components have been updated to the 2020.09 SPP baseline prior to installing the card.
Is there a way to identify what sensor is triggering the fault code?
Or does anyone know if there is some sort of special flash or update I need to do on a T4 card to prevent it from triggering this issue?
Failing that, is there any way to disable temp checking for PCIe cards somehow...?
Thanks for any help with this one!
10-11-2020 11:53 PM
Re: "Critical Temperature Threshold Exceeded" Before Boot in 20C Environment
Hello,
The system board has a sensor that reads errors from all the installed components.
You can check again after replacing the GPU. Since you have already done the basic troubleshooting, I would suggest logging a proper case with HPE and sharing the appropriate logs for further analysis if the issue still persists.
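On the question of which sensor is tripping: one way to narrow it down is to compare each sensor's reading against its own critical threshold rather than relying on the IML summary. The following is only a minimal sketch, not an HPE-documented procedure. It assumes the blade's iLO exposes a standard Redfish Thermal resource (a `Temperatures` list whose entries carry `Name`, `ReadingCelsius`, and `UpperThresholdCritical`) and that you have already saved that JSON by whatever means you normally use. The function name and the sample payload are hypothetical, and the sample values are made up.

```python
def sensors_at_or_over_critical(thermal):
    """Return (name, reading, critical) for each sensor at or above its critical threshold.

    `thermal` is a dict shaped like a Redfish Thermal resource.
    """
    flagged = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        critical = sensor.get("UpperThresholdCritical")
        # Assumption: a missing or zero value means the field is not populated
        # (for example, an absent sensor), so the entry is skipped.
        if not reading or not critical:
            continue
        if reading >= critical:
            flagged.append((sensor.get("Name", "unknown"), reading, critical))
    return flagged


# Hypothetical payload for illustration only; names and numbers are invented.
sample = {
    "Temperatures": [
        {"Name": "Inlet Ambient", "ReadingCelsius": 19, "UpperThresholdCritical": 42},
        {"Name": "Hypothetical PCIe zone", "ReadingCelsius": 100, "UpperThresholdCritical": 90},
        {"Name": "Absent sensor", "ReadingCelsius": 0, "UpperThresholdCritical": 0},
    ]
}

for name, reading, critical in sensors_at_or_over_critical(sample):
    print(f"{name}: {reading} C (critical at {critical} C)")
```

If a sensor tied to the PCIe slot reports an implausible value only when the T4 is installed, that would be useful detail to attach to the support case.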
If you feel this was helpful please click the KUDOS! thumb below!
Regards,
I am an HPE employee.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]