10-10-2020 06:30 PM
"Critical Temperature Threshold Exceeded" Before Boot in 20C Environment
I am attempting to upgrade some GPUs in a WS460c Expansion Blade and am seeing a rather strange error condition. If I install a Tesla T4 card and insert the blade into the chassis, it fails before ever booting, with a "Critical Temperature Threshold Exceeded" error in the IML. It then refuses to power on via iLO, OneView, or the physical button, and the red light on the blade itself flashes while the expansion blade piece remains green. If I remove the card and place the blade back in the enclosure with either no cards in the PCIe slots, or one or two of the MXM carriers (w/ GPUs), it works fine and boots as normal. The issue only occurs with the T4 card installed.
The strange thing is that this card has no external connectors. I don't know how it would even be communicating temperature to the BIOS, unless something is trying to read a temperature from the card and getting an invalid value back somehow? Maybe something is loading via UEFI? In any event, there are no thermal problems in the environment: the facility keeps the air temp at 18-20C and it is quite stable, and the expansion blade itself reads well below any caution temps in the OA web UI when I check it while the WS460c blade itself is showing failed. In normal operation all temps show well below caution on the WS460c in the OA view as well. It is also strange that the IML does not identify the location / ID of the temp sensor that is causing the issue, but it has been a long time since I have seen a blade fail out with this error. (I think the last one was a G6?)
All components have been updated to the 2020.09 SPP baseline prior to installing the card.
Is there a way to identify what sensor is triggering the fault code?
Or does anyone know if there is some sort of special flash or update I need to do on a T4 card to prevent it from triggering this issue?
Failing that, is there any way to disable temp checking for PCIe cards somehow...?
Thanks for any help with this one!
10-11-2020 11:53 PM
Re: "Critical Temperature Threshold Exceeded" Before Boot in 20C Environment
Hello,
The system board has a sensor that reads errors from all the installed components.
You can check again after replacing the GPU. Since you have already done the basic troubleshooting, if the issue still persists I would suggest logging a case with HPE and sharing the appropriate logs for further analysis.
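On the question of identifying which sensor trips the fault: one possible approach, assuming the blade's iLO firmware exposes the Redfish Thermal resource (on iLO 4 and iLO 5 this is usually at /redfish/v1/Chassis/1/Thermal/) and that iLO still answers while the blade refuses to power on, is to pull that JSON and list every sensor sitting at or near its critical threshold. The sketch below only covers the parsing step. The function name, the 5 C margin, and the sample payload are made up for illustration and are not taken from a real WS460c; the property names (Temperatures, Name, ReadingCelsius, UpperThresholdCritical) come from the standard Redfish Thermal schema.

```python
def sensors_near_critical(thermal, margin_c=5.0):
    """Return (name, reading, critical) for each sensor whose reading is
    at or within margin_c degrees of its critical threshold.

    `thermal` is the parsed JSON body of a Redfish Thermal resource.
    """
    flagged = []
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        critical = sensor.get("UpperThresholdCritical")
        # Skip entries with no usable reading or threshold (null or 0).
        if not reading or not critical:
            continue
        if reading >= critical - margin_c:
            flagged.append((sensor.get("Name", "unknown"), reading, critical))
    return flagged


# Hypothetical payload for illustration only, not captured from real hardware.
sample = {
    "Temperatures": [
        {"Name": "01-Inlet Ambient", "ReadingCelsius": 19, "UpperThresholdCritical": 42},
        {"Name": "PCI 1 Zone", "ReadingCelsius": 98, "UpperThresholdCritical": 100},
        {"Name": "CPU 2", "ReadingCelsius": 0, "UpperThresholdCritical": 0},
    ]
}

for name, reading, critical in sensors_near_critical(sample):
    print(f"{name}: {reading} C (critical at {critical} C)")
```

If nothing gets flagged while the IML still logs the critical event, that would at least point toward a bad or missing reading from the card rather than a genuinely hot sensor, which is useful detail to include in the support case.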
If you feel this was helpful please click the KUDOS! thumb below!
Regards,
I am an HPE Employee
Hewlett Packard Enterprise International
© Copyright 2021 Hewlett Packard Enterprise Development LP