01-06-2020 11:12 AM - edited 01-06-2020 12:42 PM
Does Proliant DL380p servers really monitor CPU temp?
To me it is clear that the CPU sensors are not used to control fan speed.
I think it is using another motherboard sensor, close to the CPUs, which consistently gives readings that are far too low, especially when inlet temperatures are very low. In that case the difference between the real CPU temp and the temp reported by the "PROCESSOR_ZONE" sensor can be up to 35C.
What is going on????
I've run temp tests on all Linux flavors and Windows 10, with iLO versions from 2.02 to 2.72. In every case I see the same problem. I've also seen questions on the web about this issue going back to iLO 1.30, and it seems it has never been solved.
Do ProLiant servers have a design problem???
ANSWERS, PLEASE
PS: The BIOS option for "Increased" cooling is just brute force; it does not solve the bad temp readings.
In summary you have essentially 2 options: 1) generate a lot of unnecessary noise by setting "Increased" cooling, or 2) run the CPUs at 90C under 100% CPU load, which in turn makes the CPU reduce its speed.
01-06-2020 01:42 PM - edited 01-06-2020 01:43 PM
Re: Does Proliant DL380p servers really monitor CPU temp?
I just browsed the web for a while and have seen dozens of temp reports from DL380p servers, and in each one the CPU temp is fixed at 40C.
What a joke.
3D Sea of bullsh*t.
01-06-2020 03:35 PM
Re: Does Proliant DL380p servers really monitor CPU temp?
My guess is that the "sea of sensors" bull**bleep** could not cope with every possible CPU model out there, so they assume a minimum CPU temp of 40C and add the temp coming from the inlet sensor (which they assume is around 30C). Based on that they run the fans in the "optimal" configuration no matter what the real CPU temp is. If on top of that the inlet reading rises, they just increase the fan speed by a certain factor.
The problem arises when the inlet temp is way below the one they expect, and then the fan speed is insufficient. I have no other explanation for this insane issue.
01-07-2020 04:22 AM
Re: Does Proliant DL380p servers really monitor CPU temp?
Hello,
I suspect that this is an HPE ProLiant DL380p Gen8 server.
This server does have sensors in the CPU location to monitor the CPU temperature, and it does not use the inlet temperature sensor for this.
Please make sure you have the latest firmware/SPP installed for the server, as it contains critical fixes for fan noise and temperature sensor issues.
Also please make sure the fan configuration is done as per the document below:
https://support.hpe.com/hpsc/doc/public/display?docId=mmr_sf-EN_US000043586&docLocale=en_US
If you are experiencing any hardware issues such as server shutdowns or reboots due to overheating, then I would recommend that you contact HPE Support and log a hardware case.
Regards,
I am an HPE Employee
01-07-2020 04:09 PM
Re: Does Proliant DL380p servers really monitor CPU temp?
Thanks for the answer, it is appreciated.
Double checked fan config as per the document in the link and it is ok.
BIOS is from 05/21/2018 and iLO firmware is v2.72.
Here is a typical temp reading for the server (yes a DL380p) with ZERO CPU load (and BIOS thermal config of "Increased"):
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 AMBIENT 12C/53F 42C/107F
#2 PROCESSOR_ZONE 40C/104F 70C/158F
#3 PROCESSOR_ZONE 40C/104F 70C/158F
#4 MEMORY_BD 13C/55F 87C/188F
#5 MEMORY_BD 13C/55F 87C/188F
#6 MEMORY_BD 14C/57F 87C/188F
#7 MEMORY_BD 14C/57F 87C/188F
#8 MEMORY_BD 13C/55F 87C/188F
#9 MEMORY_BD 13C/55F 87C/188F
#10 MEMORY_BD 14C/57F 87C/188F
#11 MEMORY_BD 14C/57F 87C/188F
#12 SYSTEM_BD 35C/95F 60C/140F
#13 SYSTEM_BD 44C/111F 105C/221F
#14 POWER_SUPPLY_BAY 18C/64F -
#15 POWER_SUPPLY_BAY - -
#16 POWER_SUPPLY_BAY 14C/57F 75C/167F
#17 SYSTEM_BD 21C/69F 115C/239F
#18 SYSTEM_BD 19C/66F 115C/239F
#19 SYSTEM_BD 20C/68F 115C/239F
#20 SYSTEM_BD 20C/68F 115C/239F
#21 SYSTEM_BD 18C/64F 115C/239F
#22 SYSTEM_BD 20C/68F 115C/239F
#23 SYSTEM_BD 17C/62F 90C/194F
#24 SYSTEM_BD 14C/57F 90C/194F
#25 SYSTEM_BD 40C/104F 100C/212F
#26 SYSTEM_BD 17C/62F 90C/194F
#27 I/O_ZONE - -
#28 I/O_ZONE - -
#29 I/O_ZONE - -
#30 I/O_ZONE - -
#31 I/O_ZONE - -
#32 I/O_ZONE - -
#33 I/O_ZONE - -
#34 I/O_ZONE 15C/59F 65C/149F
#35 I/O_ZONE 16C/60F 66C/150F
#36 I/O_ZONE 16C/60F 66C/150F
#37 I/O_ZONE - -
#38 I/O_ZONE - -
#39 I/O_ZONE - -
#40 I/O_ZONE 17C/62F 66C/150F
#41 I/O_ZONE - -
#42 SYSTEM_BD 14C/57F 95C/203F
#43 SYSTEM_BD 22C/71F 90C/194F
#44 SYSTEM_BD 17C/62F 80C/176F
#45 SYSTEM_BD 9C/48F 65C/149F
#46 SYSTEM_BD 18C/64F 75C/167F
#47 SYSTEM_BD 16C/60F 75C/167F
#48 SYSTEM_BD 18C/64F 75C/167F
#49 CHASSIS_ZONE 16C/60F 75C/167F
#50 CHASSIS_ZONE 16C/60F 75C/167F
The fan speeds are:
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 SYSTEM Yes NORMAL 32% Yes 0 Yes
#2 SYSTEM Yes NORMAL 32% Yes 0 Yes
#3 SYSTEM Yes NORMAL 32% Yes 0 Yes
#4 SYSTEM Yes NORMAL 34% Yes 0 Yes
#5 SYSTEM Yes NORMAL 43% Yes 0 Yes
#6 SYSTEM Yes NORMAL 43% Yes 0 Yes
The first observation is that, given the inlet temp of 12C and ZERO CPU load, the CPU temp could hardly be 40C. That's just logic. Confirming this, the CPU temp readings (as per psensor on MX Linux 18.3) are:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +22.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +20.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +22.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +18.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +20.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +22.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +21.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +20.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +20.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +17.0°C (high = +94.0°C, crit = +102.0°C)
Here is the reading for the same server after 10 minutes of 100% CPU load:
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 AMBIENT 12C/53F 42C/107F
#2 PROCESSOR_ZONE 40C/104F 70C/158F
#3 PROCESSOR_ZONE 40C/104F 70C/158F
#4 MEMORY_BD 15C/59F 87C/188F
#5 MEMORY_BD 15C/59F 87C/188F
#6 MEMORY_BD 15C/59F 87C/188F
#7 MEMORY_BD 15C/59F 87C/188F
#8 MEMORY_BD 13C/55F 87C/188F
#9 MEMORY_BD 14C/57F 87C/188F
#10 MEMORY_BD 15C/59F 87C/188F
#11 MEMORY_BD 15C/59F 87C/188F
#12 SYSTEM_BD 35C/95F 60C/140F
#13 SYSTEM_BD 44C/111F 105C/221F
#14 POWER_SUPPLY_BAY 19C/66F -
#15 POWER_SUPPLY_BAY - -
#16 POWER_SUPPLY_BAY 17C/62F 75C/167F
#17 SYSTEM_BD 21C/69F 115C/239F
#18 SYSTEM_BD 23C/73F 115C/239F
#19 SYSTEM_BD 20C/68F 115C/239F
#20 SYSTEM_BD 21C/69F 115C/239F
#21 SYSTEM_BD 21C/69F 115C/239F
#22 SYSTEM_BD 20C/68F 115C/239F
#23 SYSTEM_BD 18C/64F 90C/194F
#24 SYSTEM_BD 16C/60F 90C/194F
#25 SYSTEM_BD 41C/105F 100C/212F
#26 SYSTEM_BD 18C/64F 90C/194F
#27 I/O_ZONE - -
#28 I/O_ZONE - -
#29 I/O_ZONE - -
#30 I/O_ZONE - -
#31 I/O_ZONE - -
#32 I/O_ZONE - -
#33 I/O_ZONE - -
#34 I/O_ZONE 18C/64F 65C/149F
#35 I/O_ZONE 18C/64F 66C/150F
#36 I/O_ZONE 18C/64F 66C/150F
#37 I/O_ZONE - -
#38 I/O_ZONE - -
#39 I/O_ZONE - -
#40 I/O_ZONE 21C/69F 66C/150F
#41 I/O_ZONE - -
#42 SYSTEM_BD 16C/60F 95C/203F
#43 SYSTEM_BD 24C/75F 90C/194F
#44 SYSTEM_BD 20C/68F 80C/176F
#45 SYSTEM_BD 9C/48F 65C/149F
#46 SYSTEM_BD 20C/68F 75C/167F
#47 SYSTEM_BD 20C/68F 75C/167F
#48 SYSTEM_BD 21C/69F 75C/167F
#49 CHASSIS_ZONE 18C/64F 75C/167F
#50 CHASSIS_ZONE 17C/62F 75C/167F
The fan speeds are:
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 SYSTEM Yes NORMAL 32% Yes 0 Yes
#2 SYSTEM Yes NORMAL 32% Yes 0 Yes
#3 SYSTEM Yes NORMAL 32% Yes 0 Yes
#4 SYSTEM Yes NORMAL 34% Yes 0 Yes
#5 SYSTEM Yes NORMAL 43% Yes 0 Yes
#6 SYSTEM Yes NORMAL 43% Yes 0 Yes
CPU temp readings as per psensor are:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +43.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +39.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +41.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +41.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +43.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +47.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +45.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +44.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +47.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +44.0°C (high = +94.0°C, crit = +102.0°C)
It is obvious that the CPU temp cannot be 40C both at 0 load and at 100% load. I think the problem is clear now: the "PROCESSOR_ZONE" sensors are stuck at 40C/104F no matter what the CPU temperature is.
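To make this comparison repeatable, here is a minimal Python sketch that parses the two text formats pasted in this post (the sensor table rows and the `sensors` package lines). The regexes, function names and embedded sample strings are illustrative assumptions based only on the output shown here; other firmware or lm-sensors versions may format their output differently.

```python
import re

# Sample rows copied from the readings above (100% load, "Increased" cooling).
ILO_TEXT = """\
#1 AMBIENT 12C/53F 42C/107F
#2 PROCESSOR_ZONE 40C/104F 70C/158F
#3 PROCESSOR_ZONE 40C/104F 70C/158F
#15 POWER_SUPPLY_BAY - -
"""

SENSORS_TEXT = """\
coretemp-isa-0000
Package id 0: +43.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +39.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Package id 1: +47.0°C (high = +94.0°C, crit = +102.0°C)
"""

# Assumed row shape: "#<n> <LOCATION> <temp>C/<temp>F <threshold or ->"
ILO_ROW = re.compile(r"^#(\d+)\s+(\S+)\s+(\d+)C/\d+F\b", re.MULTILINE)
# Assumed line shape: "Package id <n>: +<temp>°C ..."
PACKAGE_ROW = re.compile(r"^Package id (\d+):\s+\+?(-?\d+(?:\.\d+)?)", re.MULTILINE)


def parse_ilo(text):
    """Return {sensor_number: (location, temp_c)}; rows showing '-' are skipped."""
    return {int(n): (loc, int(t)) for n, loc, t in ILO_ROW.findall(text)}


def parse_packages(text):
    """Return {package_id: temp_c} from `sensors`-style output."""
    return {int(i): float(t) for i, t in PACKAGE_ROW.findall(text)}


ilo = parse_ilo(ILO_TEXT)
packages = parse_packages(SENSORS_TEXT)

# Collect the two PROCESSOR_ZONE readings to set against the package temps.
zone_temps = [t for loc, t in ilo.values() if loc == "PROCESSOR_ZONE"]
print("PROCESSOR_ZONE:", zone_temps)
print("CPU packages:  ", packages)
```

Feeding it the idle and the loaded captures in turn shows the zone values staying at 40C while the package temps move.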
Under the "Optimal" config, which is where the main issue shows up, we have for ZERO CPU load:
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 AMBIENT 12C/53F 42C/107F
#2 PROCESSOR_ZONE 40C/104F 70C/158F
#3 PROCESSOR_ZONE 40C/104F 70C/158F
#4 MEMORY_BD 14C/57F 87C/188F
#5 MEMORY_BD 14C/57F 87C/188F
#6 MEMORY_BD 14C/57F 87C/188F
#7 MEMORY_BD 14C/57F 87C/188F
#8 MEMORY_BD 15C/59F 87C/188F
#9 MEMORY_BD 15C/59F 87C/188F
#10 MEMORY_BD 15C/59F 87C/188F
#11 MEMORY_BD 14C/57F 87C/188F
#12 SYSTEM_BD 35C/95F 60C/140F
#13 SYSTEM_BD 44C/111F 105C/221F
#14 POWER_SUPPLY_BAY 20C/68F -
#15 POWER_SUPPLY_BAY - -
#16 POWER_SUPPLY_BAY 17C/62F 75C/167F
#17 SYSTEM_BD 21C/69F 115C/239F
#18 SYSTEM_BD 23C/73F 115C/239F
#19 SYSTEM_BD 23C/73F 115C/239F
#20 SYSTEM_BD 21C/69F 115C/239F
#21 SYSTEM_BD 22C/71F 115C/239F
#22 SYSTEM_BD 21C/69F 115C/239F
#23 SYSTEM_BD 19C/66F 90C/194F
#24 SYSTEM_BD 18C/64F 90C/194F
#25 SYSTEM_BD 43C/109F 100C/212F
#26 SYSTEM_BD 20C/68F 90C/194F
#27 I/O_ZONE - -
#28 I/O_ZONE - -
#29 I/O_ZONE - -
#30 I/O_ZONE - -
#31 I/O_ZONE - -
#32 I/O_ZONE - -
#33 I/O_ZONE - -
#34 I/O_ZONE 16C/60F 65C/149F
#35 I/O_ZONE 17C/62F 66C/150F
#36 I/O_ZONE 17C/62F 66C/150F
#37 I/O_ZONE - -
#38 I/O_ZONE - -
#39 I/O_ZONE - -
#40 I/O_ZONE 20C/68F 66C/150F
#41 I/O_ZONE - -
#42 SYSTEM_BD 15C/59F 95C/203F
#43 SYSTEM_BD 26C/78F 90C/194F
#44 SYSTEM_BD 19C/66F 80C/176F
#45 SYSTEM_BD 9C/48F 65C/149F
#46 SYSTEM_BD 21C/69F 75C/167F
#47 SYSTEM_BD 18C/64F 75C/167F
#48 SYSTEM_BD 21C/69F 75C/167F
#49 CHASSIS_ZONE 19C/66F 75C/167F
#50 CHASSIS_ZONE 19C/66F 75C/167F
with fans:
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 SYSTEM Yes NORMAL 6% Yes 0 Yes
#2 SYSTEM Yes NORMAL 6% Yes 0 Yes
#3 SYSTEM Yes NORMAL 6% Yes 0 Yes
#4 SYSTEM Yes NORMAL 16% Yes 0 Yes
#5 SYSTEM Yes NORMAL 27% Yes 0 Yes
#6 SYSTEM Yes NORMAL 27% Yes 0 Yes
Observe that the fans now run at a much more reasonable speed than before.
sensors readings:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +24.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +20.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +24.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +19.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +22.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +30.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +27.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +27.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +29.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +26.0°C (high = +94.0°C, crit = +102.0°C)
After 10 mins of 100% usage we can see the main problem:
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 AMBIENT 11C/51F 42C/107F
#2 PROCESSOR_ZONE 40C/104F 70C/158F
#3 PROCESSOR_ZONE 60C/140F 70C/158F
#4 MEMORY_BD 15C/59F 87C/188F
#5 MEMORY_BD 15C/59F 87C/188F
#6 MEMORY_BD 15C/59F 87C/188F
#7 MEMORY_BD 16C/60F 87C/188F
#8 MEMORY_BD 17C/62F 87C/188F
#9 MEMORY_BD 20C/68F 87C/188F
#10 MEMORY_BD 19C/66F 87C/188F
#11 MEMORY_BD 16C/60F 87C/188F
#12 SYSTEM_BD 35C/95F 60C/140F
#13 SYSTEM_BD 44C/111F 105C/221F
#14 POWER_SUPPLY_BAY 25C/77F -
#15 POWER_SUPPLY_BAY - -
#16 POWER_SUPPLY_BAY 25C/77F 75C/167F
#17 SYSTEM_BD 25C/77F 115C/239F
#18 SYSTEM_BD 31C/87F 115C/239F
#19 SYSTEM_BD 24C/75F 115C/239F
#20 SYSTEM_BD 21C/69F 115C/239F
#21 SYSTEM_BD 26C/78F 115C/239F
#22 SYSTEM_BD 28C/82F 115C/239F
#23 SYSTEM_BD 20C/68F 90C/194F
#24 SYSTEM_BD 24C/75F 90C/194F
#25 SYSTEM_BD 44C/111F 100C/212F
#26 SYSTEM_BD 23C/73F 90C/194F
#27 I/O_ZONE - -
#28 I/O_ZONE - -
#29 I/O_ZONE - -
#30 I/O_ZONE - -
#31 I/O_ZONE - -
#32 I/O_ZONE - -
#33 I/O_ZONE - -
#34 I/O_ZONE 22C/71F 65C/149F
#35 I/O_ZONE 23C/73F 66C/150F
#36 I/O_ZONE 23C/73F 66C/150F
#37 I/O_ZONE - -
#38 I/O_ZONE - -
#39 I/O_ZONE - -
#40 I/O_ZONE 27C/80F 66C/150F
#41 I/O_ZONE - -
#42 SYSTEM_BD 17C/62F 95C/203F
#43 SYSTEM_BD 28C/82F 90C/194F
#44 SYSTEM_BD 22C/71F 80C/176F
#45 SYSTEM_BD 8C/46F 65C/149F
#46 SYSTEM_BD 25C/77F 75C/167F
#47 SYSTEM_BD 22C/71F 75C/167F
#48 SYSTEM_BD 26C/78F 75C/167F
#49 CHASSIS_ZONE 24C/75F 75C/167F
#50 CHASSIS_ZONE 21C/69F 75C/167F
fans:
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 SYSTEM Yes NORMAL 10% Yes 0 Yes
#2 SYSTEM Yes NORMAL 10% Yes 0 Yes
#3 SYSTEM Yes NORMAL 10% Yes 0 Yes
#4 SYSTEM Yes NORMAL 16% Yes 0 Yes
#5 SYSTEM Yes NORMAL 27% Yes 0 Yes
#6 SYSTEM Yes NORMAL 27% Yes 0 Yes
sensors:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +51.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +46.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +47.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +49.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +51.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +92.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +89.0°C (high = +94.0°C, crit = +102.0°C)
CPU starts to throttle:
CPU MHz: 3391.856
CPU max MHz: 3500.0000
CPU min MHz: 1200.0000
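As a rough way to quantify the three frequency lines above, here is a small Python sketch that reads them and reports the current clock as a fraction of the maximum. The regex and the function name are illustrative assumptions, and a single sample only shows the clock at that instant, so it is a hint rather than proof of thermal throttling.

```python
import re

# The three frequency lines quoted above.
LSCPU_TEXT = """\
CPU MHz: 3391.856
CPU max MHz: 3500.0000
CPU min MHz: 1200.0000
"""


def read_mhz(text):
    """Map each 'CPU ... MHz' label to its value as a float."""
    pairs = re.findall(r"^(CPU(?: \w+)? MHz):\s*([\d.]+)", text, re.MULTILINE)
    return {label: float(value) for label, value in pairs}


mhz = read_mhz(LSCPU_TEXT)
fraction = mhz["CPU MHz"] / mhz["CPU max MHz"]
print(f"running at {fraction:.1%} of max")
```

Sampling this repeatedly during the load test would show whether the clock keeps dropping as the package temperature approaches the "high" limit.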
So the main conclusions are:
1) Cooling is not aware of the real CPU temp.
2) The "PROCESSOR_ZONE" sensor, whatever it is, may report up to 30C below the real CPU temp, with a fixed minimum of 40C.
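The two conclusions above can be checked with simple arithmetic on the four readings in this post. The sketch below is only an illustration: the values are copied from the tables above, and the pairing of sensor #2 with CPU 0 and sensor #3 with CPU 1 is an assumption, since the output does not say which sensor belongs to which socket.

```python
# (cooling mode, load, PROCESSOR_ZONE #2, PROCESSOR_ZONE #3, Package id 0, Package id 1)
# All temperatures in C, copied from the four readings above.
RUNS = [
    ("Increased", "idle", 40, 40, 22.0, 22.0),
    ("Increased", "100%", 40, 40, 43.0, 47.0),
    ("Optimal", "idle", 40, 40, 24.0, 30.0),
    ("Optimal", "100%", 40, 60, 51.0, 92.0),
]


def gaps(runs):
    """Yield (mode, load, cpu_index, reported - actual) for every CPU in every run."""
    for mode, load, zone0, zone1, pkg0, pkg1 in runs:
        # ASSUMPTION: sensor #2 belongs to CPU 0 and sensor #3 to CPU 1.
        for cpu, (zone, pkg) in enumerate([(zone0, pkg0), (zone1, pkg1)]):
            yield mode, load, cpu, zone - pkg


rows = list(gaps(RUNS))
for mode, load, cpu, gap in rows:
    print(f"{mode:9} {load:4} CPU{cpu}: reported - actual = {gap:+.0f}C")

# Lowest value either PROCESSOR_ZONE sensor ever showed, and the worst under-report.
lowest_reported = min(min(run[2], run[3]) for run in RUNS)
worst_under_report = min(gap for *_, gap in rows)
print("lowest PROCESSOR_ZONE value:", lowest_reported)
print("largest under-report:", worst_under_report)
```

On these numbers the reported value never drops below 40C, and at 100% load under "Optimal" it trails the hotter package by 32C.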
In summary, the "Optimal" thermal config option is awful.
When you say "in the CPU location", do you mean the CPU's own sensors or other sensors external to the CPU?
Thanks.
01-07-2020 08:19 PM
Re: Does Proliant DL380p servers really monitor CPU temp?
I forgot to mention that I have another DL380p (bought from a different provider) with older BIOS and iLO versions, and it shows similar issues, so I seriously doubt that it's a firmware problem.
01-07-2020 09:21 PM
Re: Does Proliant DL380p servers really monitor CPU temp?
This advisory discusses some throttling issues which may be related to what you are seeing. The thermal and iLO groups did extensive testing when this was reported. The system uses more than just the sensors on the board and does monitor the internal readings of the CPU. What ends up happening is that the CPU temp spikes, and before the iLO can react and start ramping up the fan speed, short CPU throttling events can occur. Many times the MCE events happen so quickly that the event stating that throttling is over doesn't get logged. If the fans are set to react more aggressively, customers start complaining about the increased fan noise, so there is a fine line that has to be balanced. If you are running your CPUs under a load that starts tripping the throttling events, then the solution is to set the system to increased cooling instead of optimal cooling.
https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00020196en_us
01-08-2020 06:51 AM
Re: Does Proliant DL380p servers really monitor CPU temp?
Fan noise is exactly the problem I am facing, because the small data center is close to an office environment, so the noise levels from "Increased" cooling are unacceptable.
Besides the throttling, what I've shown is that the "Optimal" setting is not optimal at all. Please look again at the readings at 100% CPU use:
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 SYSTEM Yes NORMAL 10% Yes 0 Yes
#2 SYSTEM Yes NORMAL 10% Yes 0 Yes
#3 SYSTEM Yes NORMAL 10% Yes 0 Yes
#4 SYSTEM Yes NORMAL 16% Yes 0 Yes
#5 SYSTEM Yes NORMAL 27% Yes 0 Yes
#6 SYSTEM Yes NORMAL 27% Yes 0 Yes
sensors:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +51.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +46.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +47.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +49.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +51.0°C (high = +94.0°C, crit = +102.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +92.0°C (high = +94.0°C, crit = +102.0°C)
Core 0: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 1: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 2: +91.0°C (high = +94.0°C, crit = +102.0°C)
Core 3: +89.0°C (high = +94.0°C, crit = +102.0°C)
Do you see it?
I estimate that with *all* fans set at 20% you would get around 50C on both CPUs (and acceptable temps for essentially the whole board), and the noise level would be *very low*.
If you set "Increased" cooling, the fans spin much faster than needed, generating a lot of noise.
Is there any way to fix the fan speed at a certain value? In my current setup, fixing the fans at 20% would solve the problem.