
HP Proliant DL785 G6 Server

 
arunbasu
Occasional Contributor


Hi,

 

I am a Linux admin. I have installed Red Hat Enterprise Linux 6.1 on an HP ProLiant DL785 G6 server, and I noticed a few errors while the server was booting. Could you please help me resolve them? Please go through the attached file, which contains all of the errors. Thanks

 

Arun Basu

3 REPLIES
Matti_Kurkela
Honored Contributor

Re: HP Proliant DL785 G6 Server

 

[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 430076)

 

- As it says, this is a BIOS firmware bug. Check your BIOS version and see if updated versions are available from HP for your server model, then update if possible.
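To see which BIOS version is currently installed before looking for updates, something like the sketch below might help. It assumes the kernel exports DMI data under /sys/class/dmi/id; the function name and its optional directory argument are my own additions for illustration, not anything HP or Red Hat ships.

```shell
# Print the BIOS vendor, version and release date from the kernel's DMI export.
# The optional first argument (hypothetical, handy for dry runs) overrides
# the sysfs directory that is read.
bios_info() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for field in bios_vendor bios_version bios_date; do
        if [ -r "$dmi_dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field")"
        else
            printf '%s: unavailable\n' "$field"
        fi
    done
}
```

Call it as "bios_info" with no arguments on the server. Running "dmidecode -s bios-version" as root should report the same version string.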

 

-----------------

ERST: Can not request iomem region <.....> for ERST.


- ERST is the persistent storage for critical error messages. Linux would use it for storing Machine Check Exceptions (MCEs), because writing to disk is not guaranteed to succeed when an MCE happens. This could make it easier to troubleshoot MCEs.

 

ERST only works with compatible hardware, and apparently this kernel does not know how to make it work on yours. This is not critical, but updating to a newer kernel version and/or BIOS firmware might fix this.

 

------------------

rport-x:y-z: blocked FC remote port time out: removing rport

 

- This might indicate FibreChannel communication issues. Check with SAN admins.
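Before going to the SAN admins, it may be worth checking what the local FC ports report. This is only a sketch: it assumes the HBA driver registers its ports under /sys/class/fc_host, and the function name and optional directory argument are made up for illustration.

```shell
# List the link state of every FibreChannel host port known to the kernel.
# The optional first argument (hypothetical) overrides the sysfs directory.
fc_port_states() {
    fc_root="${1:-/sys/class/fc_host}"
    for host in "$fc_root"/host*; do
        # Skip the unexpanded glob when there are no FC hosts at all.
        [ -r "$host/port_state" ] || continue
        printf '%s: %s\n' "$(basename "$host")" "$(cat "$host/port_state")"
    done
}
```

A port that reports anything other than "Online" would be a good detail to include when you contact the SAN team.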

 

------------------

schedule_timeout: wrong timeout value fffff......

 

This is a known issue if you have a large number of LUNs. Check /etc/audit/audit.rules: you will find a setting for audit buffers, with a comment "Make this bigger for busy systems". Increase the value and see if the error messages go away.
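The audit buffer setting is the line starting with "-b" in that file. A small helper like the one below could show the current value before you change it. The function name is hypothetical; if the file contains more than one "-b" line, the sketch reports the last one.

```shell
# Print the audit backlog buffer count configured in an audit rules file.
# The optional first argument overrides the default rules file path.
audit_backlog() {
    rules="${1:-/etc/audit/audit.rules}"
    # Remember the value of each "-b" line; the last one seen is printed.
    awk '$1 == "-b" { value = $2 } END { if (value != "") print value }' "$rules"
}
```

To try a bigger buffer without rebooting, "auditctl -b 8192" sets it at runtime (8192 is only an example number, not a recommendation). To make the change permanent, edit the "-b" line in the file and restart auditd.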

 

------------------

microcode: failed to load file amd-ucode/microcode_amd.bin

 

Your CPUs can receive microcode updates, but your microcode_ctl RPM is out of date and does not include the AMD processor microcode update file. If your system can access the Red Hat update servers, run "yum update microcode_ctl". You'll want version 1:1.17-7 or greater.
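After the update, you could confirm that the file named in the error message is actually present. The sketch below assumes firmware files live under /lib/firmware, which is where the kernel normally looks; the function name and the optional directory argument are hypothetical.

```shell
# Succeed if the AMD microcode file named in the boot message exists
# and is not empty. The optional first argument overrides the firmware
# directory.
have_amd_ucode() {
    fw_dir="${1:-/lib/firmware}"
    [ -s "$fw_dir/amd-ucode/microcode_amd.bin" ]
}
```

For example: have_amd_ucode && echo "microcode file present". "rpm -q microcode_ctl" shows which package version is installed.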

 

------------------

k10temp <PCI device identifier>: unreliable CPU thermal sensor: monitoring disabled

 

Your CPUs have an internal temperature sensor, but it is known to be unreliable in this CPU model. This is a known issue in Socket F/AM2+ processors. The Proliant DL785 G6 has plenty of other temperature sensors, so you can just blacklist the k10temp module to get rid of the messages. Run these commands:

echo "blacklist k10temp" >/etc/modprobe.d/badsensor.conf
depmod -a
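If you prefer something you can run more than once without clobbering the file, here is a sketch of the same idea. The function name and the optional directory argument are my own; the file name badsensor.conf is the one used above.

```shell
# Add "blacklist <module>" to badsensor.conf unless the line is already there.
# Arguments: module name, then an optional modprobe.d directory override.
blacklist_module() {
    mod="$1"
    conf="${2:-/etc/modprobe.d}/badsensor.conf"
    # grep -x matches whole lines only, so running this twice stays harmless.
    if ! grep -qx "blacklist $mod" "$conf" 2>/dev/null; then
        echo "blacklist $mod" >> "$conf"
    fi
}
```

Run "blacklist_module k10temp" as root. The blacklist only stops the module from being auto-loaded in the future; "modprobe -r k10temp" unloads it from the running system.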

 

MK
arunbasu
Occasional Contributor

Re: HP Proliant DL785 G6 Server

 

Hi team,

 

Thanks for the quick response.

 

I have noticed another problem at boot time. Please go through the attached file and let me know how I should resolve it. Thanks

Matti_Kurkela
Honored Contributor

Re: HP Proliant DL785 G6 Server

The first iLO screenshot contains the k10temp and AMD microcode messages I already described in my previous post.

All the other messages are normal messages when the "quiet boot" mode has been disabled or the ESC key has been pressed while the system is booting.

 

The second screenshot:

More normal boot messages, and two FibreChannel rport errors I already described. The only new error-like messages are:

[Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found.
[Firmware Bug]: powernow-k8: Try again with the latest BIOS.

 This is the powernow-k8 kernel module (which handles CPU powersaving for certain AMD processor models) telling you that the processor is one of the models the module could support, but the ACPI firmware does not have the data objects it would need. It suggests that a BIOS update might help. If there is no newer BIOS version available, you must accept that some powersaving features of the hardware are unusable at this time.

 

Without powersaving, the CPUs would effectively run at full power all the time. If your datacenter is not on the verge of overheating, this is probably not a very big issue.

 

The third screenshot contains many messages like this:

<disk device>: Superblock last mount time is in the future. 
(by less than a day, probably due to the hardware clock being incorrectly set) FIXED.

 Looks like plain English to me.

The boot-time filesystem check noticed that the current system time is less than the "last mount time" timestamp on the filesystems. As the difference is not too large, it suggests the hardware clock is (or was) incorrectly set.

The word "FIXED" at the end means the problem was fixed automatically.

 

You should run "date" on the system to verify that the system time and timezone settings are correct.
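To compare the system time with the hardware clock in one go, a sketch like this might be handy. The function name is made up, and reading the hardware clock normally requires root, so that line may show as unavailable.

```shell
# Show local time, UTC, and (when readable) the hardware clock side by side.
show_clocks() {
    printf 'local: %s\n' "$(date)"
    printf 'utc:   %s\n' "$(date -u)"
    # hwclock may be missing or may need root; report that instead of failing.
    if command -v hwclock >/dev/null 2>&1; then
        printf 'rtc:   %s\n' "$(hwclock --show 2>/dev/null || echo 'unavailable (needs root?)')"
    fi
}
```

If the system time is right but the hardware clock is off, "hwclock --systohc" (as root) copies the system time to the hardware clock.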

 

One of the disk devices listed is /dev/vda1: this tells me this may be a Guest OS (= a Virtual Machine) on some kind of a virtualization platform.  Guest OSs should always be configured to use NTP to synchronize their clocks, as the standard hardware-based timekeeping techniques may be unreliable on virtualized hardware.

 

Most VM guests get the system time from the VM host at start-up. Has the system time of the VM host been adjusted recently?

 

See if the vendor of the virtualization software has released any documents on how to avoid timekeeping issues on their virtualization platform, and follow their recommendations. If this is the Red Hat Virtualization add-on, please see:

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/chap-Virtualization_Host_Configuration_and_Guest_Installation_Guide-KVM_guest_timing_management.html

MK