01-20-2017 02:45 AM
ProLiant DL585 G7 DRAM ECC error
Hello all,
We have four ProLiant DL585 G7 servers, all out of warranty:
- 2 servers are HP ProLiant DL585 G7, configured for 12 cores
- 2 servers are HP ProLiant DL585 G7, configured for 16 cores
All of them have been running continuously since installation. Recently, however, a hardware error has appeared on both of the 12-core machines.
On one:
[6896093.455573] [Hardware Error]: MC4 Error (node 1): DRAM ECC error detected on the NB.
[6896093.468312] EDAC MC1: 1 CE on mc#1csrow#0channel#0 (csrow:0 channel:0 page:0xc9139a offset:0xb30 grain:0 syndrome:0xb903)
[6896093.468317] [Hardware Error]: Error Status: Corrected error, no action required.
On the other:
[6653460.796494] [Hardware Error]: MC4 Error (node 7): DRAM ECC error detected on the NB.
[6653460.809233] EDAC MC7: 1 CE on mc#7csrow#0channel#0 (csrow:0 channel:0 page:0x5531e0b offset:0xfb0 grain:0 syndrome:0x100)
[6653460.809245] [Hardware Error]: Error Status: Corrected error, no action required.
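For anyone who wants to tally these messages, the EDAC lines above carry the memory controller, csrow, channel, page, and offset in a fixed layout, so they can be pulled apart with a regular expression. The following is only a sketch: the pattern is written against the two lines quoted in this post, not against every EDAC driver's output, and the function name `parse_edac_line` is made up for illustration. The physical address is reconstructed on the assumption of 4 KiB pages.

```python
import re

# Pattern written against the EDAC lines quoted in this thread (an assumption:
# other kernels or drivers may format the message differently).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) on \S+\s+"
    r"\(csrow:(?P<csrow>\d+)\s+channel:(?P<channel>\d+)\s+"
    r"page:(?P<page>0x[0-9a-fA-F]+)\s+offset:(?P<offset>0x[0-9a-fA-F]+)\s+"
    r"grain:(?P<grain>\d+)\s+syndrome:(?P<syndrome>0x[0-9a-fA-F]+)\)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parse_edac_line(line):
    """Return the fields of one EDAC CE/UE message, or None if it does not match."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "kind": m.group("kind"),  # "CE" = corrected, "UE" = uncorrected
        "count": int(m.group("count")),
        "csrow": int(m.group("csrow")),
        "channel": int(m.group("channel")),
        "syndrome": int(m.group("syndrome"), 16),
        # Page frame number shifted up, plus the offset inside the page.
        "phys_addr": (page << PAGE_SHIFT) | offset,
    }
```

Because the pattern uses `\s+` between fields, it also matches when the log line has been wrapped between `csrow:0` and `channel:0`. Feeding it the `dmesg` output line by line and counting results per `(mc, csrow, channel)` shows whether the errors always come from the same place.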
The errors are being corrected automatically, but they are quite annoying.
Both machines have the same amount of memory (384 GB, 24 modules of 16 GB).
The error appears only under heavy load, when the machines are pushed to peak performance.
It does not depend on the OS: I recently switched from Scientific Linux 7 to CentOS 7 and it still occurs.
Any idea about why the error is appearing, and what to do to fix it?
Thanks in advance!
01-20-2017 10:48 AM - edited 01-20-2017 11:08 AM
Re: ProLiant DL585 G7 DRAM ECC error
IMHO this blog post seems very pertinent. (The reported errors were taken from the SL/CentOS Linux dmesg/syslog, weren't they?)
I'm not an HPE Employee
01-25-2017 01:59 AM
Re: ProLiant DL585 G7 DRAM ECC error
Great, thanks! From that I can see that some modules are reporting errors. On one of the machines:
mc1: csrow0: mc#1csrow#0channel#0: 83 Corrected Errors
And on the other:
mc7: csrow0: mc#7csrow#0channel#0: 869 Corrected Errors
This is already great progress. But how can I physically locate the modules? Is there a standard naming scheme, or should I swap modules until EDAC no longer reports errors? Thanks in advance for your help!
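The per-channel counts quoted above can also be read straight from the kernel's EDAC sysfs tree, which helps when tracking whether a count keeps growing after a module has been moved. The sketch below assumes the legacy csrow layout (`/sys/devices/system/edac/mc/mcN/csrowM/chK_ce_count` with an optional `chK_dimm_label` next to it). The function name `read_ce_counts` is hypothetical, and whether the label files contain anything useful on a DL585 G7 is not confirmed anywhere in this thread: they are often empty unless labels have been registered for the board.

```python
from pathlib import Path

# Assumption: legacy per-csrow EDAC sysfs layout is present on this kernel.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ce_counts(root=EDAC_ROOT):
    """Collect corrected-error counts (and DIMM labels, if any) per channel."""
    results = []
    for ce_file in sorted(Path(root).glob("mc*/csrow*/ch*_ce_count")):
        csrow_dir = ce_file.parent
        channel = ce_file.name[: -len("_ce_count")]  # e.g. "ch0"
        label_file = csrow_dir / (channel + "_dimm_label")
        # The label may be missing or empty; treat both cases as "unknown".
        label = label_file.read_text().strip() if label_file.exists() else ""
        results.append({
            "mc": csrow_dir.parent.name,   # e.g. "mc1"
            "csrow": csrow_dir.name,       # e.g. "csrow0"
            "channel": channel,
            "ce_count": int(ce_file.read_text().strip()),
            "dimm_label": label,
        })
    return results
```

Calling `read_ce_counts()` before and after swapping a suspect module with a known-good one, and comparing which `mc`/`csrow`/`channel` entry increases, is one way to check whether the errors follow the module or stay with the slot.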