
Frequent Server Problems

Muneef
Occasional Visitor


We have many HP ProLiant servers (ML350 G5, DL380 G5 and G6) in our data center.

This year I have noticed that hard drive failures, RAM hangs, and power supply failures have each occurred more than once.

We have checked our power systems (UPS and earthing) and they are OK.

What do you think the problem is?

3 REPLIES
Jimmy Vance
HPE Pro

Re: Frequent Server Problems

Have you looked at the servers' IML and iLO logs?




__________________________________________________
No support by private messages. Please ask the forum!      I work for HPE

Muneef
Occasional Visitor

Re: Frequent Server Problems

I have looked at the IML log of a ProLiant DL380 G6 server, and it shows what is in the attached image:

IML log has at least one critical event, but no IML Log Entries.

The ProLiant ML350 G5 server shows the same.

 

Muneef
Occasional Visitor

Re: Frequent Server Problems

I re-checked the IML logs of the DL380 G6 server and got the following entries:

POST Error: 1785-Drive Array not Configured 10/14/2016 4:39AM 10/14/2016 4:39AM

POST Error: 1720-S.M.A.R.T. Hard Drive Detects Imminent Failure 10/14/2016 4:39AM 10/14/2016 4:39AM

POST Error: 207-Invalid Memory Configuration - Processor 1, DIMM 7 incorrectly installed. Please refer to Memory Population Rules in Documentation. This Memory will not be utilized. 10/14/2016 4:38AM 10/14/2016 4:38AM

Internal SAS Enclosure Device Failure (Bay 1, Box 1, Port 1I, Slot 0) 10/13/2016 9:09PM 10/24/2016 9:09AM

Network Adapter Link Down (Slot 0, Port 4) 10/13/2016 9:05PM 10/13/2016 9:05PM

Network Adapter Link Down (Slot 0, Port 3) 10/13/2016 9:05PM 10/13/2016 9:05PM

POST Error: 1785-Drive Array not Configured 10/13/2016 9:03PM 10/13/2016 9:03PM

POST Error: 1720-S.M.A.R.T. Hard Drive Detects Imminent Failure 10/13/2016 9:03PM 10/13/2016 9:03PM

Corrected Memory Error threshold exceeded ((Processor 2, Memory Module 9)) 9/12/2016 9:38AM 10/13/2016 7:17PM

 

The other DL380 G6 server has the following entries:

POST Error: 1716-Slot X Drive Array - Unregenerable Media Errors Detected on Drives during previous Rebuild or Auto-Reliability Monitoring (ARM) scan. Problem will be fixed automatically when the sector(s) are overwritten. 10/14/2016 6:02AM 10/14/2016 6:02AM

POST Error: 1786-Drive Array Recovery Needed 10/14/2016 6:02AM 10/14/2016 6:02AM

Network Adapter Link Down (Slot 0, Port 4) 10/14/2016 4:41AM 10/14/2016 4:41AM

Network Adapter Link Down (Slot 0, Port 3) 10/14/2016 4:41AM 10/14/2016 4:41AM

POST Error: 1716-Slot X Drive Array - Unregenerable Media Errors Detected on Drives during previous Rebuild or Auto-Reliability Monitoring (ARM) scan. Problem will be fixed automatically when the sector(s) are overwritten. 10/14/2016 4:39AM 10/14/2016 4:39AM

POST Error: 1786-Drive Array Recovery Needed 10/14/2016 4:39AM 10/14/2016 4:39AM

Network Adapter Link Down (Slot 0, Port 4) 10/4/2016 8:14AM 10/4/2016 8:14AM

Network Adapter Link Down (Slot 0, Port 3) 10/4/2016 8:14AM 10/4/2016 8:14AM

POST Error: 1716-Slot X Drive Array - Unregenerable Media Errors Detected on Drives during previous Rebuild or Auto-Reliability Monitoring (ARM) scan. Problem will be fixed automatically when the sector(s) are overwritten. 10/4/2016 8:12AM 10/4/2016 8:12AM

POST Error: 1786-Drive Array Recovery Needed 10/4/2016 8:12AM 10/4/2016 8:12AM

Internal SAS Enclosure Device Failure (Bay 3, Box 1, Port 1I, Slot 0) 8/11/2016 2:47PM 10/23/2016 1:20PM

Internal SAS Enclosure Device Failure (Bay 3, Box 1, Port 1I, Slot 0) 7/25/2016 2:44AM 8/30/2016 2:38PM

I also have two ML350 G5 servers with the same problems, and the problems occur at different times on each server.

At first I suspected static electricity in the servers, but we have checked the UPS and the earthing and they are OK.

What could be causing these problems?
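
To compare servers, it may help to tally the pasted IML entries by event type so that recurring faults stand out. Below is a minimal Python sketch, assuming the entries are copied as plain text in the same layout as above (a description followed by two timestamps). The function name `summarize_iml` and the grouping rule are illustrative assumptions, not part of any HPE tool.

```python
import re
from collections import Counter

# Timestamp layout as it appears in the pasted log, e.g. "10/14/2016 4:39AM".
TS = r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}[AP]M"
# Assumed line layout: <description> <initial time> <last update time>.
LINE_RE = re.compile(rf"^(?P<desc>.+?)\s+(?P<first>{TS})\s+(?P<last>{TS})\s*$")


def summarize_iml(text):
    """Count pasted IML entries per event type (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        desc = match.group("desc")
        # Group POST errors by their numeric code; group other events by the
        # text before the first "(" so slot/bay details do not split the count.
        post = re.match(r"POST Error: (\d+)", desc)
        key = f"POST {post.group(1)}" if post else desc.split("(")[0].strip()
        counts[key] += 1
    return counts


sample = """\
POST Error: 1785-Drive Array not Configured 10/14/2016 4:39AM 10/14/2016 4:39AM
POST Error: 1720-S.M.A.R.T. Hard Drive Detects Imminent Failure 10/14/2016 4:39AM 10/14/2016 4:39AM
Network Adapter Link Down (Slot 0, Port 4) 10/13/2016 9:05PM 10/13/2016 9:05PM
Network Adapter Link Down (Slot 0, Port 3) 10/13/2016 9:05PM 10/13/2016 9:05PM
POST Error: 1785-Drive Array not Configured 10/13/2016 9:03PM 10/13/2016 9:03PM
"""

for key, n in summarize_iml(sample).most_common():
    print(f"{n:3d}  {key}")
```

Running the same function on each server's log and comparing the counts shows whether the same event types recur across machines.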