ProLiant Servers (ML,DL,SL)
batbold_hp
Occasional Advisor

Server continues down

Dear experts,

We have an HP ProLiant DL585 server that goes down intermittently. Each time, we have to power it back on and restart the applications.
I have tried to find the cause but have not been able to determine it yet.
It usually goes down when the software load on the server is high.
Its memory is:
MemTotal: 65849012 kB

CPU x8:
AMD Opteron (tm) Processor 885

How do you think I should investigate this?
Please share your experience.

Thank you very much
devabhaskar dey
New Member

Re: Server continues down

Which storage controller are you using? Also, please specify the memory configuration.
Check for any memory faults.
KarloChacon
Honored Contributor

Re: Server continues down

hi

Does it just shut down, or does it reboot?

Any errors in the Integrated Management Log?
Any Windows event log errors? (Assuming it is Windows; you did not mention the OS.)

have you replaced any part so far?

bye
batbold_hp
Occasional Advisor

Re: Server continues down

Thank you very much for the quick response.

Red Hat 5 is installed on it. Kernel updates were applied up to March 5, 2008.
Linux version 2.6.18-53.1.14.el5

Storage controller:
HP StorageWorks Modular Smart Array 1000

Here is the HP Integrated Management Log (hplog -v):

ID Severity Initial Time Update Time Count
-------------------------------------------------------------

0003 Repaired 23:59 05/31/2007 13:48 08/29/2007 0001
LOG: ASR Detected by System ROM

0004 Repaired 15:58 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 7)

0005 Repaired 15:58 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 5)

0006 Repaired 16:00 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 3)

0007 Repaired 16:03 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 1)

0008 Repaired 21:24 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 3)

0009 Repaired 21:35 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 7)

0010 Repaired 21:37 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 5)

0011 Repaired 22:02 08/28/2007 13:48 08/29/2007 0001
LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 1)

0012 Repaired 13:20 08/29/2007 13:48 08/29/2007 0002
LOG: ASR Detected by System ROM

0045 Caution 11:31 10/19/2008 11:31 10/19/2008 0001
LOG: Corrected Memory Error threshold exceeded (Slot 4, Memory Module 4)

0046 Caution 11:40 10/19/2008 11:40 10/19/2008 0001
LOG: Corrected Memory Error threshold exceeded (Slot 4, Memory Module 8)

Today, October 28, 2008, the server went down again, but there is no entry for this event in hplog or in /var/log/messages. It simply shuts down, and we have to press the power button to bring it back up.

KarloChacon
Honored Contributor
Solution

Re: Server continues down

batbold_hp
Occasional Advisor

Re: Server continues down

Thank you very much, Karlo.

We updated the system BIOS from version ProLiant DL585 (A01) (2006-01) to A01 (2007-02-14).

Thank you again
batbold_hp
Occasional Advisor

Re: Server continues down

Even after updating the system ROM to ProLiant (A01) 2007-02-14, the server still goes down under high software load.
When I went to the server, the Therm Trip and TEMP LEDs were orange. I think it may be a thermal shutdown. But only this server goes down; our other servers do not.


What is your opinion? How should we investigate?

Thank you
devabhaskar dey
New Member

Re: Server continues down

Hi,

there are too many memory errors on slot 2 and a few on slot 4.

What is the memory configuration, and how many processors are used?

Also, please mention the memory type.
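Because the discussion turns on how many corrected memory errors each slot has logged, here is a minimal sketch that tallies them from saved `hplog -v` output. It assumes the entry layout in the log posted earlier in this thread (a header line ending in the Count field, followed by a `LOG:` description line); the script name `count_mem_errors.py` and the file `iml.txt` are hypothetical. It needs Python 3.6 or later, which RHEL 5 does not ship, so it is meant to be run on another machine against a saved copy of the output.

```python
import re
import sys
from collections import Counter

# Entry header, e.g. "0004 Repaired 15:58 08/28/2007 13:48 08/29/2007 0001"
# Fields: ID, severity, initial time, initial date, update time, update date, count.
HEADER = re.compile(r"^(\d{4})\s+(\w+)\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)\s*$")

# Description line, e.g.
# "LOG: Corrected Memory Error threshold exceeded (Slot 2, Memory Module 7)"
MEM_ERROR = re.compile(
    r"Corrected Memory Error threshold exceeded "
    r"\(Slot (\d+), Memory Module (\d+)\)"
)


def count_memory_errors(iml_text: str) -> Counter:
    """Sum the Count field of corrected-memory-error entries per (slot, module)."""
    counts: Counter = Counter()
    entry_count = 1  # fallback if a description appears without a parsed header
    for line in iml_text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            # Remember this entry's Count for the description line that follows.
            entry_count = int(header.group(3))
            continue
        mem = MEM_ERROR.search(line)
        if mem:
            counts[(int(mem.group(1)), int(mem.group(2)))] += entry_count
    return counts


def summarize(counts: Counter) -> None:
    """Print the total events per slot and which modules were involved."""
    per_slot: Counter = Counter()
    for (slot, _module), n in counts.items():
        per_slot[slot] += n
    for slot, total in sorted(per_slot.items()):
        modules = sorted(m for (s, m) in counts if s == slot)
        print(f"Slot {slot}: {total} event(s), modules {modules}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: hplog -v > iml.txt, then: python3 count_mem_errors.py iml.txt
    with open(sys.argv[1]) as fh:
        summarize(count_memory_errors(fh.read()))
```

Applied to the log above, it would report 8 events on slot 2 (modules 1, 3, 5, 7) and 2 events on slot 4 (modules 4, 8). It only counts logged events; it cannot by itself explain the shutdowns.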
batbold_hp
Occasional Advisor

Re: Server continues down

It is:

AMD Opteron

Four PC2700 DIMMs 266MHz
2.6 GHz (1 MB L2)

Thanks