ProLiant Servers (ML,DL,SL)

Albator
Occasional Contributor

Unexpected reboots on a DL380 G3

Hello everyone,

We have a DL380 G3 which keeps rebooting with the following error messages:

"The previous system shutdown at 9:34:38 PM on 3/29/2006 was unexpected."

The event log time for the error above is 10:36 PM on the same day (3/29/2006). The following warning shows up at 10:35 PM in the event log:

"System Information Agent: Health: Post Errors were detected. One or more Power-On-Self-Test errors were detected during server startup.
[SNMP TRAP: 6027 in CPQHLTH.MIB]"

The frequency of the reboots is the following:
- Sometimes no reboots for 1 or 2 days
- Sometimes 5 or 6 reboots in a row
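
To put numbers on that pattern, the shutdown times can be pulled out of the exported event log text. This is only a rough sketch: it assumes every entry uses the same wording and US date format as the one message quoted above, and the second sample line is made up for illustration.

```python
import re
from datetime import datetime

# Wording and date format are assumed from the single message quoted above.
PATTERN = re.compile(
    r"previous system shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) "
    r"on (\d{1,2}/\d{1,2}/\d{4}) was unexpected"
)

def parse_shutdowns(lines):
    """Return the sorted datetimes of unexpected shutdowns found in lines."""
    times = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stamp = f"{match.group(2)} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"))
    return sorted(times)

def gaps_in_hours(times):
    """Return the hours elapsed between consecutive shutdowns."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    sample = [
        "The previous system shutdown at 9:34:38 PM on 3/29/2006 was unexpected.",
        # Hypothetical second entry, not taken from the real log.
        "The previous system shutdown at 11:34:38 PM on 3/29/2006 was unexpected.",
    ]
    shutdowns = parse_shutdowns(sample)
    print(shutdowns)
    print(gaps_in_hours(shutdowns))
```

Short gaps clustered together would match the "5 or 6 times in a row" case, while long gaps would match the quiet days.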

I should add that the server ran for 3 years without any problems. Then a 6400 array controller was added with an MSA on 1 channel (the server is a file server). Exactly 7 days later, we started having this reboot issue.

Has anyone seen such behavior?

Thanks

4 REPLIES
Prashant (I am Back)
Honored Contributor

Re: Unexpected reboots on a DL380 G3

Hi,

We need to check a few things:

1) What is in the IML log?
2) The message refers to POST errors, so check whether any error appears while the server is booting. You can do this by restarting the server yourself.
3) Run ADU (Array Diagnostics Utility) to check the status of the controller and the RAID configuration.

Since it all started after the new hardware was added, the hardware logs mentioned above are the place to start.

Regards
Prashant S.
Nothing is impossible
Albator
Occasional Contributor

Re: Unexpected reboots on a DL380 G3

The message in IML is:

"POST Error: 1785-Drive Array not Configured"

Brian_Murdoch
Honored Contributor
Solution

Re: Unexpected reboots on a DL380 G3

Hi,

Event ID 1123 with SNMP trap 6027 can be the result of a battery failure on the Smart Array 6400. Run the Array Configuration Utility (ACU) and look at the battery count and battery status information. The status may say failed.

The first step to fix this is to update the SA6400 firmware as there have been a number of false battery failure issues with the firmware versions prior to V2.58a.

Download and run V2.58a firmware from here. This should sort the problem if it is a firmware issue. A reboot is required for the new firmware to be activated.

http://h18023.www1.hp.com/support/files/server/us/download/23678.html

If the problem persists after the firmware update you may have a genuine battery problem. To identify which battery is faulty (there are two on the SA6400), you need to run the Array Diagnostic Utility (ADU) version 7.30 or above and look at the report. In the line which starts with "Failed Batteries:" there will be a value of 0001 or 0002 indicating battery 1 or battery 2. There may also be a clear message indicating which battery has failed.

Battery 1 is the battery pack at the top of the card; battery 2 is the one at the bottom.
(See attached image).
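
If you save the ADU report as text, the check described above can be scripted. This is a minimal sketch, not an HP tool: it assumes only that the report has a line starting with "Failed Batteries:" followed by 0001 or 0002 as described, and the sample report text is invented for illustration.

```python
import re

# Battery positions on the SA6400 as described in the post above.
BATTERY_POSITION = {1: "top of the card", 2: "bottom of the card"}

def failed_battery(report_text):
    """Return the failed battery number (1 or 2), or None if none is reported.

    Assumes a line beginning with "Failed Batteries:" followed by a value
    such as 0001 or 0002; nothing else about the report layout is assumed.
    """
    for line in report_text.splitlines():
        match = re.match(r"\s*Failed Batteries:\s*(\d+)", line)
        if match:
            value = int(match.group(1))
            return value if value in BATTERY_POSITION else None
    return None

if __name__ == "__main__":
    # Hypothetical report excerpt; a real ADU report contains far more.
    sample_report = "Controller: Smart Array 6400\nFailed Batteries: 0002\n"
    number = failed_battery(sample_report)
    if number is None:
        print("No failed battery reported")
    else:
        print(f"Battery {number} failed ({BATTERY_POSITION[number]})")
```

Any value other than 0001 or 0002 is treated as "no single failed battery identified", so read the report by hand in that case.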

The replacement battery pack spare part number is 307132-001 (same part number for both modules). It simply clips into place, should you prefer to fit it yourself rather than call an HP engineer.

I hope this helps.

Brian





Albator
Occasional Contributor

Re: Unexpected reboots on a DL380 G3

The explanation I got from whoever installed and configured the MSA was that the Post Error message comes from the fact that the MSA was configured to use 1 channel for all 14 drives. In the array configuration utility, there are 2 controller icons for the 6400, one that holds all 14 drives, the other holds none. That's why the error message shows up.

What's strange is that the POST error message first showed up the first time the server was rebooted after the MSA install (1 week after the install), and this is exactly the date the reboots started...

The developers recently made some changes to their app so that it accesses this file server less often. The server did not reboot for 2 weeks, but this week it started again. We get a reboot every second day or so.