ML150 G3 Restarts

Brian Messenger
Occasional Advisor

ML150 G3 Restarts

System is ML150 G3 approx 2 years old
Single Xeon Dual Core 1.86GHz Processor 3GB RAM
ProLiant BIOS O07 3/31/2008
2 x 500GB SATA hard disks set up as a RAID 1 array
Running Windows SBS 2003 with all SPs & updates (probably not relevant)

All HP drivers are also the latest versions

The system is resetting intermittently; it runs for anywhere from 24 to 96 hours between resets. It usually happens in the early hours of the morning (3am - 6am), but it has also happened during the day.

This is a hard reset (not a clean Windows shutdown). There are no errors in the Windows event logs, other than an event logged when the server comes back up saying that the last shutdown was unexpected.

I have noted some similar posts in these forums and have disabled the OS Watchdog in the BIOS.

DMI log is empty

The BMC System log has the same four entries each time

GEN ID 20 00
SEL MSG REV 04
SENSOR TYPE 04 - Fan
SENSOR NUMBER 32 - CPU1 Fan
SEL EVENT TYPE 01 - Threshold
EVENT DESCRIPTION Lower Critical Going Low, Assertion
SEL EVENT DATA 52 FF FE


GEN ID 20 00
SEL MSG REV 04
SENSOR TYPE 02 - Voltage
SENSOR NUMBER 32 - CPU1 VCore
SEL EVENT TYPE 01 - Threshold
EVENT DESCRIPTION Upper Critical Going Low, Deassertion
SEL EVENT DATA 58 FF A4


GEN ID 20 00
SEL MSG REV 04
SENSOR TYPE 02 - Voltage
SENSOR NUMBER 32 - CPU1 VCore
SEL EVENT TYPE 01 - Threshold
EVENT DESCRIPTION Upper Critical Going High, Assertion
SEL EVENT DATA 59 FF A4

GEN ID 20 00
SEL MSG REV 04
SENSOR TYPE 27 - LAN
SENSOR NUMBER 5D - LOM Link Status
SEL EVENT TYPE 03 - Digital Discrete
EVENT DESCRIPTION State Asserted, Assertion
SEL EVENT DATA 01 FF FF
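
For reference, the SEL EVENT DATA bytes in the three threshold entries above can be unpacked with a short script. This is only a sketch, assuming the bytes follow the standard IPMI event data layout for threshold events (low nibble of byte 1 is the event offset, the upper bits say what bytes 2 and 3 contain). The function name and dictionary keys are made up for illustration, and the raw values are not converted to RPM or volts, because that needs the conversion factors from each sensor's SDR record, which are not in the log.

```python
# Sketch: decode the 3 "SEL EVENT DATA" bytes of an IPMI threshold event.
# Assumption: standard IPMI threshold event data layout; names are illustrative.

THRESHOLD_OFFSETS = {
    0x0: "Lower Non-critical Going Low",
    0x1: "Lower Non-critical Going High",
    0x2: "Lower Critical Going Low",
    0x3: "Lower Critical Going High",
    0x4: "Lower Non-recoverable Going Low",
    0x5: "Lower Non-recoverable Going High",
    0x6: "Upper Non-critical Going Low",
    0x7: "Upper Non-critical Going High",
    0x8: "Upper Critical Going Low",
    0x9: "Upper Critical Going High",
    0xA: "Upper Non-recoverable Going Low",
    0xB: "Upper Non-recoverable Going High",
}

# Meaning of event data byte 2, selected by bits 7:6 of byte 1.
BYTE2_MEANING = {0: "unspecified", 1: "trigger reading",
                 2: "OEM code", 3: "sensor-specific extension code"}
# Meaning of event data byte 3, selected by bits 5:4 of byte 1.
BYTE3_MEANING = {0: "unspecified", 1: "trigger threshold value",
                 2: "OEM code", 3: "sensor-specific extension code"}


def decode_threshold_event_data(hex_bytes: str) -> dict:
    """Decode a string such as '52 FF FE' into its fields (raw values only)."""
    data = [int(b, 16) for b in hex_bytes.split()]
    if len(data) != 3:
        raise ValueError("expected exactly three event data bytes")
    d1, d2, d3 = data
    return {
        "event": THRESHOLD_OFFSETS.get(d1 & 0x0F, "reserved"),
        "byte2_meaning": BYTE2_MEANING[(d1 >> 6) & 0x3],
        "byte2_raw": d2,
        "byte3_meaning": BYTE3_MEANING[(d1 >> 4) & 0x3],
        "byte3_raw": d3,
    }


if __name__ == "__main__":
    # The three threshold entries from the BMC log above.
    for entry in ("52 FF FE", "58 FF A4", "59 FF A4"):
        print(entry, decode_threshold_event_data(entry))
```

Decoded this way, the event names agree with the EVENT DESCRIPTION lines in the log, and under the same assumption byte 2 (FF in all three entries) is the raw trigger reading while byte 3 is the raw threshold. The fourth entry (LAN, Digital Discrete) uses a different event type and is not covered by this sketch.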

My thinking is that this is the power supply, but the motherboard or CPU are also possibilities. Given that these are premium-priced parts, I would like to get it right the first time, if possible :-)

I ran Insight Diags overnight once and the server did reset, but there was nothing in the server logs to indicate any errors. This is a production server, so I can't have it offline for an extended period. Most frustrating!

MTIA
9 REPLIES
RaMpaNTe
Trusted Contributor

Re: ML150 G3 Restarts

Hi, when you ran the Insight Diags offline edition, did the server reboot while the diags were running? If you have a maintenance window at night, try running the test again, but choose at least 15 or 20 loops so that we can make sure the server is stressed for the whole night. If the server does not shut down during those tests and they come out clean, then chances are that you have a software issue.

Please let me know.
You have a question... I have an answer!!!
Brian Messenger
Occasional Advisor

Re: ML150 G3 Restarts

Hi

Thanks for the response. As per my original post, the server did reset during the diagnostics, which pretty much eliminates a software error. But thanks for your reply.
RaMpaNTe
Trusted Contributor

Re: ML150 G3 Restarts

OK, so try the following: strip the server down to its base config, i.e. minimum memory, no PCI cards, etc. Run one loop and see if the server reboots. If not, start adding back the components you removed, one by one, so you can find which one is causing the issue. Then call HP to get the part replaced (if the server is under warranty or contract). If you are not willing to spend more time on this, then I would suggest replacing the power supply.

TY
You have a question... I have an answer!!!
Brian Messenger
Occasional Advisor

Re: ML150 G3 Restarts

Thanks, I will give this a go, but running diags for just a pass or two may not tell me anything, as the system rarely resets in less than 24 hours. When I did run the Insight Diags and it reset, all I know is that the system had been running for around 2 hours when I left for the evening and had reset by the time I returned approx. 12 hours later.

I appreciate the input. I was hoping someone might be able to comment on what the SEL entries are telling me, as these are CONSISTENT on EVERY reset.

Thanks
David J Campbell
Occasional Visitor

Re: ML150 G3 Restarts

Hi Brian,
I'm experiencing a very similar problem with my client's ML110 G4.
There is an exact correlation (apart from minor clock offsets) between the SEL and the SBS 2003 event log,
i.e. the SEL has "Lower Critical Going Low, Assertion" and then the event log has an unexpected shutdown.
Interestingly, the SEL entries are all at 51 minutes past the hour (+/- 30 seconds). The hour appears to be random. So what happens to my server at HH:51?

I'm thinking/hoping it's the power supply too, but it's just a guess really. Would love to find out more.
Brian Messenger
Occasional Advisor

Re: ML150 G3 Restarts

Hi David

The restarts on my server became more frequent for a few days,

i.e. they started at around 4:00am and, very similarly to yours, recurred almost exactly an hour apart three or four times.

I stripped my system back to absolute essentials (no hard drives, 1GB RAM), ran HP Insight Diags overnight and got NO RESET. I then put everything back one piece at a time. I did a thorough scan of both SATA disks and found no errors. Everything has now been back together for two days and still NO resets. I have done a thorough visual inspection without finding anything. At this stage I still don't know if it is fixed, as at some stages I have gone four or five days without an issue.

I'm the same as you: I wish someone could tell me what those SEL entries indicate.

Regards
David J Campbell
Occasional Visitor

Re: ML150 G3 Restarts

Hi Brian,
I called HP Support and they offered me similar advice, i.e. reseat the CPU. So I did that today and will see how we go. They seemed to think it wasn't likely to be the power supply, since the voltage is regulated on the motherboard. So I presume from that that the power supply is an all-or-nothing proposition. They also suggested I try a new processor to rule the existing processor in or out as the culprit. I don't have any other devices attached to the motherboard, so I guess it's one or the other. Still under warranty...I think :)
How's your box going?
David
Brian Messenger
Occasional Advisor

Re: ML150 G3 Restarts

Hi David

Mine reset shortly after my last post. I stripped all non-essentials out again and ran diags overnight, and it reset again. I am beginning to think, as HP suggest, that it is the motherboard. My system is not under warranty, so it becomes a pricey proposition to "try" a new motherboard or processor. Still, not much choice I guess.

Thanks for your input

Regards

Brian
Brian Messenger
Occasional Advisor

Re: ML150 G3 Restarts

Well, my system went completely offline this morning. The system reset and then all the fans were running flat out, with no POST, no video or anything.

I have for some time suspected that the problem was related to cold, as it "nearly" always happens in the early hours when the room is at its coldest. I am in Australia, so it doesn't get very cold; the room rarely gets below 12 Celsius. But this morning it was around 8-10.

I disconnected EVERYTHING (incl. RAM) from the system except the motherboard and CPU, with no change. I tested all the PSU outputs and they were OK.

When I removed the CPU I noticed that one of the gold contacts had some oxidation/discoloration, possibly due to a bad contact. I VERY carefully cleaned the contact as well as I could (there was matching discoloration on the m/b socket).

The system immediately came back up. I can't say, of course, whether this was just because by this time the room was back up to around 20 Celsius. ALSO, the CPU is DEFINITELY suspect, as I could not get the contact completely clean.

MY QUESTION now is, if I get a new CPU can I use it in the SECOND socket, or does there have to be a CPU in socket 1? I would rather not have to immediately replace m/b AND cpu if I can help it.

MTIA