ProLiant Servers (ML,DL,SL)

Random Reboots - DL380 G3

 
Brian_Murdoch
Honored Contributor

Re: Random Reboots - DL380 G3

Hi Dave,

There's not much in the new ADU report. However, the format of the V8.XX ADU report is different from the V7.XX one and always seems to report less on the fault side.

Would it be possible for you to boot from a SmartStart 7.8 or lower CD and run ADU to generate a V7 report? This may reveal more. Alternatively, you could downgrade the online version and run it from there. Sorry if this causes any hassle.

The firmware is low on the controller and the drives as suggested. The last Smart Array 5i firmware was 2.62. It's probably best if you run the firmware maintenance CD and upgrade all required devices.

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=uk&prodTypeId=15351&prodSeriesId=397634&prodNameId=3288132&swEnvOID=1005&swLang=8&mode=2&taskId=135&swItem=MTX-425e6194d2f64d049eed07abeb

Regards,

Brian
Brado23
Advisor

Re: Random Reboots - DL380 G3

I have a dual 3.06GHz DL380 G3 with the same issues. I have been struggling with this for a year now and haven't been able to find the fault. The longest my system has stayed up in the last year is 38 days. For a VMware ESX server this is unacceptable. I have ruled out every cause other than a hardware problem, but haven't replaced any hardware yet because 1) the server is 1 year out of warranty and I don't have the funds to keep throwing at parts until I get lucky and find the faulty one, and 2) I didn't have any solid leads until recently as to which part may be faulty. I have redundant power supplies connected to separate power feeds to eliminate the chance of a UPS issue causing the fault, etc.

Recently a colleague of mine said he had the random reboot issue fixed on a DL380 G3 system he manages by replacing the DC converter module in the server (HP Spare Part Number 316052-001). As this part is about $150AU, I'm going to try my luck and replace it and see how it goes. If it fixes my problem I'll post back here, but it may be some time, as I won't know for about 40 days after I replace it.
gregersenj
Honored Contributor

Re: Random Reboots - DL380 G3

>Brado23

Your best friends are Insight Manager and the IML.
ESX is supported if you take the time to find out how to install and use the agents. Then you might get information that helps you find the problem.

It's not a crystal ball, but in most cases it will tell you what's wrong.

If you suspect that the power subsystem is dropping, you can check the iLO event log. If so, you might see some "Power restored" entries.

Dave McGough
Occasional Advisor

Re: Random Reboots - DL380 G3

@Brado23, I would appreciate any feedback you can supply. FWIW, the machine has been up since May 24 @ 10:10am. That is probably the longest uptime since this started happening. I think it's power related but I can't prove it. Both my power supplies are connected to the same UPS; however, no other machines are being affected to my knowledge, including an identical DL380 that sits on the same UPS.
Calil
New Member

Re: Random Reboots - DL380 G3

Hello all,

We have a DL380 G3 with MS Windows Server 2003 SP2 that runs an "isolated" (few users, specific purpose) application with an Oracle DB.

This server started its "random reboots" about 3 weeks ago.

Since then, we:
. Updated the OS;
. Updated the storage controller drivers;
. Updated the storage controller firmware;

But nothing seems to change. At each reboot the IML registers a line like this: "Blue Screen Trap (BugCheck, STOP: 0x0000007F (0x00000008, 0x80042000, 0x00000000, 0x00000000))",Operating System,Critical,1,25/6/2008 09:45,25/6/2008 09:45

Now we are going to update all the drivers and keep watching the server.
Beyond that, we have no further plans.
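For anyone reading an entry like this, the STOP code can be decoded mechanically. Below is a minimal Python sketch, not an HP or Microsoft tool: the function names and the short trap table are illustrative assumptions. STOP 0x7F is UNEXPECTED_KERNEL_MODE_TRAP, and a first parameter of 0x8 is a double fault, which is often (though not always) a kernel stack overflow.

```python
import re

# A few well-known x86 trap numbers that can appear as the first parameter
# of STOP 0x7F (UNEXPECTED_KERNEL_MODE_TRAP). This table is deliberately short.
TRAP_NAMES = {
    0x00: "divide error",
    0x04: "overflow",
    0x05: "bounds check fault",
    0x06: "invalid opcode",
    0x08: "double fault",
}

# Matches "STOP: 0x0000007F (p1, p2, p3, p4)" inside a longer log line.
STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def decode_stop(line):
    """Return (stop_code, [params]) parsed from a blue-screen log entry."""
    match = STOP_RE.search(line)
    if match is None:
        raise ValueError("no STOP code found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def describe(code, params):
    """Give a one-line description; only STOP 0x7F is handled in this sketch."""
    if code == 0x7F:
        trap = TRAP_NAMES.get(params[0], "unlisted trap")
        return f"UNEXPECTED_KERNEL_MODE_TRAP, trap 0x{params[0]:02X} ({trap})"
    return f"STOP 0x{code:08X} (not decoded by this sketch)"


entry = ("Blue Screen Trap (BugCheck, STOP: 0x0000007F "
         "(0x00000008, 0x80042000, 0x00000000, 0x00000000))")
code, params = decode_stop(entry)
print(describe(code, params))
```

A double fault points toward a driver or hardware problem rather than the application, so the memory dump is the next thing worth examining.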

Can anyone help us?

Thanks in advance.
Brado23
Advisor

Re: Random Reboots - DL380 G3

I haven't used Insight Manager, but I have checked the IML and iLO logs and have also run the SmartStart diagnostics several times. I have also tested the memory using memtest and everything has been OK. I have also pulled power supplies while the server is running to check that the redundancy is working, which it is.
I also have the HP Agents V8.00a installed on the server, and they don't report any abnormalities. I have tried many combinations of agent install types just in case the agents were causing the reboots, but all configs give the same result.

However, as you say, when the reboot occurs I have a "Server Reset" message in the iLO log followed immediately by a "Server power restored" message. No logs in VMware indicate any reason why a restart has occurred. I do suspect it is a problem with the power subsystem dropping. I've got my fingers crossed anyway, as I really want to sort this out. I've arranged a replacement part, so I will hopefully know soon.
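The pattern described here (a reset entry followed immediately by a power-restored entry) can be picked out of an exported log with a few lines of Python. This is only a sketch: the list-of-tuples layout and the sample timestamps are made up for illustration, and a real iLO export would need its own parsing step first.

```python
from datetime import datetime


def find_power_drops(entries):
    """Return timestamps where a reset entry is directly followed by a
    power-restored entry.

    `entries` is a chronological list of (datetime, message) tuples;
    this layout is hypothetical, not an iLO export format.
    """
    drops = []
    for (when, msg), (_, next_msg) in zip(entries, entries[1:]):
        if "server reset" in msg.lower() and "power restored" in next_msg.lower():
            drops.append(when)
    return drops


# Illustrative entries only; dates and the login line are invented.
sample = [
    (datetime(2008, 6, 1, 3, 12), "Server Reset"),
    (datetime(2008, 6, 1, 3, 12), "Server power restored"),
    (datetime(2008, 6, 2, 9, 0), "Browser login"),
]

for when in find_power_drops(sample):
    print("possible power drop at", when)
```

Listing the matches with their timestamps makes it easier to compare them against UPS or building power events.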
Brado23
Advisor

Re: Random Reboots - DL380 G3

OK, replaced the part I previously mentioned and the server still crashes. Seems to be getting worse and worse too.

I set up webcam recording of the server so I could see what it does when it crashes. I have caught it in the act twice and have some ideas for a few more troubleshooting steps as a result. If I find out any more I'll post back here again.

I'm really hoping it's not a motherboard problem, as a new one is $1200. I could always get a second-hand one off eBay, I guess.
Calil
New Member

Re: Random Reboots - DL380 G3

I contacted MS and, from the memory dump file, they concluded that a driver (in this case, Client32 from Novell NetWare) was causing a stack overflow.
We updated the driver and the server has stayed up since then (last week).

I suggest you disable the AUTOMATICALLY RESTART option under the System Recovery Advanced Options, so you get a blue screen and can find which driver is causing the shutdown.
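The same option can also be switched off through the registry: the checkbox corresponds to the `AutoReboot` DWORD under the `CrashControl` key. Here is a small Python sketch that only builds the text of a `.reg` file for that value; the function name is an assumption for illustration, and the resulting file should be reviewed before importing it on a production server.

```python
# Registry key that holds the Windows crash/recovery settings.
CRASH_CONTROL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
)


def crash_control_reg(auto_reboot):
    """Build .reg file text that sets AutoReboot to 1 (on) or 0 (off)."""
    value = 1 if auto_reboot else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"AutoReboot"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


# Print the text that would disable the automatic restart.
print(crash_control_reg(auto_reboot=False))
```

With the automatic restart disabled, the server stays on the blue screen, so the STOP code and the driver name can be read directly from the console.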

Hope it helps.
MT19
Valued Contributor

Re: Random Reboots - DL380 G3

Dave, you may want to replace the SCSI backplane.

mark
Jason Slinks
Occasional Advisor

Re: Random Reboots - DL380 G3

Hi Guys,

I have the exact same issue and we've had this for over a year.

The IT dudes before me apparently had the RAM and MOBO replaced but still no joy.

The longest I have seen the server up is 18 days, and that was after I upgraded to the latest PSP.

The server can randomly reboot at any time - once a day, twice a day, once every 3 days, midnight, afternoon, morning etc.

I have run the HP SmartStart CD and done full diags on a 10-pass loop, and everything reported fine.

I've also disabled ASR. There are no heat issues, as the LEDs are green.

So here's what I've recently done.

1) Plugged the server directly into a power outlet, taking the UPS out of the equation.

2) Upgraded the PSP a couple of times in the last 6 months as per HP advice.

3) Replaced the PSU.

My last test (last night) was to boot the server from the XP DVD. I then selected XP Pro Corp and, after Windows did its checks, I left it be.

When I came into the office this morning, I was expecting to see the same screen I had last night, BUT NO - the server had rebooted back to the logon screen.

I ran a system uptime batch file within Windows and learned the server had been up for a total of 10 hours, which indicates the server went down around 23:00 last night.
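The arithmetic behind that estimate is just the check time minus the uptime. A quick Python sketch, assuming the uptime was read at about 09:00 (the post only says "this morning") and using a placeholder date:

```python
from datetime import datetime, timedelta


def last_boot(checked_at, uptime):
    """Estimate when the machine last booted from a reported uptime."""
    return checked_at - uptime


# Placeholder date; the 09:00 check time is an assumption.
checked = datetime(2000, 1, 2, 9, 0)
print(last_boot(checked, timedelta(hours=10)))  # 2000-01-01 23:00:00
```

Recording the estimated boot time after every incident makes it easier to see whether the reboots cluster around particular hours.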

Now I am certain this isn't software related. I mean, how can it be? Windows was NOT in session last night, so something else triggered the reboot.

I have also had the pleasure of seeing the server reboot in front of me, and NOTHING shows up. The best way I can describe it is as if you were to hit the RESET button on your PC - the screen goes black and it starts booting.

When I replaced the PSU, it was a straight swap, so I thought the problem could be the actual slot the PSU slips into.

So instead of swapping again, I got another PSU and ran the server on both PSUs. I was cockily thinking I had the issue sussed until the server rebooted about 20 mins ago with the same exact crap about "System Previously Shutdown Unexpectedly".

We have another server (exact model) and it's been up for months.

Now my next move is to compare the BIOS settings on both servers and make sure they match.

I will then run some diags via the utils on the Hiren's CD.
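If the settings from each server are written down as name/value pairs, a short Python sketch like the one below can list the differences. The setting names and values shown are hypothetical examples, not an actual DL380 G3 BIOS dump.

```python
def diff_settings(good, bad):
    """Return {name: (good_value, bad_value)} for every setting that differs
    or exists on only one of the two servers."""
    names = sorted(set(good) | set(bad))
    return {
        name: (good.get(name), bad.get(name))
        for name in names
        if good.get(name) != bad.get(name)
    }


# Hypothetical values noted down by hand from each server's setup screens.
stable_server = {"ASR": "Enabled", "Hyper-Threading": "Enabled"}
rebooting_server = {"ASR": "Disabled", "Hyper-Threading": "Enabled"}

for name, (good, bad) in diff_settings(stable_server, rebooting_server).items():
    print(f"{name}: stable={good} rebooting={bad}")
```

Only the settings that differ are printed, which keeps the comparison short when most of the configuration already matches.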

Something is losing power and causes the system to reboot.

Could it be thermal grease on the CPU? Could it be the RAID controller? I have no fricken idea, but please let's collectively sort this out.

Regards,

Jay