Absolute nightmare of a DL380 G7

 
BrightMinds
Frequent Advisor

Absolute nightmare of a DL380 G7

To fill in those who don't already know...

Our DL380 G7 has been a nightmare since day 1. First, the machine would randomly reboot. HP replaced the motherboard, and afterwards it just randomly shut down. This was narrowed down to a possible UPS issue (see http://www.geakeit.co.uk/2010/11/04/review-avoid-the-hp-dl380-g7/).

With the UPS replaced (at significant cost), the server now BSODs with stop code 0xF4.

I've installed the latest PSP and been through the entire device manager and updated everything that's there (driver and firmware). I'll reboot it shortly once the morning load is out of the way.

Anyone else have any ideas where to go next with this waste of money?
78 REPLIES
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

By the way, when the server reboots the storage controller sometimes fails to initialize. When it does initialize (normally after a further forced reboot), it displays POST error 1719 (controller failure) and lockup code 0x13.
Jan Soska
Honored Contributor

Re: Absolute nightmare of a DL380 G7

Hello,
we only have a limited number of G6 servers, but have never had such a problem. We use original HP UPSs and APC Symmetra PX UPSs.
It seems high-quality HP PSUs require a really good online UPS.
Why blame HP? Modern PSUs with active PFC and very high efficiency (80+) are common in home computers; global vendors are finally pushing them into the server world to save energy...

Jan
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

None of our other servers exhibit this behavior, and HP never mentioned this when selling the server.

Also, their technical support staff barely know what a UPS even is...
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Hi there Brightminds, I share your pain !

We recently purchased 17 DL380 G7s for a variety of uses: a couple of 4-node Hyper-V clusters, a couple of 3-node XenServer clusters and a 3-node SQL cluster.

We have intermittently been having server hangs/reboots with some, but not all, of these servers, which sound very similar to the issue you are reporting.

Initially we were getting fairly frequent Stop 0xF4s on some of the servers running Windows Server 2008 R2, with no crash dump file but always an Integrated Management Log (IML) entry on the next power-up saying "POST Error: 1719 - A controller failure event occurred prior to this power-up". This suggests to me that the array controller was hanging. As you say, we sometimes struggled even to get the server to reboot, with it getting stuck on the BIOS Option ROM screen while initialising the array controller.

We then upgraded the P410i array controller BIOS on the server to v3.52, which we thought had fixed the issue. (For some reason this update isn't listed on the DL380 G7 Support and Drivers page; you need to go to the P410i Support and Drivers page.) However, the issue has now recurred, although my gut feeling is that it happens less often.

I'm just about to raise the issue with HP Support again, but it's always a painful experience and I have no expectation of them finding a solution (run RAID diagnostics, reseat the cache memory, reset the NVRAM, etc.).

I just hope that HP know about the issue and are already working on a fix!

Out of curiosity, how much memory do you have in your servers? For us, the servers that suffer the most have more memory in them than the others: 60GB.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

Simon - Absolutely fantastic, someone with the same issues as us!

Yes, the IML has POST error 1719 (a controller failure event occurred, etc.) after hanging/BSODing with 0xF4.

Updated the P410i firmware to 3.52 and it still happens.

Should take delivery of a new UPS tomorrow but I'll be amazed if that makes a difference.

My email address is josh {at} my username dot co dot uk. We're desperate for a fix, as it's our dedicated SQL server!
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Hi Joel, we run our servers off a building UPS, so I can't believe power is an issue. What's the spec of your G7? Ours is as follows:

Part Number: 583970-421 DL380 G7 - 2x Xeon X5660/2.8GHz, 2x 750W PSU, Smart Array P410i/1G FBWC
Memory: 60GB (6x2GB and 6x8GB)
Storage: 2x 72GB 6G SAS 15K SFF Dual Port (Mirrored)
3 x NC364T PCI Express Quad Port Gigabit Server Adapter
O/S: Windows Server 2008 R2 Enterprise

Do you get a crash dump file with your BSODs?
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

It's Josh not Joel!

Our DL380 G7 is...

2x X5650
32GB Ram
2x 72GB RAID 1
2x 72GB RAID 1
4x 146GB RAID 5

Don't think I have the crash dump but I'll look tomorrow.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

Also btw we're running Server 2008 R2 x64.

I don't have a memory dump, I'm afraid :( I've set it to create one next time, though.
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

It will be interesting to see if you get a crash dump...

What cache size and type have you got on your array controller?

Next thing I'm trying is upgrading the firmware on our SAS drives, and then temporarily removing the Flash Backed Write Cache from one of our servers to see if that may be the culprit.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

We're using the onboard HP P410i controller, upgraded to the 3.52 firmware. Everything is on default options; it's difficult to access the configuration because the server needs to stay online.

Yeah I'll be interested to see if upgrading the hard drive firmware helps, keep me posted!

How often, by the way, do yours fail?
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Of the 17 DL380 G7's we have, 4 are 'live' (Hyper-V cluster), 7 are in DR and sitting doing nothing at the moment, and 6 are still being built.

Of these, we have the controller failure log entries in the IML of 5 of them; 3 of the live ones and 2 DR, with the worst culprits having the issue about once per week.

Since we upgraded the P410i BIOS to v3.52, the issue has recurred on only one server, but 3 times in 1 week! (And it was one of the live Hyper-V servers, too.) We have only been running the 3.52 BIOS for 3 weeks, though.

You should be able to see your Array cache details on-line using either the Windows Array Configuration Utility, or the HP System Management Homepage, if you have them installed.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

Here's an export from ID...
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Yep, you've got the 1GB FBWC on your array controller, the same as ours. This was only offered as a default option on the G7s, so I wonder if it is related to the problem, i.e. most people using the P410i controller on the G6s would only have had a maximum of 512MB BBWC by default.

I've had the issue twice more since, both times on the same server. I've now disabled the write cache on the array controller for that server to see if it still recurs. I will hopefully be able to upgrade the disk firmware in the next couple of days to see if that has any effect.

How often is your server having the problem?
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

It dies about once every 3 days; however, on Sunday it died within 20 minutes of booting!
Martijn Goudkamp
Occasional Advisor

Re: Absolute nightmare of a DL380 G7

Good, I'm not the only one with a rubbish DL380 G7! Spent €8000 on a server that randomly hangs with SCSI problems.

Config: dual quad-core, 24GB RAM, 4x 146GB SAS 15k, 2x 300GB SAS 10k. Running ESX 4.1 with 10 VMs. P410i 1GB BBWC, dual power, etc.

Updated the RAID controller to v3.52, and all the disks got new firmware too. Now the server hangs once or twice a week, mostly at weekends. ESX is unresponsive except for the Alt-F11 screen, which fills with SCSI errors. Then, after a reboot, I get the same RAID errors as you guys:

POST Message 10/30/2010 19:37 10/30/2010 19:32 2 POST Error: 1719 - A controller failure event occurred prior to this power-up
POST Message 11/01/2010 20:26 11/01/2010 20:26 1 POST Error: 1719 - A controller failure event occurred prior to this power-up
POST Message 11/01/2010 20:26 11/01/2010 20:26 1 POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator
POST Message 11/05/2010 23:29 11/05/2010 23:29 1 POST Error: 1719 - A controller failure event occurred prior to this power-up
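
For anyone comparing IML exports across several boxes, a quick tally of how many 1719 events each server has logged makes the "once a week" comparisons easier. Below is a minimal Python sketch, assuming the exported lines follow the layout quoted above (class, last update, initial update, count, description). That column layout is inferred from these four lines, not from any HP specification, and the pattern and function names (`IML_POST_LINE`, `tally_post_errors`, `last_seen`) are hypothetical, not part of any HP tool.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern. Assumes each exported IML line looks like:
#   POST Message <last update> <initial update> <count> POST Error: <code> - <text>
# This layout is inferred from the lines quoted in this thread, not from HP docs.
IML_POST_LINE = re.compile(
    r"^POST Message\s+"
    r"(?P<last>\d{2}/\d{2}/\d{4} \d{2}:\d{2})\s+"
    r"(?P<initial>\d{2}/\d{2}/\d{4} \d{2}:\d{2})\s+"
    r"(?P<count>\d+)\s+"
    r"POST Error:\s*(?P<code>\d+)\s*-?\s*(?P<text>.*)$"
)


def tally_post_errors(lines):
    """Sum the assumed 'count' column per POST error code (e.g. '1719')."""
    totals = Counter()
    for line in lines:
        match = IML_POST_LINE.match(line.strip())
        if match:
            totals[match.group("code")] += int(match.group("count"))
    return totals


def last_seen(lines, code):
    """Return the sorted 'last update' timestamps for one POST error code."""
    stamps = []
    for line in lines:
        match = IML_POST_LINE.match(line.strip())
        if match and match.group("code") == code:
            stamps.append(datetime.strptime(match.group("last"), "%m/%d/%Y %H:%M"))
    return sorted(stamps)


# The four IML lines quoted above, used as sample input.
SAMPLE = [
    "POST Message 10/30/2010 19:37 10/30/2010 19:32 2 POST Error: 1719 - A controller failure event occurred prior to this power-up",
    "POST Message 11/01/2010 20:26 11/01/2010 20:26 1 POST Error: 1719 - A controller failure event occurred prior to this power-up",
    "POST Message 11/01/2010 20:26 11/01/2010 20:26 1 POST Error: 1792-Drive Array Reports Valid Data Found in Array Accelerator",
    "POST Message 11/05/2010 23:29 11/05/2010 23:29 1 POST Error: 1719 - A controller failure event occurred prior to this power-up",
]

if __name__ == "__main__":
    print(tally_post_errors(SAMPLE))
    for stamp in last_seen(SAMPLE, "1719"):
        print(stamp.isoformat(sep=" "))
```

Running each server's export through `tally_post_errors` gives a per-box count of controller failure events, and `last_seen` shows when they cluster (weekends, after firmware changes, and so on).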



At least with the latest iLO firmware (1.10) it doesn't hang during a remote reboot, and I can restart all my VMs while sitting on the couch. But my trusty ML350 G6 NEVER gave me any problems.

Anyone with a solution?
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

Sorry, the link to my story is actually this:

http://www.geakeit.co.uk/2010/11/11/review-avoid-the-hp-dl380-g7/

I guess we all need to pester HP to ensure they know their customers are having problems.

I'll be interested to see if disabling the write cache helps, could indicate a bad memory batch.
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Hi there guys.

Disabling the write cache on the array made no difference. The server crashed with this issue five times yesterday! I have now physically removed the array cache card from the server to see what happens.

I'll keep you posted!
Martijn Goudkamp
Occasional Advisor

Re: Absolute nightmare of a DL380 G7

Yesterday, after I contacted support by email, HP called me back asking for extra information. It seems we're not the only ones, as they want to replace not only the RAID card but also the whole motherboard on-site.

I'll let them; I hope this solves the problems.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

We've already had our motherboard changed, but since starting this thread it seems the problem lies with the P410i storage controller.

I'll be very interested to hear if you still have the issues after they replace it!
Martijn Goudkamp
Occasional Advisor

Re: Absolute nightmare of a DL380 G7

Well, well, well: an HP technician just popped out of nowhere and left me a new 1GB BBWC module. Since the ESX server is live, I can now replace it myself whenever I see fit. Let's see if this solves the problems.

Note: the last lockup was on the evening of November 5th, so that's almost a week with a stable server.

I remembered that I took pictures of the monitor when the server locked up last time; I'll attach the ESX messages here.
Simon.H
Advisor

Re: Absolute nightmare of a DL380 G7

Well, the server that had been crashing 4 or 5 times a day for the last few days has now been stable for 24 hours since I removed the FBWC, so it's looking more and more as if that is the problem.
I'm going to compare it with the FBWC in one of the other DL380 G7's I have that doesn't have the problem, to see if it looks any different...

Marco Bagnoli
Occasional Visitor

Re: Absolute nightmare of a DL380 G7

I'm about to get 12 G7s...
Should I cancel the delivery??

Really hope HP does fix it.
Thanks everybody,
ciao
Martijn Goudkamp
Occasional Advisor

Re: Absolute nightmare of a DL380 G7

Hi Marco; it's not my place to advise you against an HP server, but unfortunately this model in this configuration has a lot of problems. Maybe with a different configuration you'll be luckier.

---

Had another crash today: ESX 4.1 was running smoothly and then a nice SCSI error halted the server. See attached screenshot. Still got to replace the cache module.
BrightMinds
Frequent Advisor

Re: Absolute nightmare of a DL380 G7

Likewise, I had an unresponsive system, most likely a BSOD with 0xF4.

Took 4 reboots before the P410i managed to boot one of the arrays. Calling HP now to get a replacement P410i.