ProLiant Servers (ML,DL,SL)

Several debilitating problems with HP DL145 G2 hardware.

michael halligan
Occasional Advisor


Where to begin? First off, what versions of the BIOS and firmware should I be using? I want IPMI 2.0, so I grabbed what appeared to be the latest iLO firmware. That is where the problems begin. Every server with this iLO update sooner or later becomes unusable through the LOM. Sometimes I can get a console back if I have somebody physically unplug the server, but not always.

Second, I seem to be having serious hardware failures. Of the 20 of these servers I have, 5 have had their motherboards replaced (and one server has been RMA'd and replaced). I don't know whether the motherboard is dying or just the console, but the symptom is this:

I cannot access the console via the LOM, or with a keyboard and monitor.

The third problem may or may not be hardware-related. These servers crash under high load running SUSE Linux Enterprise Server 9 SP2 with the latest kernel (-201). Each time it has happened, one or more of these programs were running: NFS, apache2, rsync. It isn't a full crash, so it's hard to detect, but the server is unusable.


michael halligan
Occasional Advisor

Re: Several debilitating problems with HP DL145 G2 hardware.

Does anybody have any ideas? These servers are proving to be very unstable.
Ruslan
Respected Contributor

Re: Several debilitating problems with HP DL145 G2 hardware.

Hmm... those are bad statistics. Which iLO firmware version are you using? Which SUSE build, AMD64 or x86? Try updating the drivers and firmware: http://h18023.www1.hp.com/support/files/server/us/family/model/6155.html
michael halligan
Occasional Advisor

Re: Several debilitating problems with HP DL145 G2 hardware.

The most critical issue I'm running into right now is that these servers crash during any network-related disk I/O.

The error occurs consistently when backing up to backup partitions with rsnapshot/rsync, and it is quite reproducible. It happens both with rsync in daemon mode and with rsync over NFS. It also happened once after a 450 MB upload through a customer's XML-RPC application, which would have been written through a web server onto an NFS partition.

When the crash occurs, the server still responds to pings and TCP sockets remain open.
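Because ping and TCP connections keep working after the crash, a health check at that level reports the server as fine. One way to catch this half-dead state is to connect, send a request, and treat a missing reply within a timeout as a hang. This is a minimal sketch, assuming the service answers requests the way an HTTP server such as apache2 does; the `probe` function, its return labels, and the 5-second default are hypothetical choices, not something from HP or SUSE tooling.

```python
import socket


def probe(host: str, port: int, request: bytes, timeout: float = 5.0) -> str:
    """Classify a TCP service as "down", "hung", "closed", or "ok".

    "hung" means the TCP handshake completed (the kernel still accepts
    connections) but no reply byte arrived within the timeout.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "down"  # refused, unreachable, or the handshake itself timed out
    with sock:
        try:
            sock.sendall(request)
            reply = sock.recv(1)  # wait for the first byte of any reply
        except socket.timeout:
            return "hung"  # connected, but the application never answered
        except OSError:
            return "down"
    return "ok" if reply else "closed"  # empty read: peer closed without replying
```

For example, `probe("web01.example.com", 80, b"HEAD / HTTP/1.0\r\n\r\n")` (hypothetical host) would return "hung" in the state described above, where a plain ping or port check would still succeed.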

Our operating environment is:

Hardware:

HP DL145 G2, single and dual Opteron 246s
Dual 80 GB Maxtor drives or dual 400 GB Western Digital drives

Software/Configuration:

SLES 9 SP2 x86_64
Kernel 2.6.5-7.201-smp
boot options: append = "resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off console=tty0 console=ttyS2,57600 acpi=off splash=silent elevator=cfq"
- or -
boot options: append = "resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off console=tty0 console=ttyS2,57600 apm=off splash=silent elevator=cfq insmod=bcm5700"
All partitions except /boot and swap are ReiserFS on LVM2 on software RAID1.
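The first `append` line above passes `acpi=off` twice, whereas the alternative line uses `acpi=off` together with `apm=off`. When the same boot configuration is copied across many servers, a quick check for verbatim repeats makes slips like that easy to spot. This is a hypothetical helper, not part of LILO or the SLES tools; it deliberately flags only exact duplicate tokens, because repeating a parameter name with different values (such as the two `console=` entries here) is legitimate.

```python
from collections import Counter


def redundant_params(append: str) -> list[str]:
    """Return boot parameters that appear verbatim more than once.

    Repeating a parameter name with different values can be intentional
    (e.g. several console= entries), so only exact duplicates are reported.
    """
    counts = Counter(append.split())  # parameters are whitespace-separated
    return sorted(tok for tok, n in counts.items() if n > 1)
```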

Kernel modules:
sg 51128 0
sr_mod 26788 0
ipv6 317432 23
af_packet 33676 2
dm_snapshot 25016 0
bcm5700 157660 0
sata_nv 18564 0
ata_piix 19204 0
libata 59656 2 sata_nv,ata_piix
dm_mod 69344 11 dm_snapshot
raid1 24704 1
reiserfs 264816 8
sd_mod 30208 0
scsi_mod 144128 4 sg,sr_mod,libata,sd_mod

Misc related software:
rsync-2.6.2-8.14
rsnapshot-1.2.1-1
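The kernel module list above is `lsmod` output: module name, size in bytes, use count, and the modules that depend on it. With 20 servers, comparing that list between a stable box and a crashing one by eye is tedious. A minimal sketch, assuming the standard `lsmod` column layout; the `parse_lsmod` and `diff_modules` helpers are hypothetical names for illustration.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    size: int       # module size in bytes, as reported by lsmod
    use_count: int  # the "Used by" count (can exceed the number of listed modules)
    used_by: list   # names of modules listed as depending on this one


def parse_lsmod(text: str) -> dict:
    """Parse lsmod-style output into {module name: Module}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        used_by = [m for m in fields[3].split(",") if m] if len(fields) > 3 else []
        modules[fields[0]] = Module(fields[0], int(fields[1]), int(fields[2]), used_by)
    return modules


def diff_modules(a: dict, b: dict) -> tuple:
    """Return (only in a, only in b) as sorted lists of module names."""
    return sorted(a.keys() - b.keys()), sorted(b.keys() - a.keys())
```

Feeding it the list above gives, for instance, `libata` with `used_by == ["sata_nv", "ata_piix"]`. Note that `dm_mod` shows a use count of 11 but only one dependent module, because mapped devices also hold references.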
ebyrne
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

You may want to try BIOS 2.13 with the updated iLO firmware (1.21). I was seeing the same problems with iLO 1.21 and BIOS 2.14, while machines still on 2.13 worked fine. I rolled some back from 2.14 to 2.13, and everything has been fine with them so far, although I have only been running them on 2.13 for a day or two.

I have also found the latest RHEL4 AMD64 kernel is unstable if you don't run at least the 2.13 BIOS. Older kernels seemed to work fine with older BIOS revisions. SUSE kernels may have similar issues.

I have not had any stability problems except for some related to SCSI connections. These tend to come loose during shipping, or sometimes even when the server is moved from one room to another. If you are using the SCSI version of the server, make sure the riser, SCSI card, and cable connections are all tight. Loose connections usually just cause a failure to boot, but sometimes they are seated well enough to work for a while and then suddenly start spouting SCSI errors until they are reseated.
michael halligan
Occasional Advisor

Re: Several debilitating problems with HP DL145 G2 hardware.

Ebyrne,

I was wondering: when you say the same problems, do you mean the console failing, or the SATA drives timing out?

Also, how do you downgrade the BIOS? I've been unable to install the older BIOS versions. It seems that once you've upgraded, you've upgraded. HP even told me that I couldn't downgrade.
Allen Todd
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

Hi Michael, did you ever find a good solution for your problems with the DL145 and SUSE SLES 9 SP2? We have had similar problems with some evaluation hardware, where the DL145 refuses to autonegotiate to gigabit and stays stuck at 100 Mb/s.
michael halligan
Occasional Advisor

Re: Several debilitating problems with HP DL145 G2 hardware.

Allen,

I'm afraid none of my problems have been solved. If you're running SUSE, do NOT buy these servers. They are not compatible, no matter what the worthless YES certification says. I've wasted about a month working this problem with HP and Novell. As it is, our HP servers are being reimaged with Debian, and we're looking for a better hardware solution. We'll probably replace them all with servers from www.opensourcestorage.com. They actually support Linux.

Michael
greg_gti
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

I absolutely agree with you. We too have many DL145s, as well as DL140s in both the AMD and Intel varieties.

We see the servers become completely unusable or fall over under high load. We were able to fix this by rolling the SuSE kernel back to 2.6.8!!

OK, so a message for HP engineers: support us like you promised!!! This issue is very apparent! Just give us a patch already, or at least some answers; my customers are losing faith in us...

Check out this CPU graph from our customer, before and after the kernel was rolled back to 2.6.8.
Jon Ward
Trusted Contributor

Re: Several debilitating problems with HP DL145 G2 hardware.

If you have problems getting answers from HP Technical Support, then ask to have the case sent to the next level. If the case goes through enough levels, you should eventually get a resolution or a conclusive support statement from HP engineering.

Give each support level a chance, but if you have spent more than a couple of hours of talk time with the first agent, ask to be transferred to the next level.
Allen Todd
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

Now that SP3 is out, has anyone had better luck with the DL145s?
Jon Ward
Trusted Contributor

Re: Several debilitating problems with HP DL145 G2 hardware.

It is hard to say at this time if SP3 will make a difference in every scenario listed here, but HP has an advisory claiming that one issue will be fixed in SP3:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00546261
ebyrne
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

Michael, I meant the console failing. Also, iLO firmware 1.22 seems to have solved the compatibility problem with BIOS 2.14.

I'm using the SCSI version of the server, so I can't help you with SATA.
david ash_1
Occasional Visitor

Re: Several debilitating problems with HP DL145 G2 hardware.

Well, I just found these postings after spending a month myself trying to diagnose high-throughput NFS issues and flakiness on a DL145 G2 box running SuSE 10. Major, major, major headaches.

But on another DL145 running SuSE 9 SP3, things were flawless. I'd really steer clear of these machines if you're going to run SuSE. Save yourself a few weeks of frustration.