HP DL145 G2 completely unstable under SUSE SLES 9 SP2

 
michael halligan
Occasional Advisor

Our error is very reproducible. Unfortunately, I've been unable to get HP to do anything but ship me new parts. SUSE is starting to be a bit more helpful, but the process is very slow.

The problem is directly related to moderate or heavy network I/O that involves reading from or writing to disk.

I have updated my servers to the latest SATA firmware, the latest LOM firmware, and the latest BIOS firmware. I'm also using the BCM5700 Ethernet driver from SUSE, version bcm5700-8.13.3a-1.

The only errors logged are sent to the console, and look like this:

hde: dma_timer_expiry: dma status == 0x24

When the crash occurs, the server still responds to pings, and TCP sockets remain open. It keeps spewing out the dma_timer_expiry messages, but nothing else really works.
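For anyone trying to read that console message, here is a minimal sketch that pulls the device name and status byte out of the line and lists which bits are set. The bit names assume the conventional bus-master IDE status register layout; that mapping is my assumption and is not confirmed anywhere in this thread, and the function name is made up for illustration.

```python
import re

# ASSUMPTION: conventional bus-master IDE status register bit layout.
# Verify against the chipset documentation before drawing conclusions.
BM_STATUS_BITS = {
    0x01: "DMA active",
    0x02: "DMA error",
    0x04: "interrupt raised",
    0x20: "drive 0 DMA capable",
    0x40: "drive 1 DMA capable",
    0x80: "simplex only",
}

# Matches console lines of the form quoted in the post above.
LINE_RE = re.compile(
    r"^(?P<dev>hd[a-z]): dma_timer_expiry: dma status == 0x(?P<status>[0-9a-fA-F]+)"
)


def decode_dma_timeout(line):
    """Return (device, status, [names of set bits]) or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    status = int(match.group("status"), 16)
    flags = [name for bit, name in sorted(BM_STATUS_BITS.items()) if status & bit]
    return match.group("dev"), status, flags


print(decode_dma_timeout("hde: dma_timer_expiry: dma status == 0x24"))
```

Under that assumed layout, 0x24 has the interrupt bit set while the active bit is clear. That may hint at an interrupt that was raised but never serviced, though it is only a reading of the bits and not a diagnosis.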

Right before the crash, vmstat shows all processes moving into a wait state.
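To capture that moment instead of watching by eye, a rough sketch like the following could scan saved vmstat output for samples where processes pile up in the blocked ("b") column. The sample text is invented purely to exercise the parser and is not output from the affected servers; the function names and the threshold are hypothetical.

```python
def parse_vmstat(text):
    """Turn vmstat text into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column header line: r b swpd free ...
            columns = fields
            continue
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, (int(f) for f in fields))))
    return rows


def blocked_samples(rows, min_blocked=1):
    """Return the samples with at least min_blocked blocked processes."""
    return [row for row in rows if row["b"] >= min_blocked]


# INVENTED sample, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 100000  20000 300000    0    0    50   400 1200   900  5 10 80  5
 0  6      0  90000  20000 310000    0    0     0     0 1100    50  0  1  0 99
"""

for row in blocked_samples(parse_vmstat(SAMPLE), min_blocked=4):
    print("blocked:", row["b"], "iowait %:", row["wa"])
```

Feeding it the output of a long-running `vmstat 1` redirected to a file on a separate machine (for example over the serial console) would show how quickly the blocked count climbs before the hang.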

Our operating environment is:

Hardware:

HP DL145 G2, single and dual Opteron 246s
Dual 80 GB Maxtor drives or dual 400 GB Western Digital drives

Software/Configuration:

SLES 9 SP2 x86_64
Kernel 2.6.5-7.201-smp
boot options: append = "resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off console=tty0 console=ttyS2,57600 acpi=off splash=silent elevator=cfq"
- or -
boot options: append = "resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off console=tty0 console=ttyS2,57600 apm=off splash=silent elevator=cfq insmod=bcm5700"
All partitions except /boot and swap are reiserfs, on top of lvm2, on top of software raid1.
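Since two slightly different append lines are in play, a small sketch like this one can make the differences explicit. It splits each line into key/value pairs, reports any pair that appears more than once, and shows what the second line adds. The helper names are hypothetical; the two strings are copied from the boot options above.

```python
from collections import Counter


def parse_append(append):
    """Split a kernel append string into (key, value) pairs, in order."""
    params = []
    for token in append.split():
        key, sep, value = token.partition("=")
        params.append((key, value if sep else None))
    return params


def repeated_params(params):
    """Return the exact key=value pairs that occur more than once."""
    counts = Counter(params)
    return [p for p in dict.fromkeys(params) if counts[p] > 1]


FIRST = ("resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off "
         "console=tty0 console=ttyS2,57600 acpi=off splash=silent elevator=cfq")
SECOND = ("resume=/dev/rootvg/swaplv selinux=0 load_ramdisk=1 acpi=off "
          "console=tty0 console=ttyS2,57600 apm=off splash=silent elevator=cfq "
          "insmod=bcm5700")

first, second = parse_append(FIRST), parse_append(SECOND)
print("repeated in first:", repeated_params(first))
print("only in second:", sorted(set(second) - set(first)))
```

On these two strings it reports that the first line carries acpi=off twice, and that the second differs only by adding apm=off and insmod=bcm5700.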

Kernel modules:
sg 51128 0
sr_mod 26788 0
ipv6 317432 23
af_packet 33676 2
dm_snapshot 25016 0
bcm5700 157660 0
sata_nv 18564 0
ata_piix 19204 0
libata 59656 2 sata_nv,ata_piix
dm_mod 69344 11 dm_snapshot
raid1 24704 1
reiserfs 264816 8
sd_mod 30208 0
scsi_mod 144128 4 sg,sr_mod,libata,sd_mod

Misc related software:
rsync-2.6.2-8.14
rsnapshot-1.2.1-1
Jaroslav Matys
Respected Contributor

Re: HP DL145 G2 completely unstable under SUSE SLES 9 SP2

I would try changing the I/O scheduler (the elevator= boot option). More details are in /usr/src/linux/Documentation/kernel-parameters.txt and /usr/src/linux/Documentation/as-iosched.txt
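As a rough illustration of that change, the sketch below rewrites an append string so that it carries a different elevator= value. The list of accepted scheduler names is my assumption of what 2.6-series kernels take; check it against kernel-parameters.txt on the installed system. The function name is hypothetical, and the result still has to be put into the bootloader configuration by hand.

```python
# ASSUMPTION: scheduler names accepted by elevator= on 2.6-series kernels.
KNOWN_ELEVATORS = ("noop", "deadline", "as", "cfq")


def with_elevator(append, scheduler):
    """Return the append string with elevator= set to the given scheduler."""
    if scheduler not in KNOWN_ELEVATORS:
        raise ValueError("unknown scheduler: %r" % scheduler)
    # Drop any existing elevator= token, then add the requested one.
    tokens = [t for t in append.split() if not t.startswith("elevator=")]
    tokens.append("elevator=" + scheduler)
    return " ".join(tokens)


current = "selinux=0 acpi=off console=tty0 splash=silent elevator=cfq"
print(with_elevator(current, "deadline"))
```

Trying each scheduler in turn under the same rsync load would show whether the hang follows the scheduler at all.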
michael halligan
Occasional Advisor

Re: HP DL145 G2 completely unstable under SUSE SLES 9 SP2

Tried changing the scheduler today, to no avail.
michael halligan
Occasional Advisor

Re: HP DL145 G2 completely unstable under SUSE SLES 9 SP2

To date, HP has had me generate kernel dumps, and their only response has been "boot with noapic". That just makes the crash more complete.

Does Novell actually mean anything when they "YES Certify" hardware, or is that a big marketing lie? SLES 9 is completely unstable on these boxes. My customers running Debian have never crashed.
michael halligan
Occasional Advisor

Re: HP DL145 G2 completely unstable under SUSE SLES 9 SP2

Do HP or Novell actually know how to run the hardware/software combinations they so gleefully boast about in their marketing material as being certified to work with each other? I'm feeling a little bit suckered here.

I bought this hardware combination because, hey, Novell said SLES worked on this specific platform according to YES certification. HP's certification matrix agreed.

This is a vendor problem. Where is the vendor in trying to fix this? Where is the best-practices document that will tell me what I'm doing wrong?

No wonder both Novell and HP are in dire financial straits. This is why Dell is hurting. Treat your customers right, and they'll treat you right. Treat them wrong, and they'll find a new vendor.