
Proliant DL365 G5 and OpenSUSE 11.1

 
James Stallings
New Member


Greetings.

I have two new servers that hit the rack around the first of the year. Both are Proliant DL365 G5s running openSUSE 11.1 in a server configuration, and both use the integrated P400i storage controller. The smaller one, which serves a dual primary role as MySQL and Apache2 server, has 8 GB RAM and 6x180 GB 10K rpm SAS drives in hardware RAID5, presenting a linear 733 GB volume. The larger one, whose primary role is web application server, has 32 GB RAM and 5x250 GB 7.2K rpm SATA drives in hardware RAID5, presenting a linear 750 GB volume.

These machines are, at present, not in production and so are only very lightly loaded (deployment pending final configuration, still WIP).

The smaller of the two performs flawlessly, without issues of any sort; the larger, at intervals of two days to two weeks, writes the following message to the logs:

Feb 17 21:06:57 simulacrum kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 17 21:06:57 simulacrum kernel: ata1.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Feb 17 21:06:57 simulacrum kernel: cdb 4a 01 00 00 10 00 00 00 08 00 00 00 00 00 00 00
Feb 17 21:06:57 simulacrum kernel: res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 17 21:06:57 simulacrum kernel: ata1.00: status: { DRDY }
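
For anyone triaging similar entries, here is a minimal Python sketch that pulls the key fields out of these lines. The function name and the lookup tables are hypothetical and cover only the two values that appear in this log: ATA command 0xA0 is PACKET (the ATAPI command wrapper) and SCSI opcode 0x4A is GET EVENT STATUS NOTIFICATION. It is meant as a reading aid, not a diagnosis.

```python
import re

# Lookup tables covering only the values seen in the log above (illustrative).
ATA_COMMANDS = {0xA0: "PACKET (ATAPI)"}
SCSI_OPCODES = {0x4A: "GET EVENT STATUS NOTIFICATION"}

CMD_RE = re.compile(r"(ata\d+\.\d+): cmd ([0-9a-f]{2})/")
CDB_RE = re.compile(r"cdb ((?:[0-9a-f]{2} ?)+)")
RES_RE = re.compile(r"res .* Emask (0x[0-9a-f]+) \((\w+)\)")

LOG = """\
Feb 17 21:06:57 simulacrum kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 17 21:06:57 simulacrum kernel: ata1.00: cmd a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Feb 17 21:06:57 simulacrum kernel: cdb 4a 01 00 00 10 00 00 00 08 00 00 00 00 00 00 00
Feb 17 21:06:57 simulacrum kernel: res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 17 21:06:57 simulacrum kernel: ata1.00: status: { DRDY }
"""


def summarize(lines):
    """Collect port, ATA command, CDB opcode and error reason from log lines."""
    info = {}
    for line in lines:
        m = CMD_RE.search(line)
        if m:
            info["port"] = m.group(1)
            code = int(m.group(2), 16)
            info["ata_command"] = ATA_COMMANDS.get(code, hex(code))
        m = CDB_RE.search(line)
        if m:
            # The first CDB byte is the SCSI/MMC operation code.
            opcode = int(m.group(1).split()[0], 16)
            info["cdb_opcode"] = SCSI_OPCODES.get(opcode, hex(opcode))
        m = RES_RE.search(line)
        if m:
            info["emask"] = m.group(1)
            info["reason"] = m.group(2)
    return info


if __name__ == "__main__":
    for key, value in summarize(LOG.splitlines()).items():
        print(f"{key}: {value}")
```

PACKET commands are sent to ATAPI devices, so the failing command would suggest that ata1.00 is something like the optical drive rather than a disk behind the RAID controller. That is an inference from the log format, not something confirmed in this thread.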

Typically, somewhere between two hours and two days after this message appears, the box throws a kernel panic and locks up. We have been unable to examine the kernel stack dump when this occurs, because the system console in the data center goes to sleep and will not wake up after the panic (no iLO).

This sequence of events has occurred four times since the boxes were installed in the second week of January.
Yesterday, after a bit of research and some assistance from HP support, we placed the following boot parameters onto the kernel command line in /boot/grub/menu.lst and rebooted:

noacpi pci=biosirq

The kernel invocation already contained the parameter 'clock=hpet', per the documentation addendum included with the systems.
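
A quick way to confirm that the parameters actually reached the running kernel after a reboot is to compare them against the live command line. This is a minimal Python sketch, assuming the standard Linux /proc/cmdline file; the function names and the sample string are hypothetical, and the expected list simply repeats the three parameters mentioned above.

```python
# Parameters named in this post; adjust to whatever menu.lst is meant to pass.
EXPECTED = ["noacpi", "pci=biosirq", "clock=hpet"]


def missing_params(cmdline, expected):
    """Return the expected parameters that do not appear in the command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Example with a made-up command line that lacks one of the parameters.
sample = "root=/dev/sda1 noacpi clock=hpet"
print(missing_params(sample, EXPECTED))
```

On the server itself, `missing_params(read_cmdline(), EXPECTED)` returning an empty list would indicate that all three parameters were passed at boot.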

Unfortunately, in spite of these changes to the kernel invocation, the error showed up in the logs again this morning, and I am now anticipating a crash.

Can anyone offer any suggestions as to the likely cause for this sequence of events? We would like to have had these systems in production long before now.

Thanks in advance for any assistance anyone can provide :D

Cheers and best regards
James

James Stallings
New Member

Re: Proliant DL365 G5 and OpenSUSE 11.1

My apologies, I see I somehow got this post in the wrong area - if a forum mod would be so kind as to relocate it to the proper area this would be appreciated :D

Thanks
James
Ciro Iriarte
Valued Contributor

Re: Proliant DL365 G5 and OpenSUSE 11.1

A good start would be to install the latest BIOS/firmware version (from the Firmware Maintenance CD) and the latest openSUSE patches (11.1 is really new)...

Also, I haven't touched Proliants that don't run on SCSI, but on regular PCs some of these issues are solved by running the SATA controller in AHCI mode (not sure if that applies to the P400i)...