
DL380/G3 -- Ultra160 hangs (RH8.0)

 
Ethan VanMatre


I'm running a 2.4.18-14smp kernel with the 6.2.28 aic7xxx driver. The
system hangs when doing I/O to the hardware RAID disks attached to the
aic7899 interface. The driver that shipped with RH8.0 could do nothing
more than fsck the arrays.

The hangs appear to happen when there is a large volume of writes or
more than one write stream. We want to use this system as an NFS server.
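For anyone trying to reproduce this, a minimal sketch of a multi-stream write load is below. It is only an illustration, not something from the original setup: the function names (write_stream, run_streams), the file names, and the sizes are all made up, and the target directory is assumed to sit on the affected array. As written it only writes to a temporary directory.

```python
import os
import tempfile
import threading


def write_stream(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path in chunks, then fsync to force them to disk."""
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


def run_streams(target_dir, streams=4, bytes_per_stream=8 << 20):
    """Run several write streams at once; return bytes written per stream."""
    results = {}

    def worker(i):
        # Hypothetical file naming, one file per stream.
        path = os.path.join(target_dir, f"stream_{i}.dat")
        results[i] = write_stream(path, bytes_per_stream)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Harmless default: a temporary directory. To exercise the array,
    # pass a directory on the affected filesystem instead.
    with tempfile.TemporaryDirectory() as d:
        print(run_streams(d))
```

Starting with streams=1 and then raising the count would help separate "large volume of writes" from "more than one write stream" as the trigger.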

When the system hangs, nothing is written to the logs, and a running top
shows nothing unusual (plenty of free memory and swap, and a reasonable
load average). The keyboard does not respond and the video is blank. The
network card does not respond to a ping.

The only unusual thing about this system is that the hardware RAID
controller is set up to present 8 devices using 1 SCSI ID and 8 LUNs.
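One way to confirm that the kernel really sees all 8 LUNs behind the single ID is to group the entries in /proc/scsi/scsi by target. The sketch below parses the "Host: ... Channel: ... Id: ... Lun: ..." lines of that file. The SAMPLE text is invented for illustration (the host name scsi2 is an assumption, not taken from this machine), and luns_by_target is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the attachment line of each device entry in /proc/scsi/scsi.
ENTRY = re.compile(
    r"Host:\s*(\S+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)"
)

# Invented sample: one target (ID 0) exposing LUNs 0 through 7.
SAMPLE = "Attached devices:\n" + "\n".join(
    f"Host: scsi2 Channel: 00 Id: 00 Lun: {n:02d}" for n in range(8)
)


def luns_by_target(text):
    """Map (host, channel, id) to the sorted list of LUNs seen for it."""
    targets = defaultdict(list)
    for m in ENTRY.finditer(text):
        key = (m.group(1), int(m.group(2)), int(m.group(3)))
        targets[key].append(int(m.group(4)))
    return {k: sorted(v) for k, v in targets.items()}


if __name__ == "__main__":
    print(luns_by_target(SAMPLE))
```

On the real machine the text would come from reading /proc/scsi/scsi rather than SAMPLE. A target showing fewer than 8 LUNs would point at LUN scanning rather than at the write path.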

So where should I look? Is it a problem with the aic7xxx driver, the
interrupt system in the Compaq, or the phase of the moon?

As a side note: the RAID controller, in this same configuration, was moved
over from a DL380 G1 machine running RH7.1 (unknown aic7xxx version), where
it worked without problems for over a year.

Thanks, Ethan
Ryan Byrne

Re: DL380/G3 -- Ultra160 hangs (RH8.0)

I have struck a similar problem. The system hangs after what seems to be a random period of inactivity (anywhere up to 2-3 days of uptime).
I have tried the default 2.4.18-14smp kernel and 2.4.18-24.8.0smp, and both have the same problem. I configured the iLO port and, using a remote console, was able to view what appeared to be a kernel panic (screenshot attached). I have two machines and both seem to have this problem. I am now running one of them with a non-SMP kernel to see whether that gives any more detail.
The machines have a second processor, extra RAM and 3 disks (all purchased at the same time from HP, about 4 weeks ago), but are otherwise standard hardware. Since the machines are not in production, they have no load on them at all and are idle when they crash, so reproducing the problem is difficult. I will let you know whether the non-SMP kernel stays up.