Sander Marechal
New Member

Server reboots roughly every three hours

Hello all,

I am running Debian/etch on a ProLiant ML370 G3 server. I have also managed to install the hpasm tools.

The install went well, but the server spontaneously reboots itself, usually after about three hours. There are no error messages in /var/log/kern.log or /var/log/messages, and "hplog -v" shows nothing about a reboot; the only entry is a POST warning, "Array Accelerator Battery Charge Low".

Has anyone experienced this before? Any suggested fixes, or even a way to find out why the reboot occurs (besides staring at the console for three hours straight)?

Thanks in advance for any help.
Ivan Ferreira
Honored Contributor

Maybe you have a cooling problem or a failed fan, and the server is rebooting because of environmental problems.
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Sander Marechal
New Member

I thought about that too, but apparently not. I monitored the temperature at various points up until the crash/reboot, and it's always between 33 and 38 °C (91-100 °F).
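For reference, the Fahrenheit figures follow from F = C × 9/5 + 32, which gives about 91 °F for 33 °C and about 100 °F for 38 °C. A one-line helper (the name is just for illustration) makes the check explicit:

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32
```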

I happened to see one of the crashes in action by pure chance. The system froze for a couple of seconds, then a large amount of text was dumped to the console, and then it rebooted. The text scrolled by too fast for me to read, but it wasn't random binary garbage; it was actual messages. And apparently it isn't saved anywhere.

Is there a way to log all console output so I can see what it was?

In the meantime I'm going to reboot, disable ASR and wait for the next crash, in the hope that the console output will stay visible.
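One way to capture the console output without watching the screen is the Linux netconsole module, which sends kernel log messages as UDP datagrams to another machine (target port 6666 by default). Below is a minimal receiver sketch to run on that other machine. It assumes netconsole has been loaded on the crashing server and pointed at the receiver; whether that works under this Xen dom0 kernel is untested, and output printed at the very moment of a crash may still be lost. The function names are hypothetical.

```python
import socket
from datetime import datetime


def format_datagram(data, host, stamp):
    """Split one netconsole datagram into log lines prefixed with a timestamp and the sender."""
    text = data.decode("utf-8", errors="replace")  # never fail on odd bytes in kernel output
    return [f"{stamp} {host} {line}" for line in text.splitlines()]


def receive_console(recvfrom, logfile, max_packets=None):
    """Append every received datagram to logfile; loop forever if max_packets is None.

    recvfrom is normally sock.recvfrom of a bound UDP socket and must
    return (data, (host, port)).
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, addr = recvfrom(4096)
        stamp = datetime.now().isoformat(timespec="seconds")
        for line in format_datagram(data, addr[0], stamp):
            logfile.write(line + "\n")
        logfile.flush()  # flush per packet so nothing is lost if this script is killed
        count += 1
    return count


def open_listener(port=6666, bind_addr="0.0.0.0"):
    """Bind a UDP socket on the port netconsole sends to (6666 is its default target port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock
```

On the receiving machine, `with open_listener() as sock, open("netconsole.log", "a") as log: receive_console(sock.recvfrom, log)` would append every message to netconsole.log until interrupted; the file name is only an example.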
Ivan Ferreira
Honored Contributor

Try stopping the HP ASR service. There is an option to prevent the server from rebooting, but I don't remember where it is configured.
Florian Heigl (new acc)
Honored Contributor

I don't know the answer, but does Debian etch happen to have kexec/kdump support?

It might be helpful to gather a crash dump here. Or you could always redirect the console to a serial port and hook something up to it, so you can record the error as it passes by.

florian
yesterday I stood at the edge. Today I'm one step ahead.
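If a second machine and a null-modem cable are available, the console can be redirected to the serial port with a kernel parameter such as `console=ttyS0,115200n8` (on a Xen dom0 this may have to be set up through the hypervisor's own boot options instead). Below is a minimal logger sketch for the second machine, using only Python's standard termios/tty modules; it appends everything arriving on the serial port to a file. The device path, baud rate and function names are assumptions.

```python
import os
import termios
import tty


def open_serial(path, baud=termios.B115200):
    """Open a serial device read-only in raw mode at the given baud rate; returns a file descriptor."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)  # O_NOCTTY: don't make it our controlling tty
    tty.setraw(fd)  # no line editing or character translation
    attrs = termios.tcgetattr(fd)
    attrs[4] = baud  # input speed
    attrs[5] = baud  # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def log_serial(fd, logfile, max_bytes=None):
    """Copy bytes from fd to a binary logfile until EOF (or max_bytes); returns the byte count."""
    total = 0
    while max_bytes is None or total < max_bytes:
        data = os.read(fd, 4096)
        if not data:
            break
        logfile.write(data)
        logfile.flush()  # keep the log current in case the logger is interrupted
        total += len(data)
    return total
```

For example, `fd = open_serial("/dev/ttyS0")` followed by `log_serial(fd, open("serial-console.log", "ab"))` would log until interrupted; the baud rate must match the one given on the crashing server's kernel command line.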
Sander Marechal
New Member

Hmm, strange. Since I turned off ASR the server has stopped crashing. Could it be a bug in ASR itself?

The server hasn't been in use since the end of 2004. Have there been firmware updates since then that fix ASR-related crashes? Or maybe ASR doesn't work properly on a Xen kernel? I am using the linux-image-2.6.18-4-xen-686 kernel, and hpasm is installed in Xen domain0.

I am going to turn ASR back on and see if it starts crashing again.

> or you could always set the console to a serial port and hook up something there, so you can record the error as it passes by.

Good idea but I don't have anything to hook up. Maybe I can borrow something.
Florian Heigl (new acc)
Honored Contributor

Then it's more likely a bug in the watchdog support. Ummm, we once had an HP-sponsored Linux workshop (a marketing day) where I asked whether HP supported its own watchdog devices under Linux (even in some proprietary form), and the answer was a plain no.

(The next thing I asked was whether adapting that support in a hotfix kernel patch would be covered by software support :)
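ASR (Automatic Server Recovery) is built around a hardware watchdog timer. As long as the health driver keeps resetting the timer, the server runs; if the resets stop, for example because the kernel has crashed, the hardware reboots the machine. That is consistent with the reboots stopping while ASR was turned off. The sketch below illustrates the idea with the generic Linux watchdog device interface (`/dev/watchdog`), not HP's hpasm code: opening the device arms the timer, every write resets it, and drivers that support "magic close" only disarm it if the character `V` is written just before closing. The function name and parameters are hypothetical.

```python
import time


def pet_watchdog(path="/dev/watchdog", interval=10.0, count=None, magic_close=True):
    """Keep a Linux watchdog device alive by writing to it every `interval` seconds.

    Opening the device arms the timer and each write resets it. With count=None this
    loops forever, as a real health daemon would. Returns the number of writes.
    """
    pets = 0
    # buffering=0: every write goes straight to the driver
    with open(path, "wb", buffering=0) as dev:
        while count is None or pets < count:
            dev.write(b"\0")  # any write resets the timer
            pets += 1
            if count is None or pets < count:
                time.sleep(interval)
        if magic_close:
            # With "magic close" support, writing 'V' right before close disarms the timer.
            # Closing without it is treated as a dead daemon: the timer keeps running and
            # the machine reboots when it expires (unless the device is reopened in time).
            dev.write(b"V")
    return pets
```

Pointed at a real watchdog device with `magic_close=False`, this would deliberately reboot the machine once the timer expires, so try it against an ordinary file first.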
Sander Marechal
New Member

Well, I managed to crash the server both with ASR enabled and with it disabled. This time, with ASR disabled, the messages stayed on the screen and showed a nice, fat dom0 kernel crash.

One of the Xen guest systems is used as an NFS server. When I upload a couple of GB from my desktop to the NFS share, the system comes down. So it looks like a Xen kernel / NFS issue. I've submitted a bug to Debian's BTS.

After a reboot the system logs show no trace of the crash, but that could be because NFS keeps the drives busy up to the point of the crash.

Anyway, thanks for the help so far. I guess I'm in the market for a different file server protocol that does behave well under Xen :-)
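To reproduce the crash on demand instead of waiting for it, the same kind of load can be generated with a script. The Python sketch below writes a given amount of data to a file in fixed-size chunks and forces it out periodically with `os.fsync`. The function name and the mount point in the usage line are hypothetical, and it only mimics a large sequential upload, not whatever access pattern the desktop client actually used.

```python
import os


def write_load(path, total_bytes, chunk_size=1024 * 1024, fsync_every=64):
    """Write total_bytes of random data to path in chunk_size pieces; returns bytes written.

    os.fsync is called every fsync_every chunks so the data actually leaves the
    client's page cache and reaches the (NFS) server while the test runs.
    """
    chunk = os.urandom(chunk_size)  # one random buffer, reused for every write
    written = 0
    chunks = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            piece = chunk[: min(chunk_size, total_bytes - written)]
            f.write(piece)
            written += len(piece)
            chunks += 1
            if chunks % fsync_every == 0:
                f.flush()
                os.fsync(f.fileno())
        f.flush()
        os.fsync(f.fileno())
    return written
```

Run from the desktop against the NFS mount, e.g. `write_load("/mnt/nfs/loadtest.bin", 2 * 1024**3)` for about 2 GB.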
Florian Heigl (new acc)
Honored Contributor

Hi,

I used to run various domUs based on NFS:
a) The Linux NFS code is _not stable_, no matter what people say.
b) I currently have a Linux domU that serves as a file server, and I often push 10-80GB in or out of it without stability issues.
c) I remember NFS-related crashes taking the system down, but back then I ran NFS in dom0 (stupid idea). The reason was that I NFS-exported loopback-mounted filesystem images that were corrupt. The filesystem corruption error message only went to the kernel console, and it took days to finally see it.

If, as you write, your NFS server is in a domU but dom0 crashes, then this is not an NFS issue but a load issue (which still points to the Xen kernel, though ;)
Sander Marechal
New Member

Hi Florian

> b) I currently have a Linux domU that serves as a file server, and I often push 10-80GB in or out of it without stability issues.

Pushing data out isn't the problem. It's when data is coming in that I experience crashes with NFS.

> c) I remember NFS-related crashes taking the system down [...] The reason was that I NFS-exported loopback-mounted filesystem images that were corrupt.

I am exporting whole LVM volume groups. I don't use loopback filesystems, so that can't be it.

> If, as you write, your NFS server is in a domU but dom0 crashes, then this is not an NFS issue but a load issue (which still points to the Xen kernel, though ;)

Yup :-) I have managed to find a workaround, though. I replaced the nfs-kernel-server package with unfs3, a userspace NFSv3 server, and it's a lot more stable now. The only downside is that it doesn't support file locking, but that's not really an issue for me: I use it in a SOHO setting with only a few computers using the file server (and mostly for reading at that).

Thanks!