
Lockup issues on DL140 + Linux

(also posted in the Linux forum)

We're using quite a few HP DL140s as PPTP concentrators -- servers running Fedora Core 1 + the PoPToP package. Our DL140s are dual-processor 2.4GHz machines.

We've noticed that randomly, but generally within 7 days of startup, these servers freeze up and have to be reset. There is always a kernel Oops that I have captured with Netdump / serial console:

ksymoops 2.4.9 on i686 2.4.21-20.ELcustom-mppe-20040928.1. Options used
-v /usr/src/linux-2.4/vmlinux (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-20.ELcustom-mppe-20040928.1/ (default)
-m /usr/src/linux/System.map (default)

Unable to handle kernel NULL pointer dereference<7>divert: not allocating divert_blk for
non-ethernet device ppp453
00000000
*pde = 38853067
Oops: 0000
CPU: 0
EIP: 0060:[<00000000>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: e8929000 ebx: e87a7000 ecx: e8965900 edx: c01a61b6
esi: 00000000 edi: e87a7000 ebp: efe72100 esp: c87bfed4
ds: 0068 es: 0068 ss: 0068
Process pptpctrl (pid: 23254, stackpage=c87bf000)
Stack: c01aa5d0 e8929000 00000000 c01a7c55 e87a7000 efa80380 00000005 e87a7000
efe72100 00000004 00000010 c01a45a5 e87a7000 efe72100 00000000 00000000
efe72100 c016db84 efe72100 00000000 c87be000 00000145 c87be000 00000004
Call Trace: [<c01aa5d0>] pty_chars_in_buffer [kernel] 0x32 (0xc87bfed4)
[<c01a7c55>] normal_poll [kernel] 0x105 (0xc87bfee0)
[<c01a45a5>] tty_poll [kernel] 0x83 (0xc87bff00)
[<c016db84>] do_select [kernel] 0x230 (0xc87bff18)
[<c016deee>] sys_select [kernel] 0x33c (0xc87bff5c)
Code: Bad EIP value.


>>EIP; 00000000 Before first symbol

>>eax; e8929000 <_end+2841f5e8/38303648>
>>ebx; e87a7000 <_end+2829d5e8/38303648>
>>ecx; e8965900 <_end+2845bee8/38303648>
>>edx; c01a61b6
>>edi; e87a7000 <_end+2829d5e8/38303648>
>>ebp; efe72100 <_end+2f9686e8/38303648>
>>esp; c87bfed4 <_end+82b64bc/38303648>

Trace; c01aa5d0
Trace; c01a7c55
Trace; c01a45a5
Trace; c016db84
Trace; c016deee

We pulled our hair out over this one for weeks... I tried using Red Hat Enterprise ES3, dropping back to Red Hat 7.3, using various versions of the MPPE module we load... nothing worked. I also switched to the bcm5700 driver from HP instead of the built-in tg3 driver that comes with Red Hat. Still the freezes would occur.

Finally I booted the server with the nosmp and noapic kernel options... things got more stable and in fact there weren't any more crashes! Through the grapevine, I heard from others using Broadcom NICs that they had issues and that noapic sometimes solved the problem.

So I rebooted again using only noapic (without nosmp)... alas, the lockups came back.

So currently all our DL140s are running with noapic nosmp, which basically wastes one entire processor. But at least we're stable now.
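
To confirm that a given box really booted with both options and sees only one CPU, something like the following could be used. This is only a minimal sketch, assuming Python 3 is available on the machine; the function names `boot_flags`, `cpu_count` and `check_host` are made up for illustration, and the script only reads the standard `/proc/cmdline` and `/proc/cpuinfo` files.

```python
def boot_flags(cmdline, wanted=("noapic", "nosmp")):
    """Map each wanted kernel option to True if it appears on the command line."""
    tokens = cmdline.split()
    return {flag: flag in tokens for flag in wanted}


def cpu_count(cpuinfo):
    """Count the 'processor' entries in the text of /proc/cpuinfo."""
    return sum(1 for line in cpuinfo.splitlines() if line.startswith("processor"))


def check_host():
    """Read the live /proc files and return (flags, number of CPUs the kernel sees)."""
    with open("/proc/cmdline") as f:
        flags = boot_flags(f.read())
    with open("/proc/cpuinfo") as f:
        cpus = cpu_count(f.read())
    return flags, cpus
```

Calling `check_host()` on each concentrator returns the flag status plus the CPU count; with nosmp in effect the count would be expected to be 1.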

Anyone have any insight into this? Anything I should try next? Would love to have SMP working again... but I cannot afford to have hundreds of customers getting disconnected at random hours throughout the day. :-)

Thanks...
1 REPLY
Michael Williams_6
Trusted Contributor

Re: Lockup issues on DL140 + Linux

The DL140 is recent enough that you should still have manufacturer support even if you didn't buy a Carepaq.

Log a call, something seems very wrong...