
RHEL 6 kernel panic on DL585 G7

Christian Masopust
Occasional Contributor

Hi all,

Both of my DL585 G7 systems (48 cores, 128 GB RAM)
running RHEL 6 hit a kernel panic from
time to time (approx. every 10-15 days).

Facts:
- all filesystems are ext4
- NFSv4 enabled
- 3 bonding devices, each with 2 physical devices
- 2 of the bonding devices are configured for jumbo frames (MTU=9000)
- latest firmware applied
- kernel is 2.6.32-71.14.1.el6.x86_64


Here's the console log from one of the HP servers:

------------[ cut here ]------------
kernel BUG at fs/inode.c:1333!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu47/cache/index2/shared_cpu_map
CPU 4
Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler
lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg h
pilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]

Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler
lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg h
pilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]
Pid: 3393, comm: lockd Tainted: G W ---------------- 2.6.32-71.14.1.el6.x86_64 #1 ProLiant DL585 G7
RIP: 0010:[] [] iput+0x69/0x70
RSP: 0018:ffff88082b86fce0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802fc8616c8 RCX: 000000000000c60e
RDX: ffff88202e13a901 RSI: ffffffffa0341de0 RDI: ffff8802fc8616c8
RBP: ffff88082b86fcf0 R08: 000000000002ac45 R09: 0000000000000000
R10: 000000000000000f R11: 0000000000000000 R12: ffff880227b49c00
R13: ffffffffa034e060 R14: ffff88202e13a940 R15: 00000000fffffff5
FS: 00007fac6a0247c0(0000) GS:ffff88002c240000(0000) knlGS:00000000f77916c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fac6a048000 CR3: 0000000c2da36000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process lockd (pid: 3393, threadinfo ffff88082b86e000, task ffff88082d9ab4e0)
Stack:
ffff88082b86fd40 ffff8802fc861680 ffff88082b86fd10 ffffffff813fdbf1
<0> ffff880227b49c00 ffff880227b49c00 ffff88082b86fd30 ffffffffa03351f8
<0> ffff88082b86fd30 ffff880227b49c10 ffff88082b86fd60 ffffffffa0341e2c
Call Trace:
[] sock_release+0x71/0x90
[] svc_sock_free+0x48/0x70 [sunrpc]
[] svc_xprt_free+0x4c/0x70 [sunrpc]
[] ? svc_xprt_free+0x0/0x70 [sunrpc]
[] kref_put+0x37/0x70
[] svc_xprt_put+0x19/0x20 [sunrpc]
[] svc_xprt_release+0xc1/0xe0 [sunrpc]
[] svc_recv+0x2ed/0x830 [sunrpc]
[] ? default_wake_function+0x0/0x20
[] lockd+0xc1/0x230 [lockd]
[] ? lockd+0x0/0x230 [lockd]
[] kthread+0x96/0xa0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xa0
[] ? child_rip+0x0/0x20
Code: 38 48 c7 c0 f0 7c 18 81 48 85 d2 74 12 48 8b 42 20 48 c7 c2 f0 7c 18 81 48 85 c0 48 0f 44 c2 48 89 df ff d0 48 83 c4 08 5b c9 c3 <0f>
0b eb fe 0f 1f 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f
RIP [] iput+0x69/0x70
RSP
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Free memory/Total memory (free %): 456164 / 495584 ( 92.0457 )
Loading jbd2.ko module
Loading mbcache.ko module
Loading ext4.ko module
Loading crc-t10dif.ko module
Loading sd_mod.ko module
Loading ata_generic.ko module
Loading exportfs.ko module
Loading autofs4.ko module
Loading ipmi_msghandler.ko module
Loading sunrpc.ko module
Loading ipv6.ko module
Loading uinput.ko module
Loading hwmon.ko module
Loading bnx2.ko module
Loading edac_core.ko module
Loading edac_mce_amd.ko module
Loading sg.ko module
Loading hpilo.ko module
Loading nx_nic.ko module
Loading cdrom.ko module
Loading pata_acpi.ko module
Loading pata_atiixp.ko module
Loading ahci.ko module
Loading hpsa.ko module
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
Loading i2c-core.ko module
Loading dm-mod.ko module
Loading nfs_acl.ko module
Loading auth_rpcgss.ko module
Loading ipmi_devintf.ko module
Loading ipmi_si.ko module
Loading lockd.ko module
Loading bonding.ko module
power_meter ACPI000D:00: Ignoring unsafe software power cap!
Loading dm-log.ko module
Loading power_meter.ko module
Loading amd64_edac_mod.ko module
Loading i2c-piix4.ko module
Loading sr_mod.ko module
Loading drm.ko module
Loading i2c-algo-bit.ko module
Loading nfsd.ko module
Loading dm-region-hash.ko module
Loading ttm.ko module
Loading drm_kms_helper.ko module
Loading dm-mirror.ko module
Loading radeon.ko module
Waiting for required block device discovery
Waiting for 8 sdd-like device(s)...Found
Creating Block Devices
Creating block device loop0
Creating block device loop1
Creating block device loop2
Creating block device loop3
Creating block device loop4
Creating block device loop5
Creating block device loop6
Creating block device loop7
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sda
Creating block device sdb
Creating block device sdc
Creating block device sdd
Creating block device sr0
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 432796 / 495584 ( 87.3305 )
Saving to the local filesystem /dev/sdd1
e2fsck 1.41.12 (17-May-2010)
Homes: recovering journal
Homes: clean, 9073003/164782080 files, 387571383/659105347 blocks
Free memory/Total memory (free %): 427248 / 495584 ( 86.211 )
Copying data : [ 2 %]
Copying data : [100 %]
Saving core complete
Restarting system.

The backtrace from the crash utility shows:

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/lib/modules/2.6.32-71.14.1.el6.x86_64/vmlinux
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 48
DATE: Wed Feb 9 09:30:52 2011
UPTIME: 14 days, 13:57:19
LOAD AVERAGE: 3.65, 3.39, 3.25
TASKS: 1663
NODENAME: hydra.sie.siemens.at
RELEASE: 2.6.32-71.14.1.el6.x86_64
VERSION: #1 SMP Wed Jan 5 17:01:01 EST 2011
MACHINE: x86_64 (2095 Mhz)
MEMORY: 128 GB
PANIC: "kernel BUG at fs/inode.c:1333!"
PID: 3393
COMMAND: "lockd"
TASK: ffff88082d9ab4e0 [THREAD_INFO: ffff88082b86e000]
CPU: 4
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3393 TASK: ffff88082d9ab4e0 CPU: 4 COMMAND: "lockd"
#0 [ffff88082b86f9a0] machine_kexec at ffffffff8103695b
#1 [ffff88082b86fa00] crash_kexec at ffffffff810b9068
#2 [ffff88082b86fad0] oops_end at ffffffff814cc6e0
#3 [ffff88082b86fb00] die at ffffffff8101733b
#4 [ffff88082b86fb30] do_trap at ffffffff814cbfb4
#5 [ffff88082b86fb90] do_invalid_op at ffffffff81014ee5
#6 [ffff88082b86fc30] invalid_op at ffffffff81013f5b
[exception RIP: iput+105]
RIP: ffffffff81186bf9 RSP: ffff88082b86fce0 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802fc8616c8 RCX: 000000000000c60e
RDX: ffff88202e13a901 RSI: ffffffffa0341de0 RDI: ffff8802fc8616c8
RBP: ffff88082b86fcf0 R8: 000000000002ac45 R9: 0000000000000000
R10: 000000000000000f R11: 0000000000000000 R12: ffff880227b49c00
R13: ffffffffa034e060 R14: ffff88202e13a940 R15: 00000000fffffff5
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff88082b86fcf8] sock_release at ffffffff813fdbf1
#8 [ffff88082b86fd18] svc_sock_free at ffffffffa03351f8
#9 [ffff88082b86fd38] svc_xprt_free at ffffffffa0341e2c
#10 [ffff88082b86fd68] kref_put at ffffffff8125cb97
#11 [ffff88082b86fd88] svc_xprt_put at ffffffffa0340f29
#12 [ffff88082b86fd98] svc_xprt_release at ffffffffa0341191
#13 [ffff88082b86fdc8] svc_recv at ffffffffa03415bd
#14 [ffff88082b86fe58] lockd at ffffffffa02f6291
#15 [ffff88082b86fee8] kthread at ffffffff81091a76
#16 [ffff88082b86ff48] kernel_thread at ffffffff810141ca
crash>
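
To make the call path easier to read, the frames from the `bt` output can be pulled apart with a short script. This is only a minimal sketch: the frame lines are copied from the backtrace above, the names `parse_bt` and `in_module_range` are made up for illustration, and the module check assumes the usual x86_64 layout in which loadable modules are mapped at 0xffffffffa0000000 and above.

```python
import re

# Frame lines copied from the `bt` output above (register dump omitted).
BT_TEXT = """\
#6 [ffff88082b86fc30] invalid_op at ffffffff81013f5b
[exception RIP: iput+105]
#7 [ffff88082b86fcf8] sock_release at ffffffff813fdbf1
#8 [ffff88082b86fd18] svc_sock_free at ffffffffa03351f8
#9 [ffff88082b86fd38] svc_xprt_free at ffffffffa0341e2c
#10 [ffff88082b86fd68] kref_put at ffffffff8125cb97
#11 [ffff88082b86fd88] svc_xprt_put at ffffffffa0340f29
#12 [ffff88082b86fd98] svc_xprt_release at ffffffffa0341191
#13 [ffff88082b86fdc8] svc_recv at ffffffffa03415bd
#14 [ffff88082b86fe58] lockd at ffffffffa02f6291
#15 [ffff88082b86fee8] kthread at ffffffff81091a76
#16 [ffff88082b86ff48] kernel_thread at ffffffff810141ca
"""

# "#7 [stack address] function at text address"
FRAME_RE = re.compile(r"^#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)$")
# "[exception RIP: function+decimal offset]"
RIP_RE = re.compile(r"\[exception RIP: (\w+)\+(\d+)\]")


def parse_bt(text):
    """Return ([(frame number, function, text address), ...], (function, offset))."""
    frames = []
    exception = None
    for line in text.splitlines():
        line = line.strip()
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(3), int(match.group(4), 16)))
            continue
        match = RIP_RE.search(line)
        if match:
            exception = (match.group(1), int(match.group(2)))
    return frames, exception


def in_module_range(addr):
    """Heuristic (assumption): x86_64 modules are mapped at 0xffffffffa0000000 and up."""
    return addr >= 0xFFFFFFFFA0000000


frames, exception = parse_bt(BT_TEXT)
for number, function, address in frames:
    origin = "module" if in_module_range(address) else "core kernel"
    print(f"#{number:<2} {function:<18} {origin}")
print("faulting instruction:", exception)
```

The decimal offset in `iput+105` is the same location as `iput+0x69` in the console log (0x69 is 105), and every module-range frame between `lockd` and `sock_release` is marked `[sunrpc]` in the console call trace.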

Any ideas or hints? What else can I do to find the cause of these panics, and how can I fix it?

thanks a lot,
christian
2 REPLIES
Zinky
Honored Contributor

Re: RHEL 6 kernel panic on DL585 G7

I would engage Red Hat support, who can bring HP into the picture.

RHEL 6.0 is still very new and may see occasional hiccups, as is true for any new Linux release.

That said, I've run RHEL 6.0 on a DL380 G6 (lab/test environment) with no issues whatsoever.

These DL5xx G7s are Magny-Cours based, right? So you have 4 sockets with 12 cores each. Maybe you are simply missing known updates? Are your RHEL 6.0 patches current? What about the PSP (ProLiant Support Pack)? Is it installed?

I can also recommend installing the Hardware Validation Suite for RHEL 6.0. It should be available from one of the software channels on your RHEL subscription. Try running the stress test suite for a few days and see whether the panic shows up.

Hakuna Matata

Favourite Toy:
AMD Athlon II X6 1090T 6-core, 16GB RAM, 12TB ZFS RAIDZ-2 Storage. Linux Centos 5.6 running KVM Hypervisor. Virtual Machines: Ubuntu, Mint, Solaris 10, Windows 7 Professional, Windows XP Pro, Windows Server 2008R2, DOS 6.22, OpenFiler
Christian Masopust
Occasional Contributor

Re: RHEL 6 kernel panic on DL585 G7

The cause of these panics was a bug in the RHEL 6 kernel itself (in the NFS lock manager, nfs-lockmgr). Red Hat fixed it in one of the later kernels. I don't know the exact kernel version, but if you update to RHEL 6.1, at least this bug is fixed :-)
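
To check whether a given box is still on a kernel older than the RHEL 6.1 one, the release strings can be compared numerically. This is a rough sketch, not an official check: the exact fixed version is not known in this thread, so the baseline `2.6.32-131.0.15.el6.x86_64` is only an assumption for the RHEL 6.1 GA kernel, and the helper names are made up. The comparison is a plain tuple comparison, not a full RPM version comparison.

```python
import re

# Assumption: kernel release shipped with RHEL 6.1 GA; adjust if your errata differ.
ASSUMED_RHEL61_KERNEL = "2.6.32-131.0.15.el6.x86_64"


def release_key(release):
    """Turn '2.6.32-71.14.1.el6.x86_64' into a tuple of ints such as (2, 6, 32, 71, 14, 1)."""
    # Drop the dist tag and architecture (".el6.x86_64"), keep the numeric part.
    numeric = release.split(".el", 1)[0]
    return tuple(int(part) for part in re.split(r"[.-]", numeric))


def is_older(running, baseline=ASSUMED_RHEL61_KERNEL):
    """True if the running kernel release sorts before the baseline."""
    return release_key(running) < release_key(baseline)


# The kernel from the original post is older than the assumed baseline.
print(is_older("2.6.32-71.14.1.el6.x86_64"))
```

On a live system, the release string could be taken from `platform.release()` (or `uname -r`) and passed to `is_older`.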