memory issues causing server hang
01-12-2010 09:21 PM
My RHEL4 box hung the day before yesterday. I had a chance to look at it before it stopped responding: some Java and Oracle processes were hogging the CPU and memory. Before I could free these resources, the server stopped responding, and the guy onsite had to hard-boot it.
He mentioned he had to run fsck on the swap FS (lvol1) to bring the server up.
Below are some logs which indicate that there were memory issues.
I did not see any SCSI errors.
---- Do these logs indicate serious trouble in the future?
---- What else should I troubleshoot?
Jan 11 16:42:49 renault kernel: kswapd0: page allocation failure. order:0, mode:0x50
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:42:53 renault kernel: [
Jan 11 16:43:26 renault kernel: [
Jan 11 16:43:31 renault kernel: [
Jan 11 16:43:34 renault kernel: [
Jan 11 16:43:37 renault kernel: [
Jan 11 16:43:42 renault kernel: [
Jan 11 16:43:43 renault kernel: [
Jan 11 16:43:45 renault kernel: [
Jan 11 16:43:47 renault kernel: [
Jan 11 16:43:49 renault kernel: [
Jan 11 16:43:51 renault kernel: [
Jan 11 16:43:53 renault kernel: [
Jan 11 16:43:55 renault kernel: [
Jan 11 16:43:55 renault kernel: [
Jan 11 16:43:57 renault kernel: [
Jan 11 16:43:57 renault kernel: [
Jan 11 16:44:06 renault kernel: Mem-info:
Jan 11 16:44:06 renault kernel: DMA per-cpu:
Jan 11 16:44:07 renault kernel: cpu 0 hot: low 2, high 6, batch 1
Jan 11 16:44:10 renault kernel: cpu 0 cold: low 0, high 2, batch 1
Jan 11 16:44:11 renault kernel: Normal per-cpu:
Jan 11 16:44:12 renault kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 11 16:44:14 renault kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 11 16:44:15 renault kernel: HighMem per-cpu:
Jan 11 16:44:16 renault kernel: cpu 0 hot: low 32, high 96, batch 16
Jan 11 16:44:17 renault kernel: cpu 0 cold: low 0, high 32, batch 16
Jan 11 16:44:19 renault kernel:
Jan 11 16:44:20 renault kernel: Free pages: 704kB (704kB HighMem)
Jan 11 16:44:21 renault kernel: Active:570649 inactive:9253 dirty:16 writeback:6656 unstable:0 free:176 slab:44719 mapped:571245 pagetables:208368
Jan 11 16:44:22 renault kernel: DMA free:0kB min:16kB low:32kB high:48kB active:196kB inactive:4kB present:16384kB pages_scanned:410 all_unreclaimable? yes
Jan 11 16:44:22 renault kernel: protections[]: 0 0 0
Jan 11 16:44:23 renault kernel: Normal free:0kB min:936kB low:1872kB high:2808kB active:185860kB inactive:31000kB present:901120kB pages_scanned:264 all_unreclaimable? no
Jan 11 16:44:26 renault kernel: protections[]: 0 0 0
Jan 11 16:44:27 renault kernel: HighMem free:704kB min:512kB low:1024kB high:1536kB active:2096540kB inactive:6008kB present:3276800kB pages_scanned:0 all_unreclaimable? no
Jan 11 16:44:28 renault kernel: protections[]: 0 0 0
Jan 11 16:44:29 renault kernel: DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Jan 11 16:44:31 renault kernel: Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
Jan 11 16:44:32 renault kernel: HighMem: 48*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB
Jan 11 16:44:33 renault kernel: Swap cache: add 225863514, delete 225830289, find 30533854/57772199, race 626+1803
Jan 11 16:44:35 renault kernel: 0 bounce buffer pages
Jan 11 16:44:36 renault kernel: Free swap: 4227380kB
Jan 11 16:44:37 renault kernel: 1048576 pages of RAM
Jan 11 16:44:38 renault kernel: 622504 pages of HIGHMEM
Jan 11 16:44:39 renault kernel: 206102 reserved pages
Jan 11 16:44:40 renault kernel: 5185907 pages shared
Jan 11 16:44:43 renault kernel: 33225 pages swap cached
Jan 11 17:52:07 renault kernel: kswapd0: page allocation failure. order:0, mode:0x50
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
Jan 11 17:52:14 renault kernel: [
================After reboot================
Jan 12 09:10:14 renault kernel: BIOS-provided physical RAM map:
Jan 12 09:10:14 renault kernel: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
Jan 12 09:10:14 renault kernel: BIOS-e820: 0000000000100000 - 00000000cffa8000 (usable)
Jan 12 09:10:14 renault kernel: BIOS-e820: 00000000cffa8000 - 00000000cffb7c00 (ACPI data)
Jan 12 09:10:14 renault kernel: BIOS-e820: 00000000cffb7c00 - 00000000d0000000 (reserved)
Jan 12 09:10:14 renault kernel: BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
Jan 12 09:10:14 renault kernel: BIOS-e820: 00000000fe000000 - 0000000100000000 (reserved)
Jan 12 09:10:14 renault kernel: BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
Jan 12 09:10:14 renault kernel: Warning only 4GB will be used.
Jan 12 09:10:14 renault kernel: Use a PAE enabled kernel.
Jan 12 09:10:14 renault kernel: 3200MB HIGHMEM available.
Jan 12 09:10:14 renault syslog: klogd startup succeeded
Jan 12 09:10:14 renault kernel: 896MB LOWMEM available.
Jan 12 09:10:14 renault kernel: found SMP MP-table at 000fe710
Jan 12 09:10:14 renault kernel: Using x86 segment limits to approximate NX protection
Jan 12 09:10:14 renault kernel: zapping low mappings.
Jan 12 09:10:14 renault kernel: DMI 2.4 present.
Jan 12 09:10:14 renault kernel: ServerWorks chipset detected. Disabling timer routing over 8254.
Jan 12 09:10:14 renault irqbalance: irqbalance startup succeeded
Jan 12 09:10:14 renault kernel: ACPI: PM-Timer IO Port: 0x808
Jan 12 09:10:14 renault kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Jan 12 09:10:14 renault kernel: Processor #0 6:15 APIC version 20
Jan 12 09:10:14 renault kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
Jan 12 09:10:14 renault kernel: Processor #4 6:15 APIC version 20
Jan 12 09:10:14 renault kernel: WARNING: NR_CPUS limit of 1 reached. Processor ignored.
Jan 12 09:10:14 renault kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
Jan 12 09:10:14 renault kernel: highmem bounce pool size: 64 pages
Jan 12 09:10:14 renault kernel: Total HugeTLB memory allocated, 0
Output of "free":
             total       used       free     shared    buffers     cached
Mem:       3369896    3325700      44196          0      24188    1656060
-/+ buffers/cache:    1645452    1724444
Swap:     12910584     230332   12680252
Thanks
Sunny
01-13-2010 12:11 AM
Solution
(a kernel stack trace follows)
Looks like your system was critically low on "normal" and/or "DMA-capable" memory. The kernel could not find even a single free page of memory while running some ext3 filesystem code. When that happens, the kernel starts looking for pages it can reclaim. Apparently it found some.
The stack trace may look scary, but it just allows the kernel developers to pinpoint exactly what the kernel was doing when the error was detected. Sometimes it's useful, but here it doesn't seem to be important.
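To make the "critically low" reading concrete, here is a minimal Python sketch that compares each zone's free memory with its "min" watermark. The zone lines are copied from the Mem-info dump in the first post with the timestamps and trailing fields trimmed; the names ZONE_LINES and zones_below_min are just illustrative helpers, not part of any kernel tool.

```python
import re

# Zone summary lines from the Mem-info dump (timestamps and later fields trimmed).
ZONE_LINES = """\
DMA free:0kB min:16kB low:32kB high:48kB
Normal free:0kB min:936kB low:1872kB high:2808kB
HighMem free:704kB min:512kB low:1024kB high:1536kB
"""

# Capture the zone name, its free memory and its "min" watermark (both in kB).
ZONE_RE = re.compile(r"^(\w+) free:(\d+)kB min:(\d+)kB", re.MULTILINE)


def zones_below_min(text):
    """Return the names of zones whose free memory is under the 'min' watermark."""
    return [
        name
        for name, free_kb, min_kb in ZONE_RE.findall(text)
        if int(free_kb) < int(min_kb)
    ]


starved = zones_below_min(ZONE_LINES)
print("Zones below their min watermark:", starved)
```

With these numbers, DMA and Normal come out below their minimum while HighMem is only just above it, which fits the picture of the kernel running out of low memory.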
After the reboot, everything looks normal, except for two things:
> Jan 12 09:10:14 renault kernel: Warning only 4GB will be used.
> Jan 12 09:10:14 renault kernel: Use a PAE enabled kernel.
Apparently your system is now running a kernel which can handle at most 4 GB of memory (the structural limit of 32-bit systems without PAE technology). Your system seems to have more than that, but with the current kernel, you're limited to 4 GB.
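As a rough check of "more than that", the usable ranges in the BIOS-e820 map from the boot log can be added up. This is only a sketch in Python: the three ranges are the lines marked (usable) in the log, and the variable names are made up for the example.

```python
# (start, end) of the ranges marked "(usable)" in the BIOS-e820 map above.
E820_USABLE = [
    (0x0000000000000000, 0x00000000000A0000),
    (0x0000000000100000, 0x00000000CFFA8000),
    (0x0000000100000000, 0x0000000230000000),
]

LIMIT_4GB = 1 << 32  # highest physical address a non-PAE 32-bit kernel can reach
GIB = 1 << 30

# Total usable RAM reported by the BIOS.
usable_bytes = sum(end - start for start, end in E820_USABLE)

# Portion of that RAM located above the 4 GB physical address boundary.
above_4gb = sum(
    end - max(start, LIMIT_4GB)
    for start, end in E820_USABLE
    if end > LIMIT_4GB
)

print("usable RAM: %.2f GiB" % (usable_bytes / GIB))
print("above 4 GB boundary: %.2f GiB" % (above_4gb / GIB))
```

By this count the BIOS reports close to 8 GiB of usable RAM, of which about 4.75 GiB lies above the 4 GB address boundary and so cannot be reached by the non-PAE kernel.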
> Jan 12 09:10:14 renault kernel: WARNING: NR_CPUS limit of 1 reached. Processor ignored.
You are running a multi-processor or multi-core system with a single-processor kernel. Looks like your system has two processors/cores, but you're now using only one.
Solution: install the "kernel-smp" package from the RHEL 4 distribution if it isn't already installed. If you need it, install the matching "kernel-smp-devel" package too. It supports both multiple processors and PAE, so it will fix both of your problems.
Or perhaps your onsite guy simply chose the wrong kernel from the GRUB boot menu when rebooting the system?
Check /boot/grub/grub.conf to make sure the SMP kernel is set as default. Then reboot the system to make it use the SMP (=multi-processor) kernel.
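If you want to script that check, something like the following Python sketch would do. Note that SAMPLE_GRUB_CONF is a hypothetical, cut-down grub.conf made up for illustration (your real file will have more lines and possibly a different entry order), and the parser assumes a numeric "default=" line as used by GRUB legacy.

```python
# Hypothetical grub.conf contents for illustration only; not the poster's real file.
SAMPLE_GRUB_CONF = """\
default=2
timeout=5
title Red Hat Enterprise Linux (2.6.9-42.0.0.0.1.ELhugemem)
    kernel /vmlinuz-2.6.9-42.0.0.0.1.ELhugemem ro
title Red Hat Enterprise Linux (2.6.9-42.0.0.0.1.ELsmp)
    kernel /vmlinuz-2.6.9-42.0.0.0.1.ELsmp ro
title Red Hat Enterprise Linux (2.6.9-42.0.0.0.1.EL)
    kernel /vmlinuz-2.6.9-42.0.0.0.1.EL ro
"""


def default_title(conf_text):
    """Return the title of the boot entry selected by the 'default=' line."""
    default_index = 0  # entries are counted from 0; assume entry 0 if no default line
    titles = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        if line.startswith("default="):
            # Assumes a numeric index; a non-numeric value would raise ValueError.
            default_index = int(line.split("=", 1)[1])
        elif line.startswith("title "):
            titles.append(line[len("title "):])
    return titles[default_index]


title = default_title(SAMPLE_GRUB_CONF)
is_smp = title.rstrip(")").endswith("smp")
print(title, "-> SMP kernel" if is_smp else "-> NOT the SMP kernel")
```

To run it against the real file, read /boot/grub/grub.conf into a string and pass that to default_title() instead of the sample.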
"fsck on swap FS" sounds strange, unless you're using a filesystem swap. On a swap partition/LV there is normally no filesystem, so there is nothing to fsck. If a swap partition has errors, the fix is to re-run "mkswap" on it (just like when starting to use it) before activating it with the "swapon" command.
MK
01-13-2010 12:22 AM
Re: memory issues causing server hang
Thanks for the detailed description.
You were right; there are 3 kernels:
2.6.9-42.0.0.0.1.ELhugemem
2.6.9-42.0.0.0.1.ELsmp
2.6.9-42.0.0.0.1.EL
The onsite guy booted the 2.6.9-42.0.0.0.1.EL kernel.
I'll reboot with the SMP kernel, so the CPU limit and memory limit won't be a problem then.
01-13-2010 12:33 AM
Re: memory issues causing server hang
2.6.9-42.0.0.0.1.ELhugemem and
2.6.9-42.0.0.0.1.ELsmp
Which one is more suitable for my hardware?
01-13-2010 02:13 AM
Re: memory issues causing server hang
2.6.9-42.0.0.0.1.ELsmp
2.6.9-42.0.0.0.1.EL
EL = the default, single-processor kernel. Can use up to 4 GB of memory, total. Optimized for small systems.
ELsmp = multi-CPU kernel. Supports up to 16 GB of memory (using the PAE technology) and multiple CPUs.
ELhugemem = can support multiple CPUs and up to 64 GB of memory (the maximum allowed by the PAE technology). Optimized for "huge" systems (at the time of the introduction of RHEL 4; they don't seem so huge today).
If you need more than 64 GB of memory, you must install the 64-bit version (x86_64) of the OS. Switching from the 32-bit version (what you have now) to 64-bit will require OS re-installation.
Although a 32-bit OS with PAE can handle up to 64 GB, it is less efficient than a real 64-bit OS and limits the maximum size of individual processes to 4 GB. So I would definitely recommend using a 64-bit OS instead of relying on PAE if you have more than, say, 16 GB of memory.
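The rules of thumb above can be written down as a small Python sketch. The function name suggest_kernel and the exact cut-offs are only my summary of this post (4 GB, 16 GB and 64 GB limits), not an official Red Hat sizing rule.

```python
def suggest_kernel(mem_gb, cpus):
    """Suggest a 32-bit RHEL 4 kernel flavour from RAM size (GB) and CPU/core count.

    Thresholds follow the limits described in this post; treat them as a
    rule of thumb, not as vendor guidance.
    """
    if mem_gb > 64:
        # Beyond what PAE can address: needs the x86_64 OS (re-installation).
        return "x86_64"
    if mem_gb > 16:
        return "ELhugemem"
    if mem_gb > 4 or cpus > 1:
        return "ELsmp"
    return "EL"


# The system in this thread: roughly 8 GB by the BIOS map, at least two CPUs/cores.
print(suggest_kernel(8, 2))
```

For the box in this thread that gives ELsmp, since it has more than one processor and well under 16 GB of memory.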
By the way, the current RHEL kernel versions are 2.6.9-89.0.19.EL*. Your 2.6.9-42* versions are pretty old.
MK