System hangs often with out of memory

 
Advisor

Hello All,

We are running a system with the following details:

Linux bwga090 2.6.16.21-0.8-bigsmp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686 i686 i386 GNU/Linux

CPU, memory, and swap information under normal load:
Cpu(s): 3.1%us, 0.6%sy, 0.0%ni, 95.7%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 4023012k total, 2948912k used, 1074100k free, 166268k buffers
Swap: 4200988k total, 0k used, 4200988k free, 2204300k cached

We are facing a peculiar problem that produces the error messages below. While it is happening the system still answers ping, but telnet, ssh, and even the console do not respond, and we have to hard reboot the system.

Can anyone please suggest why this is happening and how to solve it?

Apr 18 16:05:19 bwga090 kernel: oom-killer: gfp_mask=0x201d2, order=0
Apr 18 16:05:20 bwga090 kernel: [] out_of_memory+0x25/0x144
Apr 18 16:05:20 bwga090 kernel: [] __alloc_pages+0x1f3/0x2a5
Apr 18 16:05:20 bwga090 kernel: [] __do_page_cache_readahead+0xc4/0x1e2
Apr 18 16:05:20 bwga090 kernel: [] filemap_nopage+0x14f/0x2f9
Apr 18 16:05:20 bwga090 kernel: [] __handle_mm_fault+0x405/0xb1f
Apr 18 16:05:20 bwga090 kernel: [] do_select+0x38b/0x3b8
Apr 18 16:05:20 bwga090 kernel: [] __pollwait+0x0/0x95
Apr 18 16:05:20 bwga090 kernel: [] core_sys_select+0x1cb/0x26c
Apr 18 16:05:20 bwga090 kernel: [] do_page_fault+0x173/0x5f6
Apr 18 16:05:20 bwga090 kernel: [] do_page_fault+0x0/0x5f6
Apr 18 16:05:20 bwga090 kernel: [] error_code+0x4f/0x60
Apr 18 16:05:20 bwga090 kernel: Mem-info:
Apr 18 16:05:20 bwga090 kernel: DMA per-cpu:
Apr 18 16:05:20 bwga090 kernel: cpu 0 hot: high 0, batch 1 used:0
Apr 18 16:05:20 bwga090 kernel: cpu 0 cold: high 0, batch 1 used:0
Apr 18 16:05:20 bwga090 kernel: cpu 1 hot: high 0, batch 1 used:0
Apr 18 16:05:20 bwga090 kernel: cpu 1 cold: high 0, batch 1 used:0
Apr 18 16:05:20 bwga090 kernel: DMA32 per-cpu: empty
Apr 18 16:05:20 bwga090 sshd[25829]: fatal: Write failed: Connection reset by peer
Apr 18 16:05:20 bwga090 kernel: Normal per-cpu:
Apr 18 16:05:20 bwga090 kernel: cpu 0 hot: high 186, batch 31 used:52
Apr 18 16:05:21 bwga090 kernel: cpu 0 cold: high 62, batch 15 used:53
Apr 18 16:05:21 bwga090 kernel: cpu 1 hot: high 186, batch 31 used:63
Apr 18 16:05:21 bwga090 kernel: cpu 1 cold: high 62, batch 15 used:52
Apr 18 16:05:21 bwga090 kernel: HighMem per-cpu:
Apr 18 16:05:21 bwga090 kernel: cpu 0 hot: high 186, batch 31 used:18
Apr 18 16:05:22 bwga090 kernel: cpu 0 cold: high 62, batch 15 used:14
Apr 18 16:05:22 bwga090 kernel: cpu 1 hot: high 186, batch 31 used:30
Apr 18 16:05:22 bwga090 kernel: cpu 1 cold: high 62, batch 15 used:8
Apr 18 16:05:23 bwga090 kernel: Free pages: 42256kB (504kB HighMem)
Apr 18 16:05:23 bwga090 kernel: Active:4190 inactive:692413 dirty:0 writeback:0 unstable:0 free:10564 slab:193852 mapped:697864 pagetables:99324
Apr 18 16:05:23 bwga090 kernel: DMA free:12828kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:14 all_unreclaimable? yes
Apr 18 16:05:23 bwga090 kernel: lowmem_reserve[]: 0 0 880 3951
Apr 18 16:05:23 bwga090 kernel: DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Apr 18 16:05:23 bwga090 kernel: lowmem_reserve[]: 0 0 880 3951
Apr 18 16:05:24 bwga090 kernel: Normal free:28924kB min:3756kB low:4692kB high:5632kB active:16760kB inactive:16404kB present:901120kB pages_scanned:166209 all_unreclaimable? yes
Apr 18 16:05:24 bwga090 kernel: lowmem_reserve[]: 0 0 0 24575
Apr 18 16:05:24 bwga090 kernel: HighMem free:504kB min:512kB low:3792kB high:7072kB active:0kB inactive:2753248kB present:3145600kB pages_scanned:8019798 all_unreclaimable? yes
Apr 18 16:05:24 bwga090 kernel: lowmem_reserve[]: 0 0 0 0
Apr 18 16:05:24 bwga090 kernel: DMA: 49*4kB 53*8kB 49*16kB 49*32kB 28*64kB 11*128kB 8*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 12828kB
Apr 18 16:05:24 bwga090 kernel: DMA32: empty
Apr 18 16:05:24 bwga090 kernel: Normal: 6351*4kB 46*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 28924kB
Apr 18 16:05:24 bwga090 kernel: HighMem: 0*4kB 1*8kB 7*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB
Apr 18 16:05:24 bwga090 kernel: Swap cache: add 1109145, delete 1108929, find 27918/32228, race 0+3
Apr 18 16:05:24 bwga090 kernel: Free swap = 0kB
Apr 18 16:05:25 bwga090 kernel: Total swap = 4200988kB
Apr 18 16:05:25 bwga090 kernel: Free swap: 0kB
Apr 18 16:05:25 bwga090 kernel: 1015776 pages of RAM
Apr 18 16:05:25 bwga090 kernel: 786400 pages of HIGHMEM
Apr 18 16:05:25 bwga090 kernel: 10023 reserved pages
Apr 18 16:05:25 bwga090 kernel: 3219311 pages shared
Apr 18 16:05:25 bwga090 kernel: 216 pages swap cached
Apr 18 16:05:25 bwga090 kernel: 0 pages dirty
Apr 18 16:05:25 bwga090 kernel: 0 pages writeback
Apr 18 16:05:25 bwga090 kernel: 697864 pages mapped
Apr 18 16:05:25 bwga090 kernel: 193852 pages slab
Apr 18 16:05:26 bwga090 kernel: 99324 pages pagetables
Apr 18 16:05:26 bwga090 kernel: oom-killer: gfp_mask=0x200d2, order=0
Apr 18 16:05:26 bwga090 kernel: [] out_of_memory+0x25/0x144
Apr 18 16:05:26 bwga090 kernel: [] __alloc_pages+0x1f3/0x2a5

Thanks & Regards,
bhaski
6 REPLIES
Exalted Contributor

Re: System hangs often with out of memory

Shalom,

Note that when a process is launched, it tries to reserve swap. If it cannot, it behaves as if there is no memory, even when memory is available.

You are out of swap and should increase it. Your memory use is also heavy, so you may want to buy more RAM as well.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Honored Contributor

Re: System hangs often with out of memory

You have 4 GB of physical memory (RAM) and 4200988kB of swap.

This message:
Apr 18 16:05:24 bwga090 kernel: Free swap = 0kB

says your programs are using 100% of swap, which means the physical RAM is also 100% in use. The kernel has invoked the "oom-killer", so some program is still asking for more memory.

Examine the software that is running. Are there any memory leaks in the programs you're using? If there are, have them fixed. Adding more RAM when there is a leaking program may hide the symptoms for a while, but a memory leak will eventually eat through any amount of RAM and the problem will re-occur.

If there are no leaking programs, the solution is to _buy more RAM_.

Start by doubling the current amount, at least (from 4 GB to 8 GB). If your current hardware does not allow that, you need a bigger machine.

As a workaround, you can add more swap space, but adding swap when real RAM is needed will give you very bad performance.
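For anyone who wants to read the dump in kB rather than pages, here is a minimal Python sketch. It assumes 4 kB pages (the i686 default, and the smallest block size in the buddy lists of the dump) and a Python 3 interpreter; the function names are made up for illustration. The log lines are copied from the original post with timestamps trimmed.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, the i686 default

# Lines copied from the OOM dump in the original post (timestamps trimmed).
LOG = """\
Free swap = 0kB
Total swap = 4200988kB
1015776 pages of RAM
786400 pages of HIGHMEM
193852 pages slab
99324 pages pagetables
"""

def page_counts_kb(log, page_kb=PAGE_KB):
    """Return {label: size in kB} for every '<N> pages <label>' line."""
    sizes = {}
    for count, label in re.findall(r"^(\d+) pages (?:of )?(.+)$", log, re.MULTILINE):
        sizes[label.strip()] = int(count) * page_kb
    return sizes

def swap_kb(log):
    """Return (free_kb, total_kb) from the 'Free swap' and 'Total swap' lines."""
    free = int(re.search(r"Free swap\s*[=:]\s*(\d+)kB", log).group(1))
    total = int(re.search(r"Total swap\s*=\s*(\d+)kB", log).group(1))
    return free, total

if __name__ == "__main__":
    for label, kb in page_counts_kb(LOG).items():
        print(f"{label:12s} {kb:>9d} kB")
    free, total = swap_kb(LOG)
    print(f"swap in use: {total - free} kB of {total} kB")
```

As a sanity check on the 4 kB assumption, the HIGHMEM page count converts to 3145600 kB, which matches the `present:3145600kB` figure on the HighMem line of the dump.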

MK
Honored Contributor

Re: System hangs often with out of memory

You can install the sysstat package and use sar to determine whether your system is really running out of memory or whether this is a kernel or program bug.

How often do you have this problem? How much memory does your system have? Can you post the output of:

free
swapon -s

Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Advisor

Re: System hangs often with out of memory

Hello All,

Thank you very much.

Here is the output of free and swapon -s.

bwga090:/var/log # free
             total       used       free     shared    buffers     cached
Mem:       4023012    2850496    1172516          0      40996    1335404
-/+ buffers/cache:    1474096    2548916
Swap:      4200988        172    4200816
bwga090:/var/log # swapon -s
Filename                                Type            Size    Used    Priority
/dev/sda2                               partition       4200988 172     -1
bwga090:/var/log #
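For readers unsure how to read the "-/+ buffers/cache" row: in this older `free` layout it is just the Mem row with buffers and page cache moved from "used" to "free". A small Python sketch of that arithmetic, using the numbers above (the function name is only for illustration):

```python
# Values in kB, copied from the Mem: row of the `free` output above.
used, free = 2850496, 1172516
buffers, cached = 40996, 1335404

def without_cache(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row from the Mem: row."""
    used_by_programs = used - buffers - cached  # memory programs actually hold
    reclaimable_free = free + buffers + cached  # free plus droppable cache
    return used_by_programs, reclaimable_free

print(without_cache(used, free, buffers, cached))
# prints (1474096, 2548916), matching the second row of `free`
```

So at the time this was taken, programs held roughly 1.4 GB and about 2.5 GB could be handed out without touching swap.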

One more thing I forgot to add: this happens every 20 days to a month.
This machine is mainly used as a compile machine, and I had written a small script that checks swap, CPU, and memory usage every 15 minutes.
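The script itself is not shown in this thread. As a rough idea of what such a check could look like, here is a hypothetical Python sketch that reads `/proc/meminfo` and prints one timestamped line per run. It assumes a Python 3 interpreter is available (which may not be true on an install of this age), and the function names are invented for illustration.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def summarize(info):
    """Format the fields of interest as a single log line."""
    swap_used = info["SwapTotal"] - info["SwapFree"]
    return "MemFree=%dkB SwapUsed=%dkB" % (info["MemFree"], swap_used)

def snapshot(path="/proc/meminfo"):
    """Read the live counters and return the summary line."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))

if __name__ == "__main__":
    try:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), snapshot())
    except (OSError, KeyError) as err:
        # Not on Linux, or the expected fields are missing.
        print("could not read memory counters:", err)
```

Run from cron at a short interval and appended to a file, something like this leaves a trail that survives a hard reboot. A 15-minute interval can miss a fast event, so a shorter interval may be worth the extra log volume.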

The last report just before the hang showed a user running make (a compile), with the load around 6.00.

If it is a memory leak, how should I approach it? Six or seven users compile on this machine, and they expect a failing build to report an error rather than hang the system.

Thanks & Regards,
Bhaski
Advisor

Re: System hangs often with out of memory

Hello,

The system already has the sysstat package installed; I was not monitoring with the sar command because I did not know about it.

Unfortunately, sar is only giving me data for the current day.

Thanks & Regards,
Bhaski
Advisor

Re: System hangs often with out of memory

Hello,

I found the sar output for yesterday as well. It is shown below.

15:20:01 kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
15:30:01    821980   3201032     79.57    146732   2251552   4199892      1096      0.03        40
15:40:01    813220   3209792     79.79    149376   2254048   4199892      1096      0.03        40
15:50:02    115208   3907804     97.14      1180     42796   1795388   2405600     57.26      7424
16:30:01   3695652    327360      8.14     12144    238688   4200988         0      0.00         0
16:40:01   3692440    330572      8.22     13712    239176   4200988         0      0.00         0
16:50:01   3641792    381220      9.48     15580    253756   4200988         0      0.00         0

The system hung at 16:00.
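To make the jump in these samples easier to see, here is a small Python sketch that finds the interval with the largest growth in swap use. The rows are copied from the sar output above; the column name `time` and the function names are my own labels, and Python 3 is assumed.

```python
# Column names from the sar header above; "time" is a label chosen here.
HEADER = ("time kbmemfree kbmemused %memused kbbuffers kbcached "
          "kbswpfree kbswpused %swpused kbswpcad").split()

# Rows copied from the sar output above.
ROWS = """\
15:30:01 821980 3201032 79.57 146732 2251552 4199892 1096 0.03 40
15:40:01 813220 3209792 79.79 149376 2254048 4199892 1096 0.03 40
15:50:02 115208 3907804 97.14 1180 42796 1795388 2405600 57.26 7424
16:30:01 3695652 327360 8.14 12144 238688 4200988 0 0.00 0
"""

def parse_rows(text):
    """Turn whitespace-separated sar rows into a list of dicts."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        row = {"time": fields[0]}
        for name, value in zip(HEADER[1:], fields[1:]):
            row[name] = float(value)
        samples.append(row)
    return samples

def swap_growth(samples):
    """Return (start, end, kB of swap growth) for each pair of consecutive samples."""
    return [(a["time"], b["time"], b["kbswpused"] - a["kbswpused"])
            for a, b in zip(samples, samples[1:])]

def largest_growth(samples):
    """Return the interval in which swap use grew the most."""
    return max(swap_growth(samples), key=lambda g: g[2])

if __name__ == "__main__":
    start, end, grown = largest_growth(parse_rows(ROWS))
    print(f"{start} -> {end}: swap use grew by {grown:.0f} kB")
```

On this data the largest growth is between 15:40:01 and 15:50:02, where swap use rises by 2404504 kB. In the same interval kbcached falls from 2254048 to 42796, and the next sample (16:30:01) is after the reboot.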

Thanks & Regards,
Bhaski