
memory problem

 
Ciaran Byrne
Advisor

memory problem

Hi,
I have just upgraded from 16 GB to 28 GB of RAM on an HP-UX 11i system. I am now seeing segmentation violation errors and application crashes. Other similar apps
with exactly the same kernel parameters, patches and code base are not experiencing this problem. Are there kernel parameters that need to change as a result?

Other points
* "Memory fault" message when using top or glance : happens occasionally and causes an exit.
* swap is fairly low PCT used (around 15%)
* shmmax is 1 gig
* maxdsiz is 1 gig
* 32 bit app
* Core dump is produced but does not show any relevant info.
* tusc logs
"exit(11) implicit] .................................................................................. WIFSIGNALED(SIGSEGV)|WCOREDUMP"

I don't think it's the app, as it was running without issue before the upgrade, and it is experiencing the problem
under much less load than normal.

help!
Thanks,
CB
26 REPLIES
Michael Tully
Honored Contributor

Re: memory problem

Hi CB,

Seeing as you have upgraded your RAM, how are you utilising it? For example, have you started using memory windows? Being a 32-bit application, it still uses its address space just as it would under a 32-bit OS. A look at the memory management paper in /usr/share/docs/mem_mgt.txt may also help.

How much swap do you have? Have you increased it?
Have you lowered your buffer cache percentage? It should be around 300-500 MB in total.
You may need to increase your shared memory.

Some food for thought
Michael
Anyone for a Mutiny ?
Stefan Farrelly
Honored Contributor

Re: memory problem

Very interesting. You've had a large increase in RAM, but you don't say why you added 12 GB? I presume it was needed by some other app which is now using it?

So, how much free RAM do you have now? (vmstat)
What is the output from swapinfo -mt? (dev usage should be ZERO to prove you're not running out of RAM, and device swap should be the same size as RAM, i.e. 28 GB).
Im from Palmerston North, New Zealand, but somehow ended up in London...
Ciaran Byrne
Advisor

Re: memory problem

Hi Michael/Stefan,

thanks for the quick response.

SWAP
====
swapinfo -tm
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
dev       16000       0   16000    0%       0       -    1  /dev/vg01/lvswap
reserve       -    4233   -4233
memory    22286    1332   20954    6%
total     39310    5565   33745   14%       -       0    -

VMSTAT
======
procs            memory                page                             faults              cpu
 r  b  w     avm     free  re  at  pi  po  fr  de  sr    in     sy    cs  us  sy  id
 2  0  0  1049506  5297766  79  42   0   0   0   0   0  1926  11282  2970   6   2  92

I have not increased the swap. The memory was added because the app was hitting 100% memory utilization.

Buffer Cache
============
Available : 1.92 GB    Requested : n/a
Used      : 1.92 GB
High      : 1.92 GB

Do you mean the shmmax value?

Thanks
CB
Ciaran Byrne
Advisor

Re: memory problem

Hi,
one other piece of info. In the tusc logs I see tons of messages like
"[27571] brk(0x8cfc0000) ...................................................................................... ERR#12 ENOMEM"

Thanks,
CB
Stefan Farrelly
Honored Contributor

Re: memory problem

You're only running 17 GB of swap (1 + 16), and with 28 GB of RAM you really should have a minimum of 28 GB of swap. Basically you are using RAM as extra swap, which could be causing the memory errors/crashes.

Up swap to 28 GB total and I'm sure your problems will go away. It's easy to do on the fly.
Im from Palmerston North, New Zealand, but somehow ended up in London...
Ciaran Byrne
Advisor

Re: memory problem

Hi,
I changed the swap to 41 GB.
Still seeing the same problems. Now I am thinking it's somehow related to ulimit.

ulimit -a
==========
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) 966852
stack(kbytes) 262144
memory(kbytes) unlimited
coredump(blocks) 4194303
nofiles(descriptors) 2048

It appears the Glance version needs to be upgraded because of the VPar installation.

Thanks,
Ciaran
Olivier ROBERT
Frequent Advisor

Re: memory problem

Hi Ciaran,

That may be stupid, but do you have one swap volume, or is it fragmented across multiple volumes?

Another idea, maybe it would be interesting to check how it works if swap is managed the "lazy" way, i.e. without immediate reservation of the swap space. I don't know if it's possible on HP-UX, but that's what one can do on Tru64 UNIX by removing the "/sbin/swapdefault" symbolic link, and then rebooting.

If either change works, though, it would point to an HP-UX bug... and "lazy" swap management is not as good as immediate reservation as far as performance is concerned.

Hope this helps,

Olivier
James Murtagh
Honored Contributor

Re: memory problem

Hi Ciaran,

The failing brk() call appears to indicate you are hitting the maxdsiz limit. See the "man brk" page for the three possible ENOMEM errors returned.

I can see you have maxdsiz set to almost 1 GB, but what about maxdsiz_64bit? This may seem a funny question, but if the process is first started as a 64-bit process and then exec'd, it will take the lower of the two limits.

The first step is to confirm this is the cause. You can examine the process's memory regions in Glance; concentrate on the Data Segment field. If Glance is still causing you problems at the moment, you can compile this program and run it with the pid as the argument. If the format is bad I will attach it instead.

#include <stdio.h>
#include <stdlib.h>
#include <sys/param.h>
#include <sys/pstat.h>

/* Print the data and stack segment sizes of a process, given its PID.
 * pstat reports sizes in pages; a 4 KB page size is assumed below. */
int main(int argc, char *argv[])
{
    int i;
    struct pst_status procmem;

    if (argc != 2)
    {
        printf("\n\tUsage: %s <pid>\n\n", argv[0]);
        exit(1);
    }

    i = atoi(argv[1]);

    if (pstat_getproc(&procmem, sizeof(struct pst_status), 0, i) == -1)
    {
        perror("pstat_getproc");
        exit(1);
    }

    printf("\nProcess data and stack sizes for process %d :\n\n", i);
    printf("Process Data size is          : %d KB\n", 4 * procmem.pst_dsize);
    printf("Process Stack size is         : %d KB\n", 4 * procmem.pst_ssize);
    printf("Process Virtual Data size is  : %d KB\n", 4 * procmem.pst_vdsize);
    printf("Process Virtual Stack size is : %d KB\n", 4 * procmem.pst_vssize);

    return 0;
}

If these tests indicate you are reaching the maxdsiz limit, you may be able to recompile the program and link the objects with the -N flag for EXEC_MAGIC. This will almost double the data segment area for the 32-bit process by allowing space in the text quadrant to be used too.

The big question, though, is why this would fail after the memory upgrade. Does anything in the code use the physical memory total to calculate variables, or something similar? I would also run swverify to ensure the installed OS software is configured properly.

Regards,

James.
James Murtagh
Honored Contributor

Re: memory problem

Hi again Ciaran,

I think the argument to brk() actually shows us the problem:

"[27571] brk(0x8cfc0000).... ENOMEM"

That address is a third-quadrant address for a 32-bit process, hence beyond the end of the data quadrant. This is the list of ranges:

Q1 (quadrant 1): 0x00000000-0x3fffffff
Q2 (quadrant 2): 0x40000000-0x7fffffff
Q3 (quadrant 3): 0x80000000-0xbfffffff
Q4 (quadrant 4): 0xc0000000-0xffffffff

So it appears the application is asking for memory beyond its limit, hence ENOMEM. This may also be due to fragmentation of the address space in Q2, if you are allocating and freeing chunks before this allocation request.

Regards,

James.
Ciaran Byrne
Advisor

Re: memory problem

Hi,
Thanks everyone for the responses so far. The application is a 32-bit app.

maxdsiz 990056448

maxdsiz_64bit 17179869184

The other thing to mention is that this is a VPar'd environment; we just combined two NPars from a Superdome and split them into VPars.

I am about to look at logproc to see what the process memory allocations were before it died.

Thanks,
Ciaran
Ciaran Byrne
Advisor

Re: memory problem

 
Ciaran Byrne
Advisor

Re: memory problem

hi,
here is the MeasureWare report template:

REPORT "Export MWA !DATE !COLLECTOR !SYSTEM_ID"
FORMAT ASCII
HEADINGS ON
SEPARATOR= "|"
SUMMARY=1

*********************************************************************
DATA TYPE PROCESS
* Record Identification Metrics
DATE
TIME

* Summary Metrics
PROC_PROC_NAME
PROC_USER_NAME
PROC_PROC_ID
PROC_CPU_TOTAL_UTIL
PROC_MEM_VIRT
PROC_MEM_RES
PROC_MINOR_FAULT
PROC_MAJOR_FAULT
PROC_INTEREST
PROC_STOP_REASON
PROC_PRI
PROC_THREAD_COUNT

Thanks,
Ciaran
James Murtagh
Honored Contributor

Re: memory problem

Hi Ciaran,

I don't think the output tells us too much, I'm afraid (nor would my pstat code, come to think of it). We can see the resident size of the process just before it was killed, but this doesn't say a lot. It could still have requested a very large region that hit the virtual memory limits, which I think is what you are seeing. The tusc output would be of more interest if you wish to post it.

Regards,

James.
Ciaran Byrne
Advisor

Re: memory problem

 
James Murtagh
Honored Contributor

Re: memory problem

Hi Ciaran,

I was just going to examine the memory allocation patterns for possible fragmentation issues. If you could provide the output of:

# egrep "brk|free" <tusc-output-file>

that would help, or possibly attach the full tusc output using the attachment box at the bottom of the post screen.

Is this actually in-house code? Do you think it would be possible to recompile for EXEC_MAGIC to test it?

Regards,

James.
Ciaran Byrne
Advisor

Re: memory problem

 
Ciaran Byrne
Advisor

Re: memory problem

Hi,
to answer your other question: it is a mix of in-house and vendor code. It is working correctly on other similar servers, and was working on this one before the VPar install and the addition of memory.

Thanks,
Ciaran
James Murtagh
Honored Contributor

Re: memory problem

Hi Ciaran,

There are no free() calls in your list, so there is probably no point pursuing this. I'm surprised there are so many ENOMEM errors reported, though.

One thing that has caught my eye, though: from your ulimit output the stack size limit is very large. As stack and data share quadrant 2, this imposes a limit on your data area, since the stack allocation takes precedence. I would compare maxssiz to your other servers. You can also lower the limit (using the POSIX shell) with:

# ulimit -s 8192

The 8192 is just an arbitrary figure. Then rerun the process and see how it goes.

Regards,

James.
Ciaran Byrne
Advisor

Re: memory problem

Hi,
thanks James. We have set ulimit -s 16384 and are re-running the test.

Also, I just got this message:

/var/adm/sw # view swagent.log
"swagent.log" [Read only] 3168728 lines, 176327638 characters
Warning: Out of memory saving lines for recovery - try using ed

Regards,
Ciaran
James Murtagh
Honored Contributor

Re: memory problem

Hi Ciaran,

Let me know how you get on with the tests.

The "out of memory" message you got with view is normal and doesn't relate to any memory issues here. It actually affects any file over 1843199 lines in size, on all releases. The significance of that number escapes me. 1843200 is directly divisable by 1024 (1800) so it has probably got something to do with vi's internal buffer and the stdio limit BUFSIZ.

Regards,

James.
Jdamian
Respected Contributor

Re: memory problem

Could you show the output from the commands 'swapinfo -tam', 'ipcs -ma' and 'kmtune -q maxswapchunks'?

Alan Wu
New Member

Re: memory problem

Hi J,

Just on behalf of Ciaran, here are the outputs of the commands you asked for.

swapinfo -tam

             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
dev       16000       0   16000    0%       0       -    1  /dev/vg01/lvswap
dev       40960       0   40960    0%       0       -    1  /dev/vg01/lvol6
reserve       -     193    -193
memory    22286    1368   20918    6%
total     80270    1561   78709    2%       -       0    -

ipcs

             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        1024       0    1024    0%       0       -    1  /dev/vg00/lvol2
dev       16000       0   16000    0%       0       -    1  /dev/vg01/lvswap
dev       40960       0   40960    0%       0       -    1  /dev/vg01/lvol6
reserve       -     193    -193
memory    22286    1368   20918    6%

kmtune -q maxswapchunks

Parameter             Current  Dyn  Planned  Module  Version
===============================================================================
maxswapchunks           12288    -    12288

Thanks
Alan Wu
New Member

Re: memory problem

Hi J,

Sorry, I made a paste error with the 'ipcs' output; the following is the correct one.

ipcs

IPC status from /dev/kmem as of Wed Apr 16 08:29:32 2003
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 0 0x4118025d --rw-rw-rw- root root root root 0 348 597 21030 4:18:43 4:18:43 7:48:52
m 1 0x4e0c0002 --rw-rw-rw- root root root root 1 61760 597 21030 4:18:43 4:18:43 7:48:52
m 2 0x411c0d21 --rw-rw-rw- root root root root 1 8192 597 21030 4:18:43 4:18:43 7:48:52
m 11267 0x0c6629c9 --rw-r----- root root root root 3 15020520 1437 11621 20:28:05 20:28:14 7:49:40
m 4 0x06347849 --rw-rw-rw- root root root root 1 77384 1437 1492 7:49:45 7:49:40 7:49:40
m 3077 0x49102070 --rw-r--r-- root root root root 0 22908 1432 12103 8:29:32 8:29:32 7:49:43
m 88070 0x00000000 D-rw------- root root root root 6 1052672 7749 7749 8:27:12 no-entry 8:27:12
m 7 0x00000000 D-rw------- www other root root 6 184324 7753 7753 8:27:13 no-entry 8:27:13

Regards
Alan
Mike Stroyan
Honored Contributor

Re: memory problem

Since the application is now asking for more address space than it can get, I suspect that someone tried to do something clever. The application may inquire the amount of available RAM and then try to allocate a fraction of that. When you added more RAM, the application got more hungry. You could ask around among code authors and get someone to put a reasonable cap on their malloc requests.
You might be able to meet the new grander plans of the application by relinking it with ld's -N option, or by using "chatr +q3p enable a.out" on the program file. Either of those will allow a program to allocate almost 2GB of data if maxdsiz and ulimit allow.