Operating System - HP-UX


out of memory error in batch program HP 11.0, V processor

We have a large PeopleSoft implementation that uses the SQR language for some batch queries.

We have a batch SQR job that runs for about 3 hours nightly. Recently it has been failing with an "out of memory" error during the late stages of the run. I searched ITRC and found references to malloc calls and to the maxdsiz and maxssiz kernel parameters.

We have reproduced the same failure on a dev machine (also HP-UX 11.0 on a V processor) running against data similar in size to production (exported/imported a few months back).

Both dev and prod are patched to late 2002 levels.

I changed some kernel parameters on dev and got the job to run. However, it may have succeeded only because of the reboot. Both machines are rebooted on the third Saturday of every month.

SQR is running in 32-bit mode. The file command reports:

sqr: PA-RISC1.1 shared executable dynamically linked

These are the kernel parameters we have run with for the past year:
maxdsiz 2147483648
maxdsiz_64bit 2147483648
maxssiz 201326592
maxssiz_64bit 1073741824

We recently got the program to run on dev (after a reboot, of course) with these kernel parameters:

maxdsiz 2147483648
maxdsiz_64bit 2147483648
maxssiz 100663296
maxssiz_64bit 536870912

We could probably lower maxssiz to 50 MB.

Another ITRC post mentions that the available memory is something like maxdsiz - maxssiz.

I suspect a bug in either the SQR runtime or HP-UX.

I'm wondering about a few things.
Is it true that this could behave differently 1 month after a reboot than 1 day after? I would think these memory allocations are per process, which makes me think the job should behave the same at all times.

Did lowering maxssiz perhaps let the program run? We used to have a lower maxssiz 1 to 2 years ago. We bumped it up for no good reason (at the same time we bumped up maxdsiz, because Java seemed to need it for other possible apps on the machine).

Any enlightenment would be helpful.

Someone named Clay seemed to have some great insight in a prior forum post:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=67260

Thanks,
Lynn

PS: We split the production program into 2 pieces; however, that only reduced the input to the longer-running piece by 5%. I'm concerned the issue will come back.




RAC_1
Honored Contributor

Re: out of memory error in batch program HP 11.0, V processor

With 32-bit programs, there is a limit on how much memory a process can access. Maybe your program is hitting maxssiz or maxdsiz.

I would advise you to run the program under the tusc tool (tusc traces Unix system calls). This would give more insight into which limit is causing the problem or where exactly the problem is. You can get tusc at the following site:
http://hpux.cs.utah.edu/hppd/hpux/Sysadmin/tusc-7.5/

Once you have it, start the program as follows:

tusc -vfp -o /tmp/trace.log "program"

But beware: this will create a large file.

Anil

Re: out of memory error in batch program HP 11.0, V processor

I tried to use tusc but could not: the SQR program would not connect to Oracle when run under tusc.

Hopefully people will respond with some good info.
ranganath ramachandra
Esteemed Contributor

Re: out of memory error in batch program HP 11.0, V processor

Could you post the exact message?
 
--
ranga

Bill Hassell
Honored Contributor

Re: out of memory error in batch program HP 11.0, V processor

Unfortunately, "out of memory" is a far too general message. malloc allocates from a program's local data area, and maxdsiz does control the maximum data area that can be requested. But it's not quite that simple. 32-bit programs are severely restricted in addressing space. The basic restriction is 960 MB for data. By relinking (or using chatr) or recompiling the program, you can make the first and second quadrants available for data and increase the available addressing space to about 1750 MB.

Your patch level is very old, so you may be missing recent patches that can enable even larger memory areas. Read both the memory and process management white papers in your /usr/share/doc directory (mem_mgt and proc_mgt) for the details.

The other type of memory is shared memory, which uses a different map but has similar restrictions. However, since it is shared, it can become fragmented because other unrelated programs use this map too. And if a program is killed with -9 (a big no-no for most applications), its shared memory area is left behind, occupying space and reducing the available memory. You can see the shared memory segments with ipcs -bmop.

Just like malloc, shared memory is also limited to about 900 MB, but this can be increased with chatr or link options. However, the shared memory map is shared by all programs that need this memory, so other programs may be using this area and not leaving enough for your sqr program. You can check out your current shared memory maps with the shminfo program: ftp://hprc.external.hp.com/sysadmin/programs/shminfo/ (login=contrib,pw=9unsupp8).

If you can determine that the problem is with shared memory (the author of the program(s) can help), you can bring your system up to date on patches (2004 is recommended) and use memory windows, a technique that can give a private window for shared memory to a set of programs.


Bill Hassell, sysadmin

Re: out of memory error in batch program HP 11.0, V processor

Below is a cut-and-paste of the error shown when running our SQR program under tusc:

/psDEV/override/dana/sqr/pspxu40r -i/psDEV/prod/dana/sqc/;/psDEV/prod/dana/sqc/
-m/psDEV/prod/dana/sqr/supermax.max -f/psDEV/reports/tmp/XXUTLZZZ.729
Enter Username:
SQR for PeopleSoft V8.19
(SQR 5528) ORACLE OCIServerAttach error 12546 in cursor 0:
ORA-12546: TNS:permission denied
(SQR 4701) Cannot logon to the database.

-----------------
I spent quite a bit of time trying to get this to run. Running tnsping as our regular user failed; running it as root worked. So I ran our application program as root and got the above error.
-------------------------------------

I wish someone would explain how memory works. I know from another post that maxdsiz is reduced by maxssiz. What I'm wondering is whether each new process on a Unix system gets a fresh chunk of memory regardless of what else is going on in the system. I see from someone's reply that kill -9 could affect this task's memory.

In case it is not clear: from my limited viewpoint, SQR is like a COBOL runtime. We write code in SQR (SQL with extensions) that does not consume memory on its own, because it is a high-level language; the runtime engine manages the memory. We are doing nothing in this program that is memory intensive in itself. It's either a bug in the runtime or in Unix.

Lynn
Bill Hassell
Honored Contributor

Re: out of memory error in batch program HP 11.0, V processor

RE: how memory works:

Please print the two files: /usr/share/doc/mem_mgt.txt and /usr/share/doc/proc_mgt.txt (if you have a PostScript printer, print the .ps versions of those files for better readability). They explain HP-UX memory management. NOTE: how SQR works is not under HP-UX control. A high-level language almost always means more RAM is necessary, perhaps a very small amount, simply because the high level must be translated to the equivalent database calls. For SQR, it is acting as a preprocessor to simplify user interaction.

It is quite untrue that maxdsiz and maxssiz are related. These values are simply fences that prevent runaway programs from using too much RAM. One setting does not affect the other. As mentioned before, you can recompile/relink or chatr some programs to take advantage of additional 1 GB quadrants of address space (explained in the white papers under EXEC_MAGIC).


Bill Hassell, sysadmin

Re: out of memory error in batch program HP 11.0, V processor

Check this post regarding maxdsiz and maxssiz being related:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=67260

If I'm reading Clay's post correctly, he is stating that these parameters are related.

I'm barely a novice in dealing with these memory issues on HP. Last week I read through the 2 documents mentioned.

I find it quite unfortunate that we can have an environment running, get an error message, and not know which part of memory is producing the error. Maybe this is clearer to a non-novice, but to me this stuff is horribly complicated and the errors and definitions are as clear as mud.

Lynn
Bill Hassell
Honored Contributor

Re: out of memory error in batch program HP 11.0, V processor

Don't feel too bad about the error message. It is terribly misleading, and since you did not write the program, you have no way of knowing what the program is doing. That is why technical support from the author/manufacturer is imperative. Even then you may have to escalate the problem to get a resolution. It is true that maxssiz's value will limit maxdsiz. However, it is doubtful that you have changed maxssiz from the default 8 MB (0x00800000), which has minimal impact on the roughly 1000 MB available in the data quadrant.

To understand what is happening, you will need detailed information on how your programs allocate memory; without that, we're all guessing as to what the error message might mean or how to fix it.


Bill Hassell, sysadmin

Re: out of memory error in batch program HP 11.0, V processor

Quite the contrary,

Look at the original write-up. We had maxssiz at 80 MB for probably 5 years. We changed it (out of ignorance) to 200 MB a year ago and ran that way. After having problems a couple of weeks ago, we lowered maxssiz to 100 MB. We could definitely go back to 80 MB, and maybe lower.

As far as the vendor code goes: we have the source code for the SQR program. However, it relies on the SQR runtime routines, which do the real memory management. I doubt any vendor of application software knows much about the runtime level.