Wim Van den Wyngaert
Honored Contributor

Memory leak

We have a program running on a 2 CPU GS160.
It's not changed since 2003 and is doing TCP (Reuters Sink) and DECnet (other VMS nodes) communications.

For the past few days the load has been heavy because of the heavy activity on the stock exchanges. Under this load, the program starts consuming much more memory and after a few minutes it runs out of memory (normally about 600 MB for the whole process tree, now reaching 1500 MB).
The process tree is restarted every day and each time the problem comes back.

I have attached the PSDC sampling report taken while the process grows from 750 MB to 1500 MB. The process is named FOE_RGS_SRV (it consumes the CPU together with FOE_POS_SRV).

Can anyone make sense of it?
(VMS 7.3, TCP/IP 5.3 ECO 2, DECnet 7.3 ECO 3)

Wim
labadie_1
Honored Contributor

Re: Memory leak

Until you find the leak, you can add a regular (every 2 hours?) call to $purgws to your program. Be prepared for more pagefile utilisation.

Of course this is a temporary workaround.
Andy Bustamante
Honored Contributor

Re: Memory leak

What's the QBB/CPU/memory layout? Do you use global pages in the process tree?

Starting in 7.3, global pages are mapped by allocating pages across QBBs. In one case this led to an application, running multiple processes against global pages, building up heavily fragmented global allocations during the day. Eventually, performance deteriorated and the application hung, requiring a restart.

HP recommended application changes to the way global pages were allocated. The end solution adopted was moving to GS-1280s. Updating the hardware layout may mitigate the issue in your case.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Hoff
Honored Contributor

Re: Memory leak

An application that's been running perfectly since 2003 is not an application that is immune to latent bugs.

Application (and system) load is one of the classic and salient triggers for exposing bugs and race conditions and leaks.

Identify what memory resource(s) are leaking, and work from there. This can involve digging around in the process data structures, and in the process address range. (If restarting the application cures these, then it's usually a process private leak. That doesn't, however, mean it's your code or HP code.)

Your attachment shows PC samplings, and those are not on point for a memory leak; there's not a correlation between cold or hot PC ranges and memory use. Yes, you do have to access the range to get the leak, but the range of code doesn't have to be hot.

Small leaks in hot code and big leaks in cold code can ruin your uptime statistics. And nothing says there is just one leak. Though big leaks in hot code are usually pretty obvious.

There have been various leaks in OpenVMS and TCP/IP Services and other products remediated over the years; if you're not current...
Hein van den Heuvel
Honored Contributor

Re: Memory leak

Please explain to your management that it is time to 'pay the piper'.

This is what you get for sticking with an old OS version on an old platform.

Good luck! (you'll need some :-)

Regards
Hein
(no points needed for this advice :-(
Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

Then I'll ask a different question.

52% of the time is taken by SYSTEM_PRIMITIVES_MIN / MMG_STD$ALLOC_SYSTE

What is this? It can't be that the program is in "alloc" for 52% of the time, I hope.

The problem is that it will be difficult to get a fix in the few months that are left. So, if it could be solved by a reboot I would be very happy.

Wim
Willem Grooters
Honored Contributor

Re: Memory leak

A reboot won't help much if the problem is within the software. It will keep consuming memory until you boot it again unless you find the hole and fix it.
Memory fragmentation might be one cause, but given your description I would look at the code executed in those few minutes and concentrate on the code tree that is executed. Also check for asynchronous code that may get triggered and allocate chunks of memory.
Willem Grooters
OpenVMS Developer & System Manager
kari salminen
Advisor

Re: Memory leak


If your application is using LIB$GET_VM / LIB$FREE_VM, a memory leak bug in LIBRTL.EXE may be the cause. It was fixed in VMS 7.3-2 LIBRTL ECO 2; I don't know if there is an ECO for VMS 7.3.

The bug: LIB$GET_VM may expand the process region even when there are sufficient contiguous bytes in the memory zone to satisfy the request.

ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.3-2/VMS732_LIBRTL-V0200.txt

You may find the PQUOTA tool useful for analyzing memory leaks, latest version V2.0 runs on VAX, Alpha and Itanium,

http://vms.process.com/scripts/fileserv/fileserv.com?PQUOTA
Hoff
Honored Contributor

Re: Memory leak

There are two choices: either find and fix the leak, or use the $purgws/restart sequence and a large pagefile and/or a reboot when the stock market gets busy.

It may be cheaper to throw some disk storage (pagefile) and some quota and some memory at this case; to buy enough headroom for the daily restart.

If following the former path, the intrepid explorer needs to first find what structures are being leaked. This can be through examination or through instrumentation.
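As a rough illustration of the instrumentation route, here is a minimal sketch in Python (the application itself is C, so this only models the idea). The `AllocationTracker` class, the call-site names `parse_quote` and `queue_update`, and the sizes are all hypothetical; in the real program the equivalent would be a wrapper around whatever allocate/free routines it uses.

```python
from collections import Counter


class AllocationTracker:
    """Records which call site owns each live allocation (hypothetical model)."""

    def __init__(self):
        self._live = {}          # handle -> (site, size)
        self._next_handle = 0

    def allocate(self, site, size):
        """Stand-in for the real allocator; returns an opaque handle."""
        self._next_handle += 1
        self._live[self._next_handle] = (site, size)
        return self._next_handle

    def free(self, handle):
        """Stand-in for the real free routine."""
        del self._live[handle]

    def outstanding_by_site(self):
        """Total bytes still allocated, grouped by call site."""
        totals = Counter()
        for site, size in self._live.values():
            totals[site] += size
        return totals


tracker = AllocationTracker()
for _ in range(1000):
    msg = tracker.allocate("parse_quote", 256)   # freed below
    tracker.allocate("queue_update", 512)        # never freed: the leak
    tracker.free(msg)

report = tracker.outstanding_by_site()
```

Dumping such a report at intervals shows which call site's outstanding total keeps growing; here only `queue_update` has anything left allocated.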

One obvious variation here is to speed up the migration off of OpenVMS.

Robert Gezelter
Honored Contributor

Re: Memory leak

Wim,

As Hoff mentioned, "long running" does not necessarily imply "no latent problems".

Without knowing how the application is structured, it is difficult to guess. Having done similar applications in the past, I can see many situations where such a thing could happen.

In this situation, I would suggest both palliative measures and a longer-term fix. For palliative measures, a larger page file is definitely a start, as is an automated restart at a quiet time.

For a longer term fix, it might be useful to get one or more sets of process dumps to identify the nature of the "memory leak". It is not unlikely that it is a small, discrete fix.

- Bob Gezelter, http://www.rlgsc.com
Richard W Hunt
Valued Contributor

Re: Memory leak

When I last saw a problem like this on my system (also involving the Allocate System Memory function), it was caused by something growing its working set because of a string processing issue.

By any chance are those two programs written in a language for which the string paradigm is that a string is actually a descriptor that points somewhere in the program heap? (As opposed to FORTRAN-like, where strings are pre-allocated and fixed length.)

The problem was "thrashing" the program's scratchpad. You would probably also see a sudden increase in paging/swapping activity just before this problem reared its ugly head.
Sr. Systems Janitor
Hoff
Honored Contributor

Re: Memory leak

Richard is referring to modifications to a dynamic string descriptor; erroneously writing to the descriptor as if it were a static descriptor.

That sequence can certainly result in memory loss, but it's typically a sort of more continuous leak. That class of bug is not (usually) a load-activated bug, though that class of bug could easily be secondary to another bug.

I posted the general code review list over in http://h71000.www7.hp.com/wizard/wiz_1661.html and some other threads referenced there.

Do ramp up on the new platform, too -- whatever that might be. Life's too short to stay grumpy, and I'm inferring you've got a case of the grumpies today. :-)


John McL
Trusted Contributor

Re: Memory leak

You've said nothing about the architecture of the application but that might be the cause of the problem.

Is it AST driven with a lot of network I/O and using a ring buffer that might be stressed by a heavy load? It's feasible that an initial buffer allocation is insufficient and further allocations are made, but the "expand" flag is not being cleared after an expansion is made. Perhaps previous input rates have never been enough to trigger this action (i.e. the buffer contents are processed fast enough so that the buffer never needed expansion) and the bug has not previously been exposed. If you are really lucky you'll have monitoring tools that tell you how many buffers are allocated and used.
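To make the "expand flag not cleared" scenario concrete, here is a toy simulation in Python. Everything in it is an assumption for illustration: the pool of 4 initial buffers, the arrival and drain rates, and the `simulate` function itself. It is not based on the actual application code.

```python
def simulate(arrivals, drained_per_tick, clear_flag):
    """Return the number of buffers allocated after processing `arrivals`.

    Hypothetical model: the pool starts with 4 buffers. When a message
    arrives and no buffer is free, the `expand` flag is set and one more
    buffer is allocated. With clear_flag=False the flag stays set, so
    every later arrival allocates again even when buffers are free.
    """
    allocated = 4
    in_use = 0
    expand = False
    for count in arrivals:               # messages arriving in this tick
        for _ in range(count):
            if in_use == allocated:      # pool exhausted
                expand = True
            if expand:
                allocated += 1
                if clear_flag:
                    expand = False
            in_use += 1
        in_use = max(0, in_use - drained_per_tick)
    return allocated


quiet = [2] * 1011                        # load never fills the pool
burst = [2] * 10 + [8] + [2] * 1000       # one busy tick, then quiet again

fixed = simulate(burst, drained_per_tick=4, clear_flag=True)
buggy = simulate(burst, drained_per_tick=4, clear_flag=False)
```

Under the quiet load both variants stay at 4 buffers, which would explain a bug staying hidden for years. After a single burst, the variant that clears the flag settles at 8 buffers, while the variant that does not keeps allocating on every arrival and ends at 2008.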

Of course I might be barking up the wrong tree because I'm only guessing at how an application that processes Borse data might be structured.
labadie_1
Honored Contributor

Re: Memory leak

Volker Halle
Honored Contributor

Re: Memory leak

Wim,

is the application using PTHREADS ? Use ANAL/SYS and SDA> SET PROC FOE_RGS_SRV
Then try SDA> PTHREAD VM

If there is no error message, because the process is not threaded, is there any lookaside list with a substantial amount of packets ?

Volker.
John McL
Trusted Contributor

Re: Memory leak

You might also find the Tech Journal No. 7 article "Faking it with Open VMS Shareable Images" by John Gillings useful especially if you think the problem might be an errant LIB$GET_VM call.

The article is online at http://h71000.www7.hp.com/openvms/journal/v7/faking_it_with_openvms_shareable_images.html and the code seems to be downloadable.

Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

All I know is that it's written in C.

Volker : pthread is an unknown command in 7.3. Show proc/thr says "1 thread". Maybe I'll retry when the NYSE opens.

Nice article by John G. We used something similar on HP3000 to fool the verification of the license date of a certain product.

Labadie : ada ...

The application boys will not look at the code and we now restart the process when it gets mad. BTW : when it gets mad it only takes a few minutes before the 1.5 GB is taken. Increasing the limit would only delay the problem a little.
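A back-of-the-envelope check of how little a higher limit buys, as a small Python sketch. The 750 MB and 1500 MB figures come from the sampling run described at the start of the thread; the 5 minutes is an assumed stand-in for "a few minutes", and the 3000 MB limit is purely hypothetical.

```python
def minutes_until_limit(current_mb, limit_mb, growth_mb_per_min):
    """Time left before the process reaches limit_mb at a constant growth rate."""
    return (limit_mb - current_mb) / growth_mb_per_min


ASSUMED_MINUTES = 5.0                      # assumption: "a few minutes"
growth = (1500 - 750) / ASSUMED_MINUTES    # MB per minute during the runaway

# Hypothetical: double the limit from 1500 MB to 3000 MB.
extra = minutes_until_limit(1500, 3000, growth)
```

Under these assumptions the process grows by 150 MB per minute, so doubling the limit would delay the failure by only about 10 minutes.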

Wim
Volker Halle
Honored Contributor

Re: Memory leak

Wim,

the SYS$SHARE:PTHREAD$SDA.EXE extension should be available since OpenVMS V7.2-1 ...

You can also check with SDA> SHOW PROC/CHAN, if PTHREAD$RTL is an activated image for this process.

Volker.
Robert Gezelter
Honored Contributor

Re: Memory leak

Wim,

"The application boys will not look at the code and we now restart the process when it gets mad. BTW : when it gets mad it only takes a few minutes before the 1.5 GB is taken. Increasing it would only delay the problem a little."

I have had a few of those at clients over the years. Regrettably, the solution has often been to identify the failing code independently, and propose a fix. Not the best way to work, but it can be the most effective way to deal with organizational politics.

- Bob Gezelter, http://www.rlgsc.com
Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

Bob,

Did that once (network connection was not closed). But after several years it's still not in production because of testing requirements.

In November we will have DRP tests and then I can reboot the node. Maybe it will get solved that way.

Volker,

I shortened the command to PTHR but this didn't work. When I typed it in full it worked. Sorry. Same conclusion : 1 thread.

Wim
Jan van den Ende
Honored Contributor

Re: Memory leak

Wim,

some things in this story do not add up in MY logic. (But then, in yours neither, I guess).

So, this app is (a. o.) processing NYSE transactions for ING bank, right?

Well, all is well then, I believe.

Only yesterday ING had to get an emergency state loan of just EUR 10 G (about USD 13 G, ie, $13.000.000.000)
A loan at a, well, "friendly" interest of ___ 8.5 ___ % !!! ( if non-official publications hold some truth)

HOW can your management rhyme this to __NOT__ doing everything necessary to get the software right, AND AS QUICKLY AS POSSIBLE???

Can you explain to someone in accountancy that repairing this will cost MUCH less than even ONE day of interest on that loan alone?

If _I_ were a shareholder, I would publicly declare this an unparallelled case of mismanagement, which calls for IMMEDIATE curative action.....

Just a thought though.

And yes, ING is also the bank that processes MY salary.

In spite of everything, anyhow:

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Volker Halle
Honored Contributor

Re: Memory leak

Wim,

the fact that the process only has ONE thread does not mean that it can't be using DECthreads (pthreads).

Only if SDA> PTHREAD VM returns

Process "xxx" (PID ppp) is not threaded

then you know, that pthreads is not in use.

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

Volker,

Then it's using pthreads. I think the program is capable of using threads but is not using them (I just tried the SDA commands during heavy activity). It has a channel to pthread$rtl in show proc/chan.

Wim
Volker Halle
Honored Contributor

Re: Memory leak

Wim,

then what does SDA> PTHREAD VM report ?

Any lookaside list with lots of packets ?

Volker.
Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

Process name: FOE_RGS_SRV Extended PID: 22BD7166 Thread data: "vm"
-------------------------------------------------------------------------
lookaside 0 (112 bytes; rwb, cvb, mub) 1 in use, 0 free
lookaside 1 (2120 bytes; cv-meter) 0 in use, 0 free
lookaside 2 (3184 bytes; mu-meter) 0 in use, 0 free
Wim
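For anyone wanting to check such output for "lookaside lists with lots of packets" automatically, here is a minimal Python sketch that parses the three lines above. The line format is inferred from this one sample only, so the regular expression is an assumption and may not match other SDA versions; `parse_lookaside` and the field names are hypothetical.

```python
import re

# Pattern inferred from the sample output in this thread (an assumption).
LOOKASIDE = re.compile(
    r"lookaside (\d+) \((\d+) bytes; ([^)]*)\) (\d+) in use, (\d+) free"
)


def parse_lookaside(text):
    """Return one dict per lookaside line, with the bytes held by that list."""
    rows = []
    for match in LOOKASIDE.finditer(text):
        index, size, kind, in_use, free = match.groups()
        packets = int(in_use) + int(free)
        rows.append({
            "list": int(index),
            "packet_bytes": int(size),
            "kind": kind,
            "packets": packets,
            "bytes_held": int(size) * packets,
        })
    return rows


sample = """\
lookaside 0 (112 bytes; rwb, cvb, mub) 1 in use, 0 free
lookaside 1 (2120 bytes; cv-meter) 0 in use, 0 free
lookaside 2 (3184 bytes; mu-meter) 0 in use, 0 free
"""

rows = parse_lookaside(sample)
total_bytes = sum(row["bytes_held"] for row in rows)
```

For this sample the lists hold 112 bytes in total, so the lookaside lists do not account for the growth.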