
Re: Memory leak

 
Wim Van den Wyngaert
Honored Contributor

Memory leak

We have a program running on a 2-CPU GS160.
It has not changed since 2003 and handles TCP (Reuters Sink) and DECnet (other VMS nodes) communications.

For the past few days the load has been heavy because of the high activity on the stock exchanges. Under this load the program starts consuming a lot more memory, and after a few minutes it runs out of memory (normally about 600 MB for the whole process tree, now growing to 1500 MB).
The process tree is restarted every day and each time the problem comes back.

I included the PSDC sampling report taken while the process grew from 750 MB to 1500 MB. The process is named FOE_RGS_SRV (it consumes the CPU together with FOE_POS_SRV).

Can anyone make something of it?
(VMS 7.3, TCP 5.3 ECO 2, DECnet 7.3 ECO 3)

Wim
labadie_1
Honored Contributor

Re: Memory leak

Until you find the leak, you can add a periodic call to $PURGWS to your program (every 2 hours?). Be prepared for higher pagefile utilisation.

Of course this is a temporary workaround.
Andy Bustamante
Honored Contributor

Re: Memory leak

What's the QBB/CPU/memory layout? Do you use global pages in the process tree?

Starting with 7.3, global pages are mapped by allocating pages across QBBs. In one case, an application running multiple processes against global pages ended up with heavily fragmented global allocations during the day. Eventually performance deteriorated and the application hung, requiring a restart.

HP recommended application changes to the way global pages were allocated. The solution finally adopted was moving to GS-1280s. Updating the hardware layout may mitigate the issue in your case.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Hoff
Honored Contributor

Re: Memory leak

An application that's been running perfectly since 2003 is not an application that is immune to latent bugs.

Application (and system) load is one of the classic and salient triggers for exposing bugs and race conditions and leaks.

Identify what memory resource(s) are leaking, and work from there. This can involve digging around in the process data structures, and in the process address range. (If restarting the application cures these, then it's usually a process private leak. That doesn't, however, mean it's your code or HP code.)

Your attachment shows PC samplings, and those are not on point for a memory leak; there's not a correlation between cold or hot PC ranges and memory use. Yes, you do have to access the range to get the leak, but the range of code doesn't have to be hot.

Small leaks in hot code and big leaks in cold code can ruin your uptime statistics. And nothing says there is just one leak. Though big leaks in hot code are usually pretty obvious.

There have been various leaks in OpenVMS and TCP/IP Services and other products remediated over the years; if you're not current...
Hein van den Heuvel
Honored Contributor

Re: Memory leak

Please explain to your management that it is time to 'pay the piper'.

This is what you get for sticking with an old OS version on an old platform.

Good luck! (You'll need some :-)

Regards
Hein
(no points needed for this advice :-(
Wim Van den Wyngaert
Honored Contributor

Re: Memory leak

Then I'll ask a different question.

52% of the time is taken by SYSTEM_PRIMITIVES_MIN / MMG_STD$ALLOC_SYSTE

What is this ? It can't be that the program is in "alloc" for 52% of the time, I hope.

The problem is that it will be difficult to get a fix made for the few months the application has left. So if it could be solved by a reboot, I would be very happy.

Wim
Willem Grooters
Honored Contributor

Re: Memory leak

A reboot won't help much if the problem is in the software. The process will keep consuming memory until you reboot again, unless you find the hole and fix it.
Memory fragmentation might be one cause, but given your description I would look at the code executed in those few minutes and concentrate on that code tree. Also check for asynchronous code that may get triggered and allocate chunks of memory.
Willem Grooters
OpenVMS Developer & System Manager
kari salminen
Advisor

Re: Memory leak


If your application uses LIB$GET_VM / LIB$FREE_VM, a memory leak bug in LIBRTL.EXE may be the cause. It was fixed in VMS 7.3-2 LIBRTL ECO 2; I don't know whether there is an ECO for VMS 7.3.

The bug: LIB$GET_VM may expand the process region even when there are sufficient contiguous bytes in the memory zone to satisfy the request.

ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.3-2/VMS732_LIBRTL-V0200.txt

You may find the PQUOTA tool useful for analyzing memory leaks; the latest version, V2.0, runs on VAX, Alpha and Itanium:

http://vms.process.com/scripts/fileserv/fileserv.com?PQUOTA
Hoff
Honored Contributor

Re: Memory leak

There are two choices: either find and fix the leak, or use the $purgws/restart sequence and a large pagefile and/or a reboot when the stock market gets busy.

It may be cheaper to throw some disk storage (pagefile), some quota and some memory at this case, to buy enough headroom to reach the daily restart.

If following the former path, the intrepid explorer needs to first find what structures are being leaked. This can be through examination or through instrumentation.

One obvious variation here is to speed up the migration off of OpenVMS.

Robert Gezelter
Honored Contributor

Re: Memory leak

Wim,

As Hoff mentioned, "long running" does not necessarily imply "no latent problems".

Without knowing how the application is structured, it is difficult to guess. Having done similar applications in the past, I can see many situations where such a thing could happen.

In this situation, I would suggest both palliative measures and a longer-term fix. For palliative measures, a larger page file is definitely a start, and an automated restart at a quiet time.

For a longer term fix, it might be useful to get one or more sets of process dumps to identify the nature of the "memory leak". It is not unlikely that it is a small, discrete fix.

- Bob Gezelter, http://www.rlgsc.com