
PGFLQUOTA leak with FTP

SOLVED

It has been found that the PGFLQUOTA of FTP never decreases. When new FTP connections are established the PGFLQUOTA is consumed, but when the connections are closed the PGFLQUOTA is not decreased. As time passes, more and more writeable pages are gradually added to the end of P0 space. If the system is left on long enough, it could eventually run out of quota.

Does anyone have any idea why this problem occurs?
25 REPLIES
Hoff
Honored Contributor

Re: PGFLQUOTA leak with FTP

Looks to be a bug in the code. A leak. There are various reasons why this can arise within an application, but there's little that can be done to resolve this case without access to the source code. (Eons ago, there was a bug within sys$creprc that caused a four-page leak, for instance.)

This reply assumes that this is the TCP/IP Services product (there are several IP stacks around) and the current version and current ECO, and that this arises on a recent or current version of OpenVMS Alpha or OpenVMS I64, and that this is involving the ftp server and not the ftp client.

Assuming this is the current version and ECO of the TCP/IP Services product, log a problem report with HP with a reproducer.

In the short term, you could periodically restart the FTP server; that should clear this up. (If you have the source code to ftp available, that's another discussion entirely.)
Wim Van den Wyngaert
Honored Contributor

Re: PGFLQUOTA leak with FTP

I don't get it. The FTP client and server processes are both alive only during a session. Do you mean that you keep the session alive for a long time?

Wim

Re: PGFLQUOTA leak with FTP

The FTP server is an OpenVMS system and the client may be any other. The OVMS system is kept on without restarting the system or the service. The clients may come and go as they like. Even though the FTP connection is closed, the PGFLQUOTA is not decreased.

Re: PGFLQUOTA leak with FTP

Hoff,
Can you please explain what the possible reasons for the memory leak could be?
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

Which process are you referring to? TCPIP$FTP_1? Which version and architecture of OpenVMS, and which version of TCPIP?

$ UCX SHOW VERS ! should give info

For each FTP client connection to the OpenVMS system, a new network process will be created named TCPIP$FTPCnnnnn. This process terminates, if the FTP connection is terminated by the client. Only the FTP Server TCPIP$FTP_1 process stays around.

If there were a 'memory' leak affecting PGFLQUOTA, it could only exist in the FTP server process.

What symptoms of this leak are you seeing ?
Are you using SHOW PROC/QUOTA/ID=<pid-of-FTP-server> ?

The Paging file quota: value shown is the remaining page file quota for this process. If some more is allocated for each FTP connection and not de-allocated or even re-used, then you would see pagefile quota decreasing over time.

Volker.
Hoff
Honored Contributor

Re: PGFLQUOTA leak with FTP

>Can you please explain to me what all could be reasons for the memory leak?

Resolving a leak requires source code access, or a whole lot more time and effort in reverse engineering this than this particular bug is probably worth. Do you have source code access to the particular ftp daemon here? (If you don't have that source code, there's not that much that can be done here short of a very substantial investment in software engineering. Other than forwarding this problem report over to HP support with a reproducer.)

As for the trigger, this isn't a small topic area; there are a vast number of possible triggers. I have presented day-long courses on advanced programming tips, techniques and tools specifically for OpenVMS, tailored for various sites, and that time only begins to cover some of what's involved here. That training introduces how to try to avoid these leaks (and heap corruption, a close cousin of leaks), and how to find them more quickly when they arise. For an overview of some of what has been in the training in this specific topic area, see http://64.223.189.234/node/401 and some of the articles referenced there.

Most any dynamically-allocated data structure can be lost during processing, and there can be fragmentation of the heap that leads to increased virtual memory use. The sys$creprc flaw mentioned earlier leaked pagefile; the system service was not releasing all of what it had consumed. There are many other operations where the application must perform some sort of tracking and release; neither OpenVMS nor C offers any form of garbage collection, though there are ways to incorporate some techniques that avoid the need to track memory.

John Gillings
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

In general the PAGFILCNT (=remaining PGFLQUOTA) of a process does not increase. Although it's possible to delete virtual memory, it tends to be rarely done, because there's very little benefit, and the conditions under which you can do it are fairly strict and difficult to achieve. For most processes VIRTPEAK is monotonic increasing and PAGFILCNT is therefore monotonic decreasing.

A long lived process may continue to grow even if it doesn't have a leak. It just means its work load is gradually expanding. Hopefully the rate of growth diminishes over time.

Monitor the process long term and look at the shape of the growth curve. If it's linear, you are probably dealing with a leak (but if you don't have access to the source code, there's little you can do about it). If the growth is asymptotic, it's probably natural and benign.
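As a rough illustration of telling the two shapes apart, here is a minimal Python sketch. It assumes the remaining quota (PAGFILCNT) has been sampled at equal intervals; the function name growth_shape and the 0.5 threshold are invented for this example and are not part of any OpenVMS tool.

```python
def growth_shape(pagfilcnt_samples, ratio_threshold=0.5):
    """Classify quota consumption as 'linear', 'flattening' or 'no growth'.

    pagfilcnt_samples: remaining-quota readings taken at equal intervals.
    ratio_threshold: hypothetical cut-off; if the late consumption rate is
    below this fraction of the early rate, growth counts as flattening.
    """
    if len(pagfilcnt_samples) < 3:
        raise ValueError("need at least three samples")

    # Convert "remaining quota" into "quota consumed since the first sample".
    used = [pagfilcnt_samples[0] - s for s in pagfilcnt_samples]
    if used[-1] <= 0:
        return "no growth"

    # Compare the consumption rate in the first half with the second half.
    mid = len(used) // 2
    early_rate = (used[mid] - used[0]) / mid
    late_rate = (used[-1] - used[mid]) / (len(used) - 1 - mid)

    if early_rate > 0 and late_rate / early_rate < ratio_threshold:
        return "flattening"   # growth is slowing: likely benign
    return "linear"           # steady growth: suspect a leak


print(growth_shape([1000, 900, 800, 700, 600]))
print(growth_shape([1000, 600, 400, 300, 250, 225, 212]))
```

The first call has a constant consumption rate and is reported as linear; the second slows down sharply and is reported as flattening. A real investigation would want many more samples than this.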

>If the system is left on long enough,
>it could eventually run out of quota.

Possibly true, but if it hasn't happened yet, and the projected point at which it will happen is a long way into the future, it's not something you should be too concerned about.
A crucible of informative mistakes

Re: PGFLQUOTA leak with FTP

Volker,

>which process are your referring to ? TCPIP$FTP_1 ?
Yes, I am referring to the TCPIP$FTP_1 process.

>Which version and architecture of OpenVMS and which version of TCPIP ?
OpenVMS is Alpha V8.3-1h1 and TCP/IP is V5.6 Eco3 with the latest patches installed.

>What symptoms of this leak are you seeing ?
If there is a lot of incoming FTP traffic to my system, %PGFLQUOTA_CNT can grow; I mean, the quota is partially consumed. When all those FTP sessions (open channels) with my system finish, the channels are released, the TCPIP$FTPC000XXX processes die, and only two BG channels remain open in TCPIP$FTP_1. But the PAGFILCNT of TCPIP$FTP_1 is not decreased.

>Are you using SHOW PROC/QUOTA/ID=<pid-of-FTP-server> ?
The information about the leak was obtained by formatting the JIB (Job Information Block) of the TCPIP$FTP_1 process with SDA, and by displaying PAGFILCNT with F$GETJPI.

Re: PGFLQUOTA leak with FTP

John,

>Although it's possible to delete virtual memory, it tends to be rarely done
How is this done?

>Monitor the process long term and look at the shape of the growth curve.
I can see that every 48 or 54 minutes the consumption of PGFLQUOTA grows. But there is no explanation for that, because the number of FTP sessions does NOT grow.

>but if it hasn't happened yet, and the projected point at which it will happen is a
>long way into the future
It has happened.

Re: PGFLQUOTA leak with FTP

Can anyone help me out with deleting virtual memory once it has been used?

What is the difference between LIB$FREE_VM and LIB$DELETE_VM_ZONE ?

Is it true that "LIB$FREE_VM never gives memory back to the system when it is freed. It keeps it for future use." ?

If it is, what is the way out of this?
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

there is NOTHING you can do about TCPIP$FTP_1 if this process actually has a memory leak and is therefore consuming its pagefile quota over time.

If you can prove that this is true and you can prove - or at least demonstrate - this case, document your findings and raise a call with HP. Only TCPIP engineering could do anything about this, if there is a real problem.

Don't waste your time thinking about LIB$*VM* routines, as they all require access to the source code of the application, i.e. TCPIP$FTP_SERVER.EXE.

As a workaround, you could restart the TCPIP FTP service, if the remaining pagefile quota approaches ZERO. Use @SYS$STARTUP:TCPIP$FTP_SHUTDOWN.COM and TCPIP$FTP_STARTUP.COM. Note that this will temporarily disrupt FTP traffic.

You could also raise the SYSGEN parameter PQL_MPGFLQUOTA, if you don't want to restart FTP every once in a while.

I've tested a simple FTP connection to our OpenVMS Alpha V8.2 TCPIP V5.5 system and the remaining pagefile quota of TCPIP$FTP_1 was the SAME before and after the connection.

You should be able to prove your point by:

$ SHOW PROC/QUOTA/ID=
$ DIR/FTP localhost::login.com
$ SHOW PROC/QUOTA/ID=

If you do this 100 times and the decrease in pagefile quota is the same for each FTP connection, you've proven your point. You could then even predict when TCPIP$FTP_1 is going to fail, and restart the FTP service (see above) early enough.
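The prediction step is simple arithmetic. A minimal Python sketch, assuming the "Paging file quota" values from the repeated SHOW PROC/QUOTA runs have been written down by hand in order; the function name and the sample numbers are hypothetical:

```python
def connections_until_exhaustion(readings):
    """Estimate how many more connections fit into the remaining quota.

    readings: remaining pagefile quota noted before the first connection
    and after each subsequent one.
    Returns None unless every connection cost the same positive amount,
    since only a constant decrease supports a simple prediction.
    """
    deltas = [before - after for before, after in zip(readings, readings[1:])]
    if not deltas or len(set(deltas)) != 1 or deltas[0] <= 0:
        return None
    return readings[-1] // deltas[0]


# Made-up readings: each connection consumes 32 units of quota.
print(connections_until_exhaustion([100000, 99968, 99936, 99904]))
```

Dividing the result by the typical number of connections per day gives a rough date by which the FTP service should be restarted.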

Volker.

Wim Van den Wyngaert
Honored Contributor

Re: PGFLQUOTA leak with FTP

The leak doesn't seem present on 5.3 eco 2.

Also check sys$sysdevice:[tcpip$ftp]tcpip$ftp_run.log for messages. And opcom.

Wim

Re: PGFLQUOTA leak with FTP

Volker

Let us assume that I have access to the code. No questions on that please; I am not at liberty to say more about it.

Please tell me more about why the problem might have occurred, and also about the LIB$*VM* routines.
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

a typical scenario for a 'virtual memory leak' like this would be that the FTP server allocates some virtual memory (via malloc or an explicit LIB$GET_VM call) and 'forgets' to deallocate it after finishing the new connection or starting the child process to handle the new connection. Or it deallocates the memory, but only a subset of the allocated chunk rather than the whole chunk.

You need to look at the code to determine where this may happen. If you look at the data at the end of P0 space, you may be able to guess from the contents of memory which data structures are being allocated, and then check the code for these types of data structures.

All of this requires familiarity with the code itself !

You could use LIB$SHOW_VM calls to collect and display statistics about the no. of bytes allocated/freed etc. Look at the OpenVMS documentation (HP OpenVMS RTL Library (LIB$) Manual) on how to use the LIB$*VM* calls.

Volker.
Hoff
Honored Contributor

Re: PGFLQUOTA leak with FTP

Consider open-sourcing the particular ftp source code involved here, and we'll look at and fix it.

lib$get_vm and malloc() are only two of the many ways that an application can leak memory.
GuentherF
Trusted Contributor

Re: PGFLQUOTA leak with FTP

Hi Hari!

Since you own that code, put a wrapper around all the alloc and dealloc calls and you'll see where the deallocs are missing.
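To show the bookkeeping such a wrapper does, here is a minimal Python stand-in. In the real server the wrapper would sit around malloc/free or LIB$GET_VM/LIB$FREE_VM in the server's own language; the class name, method names and call-site labels below are all hypothetical.

```python
import itertools


class TrackingAllocator:
    """Records every allocation until it is freed, grouped by call site."""

    def __init__(self):
        self._live = {}                  # handle -> (size, call site)
        self._ids = itertools.count(1)   # stand-in for returned addresses

    def alloc(self, size, site):
        handle = next(self._ids)
        self._live[handle] = (size, site)
        return handle

    def free(self, handle):
        # A KeyError here flags a double free or a bogus handle.
        del self._live[handle]

    def outstanding(self):
        """Bytes still allocated, summed per call site."""
        totals = {}
        for size, site in self._live.values():
            totals[site] = totals.get(site, 0) + size
        return totals


heap = TrackingAllocator()
for _ in range(3):                       # three simulated connections
    ctx = heap.alloc(256, "new_connection")
    buf = heap.alloc(1024, "read_command")
    heap.free(buf)                       # ctx is never freed: the leak
print(heap.outstanding())
```

After a batch of connections, any call site whose outstanding total keeps rising is where a dealloc is missing.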

/Guenther
John Gillings
Honored Contributor
Solution

Re: PGFLQUOTA leak with FTP

Hari,

> Can anyone help me out with the deleting
> the virtual memory once used?

It's unlikely to be applicable in your case, but for completeness, here's the theory:

Virtual memory is created by expanding a region with $EXPREG. For 32 bit user mode applications, and the default heap, the only region you're dealing with is P0. LIB$GET_VM will $EXPREG on your behalf as required.

To delete virtual memory you use $DELTVA, BUT, you need to be certain that the pages being deleted won't be touched, and, if the pages aren't at the end of the region, they won't be available for $EXPREG to reuse, so there's no benefit. All you do is lose memory.

Memory created by LIB$GET_VM for the heap is never deleted because it's too difficult to manage allowing the heap to expand and contract. However, memory that is freed with LIB$FREE_VM can be reallocated.

Even assuming your code uses GET_VM and FREE_VM correctly, it is still possible to get VM leakage with pathological patterns of allocations and deallocations. The simplest way to illustrate this is with the default allocation algorithm (first fit):

allocate small
allocate big
deallocate small
deallocate big
(repeat many times)

Since the "big" object is at the front of the list, it's used to satisfy the next small request, leaving a free block too small to satisfy the next big request. Although the code is correct, VM will expand, with a free list alternating between a small block and a (big minus small) block, until it hits a limit. This type of heap fragmentation is called "checkerboarding".

If possible, reordering the sequence to:

allocate small
allocate big
deallocate big
deallocate small

will fix the problem. Alternatively, segregate big and small objects into different heaps, or manage your own "lookaside" free lists of standard sized objects. Another possibility is to use the slower "best fit" algorithm when allocating memory.
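The two orderings can be compared with a toy model. The Python sketch below is not the LIB$ VM implementation: it assumes a free list with the most recently freed block first, no merging of adjacent free blocks, and a heap that only grows. The function name, block sizes and round count are arbitrary.

```python
def simulate(order, rounds, small=16, big=64, best_fit=False):
    """Return the heap size after repeating `order` for `rounds` rounds."""
    free_list = []      # sizes of free blocks, most recently freed first
    heap_size = 0       # total bytes obtained by expanding the heap
    sizes = {"small": small, "big": big}

    def allocate(size):
        nonlocal heap_size
        fits = [i for i, blk in enumerate(free_list) if blk >= size]
        if not fits:
            heap_size += size            # nothing fits: expand the heap
            return
        if best_fit:
            i = min(fits, key=lambda j: free_list[j])   # smallest that fits
        else:
            i = fits[0]                                  # first that fits
        remainder = free_list[i] - size
        if remainder:
            free_list[i] = remainder     # split: leftover stays on the list
        else:
            del free_list[i]

    def release(size):
        free_list.insert(0, size)        # freed block goes to the front

    for _ in range(rounds):
        for op, which in order:
            (allocate if op == "alloc" else release)(sizes[which])
    return heap_size


SMALL_FREED_FIRST = [("alloc", "small"), ("alloc", "big"),
                     ("free", "small"), ("free", "big")]
BIG_FREED_FIRST = [("alloc", "small"), ("alloc", "big"),
                   ("free", "big"), ("free", "small")]

print(simulate(SMALL_FREED_FIRST, 10))                 # grows every round
print(simulate(BIG_FREED_FIRST, 10))                   # stays constant
print(simulate(SMALL_FREED_FIRST, 10, best_fit=True))  # stays constant
```

Under these assumptions the first ordering adds one big block to the heap on every round after the first, while the reordered sequence and the best-fit variant both stay at the size reached in the first round.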

All this is academic if you don't have direct control over the code in question.
A crucible of informative mistakes

Re: PGFLQUOTA leak with FTP

Hoff,

I am afraid that I don't have the authority to open source the code.
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

as this seems to be an easily reproducible problem with TCPIP V5.6 ECO 3 and you have access to the sources, did you also try ECO 2 and, if the problem is not reproducible with that version, look for changes in the source code?

Volker.

Re: PGFLQUOTA leak with FTP

Thank you Volker and John for the info. Can you just confirm this one thing more?

Is it true that "LIB$FREE_VM never gives memory allocated by LIB$GET_VM back to the system when it is freed. It keeps it for its future use"?

Do we have to use LIB$CREATE_VM_ZONE, then LIB$GET_VM specifying the zone-id from LIB$CREATE_VM_ZONE, and then call LIB$DELETE_VM_ZONE with that zone-id to properly release the memory back to the system?
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

do you know about the OpenVMS Heap Analyzer ? See the HP OpenVMS Debugger Manual:

http://h71000.www7.hp.com/doc/82final/4538/4538pro_contents_003.html#toc_chapter_12

This will allow you to look at each memory allocation/deallocation in detail. There is even a Sample Session (in chapter 12.5), which shows how to find a memory leak.

Volker.
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

if you carefully read the OpenVMS RTL Library (LIB$) manual, you'll find that all the LIB$*VM* routines are layered on LIB$GET_VM_PAGE and LIB$FREE_VM_PAGE, which obtain the blocks of virtual memory used by the LIB$*VM* routines.

The description of LIB$FREE_VM_PAGE says: the page or pagelets are returned to the processwide pool and are available to satisfy subsequent calls to LIB$GET_VM_PAGE.

This implies, that the virtual address space associated with those pages is not being deleted.

There are no $DELTVA calls in the [LIBRTL] facility, except some to handle allocation failures.

Volker.
Volker Halle
Honored Contributor

Re: PGFLQUOTA leak with FTP

Hari,

the more recent versions of OpenVMS also ship the LIBRTL symbol table file. Using this, you can obtain the LIB$*VM* statistics counters from any process using SDA.

First check whether SYS$LIBRARY:LIBRTL.EXE and LIBRTL.STB have the same date. If not, you have installed some patch including LIBRTL.EXE which did not also include the associated .STB file. This will most likely cause the following to not provide the expected results.

$ ANAL/SYS
SDA> SET PROC TCPIP$FTP_1 ! or any other process
SDA> READ/IMAGE SYS$LIBRARY:LIBRTL

SDA> SHOW SYMB/ALL/VALUE LIB$$GL
...
SDA> SHOW SYMB/ALL/VALUE LIB$$GQ
...
SDA> EXIT

The following symbols represent the LIB$*VM* statistics:

LIB$$GL_GETVM_C - Number of successful calls to LIB$GET_VM
LIB$$GL_FREVM_C - Number of successful calls to LIB$FREE_VM
LIB$$GL_VMINUSE - Bytes still allocated

LIB$$GL_GETPG_C - no. of calls to LIB$GET_VM_PAGE
LIB$$GL_FREPG_C - no. of calls to LIB$FREE_VM_PAGE
LIB$$GL_PGINUSE - no. of VM pages still allocated

and the same as above for the LIB$*VM*_64 calls (allocation from P2 address space):

LIB$$GQ_GETVM_C_64
LIB$$GQ_FREVM_C_64
LIB$$GQ_VMINUSE_64

LIB$$GQ_FREPG_C_64
LIB$$GQ_GETPG_C_64
LIB$$GQ_PGINUSE_64

Using this technique, you can determine, if the expansion of the P0 virtual address space of TCPIP$FTP_1 is actually caused by calls to the LIB$*VM* routines.
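One way to use these counters is to note them twice, some number of FTP connections apart, and compare. A minimal Python sketch, assuming the values have been copied by hand from the SDA output; the function name, the shortened dictionary keys and the numbers are made up for illustration:

```python
def vm_delta(before, after):
    """Compare two snapshots of the LIB$*VM* counters.

    Each snapshot is a dict with the keys GETVM_C, FREVM_C and VMINUSE,
    short forms of LIB$$GL_GETVM_C, LIB$$GL_FREVM_C and LIB$$GL_VMINUSE.
    """
    def outstanding(snap):
        # Successful allocations that have not yet been freed.
        return snap["GETVM_C"] - snap["FREVM_C"]

    return {
        "outstanding_calls": outstanding(after) - outstanding(before),
        "bytes_in_use": after["VMINUSE"] - before["VMINUSE"],
    }


# Hypothetical snapshots taken 100 FTP connections apart.
first = {"GETVM_C": 1000, "FREVM_C": 900, "VMINUSE": 50000}
second = {"GETVM_C": 1500, "FREVM_C": 1300, "VMINUSE": 82000}
print(vm_delta(first, second))
```

If both numbers rise in step with the number of connections, the growth is coming through LIB$GET_VM. If they stay flat while P0 space still expands, the cause lies elsewhere.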

Volker.
Richard W Hunt
Valued Contributor

Re: PGFLQUOTA leak with FTP

While I cannot speak for the contents of the FTP server process or the exact sequence of calls for LIB$ VM routines, I can say with absolute certainty that this exact problem has been seen since VMS v 2.0 and there are very few solutions. The problem is in the garbage collection routines that are used when you release something to VM.

As described earlier, it is partly due to the "random" allocation of different packet sizes. They might actually be only a few different sizes, but they arrive and release randomly because of the nature of a server process with varying numbers of files to PUT and GET.

My first run-in with the memory "leak" was with a big honkin' FORTRAN program and dynamic text-string allocation using string descriptors. Essentially, the string allocation paradigm caused memory to be very quickly checker-boarded with fragments that were too small to be re-used.

A second symptom occurred after a very long time - the program started spending a LOT of time in the LIB$FREE routine doing a huge number of page faults - because VM had grown high enough to overflow the WSQUOTA + WSEXTENT combined, and on this particular machine, enough users were on the system that BORROWLIM and GROWLIM had become significant factors as well. The long time in the VM routines came about because when you DID try to free up a chunk of memory, the free-up routine had to step through the structures of each previously released memory chunk to see if it could merge the chunk being released with any already-released chunks. It actually tries to merge adjacent released chunks, but that means it has to identify the released chunk's neighbors first. So it thrashed the system with tons of page faults stepping through a gazillion little memory pool fragments.

At the time my solution was simple. Rather than worry about releasing memory, just let the program EXIT. That way, it would release its private pool of memory wholesale rather than retail.

This problem is really a flaw ... well, a feature ... of the algorithm as applied to random-arrival, random-departure, varying sized memory chunks.

My suggestion is this: If there is a time to schedule it such that you are pretty sure no one will get tripped up by it, terminate the FTP server and restart it. That will reset the size of the process.

I have an idle-job checker. It runs every so often and is allowed to decide if the FTP server has been idle. I also know from system profiling when it is and isn't safe to whack that process if I need to. It might not be possible for you to profile it, but short of totally re-inventing the VM chunk allocation and reclamation algorithm, I don't know how you would be able to do any better with the code. Short of doing a one-size-fits-all method by creating fixed-size lookaside lists that are as big as you'll ever need, and just recycle them.
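The fixed-size lookaside idea can be sketched in a few lines. This Python stand-in only shows the recycling pattern; the class name, block size and loop are hypothetical, and a real server would do this around its own allocation calls.

```python
class LookasidePool:
    """Recycles fixed-size blocks instead of returning them to the heap."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._free = []       # blocks waiting to be reused
        self.created = 0      # blocks ever obtained from the heap

    def get(self):
        if self._free:
            return self._free.pop()      # reuse a recycled block
        self.created += 1
        return bytearray(self.block_size)

    def put(self, block):
        self._free.append(block)         # keep it for the next request


pool = LookasidePool(block_size=512)
for _ in range(1000):                    # 1000 simulated transfers
    a = pool.get()
    b = pool.get()
    pool.put(a)
    pool.put(b)
print(pool.created)
```

Because every block has the same size, any freed block satisfies any request, so the pool never grows past the peak number of blocks in use at once. The cost is wasted space whenever a request needs less than a full block.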
Sr. Systems Janitor