Operating System - OpenVMS

Mail Utility Routines Virtual Memory Problem

SOLVED
Michael Bartman
Occasional Visitor


There appears to be a bug in the mail utility routines. If a message is really large (one that uses up a large part of the process's available page file quota), it can be read once, but not again until the image exits and is restarted. This appears to be caused by a failure to deallocate the memory used for the internal buffer when that buffer is a large enough fraction of the total available memory (over half?). If the process has a larger page file quota, the problem isn't seen.

I have reproduced this problem in a test program. The source of this test program, as well as a more detailed description of how to reproduce the issue, is in the attachment.
Hein van den Heuvel
Honored Contributor

Re: Mail Utility Routines Virtual Memory Problem


It does not have to be a bug. It could be 'bad luck' with (virtual) memory allocation.

First, I would instrument the code with calls to $GETJPI to get JPI$_FREP0VA, JPI$_FREP1VA, JPI$_FREPTECNT and such, and a call to LIB$SHOW_VM.

Next, I would experiment with something like an initial large malloc followed by a free, to get the address space expanded in one huge step instead of nickel-and-diming it.
And maybe retry the experiment with the LINKER OPTION IOSEGMENT=

Or... you could just get more PGFLQUOTA?!
It's just a little number in a little file :-).

Hein.



Michael Bartman
Occasional Visitor

Re: Mail Utility Routines Virtual Memory Problem

A little more clarification? I don't think "bad luck" is involved, as this happens every time you read a really large message with a very small page file quota. The VM used for buffering the large message isn't freed when you close out the mail session. It is retained, but apparently not used again, until the image exits.

The behavior was first noted in a POP server I'm maintaining, and I wrote the test program to find out whether the POP code or the mail utility routines were at fault. The POP server ran continuously and, in normal use, generally handled lots of small messages in multiple mail sessions before reading the large one, while my test program had the large message in the first session.

When reading small messages, the VM is freed between sessions, or even between messages. When reading the large message it is not, so it isn't available when the large message is read again, and so the large message is not read. Its size is still accurately reported, but attempting to read the text fails as if the message were very short (just the headers). If you increase the page file quota enough, the VM used by the large message is freed between messages, and you can read it as many times as you like.

The problem doesn't appear to be with the absolute size, but with what percentage of the available page file quota a given message uses. Without knowing how the mail utility library routines manage their memory I can't really guess very well as to what they might be doing to cause this, but that's what it appears is happening.

Increasing the page file quota is a valid workaround, but it only masks the bug, it doesn't remove it. It is the solution we gave to our customer until HP fixes the underlying problem...if they couldn't get their users to stop using e-mail as a file transfer protocol anyway... ;-)
Cass Witkowski
Trusted Contributor

Re: Mail Utility Routines Virtual Memory Problem

Is the PGFLQUOTA on your account larger than WSEXTENT or the SYSGEN parameter WSMAX?

Hein van den Heuvel
Honored Contributor

Re: Mail Utility Routines Virtual Memory Problem

>> I don't think "bad luck" is involved, as this happens every time you read a really large message with a very small page file quota.

Well that's bad luck right there, isn't it? :-)
But I'll explain more...

>> The VM used for buffering the large message isn't freed when you close out the mail session. It is retained, but apparently not used again, until the image exits.

And how do you know this? Please share.

btw... my compliments on a most excellent reproducer. It looks like solid stuff, ready for a problem submission through proper channels (not this forum).

Please help us understand whether (virtual) memory is lost and how much.

Of course I agree with your speculation that this is a VM problem, as raising PGFLQUOTA provides a workaround.

But I am speculating further that the VM is not lost, but perhaps not re-useable as a clean sheet of paper. It may be fragmented.

Let's say you malloc 1kb in a first chunk of VM (the default allocation is, I believe, 128 pagelets) running up to address 1M. Now RMS opens some mail files and allocates buffers and stuff from address 1M to 1.5M.
Now comes the time for the big message, and VM allocates 10Mb up to address 11.5M. Now run down all context, do your private frees and start again. Now, BY BAD LUCK, that 1kb MIGHT be allocated from the 10Mb chunk, making it 9.999Mb. Subsequently, when that large message comes back, it no longer fits, and a new chunk of 10Mb is needed, which may break the bank.

This would resolve itself if PGFLQUOTA were twice as large, would it not?

The large malloc I suggested, again pure speculation, could make that initial alloc, before RMS, become 12Mb (or 20 or whatever) instead of a few Kb. Now all your mallocs, and that large message, will fit in the one chunk over and over again (unless VM would allocate smack bang from the middle).
At the end of each loop you would have the same nice clean sheet of paper to work with.

So stick in some calls to LIB$SHOW_VM and whatever other routine you can find in this area, like a GETJPI report on what the highest address used is.
And/Or use the debugger primitives to analyze VM.

Perhaps put those calls (or breakpoints for dbg) just after
- while(tmp->next)
or after
- for( msg_num=1;...

>> Increasing the page file quota is a valid workaround, but it only masks the bug, it doesn't remove it.

If there is a bug, then it is that mail just reports "[0x007edfa8]" which normally means all is well... with a warning.

Instead it should return a proper 'error' status with the original error code (reporting a lack of VM probably :-).

>> It is the solution we gave to our customer until HP fixes the underlying problem...

If my wild speculations are correct, then the only bug is in the error handling, and fixing that will not make your problem go away, just make it easier to find earlier.
>> if they couldn't get their users to stop using e-mail as a file transfer protocol anyway... ;-)

Cheers,
Hein.
Michael Bartman
Occasional Visitor

Re: Mail Utility Routines Virtual Memory Problem

Pgflquo is greater than WSextent, but smaller than WSMAX. As the attached data shows:

WSdef: 1000
WSquo: 4000
WSextent: 2500
Pgflquo: 25000

when the problem occurs. Raising Pgflquo to 200000 makes the problem go away.

This wasn't in the attached data:

SYSGEN> SHO WSMAX
Parameter Name          Current    Default     Min.      Max.     Unit  Dynamic
--------------          -------    -------   -------   -------    ----  -------
WSMAX                    360448       4096      1024   8388608 Pagelets
 internal value           22528        256        64    524288 Pages
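For scale: OpenVMS expresses PGFLQUOTA and SYSGEN parameters such as WSMAX in 512-byte pagelets, so the numbers above can be converted to bytes. This is a small sketch; the helper names are illustrative only, not part of any VMS API.

```c
/* OpenVMS quotas and many SYSGEN values are in 512-byte pagelets. */
#define PAGELET_BYTES 512L

/* Convert a pagelet count (e.g. PGFLQUOTA) to bytes. */
static long pagelets_to_bytes(long pagelets)
{
    return pagelets * PAGELET_BYTES;
}

/* Ratio of a parameter's external value (pagelets) to its internal
 * value (pages), e.g. WSMAX 360448 pagelets vs. 22528 pages. */
static long pagelets_per_page(long in_pagelets, long in_pages)
{
    return in_pagelets / in_pages;
}
```

With the values in this thread, pagelets_to_bytes(25000) is 12,800,000 bytes (about 12.8 MB), pagelets_to_bytes(200000) is 102,400,000 bytes, and the WSMAX line gives 360448 / 22528 = 16 pagelets per page, i.e. 8192-byte pages.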

Michael Bartman
Occasional Visitor

Re: Mail Utility Routines Virtual Memory Problem

I understand Hein's points, and he may be right about what's going on, in which case it does come down to a misleading error (saying "that's all!" when the problem is really "I can't read it due to lack of available contiguous VM").

To answer the "how do you know" question, I was watching both my test program and the POP server with SHOW PROCE/CONT/ID=, using the "watch memory" option ("V" key).

When pgflquo is small you can see the memory get allocated, and retained, even when you've stepped the program past the point where it closes out the mail session. When pgflquo is large, you see the memory allocated, and then freed in the same stretch of code.

I admit this isn't conclusive, but it does suggest that a big buffer isn't deallocated when pgflquo is small relative to the size of the message being read. Why is left as an open question. Perhaps it's the fragmentation thing, and perhaps it's something else.

If this were my problem to resolve, instrumenting the test program as suggested would be a good next step. Looking over the mail utility code to see how it deals with memory might be informative too. As it is, proving that the problem isn't in my POP code, and finding a usable workaround for my customer, is all I need to do with this. If it does turn out to be memory fragmentation, there's not much I can do about it...the mail utility routines' use of buffers is beyond my control. At best I can add some code to check for the problem and log an error in the POP server log file, rather than just returning an incorrect message length when this comes up.

I've tried to submit this to HP support, but they can't take it without a contract number, which I don't have (the latest one I know of here is pre-merger and the system won't take it. I hear that this is being worked on, but I have no idea when that will happen).

I will suggest that our customer contact HP support about it, using their support contract, but I wanted there to be at least a chance that engineering wouldn't have to start from scratch, so I've also put it here so that they can have access to the test program at least. Relying on the chain of people that leads from me, through our support, to the customer, to HP support to HP engineering to relay the information seems a bit unreliable. I know this isn't an official support forum, but I heard that HP engineering did glance in here occasionally at least, so maybe they'll be able to find the test program code and other data in the attachment and make use of it when they get the problem officially. If not, at least I tried.

Thanks everyone!
Hein van den Heuvel
Honored Contributor

Re: Mail Utility Routines Virtual Memory Problem

Thanks for that follow up. I appreciate your position and it sounds like you are doing the right things under the circumstances.

IMHO it is HP who is doing the wrong thing by not accepting a well documented problem report which undeniably proves a bug with a better than average description.
They should almost reward you for your effort and graciously accept the problem in order to improve the product, whether you have a contract or not.
The intensity with which they follow up would of course depend on the level of your support contract, and if there is none, they would not be in a hurry to fix it at all.

enough ramblings...

Have a great weekend you all,
Hein.




David Jones_21
Trusted Contributor
Solution

Re: Mail Utility Routines Virtual Memory Problem

I added a call to LIB$SHOW_VM_INFO_ZONE at the bottom of the outer loop and got some interesting results when there is a large message. After the first pass, the default zone has 2 areas; the larger one has a single free block (its minimum and maximum free block are the same, i.e. 1 block) of 2730608 bytes, plus 6112 bytes not yet allocated. After the second pass, there are now 4 areas, 3 of which have the 2730608-byte free block. This suggests that the first-fit algorithm LIB$GET_VM uses is causing fragmentation.

I added calls at the top of the program to allocate (LIB$GET_VM) and then deallocate a 30 million byte block so that the default zone starts with a single area that can hold all dynamic requests the MAIL routines will make. With this workaround, the program doesn't leak LIB$VM pages.
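To make the first-fit mechanism concrete, here is a toy model in portable C. It is a sketch under stated assumptions, not LIB$GET_VM: the area and quota bookkeeping, the unit sizes (BASE, MSG, min_extend), the quota values and the workload in run_passes are all hypothetical, chosen only to reproduce the pattern described in this thread. New areas are mapped on demand, charged against a quota and never returned. Blocks are placed first-fit, and a small block allocated after the big buffer is freed stays live into the next pass.

```c
#include <string.h>

/* Hypothetical toy heap, NOT the LIB$VM implementation. Units are abstract. */
#define MAX_AREAS 8
#define MAX_UNITS 64                    /* largest area this model supports */

typedef struct {
    int size;                           /* units in this area */
    unsigned char used[MAX_UNITS];      /* 1 = allocated */
} Area;

typedef struct {
    Area areas[MAX_AREAS];
    int n_areas;
    int quota;        /* total units that may be mapped (cf. PGFLQUOTA) */
    int charged;      /* units mapped so far; areas are never unmapped */
    int min_extend;   /* minimum size of a newly mapped area */
} Heap;

typedef struct { int area, off, len; } Block;   /* area < 0: failed */

/* First fit: lowest-numbered area, lowest offset, with len free units. */
static Block heap_alloc(Heap *h, int len)
{
    Block b = { -1, 0, len };
    for (int a = 0; a < h->n_areas; a++) {
        Area *ar = &h->areas[a];
        int run = 0;
        for (int i = 0; i < ar->size; i++) {
            run = ar->used[i] ? 0 : run + 1;
            if (run == len) {
                b.area = a;
                b.off = i - len + 1;
                memset(&ar->used[b.off], 1, (size_t)len);
                return b;
            }
        }
    }
    /* Nothing fits: map a new area if the quota allows it. */
    int size = len > h->min_extend ? len : h->min_extend;
    if (h->n_areas == MAX_AREAS || size > MAX_UNITS || h->charged + size > h->quota)
        return b;                       /* quota exhausted: request fails */
    Area *ar = &h->areas[h->n_areas];
    ar->size = size;
    memset(ar->used, 0, sizeof ar->used);
    memset(ar->used, 1, (size_t)len);
    h->charged += size;
    b.area = h->n_areas++;
    b.off = 0;
    return b;
}

static void heap_free(Heap *h, Block b)
{
    if (b.area >= 0)
        memset(&h->areas[b.area].used[b.off], 0, (size_t)b.len);
}

/* Hypothetical workload: one big message read per pass. A small block
 * allocated after the message buffer is freed lives into the next pass.
 * prealloc > 0 grows the heap once up front and frees it (the workaround).
 * Returns how many of the big reads succeeded. */
static int run_passes(int quota, int prealloc, int passes)
{
    Heap h = { .n_areas = 0, .quota = quota, .charged = 0, .min_extend = 8 };
    enum { BASE = 8, MSG = 20 };
    Block ctx = { -1, 0, 0 };
    int ok = 0;

    if (prealloc > 0)
        heap_free(&h, heap_alloc(&h, prealloc));
    heap_alloc(&h, BASE);               /* long-lived program data */

    for (int p = 0; p < passes; p++) {
        Block msg = heap_alloc(&h, MSG);   /* large message buffer */
        if (msg.area >= 0)
            ok++;
        heap_free(&h, ctx);             /* previous small block released */
        heap_free(&h, msg);
        ctx = heap_alloc(&h, 1);        /* lands first-fit, maybe in the big area */
    }
    return ok;
}
```

With these made-up numbers, run_passes(40, 0, 5) succeeds only on the first big read, because the small block splits the big area and the quota cannot cover a second one. run_passes(80, 0, 5), with twice the quota, succeeds on all five. run_passes(40, 40, 5), which grows the heap once with a 40-unit block and frees it before anything else, also succeeds on all five. That last case is analogous to allocating and then freeing one large block at program start.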
Hein van den Heuvel
Honored Contributor

Re: Mail Utility Routines Virtual Memory Problem

'told you so' in the first reply

:-)


Thank you David for actually going through the motions!

Hein.