
CONV$_PURGE_RECLAIM problem

 
Ruslan R. Laishev
Super Advisor


Hello All!

There is a problem that occurs periodically:
19-OCT-2007 11:07:04.33 [00 89.253.3.39/intracon]%POP3-I-SHUTMAILCTX, VMS Mail box: deleted/old/sent=1/1/2 msgs, deleted 133472 byt
%CONV-F-OPENIN, error opening CLIENTS$ROOT:[INTRACON]MAIL.MAI;1 as input
-RMS-F-IFI, invalid internal file identifier (IFI) value
%TRACE-F-TRACEBACK, symbolic stack dump follows
image       module    routine    line  rel PC            abs PC
CONVSHR                             0  00000000000301CC  00000000000921CC
PTHREAD$RTL                         0  000000000006C1A8  FFFFFFFF8128A1A8
PTHREAD$RTL                         0  000000000006C334  FFFFFFFF8128A334
PTHREAD$RTL                         0  000000000006D018  FFFFFFFF8128B018
PTHREAD$RTL                         0  000000000003D364  FFFFFFFF8125B364
PTHREAD$RTL                         0  000000000003CFCC  FFFFFFFF8125AFCC
PTHREAD$RTL                         0  000000000006F3C4  FFFFFFFF8128D3C4
                                    0  FFFFFFFF8017AF84  FFFFFFFF8017AF84
                                    0  FFFFFFFF80164D68  FFFFFFFF80164D68
SPOP3_SRV   POP3_SRV  main      85305  0000000000004BF0  0000000000034BF0
SPOP3_SRV   POP3_SRV  __main    85110  0000000000004100  0000000000034100
PTHREAD$RTL                         0  0000000000057718  FFFFFFFF81275718
PTHREAD$RTL                         0  0000000000030444  FFFFFFFF8124E444
                                    0  FFFFFFFF80383CE4  FFFFFFFF80383CE4
%TRACE-I-END, end of TRACE stack dump


Can someone point me in the right direction for investigating this?


Thanks!
Hein van den Heuvel
Honored Contributor

Re: CONV$_PURGE_RECLAIM problem


I think what is happening here is that an auto purge was triggered, leaving more than 32K bytes freed up (133472), which in turn triggered a mail purge/reclaim. That uses CONV$RECLAIM, which requires (unfortunately, and avoidably!) exclusive access to the main RMS indexed mail file (MAIL.MAI). Apparently that file was locked at that time (incoming email) and CONV$RECLAIM failed... which is fine, it can be done the next time.

The error handling was broken though.
This may be an application problem, a MAIL$ problem or a CONV$ problem.
- What exact OpenVMS version was used (patches)?
- Looks like this is a callable VMSmail application, using MAIL$xxx functions, not the DCL MAIL command. Correct?
- CONV/RECLAIM was not explicitly called, right?

A workaround might be to do MAIL> SET NOAUTO_PURGE, or otherwise set the NOAUTO_PURGE flag in the VMSMAIL$PROFILE record for the username in question.

You would then have to proceduralize the emptying of the wastebasket and the reclaims.


From MAIL> HELP PURGE :
"An automatic PURGE/RECLAIM is done when the
amount of deleted space in a mail file exceeds 32,767 bytes.
(Mail uses the Convert/Reclaim utility to reclaim space.)"

I think I patched VMSmail once to increase this to a more reasonable number of bytes for today's Mail usage.

hth,
Hein.
Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

Hello, Hein!

Thanks for the answer. This takes place under 8.3/Alpha with all ECOs. The application is a POP3 server which performs a purge/reclaim of MAIL.MAI. I don't know how to catch the cause of the problem at the time it happens.
Hein van den Heuvel
Honored Contributor

Re: CONV$_PURGE_RECLAIM problem

So the program explicitly requests VMSmail to do the reclaim huh?

A few remarks....

The MAIL$ functions like to SIGNAL the results. Are you requesting MAIL$_NOSIGNAL?
You may want to establish a signal handler (LIB$SIG_TO_RET?) around this particular call.

There is both a very small and a very big timing window in mail with respect to reclaim.

1) Just before calling CONV$RECLAIM, the MAIL$ functions close MAIL.MAI (or whatever it is called) to re-open it right after.
Something could sneak in between the two and grab an exclusive lock, making the re-open fail at an unexpected time.

A normal file-locked open is handled gracefully:

$ open/read/write x mail.mai
$ mail

MAIL> dir
%MAIL-E-OPENIN, error opening USER1:[HEIN]MAIL.MAI as input
-RMS-E-FLK, file currently locked by another user

But this is a re-open where success is very much anticipated. I could understand less than optimal error handling for that code section.


The long window I refer to is that anything is allowed to open the mail file shared. This again is handled gracefully:

$ open/read/write/share=write x mail.mai
$ mail

MAIL> purge /reclaim
%MAIL-I-RECLPLSWAIT, reclaiming deleted file space. Please wait...
%MAIL-E-OPENIN, error opening USER1:[HEIN]MAIL.MAI;1 as input
-RMS-E-FLK, file currently locked by another user
%MAIL-I-DELMSGS, 0 messages deleted



If I was desperate to find out whether this problem is caused by the short window, I would try the following:
- create a dummy shareable with entrypoint CONV$RECLAIM which only does a SYS$HIBER + return.
- define CONVSHR to point to the dummy
- start mail
- request purge/reclaim
- 'see' process hibernated
- open mail.mai exclusive
- wake program
- watch what happens!

But I would not be desperate, just mildly curious.

If it was my problem I would

1) review my error handling/signalling

2) consider NOT using the MAIL$ function to PURGE/RECLAIM, and realize that all that function does is call CONV$RECLAIM dynamically (LIB$FIND_IMAGE_SYMBOL).
So I would take control and NOT call MAIL$ to reclaim, but just call MAIL$ to close MAIL.MAI, then call CONV$RECLAIM directly on my own terms, then call MAIL$... to continue where I left off.

3) I would not reclaim anything less than 100KB but maybe at 1MB or so (2000 blocks, 400+ buckets to inspect).

4) the 32K auto-purge threshold is a hardcoded value moved into a longword early on in MAIL$MAILFILE_BEGIN
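
To put the thresholds from point 3 in numbers, here is a minimal sketch in portable C (not OpenVMS-specific). The function names and the 1 MB constant are assumptions made for illustration; only the 32,767-byte figure comes from the HELP PURGE text and the 133472-byte figure from the log in the base note.

```c
#include <stdio.h>

#define VMS_BLOCK_BYTES         512UL              /* one disk block on OpenVMS */
#define AUTO_PURGE_BYTES        32767UL            /* built-in threshold quoted from HELP PURGE */
#define RECLAIM_THRESHOLD_BYTES (1024UL * 1024UL)  /* assumed: the "1MB or so" suggestion */

/* Convert a byte count to whole disk blocks, rounding up. */
unsigned long bytes_to_blocks(unsigned long bytes)
{
    return (bytes + VMS_BLOCK_BYTES - 1) / VMS_BLOCK_BYTES;
}

/* Return 1 when more than `threshold` bytes were deleted; a zero
   threshold disables reclaiming altogether. */
int worth_reclaiming(unsigned long deleted_bytes, unsigned long threshold)
{
    return threshold != 0 && deleted_bytes > threshold;
}

/* Print the decision for one deleted-byte count under one threshold. */
void report(unsigned long deleted_bytes, unsigned long threshold)
{
    printf("%lu bytes (%lu blocks) deleted, threshold %lu bytes: %s\n",
           deleted_bytes, bytes_to_blocks(deleted_bytes), threshold,
           worth_reclaiming(deleted_bytes, threshold) ? "reclaim" : "skip");
}
```

With the 133472 bytes (261 blocks) from the base note, the built-in 32,767-byte threshold says "reclaim", while a 1 MB threshold (2048 blocks) says "skip".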

Sure this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting









Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

Hello, Hein!

MAIL$_NOSIGNAL... sure!

Here is the piece of code which calls conv$reclaim; it will probably help you understand better what I am talking about:


/*
** http://www.eight-cubed.com/examples.shtml
** http://www.eight-cubed.com/examples/framework.php?file=lib_establish.c
** ISUBS.LIS: %sbttl 'reclaim_handler'
*/

static unsigned handler (
        struct chfdef1 * sigargs,
        unsigned *mechargs
        )
{
    if ( sigargs->chf$is_sig_name == SS$_UNWIND )
        return SS$_CONTINUE;

    if ( sigargs->chf$is_sig_name == SS$_ACCVIO )
        return SS$_RESIGNAL;

    // sigargs->chf$is_sig_args -= 2;
    // lib$signal(&sigargs->chf$is_sig_args);
    // sigargs->chf$is_sig_args += 2;

    lib$sig_to_ret(sigargs,mechargs);
    return 0;
}




if ( deleted && pr_threshold && (deleted > pr_threshold) )
{
    unsigned flags = 1;
    unsigned short mfns = 0;
    unsigned char mfna[512];
    struct dsc$descriptor mfile;
    struct recl$statistics stat = {4,0,0,0,0};

    mfns = sprintf(mfna,"%.*sMAIL.MAI",ctx->pop3ctx$t_maildir[0],&ctx->pop3ctx$t_maildir[1]);
    INIT_SDESC(mfile,mfns,mfna);

    /*
    ** Establish a condition handler especially for the CONV$RECLAIM routine
    */
    lib$establish (&handler);

    /*
    ** Call the CONV$RECLAIM routine
    */
    if ( !(1 & (status = conv$reclaim(&mfile,&stat,NULL, NULL))) )
    {
        int msgvec[] = {1,status};
        sys$putmsg(&msgvec,spop3_putmsg,0,ctx);
    }

    spop3__log(ctx,POP3_RECLAIM,stat.recl$l_scan_count,stat.recl$l_data_count,
        stat.recl$l_index_count,stat.recl$l_total_count,status);
}
Hein van den Heuvel
Honored Contributor

Re: CONV$_PURGE_RECLAIM problem

>> Here is the piece of code which calls conv$reclaim; it will probably help you understand better what I am talking about:


Actually, that confused me a little.

Is that an implementation of how the code already was, or a follow up to my suggestion to do the conv$reclaim manually?

If this is the existing code, then is this executed only when the mail file is closed (after a successful MAIL$MAILFILE_CLOSE)?
CONV$RECLAIM requires exclusive access.
It looks like it will have been closed, just judging by the application message SHUTMAILCTX, which uses data (MAIL$_MAILFILE_MESSAGES_DELETED) made available by a close.

Looks like you are calling CONV$RECLAIM with the flags argument set to 0, so the CONV$M_SIGNAL flag is not set. This is consistent with the code which calls $PUTMSG in line, not as part of a signal.

So the condition handler shown is not going to be activated, right?

But the call to putmsg only provides one argument in the vector, where there should be an accompanying RMS secondary (primary cause) message?
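
For reading the statuses discussed here: the `1 & status` test in the quoted code relies on the OpenVMS condition-value layout, where bits 0-2 hold the severity (so the low bit set means success) and bits 16-27 hold the facility number. A minimal portable sketch follows; the helper names are made up for illustration, and 0x0001828A is used as the example value for RMS$_FLK (check rmsdef.h on your system).

```c
#include <stdint.h>

/* Severity codes held in bits 0-2 of an OpenVMS condition value. */
enum vms_severity {
    SEV_WARNING = 0,
    SEV_SUCCESS = 1,
    SEV_ERROR   = 2,
    SEV_INFO    = 3,
    SEV_FATAL   = 4
};

/* Low bit set means the call succeeded (success or informational). */
int cond_success(uint32_t status)
{
    return (status & 1u) != 0;
}

/* Severity field: bits 0-2. */
unsigned cond_severity(uint32_t status)
{
    return status & 7u;
}

/* Facility number: bits 16-27 (0 = system, 1 = RMS). */
unsigned cond_facility(uint32_t status)
{
    return (status >> 16) & 0xFFFu;
}

/* The letter shown in messages such as %CONV-F-OPENIN or -RMS-E-FLK;
   '?' for the reserved severity codes 5-7. */
char cond_severity_letter(uint32_t status)
{
    static const char letters[] = "WSEIF???";
    return letters[status & 7u];
}
```

Applied to the example RMS$_FLK value this gives facility 1 (RMS), severity `E` and a failed success test, matching the -RMS-E-FLK text in the transcripts earlier in the thread.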


fwiw,
Hein.



Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

Hello, Hein!

>> Here is the piece of code which calls conv$reclaim; it will probably help you understand better what I am talking about:


>Actually, that confused me a little.
It's a piece of code of the real application.

If you have a couple of minutes, take a look at:

http://starlet.deltatel.ru/~laishev/work/pop3/spop3_vmsmail.c
conv$reclaim is called in vmail_shut_ctx() after the MAIL files have been closed.


>If this is the existing code, then is this executed only when the mail file is closed (after a successful MAIL$MAILFILE_CLOSE)?
>CONV$RECLAIM requires exclusive access.
>It looks like it will have been closed, just judging by the application message SHUTMAILCTX, which uses data (MAIL$_MAILFILE_MESSAGES_DELETED) made available by a close.

>Looks like you are calling CONV$RECLAIM with the flags argument set to 0, so the CONV$M_SIGNAL flag is not set. This is consistent with the code which calls $PUTMSG in line, not as part of a signal.

>So the condition handler shown is not going to be activated, right?
I hope that the "condhandler" is not activated.

>But the call to putmsg only provides one argument in the vector, where there should be an accompanying RMS secondary (primary cause) message?
I don't know where I can get the RMS condition status/STV.


Thanks, Hein!
Hein van den Heuvel
Honored Contributor

Re: CONV$_PURGE_RECLAIM problem

I'll consider isolating the CONV$RECLAIM call + error handling from the referenced code and trying it. But that will take more time than I'm willing to spend now.

In the meantime I offer you this question...

The message from spop3__log(ctx,POP3_SHUTMAILCTX ... appears before the CONV message in the base note here.

In code however that message call comes after the convert call.

Is it called twice?
multi-thread error?

Hein.



Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

Hello, Hein!

[quote]
The message from spop3__log(ctx,POP3_SHUTMAILCTX ... appears before the CONV message in the base note here.

In code however that message call comes after the convert call.
[/quote]
Oh, I see. I'm just puzzled...


[quote]
Is it called twice?
multi-thread error?
[/quote]
That piece of code is called in a loop in the main thread.
Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

Tried some tips & tricks... conv$purge_reclaim still crashes the application.

It looks like the problem is in the conv$ code. :-(
Ruslan R. Laishev
Super Advisor

Re: CONV$_PURGE_RECLAIM problem

It looks like conv$reclaim is crashing because ASTs have been disabled by "someone", probably by a thread inside the PTHREAD RTL.
Hein van den Heuvel
Honored Contributor

Re: CONV$_PURGE_RECLAIM problem

Ruslan,

I did not find time to dive into this.
I suspect you are right, as convert is NOT thread safe:

Callable Utilities:
6.1 Introduction to CONVERT Routines
"These routines are not reentrant and cannot be called from the asynchronous system trap (AST) level. In addition, these routines require ASTs to remain enabled in order to function properly."

http://h71000.www7.hp.com/doc/83final/4493/4493pro_006.html#10_introductiontoconvertroutin

The workaround would appear to be to SPAWN the CONVERT/RECLAIM command.
Now I do not like spawns any more than you do, but here it would seem OK. If it is not worth a spawn, then it is not worth a reclaim either, and vice versa.
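
A minimal sketch of that workaround in standard C, under stated assumptions: the function names are hypothetical, the directory string is passed in by the caller, and the command is run with the C library's system(), which on OpenVMS is expected to run the DCL command in a subprocess (LIB$SPAWN would be the native alternative). Verify that behaviour and the returned status against your compiler's RTL documentation before relying on it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build "CONVERT/RECLAIM <maildir>MAIL.MAI" into buf.
   Returns 0 on success, -1 if the command does not fit. */
int build_reclaim_command(char *buf, size_t size, const char *maildir)
{
    int n = snprintf(buf, size, "CONVERT/RECLAIM %sMAIL.MAI", maildir);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Run the reclaim in a separate process, so that the non-reentrant
   CONVERT code never executes inside the threaded server image.
   Returns -1 if the command could not be built, otherwise whatever
   system() returns. */
int run_reclaim(const char *maildir)
{
    char cmd[512];

    if (build_reclaim_command(cmd, sizeof cmd, maildir) != 0)
        return -1;
    return system(cmd);
}
```

For the mail directory in the base note, build_reclaim_command() produces `CONVERT/RECLAIM CLIENTS$ROOT:[INTRACON]MAIL.MAI`. The mail file still has to be closed first, since the reclaim needs exclusive access.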

Good luck,
Hein.