Operating System - OpenVMS

Odd behavior for GETQUI-related operations

 
SOLVED
The_Doc_Man
Advisor


Scenario:  We are a DoD site with special logging and extra security rules.  I am building a program in BASIC to let my Help Desk folks do things to jobs in a queue that they normally would not be able to do without privilege/ACLs.  The intended users will not be getting the DCL prompt when this code is finally working right.  (Otherwise I wouldn't have bothered with this method.)

 

I have a program that is way too long and too complex to post, but the key parts are that it does several things in sequence relevant to the SYS$GETQUI service calls. 

 

1.  Does a GETQUI CANCEL_OPERATION to reset a context variable

2.  Does a GETQUI DISPLAY_ENTRY on a specific entry number, does a lot of item-list data retrievals.  The call includes the option to force the context to BATCH + WILDCARD mode so that it lets me make another call for the same context, because the next step is...

3.  Does a GETQUI DISPLAY_FILE on the (single) file that is the batch file in question.  I need to get the file's fully qualified file spec and file ID, nothing else.

4.  Does a GETQUI CANCEL_OPERATION to reset the context variable for the last time.

 

All else after those calls is just formatting the stuff returned from the item lists and distributing the data to where I need it to go.  If I run the queue probe program from a privileged account, it works OK.  Runs first time, every time.  There are no compile errors and no bad item-list items.  Final status code is SYSTEM-S-NORMAL, hex code %X00000001.

 

Here is the head-scratcher.  My test account has rights identifiers that match one of the ACLs on the applications queues that are the target of this operation.  The ACLs are all (ACCESS=READ+SUBMIT) to allow me to read the queue content from this program that will eventually become the code to which the session is CAPTIVE.  (Remember, the goal includes that the users will not see the DCL prompt.) 
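
For readers less familiar with queue ACLs, an entry of the kind described above could be set and inspected roughly as follows. This is only a sketch: the queue name APP$BATCH and the rights identifier HELPDESK_QUEUE are hypothetical stand-ins, not names from this site.

```
$! Hypothetical names: APP$BATCH (queue), HELPDESK_QUEUE (rights identifier)
$ SET SECURITY /CLASS=QUEUE -
      /ACL=(IDENTIFIER=HELPDESK_QUEUE,ACCESS=READ+SUBMIT) APP$BATCH
$! Display the resulting protection and ACL on the queue
$ SHOW SECURITY /CLASS=QUEUE APP$BATCH
$! From the test account: confirm the process actually holds the identifier
$ SHOW PROCESS /RIGHTS
```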

 

I run the test program in a way that allows me to test it and exit (rather than force the CAPTIVE logout).  The program fails with an INVCONTEXT message from the GETQUI calls.  So... I do (from the same interactive session) a SHOW ENTRY/FULL of the target entry number - and it works fine, doesn't complain about inaccessible data, doesn't leave any blanks.  That is, the content of the queue is perfectly readable.  Next, I run the BASIC program again.  And this time it works perfectly even though I have not changed queue settings, code, or the states of any jobs in the queue!

 

I am explicitly saying the sequence is

 

$ RUN program (which fails)

$ SHOW ENTRY /FULL entry-num

$ RUN program (which succeeds)

 

No other commands intervene in the above sequence.  The compiled program brackets its DISPLAY_x calls with CANCEL_OPERATION calls; therefore the next run should be unable to see any remnant of the queue context from a prior invocation.

 

If I manually run the program twice in a row without the intervening $ SHOW ENTRY command, it fails both times, which means I probably AM cleaning up after myself in the code. 

 

Does that mean that $ SHOW ENTRY is leaving something behind?  This is driving me crazy because it means my program behavior is erratic for no reason I can discover.

 

Security+ Certified; HP OpenVMS CSA (v8)
11 REPLIES
Hoff
Honored Contributor

Re: Odd behavior for GETQUI-related operations

 

Explicitly initialize the relevant variables, if you're going to re-use the structures.   To create a new stream, pass a value of negative one in via the context.  Do this after you reset the context and before you reuse any of the variables and structures, too.

 

Are there any asynchronous routines — ASTs or threads or such — anywhere in the application?   You repeatedly reference $getqui and not $getquiw, which implies there is asynchronous code around, or that you're doing your own synchronization.

 

The error referenced is INVCONTEXT and that's presumably intended to be SS$_INVCONTEXT.     Might that error be SS$_BADCONTEXT?   (SS$_BADCONTEXT is missing from HELP /MESSAGE, too.  That's probably an OpenVMS bug.  But I digress.)  There's no INVCONTEXT error listed in the local system error definitions, nor in the associated documentation.

 

Always specify an IOSB.   Then check the IOSB.  After every call.  Even if the call you're using is not asynchronous.

 

Any other system service or RTL calls interspersed into the sys$getqui sequence?   (The $getqui call is not particularly reentrant nor is this call particularly modular, so... bad things can happen.)

 

Look for process quota differences, in addition to process privilege differences.
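
One way to make that comparison concrete is to capture the quota and privilege displays from each account and compare the listings. A minimal sketch, assuming it is run once in the working account and once in the failing one; the output file names are hypothetical.

```
$! Run in each account in turn, changing the file names, then compare.
$ SHOW PROCESS /QUOTAS     /OUTPUT=SYS$LOGIN:QUOTAS_TEST.LIS
$ SHOW PROCESS /PRIVILEGES /OUTPUT=SYS$LOGIN:PRIVS_TEST.LIS
$! With both listings in one directory (QUOTAS_PRIV.LIS is the hypothetical
$! listing from the privileged account):
$ DIFFERENCES QUOTAS_PRIV.LIS QUOTAS_TEST.LIS
```

The header line of each listing (time, user, process ID) will always differ; only differences in the quota or privilege lines themselves are of interest.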

 

Though SS$_INVCONTEXT/SS$_BADCONTEXT likely has nothing to do with privileges and where you're looking here, enable and check for use-of-privileges and access failures via SET AUDIT — you already have these and the most interesting ones enabled, most likely — and see if there are any alarms or audits generated when the program succeeds, and when it fails.  This probably won't turn up any issues, though.

 

Embed debugging.   SS$_DEBUG or otherwise.   Get an image dump at the error, and have a look at the carcass.   (With the X Windows debugging discussed there, you can even debug in the context of the CAPTIVE account, aiming the output else-terminal.)

Hein van den Heuvel
Honored Contributor

Re: Odd behavior for GETQUI-related operations

[edit: did not see Hoff's reply before composing mine. Some overlap. Oh well] 

 

This behaviour could be version- or platform-specific; please indicate which you are running. Did you test other versions/platforms?

 

I don't understand step 1 in the program. That seems bogus.

What value do you provide for the context cell?

There should be nothing to cancel... if it is an image context.

However, the documentation reads:

 

"Once established, wildcard mode remains in effect until one of the following actions causes the GQC to be released:

  • $GETQUI returns a JBC$_NOMORExxx or JBC$_NOSUCHxxx condition value on a call to display characteristic, form, queue, queue manager, or entry information, where xxx refers to CHAR, FORM, QUE, QMGR, or ENT.
  • You explicitly cancel the wildcard operation by specifying the QUI$_CANCEL_OPERATION function code in a call to the $GETQUI service.
  • Your process terminates."

 

That's odd, but that's what it indicates.

This context appears to survive image exits.

 

That jibes with it working when first run.

 

The SHOW QUEUE command implementation uses, and clears, the default context 0 (see also the extract from the $GETQUI documentation below).

 

You mention SYS$GETQUI only, not SYS$GETQUIW.

This call may finish asynchronously.

May we assume that the program actually uses SYS$GETQUIW and/or uses SYS$SYNCH?

 

I suspect that the value provided for the memory pointed to by the context parameter in the CANCEL call is important.

Is it initialized 0?

 

Maybe, just maybe, the context address is important.

Using the same variable all over, or different variables with copied contents?

Try using a static variable in a COMMON or MAP?

 

In step 2 you mention BATCH and WILDCARD, but I would sooner have expected QUI$V_SEARCH_FREEZE_CONTEXT.

 

If you still can't figure it out, then you may want to reduce the program a bunch and attach it to this topic.

 

Hope this helps some,

Hein

 

 

#define QUI$_JOB_CONTROL_GQC 87         /* Reserved for Digital (Use to send GQC to job control process) */

 

CONTEXT:

 

"Address of a longword containing the number of a context stream for this call to the $GETQUI system service. If the argument is unspecified or 0, the service uses the default context stream (#0).

To generate a new context stream, the specified longword must contain -1. $GETQUI then modifies the longword to hold the context number for that stream of operation. The context is marked with the caller's mode (user, supervisor, executive, or kernel). Any attempt to use that context in successive calls is checked and no call from a mode outside the recorded mode is allowed access.

To clean up a context, make a $GETQUI call using the QUI$_CANCEL_OPERATION function code and specify the address of the context number as the context argument."

The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

Apologies for being imprecise.

First, I am using GETQUIW as opposed to GETQUI then SYNCH calls.

Second, you are correct - it is BADCONTEXT, not INVCONTEXT.  I was posting from memory and after a long day, sometimes my memory is less precise than at other times.  I don't recall offhand from where the error is detected but I look both at the system service call return AND the IOSB to verify that the call worked.  In the case when I'm running privileged, the same program never fails unless I give it a bad entry number, so both the service call status return and the IOSB return are consistent in that case.

Third, the initial GETQUIW to CANCEL_OPERATION was suggested in the HP documentation I saw as a way to assure that I didn't catch a leftover context.

Fourth, between the initial and final GETQUIW/CANCEL_OPERATION calls, the ONLY system service calls are for the GETQUIW/DISPLAY_ENTRY and GETQUIW/DISPLAY_FILE. 

Fifth, I am sensitive to "leftover" variables, so the BASIC program in question is careful to ONLY use internal variables for everything except the entry number, which it gets via a call LIB$GET_SYMBOL.  The options are "THIS_JOB" or a number, and where appropriate, the code will use the QUI$M_SEARCH_THIS_JOB option for the input item-list option-mask.  However, I am exclusively using this based on an explicit entry number.

The GETQUIW uses an input item-list variable for the entry number and another input item-list variable for options QUI$M_SEARCH_ALL_JOBS + QUI$M_SEARCH_BATCH + QUI$M_SEARCH_WILDCARD.  That last part comes because of something in the GETQUI document called "NESTED WILDCARD" mode of operation.  I might well be confused but the FREEZE_CONTEXT option seems more oriented towards the case of F$GETQUI where I would find a job and then do a bunch of individual F$GETQUI calls on it, one attribute/variable at a time.  I am in a program that can diddle with item-lists and do its data retrieval wholesale rather than retail, but what I can't do in a single system service call is to look at file details from a GETQUI/DISPLAY_ENTRY.  That's why the GETQUI/DISPLAY_FILE is needed.

The GETQUI documentation in the System Service Calls manual is about as clear as a pile of dyspeptic dragon droppings but this is what I think I'm supposed to do.  The part that gets me bonkers is that running the same program twice in a row fails, but running the program, doing the DCL SHOW ENTRY command, and then running the program a second time, the second run works.   If I log out and log back in, the symptoms are reproducible.  However, if I pick another entry number without logging out first, the THIRD run of that program also works.

The suggestion of forcing the context variable to -1 is something I will try on the theory that the "leftover" is the internal context structure, though I would have thought that bracketing the DISPLAY_x calls with CANCEL_OPERATION calls would have also prevented leftovers.

 

 

Security+ Certified; HP OpenVMS CSA (v8)
The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

Hoff,

No ASTs are involved.  No service calls exist in the sequence except the GETQUIW/CANCEL_OPERATION, GETQUIW/DISPLAY_ENTRY, GETQUIW/DISPLAY_FILE, and GETQUIW/CANCEL.  I also replied to Hein's post so I will try to not duplicate my answers for your post.

The only RTL call I make is to LIB$GET_SYMBOL as the source of the entry number, and it is outside of the GETQUIW sequence.

The process quotas for the process that fails are an exact match for a privileged process that always works.  I essentially cloned the account for testing because the persons who will eventually run this are not privileged but will access a lot of things similar to what the privileged users access.  Therefore, they have a lot of WSQUOTA and WSEXTENT.  They have large amounts of PGFLQUO, JTQuota, BYTLM, FILLM, etc.  Automatic Working Set adjustment is enabled for the system as a whole.  No ASTs are in use except for the ones implied by the file system if I turn on the trace options built into the code to dump the values returned from the DISPLAY_x calls (i.e. the stuff implied by the item lists).  But ASTLM is at 250, which ought to be enough.

You are right, I have use-of-privilege already set up for auditing, this being a U.S. Government site, and you are also right that nothing oddball shows up.  We should be using ACLs, not privileges, to access the data objects for the queue, and I have object access failures turned on for auditing, too.

 

Security+ Certified; HP OpenVMS CSA (v8)
The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

Now it gets REALLY sticky.  First, I updated the code to explicitly set the context variable to -1 before the first call to GETQUIW/CANCEL_OPERATION, and thereafter I use the one and only context variable in the program.  That had no effect vs. prior versions that didn't preset the context variable.

To describe what I've tested, let me use this short-hand.  When I say "run probe" I am speaking of the program that runs in USER mode and uses the GETQUIW sequence discussed earlier in the thread to probe the information of a specific queue entry.  When I say "SHOW ENTRY" then I am using the DCL SHOW ENTRY command to probe the queue entry.  Of course, DCL commands generally run in SUPERVISOR mode.

In each of these tests, I log out of the test account and log back in again.  That way I get a fresh process context.

Test 1:  (target job already running as entry 88)

Log in

Run Probe 88  .... fails  (bad context)

Run Probe 88 .... fails  (bad context)

SHOW ENTRY 88 .... works, shows entry data

Run Probe 88 .... works, shows entry data

Run Probe 87 .... works by correctly reporting "No Such Entry"

 

Test 2:  (job not yet running)

Log in

Run Probe 89 .... fails (bad context)

From another session, submit a job

Run Probe 89 .... fails (bad context)

SHOW ENTRY 89 .... works, shows entry data

Run Probe 89 .... works, shows entry data

 

Test 3:  (multiple jobs running)

Log in

Run probe 90 .... fails

Run probe 91 .... fails

SHOW ENTRY 89 .... fails with error No Such Entry (which is correct)

Run Probe 90 .... works, shows entry data

Run Probe 91 .... works, shows entry data

 

Test 4:  (jobs not running yet)

Log In

SHOW QUEUE of a queue for which no ACLs are attached.

SUBMIT job to a queue with ACL that allows access

Run Probe 92 .... works, shows entry data

 

Test 5: (job running)

Log in

Run Probe 93 .... fails, bad context

$ A = F$GETQUI("CANCEL_OPERATION")

Run probe 93 .... works, shows entry data

 

I've done other variations on this sequence, but here is what I see:  No matter how many times I run the probe program, I get an "invalid context" UNTIL the first time I do a SHOW QUEUE or SHOW ENTRY (doesn't matter which one) or F$GETQUI, after which I can run the probe and get correct results on ANY queue and ANY entry.  It is almost like the queue information my program would read isn't loaded until DCL does something to my process context to load all queue information.  After that, I can run the probe reliably.

At NO TIME do I muck with any process privileges from inside or outside a session.  At NO TIME do I alter ACLs from inside or outside a session.  The ONLY variables here are whether a job exists or not (to be probed) and when I choose to run a DCL queue-related command.  Note, by the way, that the F$GETQUI didn't specify BATCH or GENERIC or SERVER or TERMINAL for the queue type.  Just a plain old CANCEL_OPERATION.

So... my question is, what is it that I am not doing in the probe program that SHOW QUEUE and SHOW ENTRY and F$GETQUI are doing to prepare my process for subsequent queue probes?  The documentation didn't seem to require any other predecessor step to the GETQUIW/CANCEL_OPERATION call.
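
The Test 5 sequence can be captured in a small command procedure so that anyone can repeat it from a fresh login. This is a sketch under assumptions: PROBE.EXE stands for the probe image, PROBE_ENTRY is a hypothetical name for the symbol the program reads via LIB$GET_SYMBOL, and the status display is only meaningful if the image exits with the failing status.

```
$! Sketch of Test 5.  Run from a freshly logged-in process.
$! PROBE.EXE and PROBE_ENTRY are hypothetical names.
$ SET NOON                               ! keep going after a failing step
$ PROBE_ENTRY = "93"                     ! entry number to probe
$ RUN PROBE
$ WRITE SYS$OUTPUT "Before lexical: ", F$MESSAGE($STATUS)
$ A = F$GETQUI("CANCEL_OPERATION")       ! the only step between the two runs
$ RUN PROBE
$ WRITE SYS$OUTPUT "After lexical:  ", F$MESSAGE($STATUS)
```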

 

Security+ Certified; HP OpenVMS CSA (v8)
abrsvc
Respected Contributor

Re: Odd behavior for GETQUI-related operations

One thing that sticks out to me as far as differences is that the DCL SHOW command has some level of privilege associated with it where the BASIC program has not.  Try "installing" the program with a priv or 2 if you can and see if the behavior changes. 

Dan

==============================================

From the system services manual for GETQUI(W):

The caller must have manage (M) access to the queue, read (R) access to the job, or SYSPRV or OPER privilege to obtain job and file information.  If the caller does not have the privilege required to access a job specified in a QUI$_DISPLAY_JOB or QUI$_DISPLAY_FILE operation, $GETQUI returns a successful condition value. However, it sets the QUI$V_JOB_INACCESSIBLE bit of the QUI$_JOB_STATUS item code and returns information only for the following item codes: 

QUI$_AFTER_TIME

QUI$_COMPLETED_BLOCKS

QUI$_ENTRY_NUMBER

QUI$_INTERVENING_BLOCKS

QUI$_INTERVENING_JOBS

QUI$_JOB_SIZE

QUI$_JOB_STATUS

 

Hein van den Heuvel
Honored Contributor

Re: Odd behavior for GETQUI-related operations

OpenVMS Version ?

Platform?

One example of the actual program line(s) with the SYS$GETQUIW call, preferably with the declarations of the variables used.

Which of the multiple SYS$GETQUIW calls signals the error?

Ideally the smallest possible version of a reproducer, but no smaller.

Hein

 

 

Hein van den Heuvel
Honored Contributor
Solution

Re: Odd behavior for GETQUI-related operations

Well, I'm starting to think this is a hard OpenVMS bug.

I found a BASIC GETQUI example (Digits: in the old CLT::BASIC notes files :-) and it shows the same behaviour (attached).

I tried on OpenVMS V8.3-1H1  ( Itanium of course ).

The workaround is easy. Just do NOT specify a context (or pass 0% BY VALUE).

If you absolutely must have a context (I doubt it), then toss a gratuitous F$GETQUI into (SY)LOGIN.COM

To test, I submit a silly 1 liner x.com with $ wait 1:0:0

Next, SPAWN RUN SYSGETQUIW_EXAMPLE_1.  You'll see that the only accepted context value there is -1, but it still does not 'cancel' the 'super' context. SYSGETQUIW_EXAMPLE_2 just works.

Hein

 

The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

Dan,

 

I was aware of the requirement for READ access, which is why I've got ACLs on the queues.  The programs run in the context of the user process, so if the account doesn't have the required Rights Identifiers, it isn't supposed to work.  I'll try to avoid too much digression, but you should know that the D.o.D. is a bit skittish about broad-brush privileges.  On the other hand, they'll let ACL/Rights IDs occur because they are fine-grained.  So my code these days is ALWAYS designed to run with the program NOT installed with privilege.  Instead, if the user has the right context, it works.  If not, it doesn't, and the user gets privilege violations (i.e. %SYSTEM-F-NOPRIV or the like).

 

However, here is one of the curve balls in the game.  Earlier I mentioned that the user account I'm testing is a clone of a privileged account that has rights, quotas, limits, etc. to match the source account - but only two privileges:  TMPMBX and NETMBX.  (I can justify those because in an SSH environment, those are required to allow my users to log in at all.)  So we have two accounts - the high-privilege and low-privilege accounts that, except for the privilege differences, are identically configured.

 

If I run my program from the privileged account, it works first time, every time, and does not balk at any of the values I request to be returned.  If I run the same exact program from the non-privileged account, I see the behavior as previously described.  But curve-ball #2 is that once this program starts working, it doesn't balk at any of the returned values either.  I am forced by that observation to conclude that my problem is something done by DCL to the process context that a non-privileged program cannot do - but if my program is privileged, whatever it is that I want to do happens "automagically."  So I have to conclude that my code isn't really doing something wrong.  I don't recall seeing documentation about "non-privileged programs need to do X, Y, and Z before they can use the SYS$GETQUIW calls" and that is why my program doesn't now do X, Y, or Z.

 

Security+ Certified; HP OpenVMS CSA (v8)
The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

I'm going to toss in that extra F$GETQUI( "CANCEL_OPERATION" ) since the users will be captive to a script that, after defining some other stuff, runs the code in question.  I can sneak in the lexical in the preamble to running the actual program.  Thanks for your research, Hein.  I was beginning to doubt my own sanity.  (Of course, the guys around me don't doubt that I've used up my sanity quota long ago...)  As workarounds go, an extra lexical is not a high price to pay.
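
In a captive command procedure, that workaround might look something like the sketch below. The image location and the overall shape of the procedure are assumptions for illustration; only the F$GETQUI call itself comes from this thread.

```
$! Hypothetical captive procedure; the image location is a placeholder.
$ SET NOON                               ! do not abort the procedure on error
$! ... site-specific definitions go here ...
$ A = F$GETQUI("CANCEL_OPERATION")       ! gratuitous call before the image runs
$ RUN SYS$LOGIN:PROBE.EXE
$ LOGOUT
```

The lexical's return value is not used; the call is there only for its side effect on the process, as observed in Test 5.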

 

Security+ Certified; HP OpenVMS CSA (v8)
The_Doc_Man
Advisor

Re: Odd behavior for GETQUI-related operations

I tried to post code snippets but it blew up on me.  I'm having some trouble with this interface and because it is on a government system, I cannot choose an alternate browser without first purchasing Archimedes's Lever (the lever from an Archimedes comment that given a big enough lever, you could move Heaven, Hell, and Earth).

 

Environment:  OpenVMS 8.4 on rx2800 i2, fully patched through 2015Q3, the last "big" patching being, I believe, VMS84I_SYS v 6.0 and a LIBOTS patch that was also recent.  Some of the patches of category 3 are not installed because we don't run the particular program.   We have the newest TCPIP (5.7-13ECO5) and the TELNET PAT.  We have recent LMF, LDAP, SSL, FIBRE_SCSI, UPDATE 10.0, etc.

 

Security+ Certified; HP OpenVMS CSA (v8)