
GETJPIW hanging in UWSS

 
Sam Weiner
Occasional Advisor

GETJPIW hanging in UWSS

I've checked the shared image cookbook and every place else I can think of so I figured I'd try here.

In a UWSS, we do a GETJPIW for IMAGECOUNT and LOGINTIM. It never comes back. I would expect it to return SS$_SUSPENDED if the target process was not available to provide the requested information. Am I missing something?

This is Alpha 8.2.

Thanks, Sam
18 REPLIES
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

Is the current mode Kernel or Executive?

If Kernel, repeat the test with Executive.

Does the same source code work in user mode?

Source code exemplar, please?
John Gillings
Honored Contributor

Re: GETJPIW hanging in UWSS

Sam,

If you're in a UWSS, what mode are you executing? What IPL? Are ASTs enabled?

If you're in any mode more privileged than USER, or above IPL 0, you're probably waiting for delivery of a user-mode AST. Since you're in an inner mode, it can't be delivered until you get back to user mode: deadlock.
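
To make the suspected deadlock concrete, here is a toy model in portable C of the general AST delivery rule. It is only a sketch: the function names are invented for illustration, it ignores IPL entirely, and it says nothing about what $GETJPIW actually queues internally. The only fact it relies on is that access modes are numbered kernel=0 through user=3 and that an AST can interrupt code running in its own mode or in a less privileged one.

```c
#include <assert.h>
#include <stdbool.h>

/* Access modes, numbered as on OpenVMS: lower value = more privileged. */
enum access_mode { MODE_KERNEL = 0, MODE_EXEC = 1, MODE_SUPER = 2, MODE_USER = 3 };

/* Toy rule (IPL ignored): an AST is deliverable only while the process is
 * running in the AST's own mode or in a less privileged (outer) mode, and
 * only if AST delivery is enabled for that mode. */
static bool ast_deliverable(enum access_mode current,
                            enum access_mode ast_mode,
                            bool ast_enabled)
{
    return ast_enabled && current >= ast_mode;
}

/* Hypothetical helper: a synchronous wait issued in `current` mode that
 * depends on an AST of mode `needed` can finish only if that AST can be
 * delivered without first leaving the current mode. */
static bool wait_can_complete(enum access_mode current, enum access_mode needed)
{
    assert(needed <= MODE_USER);
    return ast_deliverable(current, needed, true);
}
```

Under this model, `wait_can_complete(MODE_KERNEL, MODE_USER)` is false, which is the deadlock described above, while a kernel-mode AST can still interrupt executive-mode code.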

This should be easy to check. Look at your process from SDA while it's hung and see if there are any user mode ASTs pending:

SDA> READ SYSDEF
SDA> SHOW PROCESS (find your process)
SDA> VALIDATE QUEUE PCB+PCB$L_ASTQFL_U

If that's the issue, you'll need to use the asynchronous form, $GETJPI, and some mechanism appropriate to your mode and IPL to synchronize. The $GETJPI documentation says that if an AST is specified, it executes in the mode of the caller.
A crucible of informative mistakes
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Hi Sam,

Did this used to work and just stop working or is this a new development?

What language are you using?

Is it Kernel or Exec mode?

What are you doing in the AST? (trying to use RMS to output something?)

Cheers Richard Maher
Sam Weiner
Occasional Advisor

Re: GETJPIW hanging in UWSS

To provide the requested context: this is a C application, and the hang occurred during a test system run. It has happened a few times over the past year.

The UWSS is in kernel mode and was called during process rundown via the PLV$PS_KERNEL_RUNDOWN_HANDLER entry. It was cleaning up entries in a global section, so it was checking whether the process that owned each entry was still around. Multiple user IDs are involved, which is why the getjpi needs to be done with privileges.

It really didn't need to know the LOGINTIM in this case but was using a common routine. I'm planning on doing away with that complication.

The getjpiw call specified EFN$C_ENF and an IOSB but no AST.

All AST modes are shown as enabled in the process running the UWSS. The target process, if I'm looking for the right one, doesn't seem to be around anymore so I can't check it.

SDA> VALIDATE QUEUE PCB+PCB$L_ASTQFL_U reports 12 entries. However, this application makes use of ASTs for timers and I/O so that doesn't say much without decoding the entries.

Just to complicate matters, an attempt was made to stop the process so it is now stuck in pending delete mode. Unless someone has a bright idea of how to further investigate this, I'm going to reboot the system tomorrow and concentrate on trying to prevent this problem in the future as laid out above. Additional suggestions are welcome.

Thanks to all who responded. I hadn't really looked at this stuff with the right mindset before.

Thanks, Sam
John Gillings
Honored Contributor
Solution

Re: GETJPIW hanging in UWSS

Sam,

What state is your process in? That should help determine what it's waiting for.

Walking down the AST queue isn't that hard. Use:

SDA> READ/EXEC
SDA> READ SYSDEF
SDA> FORMAT @(PCB+PCB$L_ASTQFL_U)

which will format the entry as an ACB. From there it should be possible to figure out roughly where it comes from. Step along the queue by adding "@" to the front of the address expression:

SDA> FORMAT @(PCB+PCB$L_ASTQFL_U)
SDA> FORMAT @@(PCB+PCB$L_ASTQFL_U)
SDA> FORMAT @@@(PCB+PCB$L_ASTQFL_U)

etc.
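
For anyone more at home in C than in SDA, each extra "@" is one more pointer dereference, that is, one more step along the forward link. A minimal sketch of the same walk, assuming a circular doubly linked queue whose header lives in the owning structure; `struct qent` and both function names are hypothetical stand-ins, not the real ACB layout:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue entry: forward and backward links
 * first, then a payload (here just an AST parameter). */
struct qent {
    struct qent *flink;
    struct qent *blink;
    unsigned long astprm;
};

/* Insert `e` at the tail of the circular queue headed by `head`. */
static void q_insert_tail(struct qent *head, struct qent *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Follow forward links from the header, calling `visit` on each entry
 * (if a callback was supplied), and stop on arriving back at the header.
 * Returns the number of entries seen. */
static size_t q_walk(const struct qent *head,
                     void (*visit)(const struct qent *))
{
    size_t n = 0;

    assert(head != NULL);
    for (const struct qent *e = head->flink; e != head; e = e->flink) {
        if (visit != NULL)
            visit(e);
        n++;
    }
    return n;
}
```

The stop condition matters: because the queue is circular, following the forward link from the last entry lands back on the header, and formatting that as an ACB will produce nonsense.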

If the output doesn't make sense, and you're on Alpha or Itanium, try:

FORMAT/TYPE=ACB64

Of most interest are the _AST and _ASTPRM fields, as they should point back to the AST routine and parameter. Hopefully that's enough of a clue to work out what they are. Look for P0 addresses and compare with the linker map of your image.

As for unblocking your process... I think that's a reboot! Welcome to the world of inner mode programming, and yes, you need to think MUCH harder about everything you do, especially stuff which can potentially block.
A crucible of informative mistakes
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

My two-points and two-cents response: this rundown code (and code which unfortunately remains unknown) appears to be more complex than it probably should be.

When operating in an inner mode, do the minimum work in the outer-most mode available and feasible, and then exit to user-mode code. Log the process-level details into the section during the image startup, and have the rundown log the minimum and exit. Any required cleanup can then be performed asynchronously by a clean-up process, within another and "appointed" application presently active, or at the next application startup.

Depending on how you're storing data, this clean-up approach also usually catches (most) cases where the system fails or the power fails; cases without the rundown having occurred.
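
A minimal sketch of that split, in portable C. Everything here is an assumption made for illustration: the slot table, the state names and the three functions are invented, an ordinary array stands in for the global section, and the interlocked updates a real shared section would need are left out. The point is only that the rundown path flips one flag and returns, while the real work happens later in ordinary user-mode code.

```c
#include <assert.h>
#include <stddef.h>

enum slot_state { SLOT_FREE = 0, SLOT_ACTIVE, SLOT_NEEDS_CLEANUP };

/* Hypothetical per-process entry in the shared section. */
struct slot {
    enum slot_state state;
    unsigned int pid;          /* owner, recorded at image startup */
    unsigned int work_pending; /* stand-in for work in progress */
};

#define NSLOTS 8

/* Image startup: claim a free slot and record the owner.
 * Returns the slot index, or -1 if the table is full. */
static int slot_register(struct slot *tab, unsigned int pid)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (tab[i].state == SLOT_FREE) {
            tab[i].state = SLOT_ACTIVE;
            tab[i].pid = pid;
            tab[i].work_pending = 0;
            return i;
        }
    }
    return -1;
}

/* Rundown: the only inner-mode work is flipping one flag.
 * No waits, no I/O, no calls that can block. */
static void slot_rundown(struct slot *tab, int idx)
{
    assert(idx >= 0 && idx < NSLOTS);
    if (tab[idx].state == SLOT_ACTIVE)
        tab[idx].state = SLOT_NEEDS_CLEANUP;
}

/* Later, from user-mode code in some other process: finish the cleanup.
 * Returns the number of slots reclaimed. */
static int slot_sweep(struct slot *tab)
{
    int reclaimed = 0;

    for (int i = 0; i < NSLOTS; i++) {
        if (tab[i].state == SLOT_NEEDS_CLEANUP) {
            tab[i].work_pending = 0;
            tab[i].pid = 0;
            tab[i].state = SLOT_FREE;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

A sweeper that also treats stale `SLOT_ACTIVE` entries as suspect would cover the crash and power-fail cases, where no rundown ever ran; how to decide that an entry is stale is exactly the question the original $GETJPIW call was trying to answer, so it is left out here.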

If the current rundown-handler design is to be maintained, I'd look to use executive-mode rundown here, too.

http://labs.hoffmanlabs.com/node/1349

It's also feasible to perform user-mode cleanup of an exiting process using the lock manager, and this requires zero inner-mode code.

http://labs.hoffmanlabs.com/node/492

And I'd ensure the code is not linking against the user-mode RTLs; it's easy to get those included in the UWSS image, and user-mode RTLs and such aren't reliable in inner-mode code.

Here are some other discussions of UWSS design considerations:

http://groups.google.com/group/comp.os.vms/search?group=comp.os.vms&q=UWSS&qt_g=Search+this+group
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Hi Sam,

> The UWSS is in kernel mode and was called
> during process rundown via the
> PLV$PS_KERNEL_RUNDOWN_HANDLER entry.

I'm just trying to get this right in my head. You've got a KERNEL mode UWSS with a rundown handler that calls into another "common" UWSS (or RTL) to do the GETJPI? Or just the rundown handler calls GETJPI?

Also just making sure it's getjpiW and not getjpi?

I take it you've linked /NOPROTECT in this case? (Sorry if it's a stupid question but are you sure there are no (if !sts printf(something)s in there?)

Cheers Richard Maher
Sam Weiner
Occasional Advisor

Re: GETJPIW hanging in UWSS

> I'm just trying to get this right in my
> head. You've got a KERNEL mode UWSS with a
> rundown handler that calls into another
> "common" UWSS (or RTL) to do the GETJPI? Or
> just the rundown handler calls GETJPI?

[smw] The rundown handler calls into a routine in the UWSS (in the kernel routine list of the protected shared image) and that routine calls a routine which does the getjpiw. Just one UWSS.

> Also just making sure it's getjpiW and not
> getjpi?

[smw] Yep:

status = sys$getjpiw(EFN$C_ENF, &pid, NULL, &item_list, iosb, NULL, 0);

[smw] where item list requests JPI$_LOGINTIM and JPI$_IMAGECOUNT.

> I take it you've linked /NOPROTECT in this
> case? (Sorry if it's a stupid question but
> are you sure there are no (if !sts
> printf(something)s in there?)

[smw] The option file has "protect=yes". We are very careful to not do I/O in these routines and probe any addresses before use.

[smw] Note that the basic structure of all this is a couple of decades old. The problem shows up in a test precisely to check the cleanup routine when the application process dies suddenly. So this is a stress test for this area.

[smw] To Hoff, the cleanup routine runs through various structures in the global section doing sanity checks and undoing work in progress (this is a database system with transactions.) The real work is done by the next process when it finds a flag set which triggers the completion of the cleanup work which may require I/O.

[smw] Interesting articles on your site. Thanks.

[smw] A major (or even moderate) rework is probably not in the cards given the position of the VMS port of this product. At least not unless a customer actually hits this problem.

[smw] John, the FORMAT of the AST queue mostly shows some blocking ASTs we set on files to get notifications of an event which requires all the cooperating processes to take action. I don't spend enough time with SDA so this was a new one for me.

[smw] I think given the low frequency with which this occurs along with the situation in which it occurs, I'm going to use what I've learned so far to remove the LOGINTIM request from the getjpiw. I've also noticed some other things as I've been going over other routines that I haven't looked at in years. If it continues to run into this hang problem, I have more information and tools to use.

Thanks, Sam
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Hi Sam,

[smw] The rundown handler calls into a
routine in the UWSS (in the kernel routine
list of the protected shared image) and that
routine calls a routine which does the
^^^^^^^^^^
getjpiw. Just one UWSS.

And this routine (that does the getjpiw) is also in the "one UWSS"?

> I take it you've linked /NOPROTECT in this
> case? (Sorry if it's a stupid question but
> are you sure there are no (if !sts
> printf(something)s in there?)

[smw] The option file has "protect=yes". We are very careful to not do I/O in these routines and probe any addresses before use.

Well to answer my question in the interests of imparting useful information I guess the answer would be "Yes Richard, you are correct that we are not linking with /PROTECT"?

Otherwise why would you be cherry-picking which clusters to protect?

Here are a couple of suggestions:

1) Seeing as it's "Just one UWSS", LINK/PROTECT it and then come back and tell me what it really said
2) Run an analyze/image of your UWSS and see what it's linking against

"We are very careful to not do I/O" really? Perhaps the odd sys$putmsg somewhere in the bowels of a common routine that was never meant to be called from inner-mode but already had all the $getjpiw stuff you wanted so what the hell?

Anyway, that's me had enough of extracting teeth. See ya.

Regards Richard Maher

Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

Get out of kernel mode. Either entirely out of inner-mode code, or use an exec-mode exit handler if you can. I prefer to avoid maximizing the mode for these cases; I want to design to minimize use of inner-mode code. Or removing the inner-mode code in its entirety.

Executive and particularly kernel have limits in what you can call and what you can do. And a related factor: the LINK command that was used. Many user-mode calls don't work in kernel, so it's usually best to exclude those via the LINK.

Get out of the exit handler as quick as you can, too, and simplify the exit handler as much as you can. I've already posted my suggestions for alternatives.

If you want us to look at the code, then you're going to need to post (more of) it. From what code has been posted, I can't tell if that sys$getjpiw call is correctly constructed or if it is a latent system-crasher.

I just got done doing an exec-mode UWSS for a customer. They're (usually) not particularly difficult, but the rules and the limitations and the requirements are definitely not well documented.
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

FWIW, a SYS ECO just popped out for OpenVMS I64 V8.3-1H1 with various fixes for problems in sys$getjpi. Some of these same issues may (or may not?) affect OpenVMS Alpha V8.2, and (if so) HP may or may not generate an ECO kit.
Sam Weiner
Occasional Advisor

Re: GETJPIW hanging in UWSS

Richard,

> [smw] The option file has "protect=yes". We are very careful to not do I/O in these routines and probe any addresses before use.

> Well to answer my question in the interests of imparting useful information I guess the answer would be "Yes Richard, you are correct that we are not linking with /PROTECT"?

[smw2] I now understand the question better. There is only one protect= in the option file and it covers all modules in the image so I guess we could have used /protect instead.

> 2) Run an analyze/image of your UWSS and see what it's linking against

[smw2] The only image it is linking against is:

Shareable Image List

0) "SYS$PUBLIC_VECTORS"

Hoff,

Doing things in exec mode sounds interesting but as I understand it the rundown handler at least starts in kernel mode. However, I don't see us moving that way at this time. Maybe another time. I am going to create an item in our tracking system to look at this area more. When it will bubble up the priority list is another matter.

I'm attaching (if I did it right) the routine which does the getjpiw, the link /opt file, and the PLV. I probably should have done so earlier. The whole system is on sourceforge and even one of the VMS Freeware volumes if anyone really cares.

Again, thanks to all who responded.

Sam
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

There's a kernel-mode rundown pointer in the PLV, and there's an exec-mode rundown pointer.

http://labs.hoffmanlabs.com/node/1349
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

And here is a pointer to what looks to be the package in question:

http://digiater.nl/openvms/freeware/v80/gtm/
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Hi Sam,

> [smw2] I now understand the question better.
> There is only one protect= in the option file
> and it covers all modules in the image so I
> guess we could have used /protect instead.

/PROTECT and protect=yes are not synonyms and not functionally identical.

> Shareable Image List
>
> 0) "SYS$PUBLIC_VECTORS"

Ok, let's stick with the same symptom but a different cause. I'm guessing (because you haven't left much choice) that there's lots of lock manager activity going on and that such locks are not likely to be held at Kernel mode? Either way, could you be experiencing a race condition where some/most times the locks have been cleaned-up (or never instantiated) before your Kernel rundown handler gets called, yet in some corner-cases you find yourself waiting in kernel mode for an exec mode AST to be delivered to your process?

Anyway, let me go crazily way out on a limb here and actually provide a small code example. The attached code can also be found at Mar 27, 2006 in: -
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=940030

and

Feb 12, 2009 in: -
http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1312923

but as they could be out of date, I'll attach the code again.

The example has relevance in that it is an Exec mode UWSS that does a CMKRNL when and only when it needs to. May not solve your problem, but if you stay in Exec mode most of the time then you'll certainly be rebooting your machine less :-)

So I suppose my last question (before I get to accuse the Rev. Hoff, in the Library, with the Pseudo-Device Driver :-) is: "Is there anything, under *any circumstances*, that could cause your Kernel rundown handler to wait for something that will only be set from an outer mode?"

Regards Richard Maher
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Sam,

I've attached some documentation for the previous UWSS code here. Also, search my UWSS code for the word "churlish" for an example of waiting at a lower mode. (Maybe there is a bug in VMS that's affecting you, but I doubt it. I'd also like to see a good example of the WAIT_AT_CALLERS_MODE flag, but I doubt I'll see that either.)

Hoff,

> I just got done doing an exec-mode UWSS for a customer.
> They're (usually) not particularly difficult,
> but the rules and the limitations and the requirements
> are definitely not well documented.

I just about fell off my chair at this volte-face to end all back-flips! But why no Pseudo-Device Driver? Hmmm?

Anyway, I hope Damascus was nice, and it's good to have you back in the fold! Care to share your example with us? What did it do? I'm sure Gilly is champing at the bit to critique your argument-stack probing and validation!

Cheers Richard Maher
Hoff
Honored Contributor

Re: GETJPIW hanging in UWSS

The UWSS-level probe operations (__PAL_PROBER, __PAL_PROBEW, et al) I used with the UWSS here were targeted toward maintaining the stability and integrity of the operating system; to avoid having an errant argument causing the exec-mode code to romp on something critical.

Mounting a security attack against this particular UWSS seems, well, more effort than just calling an obviously-named entry point to gain privileged access. The central purpose of this particular UWSS was to breach OpenVMS security, after all.

But it was fun to apply the memories of these areas of OpenVMS; of how to crack and how to harden a UWSS or a pseudo-device driver.
Richard J Maher
Trusted Contributor

Re: GETJPIW hanging in UWSS

Hi Steve,

> Mounting a security attack against [this]
> particular UWSS seems, well, more effort
> than just calling an obviously-named entry
> point to gain privileged access. The
> central purpose of [this] particular UWSS
> was to breach OpenVMS security, after all.


Please define "this". Which/what UWSS are you talking about?

Cheers Richard Maher