Operating System - OpenVMS

Can three looping processes, zero priority, affect performance significantly?

 
Clark Powell
Frequent Advisor

We have a 4-CPU ES45 running Cache and the GE Flowcast web framework. This app is a hybrid web app that uses a browser user interface and telnet to gather data from Cache. Occasionally, it will terminate the telnet process incorrectly and create a looping LOGINOUT.EXE process at zero priority with the "no delete" bit set. We have to reboot to get rid of them but, since OpenVMS recognizes the futility of these processes, it sets the priority to zero, they don't do any I/O, and they don't interfere much. At least that's what I think, but now we are having a debate on this topic and I would like to poll you on this issue.

We have 500 interactive users and three of these looping, zero-priority processes. My position is that these processes are not going to cause a problem until there are 10 or 20 of them. At that point the bookkeeping or balancing done at a much higher priority would begin to have an effect. The opposing view says that there are 4 processes in the compute queue, and that's too high and higher than normal. I say there are three looping processes that will always be waiting in the compute queue, so that number is not so high. Of course, nobody has asked the users if they are experiencing a slowdown...

What do you all think: who's correct? Or, let's say, more correct?

thanks
Clark Powell
Hoff
Honored Contributor

Processes looping at zero are inconsequential, unless you're right at your process limit, or unless the applications are accessing or holding some resource that causes interference with other system activity, or if you're running other processes at zero.

Something similar to the bug you're describing here was a fairly common telnet bug back around TCP/IP Services V5.0, but that was fixed in an ECO and in more recent releases.

Clearing the PCB$M_NODELET bit and fully deleting these processes can be performed with some simple kernel-mode code, if that's permissible to load and run here.
Jon Pinkley
Honored Contributor

Clark,

Software priority zero still has higher precedence than the idle loop. One thing the idle loop does is zero pages for the zeroed page list, which is used to satisfy demand-zero page faults. If the idle loop never gets time, then demand-zero page faults will take longer, since the pages must be zeroed before they are given to the process requesting them.

I don't know how much you would notice that.

What will notice them are other processes running at priority zero.

What is setting PCB$V_NODELETE? I thought this was normally only set for special processes like the swapper, NETACP and NET$ACP (OpenVMS Alpha Internals, Scheduling and Process Control V7.0, bottom of page 419).

RE: "since OpenVMS recognizes the futility of these processes, it sets the priority to zero". Where did this information come from?

Priority zero isn't a "normal" priority that I would expect VMS to set. Are you sure you don't have something that is doing this lowering of priority?

Jon
it depends
Robert Gezelter
Honored Contributor

Clark,

I concur with Hoff wholeheartedly.

The wonders of statistics. Three of the four items in the compute queue are the "almost dead" priority zero processes.

"Items in the COM state" as a metric is a crude bludgeon. It does not take into account processes that are CPU "black holes" (e.g., SETI, or a background job whose purpose is to soak up otherwise wasted CPU cycles doing some other socially useful task). Such jobs often use virtually (pardon the pun) no resources.

If one wants a metric that is more useful, one can gather the data oneself and calculate the effective Computable queue (excluding the CPU soak tasks and zombies).
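To make that concrete, here is a minimal Python sketch of an "effective" computable-queue count that skips priority-0 zombies and any jobs you explicitly name as CPU soakers. The `ProcSample` record, its fields and the sample process names are hypothetical; in practice the state and priority would come from MONITOR, T4 or $GETJPI data you collect yourself.

```python
from dataclasses import dataclass

@dataclass
class ProcSample:
    # Hypothetical record built from data you collect yourself
    name: str
    state: str      # scheduling state, e.g. "COM", "CUR", "LEF", "HIB"
    priority: int   # current software priority

def effective_com_queue(samples, excluded_names=frozenset(), min_priority=1):
    """Count processes in the COM state, skipping priority-0 zombies
    and any explicitly named CPU soak jobs."""
    return sum(
        1
        for p in samples
        if p.state == "COM"
        and p.priority >= min_priority
        and p.name not in excluded_names
    )

# Made-up snapshot resembling the situation described in this thread
snapshot = [
    ProcSample("LOOPER_1", "COM", 0),
    ProcSample("LOOPER_2", "COM", 0),
    ProcSample("LOOPER_3", "COM", 0),
    ProcSample("USER_A", "COM", 4),
    ProcSample("USER_B", "LEF", 4),
]
raw = sum(1 for p in snapshot if p.state == "COM")
print(raw, effective_com_queue(snapshot))  # prints: 4 1
```

The raw count of 4 is what triggered the debate; once the three zero-priority loopers are excluded, only one process is actually competing for a CPU.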

When using metrics to measure performance, it is always important to remember the limitations of the data.

- Bob Gezelter, http://www.rlgsc.com
Clark Powell
Frequent Advisor

We are running tcpip V5.6 - ECO 3 and we had not experienced any problems terminating telnet in any manner for a number of years until we installed the GE web framework. The product is, as I said, a hybrid of web client and telnet so it tends to do things differently resulting in this problem.

What program will unset the no-delete bit, and what is the risk of using it? We have to be 100% confident in the program's safety before we can use it at any time of day. We are a hospital and cannot tolerate any interruption in our patient care computing. Since this is a clustered system, we can reboot without interrupting service, and we don't mind the extra work of rebooting if it means a near-zero chance of a service interruption.

thanks
Clark Powell
Clark Powell
Frequent Advisor

Regarding the Zero Priority

In the past, when a similar incident happened, we would set the priority to zero. Then during this latest loop problem I happened on the processes and found that they were already at zero priority. I assumed that my colleague had done this, but later when I mentioned it he said no. Then, discussing this problem with Colorado Springs (or whatever it is now), they said that there was a feature in OpenVMS that would identify these processes and lower the priority automatically. I don't know if this is true, but it makes sense.

Regarding the Impact
I haven't pulled out the T4 data yet, but I have an impression that as time goes by, with a fixed number of looping processes, the proportion of kernel mode processing might go up in these situations. Or it could be that each additional process creates a disproportionate increase in kernel mode processing. Kernel mode runs 37 to 76 out of a max of 400.
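If the "max of 400" means the kernel-mode figure is a percentage summed across all four CPUs (an assumption; check how your T4 collector reports it), the numbers convert to a share of total capacity as in this small Python sketch. The function name is made up for illustration.

```python
def share_of_capacity(mode_percent, cpus=4):
    """Convert a per-CPU percentage summed over all CPUs (max cpus * 100)
    into a percentage of total machine capacity."""
    capacity = cpus * 100
    if not 0 <= mode_percent <= capacity:
        raise ValueError(f"expected 0..{capacity}, got {mode_percent}")
    return 100.0 * mode_percent / capacity

low, high = share_of_capacity(37), share_of_capacity(76)
print(f"kernel mode: {low:.2f}% to {high:.2f}% of total CPU")
# prints: kernel mode: 9.25% to 19.00% of total CPU
```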
Hoff
Honored Contributor

1: This is a kernel bug in either the GE software, or in TCP/IP Services.

Accordingly, call your support organization: probably whoever is supporting the GE software here, though this might be transferred along to the TCP/IP Services folks if the core bug is in that and not in the GE software.

2: Program? I usually create a slightly customized version of something akin to this tool:

http://labs.hoffmanlabs.com/node/767

You'd need a little more code here than what is in that example, as you'd have to scan the processes and then jump to kernel mode to clear the bit. Scan for the processes via $process_scan or $getjpi or such, jump to kernel, set up a kernel-mode signal handler, convert the PID to the PCB, clear the NODELET bit, and off you go. Typically a few hours to design, code, test and document this tool.
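The following Python sketch models only the selection and bit-masking logic of such a tool, not the kernel-mode part. `Pcb`, `NODELET_MASK`, the sample PIDs and image names are hypothetical stand-ins: a real tool would locate the PCB from the PID in kernel mode (entered via $CMKRNL), use the real PCB$M_NODELET definition, and needs the error handling described above.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder: the real PCB$M_NODELET value comes from the
# OpenVMS PCB definitions; this constant only illustrates the masking step.
NODELET_MASK = 0x1

@dataclass
class Pcb:
    """Hypothetical stand-in for the process control block fields we need."""
    pid: int
    image: str
    priority: int
    sts: int  # status flags word holding the NODELET bit

def find_stuck(pcbs, image="LOGINOUT.EXE"):
    """Scan step: processes running the given image at priority 0 with NODELET set."""
    return [p for p in pcbs
            if p.image == image and p.priority == 0 and p.sts & NODELET_MASK]

def clear_nodelet(pcb):
    """Kernel-mode step in a real tool: clear only the NODELET bit, keep the rest."""
    pcb.sts &= ~NODELET_MASK
    return pcb

procs = [
    Pcb(pid=0x2A, image="LOGINOUT.EXE", priority=0, sts=0x9),  # stuck looper
    Pcb(pid=0x2B, image="LOGINOUT.EXE", priority=4, sts=0x1),  # live session, skip
    Pcb(pid=0x2C, image="OTHER.EXE", priority=0, sts=0x1),     # different image, skip
]
for p in find_stuck(procs):
    clear_nodelet(p)
    print(hex(p.pid), hex(p.sts))  # prints: 0x2a 0x8
```

Masking with `&= ~NODELET_MASK` touches only the one bit, which matters when the same word carries other process status flags.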

Might well be feasible to script this via delta or xdelta, but there's less error-handling available when using that approach.
Hoff
Honored Contributor

FWIW...

Kernel looping at priority zero is still typically inconsequential.

Yes, it'll look nasty in the process mode displays.

Yes, it'll look nasty in any COM queue displays.

Yes, it'll look nasty in the CPU time displays.

Yes, it'll consume a little memory.

In reality, you've usurped what used to be known as the NULL process. OpenVMS itself loops like this while it's waiting for work, too; OpenVMS has an idle loop.

These looping processes will remain inconsequential until and unless you've other tasks at zero or you fill up available process slots or if there is resource contention or there are too many BG devices or other such; the other sorts of contention discussed earlier.

Oh, and if you're running OpenVMS under emulation (and which you are not doing here), these loops can confuse idle loop detection during hardware emulation.

Go talk to your support folks for the GE software, and get the process leak fixed.

And strictly for giggles, toss a SET PROCESS /SUSPEND at these processes, and see if that locks them down.

Or pay for somebody to write a NODELET tool for this for you, and skip the reboots.

Stephen Hoffman
HoffmanLabs LLC
EdgarZamora
Trusted Contributor

Are you current with your patches? Look into VMS83A_DCL-V0300; it might fix your problem with these processes not getting deleted.
John McL
Trusted Contributor

Less a reply than a question to respondents here... will these priority 0 jobs run to completion of their quantum or will they be interrupted by processes at higher priority?

And Clark, you might like to check for any resources that they are holding. If some resources are held there may be an impact on the system, but that's only a "may". If they hold 1 of 10,000 items it would be no big deal, but if they hold 1 of 10 (or fewer), getting rid of those processes becomes more urgent.
Jon Pinkley
Honored Contributor

RE:"will these priority 0 jobs run to completion of their quantum or will they be interrupted by processes at higher priority?"

They will be preempted by any higher-priority kernel thread (process) whose priority is at least PRIORITY_OFFSET+1 above theirs (zero). The default value for the SYSGEN parameter PRIORITY_OFFSET is 0, so in that case a priority 1 process will preempt the priority 0 process.

Reference: page 31 of "OpenVMS Alpha Internals: Scheduling and Process Control"

http://books.google.com/books?id=ydKIsgCiFVsC&pg=PA29&dq=priority_offset#v=onepage&q=priority_offset&f=false
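The preemption rule described in this post fits in a one-line predicate. This is a Python sketch of the rule as stated here, not the actual scheduler code; the function and parameter names are made up for illustration.

```python
def preempts(new_priority, running_priority, priority_offset=0):
    """Return True if a newly computable thread preempts the running one.

    Encodes the rule as described here: the newcomer needs a priority at
    least PRIORITY_OFFSET + 1 above the priority of the running thread.
    """
    return new_priority >= running_priority + priority_offset + 1

print(preempts(1, 0))                     # True: default PRIORITY_OFFSET of 0
print(preempts(1, 0, priority_offset=1))  # False: needs priority 2 or more
```

With the default offset, any interactive user at a normal priority immediately displaces one of the priority-0 loopers.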
EdgarZamora
Trusted Contributor

To elaborate on my earlier response: I had a similar problem with telnet processes not logging off properly and becoming detached processes that could not be deleted through normal means. VMS83A_DCL_V0300 fixed this problem. Below is an excerpt from the patch notes. It may not describe EXACTLY what you're experiencing (it didn't describe mine), but it sure did fix my problem. The patch fixes something with that NODELETE bit being set.

5.2.1.1 Problem Description:

If a parent process with its nodelete bit set spawned a
subprocess, if the subprocess was still active when
terminal attached to it was closed, the parent process
would go into a loop with 100% CPU utilization.

Images Affected:

- [SYSEXE]DCL.EXE


5.2.2.1 Problem Description:

After applying the VMS83A_RMS-V0600 or VMS83A_RMS-V0700
patch kits, a subprocess would go into a compute-bound
loop when its Telnet window on a remote PC was closed.

Images Affected:

- [SYSEXE]DCL.EXE


Clark Powell
Frequent Advisor

I was hopeful that the patch would solve the problem but when I looked at the patch history:
DEC AXPVMS VMS83A_DCL V3.0 Patch Install Val 09-SEP-2008
so no luck there.

I have sent a crash dump to OpenVMS Engineering and I will report any developments.
John Gillings
Honored Contributor

Clark,

One step down from priority 0

If you're worried about the impact of these processes on your CPUs, or the idle loop page zeroing work, why not suspend them?

$ SET PROCESS/IDENTIFICATION=pid/SUSPEND   ! pid = PID of a looping process

Provided they're not holding any locks, all they will do is consume a process slot.
A crucible of informative mistakes
Volker Halle
Honored Contributor

Clark,


Then discussing this problem with Colorado Springs (or whatever it is now,) they said that there was a feature in OpenVMS that would identify these processes and lower the priority automatically.


AFAIK, OpenVMS does not do something like this.

There is one code path in SYS$EXIT where OpenVMS EXPLICITLY lowers its own process priority to 0. This is after the final $DELPRC_S (delete self) call in process exit handling, so that, if the $DELPRC fails, the process loops at priority 0 with the $DELPRC return status kept in R0!

If this is what you're seeing, look at the PC values of such a looping process. This loop in SYS$EXIT is a 'BRB .', so the PC value would remain constant. Then go into SDA, set context to that process and issue a SHOW PROC/REGISTER. What's the value stored in R0 ?
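That first check can be scripted once you have a few PC samples for the suspect process (for example from repeated SHOW PROCESS or SDA output). A minimal Python sketch, assuming you have already parsed the PCs into integers; the function name and the sample PC values are made up:

```python
def looks_like_exit_loop(pc_samples, min_samples=3):
    """Heuristic: a 'BRB .' loop never advances, so every sampled PC is identical.

    A constant PC is only a hint; confirm by checking R0 in SDA.
    """
    if len(pc_samples) < min_samples:
        return False  # too few samples to say anything
    return len(set(pc_samples)) == 1

# Made-up PC samples taken a few seconds apart
print(looks_like_exit_loop([0x80C0A261] * 5))                      # True
print(looks_like_exit_loop([0x80C0A261, 0x80C0A265, 0x80C0A261]))  # False
```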

Maybe this application was setting the NODELETE bit and forgot to clear it on some unexpected error path...

Volker.
Jon Pinkley
Honored Contributor

Clark,

Volker's response is an example of why he gets consistently high points awarded.

My bet is with Volker's conjecture. I am expecting the crash dump to reveal that R0 has value SS$_NODELETE, and that the process is looping in SYS$EXIT.

This is one of the code paths that was never expected to execute, because the process is requesting that it be killed, and does not expect to survive long enough to execute the instructions following the $DELPRC. By having the process loop at priority zero, the context is saved so it can be analyzed to determine why the path was taken.

My opinion is that $DELPRC with pid == self should not honor the no-delete bit, but that isn't the way it is currently coded. In other words, change the meaning of the no-delete bit to mean "don't allow another process to kill me, but let me kill myself". The way it is now, the process can't be killed, but it can't do anything useful either.

By the way, the $DELPRC documentation does not mention SS$_NODELETE as a possible return status. It also says this about calling $DELPRC to delete the calling process:

"The Delete Process service allows a process to delete itself or another process. If you specify neither the pidadr nor the prcnam argument, $DELPRC deletes the calling process; control is not returned."

In my opinion, the current behavior is not correct, and should be modified to ignore the PCB$M_NODELET bit when the calling process is the target for deletion.
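A toy Python model of the NODELET check in $DELPRC helps contrast the behavior discussed in this thread with the proposed change. It deliberately ignores privileges, process names and the asynchronous parts of deletion; `delprc` and `DelprcResult` are hypothetical names, not OpenVMS APIs.

```python
from enum import Enum

class DelprcResult(Enum):
    DELETED = "process deleted"
    NODELETE = "SS$_NODELETE returned"

def delprc(caller_pid, target_pid, target_nodelet, proposed=False):
    """Toy model of the NODELET check in $DELPRC.

    proposed=False: current behavior, NODELET always blocks deletion.
    proposed=True:  the proposed change, NODELET is ignored for self-deletion.
    """
    deleting_self = caller_pid == target_pid
    if target_nodelet and not (proposed and deleting_self):
        return DelprcResult.NODELETE
    return DelprcResult.DELETED

# A process with NODELET set trying to delete itself during process exit
print(delprc(0x2A, 0x2A, True).value)                 # SS$_NODELETE returned
print(delprc(0x2A, 0x2A, True, proposed=True).value)  # process deleted
print(delprc(0x2B, 0x2A, True, proposed=True).value)  # SS$_NODELETE returned
```

Under the proposed rule another process still cannot delete a NODELET process; only the self-deletion case changes.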

Can anyone think of a problem that the change would cause?

Jon
John Gillings
Honored Contributor

re: Jon:

>In my opinion, the current behavior is not
>correct, and should be modified to ignore
>the PCB$M_NODELET bit when the calling
>process is the target for deletion.
>
>Can anyone think of a problem that the
>change would cause?

Well, not really, but then, that's a problem! Setting NODELET isn't something you do without a REALLY good reason. Summarily overriding it when you're in a code path that you didn't expect to be in anyway is definitely NOT the sort of risk that OpenVMS philosophy allows.

You might argue that SUSPend would be more appropriate (less resources consumed, and more obvious in a SHOW SYSTEM display), but then if I remember the code that Volker's talking about correctly, it's possible the $DELPRC could be asynchronous, so SUSP might block a process deletion that would otherwise complete.

Whatever is done in this "impossible" code path is really just papering over some other problem, so silently deleting the process is just hiding it even more. The correct approach is to identify how these processes are getting into an unexpected state and fixing it at the root cause.
Jon Pinkley
Honored Contributor

Even if $DELPRC did not honor the no-delete bit when pid == self, that alone would not solve all problems caused by improper use of PCB$V_NODELET.

For example, consider setting the no-delete bit in a subprocess and then killing the parent. In that case, I think the parent will enter an RSN$_ASTWAIT MWAIT state and deadlock.

So, as John Gillings said, the real problem needs to be found and eliminated. Since setting pcb$v_nodelet requires privilege, making incorrect use apparent will tend to get the real problem fixed sooner than if it is swept under the rug.

This reminds me of the Twilight Zone episode "Escape Clause"

http://en.wikipedia.org/wiki/Escape_Clause

except that the process doesn't have an escape clause once it is in the BRB loop.

Jon
Clark Powell
Frequent Advisor

Here is the looping process and it does exactly what was previously said.

        $DELPRC_S                   ; DELETE SELF
        PUSHL   R0                  ; SAVE ANY ERROR RETURNED
        $SETPRI_S PRI=#0            ; MAKE NEXT LOOP HARMLESS
        POPL    R0                  ; RESTORE THE ERROR FROM DELPRC_S
20$:    BRB     20$                 ; ****** FELL THROUGH DELPRC SOMEHOW

The application, Cache, is setting the NODELET bit, and the normal exit would clear it. But in this case the telnet session is initiated from a web browser, and when the web browser is stopped by a power outage or by clicking on the "X", the NODELET bit is not cleared. We don't have this problem with normally initiated interactive telnet sessions, but I don't know if that's because their users have been properly trained to use "LOGOUT" or if there is built-in protection. But that question is beyond the scope of this discussion. I think that you all have done a super job of analyzing this problem.

thanks
Clark Powell
Clark Powell
Frequent Advisor

I will talk to GE about their Web Access Flowcast product running on Cache and see if they can come up with a better way than setting the NODELET bit. Thanks for the help.
Clark Powell
Cass Witkowski
Trusted Contributor

Cache sets the no-delete bit to prevent processes in Cache from being deleted with STOP/ID, which can cause nasty problems in Cache.
Tom O'Toole
Respected Contributor

Would I be correct in saying by the time a process is in this cul-de-sac, it's too late to clear the NODELET bit, since it will not call delprc again? I have just seen two of these badly terminated cache processes here (basically the same environment as you Clark).

It seems like it would be possible to clear the bit and then send a special kernel-mode AST to make the process call $DELPRC again (or do both in the kernel AST)?

This is a bug in Cache, I guess, which is letting these processes exit the Cache database environment without clearing the bit.
Can you imagine if we used PCs to manage our enterprise systems? ... oops.