Operating System - OpenVMS

Re: submit causing infinite loop

 
Jeff Shulmister
Occasional Advisor

submit causing infinite loop

Hi:
I tried coding some DCL to resubmit itself for three days in the future at 2:00 in the morning... the job wound up looping, submitting thousands of jobs to the queue... What's the correct way to do this? I had...

submit.../after="tomorrow+2-2:00"

Thanks,

Jeff S
Steven Schweda
Honored Contributor

Re: submit causing infinite loop

> submit.../after="tomorrow+2-2:00"

Are you sure that that's what you (or your
procedure) did? Around here, that seems to
work as expected:

alp $ sub harmless.com /after = "tomorrow+2-02:00"
Job harmless (queue SYS$BATCH_ALP, entry 17) holding until 9-APR-2011 02:00
John McL
Trusted Contributor

Re: submit causing infinite loop

I just checked your time string via the lexical f$cvtime() and it returned the time I would expect.

Did you test this from within the self-submitting batch job? Maybe the cause of the looping can be found there.
John Gillings
Honored Contributor

Re: submit causing infinite loop

Jeff,

"TOMORROW+2-2:00" seems reasonable to me.

What's the rest of the SUBMIT command?

Do some testing with a minimal job:

RESUBMIT_TEST.COM
$ SET VERIFY
$ self=F$PARSE(";",F$ENVIRONMENT("PROCEDURE"))
$ WAIT 00:01:00
$ SUBMIT 'self' /whatever you like
$ SHOW ENTRY/FULL '$ENTRY'
$ EXIT

The WAIT should give you time to break a submit loop. Wait for the job to finish, then see if the next one is scheduled as you expect. If it starts immediately, just delete the entry. You can also break a loop with:

$ CREATE RESUBMIT_TEST.COM
$ EXIT
^Z

(because "self" is defined as the latest version of the file, which is always a good idea for a self re-submitting job)
A crucible of informative mistakes
Hoff
Honored Contributor

Re: submit causing infinite loop

Are you operating your batch queues in a cluster, and are the system times of the cluster member nodes (badly) skewed?
The Brit
Honored Contributor

Re: submit causing infinite loop

Hi Jeff,
In addition to the responses above, can you add a "set verify" at the start of the job, and then post the section of the batch job log file covering the submit? This will give us the complete submit command, and show what response you got from the system when the batch job re-submits itself.

Dave.
Hein van den Heuvel
Honored Contributor

Re: submit causing infinite loop



Here is a 'simple' error that causes the /AFTER not to be a qualifier, and thus the job to re-execute immediately.


$ subm tmp.com/para="test /after=tomorrow
:
$ SHOW ENTRY/FULL ... /PARAM=("test /after=tomorrow")

This still fits the definition of your command. So yeah... show us the EXACT command please!
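
For comparison, a minimal sketch of the same command with the quote closed, so that /AFTER is parsed as a qualifier rather than swallowed into the parameter. The file name tmp.com and the parameter value are just the placeholders from the example above:

```dcl
$ ! Closing quote present, so /AFTER is a qualifier, not part of P1
$ submit tmp.com /param="test" /after=tomorrow
$ ! $ENTRY holds the entry number of the job just submitted
$ show entry/full '$ENTRY'
```

The SHOW ENTRY output should then list a holding-until time instead of showing /after inside the parameter string.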

fwiw,
Hein



Jeff Shulmister
Occasional Advisor

Re: submit causing infinite loop

Well, I wanted to avoid sending this whole script because it's pretty ugly -- but the answer is in here somewhere. The submit is inside a loop, so the first thought is that it's just inside an infinite loop. However, there are two submits -- one that executes Mon thru Thurs, and one for Friday. And only when it does the Friday submit does it get stuck creating infinite entries in the queue.

What's going on in the loop is that it's attempting to see if the job is already in the queue. If not, then it will submit the job, and if it is already there, then it won't. At least, that's the intent. Here's the complete DCL...
The loop starts at the qsearch_1 label:

$ set noverify
$ on error then goto write_error
$ setup hprod
$ set def sys$scratch
$ today = f$cvtime(f$time(),,"weekday")
$ day = f$cvtime(f$time(),,"day")
$ show time
$ if day .eqs. "01" .or. day .eqs. "02" .or. day .eqs. "03" .or. day .eqs. "04" -
then goto end_process
$ quiz auto=star_exe:check_lockbox_purge.qzc
$ show sym purge_okay
$ show sym beg_range
$ show sym end_range
$ if purge_okay .eqs. "N" then goto end_process
$ purge_range == "''beg_range',''end_range'"
$ show sym purge_range
$ qtp auto=star_exe:purge_lockbox_range.qtc
$ submit/noprint/log=star_log/que=sys$batch -
star_com:convert_lockbox_arc_files.com
$ define distrib_name distrib_fm_error
$ @star_com:distrib_mail.com -
"PURGE_LOCKBOX_RANGE.COM completed" -
sys$scratch:TRANSMITTAL_MSG.TXT
$ write sys$output "Purge completed"
$ end_process:
$ show sym today
$ context = f$getqui("CANCEL_OPERATION")
$ qname = f$getqui("DISPLAY_QUEUE","QUEUE_NAME","ERNIE_BATCH","WILDCARD")
$ qsearch_1:
$ jname = -
f$getqui("DISPLAY_JOB","JOB_NAME",,"ALL_JOBS,TIMED_RELEASE_JOBS")
$ if jname .eqs. "PURGE_LOCKBOX_RANGE"
$ then goto eoj
$ endif
$ if jname .eqs. ""
$ then if today .eqs. "Friday" then goto Friday
$ submit/noprint/que=ernie_batch/log=sys$scratch: -
/after="tomorrow+02:00" -
star_root:[com]purge_lockbox_range.com
$ goto eoj
$ Friday:
$ submit/noprint/que=ernie_batch/log=sys$scratch: -
/after="tomorrow+2-02:00" -
star_root:[com]purge_lockbox_range.com
$ goto eoj ; 342394
$ endif
$ goto qsearch_1 ! 342409
$ eoj:
$ type prd$command:[command]success.msg
$ delete TRANSMITTAL_MSG.TXT;*
$ exit
$ write_error:
$ wso " "
$ define distrib_name distrib_fm_error
$ @star_com:distrib_mail.com -
"PURGE_LOCKBOX_RANGE.COM failed, see log for details"
$ type prd$command:[command]failure.msg
$ wso " "
$ exit 2
abrsvc
Respected Contributor

Re: submit causing infinite loop

My first suggestion would be to change the label EOJ to something else (End_job maybe). EOJ is a valid DCL command and perhaps the CLI is getting a bit confused. It is a simple change and should be easy to test. Notice that only the "Friday" codeflow is close to that label.

Let us know what happens.

Dan
Jeff Shulmister
Occasional Advisor

Re: submit causing infinite loop

Hey guys, thanks for focusing me on the "eoj" -- I think I actually see the problem!

It looks like there's an invalid comment in the Friday section, on the "goto eoj"... (it should have an '!' rather than a ';').

So, if that line is being ignored, then it will just keep going back to qsearch_1!
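
A sketch of what the corrected Friday branch would presumably look like. The only change from the script above is '!' in place of ';' on the goto line, so the trailing number becomes a real comment:

```dcl
$ Friday:
$ submit/noprint/que=ernie_batch/log=sys$scratch: -
/after="tomorrow+2-02:00" -
star_root:[com]purge_lockbox_range.com
$ ! '!' starts a DCL comment; ';' does not, so "; 342394" was parsed as extra parameters
$ goto eoj ! 342394
```

With the goto actually executed, control leaves the loop after the Friday submit instead of falling through to "goto qsearch_1".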
John Gillings
Honored Contributor

Re: submit causing infinite loop

Jeff,

> there's an invalid comment in the Friday section

That would do it. But there should be an error message in the log file:

%DCL-W-MAXPARM, too many parameters - reenter command with fewer parameters

Looking at your script, here are a few suggestions that may make debugging easier (though, being a programming language, there will be many alternative opinions).

I try to avoid hard coded file specifications. F$PARSE is your friend. If you're talking about the procedure itself, use:

$ self=F$PARSE(";",F$ENVIRONMENT("PROCEDURE"))

The semicolon is to remove the version number. This will remain correct regardless of where you move the procedure, or even if you rename it.

You can then use "self" to locate related files if they live in the same directory. For example:

$ arc_files=F$PARSE("convert_lockbox_arc_files",self)

$ @'F$PARSE("DISTRIB_MAIL",self)' SYS$SCRATCH:TRANSMITTAL_MSG.TXT

This kind of coding becomes natural, and it makes it trivially easy to move code around, or between systems, as it removes local dependencies.

When dealing with different actions depending on the day of the week, the easiest construct is:

$ GOSUB 'F$CVTIME(,,"WEEKDAY")'
$! Other stuff
$ EXIT
$
$ Monday:
$ Tuesday:
$ Wednesday:
$ Thursday:
$
$ ! Processing for Mon-Thur
$
$ RETURN
$
$ Friday:
$ ! Processing for Fri
$ RETURN
$
$ Saturday:
$ Sunday:
$ RETURN

You could do something similar with "DAY" but that would be rather a lot of labels. Maybe turn it around and use GTS instead of individual tests?

$ IF F$CVTIME(,,"DAY").GTS."04"
$ THEN
$ ! stuff to do > 04
$ ENDIF ! Your label end_process

Perhaps overkill in this case, but when you have something selectable from a small set, like a day name, a table can sometimes be clearer and easier to maintain than an IF-THEN tree. For example:

$ OnMonday="TOMORROW+2-02:00"
$ OnTuesday=OnMonday
$ OnWednesday=OnMonday
$ OnThursday=OnMonday
$ OnFriday="TOMORROW+02:00"
$ OnSaturday=""
$ OnSunday=""

Now, let the code select itself:

$ after=On'F$CVTIME(,,"WEEKDAY")'
$ IF after.NES."" THEN SUBMIT 'self' /AFTER="''after'" /whatever

This may not be the correct logic for your job, but hopefully you get the idea. You only write one SUBMIT command, and it's very easy to have different times depending on the day. We can also block resubmission on particular days.

That said, rather than try to code the execution schedule in advance by submitting jobs at different intervals, it's often easier to just run the job at the same time every day, and decide if something should be done when it starts to execute. That means it's safe to submit at any time.

I also do my resubmit at the beginning of the job. This reduces the risk that something going wrong during the job causes it to abort before the resubmit, breaking the chain. Thus:

$ IF F$MODE().EQS."BATCH" THEN SUBMIT 'self'/AFTER="TOMORROW+02:00"
$
$ ! Work out circumstances, date, day etc...
$ ! maybe use GOTO 'F$CVTIME(,,"WEEKDAY")'
$ ! or:
$ IF NothingToDo THEN EXIT
$
$ ! Do stuff

To determine if there's another copy of yourself already scheduled you don't need to scan the whole queue. For the case of jobs owned by yourself DISPLAY_ENTRY is a shortcut. Consider:

$ me=F$GETQUI("DISPLAY_ENTRY","ENTRY_NUMBER",,"THIS_JOB")
$ myname=F$GETQUI("DISPLAY_ENTRY","JOB_NAME",,"THIS_JOB")
$ IF myname.EQS."" THEN myname=F$PARSE(self,,,"NAME")
$ CheckLoop: e=F$GETQUI("DISPLAY_ENTRY","ENTRY_NUMBER",myname,"WILDCARD")
$ IF e.NES."".AND.e.EQ.me THEN GOTO CheckLoop
$ IF e.EQS.""
$ THEN
$ ! No matching job scheduled
$ ELSE
$ ! Found job with entry number e
$ ENDIF

Note that this code doesn't hardcode the job name. It also works for interactive mode as "me" will be blank, and we've assumed the job name will be the same as the procedure name.
A crucible of informative mistakes
Jeff Shulmister
Occasional Advisor

Re: submit causing infinite loop

Thanks John...great info!
GuentherF
Trusted Contributor

Re: submit causing infinite loop

There is an "on error.." on top which includes warnings as well. So the bogus DCL line warning should cause a jump to label write_error.

/Guenther
GuentherF
Trusted Contributor

Re: submit causing infinite loop

I am surprised...the "on error" is not honored for a DCL error. Is that a new feature?

With that in mind the bogus "goto" line can cause the loop.

/Guenther
Hoff
Honored Contributor

Re: submit causing infinite loop

GF: ON uses equal or greater matching.

ON ERROR catches ERROR and SEVERE (FATAL) errors.

ON WARNING catches WARNING, ERROR, and SEVERE/FATAL.
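
As a rough sketch of what that means for the script in this thread (assuming the existing write_error label is kept), switching the handler to ON WARNING would send a warning-severity message such as %DCL-W-MAXPARM to the error path instead of letting the procedure fall through:

```dcl
$ ! ON WARNING traps WARNING, ERROR and SEVERE; ON ERROR lets warnings pass
$ on warning then goto write_error
$ ! ... rest of the procedure ...
$ ! The stray ';' still draws a warning here, but now it branches to write_error
$ goto eoj ; 342394
```

Keep in mind that an ON action is used up once it fires, so it has to be re-established if the procedure carries on afterwards.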

GuentherF
Trusted Contributor

Re: submit causing infinite loop

Thanks Hoff! I had this mixed up. Some deprivation of VMS contact.

/Guenther
Jeff Shulmister
Occasional Advisor

Re: submit causing infinite loop

Ok here's a test I ran today. This just kept resubmitting itself over and over...
I thought this was going to go in the que, holding for two weeks from now, but it acted as if it was probably just waiting till one PM -- and, since it was already after 1 PM when I submitted it, it just kept resubmitting itself.

Doesn't the "+13" mean 13 days?
===========================================
strsrv> type test1.com
$ submit test1.com/notify/noprint/keep/que=sys$batch/log=star_log/-
after="today+13"
$ exit
==========================================
Steven Schweda
Honored Contributor

Re: submit causing infinite loop

> Doesn't the "+13" mean 13 days?

Apparently not. Remember this?:

> I just checked your time string via lexical
> f$cvtime()and it returned the time I would
> expect.

That could have been taken as educational.
For example:

alp $ write sys$output f$cvtime( "today")
2011-04-08 00:00:00.00

alp $ write sys$output f$cvtime( "today+13")
2011-04-08 13:00:00.00

Seems to be +13 hours. On the other hand:

alp $ write sys$output f$cvtime( "today+13-")
2011-04-21 00:00:00.00


HELP Date_Time

Pay particular attention to "Combination" and
"Delta".

And, of course:

HELP Lexicals F$CVTIME
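
A minimal sketch of using the same check as a guard inside a self-resubmitting procedure, assuming the intent is a 13-day delay. The symbol names after and release are made up for illustration. F$CVTIME with no output format returns the comparison format shown above, which can be compared as a string:

```dcl
$ after = "TODAY+13-"          ! trailing hyphen makes 13 a day count, not hours
$ release = f$cvtime(after)    ! comparison format: yyyy-mm-dd hh:mm:ss.cc
$ if release .les. f$cvtime()  ! f$cvtime() with no arguments is the current time
$ then
$   write sys$output "Not resubmitting: ", release, " is not in the future"
$   exit
$ endif
$ submit test1.com /after="''after'"
```

If the computed time has already passed, the procedure stops instead of queueing a job that would start immediately and resubmit itself again.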
Jeff Shulmister
Occasional Advisor

Re: submit causing infinite loop

Steven, thanks for that tip. I never thought of using that lexical to check the value I was loading into the parameter. I'm DEFINITELY adding that to my arsenal!

And John G, also, thanks for the tip on how to test and break the loop. Not being sure exactly what was going to happen, I've always been a little scared to use this parameter. One of those things I thought I didn't even want to test, simply because of what could happen when the test failed. You guys have given me the tools I need on this now, tho, so many thanks!