Re: DECSCHEDULER

 
Kevin Raven (UK)
Frequent Advisor

DECSCHEDULER

Where does Scheduler get its history data from?
We have been running the following command for the last 3 years with no issues, but now it fails with an ugly error message.
Is this an indication that the Scheduler database file is about to go pop?

$ ! get history of all job which ran in the last 24 hours
$ ! -----------------------------------------------------
$ minus1day = F$CVTIME("-1-00:00:00","ABSOLUTE")
$ SCHED SHOW HIST/GROUP=*/RECORD-
/OUT=RDTS_LOG:SCHED_200907102240.LIS-
/COMPLETION=ALL/USER=AUTO_OP/START_TIME="9-JUL-2009 22:40:05.34"
%NSCHED-I-NORUNREC, No specified run records found for job 6
%NSCHED-I-NORUNREC, No specified run records found for job 7
%NSCHED-I-NORUNREC, No specified run records found for job 41
%NSCHED-I-NORUNREC, No specified run records found for job 61
%NSCHED-I-NORUNREC, No specified run records found for job 33
%NSCHED-I-NORUNREC, No specified run records found for job 32
%NSCHED-I-NORUNREC, No specified run records found for job 31
%NSCHED-I-NORUNREC, No specified run records found for job 52
%NSCHED-I-NORUNREC, No specified run records found for job 35
%NSCHED-I-NORUNREC, No specified run records found for job 34
%NSCHED-I-NORUNREC, No specified run records found for job 55
%NSCHED-I-NORUNREC, No specified run records found for job 53
%NSCHED-I-NORUNREC, No specified run records found for job 27
%NSCHED-I-NORUNREC, No specified run records found for job 42
%NSCHED-I-NORUNREC, No specified run records found for job 62
%NSCHED-I-NORUNREC, No specified run records found for job 63
%NSCHED-I-NORUNREC, No specified run records found for job 43
%NSCHED-I-NORUNREC, No specified run records found for job 44
%NSCHED-I-NORUNREC, No specified run records found for job 38
%NSCHED-I-NORUNREC, No specified run records found for job 58
%NSCHED-I-NORUNREC, No specified run records found for job 40
%NSCHED-I-NORUNREC, No specified run records found for job 60
%NSCHED-I-NORUNREC, No specified run records found for job 54
%NSCHED-I-NORUNREC, No specified run records found for job 37
%NSCHED-I-NORUNREC, No specified run records found for job 39
%NSCHED-I-NORUNREC, No specified run records found for job 57
%NSCHED-I-NORUNREC, No specified run records found for job 59
%NSCHED-I-NORUNREC, No specified run records found for job 46
%NSCHED-I-NORUNREC, No specified run records found for job 65
%NSCHED-I-NORUNREC, No specified run records found for job 28
%NSCHED-I-NORUNREC, No specified run records found for job 45
%NSCHED-I-NORUNREC, No specified run records found for job 64
%NSCHED-I-NORUNREC, No specified run records found for job 36
%NSCHED-I-NORUNREC, No specified run records found for job 56
%NSCHED-I-NORUNREC, No specified run records found for job 29
%NSCHED-I-NORUNREC, No specified run records found for job 4
%NSCHED-I-NORUNREC, No specified run records found for job 5
%NSCHED-I-NORUNREC, No specified run records found for job 30
%NSCHED-I-NORUNREC, No specified run records found for job 49
%NSCHED-I-NORUNREC, No specified run records found for job 48
%NSCHED-I-NORUNREC, No specified run records found for job 51
%NSCHED-I-NORUNREC, No specified run records found for job 50
$
$ ! create a new log if necessary (every 90 DAYS)
$ DAYS_90 = F$CVTIME("-90-00:00:00","COMPARISON",)
$ schedlog_file = f$trnlnm("NSCHED$LOGFILE")
$ if f$search(schedlog_file) .nes. ""
$ then
$ if F$CVTIME(F$file(schedlog_file,"CDT"),"COMPARISON",) .les. DAYS_90
$ endif
$ endif
$
$
All of the above works fine. It then checks whether the job is being run on a Friday and ....

$ SCHED SHO JOB GET_STATS_DATA_WKLY/SYM ! will set sched$type
$ if SCHED$TYPE .eqs. "OFFLINE" then - ! if offline job (fri)
SCHED SHOW HIST/GROUP=* /COMPLETION=ALL/USER=AUTO_OP/OUT=RDTS_LOG:SCHED_HIST_200907102240.LIS
SUB VSS$GET_HISTORY: Error 51 at line 29 ?Integer error or overflow
%SYSTEM-F-HPARITH, high performance arithmetic trap, Imask=00000000, Fmask=00000504, summary=12, PC=0000000000000003, PS=82B29380
-SYSTEM-F-FLTINV, floating invalid operation, PC=0000000000000003, PS=82B29380
-SYSTEM-F-FLTUND, arithmetic trap, floating underflow at PC=0000000000000003, PS=82B29380
$ L_ERROR:
$ My_status = $status
$ if My_status .eq. My_aborted then goto L_ABORTED
$ log "SCHED-E-GET_STATS_DATA_WKLY terminated. Error status : %X00000504"
19 REPLIES
Peter Elliott
Occasional Advisor

Re: DECSCHEDULER

Hi,
Not too sure about the actual error message you are receiving... But generally, I believe Scheduler writes History records to a Default logfile of NSCHED$:.log
This also requires the logical NSCHED$DEFAULT_LOG to be set up in the Scheduler startup routine (normally to 5)
I've always found most scheduler issues can be fixed by running NSCHED$:DB_UTILITY and compressing the Database.
(Try the command: Sched Show Delete - to see how many deleted records there are in the DB)
Also - if you haven't already - maybe look to upgrade to the latest version provided by CA: http://supportconnectw.ca.com/public/unijobmgtopenvms/unijobmgtopenvms_supp.asp

Hope this helps...

Peter.
Peter Elliott
Occasional Advisor

Re: DECSCHEDULER

Sorry - the link I sent is probably not the best - The data sheets on the following link are better:

http://www.ca.com/us/products/product.aspx?id=1485#documents
Kevin Raven (UK)
Frequent Advisor

Re: DECSCHEDULER

We rebuild the DB on a regular basis, once every 6 months.
Do the history stats come from VSS.DAT?
When the history command is run on another system (a mirror of production; production is the one with the issue), it shows records of jobs going back to 2006.


(LNM$SYSTEM_TABLE)

"NSCHED$" = "NSCHED$DATA"
= "NSCHED$COM"
= "NSCHED$EXE"
"NSCHED$CLEAR_RESTART_PARAM" = "TRUE"
"NSCHED$COM" = "SYS$COMMON:[NSCHED.COM]"
"NSCHED$DATA" = "SYS$COMMON:[NSCHED.DATA]"
"NSCHED$DEFAULT_JOB_MAX" = "10"
"NSCHED$EXE" = "SYS$COMMON:[NSCHED.EXE]"
"NSCHED$LBAL$CPU_WEIGHT" = "0.5"
"NSCHED$LBAL$INTERVAL" = "0 00:00:30"
"NSCHED$LBAL$MEMORY_WEIGHT" = "0.5"
"NSCHED$LOGFILE" = "SYS$COMMON:[NSCHED]DECSCHEDULER.LOG"
"NSCHED$MAILBOX" = "_MBA58:"
"NSCHED$REMOTE_SUPPORT_ENABLED" = "TRUE"
"NSCHED$TERM_MAILBOX" = "_MBA63:"
"NSCHED$UID" = "NSCHED$:SCHEDULER$MOTIF.UID"
"NSCHED_DEFAULT_SD_ACTION" = "SKIP"
"RDTS_SCHED" = "SYS_SYSTEM:[OMN$SCHED]"
"RDTS_SCHEDULER_REPORTS" = "SYS_STATS:[RDTS.REPORTS.SCHEDULER]"
"SCHED$FLAG" = "OMN$FLG"
"SCHED$LBAL_GROUP_HOST" = "RDTSC RDTSD"
"SCHEDULER$NODE" = "RDTSC::"
"SCHED_RTL" = "NSCHED$:SCHED_RTL.EXE"
"SCHED_RTL_TV" = "NSCHED$:SCHED_RTL.EXE"
"SCHED_SHUTDOWN" = "AUTO-RESTART"

(LNM$SYSCLUSTER_TABLE)

(SYSMAN$NODE_TABLE)
$ DIR NSCHED$:*.LOG/SIZE/DATE

Directory SYS$COMMON:[NSCHED.DATA]

RDTSA.LOG;2 3 14-NOV-2006 10:40:01.45
RDTSA_REMOTE_EXECUTOR.LOG;2
1 14-NOV-2006 10:40:02.35
RDTSB.LOG;2 3 14-NOV-2006 10:41:40.16
RDTSB_REMOTE_EXECUTOR.LOG;2
1 14-NOV-2006 10:41:40.18
RDTSC.LOG;35 4 22-NOV-2006 15:29:09.19
RDTSC_REMOTE_EXECUTOR.LOG;34
1 22-NOV-2006 15:29:09.27
RDTSD.LOG;21 3 22-NOV-2006 15:29:10.52
RDTSD_REMOTE_EXECUTOR.LOG;21
1 22-NOV-2006 15:29:10.58
SCHED$ERR.LOG;1 0 10-NOV-2006 11:58:45.84
VERMONT_CREAMERY.LOG;1
288 10-NOV-2006 11:40:47.78

Total of 10 files, 305 blocks.

Directory SYS$COMMON:[NSCHED.EXE]

SCHED$ERR.LOG;1 0 13-NOV-2006 16:10:35.21

Total of 1 file, 0 blocks.

Grand total of 2 directories, 11 files, 305 blocks.
$

The log file gets rolled over every 90 days; see below ....
$ DIR SYS$COMMON:[NSCHED]DECSCHEDULER.LOG/SIZE/DATE

Directory SYS$COMMON:[NSCHED]

DECSCHEDULER.LOG;1 9744 4-MAY-2009 21:24:59.96

Total of 1 file, 9744 blocks.
$

What is the VERMONT_CREAMERY.LOG;1 file used for ???? Funny name .....lol



Peter Elliott
Occasional Advisor

Re: DECSCHEDULER

I think (having read up some more), unless you specify otherwise - the Default Logfile which contains Job History is Vermont_Creamery.log (I believe they were the first company to run Scheduler).
There are also suggestions that on some older versions, event history was sometimes written to the Vermont_Creamery.old file instead of the .log

What version of scheduler are you running ?
($Sched show stat)
marsh_1
Honored Contributor

Re: DECSCHEDULER

hi,

Vermont Creamery - that's where they started with it; well, that's the story they gave us when we had DECscheduler, back when Digital still had it.

fwiw

Kevin Raven (UK)
Frequent Advisor

Re: DECSCHEDULER

Looking at our system ......

The logical to assign the scheduler log file is set ...see below and data is being logged into the log file.
This file is rolled over every 90 days ....
The vermont Creamery log file is dormant and not being logged into.

The VSS.DAT file was last rebuilt in October 2007 .....

So when I run the History command, where is it picking the data up from?
The history shows job runs prior to the date stamps on both the log file and the VSS.DAT file.

Version as Below .....

$ sched show stat
Node Version Started Jobs Jmax Log Pri Rating
RDTSC v3.0 15-JUL-2009 14:02:18 0 10 5 4 283 <-- Default
RDTSD v3.0 15-JUL-2009 14:02:19 0 10 5 4 261
$

$ show log *sched*log*

(LNM$PROCESS_TABLE)

(LNM$JOB_80CFEEC0)

(LNM$GROUP_000102)

(LNM$SYSTEM_TABLE)

"NSCHED$LOGFILE" = "SYS$COMMON:[NSCHED]DECSCHEDULER.LOG"

(LNM$SYSCLUSTER_TABLE)

(SYSMAN$NODE_TABLE)
$

Rolled over every 90 days ....

Directory SYS$COMMON:[NSCHED]

DECSCHEDULER.LOG;1 File ID: (5215,9685,0)
Size: 9744/9744 Owner: [SYSTEM]
Created: 4-MAY-2009 21:24:59.96
Revised: 15-JUL-2009 14:02:14.56 (5)
Expires:
Backup:
Effective:
Recording:
Accessed:
Attributes:
Modified:
Linkcount: 1
File organization: Indexed, Prolog: 3, Using 2 keys
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 9744, Extend: 500, Maximum bucket size: 2
Global buffer count: 0, No version limit
Record format: Variable length, maximum 156 bytes, longest 0 bytes
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:RE, World:
Access Cntrl List: None
Client attributes: None

Total of 1 file, 9744/9744 blocks.
$

Vermont log file ....not an active log file because of the logical being defined....
Directory SYS$COMMON:[NSCHED.DATA]

VERMONT_CREAMERY.LOG;1 File ID: (1181,2,0)
Size: 288/288 Owner: [SYSTEM]
Created: 10-NOV-2006 11:40:47.78
Revised: 14-NOV-2006 15:28:00.69 (4)
Expires:
Backup:
Effective:
Recording:
Accessed:
Attributes:
Modified:
Linkcount: 1
File organization: Indexed, Prolog: 3, Using 2 keys
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 288, Extend: 500, Maximum bucket size: 2
Global buffer count: 0, No version limit
Record format: Variable length, maximum 156 bytes, longest 0 bytes
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:RE, World:
Access Cntrl List: (IDENTIFIER=*,ACCESS=READ+WRITE)
Client attributes: None

Total of 1 file, 288/288 blocks.
$


Date stamps on VSS.DAT ---- Scheduler database ?


$ dir SYS$COMMON:[NSCHED...]vss.dat/full

Directory SYS$COMMON:[NSCHED.DATA]

VSS.DAT;1 File ID: (4794,8,0)
Size: 2208/2208 Owner: [SYSTEM]
Created: 31-OCT-2007 11:57:54.00
Revised: 15-JUL-2009 15:10:34.71 (7696)
Expires:
Backup:
Effective:
Recording:
Accessed:
Attributes:
Modified:
Linkcount: 1
File organization: Indexed, Prolog: 3, Using 6 keys
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 2208, Extend: 300, Maximum bucket size: 3
Global buffer count: 0, No version limit
Record format: Fixed length 1024 byte records
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:, World:
Access Cntrl List: (IDENTIFIER=*,ACCESS=READ+WRITE)
Client attributes: None

Total of 1 file, 2208/2208 blocks.
$


Output from History command .....(This is all on a system where it works) ...mirror of production ....
$ SCHED SHOW HIST/GROUP=* /COMPLETION=ALL/USER=AUTO_OP/out=a.a
$ type a.a/page

Current Minimum Average Maximum Scale
Elapsed Time : 0.00 0.22 7.96 15.81 Seconds
CPU Time : 0.00 0.10 0.39 0.61 Seconds
Page Faults : 0 1906 4715 5798
Pk Wk Set Size : 0 8192 8224 8496
Buffered I/O : 0 446 1007 1220
Direct I/O : 0 240 839 1132
Mounted Vols : 0 0 0 0
--------------------------------------------------------------------------------
Job : 21 - BACKUP_DB_EDUMP_WKLY Owner : AUTO_OP
Count : 167 SUCCESS and FAILURE Records Combined Server : Local
Earliest Login Time : 11-NOV-2006 17:57:17.25
Last Completion Time : 10-JUL-2009 22:30:37.77
Current Login Time : Job not running

Current Minimum Average Maximum Scale
Elapsed Time : 0.00 0.01 57.21 6610.08 Minutes
CPU Time : 0.00 0.00 1.55 7.28 Seconds
Page Faults : 0 0 3230 3515
Pk Wk Set Size : 0 0 42854 47792
Buffered I/O : 0 0 3072 14952
Direct I/O : 0 0 12518 70813
Mounted Vols : 0 0 0 0



The History command has data on jobs that first ran in 2006, yet both VSS.DAT and DECSCHEDULER.LOG postdate this ...
So where is the History command getting its data from ?

Wim Van den Wyngaert
Honored Contributor

Re: DECSCHEDULER

Is it possible that it is exactly what it says ? No run record found = the job was aborted before it started ?

Wim without VMS
Wim
Thomas Ritter
Respected Contributor

Re: DECSCHEDULER

Raven,
we use nsched$:VSS_REPORTS.EXE to produce reports.
Kevin Raven (UK)
Frequent Advisor

Re: DECSCHEDULER

I have been digging and found the following ....

The history stats do indeed come from the DECSCHEDULER log file ...
Because we use the /summ switch when we roll the log file over, it passes the run history from the old log into the new log.
Thus even with a log file less than 90 days old, it still knows about job history back in 2006.

The command to produce the history stats worked on the 3rd of July ...
It now falls over when run a week later, on the 10th of July.

It bombs out just prior to reporting on a job that has run over 260,000 times since 2006 (?) ....
So I think we might have hit a magic number between the history command working on the 3rd and not on the 10th ...
greater than 262143, or 111111111111111111 in binary ....

So I have written a test script to run via scheduler every 1/2 a second ...script is below...
$!
$exit
$!

lol ....

So now waiting to see what happens on a server where the history command works ...when this sched job has run more than 262143 times.
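
For reference, the arithmetic behind that guess can be checked with a short sketch. Python is used purely for illustration. The half-second interval is the one from the test job above, and the idea that the limit behaves like an 18-bit counter is only a guess, not something confirmed by the Scheduler documentation.

```python
from datetime import timedelta

SUSPECTED_LIMIT = 262143  # the "magic number" guessed at above (an assumption)

# 262143 is 2**18 - 1, i.e. eighteen 1-bits in binary.
assert SUSPECTED_LIMIT == 2**18 - 1
binary_digits = bin(SUSPECTED_LIMIT)[2:]


def time_to_reach(run_count, interval_seconds):
    """Wall-clock time for a job to accumulate run_count runs at a fixed interval."""
    return timedelta(seconds=run_count * interval_seconds)


# Test job scheduled every half second; assumes no gaps or overhead between runs.
test_duration = time_to_reach(SUSPECTED_LIMIT, 0.5)

print(binary_digits, len(binary_digits))
print(test_duration)
```

Under those assumptions the test job needs roughly a day and a half of uninterrupted half-second runs before its count passes 262143, and longer if the scheduler cannot sustain that rate.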
Ian Miller.
Honored Contributor

Re: DECSCHEDULER

you may also want to tidy your VSS.DAT with DB_UTILITY or the new command in V3 which I can't remember at present.

A VSS.DAT of that size will probably cause you performance issues.
____________________
Purely Personal Opinion
cdan
Frequent Advisor

Re: DECSCHEDULER

Same problem here with one job running every 15 minutes, with scheduler version 3.0 (I think the last).
Not solved by db_utility.
I had this problem before and I think it was resolved by a reboot but can't be sure.
VERMONT_CREAMERY.LOG may look dormant but it's not.
I would be happy if there was a way to delete history only for one job.
Kevin Raven (UK)
Frequent Advisor

Re: DECSCHEDULER

Ladies and Gents,
Thanks for all the responses to my issue.
We have found the cause and a resolution.

On one of our test clusters, I have been able to reproduce the problem.
It appears that once the history record for a job shows it has run 260,080 times, the sched show history command fails with the error message when it reaches that job.
You then get the error from then on.
When you next close the Scheduler log file with the /summ switch, the log file rolls over and all job history is retained in the new log file, with the exception of the job that has the issue. That job has its count reset to 0, so the error goes away.

In production we roll over the sched log file every 90 days with the /summ switch.
Thus at the end of August the error will go away.
They will live with it until then. It's easier than navigating the change and testing process to get the log file rolled over as an exception.

Thanks everyone for the input...
Kevin
cdan
Frequent Advisor

Re: DECSCHEDULER

Raven,
The solution is not working for me.
I already tried scheduler close log/summary and also scheduler close log (without the /summary).
The log is not rolled over and renamed to .old, despite the -I- message that the log has been closed.
In the other log file (nsched$:nodename.log) there is an "error renaming log" message.
I see a possible solution: shut down the scheduler, rename the file manually, then start the scheduler. It needs to be tested though.
If history is only needed for manual research, it should still be accessible with sched sho hist /file=vermont_creamery.old

Cheers.
Yves HUDON
Advisor

Re: DECSCHEDULER

Hi,

I don't know if it will help you but there is
a strange log file called : VERMONT_CREAMERY.LOG

When you do a report this file is involved.

Also, it is a good idea to recreate it when you can. Here each day we stop the scheduler and make a cold backup of VSS.DAT file and rename the VERMONT_CREAMERY.LOG ... to VERMONT_CREAMERY.OLD!

When the Creamery is too large, it has an impact on performance! ;)

Also, each Weekend the VSS.dat Data Base is compressed with the DB_UTILITY.EXE utility after taking a backup of it.

Regards
YH

sys>dir/col=2 disk_ced:[nsched.data]*cream*

Directory DISK_CED:[NSCHED.DATA]

VERMONT_CREAMERY.LOG;1
VERMONT_CREAMERY.OLD;884
VERMONT_CREAMERY.OLD;883
VERMONT_CREAMERY.OLD;882
VERMONT_CREAMERY.OLD;881
VERMONT_CREAMERY.OLD;880
VERMONT_CREAMERY.OLD;879
VERMONT_CREAMERY.OLD;878

sys>dir nsched$data:vss.dat /dat

Directory DISK_CED:[NSCHED.DATA]

VSS.DAT;513 12-SEP-2009 06:46:58.06
cdan
Frequent Advisor

Re: DECSCHEDULER

For some reason, my vermont_creamery.log file was in the subdirectory [ALPHA]. I guess it has something to do with a change in the logical name NSCHED$, a search list in which [DATA] was moved to the first position instead of [ALPHA].
After moving the log to [DATA] (using RENAME), the command $ sched close log/summary no longer failed, vermont_creamery.log was rolled over normally, and the SYSTEM-F-HPARITH error in history reports went away as well.
Didn't even need to stop the scheduler for this.

Kevin Raven (UK)
Frequent Advisor

Re: DECSCHEDULER

Our log file in production did indeed rollover as it should every 90 days and fixed the issue.
So for the next 2 years + x months we are error free again .....lol

Based on the fact we are moving to a Solaris/ Java / VCS based solution ...in the next 12 months , this is no longer an issue.
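
As a rough check on the "2 years + x months" estimate, the sketch below projects when the count would reach the same value again after a reset. It rests on assumptions that the thread does not confirm: the job is taken to have started on the earliest history date quoted earlier (11-NOV-2006), to run at a constant average rate, and to be reset on a hypothetical rollover day of 31-AUG-2009.

```python
from datetime import date, timedelta

# Assumptions, not facts taken from the Scheduler logs:
FIRST_RUN = date(2006, 11, 11)     # earliest history record quoted in this thread
FAILURE_SEEN = date(2009, 7, 10)   # first failing history run
RUNS_AT_FAILURE = 260_080          # run count reported when the error appeared
ASSUMED_RESET = date(2009, 8, 31)  # hypothetical rollover day ("end of August")

elapsed = FAILURE_SEEN - FIRST_RUN
runs_per_day = RUNS_AT_FAILURE / elapsed.days

# With the counter reset to 0 and the same average rate, the count is reached
# again after the same number of days it took the first time.
days_until_repeat = RUNS_AT_FAILURE / runs_per_day
projected_repeat = ASSUMED_RESET + timedelta(days=round(days_until_repeat))

print(elapsed.days, round(runs_per_day, 1))
print(projected_repeat)
```

With these inputs the gap works out to a little over two and a half years, which is in line with the estimate above. A different real start date or run rate would shift the projection accordingly.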


gunners
Frequent Advisor

Re: DECSCHEDULER

Hi Everyone ,

Would anyone know how to change the location where the VERMONT_CREAMERY.LOG files are created?

So for example, if I want to change the location from dsa100:[vermont] to dsa300:[vermont], how do I change this? Is there a parameter file? (I have been struggling for a few hours trying to solve this.)

Thanks,

Dave

Hoff
Honored Contributor

Re: DECSCHEDULER

Google is your friend.

"Note: You can set the name of the log file by modifying a line in the startup file SYS$STARTUP:SCHEDULER$STARTUP.COM. If you do not modify this file, the default file name is NSCHED$:VERMONT_CREAMERY.LOG. This command file is placed in your SYS$STARTUP directory when you install DECscheduler."

Here's the Google sequence I used: search for "VERMONT_CREAMERY.LOG". The first hit returned is an index from a DECscheduler manual. Search within the index page (Cmd-F here, probably Ctrl-F in your browser) for the index entry "NSCHED$:VERMONT_CREAMERY.LOG", follow that link, and you will find the text above.

gunners
Frequent Advisor

Re: DECSCHEDULER

Brilliant, Hoff. I was googling but I didn't get that at all, hmmm. That's a great help; I will follow it up myself and see how I go.

Many thanks indeed