
Re: VMS Batch Job

 
LM_2
Frequent Advisor

VMS Batch Job

I am running OpenVMS 7.2-2 and TCPware 5.4-3 on a two-node cluster (ES40s). Over the past month, about once a week, I have noticed a batch job that is running but is not getting any CPU time. There are no error messages at all.

When it happens I stop the queue, put the job on hold, let the other jobs in the queue run, and then release the job that was not doing anything. It then gets processed fine. While the job is being processed, SHOW SYSTEM shows it with a blank process name, and the previous process, the one I stopped, is still there. I cannot kill the previous process; the system claims it is in a suspended state. The only way I can get rid of it is to reboot the system (it isn't really hurting anything if I leave it).

Hundreds and hundreds of my batch jobs run fine throughout the day, so it seems to be this one queue and one job. To make it even stranger, the job and queue run fine the majority of the time. The problem shows up just once a week, not at any specific time or with any specific file. Someone told me I may have a corrupt job controller file, which I find hard to believe, since it is not happening to all of my jobs or all of my queues.

Has anyone ever seen this, or can anyone give any insight into what is going on?
labadie_1
Honored Contributor

Re: VMS Batch Job

Could you be more precise about the state of your job? It could be MUTEX or one of the RWxxx states (RWAST, RWSCS, RWMPB/RWMPE, ...). You should install AMDS or Availability Manager (from http://h71000.www7.hp.com/openvms/system_management.html), which is often quite helpful.

Could you post the DCL of the job that has the problem? Which username does it run under?

Are your patches for VMS 7.2-2 up to date?
LM_2
Frequent Advisor

Re: VMS Batch Job

21A0998C BATCH_2008432 LEF 5 12220 0 00:00:01.65 4113 1250 B

LISA_1> show proc/id=21A07AC7 /cont
%SYSTEM-F-SUSPENDED, process is suspended


I will attach the com procedure
Ian Miller.
Honored Contributor

Re: VMS Batch Job

If the process is suspended, that is, in the SUSP state, then it was suspended by a SET PROCESS/SUSPEND command or the equivalent system service.

Look at
HELP SET AUDIT/ENABLE
to see if you can enable auditing for process control system services. If you can in your version of VMS then you will be able to get a record of who suspended the process.


Stopping the queue will also suspend the job executing in it.

If the process was not suspended before you stopped the queue, what state was it in?
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

Try $ show sys/batch to find the status of the job.
SHOW PROCESS often says a process is suspended when it isn't (it is really in an RWxxx state).

Wim
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

Or try :
$ ana/sys
set proc/Id=xxx
show proc
exit

and post the output.

Wim
LM_2
Frequent Advisor

Re: VMS Batch Job

Process index: 02C7 Name: BATCH_2002318 Extended PID: 21A07AC7
------------------------------------------------------------------
Process status: 20044013 RES,DELPEN,PSWAPM,BATCH,PHDRES,ERDACT
status2: 00000001 QUANTUM_RESCHED

PCB address 81481E40 JIB address 81791EC0
PHD address 8B0AA000 Swapfile disk address 00000000
KTB vector address 8148212C HWPCB address 8B0AA080
Callback vector address 00000000 Termination mailbox 0000
Master internal PID 000F02C7 Subprocess count 0
Creator extended PID 00000000 Creator internal PID 00000000
Previous CPU Id 00000001 Current CPU Id 00000001
Previous ASNSEQ 000000000001E5F1 Previous ASN 00000000000000F2
Initial process priority 4 Delete pending count 0
# open files allowed left 1012 Direct I/O count/limit 4094/4096
UIC [00001,000126] Buffered I/O count/limit 511/512
Abs time of last event 011C0B6D BUFIO byte count/limit 167760/167952
ASTs remaining 4094 # of threads 1
Swapped copy of LEFC0 00000000 Timer entries allowed left 5000
Swapped copy of LEFC1 00000000 Active page table count 0
Global cluster 2 pointer 00000000 Process WS page count 1339
LM_2
Frequent Advisor

Re: VMS Batch Job

Also - before I killed the process/queue - it was in the LEF state
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

Maybe the job is submitted with a low priority while the others are not.

Do a show que/full to find out whether something like /prio=1 is present. On a heavily loaded system, such a job won't get the CPU (in contrast with an HP3000, which gives the CPU to all processes).

Next time the job starts, try set proc/id=xxx/prio=4 to get it going (maybe several times, as it seems to be partly suspended).

Wim
LM_2
Frequent Advisor

Re: VMS Batch Job

It actually has a priority of 80
labadie_1
Honored Contributor

Re: VMS Batch Job

Your process has DELPEN set, so it is in the delete-pending state.
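
A minimal sketch of how the status longword in the SDA output posted earlier can be read. It assumes (this is not stated anywhere in the thread) that SDA lists the flag names in ascending bit order, so that the six bits set in 20044013 pair up with the six names shown next to it. The function names are hypothetical.

```python
def set_bits(value):
    """Return the positions of the bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def pair_flags(status_hex, names):
    """Pair each set bit with a flag name, ASSUMING ascending bit order."""
    bits = set_bits(int(status_hex, 16))
    if len(bits) != len(names):
        raise ValueError("number of set bits does not match number of names")
    return dict(zip(names, bits))


# Values copied from the SHOW PROCESS output posted earlier in the thread.
flags = pair_flags(
    "20044013",
    ["RES", "DELPEN", "PSWAPM", "BATCH", "PHDRES", "ERDACT"],
)
for name, bit in flags.items():
    print(f"{name:7s} bit {bit}")
```

Under that assumption DELPEN comes out as bit 1 and ERDACT as bit 29. The authoritative bit assignments are the ones in the system's own definition files, not this pairing.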
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

It seems the process is waiting for something. It is waiting to be deleted, but the deletion is itself waiting on the completion of something.

I would have to see the DCL (and maybe the source code). With anal/sys you may be able to find out what it is waiting for, but I'm at home without VMS (and my notes).

Wim
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

Labadie,

What about the ERDACT ?

Wim
LM_2
Frequent Advisor

Re: VMS Batch Job

the dcl procedure is on my second post as an attachment
Kris Clippeleyr
Honored Contributor

Re: VMS Batch Job

Hi,

The process is in ERDACT (executive rundown active). Basically (if I recall correctly) it is waiting for I/Os to finish (I see 2 DIOs and a couple of BIOs left) and for open files to be closed.
What about a SHOW PROCESS/CHANNEL within SDA?
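
The counts mentioned here can be read off the SHOW PROCESS output posted earlier. A rough sketch, assuming each count/limit pair shows the remaining quota against the limit, so that limit minus count is the number of I/Os still outstanding (that reading is an assumption, and the helper name is hypothetical):

```python
def outstanding(pair):
    """Given 'count/limit' text, return limit minus count."""
    count, limit = (int(part) for part in pair.split("/"))
    return limit - count


# Pairs copied from the SDA SHOW PROCESS output earlier in the thread.
direct_io = outstanding("4094/4096")
buffered_io = outstanding("511/512")
print("direct I/O outstanding:", direct_io)
print("buffered I/O outstanding:", buffered_io)
```

With the posted figures this gives 2 for direct I/O and 1 for buffered I/O.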

Kris (aka Qkcl)
I'm gonna hit the highway like a battering ram on a silver-black phantom bike...
LM_2
Frequent Advisor

Re: VMS Batch Job

Process active channels
-----------------------

Channel  Window    Status     Device/file accessed
-------  ------    ------     --------------------
0010     00000000             $1$DKD203:
0020     8142C440             $1$DKC106:[VICTOR.SHARE]EDI_PO_LOAD_FROM_EXT.EXE;49
0030     811DCF40             $1$DRA0:[SYSCOMMON.SYSLIB]TPUSHR.EXE;1 (section file)
0040     811DC9C0             $1$DRA0:[SYSCOMMON.SYSLIB]DPML$SHR.EXE;1 (section file)
0050     811DAF00             $1$DRA0:[SYSCOMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (section file)
0060     811D8B40             $1$DRA0:[SYSCOMMON.SYSLIB]LIBOTS.EXE;1 (section file)
0070     811D8800             $1$DRA0:[SYSCOMMON.SYSLIB]LIBRTL.EXE;1 (section file)
0080     81249280             $1$DRA0:[SYSCOMMON.SYSLIB]SQL$INT.EXE;11 (section file)
0090     811DD5C0             $1$DRA0:[SYSCOMMON.SYSLIB]DECC$SHR.EXE;1 (section file)
00A0     811DA500             $1$DRA0:[SYSCOMMON.SYSLIB]SORTSHR.EXE;1 (section file)
00B0     811DA340             $1$DRA0:[SYSCOMMON.SYSLIB]SMGSHR.EXE;1 (section file)
00C0     811E6640             $1$DRA0:[SYSCOMMON.SYSEXE]DCL.EXE;1 (section file)
00D0     811D5500             $1$DRA0:[SYSCOMMON.SYSLIB]DCLTABLES.EXE;227 (section file)
00E0     81754540             $1$DKD202:[VICDATAEDI.EDI.ARCHIVE.LOG]2005100407343803_DNA1W_830_ERL3.LOG;1
00F0     8162F7C0             $1$DKD202:[VICDATAEDI.EDI.COM]EDI_LOAD.COM;34
0100     813C2F40             $1$DRA0:[SYSCOMMON.SYSLIB]RDBSHR.EXE;11 (section file)
0110     811DAB40             $1$DRA0:[SYSCOMMON.SYSLIB]LIBOTS2.EXE;1 (section file)
0120     811DA2C0             $1$DRA0:[SYSCOMMON.SYSLIB]SCRSHR.EXE;1 (section file)
0130     811D9FC0             $1$DRA0:[SYSCOMMON.SYSLIB]LBRSHR.EXE;1 (section file)
0140     815708C0             $1$DRA0:[SYSCOMMON.SYSLIB]FORMS$MANAGER.EXE;6
0150     811D8340             $1$DRA0:[SYSCOMMON.SYSLIB]EPC$SHR.EXE;1 (section file)
0160     8150FD00             $1$DRA0:[SYSCOMMON.SYSLIB]FORMS$CIOSHR.EXE;6
0170     814F8680             $1$DRA0:[SYSCOMMON.SYSLIB]FORRTL_D56_TV.EXE;1
0180     811EB740             $1$DRA0:[SYSCOMMON.SYSLIB]TIE$SHARE.EXE;1 (section file)
0190     8149B200             $1$DRA0:[SYSCOMMON.SYSLIB]LIBRTL_D56_TV.EXE;1
01A0     811DF8C0             $1$DRA0:[SYS0.SYSLIB]DEC$FORRTL.EXE;2 (section file)
01B0     811DBFC0             $1$DRA0:[SYSCOMMON.SYSLIB]EDTSHR.EXE;1 (section file)
01C0     81042840             $1$DRA0:[SYS0.SYSLIB]DEC$COBRTL.EXE;2 (section file)
01D0     811DE680             $1$DRA0:[SYSCOMMON.SYSLIB]DEC$BASRTL.EXE;2 (section file)
01E0     813BAAC0             $1$DRA0:[SYSCOMMON.SYSLIB]DBMSHR.EXE;5 (section file)
01F0     813BAA40             $1$DRA0:[SYSCOMMON.SYSLIB]DBMPRV.EXE;1 (section file)
0200     81413440             $1$DRA0:[SYSCOMMON.SYSLIB]MMPRV.EXE;12 (section file)
0210     81248F80             $1$DRA0:[SYSCOMMON.SYSLIB]SQL$SHR.EXE;10 (section file)
0220     811D6B40             $1$DRA0:[SYSCOMMON.SYSLIB]DTI$SHARE.EXE;1 (section file)
0230     811DC740             $1$DRA0:[SYSCOMMON.SYSLIB]NCSSHR.EXE;1 (section file)
0240     81249A00             $1$DRA0:[SYSCOMMON.SYSLIB]RDB$SHARE.EXE;10 (section file)
0250     813C3540             $1$DRA0:[SYSCOMMON.SYSLIB]RDB$COSIP.EXE;11 (section file)
0260     811D7D40             $1$DRA0:[SYSCOMMON.SYSLIB]SECURESHRP.EXE;1 (section file)
0270     811DA8C0             $1$DRA0:[SYSCOMMON.SYSLIB]SECURESHR.EXE;1 (section file)
0280     81249DC0             $1$DRA0:[SYSCOMMON.SYSLIB]RDMSHR.EXE;12 (section file)
0290     81360E80             $1$DRA0:[SYSCOMMON.SYSLIB]RDMSHRP.EXE;12 (section file)
02A0     813A7400             $1$DRA0:[SYSCOMMON.SYSLIB]RDMPRV.EXE;17 (section file)
02B0     813BAB40             $1$DRA0:[SYSCOMMON.SYSMSG]DBMMSG.EXE;4 (section file)
02C0     813BABC0             $1$DRA0:[SYSCOMMON.SYSMSG]DBQMSG.EXE;4 (section file)
02D0     8124A480             $1$DRA0:[SYSCOMMON.SYSMSG]SQL$MSG.EXE;10 (section file)
02E0     811EAA40             $1$DRA0:[SYSCOMMON.SYSMSG]SHRIMGMSG.EXE;1 (section file)
02F0     813BA100             $1$DRA0:[SYSCOMMON.SYSMSG]RDBMSGS.EXE;11 (section file)
0300     811EAC00             $1$DRA0:[SYSCOMMON.SYSMSG]TPUMSG.EXE;1 (section file)
0310     813D3E40             $1$DRA0:[SYSCOMMON.SYSMSG]COSI$MSG.EXE;11 (section file)
0320     811E9C80             $1$DRA0:[SYSCOMMON.SYSMSG]DECC$MSG.EXE;1 (section file)
0330     8124A0C0             $1$DRA0:[SYSCOMMON.SYSMSG]SORTMSG.EXE;1 (section file)
0340     81570740             $1$DRA0:[SYSCOMMON.SYSMSG]CXXL$MSG_SHR.EXE;1
0350     814E6D80             $1$DRA0:[SYSCOMMON.SYSMSG]FORMS$MSGMGRSHR.EXE;6
0360     813C4D00             $1$DRA0:[SYSCOMMON.SYSMSG]EPC$MSG.EXE;1 (section file)
0370     81427600             $1$DRA0:[SYSCOMMON.SYSMSG]TIE$MESSAGES.EXE;1
0380     813D3BC0             $1$DRA0:[SYSCOMMON.SYSMSG]RDMSMSG.EXE;10 (section file)
0390     811DB480             $1$DRA0:[SYSCOMMON.SYSLIB]TRACE.EXE;1 (section file)
03A0     81706180             $1$DRA0:[SYSCOMMON.SYSMSG]DBGTBKMSG.EXE;1
05A0     00000000  Dpnd Busy  $1$DKD202:
John Gillings
Honored Contributor

Re: VMS Batch Job

Lisa,

This case needs to be logged with your local customer support centre.

Sounds like you're dropping an I/O somewhere. You'll probably need to crash the system and analyze the dump to determine exactly where.

If it happens again, make sure no one attempts to delete the entry, stop the process or stop the queue. Crash the system and get the dump analyzed.

Note that OpenVMS engineering won't look at OpenVMS V7.2-2, you'd need to upgrade to V7.3-2 and reproduce the problem there to have it elevated.

If you don't want to upgrade, make sure you at least have the latest V7.2-2 patches.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

For the record: ERDACT means Exec mode rundown active.

It seems to be an undocumented process state for sys$getjpi, but it has been defined in pcbdef.h since at least 6.2-1H3.

Wim
Jan van den Ende
Honored Contributor

Re: VMS Batch Job

Lisa,

from your Forum Profile:


I have assigned points to 23 of 45 responses to my questions.


Maybe you can find some time to do some assigning?

http://forums1.itrc.hp.com/service/forums/helptips.do?#33

Mind, I do NOT say you necessarily need to give lots of points. It is fully up to _YOU_ to decide how many. If you consider an answer is not deserving any points, you can also assign 0 ( = zero ) points, and then that answer will no longer be counted as unassigned.
Consider, that every poster took at least the trouble of posting for you!

To easily find your streams with unassigned points, click your own name somewhere.
This will bring up your profile.
Near the bottom of that page, under the caption "My Question(s)", you will find "questions or topics with unassigned points". Clicking that will give all, and only, your questions that still have unassigned postings.

Thanks on behalf of your Forum colleagues.

PS. Nothing personal in this. I try to post it to everyone with this kind of assignment ratio in this forum. If you have received a posting like this before, please do not take offence; none is intended!

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
labadie_1
Honored Contributor

Re: VMS Batch Job

Lisa

You should first check which patches are applied:

$ set term/wid=132
$ prod sh prod/fu
$ prod sh hist /fu

and then apply all the latest patches available for VMS 7.2-2.

Maybe you just lack a few patches and this is not a huge task.

Check at
http://www8.itrc.hp.com/service/patch/search.do?BC=patch.breadcrumb.main|&pageContextName=openvms::

(select Alpha and VMS 7.2-2 on the page at the above link)

Only then, if your problem is not solved, should you upgrade to VMS 7.3-2 plus all patches.

Good luck
Ian Miller.
Honored Contributor

Re: VMS Batch Job

You can access the patches directly at
ftp://ftp.itrc.hp.com/openvms_patches/alpha/V7.2-2

Unfortunately there does not appear to be a master list.
____________________
Purely Personal Opinion
Wim Van den Wyngaert
Honored Contributor

Re: VMS Batch Job

Lisa,

Also check the process quotas with ana/sys show proc (press RETURN several times until all the info has been displayed).

Wim
Paul Blaney
New Member

Re: VMS Batch Job

Can you find out what this program does, and who scheduled the batch job? That would tell us what type of I/O the LEF state relates to. If the program name is meaningful, it is likely closing a channel.

$1$DKC106:[VICTOR.SHARE]EDI_PO_LOAD_FROM_EXT.EXE;49

Thanks

Paul
Volker Halle
Honored Contributor

Re: VMS Batch Job

Lisa,

The process is being deleted and has an I/O outstanding to $1$DKD202: that does not complete.

A process cannot terminate if it still has outstanding I/Os, so it's stuck. This cannot be an application problem; it must be an operating system problem.
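
To pick the stuck channel out of a long listing like the one posted earlier, a small filter can help. This is a sketch only: it assumes the SDA output has been captured as text with one channel per line, and that a busy channel carries the word Busy in its Status column, as in the posted output. The function name is hypothetical and the sample is three lines from that listing.

```python
def busy_channels(listing):
    """Return (channel, device) for each channel line marked 'Busy'."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # A channel line starts with a 4-digit hex channel number.
        if len(fields) < 3 or len(fields[0]) != 4:
            continue
        try:
            int(fields[0], 16)
        except ValueError:
            continue
        if "Busy" in fields[2:]:
            # Take the first field containing ':' as the device/file name.
            device = next((f for f in fields[2:] if ":" in f), "")
            found.append((fields[0], device))
    return found


# Three lines taken from the channel listing posted in the thread.
sample = """\
0010 00000000 $1$DKD203:
00F0 8162F7C0 $1$DKD202:[VICDATAEDI.EDI.COM]EDI_LOAD.COM;34
05A0 00000000 Dpnd Busy $1$DKD202:
"""
print(busy_channels(sample))
```

On the sample this reports only channel 05A0 on $1$DKD202:, which is the device with the I/O that never completes.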

Have there been any errors or unusual events on this device?

Volker.