07-25-2006 06:55 AM
Multi-threading and Batch Queues
The same jobs never fail if I have built the executable with /THREADS_ENABLE=UPCALLS or without the /THREADS_ENABLE linker switch at all. They fail only when I specify /THREADS_ENABLE by itself, which gives me two kernel threads, as I am on a two-CPU Alpha machine (OpenVMS 7.2-2).
I also submit many single-file jobs to batch queues using 'enter_file' and 'synchronize_job' and never have a failure on those with full multi-threading.
Any ideas?
07-25-2006 07:06 AM
Re: Multi-threading and Batch Queues
Welcome to the OpenVMS ITRC forum.
Do these batch jobs have a batch log file? If so, what kind of error message do you find at the end of the log file?
Volker.
07-25-2006 07:16 AM
Re: Multi-threading and Batch Queues
Unfortunately, I only get the failures when I run the test repeatedly (kind of a stress test). In this case, the log file that corresponds to the job may be from a previous run. The log file does not have any error data in it, if it is related to this occurrence. I am working on some modifications which will uniquely name the log file so I can tell if it corresponds.
I also get this failure, same scenario, on the same job types, on OVMS 8.2 on an IA64 machine.
I have studied the code and the application logs that are produced and don't see anything 'deleting' the jobs. It happens very intermittently; if I re-run the same job, it usually completes normally.
I will re-post once I have run with unique log names. If the return code is accurate, the job is not running, so there shouldn't be a log file!
Thanks
Mark
07-25-2006 07:17 AM
Re: Multi-threading and Batch Queues
I am not completely clear on your situation.
Your post implies that the batch jobs are created by some form of submission program. Is that submission program the program that you are compiling?
Otherwise, I have similar questions to Volker.
- Bob Gezelter, http://www.rlgsc.com
07-25-2006 07:27 AM
Re: Multi-threading and Batch Queues
The executable is part of an Agent that performs tasks on behalf of a scheduler. The agent can be sent interactive tasks (jobs or commands) or tasks that can be submitted to batch queues.
I use sys$sndjbc (create_job, add_file, close_job) to submit the multi-file jobs to the batch queue. I then use sys$sndjbc (synchronize_job) with an AST program to let me know when they are complete and what the job status was.
One thread creates and submits the jobs, the other thread watches for job completion and gathers up the results, logs, etc.
Everything works well (even under my stress testing where I continuously re-run the same jobs) unless I have turned on full multi-threading. I get better throughput with full multi-threading, but I get the occasional failure mentioned above.
Never fails with just UPCALLS, and I get some throughput improvement, but would like to be able to use true full multi-threading!
Cheers,
Mark
07-25-2006 07:28 AM
Re: Multi-threading and Batch Queues
While it is a good idea, it is not necessary to create unique log file names.
The version numbering (together with the creation date/time) allows you to sort out which is which.
Make sure that you have limited the version numbering.
- Bob Gezelter, http://www.rlgsc.com
07-25-2006 07:34 AM
Re: Multi-threading and Batch Queues
07-25-2006 10:51 AM
Re: Multi-threading and Batch Queues
There is not a log file created when the status returned is 295124, so the job apparently is being deleted. I still don't understand why these jobs are being deleted intermittently only when they are being submitted by a fully enabled multi-threaded application.
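For readers who want to see what the number 295124 encodes, here is a minimal sketch that splits it into fields. It assumes the standard OpenVMS condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27, low bit set meaning success); the function and table names are made up for illustration.

```python
# Sketch only: split a 32-bit OpenVMS condition value into its fields,
# assuming the standard layout described in the lead-in.
SEVERITY_NAMES = {
    0: "warning",
    1: "success",
    2: "error",
    3: "informational",
    4: "severe (fatal)",
}

def decode_condition(value):
    """Return (severity, message_number, facility_number, success)."""
    severity = value & 0x7             # bits 0-2
    message = (value >> 3) & 0x1FFF    # bits 3-15
    facility = (value >> 16) & 0xFFF   # bits 16-27
    success = bool(value & 1)          # an odd value means success
    return severity, message, facility, success

severity, message, facility, success = decode_condition(295124)
print(hex(295124), SEVERITY_NAMES.get(severity, "reserved"), facility, success)
```

Under that layout the value is an even number with severity field 4, so it is a failure status rather than a warning or an informational code.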
07-25-2006 06:07 PM
Re: Multi-threading and Batch Queues
If I am understanding you correctly, then the problem is happening as you submit the job.
You may want to package up the source code and contact HP support. Be explicit that the problem:
- appears intermittently
- appears to be connected to your use of /THREADS_ENABLE
- Bob Gezelter, http://www.rlgsc.com
07-25-2006 06:09 PM
Re: Multi-threading and Batch Queues
Wim
07-25-2006 06:11 PM
Re: Multi-threading and Batch Queues
To determine who is deleting those jobs prior to execution, you could set an ALARM ACE on the batch queue:
$ set secu/acl=(alarm=security,acc=delete+success)/class=queue batch-queue-name
$ REPLY/ENABLE=SECURITY
You would get an OPCOM security alarm if a job in that batch queue gets deleted.
You can delete the ACE by using the same command as above and adding /DELETE.
Volker.
07-25-2006 08:23 PM
Re: Multi-threading and Batch Queues
a) the entry number
b) the queue name + entry number
c) the queue name + job name
If the answer is c then I suggest that you modify your program to uniquely name the job (this will also give you a unique log file)
help /sync
SYNCHRONIZE
Holds the process issuing the command until the specified job
completes execution.
Requires delete (D) access to the specified job.
(I've always wondered why it needed this)
Phil
07-26-2006 01:01 AM
Re: Multi-threading and Batch Queues
I'd try queuing the multi-file jobs as held, then have the thread that does the synchronize release them after queuing the synchronize_job AST.
07-26-2006 03:00 AM
Re: Multi-threading and Batch Queues
As your program is ONLY (intermittently) failing if it's running as a truly multithreaded process, this seems to imply that there is some synchronization issue in your program OR in the called system services...
Did you try the alarm ACE to determine who actually deletes the batch job entry before its execution?
Volker.
07-26-2006 03:12 AM
Re: Multi-threading and Batch Queues
Wim, I have checked some logs and found no indication. Perhaps I am not checking all the logs (or the right one).
Volker, I am working on the ACE setup and re-run now. Will definitely let y'all know what that tells me. I have not used this facility before, will I get messages on the terminal where the commands were issued or will information show up in the logs?
Phil, I use queuename + entry number. I have double (and triple!) checked my code to make sure that I am not inadvertently deleting the wrong job at times. My application logs show no indication that that is occurring (and it only happens when full multi-threading is enabled).
David, I check the return codes on the close and it is returning a success return code.
Thanks again, I will update later with results.
07-26-2006 03:21 AM
Re: Multi-threading and Batch Queues
To receive the security alarm, enable your terminal as an operator terminal with $ REPLY/ENABLE=SECURITY (needs SECURITY and OPER privilege). You'll then receive security alarms on this terminal/session.
Volker.
07-26-2006 08:46 AM
Re: Multi-threading and Batch Queues
I appreciate all the suggestions and ideas and have learned a few things, but everything seems to point at a synchronization problem in OVMS itself. I had one job that seemed to be hung for over 2.5 hours and then ended with the 295124 status!
Sometimes my job streams will run for long periods of time with no problems, and sometimes it fails on the first pass.
The output from the Alarm ACE does not give me data that I can correlate with the jobs (I don't have access to the batch PIDs and the ACE alarms don't give me the entry numbers).
%%%%%%%%%%% OPCOM 26-JUL-2006 15:34:31.07 %%%%%%%%%%%
Message from user AUDIT$SERVER on TEST1
Security alarm (SECURITY) on TEST1, system id: 1025
Auditable event: Object access
Event time: 26-JUL-2006 15:34:31.07
PID: 0000040D
Source PID: 0000ACCA
Username: QA
Process owner: [QA]
Object class name: QUEUE
Object name: SYS$BATCH
Access requested: DELETE
Privileges used: BYPASS
Status: %SYSTEM-S-NORMAL, normal successful completion
$
%%%%%%%%%%% OPCOM 26-JUL-2006 15:34:32.75 %%%%%%%%%%%
Message from user AUDIT$SERVER on TEST1
Security alarm (SECURITY) on TEST1, system id: 1025
Auditable event: Object access
Event time: 26-JUL-2006 15:34:32.75
PID: 0000040D
Source PID: 0000A8CA
Username: QA
Process owner: [QA]
Object class name: QUEUE
Object name: SYS$BATCH
Access requested: DELETE
Privileges used: BYPASS
Status: %SYSTEM-S-NORMAL, normal successful completion
The 'Source PID' of A8CA is what I would expect (my program), but I can't find an ACCA in the system. The ACCA seems to be a display problem as it alternates with A8CA and in different runs with different processes for my program, it also reported a bogus PID that was x'400' higher than the one I expected (exactly x'400' on three different runs!).
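The x'400' offset can be checked directly from the two Source PID values in the alarms above. This is plain arithmetic on the reported numbers; the variable names are only for illustration.

```python
# Source PIDs copied from the two OPCOM alarms quoted in this post.
expected_pid = 0xA8CA   # the submitting program's PID, as expected
reported_pid = 0xACCA   # the PID that could not be found on the system

offset = reported_pid - expected_pid
print(hex(offset))

# The two values differ in exactly one bit.
single_bit = offset & (offset - 1) == 0
```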
I have my IT support group looking into reporting this as a problem in sys$sndjbc (create, add, close). The creation of a single file job with 'enter_file' never fails and it does the 'create, add, close' under the covers in one call.
Thanks again for all the help!
07-26-2006 10:58 AM
Re: Multi-threading and Batch Queues
When you do the SJC$_CREATE_JOB call, the job is created and assigned an entry number. It is considered an "open" job at this point. It will not show up with SHOW QUEUE or SHOW ENTRY. However, SYNCHRONIZE/ENTRY= and DELETE/ENTRY= and their equivalent $SNDJBC calls will work.
The SJC$_CLOSE_JOB call places the job on the batch queue so it is eligible for execution. There must first be at least one SJC$_ADD_FILE call.
If a process issues a new SJC$_CREATE_JOB before the SJC$_CLOSE_JOB then the old open job is deleted and its final job status, returned via a SJC$_SYNCHRONIZE_JOB call, will be 295124 (%JBC-F-JOBDELETE, job deleted before execution).
I suspect that this is what VMS believes is happening for some reason, perhaps due to a VMS kernel thread synchronization bug.
I would not allow the job's final status to be returned in the I/O status block of the SJC$_SYNCHRONIZE_JOB call. Instead, use the output item code SJC$_JOB_COMPLETION_STATUS so it is returned in a separate longword. Although that item code is not documented until the VMS 7.3 System Services manual, it works at least as far back as VMS 6.2. One silly problem with letting the batch job's final status be returned in the I/O status block is that if the batch job does an "$EXIT 0" then the $SNDJBCW call with SJC$_SYNCHRONIZE_JOB will never complete.
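The open-job rule described in this post can be illustrated with a small toy model. This is only a sketch of the behaviour as stated here, not the queue manager's implementation: the class, its method names, and the file names are hypothetical, and the status value is the one quoted in this thread.

```python
JBC_JOBDELETE = 295124  # %JBC-F-JOBDELETE, value quoted in this thread

class ToyQueueManager:
    """Toy model only: at most one open job per process."""

    def __init__(self):
        self.next_entry = 1
        self.open_job = {}      # pid -> entry number of that process's open job
        self.files = {}         # entry -> list of file names added so far
        self.final_status = {}  # entry -> final status, once known
        self.queued = []        # entries that were closed and became eligible

    def create_job(self, pid):
        old = self.open_job.get(pid)
        if old is not None:
            # A second create before the close: the old open job is deleted.
            self.final_status[old] = JBC_JOBDELETE
        entry = self.next_entry
        self.next_entry += 1
        self.open_job[pid] = entry
        self.files[entry] = []
        return entry

    def add_file(self, pid, name):
        self.files[self.open_job[pid]].append(name)

    def close_job(self, pid):
        entry = self.open_job[pid]
        if not self.files[entry]:
            raise ValueError("close_job needs at least one add_file first")
        del self.open_job[pid]
        self.queued.append(entry)
        return entry

qm = ToyQueueManager()
first = qm.create_job(pid=0xA8CA)
qm.add_file(0xA8CA, "STEP1.COM")       # hypothetical file name
second = qm.create_job(pid=0xA8CA)     # issued before the first job was closed
qm.add_file(0xA8CA, "STEP2.COM")       # hypothetical file name
qm.close_job(0xA8CA)
print(qm.final_status.get(first), qm.queued)
```

In this model the first job never reaches the queue: anything synchronizing on its entry number would see the job-deleted status, while the second job is queued normally.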
07-26-2006 11:10 AM
Re: Multi-threading and Batch Queues
Thanks for the comments, but I have double-checked all those points! I have log entries in the create, add, close and synch calls, and I even use an AST in each call to make sure it is complete before I return to the caller. I have pored over the documentation for sndjbc to make sure I am following all the rules.
I had one job that was 'active' for over 2.5 hours with lots of other batch jobs being created, running and deleted in between before it failed with a 295124.
It appears to me that there is a synchronization problem in the create, add_files, close, synch processing that does not exist in the enter_file, synch processing which leads me to believe it is in the create, add, close processing.
But again thanks, that is why I posted the question to the forum to see if someone could come up with something I hadn't checked!!!!
07-26-2006 05:59 PM
Re: Multi-threading and Batch Queues
Please note that the different Source PIDs reported will most likely be the PIDs of the 2 threads in your process!
Different threads of a single process have different PIDs, although you'll never find the PID of the 2nd thread in ACCOUNTING or similar. Just have a look at your running multithreaded process with
$ ANAL/SYS
SDA> SHOW PROC/THREADS/ID=
SDA> EXIT
This shows that the explicit (or implicit) DELETE operation seems to happen from both of your threads. Is this intentional? If so, how do you guarantee that an SJC$_CREATE_JOB does not happen while another SJC$_CREATE_JOB operation is still pending?
Could it be that the SJC$_CREATE_JOB and other operations affecting this 'open' job run in different threads?
Mixing ASTs and multithreading may be problematic in some cases.
Volker.
07-26-2006 08:44 PM
Re: Multi-threading and Batch Queues
The PIDs are for the kernel threads. Each time a process thread becomes executable, the thread scheduler decides which kernel thread it will execute on. Therefore the same thread can originate requests from different PIDs at different times. I have programs that use mailbox communication which had to be modified to accommodate a varying sender PID in the IOSB.
I'm not sure if $ICC communication has the same problem, but perhaps the shifting PIDs aren't caught properly in all the needed places in the queue manager.
07-27-2006 02:25 AM
Re: Multi-threading and Batch Queues
The other thread monitors for the completion of the job (the flag set by the AST of the synch), gathers up the job statistics, logs, etc., and deletes the job.
Obviously, the threads themselves can bounce between physical CPUs and KThreads, but they never cross the functional lines described above. I have added __MB() (memory barrier) calls where it seemed appropriate to make sure the memory accesses/updates were synchronized across the physical CPUs.
07-27-2006 02:39 AM
Re: Multi-threading and Batch Queues
For multiple kernel threaded processes, the queue manager uses the PID of the initial thread to validate the client requests. So there can only be one open job outstanding among all the kernel threads of the process. The documentation for SJC$_CREATE_JOB, as stated previously, mentions "if a process already owns an open job, that job is deleted" which is the problem you are encountering.
There is no $SNDJBC context or input item code for the SJC$_ADD_FILE and SJC$_CLOSE_JOB functions to tell the queue manager to which open job the operation belongs. This is one reason why the restriction exists.
If you really need this functionality, a request can be made to enhance the queue system to support multiple open jobs. If you would prefer to handle the issue in the application code, a request can be made to the documentation group to explicitly state this behavior.
Thank you,
Dave Sweeney
OpenVMS Engineering
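One way to handle the one-open-job restriction in application code is to hold a single process-wide lock from the create call through the close call, so that no other thread can open a job in between. The sketch below shows the idea against a toy stand-in for the queue manager. Everything in it is hypothetical (class, function and file names); on OpenVMS the equivalent would presumably be a mutex held across the three $SNDJBC calls, and whether that addresses the failure discussed in this thread is not established here.

```python
import threading

class OneOpenJobQueue:
    """Toy stand-in for the queue manager: one open job per process."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the toy's own state only
        self._next_entry = 1
        self._open = None               # (entry, files) of the open job, if any
        self.queued = {}                # entry -> files of each closed job
        self.deleted = []               # entries deleted before execution

    def create_job(self):
        with self._guard:
            if self._open is not None:
                # A create while another job is open deletes the open job.
                self.deleted.append(self._open[0])
            entry = self._next_entry
            self._next_entry += 1
            self._open = (entry, [])
            return entry

    def add_file(self, name):
        with self._guard:
            self._open[1].append(name)

    def close_job(self):
        with self._guard:
            entry, files = self._open
            self._open = None
            self.queued[entry] = files
            return entry

submit_lock = threading.Lock()  # one lock shared by every submitting thread

def submit_job(queue, files):
    """Hold the lock from create to close so only one job is ever open."""
    with submit_lock:
        entry = queue.create_job()
        for name in files:
            queue.add_file(name)
        queue.close_job()
        return entry

queue = OneOpenJobQueue()
workers = [
    threading.Thread(
        target=submit_job,
        args=(queue, [f"JOB{i}_A.COM", f"JOB{i}_B.COM"]),  # hypothetical files
    )
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(queue.queued), queue.deleted)
```

The trade-off is that multi-file submissions are serialized across the whole process, which costs some throughput but keeps each create, add and close sequence intact.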
07-27-2006 02:46 AM
Re: Multi-threading and Batch Queues
07-27-2006 03:16 AM
Re: Multi-threading and Batch Queues
For multiple kernel threaded processes, the queue manager uses the PID of the initial thread to validate the client requests. So there can only be one open job outstanding among all the kernel threads of the process.
Could you please elaborate a little more?
Could it be like this: assuming that the process sends a 'create_job' running on one kernel thread, and then the PTHREAD thread gets re-scheduled onto another kernel thread and sends the 'add_file' or 'close' information, the queue manager sees the different PIDs and will abort the transaction?
Just speculating about a possible scenario...
Volker.