Operating System - OpenVMS

job is executing in node2 in cluster.

shiva27
Frequent Advisor

job is executing in node2 in cluster.

Can someone help?

I'm managing a VMS cluster with 2 nodes (say node1 and node2). I defined one scheduler job, which has been running every day for the last 2 months. It generates a user profile listing named node1.txt and sends this file to a remote server via SCP under the same name.

Suddenly this job is generating the file as node2.txt instead of node1.txt, and sending it as node2.txt.

No changes have been made to the system.

OS : OpenVMS V8.2

11 REPLIES
Steven Schweda
Honored Contributor

Re: job is executing in node2 in cluster.

> [...] one scheduler job [...]

Do you mean a batch job? As in SUBMIT?

> Suddenly this job [...]

So, if I understand this, you have a command
procedure, which you're hiding, you run it
in some unknown way, and you want someone to
tell you why it does what it does?

Good luck. My psychic powers are too weak
for me to guess what's happening, based on no
useful information.

> No changes have been made to the system.

Even knowing nothing, I'd guess that
_something_ has changed, or else you wouldn't
be here asking this question.
shiva27
Frequent Advisor

Re: job is executing in node2 in cluster.

Steven,

1. We are using the CA scheduler product to schedule the jobs.
2. We are using a generic command procedure to collect the user profile listing, and the .txt file is created by this command procedure. It checks the server name with the command
node = f$getsyi("nodename") in the command procedure.

3. Until yesterday the job was creating the file node1.txt, but now the output file is node2.txt.

My Question:
===========
If the job is running on node1, why is the output file created as node2.txt? Both nodes are in the cluster.
If you need more information, please let me know. Thanks.
Jon Pinkley
Honored Contributor

Re: job is executing in node2 in cluster.

The most likely thing is that the command procedure is being run on node2 instead of node1 as it did the last two months. What evidence do you have that it is running on node1?

I have never used CA scheduler, but with VMS queues it is possible to create generic queues, whose jobs can then run on any execution queue specified for the generic queue. Most likely, CA scheduler has the same capability to schedule a job on any node. For many things, it is not important which node a job runs on, so for availability, scheduling a job on a generic queue can make it more likely that a machine will be available to run it.

If you have accounting enabled, you can see where the job actually ran.
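
As a rough illustration of that accounting check, the DCL ACCOUNTING utility can select records by time, process type, job name and node. This is only a sketch: it assumes accounting is enabled and that the scheduler runs the procedure as a batch or detached process (not confirmed in this thread), and the job name PROFILE_RPT is hypothetical. In many clusters each node writes its own accounting file, so the commands may need to be run on both nodes.

```dcl
$! Sketch: list batch and detached process records written since yesterday.
$! Reads the default accounting file on the node where it is run.
$ ACCOUNTING /SINCE=YESTERDAY /TYPE=(BATCH,DETACHED) /FULL
$! Narrow the selection to one (hypothetical) job name and to NODE2
$ ACCOUNTING /SINCE=YESTERDAY /JOB=PROFILE_RPT /NODE=NODE2 /FULL
```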

If you want the job to run on a specific node, you will have to read the CA scheduler documentation to see how that can be done.

Just because it happened to pick node2 to run on is no guarantee that it will always pick that node. Perhaps the load has changed and the scheduler decided it was best to run the job on node1 the first two months, but now it thinks node2 is better.

Jon
it depends
Steven Schweda
Honored Contributor

Re: job is executing in node2 in cluster.

> node = f$getsyi("nodename") in the command procedure.

That should work correctly. (I'll assume
that the file name is created using this
"node" variable.)

> If the job is running on node1, why is the
> output file created as node2.txt? Both
> nodes are in the cluster.

_If_ the job were running on node1, then I'd
expect "node" to be "node1". If it comes out
as "node2", then I'd tend to believe that the
job is really running on node2. Do you have
any good reason to believe that it's really
running on node1?

Have you looked at what's happening where
while the job is running?

I know nothing about "CA scheduler", but it
might be smart enough to run a job on any
node in the cluster. Perhaps it looks for
the node with the most free time, and node1
is now busier than it was before.

Step1: Find out where the job actually runs.
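
One way to do that is to have the procedure write its own location into its log each time it runs. The lines below are a minimal sketch, not the poster's actual procedure. The F$GETQUI call only makes sense when the procedure runs as a batch job, which may not be how CA scheduler starts it, so it is guarded.

```dcl
$! Sketch: record where the procedure is really executing.
$ node = F$GETSYI("NODENAME")
$ WRITE SYS$OUTPUT "Started ''F$TIME()' on node ''node' in ''F$MODE()' mode"
$! THIS_JOB is only meaningful inside a batch job, so check the mode first
$ IF F$MODE() .EQS. "BATCH"
$ THEN
$   queue = F$GETQUI("DISPLAY_JOB", "QUEUE_NAME", , "THIS_JOB")
$   WRITE SYS$OUTPUT "Execution queue: ''queue'"
$ ENDIF
```

Comparing this line in the log with the name of the generated file shows whether the file name and the executing node agree.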
shiva27
Frequent Advisor

Re: job is executing in node2 in cluster.

I've executed the scheduler job, and it automatically ran on NODE2.
The message returned was: JOB completed from NODE2.

But again, why is the job running on NODE2, when it should run on NODE1 as it did before?

Note: This is a production server, so we can't restart the scheduler.
Robert Gezelter
Honored Contributor

Re: job is executing in node2 in cluster.

Shiva,

It may sound like semantics, but the question is: Where did the job run? It is not: Where did it report that it ran?

Check the actual log file and accounting logs to determine where the job actually executed.

One may ask: How could this happen? The simple answer is that I have seen various jobs which take their node name as a parameter (which is fixed at the time of submission) rather than determining it using the DCL lexical function F$GETSYI or the analogous system service or RTL calls. This leads to incorrect file names and misleading messages.
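
The difference can be sketched in DCL as follows. This is an illustration under assumptions, not the procedure from this thread: P1 stands for a hypothetical parameter carrying a node name that was fixed when the job was defined, and the file name pattern is inferred from the node1.txt / node2.txt names mentioned earlier.

```dcl
$! Sketch: compare a node name passed as a parameter with the run-time value.
$ node_param   = P1                        ! fixed at submission (hypothetical)
$ node_runtime = F$GETSYI("NODENAME")      ! evaluated where the job executes
$ IF node_param .NES. "" .AND. node_param .NES. node_runtime
$ THEN
$   WRITE SYS$OUTPUT "Node given as P1 (''node_param') differs from run-time node (''node_runtime')"
$ ENDIF
$! Build the file name from the run-time value, not from P1
$ outfile = F$EDIT(node_runtime, "LOWERCASE") + ".txt"
$ WRITE SYS$OUTPUT "Output file will be ''outfile'"
```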

Since the actual command procedure is also a mystery (at this time), there is no way to know if the message reporting execution on NODE2 is correct or incorrect.

- Bob Gezelter, http://www.rlgsc.com
Jan van den Ende
Honored Contributor

Re: job is executing in node2 in cluster.

Shiva,

The info I am missing is a full specification of the queue.

In a cluster environment, it is usual to specify that a queue may execute on multiple (by default: ALL) nodes.
At a certain moment in time it executes on a certain node, but for various reasons the queue manager may fail queues over to another node.
This MIGHT have happened here also.
Please provide the output of a
$ SHOW QUEUE/FULL
and we might rule this out, or call this a reasonable explanation.
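
For reference, a couple of variants of that command, as a sketch; the queue name NODE1_BATCH is hypothetical. In the output, a /GENERIC=(...) list marks a generic queue, "on NODE1::" shows where an execution queue currently runs, and /AUTOSTART_ON=(...) lists the nodes an autostart queue may fail over to.

```dcl
$! Sketch: show every batch queue with its full definition
$ SHOW QUEUE /BATCH /FULL *
$! Show one queue in full, including jobs owned by all users
$ SHOW QUEUE /FULL /ALL_JOBS NODE1_BATCH
```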

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Steven Schweda
Honored Contributor

Re: job is executing in node2 in cluster.

> But again, why is the job running on NODE2,
> when it should run on NODE1 as it did before?

Why, exactly, should it run on any particular
node? When did you tell it where to run?

Don't tell me how it "should work", tell the
fellow (or program) who runs the job.
Volker Halle
Honored Contributor

Re: job is executing in node2 in cluster.

shiva27,

If you have a homogeneous cluster, it should not matter on which node a job runs. If it is a requirement for that job to run on a specific node in the cluster, you may need to specify this requirement in the scheduler job definition, e.g. by including something like /NODE=xxx

Volker.
Volker Halle
Honored Contributor

Re: job is executing in node2 in cluster.

Check the /CLUSTER_NODE qualifier for the various job-related commands

Volker.
John Gillings
Honored Contributor

Re: job is executing in node2 in cluster.

Shiva,

>But again, why is the job running on NODE2,
>when it should run on NODE1 as it did before?

My guess is you have a generic queue which feeds execution queues on each node. This is a very common configuration. In a cluster it usually doesn't matter on which node a job executes, so there's a benefit in distributing the load between nodes.
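
With native VMS batch queues, that kind of configuration can be sketched roughly as below. All queue names and the procedure name USER_PROFILE.COM are hypothetical, the commands need suitable privilege (OPER), and the queues would still have to be started. CA Scheduler has its own job definitions, so this only illustrates the idea, not the poster's setup.

```dcl
$! Sketch: one execution queue per node (hypothetical names)
$ INITIALIZE /QUEUE /BATCH /ON=NODE1:: NODE1_BATCH
$ INITIALIZE /QUEUE /BATCH /ON=NODE2:: NODE2_BATCH
$! A generic queue that feeds both execution queues
$ INITIALIZE /QUEUE /BATCH /GENERIC=(NODE1_BATCH,NODE2_BATCH) CLUSTER_BATCH
$! Submitted to the generic queue, the job may run on either node
$ SUBMIT /QUEUE=CLUSTER_BATCH USER_PROFILE.COM
$! Submitted to an execution queue, the job runs only on that node
$ SUBMIT /QUEUE=NODE1_BATCH USER_PROFILE.COM
```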

Further, I'd guess that your particular job happens to have executed on NODE1 more often than on NODE2. This could be load related, or just a "phase of the moon" thing. For whatever reason, and regardless of how long it's been that way, it's now unexpectedly landed on a different node. Not a lot of point in trying to work out why, just concentrate on correcting the perceived misbehaviour.

Now, if the job really is dependent on running on a particular node (which some of us would consider a bug!), you have two broad options. First, fix the job so it's independent of the node it runs on. Second, fix the scheduling mechanism to specify the dependency, i.e. force it to run on a specific execution queue on your required node.
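
The first option might look something like this inside the command procedure. It is a sketch under assumptions: the fixed report name node1.txt is taken from the original post as what the remote server expects, and the node check is only there to leave a note in the log, not to change where the job runs.

```dcl
$! Sketch: do not derive the report name from the executing node.
$ report_name   = "node1.txt"              ! fixed name (assumption)
$ expected_node = "NODE1"
$ actual_node   = F$GETSYI("NODENAME")
$ IF actual_node .NES. expected_node
$ THEN
$   WRITE SYS$OUTPUT "Note: expected ''expected_node', running on ''actual_node'"
$ ENDIF
$! ... generate the listing into report_name and copy it as before ...
```

This only makes sense if the listing itself is the same on both nodes, for example when the user authorization data is shared across the cluster.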

You'll need to consult the CA Scheduler documentation to determine how to specify a specific execution queue.
A crucible of informative mistakes