
queues+clustering

 
SOLVED
nipun_2
Regular Advisor

queues+clustering

Hello all,
I recently added a 3rd node (EV68 - ds25alpha station) to a common environment cluster running 7.3-1. (For those of you who answered my previous questions about the blue screen: the new machine needed graphics drivers.)

So currently we have:
DS25 - main node
XP1000 - satellite node
EV68 - satellite node (new)

I can clearly see DECwindows and the software applications on the new EV68 node. However, when I run evaluations (and other such processing-heavy applications) on the EV68, they do not use its own CPU but the CPUs of the other two nodes.

I saw that when I looked at the MONITOR CLUSTER display.

Can anyone please guide me on how to check the current queueing setup and how I should add my node to it? Please mention the relevant *.com files as well as their paths if possible.

Thanks in advance
Nipun
Uwe Zessin
Honored Contributor

Re: queues+clustering

Hello Nipun,
it's great to hear that you got your system working.

To get an overview over all queues:
$ show queue *

If you're interested in batch queues only:
$ show queue /batch *

For details, add the /FULL qualifier, and if you want to see all jobs from all users, use the /ALL qualifier.

I usually use a queue name that has the nodename as part of the queue name, e.g.:
$ initialize/queue/batch/job_limit=2 athena_batch /on=ATHENA::

Then, as part of the system startup (I put it in SYS$MANAGER:SYSTARTUP_VMS.COM):
$ start /queue athena_batch

Jobs can easily be sent to this queue with:
$ submit /queue=athena_batch job.com
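
For the new node that might look roughly like this (a minimal sketch, assuming its cluster node name really is EV68 - substitute the actual node name and whatever job limit suits you):

$! One-time creation of a batch queue bound to the new node
$ initialize /queue /batch /job_limit=2 ev68_batch /on=EV68::
$
$! In SYS$MANAGER:SYSTARTUP_VMS.COM, so the queue is started at every boot
$ start /queue ev68_batch
$
$! Jobs submitted this way will then execute on EV68
$ submit /queue=ev68_batch evaluation.com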
.
Wim Van den Wyngaert
Honored Contributor

Re: queues+clustering

Also check if the failover of the queue manager is OK. If not, a node failure may stop all queue activity.

$ sh que/man/fu
should show all 3 servers in the /on part.

To correct :
$ start/que/man/on=(n1,n2,n3)
or
$ start/que/man/on=(*)
but if you have cluster workstations or quorum nodes, the first form is better.

Wim
Joseph Huber_1
Honored Contributor
Solution

Re: queues+clustering



In general You should have, on every cluster manager, at least one dedicated queue which has /(AUTOSTART_)ON=mynode. This way You can have whatever programs you need running on this node.
You need such a queue e.g. for jobs fired from sys$startup_vms.
My practice then is to have a /GENERIC queue, which includes all of these node-specific queues:
$ init/queue/batch/generic=(nodea_batch,nodeb_batch,nodec_batch) sys$batch

Then jobs submitted to the generic queue will run on any node selected by the queue manager.
This generic queue can of course have any name other than sys$batch; it is just convenient to have a sys$batch queue so you can SUBMIT without a specific /QUEUE= qualifier.

On the other hand You may choose to have jobs execute by default on the node where they are SUBMITted; in that case define a /SYSTEM logical SYS$BATCH pointing to the node-specific queue in sys$startup_vms:
$ define/system sys$batch 'f$getsyi("NODENAME")'_BATCH
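
Putting the two variants side by side, a rough sketch for this cluster might be (node names are assumed from the original post - use your actual SCS node names):

$! One node-specific batch queue per cluster member
$ initialize /queue /batch /job_limit=2 /on=DS25::   ds25_batch
$ initialize /queue /batch /job_limit=2 /on=XP1000:: xp1000_batch
$ initialize /queue /batch /job_limit=2 /on=EV68::   ev68_batch
$
$! Variant 1: a generic queue that feeds the node-specific ones
$ initialize /queue /batch /generic=(ds25_batch,xp1000_batch,ev68_batch) sys$batch
$
$! Variant 2: instead, on each node let SYS$BATCH mean "this node's queue"
$ define /system sys$batch 'f$getsyi("NODENAME")'_BATCH
$
$! (each queue still needs a START /QUEUE in the startup procedure, as above)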
http://www.mpp.mpg.de/~huber
Ian Miller.
Honored Contributor

Re: queues+clustering

I basically do what Joseph does - a queue called nodename_BATCH, and SYS$BATCH defined on each system to point to it. Often there are other special-purpose queues, e.g. queues with a job limit of 1 to co-ordinate jobs, and queues with different working set limits or base priorities and so on.
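
For instance (purely illustrative names and values, bound here to an assumed node DS25):

$! Serialize jobs that must never run in parallel
$ initialize /queue /batch /job_limit=1 /on=DS25:: ds25_serial
$
$! Low-priority queue with its own working set limits
$ initialize /queue /batch /job_limit=4 /base_priority=2 -
      /wsdefault=2000 /wsquota=8000 /wsextent=16000 /on=DS25:: ds25_lowpri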
____________________
Purely Personal Opinion
Joseph Huber_1
Honored Contributor

Re: queues+clustering

To correct a typo in my previous response:

on every cluster manager one dedicated
should of course read as:
on every cluster MEMBER one dedicated


http://www.mpp.mpg.de/~huber
Wim Van den Wyngaert
Honored Contributor

Re: queues+clustering

I would say cluster SERVER. Stations and quorum stuff excluded.
Wim
Joseph Huber_1
Honored Contributor

Re: queues+clustering

Wim, well it's just personal taste:
I have a dedicated queue on every member node in the cluster: at system startup I submit some utility/package startups in a batch job to make system startup shorter. Especially on a workstation: I don't have to wait as long before I can log in.
http://www.mpp.mpg.de/~huber
Wim Van den Wyngaert
Honored Contributor

Re: queues+clustering

Jos: ignore my comment. I read too fast.
Wim
Jan van den Ende
Honored Contributor

Re: queues+clustering

Nipun,

maybe just a matter of taste, but personally I do not like the idea of SYS$BATCH as name or alias for a clusterwide queue.
And here is why: during bootstrap it is just too convenient to KNOW that any job submitted without an explicit queue specification will run ON THE BOOTING NODE.
Otherwise you will have to check (and re-check at every new release and new software product) that there is NOT any default submit command!

That is the reason I 100% agree with Ian: on each node a /SYSTEM logical name SYS$BATCH for the queue that is bound to that node.

OTOH, the concept of dedicated queues can (maybe even should) be applied generously.

We run multiple applications in the cluster, and each application has at least one queue. Each queue is owned by the application's resource identifier, and the respective application managers have management control over their queues.
A very pleasant way to delegate a lot of standard work to those who have the functional application knowledge, so now we only get the really technical issues.
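
A hedged sketch of one way to set that up (the queue name, node name, and identifier are made up for illustration):

$! Node-bound batch queue for one application
$ initialize /queue /batch /job_limit=2 /on=DS25:: app_batch
$
$! Holders of the application's rights identifier get submit and management access
$ set security /class=queue app_batch -
      /acl=(identifier=APP_MGR,access=read+submit+manage+delete)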

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: queues+clustering

During boot, I create/adjust 2 queues.

The first is called node$startup and has a job limit of 99. It is NOT an autostart queue and can be used during the boot to start things. The queue is emptied at the beginning of the boot.

The second one is node$batch, with a job limit of 3, to which I map sys$batch. It is an autostart queue, and everything that is started in it only starts when the boot is complete, i.e. the last thing I do during startup is enable autostart. It is not emptied during boot.

If an application requires timely execution of a job, it must create its own queue. If not, it can use sys$batch, but that queue may be busy and the job delayed.

The advantage is that all application queues are autostart queues and only start doing things after the boot, thus not delaying the boot. Also, the jobs start only after the boot, so their environment should be present.

And all my queues are retain=error. I don't understand why that's not the default.
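
In DCL, that boot-time setup might look roughly like the sketch below (names and limits are taken from the description above; the node name EV68 and the cleanup of old entries are assumptions):

$! Early in startup: non-autostart queue for jobs fired during the boot itself
$! (old entries from the previous boot are deleted here first - not shown)
$ initialize /queue /batch /job_limit=99 /on=EV68:: ev68$startup
$ start /queue ev68$startup
$
$! Autostart queue for normal work; its jobs wait until autostart is enabled
$ initialize /queue /batch /job_limit=3 /retain=error /autostart_on=EV68:: ev68$batch
$ start /queue ev68$batch
$ define /system sys$batch ev68$batch
$
$! Very last step of the startup procedure: let autostart queues begin
$ enable autostart /queues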

Wim
Uwe Zessin
Honored Contributor

Re: queues+clustering

A self-resubmitting job that ends with an error status every time can fill up the queue file.
.
Wim Van den Wyngaert
Honored Contributor

Re: queues+clustering

Uwe,

Yes, but why are they present? We also had jobs that terminated in error, but we corrected them all. Only in exceptional cases may or must they terminate in error. And we monitor error entries, of course.

Wim
Uwe Zessin
Honored Contributor

Re: queues+clustering

What works well for you does not work well for somebody else, and engineering had to make a decision. Of course, mine is only a guess - it is quite possible that today nobody can tell on what basis the decision was made many years ago.
.
Anton van Ruitenbeek
Trusted Contributor

Re: queues+clustering

Nipun,

Possibly I'm missing the beginning of the discussion, but I don't understand what you mean by 'main node' and 'satellite node'.
Do you not have a CI/SCSI/SAN cluster, but rather an LAVC kind of cluster?
In other words: do you have a (boot) node, with the other nodes as satellites?

I know this is not the major reason for setting things up differently, but the queue manager can't be clusterwide if you have an LAVC sort of cluster.

To be honest, I think the solution (whatever you choose) is given in one of the previous posts.

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
Joseph Huber_1
Honored Contributor

Re: queues+clustering

Anton,
who or what manual told You that in an LAVC there can't be cluster-wide queue management?
Must be on a different VMS planet ...
http://www.mpp.mpg.de/~huber
Karl Rohwedder
Honored Contributor

Re: queues+clustering


Maybe he meant that it is not useful to start the queue manager on satellite nodes when the queue files reside on the boot node...

regards Kalle
Anton van Ruitenbeek
Trusted Contributor

Re: queues+clustering

Joseph,

I was on planet Earth.

Thanks Karl.

I meant that if you really have satellites (in the context of an LAVC), these satellites don't have any disks locally (only for paging/swapping). For NI clusters there is a difference. So if the boot node(s) are gone (and with them the system disk(s) and, if it exists, the cluster common data disk), while the queue manager is running on all the nodes, the queue manager may still work, but not as expected.
And as a good VMS manager: if something will work but cannot be guaranteed, IT IS NOT WORKING (properly...)

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
Jan van den Ende
Honored Contributor

Re: queues+clustering

Anton wrote:

if something will work but cannot be guaranteed: IT IS NOT WORKING (properly...)

Can someone make a nice tune for this, and turn it into a mantra that needs to be sung 3 times by _EVERY_ IT worker AND MANAGER before each working day?
.. even reading it once will probably be enlightening for most managers..

Proost.

Have one on me.

jpe

Don't rust yours pelled jacker to fine doll missed aches.
Uwe Zessin
Honored Contributor

Re: queues+clustering

I think that argument is flawed.

If your boot node is down, then your satellite nodes can't do any work anyway while they stall.

So you can't use any satellites with a single boot node, because it is not *guaranteed* that they will be working 'properly' all the time ;-)
.
Joseph Huber_1
Honored Contributor

Re: queues+clustering

Yes, o.k., Karl's rule applied:

in a (homogeneous) LAVC cluster, run only one queue manager, and only on the node where the qman database resides.

Of course there is a problem if this node is down, but in an LAVC cluster there is always this problem for nodes holding common files (sysuaf, rightslist, qman, ...).

But if You have a LAVC, then You know it is not disaster tolerant, and the node(s) serving the common disk(s) have to be present (and votes/quorum should be set so that the cluster blocks until this/those nodes are back).
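
Expressed as a command, that rule would be something like the following (BOOTNODE and the directory are placeholders for the actual boot server's name and wherever the QMAN$MASTER files live):

$! Restrict the queue manager to the boot server that holds the queue database
$ start /queue /manager /on=(BOOTNODE::) disk$common:[qman]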
http://www.mpp.mpg.de/~huber
Anton van Ruitenbeek
Trusted Contributor

Re: queues+clustering

Joseph,

You can actually change your LAVC into an NI/LAVC environment. Shadow the system disk over two or three boot nodes and you are a step ahead regarding SPOFs.
If you also shadow the data disks, it starts to become a nice system. The hardware is there (VAXes, Alphas, network); you only need to buy hard disks. These aren't so expensive anymore.

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measurement is knowledge, but you need to know how to measure!
Uwe Zessin
Honored Contributor

Re: queues+clustering

How do you do this with locally attached boot disks? When the system boots, the disk is not yet a shadow set member.
.
Joseph Huber_1
Honored Contributor

Re: queues+clustering

Anton,
no, shadowing in an LAVC does not at all remove the single point of failure (the common system disk); it just makes recovery easier.
The common system disk may have as many shadow copies as wanted; if it fails, the cluster blocks until the system disk comes back.
In case one wants the cluster to switch to a shadow disk, all members booting from the failed disk have to be shut down to reboot from the shadow disk. And shutdown here means a brutal halt of all nodes, because a graceful dismount and shutdown will not work in such a situation.

So for the qman question, shadowing will not change anything. In general, yes, shadowing helps a lot in disaster recovery, but that's beyond this thread.
http://www.mpp.mpg.de/~huber
Joseph Huber_1
Honored Contributor

Re: queues+clustering

Sorry, forget my previous reply, typing too fast :-)

Yes, a network-interconnect-only cluster works with shadowed disks and several boot servers serving the shadow set members.

(I was thinking too much of my current situation, where LAN traffic and geometry exclude such a perfect solution ...)
http://www.mpp.mpg.de/~huber