
Creation of a smart scheduler in a HA environment

 
Super Advisor

Hi everyone!

I am looking for solutions and any good pointers to create a smart scheduler.

This is the scenario description:
O/S: SUSE Linux (release not relevant).
Clustering Service in use: Veritas.

1) This is an environment with about 100 different scripts whose primary purpose is to store statistical data. They must be triggered at different intervals (the following breakdown is an estimate):
a) Some need to run every 5 minutes (say about 50 of them), and they do not necessarily have to start at the same time (some could start at minute 0, 1, 2, 3 or 4, for example).
b) Some need to run every 10 minutes (say about 30 of them), and again they do not all have to start at the same minute of the hour.
c) Some also need to run every minute, but these are few, say about 10 of them.
d) The remaining 10 have to run every 30 minutes or every hour.

The systems where these run, of course, also host a number of other scripts running for other purposes.

Further information:
1) This is a Highly Available environment which uses Veritas Cluster Server to make Service Groups available on one of two systems (active/standby); a specific filesystem is mounted on whichever system the particular SG is active.
2) The scripts and their results are stored on a filesystem mounted only on the system where the service group (SG) is online, so they are only to run there; if the SG switches, the mechanism that triggers the scripts also needs to move to the system where this FS is active.

Further:
3) Using crontab does not give a perfect solution because:
a) There are far too many scripts, and maintaining the crontab is becoming unmanageable. Also...
b) We need to balance the load by smartly distributing the start times of these scripts (starting all the 5-minute scripts at minute 0 would certainly not be smart, but simply spreading them across different minutes is not optimal either, since establishing that balance while taking into account all the other processes on the system is a challenging estimation). Also...
c) We need a solution which will only trigger these scripts on the system with the active service group. And it would also be perfect to...
d) be able to control the start of these scripts so that they are triggered at second 'x' of a minute, rather than all trying to start at exactly the beginning of the minute (even those which are set up to start at a specific minute)...

Other details which are worth noting:
1) We never want to encounter the situation in which two instances of the same script are running at the same time.
2) When the service group fails over (because the active system failed, or for whatever other reason), we want it to recover immediately on the other system. It does not necessarily have to resume at the point where it failed (i.e. it is not crucial to restart processes which were already running), but the scripts need to run again at the next scheduled interval.

Of course, we would also need a logging mechanism to keep track of the most recent actions performed, so that we have an idea of what was taking place during a failure, if any.

Any ideas are welcome.
We could also make use of a product called "BMC Control-M for distributed systems" (http://www.bmc.com/products/proddocview/0,2832,19052_19429_23437_1521,00.html), which can trigger one or a few control scripts; however, we would not like a solution which uses such jobs to trigger every single script, as that would also be a bit chaotic.

Thanks for any ideas,


MAD
Contrary to popular belief, Unix is user friendly. It's just very particular about who it makes friends with
2 REPLIES
Honored Contributor

Re: Creation of a smart scheduler in a HA environment

>>> Using crontab does not create the perfect solution

I still think of crontab as a solution.

>>> There are way too many scripts and not only it is becoming unmanageable to maintain but also...

Maybe not: your scripts can be grouped according to their run interval. You could create a "trigger" script for each group, insert only that one entry in cron, and have the trigger script start the other scripts in the group.
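
For example, a minimal trigger script for the 5-minute group could look like this (the /stats paths are hypothetical placeholders):

#!/bin/sh
# trigger-5min.sh -- hypothetical trigger for the 5-minute group.
# A single crontab entry (*/5 * * * * /stats/bin/trigger-5min.sh)
# then replaces ~50 individual entries.
JOBDIR=/stats/jobs/every5min    # assumed location of the group's scripts

for job in "$JOBDIR"/*; do
    [ -x "$job" ] || continue   # skip anything that is not executable
    "$job"                      # sequential start; append '&' to parallelize
done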

>>> We need to be able to have a solution which will only trigger these scripts on the system with the active service group.

In this case, you must create a resource in the service group that adds the cron entry at start and removes it at stop.
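
Sketched as a pair of start/stop scripts for such a resource (generic shell, not Veritas-specific syntax; the trigger paths are the hypothetical ones from above):

# scheduler-online.sh -- run when the resource comes online:
# install the trigger entries into root's crontab on this node.
( crontab -l 2>/dev/null | grep -v '/stats/bin/trigger-'
  echo '*/5  * * * * /stats/bin/trigger-5min.sh'
  echo '*/10 * * * * /stats/bin/trigger-10min.sh'
) | crontab -

# scheduler-offline.sh -- run when the resource goes offline:
# remove those entries so the standby node never fires them.
crontab -l 2>/dev/null | grep -v '/stats/bin/trigger-' | crontab -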

>>> be able to control the start of these scripts so that they get triggered at the 'x' second of a minute and not particularly to try to start them all at exactly the beginning of the minute

Weren't we talking about simplicity?

>>> We never want to encounter the situation in which two instances of the same script are running at the same time.

The trigger script for the group could start the jobs sequentially instead of all at once; you can also create dependencies using the shell's "&&", for example. You can also create a lock file to avoid multiple runs, like those used in /var/run or /var/lock.
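
A common lock pattern, sketched here with mkdir (which is atomic on a local filesystem); the lock path and log file are assumptions:

#!/bin/sh
# Guard against two instances of the same script running at once.
LOCKDIR=/var/run/stats-job-42.lock    # hypothetical, one lock per script

if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "`date`: previous run still active, skipping" >> /var/log/stats-sched.log
    exit 0                            # a previous instance holds the lock
fi
trap 'rmdir "$LOCKDIR"' EXIT          # release the lock however the script ends

# ... the script's real work goes here ...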

>>> When the service group fails over (because the active system failed or for whatever other reason), we want it to immediately recover on the other system

As said before, it must be a cluster resource that adds and removes the entries from cron.


Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
New Member

Re: Creation of a smart scheduler in a HA environment

It sounds like something simpler based on cron could accomplish most of the basics of what you need.

Anything more complicated will make your life more difficult when troubleshooting.

Here are some suggestions, based on the fact that you have a filesystem (or filesystems) that would only be active on one node at a time:

- organize your wrappers based on frequency, in their own directories on the filesystem that's part of the service group, e.g. create cron.05min, cron.10min, cron.30min (this is basically similar to the grouping idea that the previous poster suggested)

- put your 5-min scripts in cron.05min, 10-min scripts in cron.10min, etc. Name them like init scripts, e.g. 001stats1, 002stats-script2, etc. so that they have some natural sort order when doing an ls in the dir, and imply a sequencing order for discovery and execution by your wrapper.

- write a wrapper, e.g. run-my-jobs, that takes at least one argument -- the directory to scan for scripts (e.g. run-my-jobs /path/to/foo/cron.05min) -- and runs those scripts. This is similar to how systems like RHEL have a run-parts script and directories such as /etc/cron.daily, /etc/cron.weekly, and so on. Incorporate whatever logging/error checking you want in your wrapper (a sketch follows this list).

- install this static cron wrapper script on *both* systems, with some condition that checks for some dir/file on the filesystem where your code resides ( [ -d /path/to/foo ] && run-my-jobs /path/to/foo/cron.05min ). This means these scripts will always run on both nodes at 5/10/30-min intervals, but silently exit on the inactive node. It keeps maintenance simple. You could make it more sophisticated and additionally check some config/flag in those dirs to determine which node is "active" and only run there.
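
A sketch of such a wrapper, assuming the naming convention above (the log path is a placeholder):

#!/bin/sh
# run-my-jobs -- run every executable script in the given directory in
# lexical (sequence-number) order, logging each script's exit code.
# Usage: run-my-jobs /path/to/foo/cron.05min
DIR=$1
LOG=/path/to/foo/log/run-my-jobs.log   # assumed log location

[ -d "$DIR" ] || exit 0                # silently do nothing on the inactive node

for job in "$DIR"/*; do
    [ -x "$job" ] || continue
    "$job"
    rc=$?
    echo "`date '+%Y-%m-%d %H:%M:%S'` $job exit=$rc" >> "$LOG"
done

The identical crontab entries on both nodes might then look like this (paths assumed):

# same crontab on both nodes; the [ -d ] test makes each job a no-op on
# whichever node does not have the service group's filesystem mounted
*/5  * * * * [ -d /path/to/foo ] && /path/to/foo/bin/run-my-jobs /path/to/foo/cron.05min
*/10 * * * * [ -d /path/to/foo ] && /path/to/foo/bin/run-my-jobs /path/to/foo/cron.10min
*/30 * * * * [ -d /path/to/foo ] && /path/to/foo/bin/run-my-jobs /path/to/foo/cron.30min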

This is how I do things in many cases that fit this model, but in addition each script has a common format and common return codes so the wrapper can interpret errors correctly. If you want sophisticated load balancing and/or parallelism, build that into your wrapper script. For example, your wrapper could use the naming convention of your scripts to imply that two scripts with the same starting sequence number (100foo, 100bar) should run at the same time. Many possibilities...
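
For instance, a parallel variant of the wrapper could batch scripts by their numeric prefix; a sketch under the naming assumption above:

#!/bin/sh
# Run scripts sharing a three-digit prefix (100foo, 100bar) in parallel,
# waiting for each batch to finish before starting the next one.
DIR=$1
prev=""
for job in "$DIR"/[0-9][0-9][0-9]*; do
    [ -x "$job" ] || continue
    seq=`basename "$job" | cut -c1-3`
    if [ -n "$prev" ] && [ "$seq" != "$prev" ]; then
        wait                    # previous batch must finish first
    fi
    "$job" &                    # same-prefix scripts run concurrently
    prev=$seq
done
wait                            # wait for the final batch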

By starting with this sequential technique, you at least have a simple, stable starting point from which to evolve.

--
VK