The Kick Ass Guide to Creating Nodejs Cron Tasks

Every app needs some tasks to run periodically on a schedule. It could be a daily backup of the database, a weekly newsletter or some computationally intensive work that needs to happen asynchronously. Usually cron is initiated by an external service that calls the app at an appropriate path (e.g. example.com/cron) every so often. Most Linux distributions ship with a scheduling system called, what else, "cron" that can be used to schedule commands that call your web app (we'll go into details of how later on), so this is pretty easy to do.



Cron happens
The obvious way to design a cron system is to create an Express route to handle incoming cron calls, e.g.

Express routing example

    
      app.get('/cron', function(req, res){
        sendNewsletter();
        doBackup();
        processImages();
      });
    
  


Now, if you're like me, you're starting to squirm a bit uncomfortably when you see a solution like this. First of all, each time cron gets called, all the tasks (sendNewsletter, doBackup, processImages) are executed. There is no way to put each task on its own schedule. In most production systems, cron gets called at least hourly (quite a few have cron called several times an hour), and your users would probably get a little upset if the weekly newsletter landed in their inbox two dozen times a day!

Secondly, all the cron tasks are hard-coded into the solution. There is no way to add new tasks or remove existing ones except by rewriting the code. This violates one of my coding tenets: functions and methods should be flexible and easily extensible.



Cron time
Knowledge prerequisites
For the purposes of this exercise, I am going to assume you are fairly comfortable with the Express framework. I've said it before and I'll say it again: if you are building apps with Node.js, you will be much better off using the Express framework.


A proper cron system requires three methods

  • add new tasks to cron
  • remove existing tasks from cron
  • execute cron tasks on schedule


There also needs to be a way to record how often each cron task needs to run and when it last ran. So let's dive into the code.

  
    var _ = require("underscore");

    // define tasks to be run periodically
    function taskOne(callback){
        // perform some actions, then call back with an error (or null on success)
        var err = null;
        return callback(err);
    }

    function taskTwo(callback){
        // perform computations, then call back with an error (or null on success)
        var err = null;
        return callback(err);
    }

    /* set up the object to keep track of the cron system
     * the object will have the format
     * {
     *  taskName: {task: <task>, frequency: <frequency>, lastrun: <time task was last run> }
     * }
     */

    var cronRecord = {};

    var Cron = {
        addTask: function(taskName, task, frequency){
            var lastrun = null;
            /*
             * you should probably do some validation of your arguments
             * to ensure "task" is really a function and 
             * "frequency" is a time period in the appropriate units
             * e.g. seconds
             */

            /* let's check if the taskName already exists.
             * if so, keep its last run time and just update the other properties
             */
            if(_.has(cronRecord, taskName)){
                // get the last time the task was run
                lastrun = cronRecord[taskName].lastrun;
            }

            cronRecord[taskName] = {
                task: task,
                frequency: frequency,
                lastrun: lastrun 
            };
        }
    };
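
The argument validation mentioned in the comments above could look something like the sketch below. The helper name `validateTaskArgs` and its error messages are assumptions made for illustration; they are not part of the code above.

```javascript
// Hypothetical helper: validates the arguments passed to Cron.addTask.
// Returns null when the arguments are valid, or an Error describing the problem.
function validateTaskArgs(taskName, task, frequency){
    if(typeof taskName !== "string" || taskName.length === 0){
        return new Error("taskName must be a non-empty string");
    }
    if(typeof task !== "function"){
        return new Error("task must be a function");
    }
    // frequency is expressed in seconds, as in the examples in this guide
    if(typeof frequency !== "number" || !isFinite(frequency) || frequency <= 0){
        return new Error("frequency must be a positive number of seconds");
    }
    return null;
}

// usage
console.log(validateTaskArgs("First cron task", function(cb){ cb(null); }, 86400)); // null
console.log(validateTaskArgs("Broken task", "not a function", 86400).message);      // task must be a function
```

addTask could call a helper like this first and refuse to register the task (or throw) when it returns an error.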

  


Now, adding a new task to cron is pretty simple:

  
    // set function "taskOne" to run daily
    Cron.addTask("First cron task", taskOne, 86400);

    // set function "taskTwo" to run weekly
    Cron.addTask("Second cron task", taskTwo, 604800);

    // modify period of "First cron task" from daily to hourly
    // (the name must match exactly, including case, or a new task is created)
    Cron.addTask("First cron task", taskOne, 3600);
  


Ok, now that we can add tasks to cron (or modify existing cron tasks), we need to be able to remove tasks from cron as well

  
    var Cron = {
        addTask: function(taskName, task, frequency){
            var lastrun = null;
            /*
             * you should probably do some validation of your arguments
             * to ensure "task" is really a function and 
             * "frequency" is a time period in the appropriate units
             * e.g. seconds
             */

            /* let's check if the taskName already exists.
             * if so, keep its last run time and just update the other properties
             */
            if(_.has(cronRecord, taskName)){
                // get the last time the task was run
                lastrun = cronRecord[taskName].lastrun;
            }

            cronRecord[taskName] = {
                task: task,
                frequency: frequency,
                lastrun: lastrun 
            };
        },

        removeTask: function(taskName){
            delete cronRecord[taskName];
        }
    };
  


Now, removing a task from the cron is just as simple as adding one

  
    // remove "First cron task" from cron
    Cron.removeTask("First cron task");
  



Great! We're in the home stretch now. We can add and remove cron tasks; now we only need to be able to execute them on schedule.

  
    // utilize the underscore and async packages
    var  _  = require("underscore")
    , async = require("async")
    ;

    var Cron = {
        addTask: function(taskName, task, frequency){
            var lastrun = null;
            /*
             * you should probably do some validation of your arguments
             * to ensure "task" is really a function and 
             * "frequency" is a time period in the appropriate units
             * e.g. seconds
             */

            /* let's check if the taskName already exists.
             * if so, keep its last run time and just update the other properties
             */
            if(_.has(cronRecord, taskName)){
                // get the last time the task was run
                lastrun = cronRecord[taskName].lastrun;
            }

            cronRecord[taskName] = {
                task: task,
                frequency: frequency,
                lastrun: lastrun 
            };
        },

        removeTask: function(taskName){
            delete cronRecord[taskName];
        },

        run: function(req, res, callback){
            // req and res are the express request and response objects

            // set timestamp for when cron is invoked
            var now = Date.now();

            // set up array to contain cron tasks that are ready to execute
            var scheduledTasks = [];

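            _.forEach(cronRecord, function(taskDefinition, taskName){
                // check if the task is due
                if(taskDefinition.lastrun === null || _checkIfTaskIsDue(taskDefinition, now)){
                    scheduledTasks.push(
                        function(asyncCallback){
                            taskDefinition.task(function(err){
                                // record the run time on success, so the task is not
                                // due again until "frequency" seconds have passed
                                if(!err){
                                    taskDefinition.lastrun = now;
                                }
                                asyncCallback(err);
                            });
                        }
                    );
                }
            });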

            // if there are tasks due, run them
            if(scheduledTasks.length > 0){
                async.parallel(scheduledTasks, function(err){
                    // cron tasks will have run
                    return callback(err);
                });
            } else {
                // no cron tasks are due
                return callback(null);
            }

            // function to check if a cron task is due to be executed
            // returns true if the cron task is due, returns false otherwise

            function _checkIfTaskIsDue(description, invokeTime){
                // set default value of lastrun to 0
                description.lastrun = description.lastrun || 0;

                // check how long it has been since the last time
                // the task was executed
                var elapsedTime = invokeTime - description.lastrun;

                // return true if the elapsed time is greater than or
                // equal to the cron frequency 
                if(elapsedTime - description.frequency * 1000 >= 0){
                    return true;
                }

                // return false otherwise
                return false;   
            }
        }
    };
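
To make the due check concrete, here is a stand-alone sketch of the same arithmetic. The frequency is in seconds while `Date.now()` returns milliseconds, hence the multiplication by 1000. The function name `isDue` and the sample timestamps are assumptions chosen for illustration.

```javascript
// Stand-alone version of the due check: lastrun and now are in milliseconds
// (as returned by Date.now()), frequency is in seconds.
function isDue(lastrun, frequency, now){
    // a task that has never run is always due
    if(lastrun === null){
        return true;
    }
    return (now - lastrun) >= frequency * 1000;
}

var now = Date.UTC(2020, 0, 2, 12, 0, 0);   // arbitrary sample "current" time
var oneHourAgo = now - 60 * 60 * 1000;
var twoDaysAgo = now - 2 * 24 * 60 * 60 * 1000;

console.log(isDue(null, 86400, now));        // true: never run
console.log(isDue(oneHourAgo, 86400, now));  // false: daily task ran an hour ago
console.log(isDue(oneHourAgo, 3600, now));   // true: hourly task ran exactly an hour ago
console.log(isDue(twoDaysAgo, 86400, now));  // true: daily task is overdue
```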
  


Now that the cron system is in place, we just need to set up a route to invoke the process.

  
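    app.get('/cron', function(req, res){
        Cron.run(req, res, function(err){
            // let the caller know if any of the tasks failed
            if(err){
                return res.status(500).send("cron tasks failed");
            }
            res.send("cron tasks executed");
        });
    });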
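
To watch the scheduling behave end to end without Express or a real clock, a dependency-free sketch like the one below can help. It is a simplification, not the implementation above: `async.parallel` is replaced by a simple counter, and the current time is passed to `run` as an argument so the example is repeatable.

```javascript
// Minimal, dependency-free sketch of the in-memory scheduler
var cronRecord = {};

var Cron = {
    addTask: function(taskName, task, frequency){
        var existing = cronRecord[taskName];
        cronRecord[taskName] = {
            task: task,
            frequency: frequency,                       // in seconds
            lastrun: existing ? existing.lastrun : null // in milliseconds
        };
    },

    // "now" is passed in (instead of calling Date.now()) to keep the example repeatable
    run: function(now, callback){
        var due = Object.keys(cronRecord).filter(function(name){
            var t = cronRecord[name];
            return t.lastrun === null || now - t.lastrun >= t.frequency * 1000;
        });
        var pending = due.length;
        var firstError = null;

        if(pending === 0){
            return callback(null, due);
        }
        due.forEach(function(name){
            cronRecord[name].task(function(err){
                if(err){
                    firstError = firstError || err;
                } else {
                    cronRecord[name].lastrun = now;
                }
                pending -= 1;
                if(pending === 0){
                    callback(firstError, due);
                }
            });
        });
    }
};

// usage: two stub tasks that only record that they ran
var runs = [];
Cron.addTask("hourly", function(cb){ runs.push("hourly"); cb(null); }, 3600);
Cron.addTask("daily", function(cb){ runs.push("daily"); cb(null); }, 86400);

var start = 1000000000000;                        // arbitrary timestamp in milliseconds
Cron.run(start, function(){});                    // both run: neither has run before
Cron.run(start + 30 * 60 * 1000, function(){});   // 30 minutes later: nothing is due
Cron.run(start + 60 * 60 * 1000, function(){});   // 1 hour later: only "hourly" is due

console.log(runs); // [ 'hourly', 'daily', 'hourly' ]
```

The important detail is that the last run time is updated whenever a task completes; without it, every call to the cron route would run every task.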
  


Crash protection
It looks like we're done here. We can add, remove and execute cron tasks as required, but there is a subtle little problem just waiting to blow up the cron system if we deployed at this point: this code has no way to recover if the Node.js app crashes. All the cron records are kept in the "cronRecord" object. Unfortunately, any time the app restarts, "cronRecord" is reset to an empty object and all the information about when each task last ran is lost. Therefore, it is imperative that the cronRecord object is persisted outside of the app's memory.

The obvious solution is to store the object in a database (say, MySQL) and retrieve it whenever the system restarts.

  
    // utilize the underscore, async and mysql packages
    var  _  = require("underscore")
    , async = require("async")
    , mysql = require("mysql")
    ;

    var Cron = {
        addTask: function(taskName, task, frequency){
            var lastrun = null;
            /*
             * you should probably do some validation of your arguments
             * to ensure "task" is really a function and 
             * "frequency" is a time period in the appropriate units
             * e.g. seconds
             */

            /* let's check if the taskName already exists.
             * if so, keep its last run time and just update the other properties
             */
            if(_.has(cronRecord, taskName)){
                // get the last time the task was run
                lastrun = cronRecord[taskName].lastrun;
            }

            cronRecord[taskName] = {
                task: task,
                frequency: frequency,
                lastrun: lastrun 
            };
            // storeRecord is a method of Cron, so it must be called through the object
            Cron.storeRecord();
        },

        removeTask: function(taskName){
            delete cronRecord[taskName];
            // persist the change so a removed task does not come back after a restart
            Cron.storeRecord();
        },

        run: function(req, res, callback){
            // req and res are the express request and response objects

            // set timestamp for when cron is invoked
            var now = Date.now();

            // set up array to contain cron tasks that are ready to execute
            var scheduledTasks = [];

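            _.forEach(cronRecord, function(taskDefinition, taskName){
                // check if the task is due
                if(taskDefinition.lastrun === null || _checkIfTaskIsDue(taskDefinition, now)){
                    scheduledTasks.push(
                        function(asyncCallback){
                            taskDefinition.task(function(err){
                                // record the run time on success, so the task is not
                                // due again until "frequency" seconds have passed
                                if(!err){
                                    taskDefinition.lastrun = now;
                                }
                                asyncCallback(err);
                            });
                        }
                    );
                }
            });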

            // if there are tasks due, run them
            if(scheduledTasks.length > 0){
                async.parallel(scheduledTasks, function(err){
                    // cron tasks will have run
                    Cron.storeRecord();
                    return callback(err);
                });
            } else {
                // no cron tasks are due
                Cron.storeRecord();
                return callback(null);
            }

            // function to check if a cron task is due to be executed
            // returns true if the cron task is due, returns false otherwise

            function _checkIfTaskIsDue(description, invokeTime){
                // set default value of lastrun to 0
                description.lastrun = description.lastrun || 0;

                // check how long it has been since the last time
                // the task was executed
                var elapsedTime = invokeTime - description.lastrun;

                // return true if the elapsed time is greater than or
                // equal to the cron frequency 
                if(elapsedTime - description.frequency * 1000 >= 0){
                    return true;
                }

                // return false otherwise
                return false;   
            }
        },

        storeRecord: function(){
            /* stores the cronRecord object in the MySQL database
             * given a database "appdb" with table "record" and the
             * object cronRecord is stored as a JSON string in column "cronrecord"
             * storeRecord should be executed any time the cronRecord is 
             * modified, i.e. every time a task is added, removed or run
             */

            var connection = mysql.createConnection({
              host     : 'localhost',
              user     : 'me',
              password : 'secret',
              database : 'appdb'
            });

            connection.connect();

            // use a "?" placeholder so the JSON string is escaped and quoted properly
            var query = 'UPDATE record SET cronrecord = ? WHERE ID = 1';
            connection.query(query, [JSON.stringify(cronRecord)], function(err, result){
                // cronRecord is now stored in database table (unless err is set)
                connection.end();
            });    
        },

        readRecord: function(){
            /* reads cronRecord object from the MySQL database
             * given a database "appdb" with the table "record"
             * readRecord should be executed whenever the app is 
             * started
             */

            var connection = mysql.createConnection({
              host     : 'localhost',
              user     : 'me',
              password : 'secret',
              database : 'appdb'
            });

            connection.connect();

            // the JSON string lives in column "cronrecord" of table "record"
            var query = 'SELECT cronrecord FROM record WHERE ID=1';
            connection.query(query, function(err, result){
                if(!err && result.length > 0){
                    cronRecord = JSON.parse(result[0].cronrecord);
                }
                connection.end();
            });
        }
    };

  


Now there are two new methods, storeRecord and readRecord, which store the cronRecord object in the database and read it back, making the system resilient against crashes.

Careful coders will have noticed we still have a little problem in the code: you cannot stringify functions, store them in the database as a string and then read them back. JSON.stringify simply drops function-valued properties, so what comes back from the database is no longer something that can be executed. So we need to find a way to store a reference to the function in the database as a string and turn it back into a function upon reading from the database.
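
The storing half of that idea can be sketched as follows. The helper name `serializeRecord` is an assumption for illustration, and the approach assumes every task is a named function whose name can later be used to look it up.

```javascript
// JSON.stringify silently drops properties whose value is a function
var record = {
    "First cron task": {
        task: function taskOne(cb){ cb(null); },
        frequency: 86400,
        lastrun: null
    }
};
console.log(JSON.stringify(record));
// {"First cron task":{"frequency":86400,"lastrun":null}}

// Hypothetical helper: replace each function with its name before stringifying.
// This assumes every task is a named function.
function serializeRecord(cronRecord){
    var plain = {};
    Object.keys(cronRecord).forEach(function(taskName){
        var entry = cronRecord[taskName];
        plain[taskName] = {
            task: entry.task.name,
            frequency: entry.frequency,
            lastrun: entry.lastrun
        };
    });
    return JSON.stringify(plain);
}

console.log(serializeRecord(record));
// {"First cron task":{"task":"taskOne","frequency":86400,"lastrun":null}}
```

A storeRecord built this way would write the output of `serializeRecord(cronRecord)` to the database instead of `JSON.stringify(cronRecord)`.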

  
    // export the tasks so they can be looked up by name on the module's exports object
    exports.taskOne = taskOne;
    exports.taskTwo = taskTwo;

    var Cron = {
        readRecord: function(){
            /* reads cronRecord object from the MySQL database
             * given a database "appdb" with the table "record"
             * readRecord should be executed whenever the app is 
             * started
             */

            var connection = mysql.createConnection({
              host     : 'localhost',
              user     : 'me',
              password : 'secret',
              database : 'appdb'
            });

            connection.connect();

            var query = 'SELECT cronrecord FROM record WHERE ID=1';
            connection.query(query, function(err, result){
                var tempRecord = JSON.parse(result[0].cronrecord);
                _.forEach(tempRecord, function(taskDescription, taskName){
                    // taskDescription.task is assumed to hold the name of an exported
                    // function (e.g. "taskOne"), saved in place of the function itself
                    cronRecord[taskName] = {
                        task: exports[taskDescription.task],
                        frequency: taskDescription.frequency,
                        lastrun: taskDescription.lastrun
                    };
                });
                connection.end();
            });
        },

      // include rest of the methods 
      // ....
    }
  

That's it. Now, when cronRecord is read from the database, each task is converted from a string (the function's name) back to a function.
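
The lookup step can also be tried in isolation, without a database. In the sketch below the `registry` object stands in for the module's exports, and `restoreRecord` and the sample JSON are assumptions made for illustration.

```javascript
// Hypothetical registry mapping stored names back to functions.
// In the code above this role is played by the module's "exports" object.
var registry = {
    taskOne: function taskOne(cb){ cb(null); },
    taskTwo: function taskTwo(cb){ cb(null); }
};

function restoreRecord(json, registry){
    var stored = JSON.parse(json);
    var restored = {};
    Object.keys(stored).forEach(function(taskName){
        var entry = stored[taskName];
        // only accept names that are the registry's own properties
        var known = Object.prototype.hasOwnProperty.call(registry, entry.task);
        // skip entries whose function no longer exists instead of
        // failing later when cron tries to call "undefined"
        if(!known || typeof registry[entry.task] !== "function"){
            return;
        }
        restored[taskName] = {
            task: registry[entry.task],
            frequency: entry.frequency,
            lastrun: entry.lastrun
        };
    });
    return restored;
}

var json = '{"First cron task":{"task":"taskOne","frequency":86400,"lastrun":1577966400000},' +
           '"Old task":{"task":"removedTask","frequency":3600,"lastrun":null}}';
var restored = restoreRecord(json, registry);

console.log(Object.keys(restored));                                 // [ 'First cron task' ]
console.log(restored["First cron task"].task === registry.taskOne); // true
```

Skipping unknown names is one possible policy; logging them would make it easier to notice a task that was renamed without updating the stored record.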
