Batch jobs are essential for many applications, but stopping and re-enabling them when needed is often too much hassle. We are going to see what they are, the scenarios in which they need to be paused and how to use a standard microservice pattern to address it.
What are batch jobs?
Many tasks have to run at a fixed frequency, and a user shouldn’t have to go to a website and start them manually, especially if they should run outside office hours or there is a regulatory requirement that cannot be missed (e.g. deleting personal data for GDPR). Some examples are month-end or year-end report generation, a daily batch that sends new records to a data lake, or a process that synchronises data between two systems every few minutes when it cannot be streamed (e.g. from a sales application to an accounting one, or to send invoices to customers).
Why do we need an easy way to stop and re-enable them?
We can schedule these tasks so they start when needed, either inside the application (there are multiple tools like Spring Batch) or in an external scheduler (there are multiple solutions, both cloud and on-premise). In both cases, however, changing them means engaging an Ops team (either to re-deploy the app with changed settings or to modify the task settings in the scheduler), and we may have to fill in multiple forms and attend meetings such as CAB (Change Advisory Board) ones. Surely there has to be a better way to do it, no?
Some may question why we need this, as we can set pre-conditions in the code, e.g. to stop or delay a report generation until all the invoices have been processed. However, some business conditions cannot be automated, e.g. if a user wants to manually review the payments before their data is sent to auditors or customers. There are also technical reasons to pause jobs, such as a server or network outage planned for the weekend that could cause faulty processing. In that case we don’t want to coordinate with an extra team just to disable and re-enable the jobs, considering the extra cost of them working non-standard hours.
The feature toggle pattern
The feature toggle (or feature flag) pattern is commonly used in all architecture types (monolith, microservices and serverless) to change the behaviour of an application without having to re-deploy it. We can use it to simply enable or disable a feature, to switch between two versions of a feature (e.g. an old version and a new one) and even to change infrastructure (use a new version of an API or the previous one, or switch between sending data to an old on-premise system and a newer version in the cloud). Toggle conditions can be combined so behaviour changes based on different requirements, although this is usually not recommended, as some people may tend to use toggles for business logic instead of simply toggling features.
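As a minimal sketch of switching between two versions of a feature, the Python snippet below routes a call to an old or a new implementation depending on a toggle. The toggle name `new_invoice_api`, the in-memory `toggles` dictionary and the two `send_invoice_*` functions are hypothetical names chosen for illustration; a real application would read the state from configuration or a toggle service.

```python
from typing import Dict

# Hypothetical in-memory toggle state, for illustration only.
toggles: Dict[str, bool] = {"new_invoice_api": False}


def send_invoice_v1(invoice_id: str) -> str:
    """Old code path (stand-in for the previous API version)."""
    return f"v1:{invoice_id}"


def send_invoice_v2(invoice_id: str) -> str:
    """New code path (stand-in for the new API version)."""
    return f"v2:{invoice_id}"


def send_invoice(invoice_id: str) -> str:
    # An unknown toggle is treated as "off", so the old path is the fallback.
    if toggles.get("new_invoice_api", False):
        return send_invoice_v2(invoice_id)
    return send_invoice_v1(invoice_id)
```

Flipping `toggles["new_invoice_api"]` to `True` changes which implementation is called without touching or re-deploying the calling code.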
How to implement it?
The feature toggle status (on/off) can be stored in a database table that holds the feature name and its status and is managed by a microservice, although it can be implemented in other ways, such as a module inside a monolith. Once this service is in place, the code triggered by the batch job just has to call it to check whether the task should run. The job can then either skip that execution or try again after a few minutes to see if it has been enabled again. The toggle status can be configured in a UI on top of that service, in which authorised users click on/off buttons to change the behaviour easily.
The toggle service can be centralised (ideally, to avoid duplicate components, although the code can be moved to a library so it is not duplicated) or deployed per application group (if some stakeholders don’t want to rely on infrastructure from another team, or want to ensure that users in another business area cannot modify their settings by accident).
It may be tempting to add extra columns such as employee id (so a feature is only enabled for some business users who want to test it in production) or user group (for A/B testing, or so it is only enabled for e.g. early-adopter groups). However, be aware that some product teams may want to use this for general user authorisation (to manage which users regularly have access to some features) instead of using better solutions like LDAP, just to avoid the extra paperwork.
The feature toggle pattern is frequently used to switch application behaviour, but it can also be used to easily stop and re-enable batch jobs like the ones that send month-end reports to customers or auditors. I hope you find it useful and that it reduces the manual effort and paperwork needed to change your scheduled tasks.
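The idea above can be sketched in Python as follows. This is only an illustration under some assumptions: the table name `feature_toggle`, the function names and the use of SQLite are all hypothetical, and the job reads the database directly where a real setup would call the toggle service. A job with no row in the table is treated as enabled here, which is a design choice rather than a requirement.

```python
import sqlite3
import time
from typing import Callable


def create_store(conn: sqlite3.Connection) -> None:
    # One row per feature: its name and an on/off status.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_toggle ("
        "name TEXT PRIMARY KEY, enabled INTEGER NOT NULL)"
    )


def set_toggle(conn: sqlite3.Connection, name: str, enabled: bool) -> None:
    # This is what the on/off button in the UI would end up calling.
    conn.execute(
        "INSERT OR REPLACE INTO feature_toggle (name, enabled) VALUES (?, ?)",
        (name, int(enabled)),
    )
    conn.commit()


def is_enabled(conn: sqlite3.Connection, name: str, default: bool = True) -> bool:
    # Assumption: a job without a toggle row runs as usual (default=True).
    row = conn.execute(
        "SELECT enabled FROM feature_toggle WHERE name = ?", (name,)
    ).fetchone()
    return default if row is None else bool(row[0])


def run_if_enabled(
    conn: sqlite3.Connection,
    name: str,
    job: Callable[[], None],
    retries: int = 0,
    wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `job` if its toggle is on; otherwise wait and re-check.

    With retries=0 a disabled job is simply skipped. Returns True if the
    job ran and False if it was skipped.
    """
    for attempt in range(retries + 1):
        if is_enabled(conn, name):
            job()
            return True
        if attempt < retries:
            sleep(wait_seconds)
    return False
```

The scheduler keeps triggering the job on its normal timetable. Calling, say, `run_if_enabled(conn, "month_end_report", generate_report, retries=3)` (where `generate_report` stands for the real task) either runs it, skips it, or waits a few minutes between checks, depending on the stored status.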