When to use Ansible vs Docker to set up a cluster

When it comes to setting up a cluster, there are a number of tools available, each with its own strengths and weaknesses. Two popular options are Docker and Ansible. Both can be used to automate the setup and maintenance of a cluster, but they take different approaches. We will see what they are and in which scenarios each of them is the better choice.

What are they?

Docker is a containerization platform that allows you to package an application and its dependencies into a single container that can run on any machine with Docker installed. This makes it easy to deploy applications to a cluster of machines, as each host only needs Docker installed and containers can easily be moved between machines. For managing large clusters there are orchestration tools such as Kubernetes, which take care of deployments, health monitoring and auto-scaling based on demand.

Ansible, on the other hand, is a configuration management tool that allows you to automate setup and maintenance operations. You define the expected configuration of each machine in your cluster, and Ansible applies that configuration to each machine. This makes it easy to set up a cluster with a consistent configuration and to update that configuration as needed.
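
The "expected configuration" approach works because Ansible modules are designed to be idempotent: they compare the current state with the desired one and only act on the differences. The Python sketch below illustrates that idea with an in-memory `Host` class and a `apply_desired_state` function, both hypothetical and invented for illustration; it is not Ansible's code or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """In-memory stand-in for a managed machine (hypothetical, for illustration only)."""
    name: str
    packages: set[str] = field(default_factory=set)
    settings: dict[str, str] = field(default_factory=dict)


def apply_desired_state(host: Host, packages: set[str], settings: dict[str, str]) -> list[str]:
    """Bring `host` in line with the desired state and return the changes made.

    Only differences are acted on, so a second run against an up-to-date host changes nothing.
    """
    changes: list[str] = []
    # Install only the packages that are missing.
    for pkg in sorted(packages - host.packages):
        host.packages.add(pkg)
        changes.append(f"{host.name}: installed {pkg}")
    # Update only the settings whose current value differs from the desired one.
    for key, value in sorted(settings.items()):
        if host.settings.get(key) != value:
            host.settings[key] = value
            changes.append(f"{host.name}: set {key}={value}")
    return changes
```

For example, applying `{"java", "filebeat"}` and `{"vm.swappiness": "10"}` to a fresh `Host("node1")` returns three changes, and running it again returns an empty list, which is the property that makes it safe to re-run a playbook against the whole cluster.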

What can both of them do well?

Apply changes to dozens or hundreds of instances: with Docker we can build a new version of the image and redeploy it through the orchestrator, and with Ansible we can run a playbook against all the hosts.

Install or upgrade software: with Docker we just need to change the image and deploy it, and with Ansible we can run a playbook that performs the changes. However, upgrades with Ansible can be messier, as residual files from previous versions may not be cleaned up properly.

The configuration and changes are self-documented: with Docker all the instructions are in the Dockerfile, and with Ansible they are in the playbooks and roles.

Splitting complex setups: with Docker we can use multi-stage builds and with Ansible we can separate complex and/or reusable parts into roles.

Restart failed instances: Kubernetes continuously checks whether instances are up using health checks and restarts the ones that fail, and with Ansible we can set up scheduled jobs that perform similar checks, e.g. in Rundeck.
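
Kubernetes liveness probes follow this pattern: a container is restarted only after a configurable number of consecutive failed checks (`failureThreshold`, 3 by default), so a single slow response does not trigger a restart. A scheduled job on the Ansible side could apply the same rule. The sketch below assumes a hypothetical `Instance` record and `record_health_check` function; the actual restart action is left as a comment.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical record of one cluster instance's health-check state."""
    name: str
    consecutive_failures: int = 0
    restarts: int = 0


def record_health_check(instance: Instance, healthy: bool, failure_threshold: int = 3) -> bool:
    """Record one health-check result and return True if it triggered a restart."""
    if healthy:
        # Any successful check resets the failure counter.
        instance.consecutive_failures = 0
        return False
    instance.consecutive_failures += 1
    if instance.consecutive_failures >= failure_threshold:
        # A real scheduled job would restart the service here; this sketch only counts it.
        instance.restarts += 1
        instance.consecutive_failures = 0
        return True
    return False
```

Feeding the results `[True, False, False, False, True]` into it returns `[False, False, False, True, False]`: the restart happens on the third consecutive failure, and the counter starts again from zero afterwards.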

Check security: with Docker we can use container security tools and with Ansible we can use virtual machine threat detection tools.

Send application logs to Kibana: with Docker we can use Fluentd and the EFK stack, and with Ansible we can configure Filebeat on the hosts and use the ELK stack.

When could it be better to use Docker?

When scalability is important: Docker, combined with an orchestrator, is better suited than Ansible for managing large clusters, as it can automatically scale to dozens or hundreds of instances. Orchestration tools such as Kubernetes allow you to easily scale your cluster as needed, changing the number of pods and their CPU and memory settings.
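
To make "automatically scale" concrete: the Kubernetes Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping the change when the ratio is within a tolerance of 1.0 (0.1 by default). The Python function below is a simplified sketch of that rule only; it ignores stabilisation windows, missing metrics and pod readiness, and the default replica bounds are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified version of the Horizontal Pod Autoscaler scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric), unless the ratio
    is within `tolerance` of 1.0; the result is clamped to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: keep the current count
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 90% CPU against a 60% target scale to 6, the same 4 pods at 20% scale down to 2, and 8 pods at 120% would want 16 but are capped at `max_replicas`.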

To optimise costs: with Kubernetes we can run instances only when needed, potentially reducing costs on cloud providers.

To use complex release strategies such as blue/green or canary deployments: depending on the type of cluster, we may want to ensure new versions work without impacting customers, or with little impact on them. With tools such as Kubernetes we can easily deploy to only a few instances and redirect part of the traffic to them until we are confident that the new changes work and all the traffic can be redirected. We can do something similar with Ansible, but it may require keeping a number of idle hosts (costing extra money), tweaking load balancers (requiring specific network settings), etc.
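
Redirecting "part of the traffic" is typically configured as weighted routing, for example in an ingress controller or a service mesh. The sketch below shows one common way to do such a split, assuming requests carry a user ID: hashing the ID gives each user a stable bucket, so they keep seeing the same version. The `route` function is hypothetical and not tied to any particular tool.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly `canary_percent`% of users and "stable" for the rest.

    The bucket comes from a hash of the user ID, so each user keeps hitting the same
    version, and raising the percentage only moves more users onto the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

A rollout would start with a small value such as `route(user_id, 5)` and raise the percentage step by step; at 100 every user reaches the new version, and setting it back to 0 acts as a rollback.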

To ensure people cannot change settings manually: with Docker, all the changes made inside a container are lost when it is recreated, so nobody is tempted to make manual changes on it; instead, changes are made in files stored in a source-code repository. This is not guaranteed with Ansible, as people may make manual changes on the hosts after the initial setup unless we have a policy of resetting VMs regularly.
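
On the Ansible side, running a playbook in check mode (`ansible-playbook --check --diff`) reports what it would change without applying it, which helps spot manual edits. The underlying idea of drift detection can be sketched in Python as comparing file digests recorded after provisioning with the current contents; the functions and file paths below are hypothetical examples.

```python
from __future__ import annotations

import hashlib


def fingerprint(files: dict[str, str]) -> dict[str, str]:
    """Map each file path to the SHA-256 hex digest of its content."""
    return {path: hashlib.sha256(content.encode("utf-8")).hexdigest()
            for path, content in files.items()}


def detect_drift(baseline: dict[str, str], current_files: dict[str, str]) -> dict[str, list[str]]:
    """Compare current file contents against digests recorded right after provisioning."""
    current = fingerprint(current_files)
    return {
        # Present in both, but the content no longer matches the baseline.
        "modified": sorted(p for p in baseline.keys() & current.keys() if baseline[p] != current[p]),
        # Expected by the baseline but no longer on the host.
        "missing": sorted(baseline.keys() - current.keys()),
        # On the host but never part of the managed configuration.
        "added": sorted(current.keys() - baseline.keys()),
    }
```

Storing only digests keeps the baseline small; an empty result for all three keys means the host still matches what was provisioned.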

If you have restricted access to VMs: some companies have restrictive policies under which only a dedicated team can install applications or create folders on VMs, which can make it difficult to configure instances and may limit the Ansible modules you can use. In this case you may have more freedom building a bespoke image, or you may be able to download an image that already has almost everything you need configured.

To build prototypes: sometimes we don’t know whether a tool is right for us and want to experiment with a few alternatives without spending much time on each of them. With Docker we can download images that are already configured, without needing much knowledge of each tool, whereas with Ansible we may need to know how to do a full setup of each of them.

When could it be better to use Ansible?

When you need to tweak low-level settings: if you need to tweak OS, kernel, or hardware settings to optimise performance, Ansible is a better option, as we have access to the underlying system. Ansible allows for highly customisable configurations, making it easier to fine-tune these settings as needed. The same applies when services need optimised routes to connect to each other instead of relying on service discovery, e.g. for performance-intensive apps.

When there is no Docker or Kubernetes infrastructure available: running a production-ready cluster on Docker requires a number of tools to be in place, such as a Docker registry to store the images and a container orchestration tool like Kubernetes to maintain the containers. Setting them up may take as much effort as configuring the application cluster itself, or more.

If you have a restricted Kubernetes infrastructure: some companies have a Kubernetes cluster optimised for microservices with restrictive settings (e.g. very low memory limits per pod), or one where any change needs approval (so you may not be able to provide production support outside core working hours). These limitations may make it unsuitable for deploying an application cluster, as we need access at any time and the flexibility to increase memory when needed.

When you are only allowed to use a limited set of base Docker images: some companies have restrictive policies that limit the base images that can be used, which applications can be installed on them, etc. This requires additional setup effort, as you may not be able to use images that already have most of what you need, and it may become a problem as your needs evolve, since you may find that you cannot maintain your cluster (e.g. not being able to install additional monitoring software).

If you need to access shared drives or other resources not accessible outside Kubernetes: e.g. if you want to store backups on external drives that cannot be mounted on the Kubernetes cluster due to security restrictions.

When you need flexibility: Ansible is more flexible than Docker, as it allows you to define the configuration of each machine in your cluster in a highly customizable way, for example different settings for upper and lower environments that can’t be met with a standard container image.

When there is a limited number of instances: e.g. for some infrastructure services like Tibco EMS that only support a primary/secondary model with two instances. In this case we cannot benefit from container orchestration features like auto-scaling.

When it is critical to do data backups: while Docker provides some backup and restore functionality and we can use data volumes, it may not be sufficient for all use cases. In some cases we may want to use Docker and combine it with Ansible to get the flexibility needed.

If the team supporting the cluster has limited or no knowledge of Docker: there will be many unexpected cases with Docker-specific issues, like connecting to the Docker registry, and a support team with little knowledge of it may cause more harm than good, even if the same team can handle any non-Docker issues perfectly well. Sure, we can provide Docker-specific training, but that won’t cover every case that may happen, and you need people who can quickly debug and solve issues when they happen in production or block a release.

Summary

Ultimately, the choice between Docker and Ansible will depend on your specific needs and preferences. If you are already familiar with containerization, have the infrastructure needed and need to set up a large, scalable cluster, Docker may be the better option. If you have specific requirements for your cluster and need a high degree of customization, Ansible may be the better option. In either case, both Docker and Ansible are powerful tools that can help you automate the setup and maintenance of your cluster.

Rafael Borrego

Consultant and security champion specialised in Java, with experience in architecture and team management in both startups and big corporations.

Disclaimer: the posts are based on my own experience and may not reflect the views of my current or any previous employer
