What is Ansible and how to use it to automate infrastructure tasks

Many tasks have to be repeated frequently on dozens of servers and can easily be automated so they are done faster, with less effort and more reliably. We will see what Ansible is (one of the standard technologies for this), how to use it and some best practices.

What can be done with it?

It lets us automate almost every operation a person can do on a terminal, no matter how long or complex it is. Once an operation is scripted, it can be applied to any number of servers or host groups. It doesn’t require learning a programming language, as the scripts are written in YAML, so it is great for people working in Ops who don’t have previous automation or coding experience.

Use cases:

Some of the scenarios we can use it for are:

Set up new clusters: this usually requires installing different applications, configuring them, integrating them with different monitoring or logging systems… It can be scripted so we can rebuild boxes or build new ones whenever we want to.
Perform regular tasks: e.g. doing upgrades, backups, data refresh… These usually require long sequences of steps and can also be automated so we only have to run one command.
Self-recovery: run scripts every few minutes to check that all the systems are working as expected and restart anything that needs it. We should obviously have monitoring tools like Dynatrace to monitor systems 24/7 and send alerts/call-outs if something goes down, but many issues can be fixed automatically and monitoring tools cannot do that. E.g. restarting a Kafka connector if it goes down in the middle of the night due to a random network issue.
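The self-recovery case could look roughly like the sketch below. It assumes the connector runs on Kafka Connect and is reachable through its REST API; the URL and the connector name are hypothetical placeholders. Running it every few minutes would be left to a scheduler.

```yaml
---
# Hypothetical self-recovery playbook: the URL and connector name are placeholders.
- name: Restart a Kafka connector if it is not running
  hosts: localhost
  gather_facts: false
  vars:
    connect_url: "http://connect.example.internal:8083"
    connector_name: "my-connector"
  tasks:
    - name: Read the current connector status
      ansible.builtin.uri:
        url: "{{ connect_url }}/connectors/{{ connector_name }}/status"
        method: GET
        return_content: true
      register: connector_status

    - name: Restart the connector only when it is not running
      ansible.builtin.uri:
        url: "{{ connect_url }}/connectors/{{ connector_name }}/restart"
        method: POST
        status_code: [200, 202, 204]
      when: connector_status.json.connector.state != "RUNNING"
```

The second task is skipped when the connector is healthy, so running the playbook repeatedly has no side effects.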

Are there similar tools that can be used?

Yes, Chef and Puppet have been widely used, but they usually require an agent to be installed and maintained on the destination hosts. Ansible doesn’t need one: it connects over SSH and requires nothing on the boxes apart from Python, which is quite standard in Unix environments. Working with them also means learning a new language (Chef recipes are written in Ruby and Puppet has its own declarative language), which can be challenging for people with a pure Ops background and little previous coding experience. With Ansible we just write configuration files describing the desired state of the server.
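As a rough illustration of such a configuration file, the sketch below describes a desired state instead of a sequence of commands. The host group `webservers` and the nginx package are assumptions chosen for the example.

```yaml
---
# Hypothetical playbook: the group name and package are illustrative assumptions.
- name: Ensure the web server is installed and running
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Start nginx and enable it at boot
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

If nginx is already installed and running, both tasks report no change.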

What is the difference with Terraform?

Ansible and Terraform work well together but at different stages. Terraform is used to provision infrastructure (create new servers, network settings, etc.) whereas Ansible is used to configure applications on that infrastructure (Elasticsearch, Kafka, etc.).

Are there deployment tools available for it?

Yes. We can deploy from our computers, but ideally the scripts should be in source control (e.g. GitLab) so all their changes are tracked, and run from a UI that records who made each change, when, and what the result was.

Some popular tools to run them are:

Ansible Tower / Rundeck: Ansible Tower is built specifically for Ansible, and Rundeck is an operations automation tool that can run Ansible through a plugin. They are already prepared for most types of infrastructure changes and integrate well with most Unix distributions. This gives more feedback about deployments (whether changes were made, the status afterwards, the cause of any issues…). They have schedulers that repeat recurring tasks at a set frequency.
GitLab CI/GitHub Actions/Jenkins: these are standard generic CI/CD tools in which pipelines can be configured to run Ansible scripts, and they also have schedulers. They require more initial effort and don’t know much about the target hosts, but they have a friendlier UI, can be used as a single deployment tool for both software and infrastructure, and allow configuring different deployment and testing stages.

Can we add some tests to ensure the scripts will work as expected?

There are different types of tests we can add:

Check the files have correct syntax: we can use yamllint to verify they are valid YAML.
Check they have valid Ansible syntax:
- ansible-playbook --syntax-check: to see if Ansible can parse them
- ansible-lint: checks they follow Ansible best practices
Integration tests: we can see if the playbook will run well on a fresh environment using molecule test
Idempotency tests: to see if we can run scripts multiple times against production without breaking anything. We can use “ansible-playbook --check” for it, which does a dry run and reports what would change without changing anything
Blue-green deployments: have a copy of the upper environments in which we can do an actual run of the scripts without impacting our users
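Some of these checks can be chained in a pipeline. The sketch below is a hypothetical GitLab CI configuration; the Docker image and the `site.yml` and `inventory.yml` file names are assumptions. Molecule is left out because it also needs a driver such as Docker on the runner.

```yaml
# Hypothetical .gitlab-ci.yml: the image, playbook and inventory names are placeholders.
stages:
  - lint
  - test

default:
  image: python:3.12
  before_script:
    - pip install ansible ansible-lint yamllint

yaml_syntax:
  stage: lint
  script:
    - yamllint .

ansible_syntax:
  stage: lint
  script:
    - ansible-playbook --syntax-check -i inventory.yml site.yml
    - ansible-lint site.yml

dry_run:
  stage: test
  script:
    # Needs network access and credentials for the target hosts.
    - ansible-playbook --check --diff -i inventory.yml site.yml
```

The lint jobs only need the repository, so they can run on every commit, whereas the dry run depends on the runner being able to reach the target boxes.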

What is needed to start using it?

We only need Python on the target boxes (plus SSH access to them), and Python and Ansible on the machine we are going to run the scripts from. And any type of text editor of course, e.g. the free versions of IntelliJ or Visual Studio.
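Ansible also needs to know which boxes to connect to, which is described in an inventory file. A minimal sketch in YAML follows, with hypothetical host names, group names and user.

```yaml
# Hypothetical inventory.yml: the host names, groups and user are placeholders.
all:
  children:
    webservers:
      hosts:
        web1.example.internal:
        web2.example.internal:
    kafka:
      hosts:
        kafka1.example.internal:
  vars:
    ansible_user: deploy
```

Assuming SSH access is already set up, running `ansible -i inventory.yml all -m ansible.builtin.ping` is a quick way to confirm that Ansible can reach every host and find Python on it.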

Challenges using it on-premise?

If you work in a heavily audited industry you may find multiple limitations:

Lack of permissions on the target hosts: to create folders, install applications, configure them as services using systemctl… You may have to engage with a Unix team for it and may not be able to use some Ansible modules that already do much of the desired installation and configuration.
Firewalls: e.g. to download application binaries from external websites (you may have to download them separately) or to connect from the deployment tools to the target boxes (you will have to request that the connection is opened).
Access to deployment tools: sometimes we have to spend a lot more energy getting the approvals needed to use them than preparing the automation scripts. If you have worked in highly audited environments you know what it is like.

Best practices:

It is good to follow good practices from the beginning to ensure the scripts are maintainable and work as expected:

Idempotency: running the same script multiple times on a host should leave it in the same state. So actions should only be performed under certain conditions, either through explicit checks or by using modules that only make a change when the current state differs from the desired one.
Declarative vs imperative: when designing the automation think about the desired state after running the script (declarative) instead of the steps to achieve it (imperative), and use components that are already available.
Use existing components: there are many repositories such as Ansible Galaxy that contain free components for most tasks you may want to implement and that have been improved thanks to the efforts of a huge community, so there is no need to reinvent the wheel.
Document every step: every task can have a name describing what it does, and that name is printed in the output. This makes it a lot easier to understand long scripts containing a large number of instructions or commands.
Use pre and post condition checks: to ensure the system is in the expected state before running the scripts and in the desired state after it.
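Several of these practices can be combined in one playbook. The sketch below is only an illustration: the `appservers` group, the `/opt/myapp` paths, the `myapp` service and port 8080 are hypothetical.

```yaml
---
# Hypothetical playbook: the group, paths, service name and port are placeholders.
- name: Deploy the application with pre and post condition checks
  hosts: appservers
  become: true
  pre_tasks:
    - name: Look up the installation directory
      ansible.builtin.stat:
        path: /opt/myapp
      register: install_dir

    - name: Stop early if the installation directory is missing
      ansible.builtin.assert:
        that:
          - install_dir.stat.exists
        fail_msg: "/opt/myapp must exist before deploying"

  tasks:
    - name: Unpack the release only if it has not been unpacked yet
      ansible.builtin.unarchive:
        src: /tmp/myapp.tar.gz
        dest: /opt/myapp
        remote_src: true
        creates: /opt/myapp/bin/myapp

    - name: Ensure the service is running
      ansible.builtin.service:
        name: myapp
        state: started

  post_tasks:
    - name: Confirm the application is listening on its port
      ansible.builtin.wait_for:
        port: 8080
        timeout: 30
```

Every task has a descriptive name, the `creates` option keeps the unpack step idempotent, and the `pre_tasks` and `post_tasks` sections hold the condition checks.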

I hope this post helps you get started with Ansible. Let me know any feedback in the comments. Happy automating!

Rafael Borrego

Consultant and security champion specialised in Java, with experience in architecture and team management in both startups and big corporations.

Disclaimer: the posts are based on my own experience and may not reflect the views of my current or any previous employer


Rafael Borrego's blog
