How to back up MongoDB databases to prevent losing data

MongoDB is a popular NoSQL database system that needs a solid backup strategy to prevent data loss in case of unexpected events such as hardware failure. We will explore different strategies to ensure your data is properly backed up and secure, and how you can implement them.

Tooling needed

All the strategies use mongodump and mongorestore, two free command-line tools for backup and restore operations. The first creates a binary export of your data, while the second imports that binary export back into MongoDB. You may want to store the backups compressed, which can be done by passing the “--gzip” parameter to compress each collection separately.
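As a minimal sketch of the two tools working together, the functions below take a full compressed dump and restore it. The connection string in MONGO_URI, the function names and the directory argument are assumptions made for illustration, so adjust them to your environment.

```shell
#!/usr/bin/env bash
# Sketch: full compressed backup and restore with mongodump / mongorestore.
# MONGO_URI is an assumed connection string; override it in the environment.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"

# Dump every database into out_dir, gzip-compressing each collection file.
backup_all() {
  local out_dir="$1"
  mongodump --uri="$MONGO_URI" --gzip --out="$out_dir"
}

# Restore a dump created by backup_all.
# --gzip is needed again so mongorestore can read the compressed files.
restore_all() {
  local dump_dir="$1"
  mongorestore --uri="$MONGO_URI" --gzip "$dump_dir"
}

# Example (hypothetical path):
#   backup_all /var/backups/mongo/today
#   restore_all /var/backups/mongo/today
```

Keeping the two steps as small functions makes it easier to call them from a scheduler or to wrap them with the extra options discussed in the following sections.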

Daily vs incremental backups

MongoDB lets you generate both daily and incremental backups. A daily backup is a complete snapshot of your entire database taken each day. An incremental backup contains only the changes made since the last full backup, so you can restore the database to the state it was in a few minutes before a failure instead of losing everything since the last daily snapshot.

Incremental backups rely on the oplog, which is only available when MongoDB runs as a replica set and which records every write applied to the data. The oplog entries can be replayed on top of a restored full backup to bring it forward to a later point in time, which is what makes an incremental backup possible.
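The sketch below shows the simplest way the oplog can take part in a backup: mongodump's “--oplog” option records the writes that happen while the dump is running, and mongorestore's “--oplogReplay” option applies them again during the restore. It assumes MONGO_URI points at a replica set member and that the whole instance is dumped. It is not a complete incremental backup scheme, only the building block for one.

```shell
#!/usr/bin/env bash
# Sketch: dump and restore including the oplog entries written during the dump.
# MONGO_URI is an assumed connection string to a replica set member.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"

# --oplog adds an oplog.bson file with the writes captured while dumping.
# It only works for a full-instance dump against a replica set member.
backup_with_oplog() {
  local out_dir="$1"
  mongodump --uri="$MONGO_URI" --oplog --gzip --out="$out_dir"
}

# --oplogReplay applies the captured oplog entries after the data is restored.
restore_with_oplog() {
  local dump_dir="$1"
  mongorestore --uri="$MONGO_URI" --oplogReplay --gzip "$dump_dir"
}
```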

Full database vs individual collections

We can choose to back up and restore the full database or individual collections. Most of the time we will want to restore the full database to prevent data inconsistencies, but there may be cases where we only need to recover a specific collection, e.g. if it was deleted accidentally or got corrupted. The only change needed to back up or restore a specific collection is an additional parameter with the collection name. A full backup also lets us restore a single collection later, as each collection is stored in a separate file.
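A possible way to pass the collection name is sketched below. It uses “--db” and “--collection” for the dump and “--nsInclude” for the restore, which also works against a full dump directory. The function names, MONGO_URI and the example database and collection are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: back up and restore a single collection.
# MONGO_URI is an assumed connection string without a database name in it.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"

# Dump only one collection of one database.
backup_collection() {
  local db="$1" coll="$2" out_dir="$3"
  mongodump --uri="$MONGO_URI" --db="$db" --collection="$coll" --gzip --out="$out_dir"
}

# Restore only the namespace "db.collection" from a dump directory.
# The directory may hold a full dump; other collections are skipped.
restore_collection() {
  local db="$1" coll="$2" dump_dir="$3"
  mongorestore --uri="$MONGO_URI" --gzip --nsInclude="${db}.${coll}" "$dump_dir"
}

# Example (hypothetical names):
#   backup_collection sales orders /var/backups/mongo/orders
#   restore_collection sales orders /var/backups/mongo/today
```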

MongoDB installed directly on a host vs in a Docker container

If you are running MongoDB in a Docker container you can still use the same backup and restore strategies. However, it may be slightly more difficult to access the data files directly, which can complicate backup and restore operations. You can use the command below to generate backups from outside the container, using docker, kubectl or oc depending on how you access it (plain Docker, vanilla Kubernetes or OpenShift). With kubectl and oc, put “--” between the pod name and the command:

docker/kubectl/oc exec ${REPLACE_CONTAINER_ID} bash -c "mongodump ..."

In addition, if you are using a Docker volume to store the database data, you will need to ensure the volume is properly backed up as well.
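One way to get the dump out of the container without touching its file system is to stream it. In the sketch below mongodump writes a compressed archive to standard output and the host redirects it into a file. The function names and the container and file arguments are assumptions, and the example uses plain Docker only.

```shell
#!/usr/bin/env bash
# Sketch: stream a backup out of (and back into) a MongoDB container.
# With Kubernetes the equivalent form would be: kubectl exec POD -- mongodump ...

# --archive without a file name makes mongodump write the archive to stdout,
# so the redirection stores it on the host, outside the container.
backup_from_container() {
  local container="$1" out_file="$2"
  docker exec "$container" mongodump --archive --gzip > "$out_file"
}

# -i keeps stdin open so mongorestore can read the archive from the host file.
restore_into_container() {
  local container="$1" in_file="$2"
  docker exec -i "$container" mongorestore --archive --gzip < "$in_file"
}

# Example (hypothetical container name):
#   backup_from_container my-mongo /var/backups/mongo/backup.archive.gz
```

Because the archive never lands inside the container, it survives even if the container and its volume are removed.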

How to prevent data inconsistencies

It is difficult to guarantee that a backup contains no data inconsistencies. MongoDB has supported multi-document transactions since version 4.0, but many applications still update several collections without wrapping the changes in one. In that case there is nothing to commit or roll back as a whole, so a backup or a system failure may happen after the change has been saved in some collections but before it has been saved in the rest. We can handle the failure scenario in the application code, but we need something else for backups to prevent data loss or corruption.

Freezing is a technique used to prevent data inconsistencies, especially when there are primary and secondary instances. We stop writes to the database during the backup process so that all the collections are in sync when they are copied. The drawback is that it requires some unavailability, as database writes have to be paused for a few minutes on the primary nodes.

It is not recommended to pause writes on secondary nodes as it could cause data loss. The error scenario would be:

1) An application writes to the primary while the secondary is paused, and receives a success confirmation.

2) The primary fails while the secondary is still paused, so the new changes have not been replicated anywhere.

3) When the secondary is enabled again it does not get the new data, and the business application is not aware of the issue as it has not received any error.

There are two ways of freezing a primary instance before backing up and releasing once it is completed:

1) Using freeze:

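We can use “rs.stepDown()” to make the current primary step down and become a secondary, which stops it accepting writes, and “rs.freeze(seconds)” to prevent a member from seeking election as primary for the given number of seconds (“rs.freeze(0)” lifts the restriction). Be aware that stepDown triggers an election and one of the secondary nodes will be elected as the new primary, so writes to the replica set as a whole continue on that node.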

2) Locking access:

We can use “db.fsyncLock()” to flush pending writes to disk and block new writes on the instance, and “db.fsyncUnlock()” to enable them again.

We should also stop replication from the primary to the secondary instances to avoid inconsistencies. “db.shutdownServer()” shuts down the mongod instance it is run against, and the instance can be started again later with “mongod --replSet <replica set name>”.
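The locking approach can be sketched as a small wrapper that locks the instance, runs whatever backup command we pass to it, and always unlocks afterwards, even if the backup fails. The wrapper name, MONGO_URI and the use of mongosh with “--eval” are assumptions for illustration. Which backup command is safe to run while the instance is locked depends on your deployment, so test it before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: run a backup command between db.fsyncLock() and db.fsyncUnlock().
# MONGO_URI is an assumed connection string to the instance being locked.
MONGO_URI="${MONGO_URI:-mongodb://localhost:27017}"

# Usage: with_fsync_lock <command> [args...]
# Returns the exit status of the command, or 1 if locking/unlocking fails.
with_fsync_lock() {
  local status=0
  # Flush pending writes and block new ones; give up if the lock fails.
  mongosh "$MONGO_URI" --quiet --eval 'db.fsyncLock()' || return 1
  # Run the caller's backup command and remember whether it succeeded.
  "$@" || status=$?
  # Always release the lock, whatever happened to the backup.
  mongosh "$MONGO_URI" --quiet --eval 'db.fsyncUnlock()' || return 1
  return "$status"
}

# Example (hypothetical snapshot script):
#   with_fsync_lock ./snapshot-data-volume.sh
# On deployments with authentication it may be safer to lock, back up and
# unlock from a single open mongosh session instead of separate calls.
```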

Store the backups safely

Once a backup has been generated it is important to ensure the dump file is stored in a safe location, separate from where it was generated. We can use file backup solutions such as NetBackup to copy the file to a remote server or a cloud storage provider so it is available if the database server gets corrupted.
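Whatever tool moves the file, it is worth checking that the copy matches the original. The sketch below is only a stand-in for a real backup product: it uses plain cp and sha256sum, and the destination directory is assumed to be a mounted remote or cloud location. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy a dump file to another location and verify it with a checksum.
# cp stands in for whatever transfer tool is really used.

# Check a stored file against the .sha256 file that sits next to it.
verify_backup() {
  local file="$1" name
  name="$(basename "$file")"
  (cd "$(dirname "$file")" && sha256sum -c "$name.sha256" > /dev/null)
}

# Copy src into dest_dir, recording the checksum of the ORIGINAL file first
# so that a damaged copy is detected by verify_backup.
store_backup() {
  local src="$1" dest_dir="$2" name
  name="$(basename "$src")"
  mkdir -p "$dest_dir" || return 1
  (cd "$(dirname "$src")" && sha256sum "$name") > "$dest_dir/$name.sha256" || return 1
  cp "$src" "$dest_dir/$name" || return 1
  verify_backup "$dest_dir/$name"
}

# Example (hypothetical paths):
#   store_backup /var/backups/mongo/backup.archive.gz /mnt/remote/mongo
```

The checksum file can be checked again later, for example right before a restore, to detect a copy that has been damaged in storage.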

Test the backup and restore processes regularly

We should never assume that a backup process is working well, as we may find out it isn’t at the worst moment, e.g. if a sales database gets corrupted during Black Friday. We should regularly test the whole process, restoring the data into a sandbox environment that is immediately deleted after the test, or into a lower environment. If we restore into a lower environment, it is important to run a script to anonymise the data so that production data is not accidentally leaked.
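A restore test can be as simple as restoring the dump into a sandbox instance and checking that a known collection contains a plausible number of documents. The sketch below assumes a sandbox reachable through SANDBOX_URI and a mongosh client. The function name, the URI and the threshold are hypothetical, and a real test would probably check more than one collection.

```shell
#!/usr/bin/env bash
# Sketch: restore a dump into a sandbox and sanity-check one collection.
# SANDBOX_URI is an assumed connection string to a throwaway instance.
SANDBOX_URI="${SANDBOX_URI:-mongodb://localhost:27018}"

# Usage: verify_restore <dump_dir> <db> <collection> <min_docs>
# Succeeds only if the restore works and the collection holds at least
# min_docs documents afterwards.
verify_restore() {
  local dump_dir="$1" db="$2" coll="$3" min_docs="$4" count
  # --drop removes existing collections first so old test data cannot hide problems.
  mongorestore --uri="$SANDBOX_URI" --gzip --drop "$dump_dir" || return 1
  # Ask the sandbox how many documents the restored collection contains.
  count="$(mongosh "$SANDBOX_URI" --quiet --eval \
    "db.getSiblingDB('$db').getCollection('$coll').countDocuments({})")" || return 1
  [ "$count" -ge "$min_docs" ]
}

# Example (hypothetical names and threshold):
#   verify_restore /var/backups/mongo/today sales orders 1000
```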

Are you using MongoDB? If so, what have your experiences been managing it and doing backups? I would love to read about them in the comments section below :)

Rafael Borrego

Consultant and security champion specialised in Java, with experience in architecture and team management in both startups and big corporations.

Disclaimer: the posts are based on my own experience and may not reflect the views of my current or any previous employer
