Update your Database Schema Without Downtime


Take your skills to the next level!

The Persistence Hub is the place to be for every Java developer. It gives you access to all my premium video courses, monthly Java Persistence News, monthly coding problems, and regular expert sessions.


Updating a database schema is pretty easy if you can take your application offline. You shutdown the application, create a backup of the current database schema, perform all required update operations using tools like Flyway or Liquibase, restart the application and hope that everything works fine. But that changes if your customers don’t accept any downtime. Simple changes, like removing a column or renaming a table, suddenly require a multi-step migration process. The reason for that is that high-available systems make heavy use of redundancy.

Redundancy – A required evil

If you want to build a high-available system, you need to run at least 2 instances of every subsystem. So, in the simplest case, you need at least 2 instances of your application and 2 instances of your database server.

The redundancy of all subsystems provides a lot of benefits. The two most important ones are:

  • It increases the number of parallel requests that your system can handle.
  • It makes sure that your system is still up and running even if an instance of one of your subsystems isn’t available.

But they also create new challenges. I will not dive any deeper into topics like monitoring, tracing, load balancing, and fault tolerance. If you don’t have any experience with high-available systems, you should read about all of them. The good news is that there are several great tools and libraries available that help you solve these challenges.

Rolling Updates

In this article, I want to focus on database schema migration for high-available systems. The redundancy of the application plays a critical role during the migration. It enables you to perform a rolling update.

The implementation of a rolling update depends on your technology stack. But the idea is always the same: You have a bunch of instances of a subsystem and you shutdown, update, and restart one instance after the other. While doing that, you run the old and the new version of your application in parallel. The Kubernetes documentation contains a nice, visual description of the rolling update concept.

Multi-Step Migration Process

The rolling update adds a few requirements to your database migration. You no longer need to just adapt the database in the way it’s required by your application; you also need to do it in a way that the old and the new version of your application can work with the database. That means that all migrations need to be backward-compatible as long as you’re running at least one instance of the old version of your application. But not all operations, e.g., renaming or removing a column, are backward compatible. These operations require a multi-step process that enables you to perform the migration without breaking your system.

Let’s take a closer look at the backward-compatible operations first.

Backward-Compatible Operations

Backward-compatible operations are all operations that change your database in a way that it can be used by the old and the new version of your application. That means that you can execute them during a migration step and don’t need to split them into multiple operations.

Add a table or a view

Adding new tables or views doesn’t affect the old instances of your application. You can perform them without any risk. Just keep in mind that while you’re performing the rolling update, some users might trigger write operations on old application instances. These old instances, obviously, don’t write any data to the new tables. You might need to clean up your data and add the missing records to the new table after all application instances have been migrated.

Add a column

It can be a little bit harder to add a new column. You don’t need to worry if you add a database column without a not null constraint. In that case, your operation is backward compatible, and you can simply add the column.

That’s not the case for columns with a not null constraint because it will contain null values for all existing records. You can easily fix that by providing a default value; please check your database documentation on how to do that. If you don’t want to define a default value, you need to execute 3 statements to add the column with the constraint:

  1. Add the column without a default value and update all application instances.
  2. Run a database script to fill that field in all existing records.
  3. Add the not null constraint.

The good news is that you can execute all 3 statements within the same migration step.

Remove a column that’s not used by the old and the new version of your application

Removing a database column that is neither accessed by the old nor the new version of your application is also a backward-compatible operation. No application is using that column anymore so there is also no application that could be affected by its removal.

Remove constraints

The removal of the constraint itself is a backward-compatible operation. The old version of your application can still write to the database in the same way as it did before.

But you need to check if there are any old use case implementations that would break if any database record doesn’t fulfill the constraint. During the rolling update, instances of the new version of the application might write some records that do not comply with the no-longer-existing constraint. If that breaks any old code, you’re in trouble, and I don’t know any good way to solve it. You can’t remove the constraint because some reading use cases of the old version will break. You also can’t keep the constraint because some write operations of the new version will fail. Your only option is to remove the constraint and to roll-out the update quickly.

Backward-Incompatible Operations

Backward-incompatible operations are the reason why I wrote this article. These are all the operations that change your database schema in a way that it can no longer be used by the old version of your application. You need to break these operations into a backward-compatible part which you perform before you update your application and a second part that you execute after you updated all application instances. In most cases, that requires you to add a new column or table in the first and to remove the old one in a later step.

This makes the migration process more complex than it would be if you didn’t perform a rolling, zero-downtime update. To make the migration process easier to execute and less error-prone, you should use a that performs automatic, version-based database updates. The two most popular ones are Flyway and Liquibase. I wrote a series of tutorials about both of them:

And now, let’s take a look at some backward-incompatible operations and how you can split them into parts that don’t break your system.

Rename a column, a table or a view

Renaming a column or table or view sounds simple, but it requires 3-4 steps if you want to use a rolling update that doesn’t cause any downtime. The steps needed for all 3 of them are identical. I, therefore, only explain how to rename a database column. In my experience, this is the most common operation.

The migration always follows the same concept, but the implementation differs based on the capabilities of your database. But more about that later. Let’s first take a look at an example.

The table review contains the column comment which I want to rename to message. This requires multiple steps. In the first one, you need to add the database column and initialize it with the data from the old column; then you need to update all application instances before you can remove the old column.

Unfortunately, the most complicated part isn’t the database migration itself, and it’s, therefore, not visible on this diagram. The main issues occur during the rolling update, which is between step 1 and the new version. While you’re updating your application instances, you’re running old and new versions of your application in parallel. The old version is still using the old database column, and the new one is using the new column. So, you need to make sure that both use the same data and that you don’t lose any write operations. There are 2 general ways to achieve that.

Option 1: Sync with database triggers

The migration process is a little bit easier if your database supports triggers. So let’s start with this one:

  1. Add a column with the new name and the same data type as the old one. You then copy all data from the old column to the new one.
    You also need to add database triggers to keep both columns in sync so that neither the old nor the new version of your application works on outdated data.
  2. Perform a rolling update of all application instances.
  3. Remove the old database column and the database triggers.

If you update your database during application startup, step 1 and 2 are executed as 1 step.

Option 2: Sync programmatically

Some databases don’t support triggers, and you need a different approach. In these cases, you need to perform 4 migration steps, and you might lose some write operations during the update if you don’t switch your application into a read-only mode.

  1. Add a column with the new name and the same data type as the old one. You then copy all data from the old column to the new one.
  2. Make sure that the new version of your application reads from and writes to the old and the new database column. Let’s call this version new1.
    Please also keep in mind that there are still old instances of your application that don’t know about the new column and that can write new and update existing records at any time. As your database doesn’t synchronize the write operations, you need to do that in the code of version new1.
    After you’ve made sure that the new1 version of your application can handle this situation, you can perform a rolling update of all application instances.
  3. All of your application instances now run version new1 which knows about the new database column. You can now perform a rolling update to application version new2 which only uses the new database column.
  4. Remove the old database column.

Similar to the previous approach, you can reduce the number of required steps if you run the database migration during application startup. In that case, you can execute step 1 and 2 as 1 step.

Change the data type of a column

You can change the data type of a column in almost the same way as you rename the column. The only difference is that you also need to convert all values stored in the old column to the data type of the new column.

Remove a column or table or view that’s still used by the old version of your application

I’m sorry to tell you that you can’t remove that column/table/view. At least not now. You first need to update your application so that there is no running instance of it that still uses it. After you’ve done that, you can remove the no longer used column/table/view from your database.

Summary

Migrating a database schema without downtime is possible, but it often requires a complex, multi-step approach. It requires you to change your database in a backward-compatible way so that the old and the new version of your application can use it.

As you’ve seen in this article, not all migration operations are backward-compatible. But you can split them into multiple steps so that you can create a database version that can be used by both versions of your application. In most cases, that requires you to add a new column or table or view which will be used by the new version of your application. After you updated all application instances, you can then remove the old one.

6 Comments

  1. Avatar photo Michał Ruszkowski says:

    Nice explanation, thanks

    1. Avatar photo Thorben Janssen says:

      Thanks

  2. Thanks for sharing the informative post with us. Very helpful.

    1. Avatar photo Thorben Janssen says:

      Thanks Tracy

  3. Very good article, congrats!

    In my opinion, even with the possibility of downtime during deployment it’s still important to keep in mind that your deploy can fail and you must rollback it as soon as possible!

    Rolling back the app is usually easy but not the database schema, so your schema migrations need to be backward-compatible at least one or two versions to allow roll the system back until the problem is found and fixed. Otherwise much of the fix is going to be done manually by a DBA or OPS team!

    What do you think?

    1. Avatar photo Thorben Janssen says:

      Good point. Yes, a backward compatible database schema makes it a lot easier to rollback an update. But sooner or later, you will need to change your database in an incompatible way and that most often also forces you to adapt your application code.
      I try to ship backward-incompatible database changes as a separate update that doesn’t contain any bugfixes or new features. That minimizes the risk of any unexpected side effects.

Comments are closed.