Distributed Transactions – Don’t use them for Microservices

Take your skills to the next level!

The Persistence Hub is the place to be for every Java developer. It gives you access to all my premium video courses, monthly Java Persistence News, monthly coding problems, and regular expert sessions.

Since I started talking about microservices and the challenges that you have to solve whenever you want to exchange data between your services, I hear 3 things:

  1. You only need to model the scope of your services “the right way” to avoid these problems.
  2. We use multiple local transactions, and everything works fine. It’s really not that big of a deal.
  3. We have always used distributed transactions to ensure data consistency. We will keep doing that for our microservice architecture.

Let’s quickly address the first 2 answers before we get to the main part of this article.

Designing services the right way

It’s a popular myth that you can solve all problems by designing the scope of your services the right way. That might be the case for highly scalable “hello” world applications that you see in demos. But it doesn’t work that way in the real world.

Don’t get me wrong; designing the scope of your services is important, and it makes the implementation of your application easier. But you will not be able to avoid communication between your services completely. You always have some services that offer their functionality based on other services.

An example of that is an OrderInfo service in an online bookstore. It shows the customer the current status of their order based on the information managed by the Order service, the Inventory service, and the Book service.

Another example is an Inventory service, which needs to reserve a book for a specific order and prepare it for delivery after the Order and the Payment service processed the order.

In these cases, you either:

  • Implement some form of data exchange between these services or
  • Move all the logic to the frontend, which in the end is the same approach as option 1, or
  • Merge all the services into 1, which gets you a monolithic application.

As you can see, there are several situations in which you need to design and implement some form of communication and exchange data between your services. And that’s OK if you do it intentionally. There are several patterns and tools for that. I explain the most important and popular ones in my upcoming course Data and Communication Patterns for Microservices. It launches in just a few days. I recommend joining the waitlist now so that you don’t miss it.

Using multiple local transactions

If teams accepted that they need to exchange data between their services, quite a few decide to use multiple, independent, local transactions. This is a risky decision because sooner or later, it will cause data inconsistencies.

By using multiple local transactions, you create a situation that’s called a dual write. I explained it in great detail in a previous article. To summarize that article, you can’t handle the situation in which you try to commit 2 independent transactions, and the 2nd commit fails. You might try to implement workarounds that try to revert the first transaction. But you can’t guarantee that they will always work.

Distributed transactions and their problems in a microservice application

In a monolithic application or older distributed applications, we often used transactions that span over multiple external systems. Typical examples are transactions that include one or more databases or a database and a message broker. These transactions are called global or distributed transactions. They enable you to apply the ACID principle to multiple systems.

Unfortunately, they are not a good fit for a microservice architecture. They use a pattern called 2-phase commit. This pattern describes a complex process that requires multiple steps and locks.

2-phase commit protocol

As you might have guessed from the name, the main difference between a local and distributed transaction that uses the two-phase commit pattern is the commit operation. As soon as more than one system is involved, you can’t just send a commit message to each of them. That would create the same problems as we discussed for dual writes.

The two-phase commit avoids these problems by splitting the commit into 2 steps:

  1. The transaction coordinator first sends a prepare command to each involved system.
    Each system then checks if they could commit the transaction.
  2. If that’s the case, they respond with “prepared” and the transaction coordinator sends a commit command to all systems. The transaction was successful, and all changes get committed.
    If any of the systems doesn’t answer the prepare command or responds with “failed”, the transaction coordinator sends an abort command to all systems. This rolls back all the changes performed within the transaction.

As you can see, a two-phase commit is more complicated than the simple commit of a local transaction. But it gets even worse when you take a look at the systems that need to prepare and commit the transaction.

The problem of a 2-phase commit

After a system confirmed the prepare command, it needs to make sure that it will be able to commit the transaction when it receives the commit command. That means nothing is allowed to change until that system gets the commit or abort command.

The only way to ensure that is to lock all the information that you changed in the transaction. As long as this lock is active, no other transaction can use this information. These locks can become a bottleneck that slows down your system and should obviously be avoided.

This problem also existed in a distributed, monolithic application. But the small scope of a microservice and the huge number of services that are often deployed make it worse.

A 2-phase commit between a transaction coordinator and 2 external systems is already bad enough. But the complexity and the performance impact of the required locks increase with each additional external system that takes part in the transaction.

Due to that, a distributed transaction is no longer an easy to use approach to ensure data consistency that, in the worst case, might slow down your application a little bit. In a microservice architecture, a distributed transaction is an outdated approach that causes severe scalability issues. Modern patterns that rely on asynchronous data replication or model distributed write operations as orchestrated or choreographed SAGAs avoid these problems. I explain all of them in great detail in my course Data and Communication Patterns for Microservices.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.