From Monolith to Microservices – Migrating a Persistence Layer
Since microservices have become popular, teams are trying to split their monolithic application into a set of small, independent, and highly-scalable microservices. In theory, that often seems like an easy task. You only need to follow the key principles of domain-driven design, identify the bounded contexts in your application, and extract each of them as a microservice.
As so often, the implementation quickly gets much more complicated than it seemed. There are always some use cases that require data from otherwise completely independent services. And some write operations need to ensure data consistency while adding or updating information in multiple services.
As I show in my Data and Communication Patterns for Microservices course, several patterns help you solve all of these problems. But they introduce additional complexity to your project, and having to use too many of them can be an indicator that you cut your services in the wrong way.
So, let’s discuss the general steps to split up a monolith, some of the patterns you should know to solve common problems, and when you should rethink your design decisions.
- 1 Keeping Your Services Independent
- 2 Refactoring Your Data Model
- 3 Conclusion
Keeping Your Services Independent
When designing a microservice architecture, it’s important to avoid dependencies between your services. That enables you to implement, release, deploy, and scale each service independently. It also reduces the risk that you’re building a distributed monolith in which errors or performance issues of one service cascade to other services and negatively affect your entire system.
How you manage your data plays an important role in achieving this goal. To keep your services independent of each other, you should follow 3 basic rules when designing your services:
- Each service has to have its own database to keep it independent of all other services. I often get asked if you can deploy the databases of multiple services onto the same database server. I prefer to avoid that because it introduces the risk that database issues affect multiple services at once. But this can be an option for your first systems to reduce the complexity of your deployments.
- No microservice has access to the database of another service. That, of course, also includes foreign key references, database views, and stored procedures. Having direct access to any datastore of another service introduces a very strong dependency.
- Each service manages its own local transactions and doesn’t participate in any distributed transaction.
Based on these rules and some general best practices for designing your software, you can start refactoring your data model.
Refactoring Your Data Model
I always recommend focusing on your business domain instead of specific parts of your source code. It makes it much easier to identify the parts of your application that should be independent and eliminate all technical distractions.
And if you’re working on a huge codebase, you don’t need to and probably shouldn’t work on the entire monolith at once. It’s often better to take an incremental approach and focus on certain parts of your application.
Step 1: Identify Independent Modules And Split Your Business Code
You’ve probably already heard it a thousand times. Before you can split your monolith into microservices, you need to identify the parts that will become independent services. The common recommendation is to identify these parts by finding the bounded contexts within your system. If you do that right, everything is supposed to fall into place, and the refactoring is supposed to be quick and easy.
Well, that’s not entirely wrong. After you have found the right scope of a microservice, extracting it from the monolith will be a lot easier. But there will still be a lot of work to do, and there will be several problems you need to solve. Let’s take it one step at a time.
There are several ways to identify and extract microservices. You can find lots of articles and talks discussing different options. I prefer splitting the monolith into modules first. This enables you to take an iterative approach and ignore all interactions with remote services for now. By doing that, you quickly find all the technical issues that will occur when you start extracting your microservices, and it helps you identify hidden dependencies.
While doing that, you will find several queries and modeled associations that cross your module’s boundaries. Some of them indicate that these modules shouldn’t be separated and should become one service instead. Others can be refactored.
Step 2: Remove Queries And Associations Across Module Boundaries
This is often the hardest part of the refactoring. You should take a look at the associations between your entities first. For each one that references an entity in another module, you need to decide if you need to keep it or if you can remove it. Here are 3 questions that will help you make that decision:
- Is there a matching foreign key constraint in your table model? If that’s the case, what would you lose if you remove it?
- Which queries use that association? Do you need detailed information stored in the associated record?
- Is there any complex business logic that uses this association?
Replace Associations With Foreign Key References
Quite often, a lot of these associations are not used in any complex queries or business operations. You can then easily remove them and use a basic attribute that stores a reference to the referenced record’s key instead. This information is sufficient to find all records referenced by another service or provide a caller with the required information to connect the information managed by different services.
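The following is a minimal sketch of that idea in plain Java. The `Review` class, its `bookId` attribute, and the in-memory `ReviewRepository` are hypothetical names chosen for illustration. In a JPA-based persistence layer, `bookId` would be a basic attribute that takes the place of a modeled association to a `Book` entity in another module.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class owned by a "review" module.
// It stores only the key of the book, which is managed by another module,
// instead of holding a reference to a Book object.
class Review {
    private final Long id;
    private final Long bookId; // plain key reference, no modeled association
    private final String comment;

    Review(Long id, Long bookId, String comment) {
        this.id = id;
        this.bookId = bookId;
        this.comment = comment;
    }

    Long getId() { return id; }
    Long getBookId() { return bookId; }
    String getComment() { return comment; }
}

// Hypothetical repository of the review module.
// An in-memory list stands in for the module's own database.
class ReviewRepository {
    private final List<Review> reviews = new ArrayList<>();

    void save(Review review) {
        reviews.add(review);
    }

    // Finds all reviews that reference the given book key.
    // No join to the book module's tables is required.
    List<Review> findByBookId(Long bookId) {
        return reviews.stream()
                .filter(r -> r.getBookId().equals(bookId))
                .collect(Collectors.toList());
    }
}

class KeyReferenceDemo {
    public static void main(String[] args) {
        ReviewRepository repository = new ReviewRepository();
        repository.save(new Review(1L, 42L, "Great read"));
        repository.save(new Review(2L, 7L, "Too long"));
        // The caller only needs the key 42 to connect both data sets.
        System.out.println(repository.findByBookId(42L).size());
    }
}
```

The trade-off in this sketch is that the review module can no longer rely on the database to check that the referenced book exists. That is the price of removing the foreign key constraint mentioned in the questions above.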
If you only need a few pieces of information stored in the referenced record, it can be a good idea to keep this data redundantly in both services. That removes the association from your domain model, but it doesn’t remove your services’ dependency. You will need to define which service owns that information, and you need to replicate it to the other service. The important difference is that you can use patterns like View Database and Outbox Pattern to do that robustly and asynchronously.
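To make the replication idea more concrete, here is a simplified, in-memory sketch of the Outbox Pattern. All class names (`BookService`, `ReviewService`, `OutboxRelay`, `BookTitleChanged`) are assumptions made for this example. A real implementation would write the outbox entry to a database table in the same local transaction as the business data and forward it with a polling job or a change data capture tool.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical event that describes a change in the owning service.
class BookTitleChanged {
    final Long bookId;
    final String title;

    BookTitleChanged(Long bookId, String title) {
        this.bookId = bookId;
        this.title = title;
    }
}

// The service that owns the book data.
class BookService {
    private final Map<Long, String> titles = new HashMap<>();
    private final Queue<BookTitleChanged> outbox = new ArrayDeque<>();

    // In a real service, both writes would be part of one local transaction.
    void changeTitle(Long bookId, String title) {
        titles.put(bookId, title);
        outbox.add(new BookTitleChanged(bookId, title));
    }

    // Returns the oldest unsent outbox entry, or null if the outbox is empty.
    BookTitleChanged pollOutbox() {
        return outbox.poll();
    }
}

// A service that keeps a redundant copy of the few attributes it needs.
class ReviewService {
    private final Map<Long, String> bookTitles = new HashMap<>();

    void on(BookTitleChanged event) {
        bookTitles.put(event.bookId, event.title);
    }

    String titleOf(Long bookId) {
        return bookTitles.get(bookId);
    }
}

// Forwards all pending outbox entries. In this sketch it runs only when called,
// which models the asynchronous delay between the write and the replication.
class OutboxRelay {
    static int forward(BookService source, ReviewService target) {
        int forwarded = 0;
        BookTitleChanged event;
        while ((event = source.pollOutbox()) != null) {
            target.on(event);
            forwarded++;
        }
        return forwarded;
    }
}

class OutboxDemo {
    public static void main(String[] args) {
        BookService books = new BookService();
        ReviewService reviews = new ReviewService();
        books.changeTitle(42L, "Example Title");
        OutboxRelay.forward(books, reviews);
        System.out.println(reviews.titleOf(42L));
    }
}
```

Until the relay has run, the copy in `ReviewService` is stale. That is the eventual consistency you accept in exchange for avoiding a synchronous call between the two services.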
Merge Dependent Modules Or Introduce Another Module
If multiple services require huge parts of the same information, you are often dealing with the symptoms of a wrong service design. Your services could be too small, or you might be missing a service.
The most common way to solve this issue is to merge these services into one. That service then handles these dependencies internally in the same way as it did in the monolith, and the problem disappears. If you don’t have good reasons to keep the services separate, I recommend taking this approach.
But every now and then, you will have services that should stay separated. In these cases, it’s often a good idea to extract the shared data and the logic that operates on it into a separate service. Instead of having multiple services that depend on each other, you then have multiple independent services and one that depends on them.
This still isn’t an ideal situation. You would want to remove all dependencies. But in the described case, that’s not possible. So, you opt for the second-best option: You try to move all dependencies into one service and replicate the required information asynchronously to avoid synchronous service calls. By doing that, you reduce the complexity and improve fault tolerance.
Step 3: Handle Distributed Write Operations
In an ideal world, you don’t need to handle any write operations that affect more than one service. That’s because it’s tough to ensure data consistency without using a distributed transaction.
One option to avoid distributed write operations is to merge all services that are part of it. This is often the right approach if a set of services takes part in multiple distributed write operations. That’s a clear indication that these services are not as independent as they seemed to be in the beginning.
If you have good reasons to keep the services separated, you can use the SAGA pattern. I explain it in great detail in my Data and Communication Patterns for Microservices course. The general idea is simple. Similar to the Scandinavian stories, you split the operation into multiple steps that happen in a defined order. After all of them have been performed, your SAGA and the write operation it modeled were successful. And in case something goes wrong, you perform the inverse operation for all steps that were already performed. By doing that, you get an eventually consistent system.
In most cases, implementing and monitoring such a SAGA is way more complicated than it seemed in the beginning. Therefore, I recommend trying to avoid them and rethink your service design before implementing them.
If you decide to implement a SAGA, I recommend using an Orchestrator-based SAGA for all complex SAGAs that involve more than 3 services or require complex decisions to trigger the next steps. In that pattern, the orchestrator controls the flow of the SAGA. That makes tracking and implementing it much easier.
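As a rough illustration, the following sketch shows the core loop of an orchestrator: it executes the steps in a defined order and, if one fails, performs the inverse operation of every step that already completed, in reverse order. The `SagaStep` interface, the `SagaOrchestrator` class, and the step names are hypothetical. A production-ready orchestrator would also need to persist its state and handle retries and timeouts, which this sketch leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// One step of the SAGA: an operation in one service plus its inverse operation.
interface SagaStep {
    String name();
    void execute() throws Exception;
    void compensate();
}

// Central component that controls the flow of the SAGA.
class SagaOrchestrator {
    private final List<SagaStep> steps;
    private final List<String> log = new ArrayList<>();

    SagaOrchestrator(List<SagaStep> steps) {
        this.steps = steps;
    }

    // Returns true if all steps succeeded and false if the SAGA was compensated.
    boolean run() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                log.add("executed " + step.name());
                completed.push(step);
            } catch (Exception e) {
                log.add("failed " + step.name());
                // Undo the completed steps, most recent first.
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensate();
                    log.add("compensated " + done.name());
                }
                return false;
            }
        }
        return true;
    }

    List<String> log() {
        return log;
    }
}

// Simple step for demonstration purposes that can be configured to fail.
class DemoStep implements SagaStep {
    private final String name;
    private final boolean fails;

    DemoStep(String name, boolean fails) {
        this.name = name;
        this.fails = fails;
    }

    public String name() { return name; }

    public void execute() throws Exception {
        if (fails) {
            throw new Exception(name + " failed");
        }
    }

    public void compensate() {
        // A real step would call the inverse operation of the remote service here.
    }
}

class SagaDemo {
    public static void main(String[] args) {
        SagaOrchestrator saga = new SagaOrchestrator(Arrays.asList(
                new DemoStep("reserveStock", false),
                new DemoStep("chargePayment", false),
                new DemoStep("scheduleShipping", true)));
        System.out.println(saga.run());
        saga.log().forEach(System.out::println);
    }
}
```

Because the orchestrator records every executed, failed, and compensated step in one place, you can see the state of the whole operation without collecting information from each participating service.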
The Choreography-based SAGA can be a good option if you want a very loosely coupled SAGA. Similar to dance choreography, every service knows what it has to do and when it has to do it. There is no central component that manages the flow. That can provide a lot of flexibility, but it can also make it very hard to monitor and manage complex operations.
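A choreography can be sketched with a simple in-memory event bus. Each service subscribes to the events it cares about and publishes a new event when it has done its part. The `EventBus` class, the two services, and the event names are assumptions for this example. In a real system, a message broker would take the role of the bus, and the handlers would run asynchronously.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal in-memory stand-in for a message broker.
class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    private final List<String> published = new ArrayList<>();

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, key -> new ArrayList<>()).add(handler);
    }

    // Records the event and delivers it synchronously to all subscribers.
    void publish(String eventType, String orderId) {
        published.add(eventType);
        List<Consumer<String>> subscribers =
                handlers.getOrDefault(eventType, Collections.emptyList());
        for (Consumer<String> handler : subscribers) {
            handler.accept(orderId);
        }
    }

    List<String> published() {
        return published;
    }
}

// Reacts to new orders. It knows nothing about the shipping service.
class PaymentService {
    PaymentService(EventBus bus) {
        bus.subscribe("OrderCreated",
                orderId -> bus.publish("PaymentCompleted", orderId));
    }
}

// Reacts to completed payments. It knows nothing about the payment service.
class ShippingService {
    ShippingService(EventBus bus) {
        bus.subscribe("PaymentCompleted",
                orderId -> bus.publish("OrderShipped", orderId));
    }
}

class ChoreographyDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        new PaymentService(bus);
        new ShippingService(bus);
        bus.publish("OrderCreated", "order-1");
        System.out.println(bus.published());
    }
}
```

Notice that the overall flow is not written down anywhere in this sketch. You can only reconstruct it by reading the subscriptions of every service, which is the reason why monitoring a complex choreography gets hard.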
Step 4: Each Module Becomes A Service
The final step is easy. You already did the hard work in steps 2 and 3. You now only need to refactor your modules into independent services. If you didn’t miss anything in the previous steps, you only need to copy each module’s code into a different folder and add some build and deployment tooling.
Conclusion
After you have identified the bounded contexts in your system and used them to modularize your business code, there is still a lot of work to do. You need to apply the same modularization to your persistence layer.
When you split up your persistence layer, you will find associations and queries that cross the boundaries of your modules. You then need to decide if you can:
- remove them,
- change your service design to keep them within the same service,
- replace them by replicating your data asynchronously.