Spring Data JDBC – Modelling and working with aggregates




Entities, aggregates, and aggregate roots are some of the core concepts used by Spring Data JDBC. Based on them, Spring Data JDBC decides which objects it loads or persists together. They also define which kind of associations you can model. That shows how important it is to understand these concepts and how they work together.

Spring Data JDBC didn’t invent the concepts of entities, aggregates, and aggregate roots. They are defined by Domain Driven Design. An entity is a domain object with an id that can have multiple additional attributes. A cluster of entities that can be treated as a single unit is called an aggregate. And the aggregate root is the root element of an aggregate. The aggregate root is the object that gets referenced from outside of the aggregate and that references other entities within the same aggregate. As you can see in the example in the following diagram, a typical aggregate structure looks like a tree with the aggregate root as its root.

Spring Data JDBC was designed with these concepts in mind. You are supposed to model a repository for each aggregate. The repository handles the aggregate as a single unit when it fetches it from the database or persists any changes.

Sounds simple, right?

Well, handling an aggregate as a unit has some side effects you should know about. And if you used Spring Data JPA for your previous projects, you might find some of them confusing. But don’t worry, none of this is complicated, and you will get used to it quickly.

Modelling an aggregate

As I mentioned earlier, an aggregate is treated as a single unit and consists of one or more entities. One of these entities is the aggregate root, which gets referenced from the outside and references other entities within the aggregate.

None of this might sound special, and you’re probably wondering why I’m repeating all of that. The simple reason is that based on this description, you don’t need many-to-many associations, many-to-one associations, or any bidirectional associations in general. And that’s why Spring Data JDBC doesn’t support them.

This might surprise you if you have used Spring Data JPA in previous projects. But you can model your domain and follow those constraints. Your model fits the concepts of Domain Driven Design, and avoiding those associations makes a few things easier.

Let’s take a closer look at the ChessGame aggregate so that I can show you that you can model an aggregate without those associations. The ChessGame aggregate consists of the entities ChessGame and ChessMove. The ChessGame entity is the root of the ChessGame aggregate.

public class ChessGame {

    @Id
    private Long id;

    private LocalDateTime playedOn;

    private AggregateReference<ChessPlayer, Long> playerWhite;

    private AggregateReference<ChessPlayer, Long> playerBlack;

    private List<ChessMove> moves = new ArrayList<>();

    ...
}

As you can see, the ChessGame entity models a one-to-many association to the ChessMove entity class. But the ChessMove entity doesn’t model a reference to its aggregate root. If you need to get the game in which a specific move was played, you need to execute a query. I explained how to define such queries in my guide to custom queries and projections with Spring Data JDBC.

public class ChessMove {

    private Integer moveNumber;

    private MoveColor color;

    private String move;

    ...
}
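
Because a ChessMove holds no reference to its game, you can only navigate from the aggregate root to its moves. The following plain-Java sketch, which doesn't use Spring at all, illustrates what that means for a reverse lookup. The records MoveSketch and GameSketch and the method findGamesWithMove are hypothetical names I made up for this illustration, and the in-memory search only stands in for the SQL query you would define in your repository.

```java
import java.util.List;

// Hypothetical, Spring-free stand-ins for the ChessMove and ChessGame entities.
record MoveSketch(int moveNumber, String color, String move) {}

// The root knows its moves, but a move doesn't know its game.
record GameSketch(Long id, List<MoveSketch> moves) {}

class ReverseLookupSketch {

    // A move can't navigate to its game, so the lookup has to start at the games.
    // In a real application, a repository query would do this work in SQL.
    static List<GameSketch> findGamesWithMove(List<GameSketch> games, String move) {
        return games.stream()
                .filter(g -> g.moves().stream().anyMatch(m -> m.move().equals(move)))
                .toList();
    }

    public static void main(String[] args) {
        GameSketch g1 = new GameSketch(1L, List.of(
                new MoveSketch(1, "WHITE", "e4"),
                new MoveSketch(1, "BLACK", "e5")));
        GameSketch g2 = new GameSketch(2L, List.of(
                new MoveSketch(1, "WHITE", "d4")));

        // Only the first game contains the move e5.
        System.out.println(findGamesWithMove(List.of(g1, g2), "e5"));
    }
}
```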

Referencing other aggregates

Each ChessGame is played by 2 players. I modeled the ChessPlayer as a separate aggregate because the player is independent of a game or move.

The ChessPlayer entity class models a player and is the only class of the ChessPlayer aggregate. Due to that, it’s also the aggregate root. 

In Domain Driven Design, the association to a different aggregate is modeled as an id reference to the associated aggregate. When using Spring Data JDBC, you can model it using the AggregateReference interface. I use it in the ChessGame entity class to model the references to the player who played the white and the one who played the black pieces.

public class ChessGame {

    private AggregateReference<ChessPlayer, Long> playerWhite;

    private AggregateReference<ChessPlayer, Long> playerBlack;

    ...
}

When fetching a ChessGame object, Spring Data JDBC uses the foreign key values stored in the database to initialize each AggregateReference. But in contrast to other ORM frameworks, e.g., Hibernate or Spring Data JPA, Spring Data JDBC can’t automatically fetch the referenced entity object.

To get the referenced ChessPlayer, you need to use the ChessPlayerRepository to fetch it from the database. This gives you full control over the executed SQL statements and avoids lazy loading issues that you might know from other ORM frameworks.
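
To make the explicit second step more tangible, here is a minimal, Spring-free sketch of the idea. PlayerRef is a hypothetical stand-in for AggregateReference that only carries the id, and PlayerStore is a hypothetical in-memory stand-in for the ChessPlayerRepository. None of these types are part of Spring Data JDBC.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for AggregateReference<ChessPlayer, Long>: it only carries the id.
record PlayerRef(Long id) {}

record PlayerSketch(Long id, String name) {}

// The game stores references to its players, not the player objects themselves.
record GameWithRefs(Long id, PlayerRef playerWhite, PlayerRef playerBlack) {}

// Hypothetical in-memory stand-in for a ChessPlayerRepository.
class PlayerStore {
    private final Map<Long, PlayerSketch> rows = new HashMap<>();

    void save(PlayerSketch player) {
        rows.put(player.id(), player);
    }

    Optional<PlayerSketch> findById(Long id) {
        return Optional.ofNullable(rows.get(id));
    }
}

class ResolveReferenceSketch {
    public static void main(String[] args) {
        PlayerStore players = new PlayerStore();
        players.save(new PlayerSketch(1L, "Player A"));
        players.save(new PlayerSketch(2L, "Player B"));

        GameWithRefs game = new GameWithRefs(10L, new PlayerRef(1L), new PlayerRef(2L));

        // The loaded game only knows the id. Fetching the player is an explicit second step.
        Optional<PlayerSketch> white = players.findById(game.playerWhite().id());
        System.out.println(white.map(PlayerSketch::name).orElse("unknown"));
    }
}
```

With Spring Data JDBC itself, the equivalent steps would be to read the id with the getId method of the AggregateReference and to pass it to the findById method of your ChessPlayerRepository.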

Modelling a repository for an aggregate

After you have modeled an aggregate, you can define a repository for it. As mentioned earlier, an aggregate gets treated as a unit. That means you read and persist the entire aggregate, and all required operations are handled as 1 atomic operation. Due to that, there should only be 1 repository for each aggregate. This repository handles all database operations for the entire aggregate with all its entities.

You can define a Spring Data JDBC repository in the same way as you define any other Spring Data repository. You define an interface that extends one of Spring Data JDBC’s standard repository interfaces, e.g., the CrudRepository interface. Spring Data JDBC then provides you with an implementation of that interface and a set of standard operations. In the case of the CrudRepository, these are methods to persist, update, delete and read an aggregate. If you need additional queries or other features, you can add the required methods to your interface definition.

public interface ChessGameRepository extends CrudRepository<ChessGame, Long> {

    List<ChessGame> findByPlayedOn(LocalDateTime playedOn);
    List<ChessGame> findByPlayedOnIsBefore(LocalDateTime playedOn);

    int countByPlayedOn(LocalDateTime playedOn);

    List<ChessGame> findByPlayerBlack(AggregateReference<ChessPlayer, Long> playerBlack);
    List<ChessGame> findByPlayerBlack(ChessPlayer playerBlack);
}

For the scope of this article, I expect you to be familiar with Spring Data’s repository interfaces and their derived query feature. If you’re not familiar with it, please read my guide to defining custom queries and projections with Spring Data JDBC.

Even though I explained repositories and their query capabilities in a previous article, there are a few things I need to show you to explain the implications of Spring Data JDBC’s handling of aggregates.

Reading an aggregate

Because Spring Data JDBC handles an aggregate as a unit, it always fetches the entire aggregate with all its entities. That can be problematic if your aggregate consists of several entities and multiple one-to-many associations.

Let’s call the findById method of the ChessGameRepository and check the executed SQL statements.

gameRepo.findById(gameId);

The ChessGameRepository returns ChessGame aggregates. The aggregate consists of a ChessGame entity and a List of ChessMove entities. As you can see in the log output, Spring Data JDBC executed 2 SQL statements. The 1st one fetched the ChessGame entity, and the 2nd one all ChessMoves played in the game.

2022-07-05 18:33:05.328 DEBUG 8676 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL query
2022-07-05 18:33:05.329 DEBUG 8676 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [SELECT "chess_game"."id" AS "id", "chess_game"."played_on" AS "played_on", "chess_game"."player_black" AS "player_black", "chess_game"."player_white" AS "player_white" FROM "chess_game" WHERE "chess_game"."id" = ?]
2022-07-05 18:33:05.345 DEBUG 8676 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL query
2022-07-05 18:33:05.345 DEBUG 8676 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [SELECT "chess_move"."move" AS "move", "chess_move"."color" AS "color", "chess_move"."move_number" AS "move_number", "chess_move"."chess_game_key" AS "chess_game_key" FROM "chess_move" WHERE "chess_move"."chess_game" = ? ORDER BY "chess_game_key"]

In this example, the performance impact of fetching the entire ChessGame aggregate will be small. But that quickly changes if you fetch multiple aggregates or your aggregate becomes more complex and includes more entities and to-many associations.
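
A rough way to picture how the number of statements grows is the following Spring-free sketch. It assumes that every aggregate is fetched on its own by id, as in the log output above, so that each ChessGame costs 1 SELECT for the root and 1 for its moves. The class ReadCostSketch and its methods are hypothetical, and the SQL strings are abbreviated placeholders, not the statements Spring Data JDBC generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model that records the SELECT statements needed to fetch aggregates by id.
class ReadCostSketch {

    // One SELECT for the root plus one for its to-many association,
    // mirroring the 2 statements in the log for a single ChessGame.
    static List<String> selectsForGame(long gameId) {
        List<String> statements = new ArrayList<>();
        statements.add("SELECT ... FROM chess_game WHERE id = " + gameId);
        statements.add("SELECT ... FROM chess_move WHERE chess_game = " + gameId);
        return statements;
    }

    // Fetching several games one by one repeats both statements for each game.
    static List<String> selectsForGames(List<Long> gameIds) {
        List<String> statements = new ArrayList<>();
        for (long id : gameIds) {
            statements.addAll(selectsForGame(id));
        }
        return statements;
    }

    public static void main(String[] args) {
        // 3 games with 1 to-many association each: prints 6
        System.out.println(selectsForGames(List.of(1L, 2L, 3L)).size());
    }
}
```

Every additional to-many association in the aggregate would add another statement per game in this model, which is why small aggregates keep the read cost low.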

To avoid performance problems, you should keep your aggregates as small and concise as possible. So, if you see the chance to model something as a separate aggregate, it’s often a good idea to do that.

Persisting and updating an aggregate

Spring Data JDBC not only treats an aggregate as a unit when fetching it from the database. It does the same when persisting a new or updating an existing aggregate.

Persisting an aggregate is easy

This makes persisting a new aggregate very comfortable. You only need to instantiate your aggregate and provide the aggregate root to the save method of your repository. Spring Data JDBC will then automatically persist all entities that belong to the aggregate.

I use that in the following test case to persist a new ChessGame aggregate. I instantiate a new ChessGame object, which is the root of the aggregate. Then I instantiate 4 ChessMoves and add them to the List of moves played in the game. In the final step, I call the save method of the ChessGameRepository and only provide my ChessGame object.

ChessMove white1 = new ChessMove();
white1.setColor(MoveColor.WHITE);
white1.setMoveNumber(1);
white1.setMove("e4");

ChessMove black1 = new ChessMove();
black1.setColor(MoveColor.BLACK);
black1.setMoveNumber(1);
black1.setMove("e5");

ChessMove white2 = new ChessMove();
white2.setColor(MoveColor.WHITE);
white2.setMoveNumber(2);
white2.setMove("Nf3");

ChessMove black2 = new ChessMove();
black2.setColor(MoveColor.BLACK);
black2.setMoveNumber(2);
black2.setMove("Nc6");

ChessGame game = new ChessGame();
game.setPlayedOn(LocalDateTime.now());
game.setMoves(Arrays.asList(white1, black1, white2, black2));

gameRepo.save(game);

As you can see in the log output, Spring Data JDBC executed 5 SQL INSERT statements to persist the entire aggregate. It first wrote 1 record to the chess_game table and then 4 records to the chess_move table.

2022-07-05 18:36:03.474 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:36:03.475 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_game" ("played_on", "player_black", "player_white") VALUES (?, ?, ?)]
2022-07-05 18:36:03.503 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:36:03.503 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:36:03.510 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:36:03.511 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:36:03.515 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:36:03.515 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:36:03.519 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:36:03.519 DEBUG 28416 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]

Updating an aggregate can be inefficient

As comfortable as persisting an aggregate is, the handling as a unit makes update operations inefficient. Let’s run the following test case in which I fetch a ChessGame object and only change the value of the playedOn attribute before I tell Spring Data JDBC to save the object.

ChessGame game = gameRepo.findById(gameId).orElseThrow();
game.setPlayedOn(LocalDateTime.now());
gameRepo.save(game);

Spring Data JDBC treats the aggregate as 1 unit and doesn’t keep track of the data it fetched from the database. Due to that, it can’t detect which part of the aggregate has changed. That becomes a problem for every to-many association.

In this example, Spring Data JDBC doesn’t know if or which ChessMove object has changed. Due to that, it has to replace all of them.

As you can see in the log output, it updates the record in the chess_game table, removes all records of that game from the chess_move table, and inserts a new one for each ChessMove object.

2022-07-05 18:38:52.927 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL query
2022-07-05 18:38:52.928 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [SELECT "chess_game"."id" AS "id", "chess_game"."played_on" AS "played_on", "chess_game"."player_black" AS "player_black", "chess_game"."player_white" AS "player_white" FROM "chess_game" WHERE "chess_game"."id" = ?]
2022-07-05 18:38:52.945 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL query
2022-07-05 18:38:52.946 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [SELECT "chess_move"."move" AS "move", "chess_move"."color" AS "color", "chess_move"."move_number" AS "move_number", "chess_move"."chess_game_key" AS "chess_game_key" FROM "chess_move" WHERE "chess_move"."chess_game" = ? ORDER BY "chess_game_key"]
2022-07-05 18:38:52.972 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL update
2022-07-05 18:38:52.973 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [UPDATE "chess_game" SET "played_on" = ?, "player_black" = ?, "player_white" = ? WHERE "chess_game"."id" = ?]
2022-07-05 18:38:52.987 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL update
2022-07-05 18:38:52.987 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [DELETE FROM "chess_move" WHERE "chess_move"."chess_game" = ?]
2022-07-05 18:38:52.993 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:38:52.994 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:38:53.000 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:38:53.000 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:38:53.005 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:38:53.005 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]
2022-07-05 18:38:53.010 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing SQL update and returning generated keys
2022-07-05 18:38:53.010 DEBUG 34968 --- [           main] o.s.jdbc.core.JdbcTemplate               : Executing prepared SQL statement [INSERT INTO "chess_move" ("chess_game", "chess_game_key", "color", "move", "move_number") VALUES (?, ?, ?, ?, ?)]

Depending on the size and complexity of your aggregate, this handling of update operations can cause severe performance problems. The best way to avoid those problems is to keep your aggregates small and concise.
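
The following Spring-free sketch models the statement pattern you saw in the two log outputs. It is not Spring Data JDBC's implementation. The records GameRow and MoveRow and the method planSave are hypothetical, the SQL strings are abbreviated placeholders, and treating a null id as the marker for a new game is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

record MoveRow(int moveNumber, String color, String move) {}

// Assumption of this sketch: a null id marks a game that hasn't been persisted yet.
record GameRow(Long id, List<MoveRow> moves) {}

class SavePlanSketch {

    // Returns the statements needed to save the whole aggregate without change tracking.
    static List<String> planSave(GameRow game) {
        List<String> statements = new ArrayList<>();
        if (game.id() == null) {
            statements.add("INSERT INTO chess_game ...");
        } else {
            statements.add("UPDATE chess_game ... WHERE id = ?");
            // Without change tracking, all existing child rows get removed ...
            statements.add("DELETE FROM chess_move WHERE chess_game = ?");
        }
        // ... and every move in the list gets written again.
        for (MoveRow move : game.moves()) {
            statements.add("INSERT INTO chess_move ... (" + move.move() + ")");
        }
        return statements;
    }

    public static void main(String[] args) {
        List<MoveRow> moves = List.of(
                new MoveRow(1, "WHITE", "e4"), new MoveRow(1, "BLACK", "e5"),
                new MoveRow(2, "WHITE", "Nf3"), new MoveRow(2, "BLACK", "Nc6"));

        // New game: 1 INSERT for the root + 4 INSERTs for the moves
        System.out.println(planSave(new GameRow(null, moves)).size());
        // Existing game: 1 UPDATE + 1 DELETE + 4 INSERTs, even if no move changed
        System.out.println(planSave(new GameRow(1L, moves)).size());
    }
}
```

In this model, the cost of an update grows with the number of moves in the list, no matter how small the actual change was.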

Conclusion

An aggregate is a group of entity objects that are treated as a unit. As you saw in this article, this makes a few operations easier. E.g., you can easily persist an entire aggregate, and you don’t have to worry about LazyInitializationExceptions, which you might know from other ORMs.

But treating an aggregate as a unit also introduces performance issues if Spring Data JDBC has to fetch too many records from the database or has to replace lists of entities. To keep these effects as small as possible, I recommend keeping your aggregates concise and simple. The fewer associations and entities your aggregate includes, the lower the risk of performance problems. So, if you have the chance to model something as multiple, small aggregates, you should do that.