5 Hibernate Features You Should Use With Spring Data JPA
With Spring Data JPA, you can use more features than just the ones defined by the JPA specification. Spring Data JPA acts as a layer on top of a persistence provider, e.g., Hibernate. This provider implements the JPA specification and provides a set of proprietary features. Due to this multi-layer architecture, you can use all standard JPA and Spring Data JPA features and all proprietary features provided by Hibernate.
The JPA specification defines most features, especially those that map your entity classes and attributes to database tables and columns. All JPA implementations support these, and Spring Data JPA makes using them a little easier. In addition to that, Hibernate provides a set of proprietary features that make your persistence layer easier to understand or improve its performance. Using them often requires a few extra lines of configuration or code, and I want to show you how to use them.
Understand your persistence layer
Many developers complain that it’s hard to understand what Spring Data JPA does internally and how many queries it executes. That’s usually caused by using the wrong logging configuration during development. Hibernate can tell you about all SQL statements it generates and executes based on the API calls performed by Spring Data JPA’s repository implementations.
Here are 2 proprietary Hibernate features you can use to get more information about the executed database operations.
1. Hibernate’s statistics
One of the first things I recommend to all my consulting clients is to activate Hibernate’s statistics component during development. It’s by far the easiest approach to get a general overview of the database operations executed during a Hibernate Session.
You can activate the statistics and include detailed information in your log file by setting the following 2 properties in your application.properties file.
spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=DEBUG
The first one activates Hibernate’s statistics component. It’s deactivated by default because collecting all this information slows down your application, and you should avoid it in production.
The second property activates debug logging for the statistics component. That gets you an additional log message for each query, which shows you the executed HQL statement, its execution time, and the number of returned rows. You can learn more about Hibernate’s logging configuration in my Guide to Hibernate Logging.
Here you can see the log output of a simple test case that only calls Spring Data JPA’s findAll method on the ChessPlayer repository.
2021-08-15 10:54:52.400 DEBUG 31972 --- [ main] o.h.stat.internal.StatisticsImpl : HHH000117: HQL: select generatedAlias0 from ChessPlayer as generatedAlias0, time: 310ms, rows: 19
2021-08-15 10:54:52.408 INFO 31972 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    5070899 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    4359903 nanoseconds spent preparing 20 JDBC statements;
    76117198 nanoseconds spent executing 20 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    132026893 nanoseconds spent performing 40 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    838900 nanoseconds spent performing 19 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    16900 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
}
As you can see, Hibernate’s statistics component logs a message with metrics at the end of each Session. In this example, I only called the findAll method on my repository, which triggered the preparation and execution of 20 JDBC statements. That is many more statements than the findAll method should trigger, and it’s a clear indicator that you should check your code for an n+1 select issue.
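The arithmetic behind that suspicion can be sketched in plain Java. The following model is not Hibernate code. It only counts statements under the assumption that the query returning the 19 players is followed by 1 additional query per player, e.g., to load an association. The class and method names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of an n+1 select issue; all names are hypothetical.
final class NPlusOneSketch {

    // Stand-in for the database that counts the statements it receives.
    static final class FakeDatabase {
        private int statements = 0;

        // 1 statement returns the ids of all players.
        List<Integer> selectAllPlayerIds(int rows) {
            statements++;
            List<Integer> ids = new ArrayList<>();
            for (int id = 1; id <= rows; id++) {
                ids.add(id);
            }
            return ids;
        }

        // Assumption: 1 additional statement per player loads an association.
        List<String> selectTournamentsOfPlayer(int playerId) {
            statements++;
            return Collections.singletonList("tournament of player " + playerId);
        }

        int statements() {
            return statements;
        }
    }

    // Returns the number of statements needed to load all players and their associations.
    static int statementsForFindAll(int players) {
        FakeDatabase db = new FakeDatabase();
        for (int playerId : db.selectAllPlayerIds(players)) {
            db.selectTournamentsOfPlayer(playerId);
        }
        return db.statements();
    }

    public static void main(String[] args) {
        // 1 query for the players + 19 queries for their associations
        System.out.println(statementsForFindAll(19)); // prints 20
    }
}
```

Under this assumption, 19 returned rows lead to exactly the 20 statements reported in the Session Metrics, which is the typical signature of an n+1 select issue.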
2. Hibernate’s slow query log
Hibernate can also log all queries with an execution time slower than a configured threshold. The amount of provided information and the performance impact of this feature are a lot lower than those of Hibernate’s statistics. That makes it especially helpful on test systems, and you could also use it to monitor a freshly deployed release in production.
You can activate Hibernate’s slow query log by configuring the property spring.jpa.properties.hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS. It defines the execution time threshold in milliseconds above which Hibernate shall report a query. In the following example, I set this threshold to 10 ms.
spring.jpa.properties.hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS=10
After configuring this property, Hibernate will write the following log message for each SQL statement with an execution time above this threshold.
2021-08-15 10:54:52.397 INFO 31972 --- [ main] org.hibernate.SQL_SLOW : SlowQuery: 11 milliseconds. SQL: 'HikariProxyPreparedStatement@1734615070 wrapping select tournament0_.players_id as players_2_4_0_, tournament0_.tournaments_id as tourname1_4_0_, chesstourn1_.id as id1_2_1_, chesstourn1_.end_date as end_date2_2_1_, chesstourn1_.name as name3_2_1_, chesstourn1_.start_date as start_da4_2_1_, chesstourn1_.version as version5_2_1_ from chess_tournament_players tournament0_ inner join chess_tournament chesstourn1_ on tournament0_.tournaments_id=chesstourn1_.id where tournament0_.players_id=1'
I recommend writing the messages into a separate log file so that you can easily find, analyze and improve these queries.
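If you use Spring Boot’s default Logback setup, a configuration along the following lines could route these messages into their own file. This is only a sketch: the logger name org.hibernate.SQL_SLOW is taken from the log message above, but the appender name SLOW_QUERIES, the file path logs/slow-queries.log, and the pattern are assumptions you should adapt to your project.

```xml
<!-- src/main/resources/logback-spring.xml (sketch; names and paths are assumptions) -->
<configuration>
    <!-- Keep Spring Boot's default console and file logging -->
    <include resource="org/springframework/boot/logging/logback/base.xml"/>

    <!-- Separate file for slow query messages -->
    <appender name="SLOW_QUERIES" class="ch.qos.logback.core.FileAppender">
        <file>logs/slow-queries.log</file>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level %logger : %msg%n</pattern>
        </encoder>
    </appender>

    <!-- additivity="false" keeps these messages out of the main log -->
    <logger name="org.hibernate.SQL_SLOW" level="INFO" additivity="false">
        <appender-ref ref="SLOW_QUERIES"/>
    </logger>
</configuration>
```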
Get the best performance
Hibernate not only provides proprietary logging features that make its internal operations easier to understand. It also offers a few features that you can use to improve the performance of your application.
3. Optimized handling of database sequences
The improved handling of database sequences is one of these features. It drastically reduces the number of executed statements to get new primary key values.
If you want to rely on Hibernate’s default handling, you only need to annotate your primary key attribute with a @GeneratedValue annotation.
@Entity
public class ChessPlayer {

    @Id
    @GeneratedValue
    private Long id;

    ...
}
Hibernate then gets a unique primary key value from its default sequence before persisting a new entity. This gets very inefficient if you’re persisting multiple entity objects within the same transaction.
for (int i = 0; i < 10; i++) {
    ChessPlayer player = new ChessPlayer();
    player.setFirstName("Thorben");
    player.setLastName("Janssen");
    playerRepo.save(player);
}
Hibernate then executes twice as many statements as you persist new entities. For each entity, it executes 1 statement to get a unique primary key value and another statement to persist the new record in the database.
2021-08-15 11:35:52.027 DEBUG 30944 --- [ main] org.hibernate.SQL : select nextval ('hibernate_sequence')
2021-08-15 11:35:52.031 DEBUG 30944 --- [ main] org.hibernate.SQL : select nextval ('hibernate_sequence')
... (8 more select nextval statements) ...
2021-08-15 11:35:52.092 DEBUG 30944 --- [ main] org.hibernate.SQL : insert into chess_player (birth_date, first_name, last_name, version, id) values (?, ?, ?, ?, ?)
2021-08-15 11:35:52.095 DEBUG 30944 --- [ main] org.hibernate.SQL : insert into chess_player (birth_date, first_name, last_name, version, id) values (?, ?, ?, ?, ?)
... (8 more insert statements) ...
2021-08-15 11:35:52.177 INFO 30944 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    2306000 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    3879204 nanoseconds spent preparing 20 JDBC statements;
    49929399 nanoseconds spent executing 20 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    69791300 nanoseconds spent performing 20 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    50888301 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
You can improve this by referencing a @SequenceGenerator with an allocationSize greater than 1 in your mapping definition. The allocationSize defaults to 50, so the following mapping doesn’t need to set it explicitly.
@Entity
public class ChessPlayer {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "player_seq")
    @SequenceGenerator(name = "player_seq", sequenceName = "player_sequence")
    private Long id;

    ...
}
Hibernate then expects that for each call, the database sequence gets incremented by 50, so the sequence in your database needs to be defined with a matching increment. This enables Hibernate to increment the retrieved primary key value internally 50 times without generating a duplicate primary key value. As you can see in the log output, this significantly reduces the number of executed statements.
2021-08-15 11:38:03.283 DEBUG 8436 --- [ main] org.hibernate.SQL : select nextval ('player_sequence')
2021-08-15 11:38:03.331 DEBUG 8436 --- [ main] org.hibernate.SQL : insert into chess_player (birth_date, first_name, last_name, version, id) values (?, ?, ?, ?, ?)
2021-08-15 11:38:03.340 DEBUG 8436 --- [ main] org.hibernate.SQL : insert into chess_player (birth_date, first_name, last_name, version, id) values (?, ?, ?, ?, ?)
... (8 more insert statements) ...
2021-08-15 11:38:03.435 INFO 8436 --- [ main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    2185700 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    2775299 nanoseconds spent preparing 12 JDBC statements;
    31471999 nanoseconds spent executing 12 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    56503199 nanoseconds spent performing 20 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    49912700 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
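The idea behind the allocationSize can be modeled in a few lines of plain Java. This is a deliberately simplified sketch and not Hibernate’s actual optimizer, which differs in its details. FakeSequence and BlockAllocator are hypothetical names, and the fake sequence is assumed to be incremented by the allocationSize on each call.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of block-wise id allocation; not Hibernate's implementation.
final class SequenceAllocationSketch {

    // Stand-in for a database sequence that counts how often it gets called.
    static final class FakeSequence {
        private final int incrementBy;
        private long current;
        private int calls = 0;

        FakeSequence(int incrementBy) {
            this.incrementBy = incrementBy;
            this.current = 1L - incrementBy; // the first call returns 1
        }

        long nextval() {
            calls++;
            current += incrementBy;
            return current;
        }

        int calls() {
            return calls;
        }
    }

    // Hands out ids from a reserved block and calls the sequence only when the block is used up.
    static final class BlockAllocator {
        private final FakeSequence sequence;
        private final int allocationSize;
        private long next = 0;
        private long blockEnd = 0; // exclusive upper bound of the current block

        BlockAllocator(FakeSequence sequence, int allocationSize) {
            this.sequence = sequence;
            this.allocationSize = allocationSize;
        }

        long nextId() {
            if (next >= blockEnd) {
                next = sequence.nextval();
                blockEnd = next + allocationSize;
            }
            return next++;
        }
    }

    // Returns the number of sequence calls needed to assign ids to the given number of entities.
    static int sequenceCallsFor(int entities, int allocationSize) {
        FakeSequence sequence = new FakeSequence(allocationSize);
        BlockAllocator allocator = new BlockAllocator(sequence, allocationSize);
        Set<Long> ids = new HashSet<>();
        for (int i = 0; i < entities; i++) {
            if (!ids.add(allocator.nextId())) {
                throw new IllegalStateException("duplicate primary key value");
            }
        }
        return sequence.calls();
    }

    public static void main(String[] args) {
        System.out.println(sequenceCallsFor(10, 1));  // prints 10
        System.out.println(sequenceCallsFor(10, 50)); // prints 1
    }
}
```

In this model, 10 new entities require 10 sequence calls with an allocationSize of 1 but only 1 call with an allocationSize of 50, and no id gets assigned twice because each call reserves a separate block of 50 values.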
4. Natural identifier
The objects in most business domains have a natural identifier. It’s often a unique combination of multiple attributes and can include the identifiers of other objects. You could, for example, identify a book by its ISBN. Or you could identify a game in a chess tournament by the id of the tournament, the number of the round, and the number of the board at which it was played.
As you can see in the second example, a natural id can be a rather complex combination of multiple attributes. Most developers, therefore, prefer an easier to handle sequence-based technical identifier, also called a surrogate key.
But even though you might not be using your natural id as the primary key, it’s still used in your business domain. A lot of use cases will use the natural id to reference an object, and you need queries that return the referenced records. In these cases, you can use Hibernate’s proprietary support for natural identifiers.
You can define the natural id of an entity class by annotating one or more attributes with @NaturalId.
@Entity
public class ChessPlayer {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "player_seq")
    @SequenceGenerator(name = "player_seq", sequenceName = "player_sequence")
    private Long id;

    @NaturalId
    private String fideId;

    ...
}
After you’ve done that, you can use Hibernate’s proprietary API to query entity objects by their natural id. The best way to use that API with Spring Data JPA is to implement a custom repository method using a fragment interface.
This requires a fragment interface definition that defines the signature of the repository method.
public interface ChessPlayerRepoExtension {

    public ChessPlayer getByFideId(String fideId);
}
In the next step, your repository interface needs to extend the fragment interface and one of Spring Data JPA’s standard repository interfaces.
public interface ChessPlayerRepository extends JpaRepository<ChessPlayer, Long>, ChessPlayerRepoExtension {

    PlayerNameIntf findByFirstName(String firstName);
}
And you need to provide an implementation of your fragment interface. In that implementation, you can use all of JPA’s and Hibernate’s APIs.
To find an entity by its natural id, you first need to call the byNaturalId method on Hibernate’s Session interface. That creates a query that returns entity objects of type ChessPlayer. In the next step, you call the using method for each part of the natural id. In this case, the natural identifier consists only of 1 attribute.
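// javax.persistence applies to Hibernate 5 / Spring Boot 2; newer versions use jakarta.persistence
import javax.persistence.EntityManager;
import org.hibernate.Session;

public class ChessPlayerRepoExtensionImpl implements ChessPlayerRepoExtension {

    private final EntityManager em;

    public ChessPlayerRepoExtensionImpl(EntityManager em) {
        this.em = em;
    }

    @Override
    public ChessPlayer getByFideId(String fideId) {
        // Get Hibernate's Session to access its proprietary natural id API
        Session session = em.unwrap(Session.class);
        return session.byNaturalId(ChessPlayer.class)
                      .using("fideId", fideId)
                      .load();
    }
}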
5. Parameter padding
When you execute a query, your database needs to prepare an execution plan for it. The plan defines which index gets used, the order in which your JOIN clauses get applied, and many other things. The quality of the execution plan often makes the difference between great and rather slow performance.
Your database needs to spend time and resources on the creation of such an execution plan. That’s why it tries to cache and reuse previously created plans. This usually works great for queries with a fixed number of bind parameters. But a query with an IN clause can reduce the efficiency of this cache because it has a fluctuating number of bind parameters. Each number of bind parameters requires a new execution plan.
Hibernate’s parameter padding feature addresses this issue. If you set the property spring.jpa.properties.hibernate.query.in_clause_parameter_padding in your application.properties file to true, Hibernate will pad the bind parameters of every IN clause to the next power of 2.
spring.jpa.properties.hibernate.query.in_clause_parameter_padding=true
By doing that, a lot more queries use the same number of bind parameters, and your database can use its execution plan cache more efficiently.
After activating this feature, you can see in your log file that Hibernate changed all queries that use bind parameters in an IN clause. A typical example of such a query is the findAllById method defined by Spring Data JPA’s JpaRepository.
List<ChessPlayer> players = playerRepo.findAllById(Arrays.asList(101L, 102L, 103L));
In this example, I call that method with 3 id values. Without Hibernate’s parameter padding feature, you would expect to see an SQL statement with 3 bind parameters. But Hibernate pads these to the next power of 2, which is 4.
2021-08-15 20:20:42.863 DEBUG 60280 --- [ main] org.hibernate.SQL : select chessplaye0_.id as id1_1_, chessplaye0_.birth_date as birth_da2_1_, chessplaye0_.fide_id as fide_id3_1_, chessplaye0_.first_name as first_na4_1_, chessplaye0_.last_name as last_nam5_1_, chessplaye0_.version as version6_1_ from chess_player chessplaye0_ where chessplaye0_.id in ( ? , ? , ? , ? )
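The padding rule itself can be sketched in plain Java. The following code is not Hibernate’s implementation. It only illustrates the calculation of the next power of 2 and assumes that the additional bind parameters get filled by repeating the last value, which is how I understand Hibernate’s behavior. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of IN clause parameter padding; not Hibernate's implementation.
final class InClausePaddingSketch {

    // Returns the smallest power of 2 that is greater than or equal to n (for n >= 1).
    static int paddedSize(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Pads the list to the next power of 2 by repeating its last element (assumption).
    static <T> List<T> pad(List<T> values) {
        int target = paddedSize(values.size());
        List<T> padded = new ArrayList<>(values);
        T last = values.get(values.size() - 1);
        while (padded.size() < target) {
            padded.add(last);
        }
        return padded;
    }

    public static void main(String[] args) {
        // 3 id values get padded to 4 bind parameter values
        System.out.println(pad(Arrays.asList(101L, 102L, 103L))); // prints [101, 102, 103, 103]
    }
}
```

With this rule, IN clauses with 5, 6, 7, or 8 values all end up with 8 bind parameters, so the database only has to handle 1 statement shape for them instead of 4.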
Conclusion
Hibernate provides a few proprietary features that make your persistence provider and the executed SQL statements easier to understand. Other features, like the optimized handling of database sequences, improve the efficiency of your entity mappings.
Using Spring Data JPA, you can use all of Hibernate’s proprietary features and the standard features defined by the JPA specification. As you saw in this article, Hibernate’s proprietary features often require a few additional configuration parameters or the implementation of a fragment interface. But none of this should hold you back from using them. You can usually provide them in no time.