How to use native queries to perform bulk updates
Take your skills to the next level!
The Persistence Hub is the place to be for every Java developer. It gives you access to all my premium video courses, monthly Java Persistence News, monthly coding problems, and regular expert sessions.
If you just want to update 1 or 2 entities, you can simply fetch them from the database and perform the update operation on it. But what about updating hundreds of entities?
You can, of course, use the standard approach and load and update each of these entities. But that is often too slow because Hibernate performs one or more queries to load the entity and an additional one to update each of them. This quickly results in a few hundred SQL statements which are obviously slower than just 1 statement which lets the database do the work.
As I explain in great detail in the Hibernate Performance Tuning Online Training, the number of performed SQL statements is crucial for the performance of your application. So you better have an eye on your statistics and keep the number of statements as low as possible. You can do that with JPQL or native SQL queries which define the update in one statement.
Using a native UPDATE statement is quite easy as I will show you in the next paragraph. But it also creates issues with the always activated 1st level cache and doesn’t trigger any entity lifecycle events. I’ll show you how to handle these issues at the end of the post.
Native UPDATE statements
You just have to call the createNativeQuery method on the EntityManager and provide a native SQL statement to it.
em.createNativeQuery("UPDATE person p SET firstname = firstname || '-changed'").executeUpdate();
In this example, I update the firstName of all 200 persons in my test database with one query. This takes about 30ms on my local test setup.
The typical JPA approach would require 200 SELECT statements to fetch each Person entity from the database and additional 200 UPDATE statements to update each of them. The execution of these 400 statements and all the Hibernate-internal processing takes about 370ms on my local test setup.
I just used System.currentTimeMillis() to measure the execution time on my laptop which is also running a lot of other applications. The setup is far from optimal and not suited for a real performance test. So don’t rely on the measured milliseconds. But it becomes pretty obvious which approach is the faster one, and that’s what it’s all about.
Problem 1: Outdated 1st level cache
Hibernate puts all entities you use within a session into the first level cache. This is pretty useful for write-behind optimizations and to avoid duplicate selects of the same entity. But it also creates an issue, if you use a native query to update a bunch of entities.
Hibernate doesn’t know which records the native query updates and can’t update or remove the corresponding entities from the first level cache. That means that Hibernate uses an outdated version of the entity if you fetched it from the database before you executed the native SQL UPDATE statement. You can see an example of it in the following code snippet. Both log statements print out the old firstName.
PersonEntity p = em.find(PersonEntity.class, 1L); em.createNativeQuery("UPDATE person p SET firstname = firstname || '-changed'").executeUpdate(); log.info("FirstName: "+p.getFirstName()); p = em.find(PersonEntity.class, 1L); log.info("FirstName: "+p.getFirstName());
There are 2 options to avoid this issue:
The most obvious one is to not fetch any entity from the database which will be affected by the UPDATE statement. But we both know that this is not that easy in a complex, modular application.
If you can’t avoid fetching some of the affected entities, you need to update the 1st level cache yourself. The only way to do that is to detach them from the activate persistence context and let Hibernate fetch it again as soon as you need them. But be careful, Hibernate doesn’t perform any dirty check before detaching the entity. So you also have to make sure that all updates are written to the database before you detach the entity.
You can see an example of it in the following code snippet.
PersonEntity p = em.find(PersonEntity.class, 1L); log.info("Detach PersonEntity"); em.flush(); em.detach(p); em.createNativeQuery("UPDATE person p SET firstname = firstname || '-changed'").executeUpdate(); p = em.find(PersonEntity.class, 1L);
As you can see, I call the flush() and detach() method on the EntityManager before I perform the native query. The call of the flush() method tells Hibernate to write the changed entities from the 1st level cache to the database. This makes sure that you don’t lose any update. You can then detach the entity from the current persistence context and due to this remove it from the 1st level cache.
Problem 2: Not part of the entity life cycle
In most applications, this is not a huge problem. But I want to mention it anyways.
The native UPDATE statement is executed in the database and doesn’t use any entities. This provides performance benefits, but it also avoids the execution of any entity lifecycle methods or entity listeners.
If you use a framework like Hibernate Envers or implement any code yourself that relies on lifecycle events, you have to either avoid native UPDATE statements or implement the operations of your listeners within this specific use case.
Summary
With the standard JPA approach, you fetch an entity from the database and call some setter methods to update it. This feels very natural to Java developers, but the number of required SQL statements can create performance issues if you work on a huge set of entities. It’s often a lot faster to update all entities with one native or JPQL UPDATE statement.
But you then have to take care of your 1st level cache. Hibernate doesn’t know which records were updated in the database and didn’t refresh the corresponding entities. You either have to make sure that you haven’t fetched any entities which are affected by the update, or you have to detach them from the Hibernate session before you execute the update.
You also have to check if you use any entity lifecycle methods or entity listeners. The native UPDATE statement doesn’t use any entities and therefore doesn’t trigger any lifecycle event. If you rely on lifecycle events, you either have to avoid native UPDATE statements, or you have to handle the missing lifecycle events within your use case.
I was wondering if these issues still happen when using JPQL instead of native SQL?