JPA 2.2 introduced several new features, and one of them is the new getResultStream() method. This method is now part of the Query interface. As you might guess from its name, it allows you to retrieve the result of your query as a Stream. The goal of this method is to provide an efficient way to move through a result set. In the best case, it allows you to scroll through the result set instead of fetching all records at once.
You can see a simple example using the new method here. You just call the getResultStream() method instead of the getResultList() method. The rest of the API didn’t change. So, you can create the query, set bind parameter values and limit the number of selected records in the same way as you did in JPA 2.1.
Stream<Author> authors = em.createQuery("SELECT a FROM Author a", Author.class).getResultStream();
One method with different implementations
The method and its intention are described by the JPA specification. But the implementation depends on the persistence provider, and you should check the documentation and code before you use it.
The default implementation, provided by the Query interface, just calls the getResultList() method to retrieve the result set as a List and calls the stream() method to transform it into a Stream. This approach doesn’t provide any benefits compared to the getResultList() method available in JPA 2.1.
But it’s to be expected that most JPA implementations provide their own implementation of the getResultStream() method. Hibernate, for example, introduced its stream() method in version 5.2 and I would be surprised if they don’t reuse it for JPA’s getResultStream() method. It uses Hibernate’s ScrollableResult implementation to move through the result set and to fetch the records in batches. That prevents you from loading all records of the result set at once and allows you to process them more efficiently.
Don’t do this after retrieving the result set
Since the introduction of Streams in Java 8, I saw a lot of examples on blogs and in forum posts. They are often used to process huge amounts of data. The Stream API was designed for that, and it’s a great way to process data that you read from a file or got it via an API call.
But be careful, if you got your data from a database. You can, of course, use the Stream API to process selected entities or DTOs. But there are several things that the database can do a lot better than your Java code. So, make sure to distribute the processing steps intelligently so that you use the database and your business tier as efficiently as possible. Here are a few things you should better do within your query and not via the Java Stream API.
Filter the elements in the Stream
That was one of the first examples that I saw when Java 8 introduced the Stream API. The filter method allows you to select the elements in the Stream that fulfill certain criteria. Its implementation is easy and for some Java developers, it’s more comfortable to implement the filter criteria in the Java code instead of the JPQL query. But that’s something you shouldn’t do.
Databases are optimized for this use case and can do that a lot faster. So please, don’t implement any additional filter when you process your Stream. You should better use JPQL or the Criteria API to specify the filter criteria in your query. And if that’s not possible, you can still use a native SQL query.
I explained the capabilities of JPQL in great detail in a previous post. So, here is just a small and simple example on what you can do in a WHERE clause. You can use several compare operators, access the attributes of all entities referenced in the FROM clause or implicitly joined via the path operator and you can even call database functions to trigger more complex operations.
SELECT a FROM Author a WHERE a.firstName like ‘%and%’ and a.id >= 20 and size(author.books) >= 5
Limit the number of elements in the Stream
The Stream API provides several methods to cancel the processing or to retrieve certain elements from the Stream. And you can do the same within your queries. So, which is the better approach?
You should only select the database records that you want to process in your application. So, if you already know that you just need a certain number of records, you should always limit the size of the result set within the query. Typical examples are queries that select the most recent record that fulfills certain criteria or that retrieve a list of records that will be presented in a paginated list in the UI.
The reason why you should do that within your query is simple. The number of records you select can influence the execution plan of your query, e.g., indexes might be used or not used based on the number of selected records. That increases the execution time of your query. Building a bigger result set in the database also requires more resources and due to this, slows down your database server.
So, better use the setFirstResult and the setMaxResult methods on the Query and TypedQuery interface.
Stream<Author> authors = em.createQuery("SELECT a FROM Author a ORDER BY a.id ASC", Author.class) .setMaxResults(5) .setFirstResult(10) .getResultStream();
Your persistence provider translates this into a LIMIT and OFFSET clause in the SQL SELECT statement. That tells the database only to return the required number of records. So, it can use the available indexes, apply internal optimization and utilize its resources efficiently.
16:58:00,274 DEBUG [org.hibernate.SQL] - select author0_.id as id1_0_, author0_.firstName as firstNam2_0_, author0_.lastName as lastName3_0_, author0_.version as version4_0_ from Author author0_ order by author0_.id ASC limit ? offset ?
Sort the elements in the Stream
You could use the sorted method provided by the Stream interface to sort the elements of your Stream. But that’s another operation which the database can do much faster than your Java code. You just need to add an ORDER BY clause to your query and the database returns the result set in your preferred order.
In JPQL, you can do that with a similar syntax as you probably know from SQL.
Stream<Author> authors = em.createQuery("SELECT a FROM Author a ORDER BY a.id ASC", Author.class).getResultStream();
And in a CriteriaQuery, you need to provide one or more Order predicates to the orderBy method.
CriteriaBuilder cb = em.getCriteriaBuilder(); CriteriaQuery cq = cb.createQuery(Book.class); Root book = cq.from(Book.class); cq.select(book); cq.orderBy(cb.asc(book.get("title"))); Stream books = em.createQuery(cq).getResultStream();
For both cases, the persistence provider adds an ORDER BY clause to the SQL query.
17:49:02,103 DEBUG [org.hibernate.SQL] - select author0_.id as id1_0_, author0_.firstName as firstNam2_0_, author0_.lastName as lastName3_0_, author0_.version as version4_0_ from Author author0_ order by author0_.id ASC
The query now returns the selected database records in the defined order and you can use the Stream to iterate through the result set.
As you’ve seen, the new getResultStream method allows you to retrieve the result set as a Stream. In the best case, your persistence provider implemented that method in a way that allows you to move through the result set and doesn’t require to fetch all records at once.
The Stream API provides a great way to process the result set. But please keep in mind that databases are highly optimized to work with huge datasets and can do that a lot faster than your Java code. So, it’s better to perform all filter, limit and sorting operations in the database and just use the Stream to iterate through the result set.