Your 2 best options to fix Hibernate’s MultipleBagFetchException


Take your skills to the next level!

The Persistence Hub is the place to be for every Java developer. It gives you access to all my premium video courses, monthly Java Persistence News, monthly coding problems, and regular expert sessions.


You probably learned that you should use FetchType.LAZY for all of your associations. It ensures that Hibernate initializes an association when you use it and doesn’t spend any time getting data you don’t need.

Unfortunately, this introduces a new issue. You now need to use a JOIN FETCH clause or an EntityGraph to fetch the association if you need it. Otherwise, you will experience the n+1 select issue, which causes severe performance issues or a LazyInitializationException. If you do that for multiple associations, Hibernate might throw a MultipleBagFetchException.

In this article, I will explain when Hibernate throws this exception and show you your 2 best options to fix it. One of them is a great fit for associations with a small cardinality and the other one for associations that contain lots of elements. So, let’s take a look at both of them, and you pick the one that fits your application.

Cause of the MultipleBagFetchException

As I explained in a previous article about the most efficient data type for a to-many association, Hibernate’s internal naming of the collection types is pretty confusing. Hibernate calls it a Bag, if the elements in your java.util.List are unordered. If they are ordered, it’s called a List.

So, depending on your mapping, a java.util.List can be treated as a Bag or a List. But don’t worry, in real life, this isn’t as confusing as it might seem. Defining the order of an association requires an additional annotation and is almost always an overhead. That’s why you should avoid it and why at least 90% of the association mappings that use a java.util.List and that I’ve seen in real projects are unordered. So, Hibernate treats them as a Bag.

Here is a simple domain model in which Hibernate treats the Reviews and the Authors of a Book as Bags.

@Entity
public class Book {

    @ManyToMany
    private List authors = new ArrayList();
     
    @OneToMany(mappedBy = "book")
    private List reviews = new ArrayList();
     
    ...     
}

If you try to fetch multiple of these bags in a JPQL query, you create a cartesian product.

TypedQuery<Book> q = em.createQuery("SELECT DISTINCT b "
		+ "FROM Book b "
			+ "JOIN FETCH b.authors a "
			+ "JOIN FETCH b.reviews r "
		+ "WHERE b.id = 1",
		Book.class);
q.setHint(QueryHints.PASS_DISTINCT_THROUGH, false);
List<Book> b = q.getResultList();

This can create performance problems. Hibernate also struggles to differentiate between information that is supposed to be duplicated and information that was duplicated because of the cartesian product. Because of that, Hibernate throws a MultipleBagFetchException.

java.lang.IllegalArgumentException: org.hibernate.loader.MultipleBagFetchException: cannot simultaneously fetch multiple bags: [org.thoughts.on.java.model.Book.authors, org.thoughts.on.java.model.Book.reviews]

Fixing the MultipleBagFetchException

You can find lots of questions about this exception and various solutions to avoid it. But a lot of them come with unexpected side effects. The only 2 fixes between which you should choose are the ones that I will describe in the following sections. Which one of them is the best for you depends on the size of the cartesian product that your queries might create:

  1. If all of your associations only contain a small number of elements, the created cartesian product will be relatively small. In these situations, you can change the types of the attributes that map your associations to a java.util.Set. Hibernate can then fetch multiple associations in 1 query.
  2. If at least one of your associations contains a lot of elements, your cartesian product will become too big to fetch it efficiently in 1 query. You should then use multiple queries that get different parts of the required result.

As always, optimizing the performance of your application requires you to choose between different trade-offs, and there is no one-size-fits-all approach. The performance of each option depends on the size of the cartesian product and the number of queries you’re executing. For a relatively small cartesian product, getting all information with 1 query provides you with the best performance. If the cartesian product reaches a certain size, you should better split it into multiple queries.

That’s why I will show you both options so that you can pick the one that fits your application.

Option 1: Use a Set instead of a List

The easiest approach to fix the MultipleBagFetchException is to change the type of the attributes that map your to-many associations to a java.util.Set. This is just a small change in your mapping, and you don’t need to change your business code.

@Entity
public class Book {
     
    @ManyToMany
    private Set authors = new HashSet();
     
    @OneToMany(mappedBy = "book")
    private Set reviews = new HashSet();
     
    ...
     
}

As explained earlier, if you now perform the same query as I showed you before to get the Book with all its Authors and Reviews, your result set will contain a cartesian product. The size of that product depends on the number of Books you select and the number of associated Authors and Reviews.

TypedQuery<Book> q = em.createQuery("SELECT DISTINCT b "
		+ "FROM Book b "
			+ "JOIN FETCH b.authors a "
			+ "JOIN FETCH b.reviews r "
		+ "WHERE b.id = 1",
		Book.class);
q.setHint(QueryHints.PASS_DISTINCT_THROUGH, false);
List<Book> b = q.getResultList();

Here you can see the generated SQL query. To get all the requested associations, Hibernate has to select all columns mapped by these entities. In combination with the cartesian product created by the 3 INNER JOINs, this can become a performance problem.

19:46:20,785 DEBUG [org.hibernate.SQL] -
    select
        book0_.id as id1_1_0_,
        author2_.id as id1_0_1_,
        reviews3_.id as id1_4_2_,
        book0_.publisherid as publishe5_1_0_,
        book0_.publishingDate as publishi2_1_0_,
        book0_.title as title3_1_0_,
        book0_.version as version4_1_0_,
        author2_.firstName as firstNam2_0_1_,
        author2_.lastName as lastName3_0_1_,
        author2_.version as version4_0_1_,
        authors1_.bookId as bookId1_2_0__,
        authors1_.authorId as authorId2_2_0__,
        reviews3_.bookid as bookid3_4_2_,
        reviews3_.comment as comment2_4_2_,
        reviews3_.bookid as bookid3_4_1__,
        reviews3_.id as id1_4_1__
    from
        Book book0_
    inner join
        BookAuthor authors1_
            on book0_.id=authors1_.bookId
    inner join
        Author author2_
            on authors1_.authorId=author2_.id
    inner join
        Review reviews3_
            on book0_.id=reviews3_.bookid
    where
        book0_.id=1

Whenever you write such a query, you also need to keep in mind that Hibernate doesn’t hide that the result set contains a product. This query returns each Book multiple times. The number of references to the same Book object is calculated by the number of Authors multiplied by the number of Reviews. You can avoid that by adding the DISTINCT keyword to your select clause and by setting the query hint hibernate.query.passDistinctThrough to false.

Performance considerations

In this example, my query only selects 1 Book, and most Books have been written by 1-3 Authors. So, even if the database contains several Reviews for this Book, the cartesian product will still be relatively small.

Based on these assumptions, it might be faster to accept the inefficiency of the cartesian product to reduce the number of queries. This might change if your cartesian product becomes bigger because you select a huge number of Books or if your average Book has been written by a few dozen Authors.

Option 2: Split it into multiple queries

Fetching huge cartesian products in 1 query is inefficient. It requires a lot of resources in your database and puts unnecessary load on your network. Hibernate and your JDBC driver also need to spend more resources to handle the query result.

You can avoid that by performing multiple queries that fetch different parts of the required graph of entities. In the example of this post, I would fetch the Books with all their Authors in 1 query and the Books with all their Reviews in a 2nd query. If your graph of required entities is more complex, you might need to use more queries or fetch more associations with each of them.

TypedQuery<Book> q = em.createQuery("SELECT DISTINCT b "
		+ "FROM Book b JOIN FETCH b.authors a "
		+ "WHERE b.id = 1",
		Book.class);
q.setHint(QueryHints.PASS_DISTINCT_THROUGH, false);
List<Book> books = q.getResultList();

log.info(books.get(0));

q = em.createQuery("SELECT DISTINCT b "
		+ "FROM Book b "
			+ "JOIN FETCH b.reviews r "
		+ "WHERE b.id = 1",
		Book.class);
q.setHint(QueryHints.PASS_DISTINCT_THROUGH, false);
books = q.getResultList();

log.info(books.get(0));
log.info("Authors: "+books.get(0).getAuthors().size());
log.info("Reviews: "+books.get(0).getReviews().size());

As I explained in last week’s post, Hibernate ensures that within each Session, there is only 1 entity object that represents a specific record in the database. You can use that to resolve foreign key references efficiently or to let Hibernate merge the results of multiple queries.

If you take a look at the following log output, you can see that the Lists returned by both queries contain exactly the same object. In both cases, the Book objects have the reference @1f.

When Hibernate processed the result of the 2nd query, it checked for each record if the 1st level cache already contained an object for that Book entity. It then reused that object and added the returned Review to the mapped association.

19:52:10,600 DEBUG [org.hibernate.SQL] - 
    select
        book0_.id as id1_1_0_,
        author2_.id as id1_0_1_,
        book0_.publisherid as publishe5_1_0_,
        book0_.publishingDate as publishi2_1_0_,
        book0_.title as title3_1_0_,
        book0_.version as version4_1_0_,
        author2_.firstName as firstNam2_0_1_,
        author2_.lastName as lastName3_0_1_,
        author2_.version as version4_0_1_,
        authors1_.bookId as bookId1_2_0__,
        authors1_.authorId as authorId2_2_0__ 
    from
        Book book0_ 
    inner join
        BookAuthor authors1_ 
            on book0_.id=authors1_.bookId 
    inner join
        Author author2_ 
            on authors1_.authorId=author2_.id 
    where
        book0_.id=1
19:52:10,633 INFO  [org.thoughts.on.java.model.TestMultipleJoinFetch] - org.thoughts.on.java.model.Book@1f
19:52:10,645 DEBUG [org.hibernate.SQL] - 
    select
        book0_.id as id1_1_0_,
        reviews1_.id as id1_4_1_,
        book0_.publisherid as publishe5_1_0_,
        book0_.publishingDate as publishi2_1_0_,
        book0_.title as title3_1_0_,
        book0_.version as version4_1_0_,
        reviews1_.bookid as bookid3_4_1_,
        reviews1_.comment as comment2_4_1_,
        reviews1_.bookid as bookid3_4_0__,
        reviews1_.id as id1_4_0__ 
    from
        Book book0_ 
    inner join
        Review reviews1_ 
            on book0_.id=reviews1_.bookid 
    where
        book0_.id=1
19:52:10,648 INFO  [org.thoughts.on.java.model.TestMultipleJoinFetch] - org.thoughts.on.java.model.Book@1f
19:52:10,648 INFO  [org.thoughts.on.java.model.TestMultipleJoinFetch] - Authors: 2
19:52:10,648 INFO  [org.thoughts.on.java.model.TestMultipleJoinFetch] - Reviews: 2

Performance considerations

If you use multiple queries to get the required graph of entities, you avoid the creation of a huge cartesian product. This reduces the load on all involved systems and makes it easier to ensure a good performance for all queries.

But that not necessarily means that this approach is faster than option 1. You now perform more queries than before. Each of them requires a database roundtrip and creates some management overhead in the database, e.g., to create an execution plan. Due to that, this option is only faster than option 1, if the size of the cartesian product creates a bigger overhead than the execution of multiple queries.

Conclusion

As you have seen in this article, you can solve Hibernate’s MultipleBagFetchException in 2 ways:

  • You can change the data type of the attribute that maps the associations and retrieve all information in 1 query. The result of that query is a cartesian product. As long as this product doesn’t get too big, this approach is simple and efficient.
  • You can use multiple queries to fetch the required graph of entities. This avoids a huge cartesian product and is the better approach if you need to fetch a huge amount of data.

5 Comments

  1. So far so good. We face this problem even we do not have custom Queries. Wa are acting with an entity having three lists inside.
    Is there a way to get out of this without writing custom queries and not changing the lists to sets?

    1. Avatar photo Thorben Janssen says:

      Not really.
      Setting the FetchType of all 3 associations to LAZY and not using a query with a JOIN FETCH clause or EntityGraph should avoid the exception. But that will create a 3n+1 select issue. If you keep n (= number of entities you fetch and for which you need to initialize associations) small, your use case might be inefficient but still provide an acceptable performance.
      But I recommend applying the changes described in the article. All other options only replace your current problem with a new one.

  2. Avatar photo Markus hald says:

    Hi Thorben,

    For the 2nd approach how do you handle if the book has no reviews? I this case wouldnt you set your list of books to be empty?

    Best Regards,
    Markus Hald

    1. Avatar photo Thorben Janssen says:

      Hi Ján,

      I’m sorry, but I can’t find your comment. If you post it again, I will make sure to answer it.

      Regards,
      Thorben

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.