How to Choose the Most Efficient Data Type for To-Many Associations – Bag vs. List vs. Set




Which data type should you use to map a to-many association with Hibernate? Is it better to use a Set or a List?

That’s a very common question, and most developers are surprised when they look at the documentation and find out that these are not the only options. You can also use a Bag or a Map.

The mapping to a Map is a special case, and I already explained it in great detail in one of my previous posts. In contrast to the other 3 options, it provides key-based access to the associated entities. That might be beneficial for some use cases, but creating and managing the Map also adds overhead. It should, therefore, not be your default choice.

So, let’s focus on the other 3 options. List, Set, and Bag represent simple collections of entities. The first 2 options should sound familiar because the java.util package contains a List and a Set interface.

But what is a Bag? There is no class in the java.util package with that name.

The difference between a Bag and a List

Hibernate’s naming of the different collection types is a little bit confusing because Lists and Bags are both mapped by a java.util.List. The difference between them is that a List is ordered and a Bag is unordered.

So, if you map your to-many association to a java.util.List without specifying the order of your association elements, you’re using a Bag and not a List. That should be the case for most of your associations because retrieving the association in a specific order slows down your database queries. If you need a specific order, it’s better to define it with an ORDER BY clause in a JPQL query.

So, for most association mappings, there are 2 options remaining. You need to decide between a Bag and a Set.

Should you use a Bag or a Set?

When you just look at the Java types, the answer seems to be easy. In general, a java.util.List provides better performance, while a java.util.Set doesn’t contain any duplicates. As long as you implement the create use case correctly, so that no entity gets added to the association twice, a java.util.List seems like the obvious best choice for your association mapping.
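The following plain-Java sketch illustrates that trade-off without any Hibernate involved. The Author class here is a hypothetical stand-in for an entity (it is not the mapped entity from this article), and counting hashCode() calls is only a rough proxy for the extra work a HashSet does on every add.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class ListVsSetDemo {

    // Hypothetical stand-in for an entity; it counts hashCode() calls.
    public static final class Author {
        static int hashCodeCalls = 0;
        private final long id;

        Author(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Author && ((Author) o).id == id;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return Long.hashCode(id);
        }
    }

    // Adds the same Author twice and returns the resulting collection size.
    public static int sizeAfterAddingTwice(Collection<Author> authors) {
        Author a = new Author(1L);
        authors.add(a);
        authors.add(a);
        return authors.size();
    }

    public static void main(String[] args) {
        Author.hashCodeCalls = 0;
        int listSize = sizeAfterAddingTwice(new ArrayList<>());
        int listHashCalls = Author.hashCodeCalls;

        Author.hashCodeCalls = 0;
        int setSize = sizeAfterAddingTwice(new HashSet<>());
        int setHashCalls = Author.hashCodeCalls;

        // The List keeps the duplicate and never hashes the element;
        // the Set hashes on every add and silently drops the duplicate.
        System.out.println("List: size=" + listSize + ", hashCode calls=" + listHashCalls);
        System.out.println("Set:  size=" + setSize + ", hashCode calls=" + setHashCalls);
    }
}
```

The List ends up with 2 elements and the Set with 1, which is why a List only works as a mapping type if your own code prevents duplicates.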

But it’s not that easy. A List might be more efficient than a Set, but the type also influences how Hibernate manages the association in the database. So, there are a few other things you need to take into account when you make your decision.

A critical bug in older Hibernate versions

First of all, if you’re using a Hibernate version older than 5.0.8, you should be aware of bug HHH-5855. When you used a java.util.List and merged the parent entity, Hibernate generated 2 INSERT statements for each new child entity.

Inefficient handling of many-to-many associations

When you’re mapping a many-to-many association, you should always use a java.util.Set.

Don’t use a List for many-to-many associations

If you model the association as a java.util.List, Hibernate handles the removal of associated entities very inefficiently.

@Entity
public class Book {

	// DON'T DO THIS!!!
	@ManyToMany
	@JoinTable(name = "book_author", 
			joinColumns = { @JoinColumn(name = "fk_book") }, 
			inverseJoinColumns = { @JoinColumn(name = "fk_author") })
	private List<Author> authors = new ArrayList<Author>();
	
	...
}

In the following code snippet, I load a Book which was written by 2 Authors and remove one of the Authors from the association.

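// emf is an EntityManagerFactory that was created elsewhere
EntityManager em = emf.createEntityManager();
em.getTransaction().begin();

// Get Book entity with 2 Authors
Book b = em.find(Book.class, 1L);

// Remove one of the Authors (here simply the first one that is returned)
Author a = b.getAuthors().iterator().next();
b.getAuthors().remove(a);

em.getTransaction().commit();
em.close();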

As you can see in the log messages, Hibernate removes all records from the association table before it inserts a new record for the remaining association.

This approach is obviously very inefficient. Depending on the number of associated entities, the additional INSERT statements can create performance problems.

...
09:54:28,876 DEBUG [org.hibernate.SQL] - update Book set title=?, version=? where id=? and version=?
09:54:28,878 DEBUG [org.hibernate.SQL] - delete from book_author where fk_book=?
09:54:28,882 DEBUG [org.hibernate.SQL] - insert into book_author (fk_book, fk_author) values (?, ?)

Use a Set to map many-to-many associations

Hibernate handles the association a lot better if you model it as a java.util.Set.

@Entity
public class Book {

	@ManyToMany
	@JoinTable(name = "book_author", 
			joinColumns = { @JoinColumn(name = "fk_book") }, 
			inverseJoinColumns = { @JoinColumn(name = "fk_author") })
	private Set<Author> authors = new HashSet<Author>();
	
	...
}

If you run the same test case again, Hibernate now only removes the record that represents the removed association. As expected, all other database records are not affected by the remove operation.

...
10:00:37,709 DEBUG [org.hibernate.SQL] - update Book set title=?, version=? where id=? and version=?
10:00:37,711 DEBUG [org.hibernate.SQL] - delete from book_author where fk_book=? and fk_author=?
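To get a feeling for how the two strategies scale, here is a toy model of the statements shown in both logs. It is not Hibernate’s implementation; the class and method names are hypothetical, and it ignores the update of the Book’s version column. It assumes that the Bag mapping always deletes all rows of the Book and re-inserts one row per remaining Author, while the Set mapping issues a single targeted delete. The likely reason for the difference is that a Bag may contain duplicates, so the 2 foreign keys alone don’t identify a single row in the association table.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinTableRemovalSketch {

    // Bag-style removal: delete every row of the Book,
    // then re-insert one row per remaining Author.
    public static List<String> bagRemoval(int authorsBefore) {
        if (authorsBefore < 1) {
            throw new IllegalArgumentException("need at least one author to remove");
        }
        List<String> sql = new ArrayList<>();
        sql.add("delete from book_author where fk_book=?");
        for (int i = 0; i < authorsBefore - 1; i++) {
            sql.add("insert into book_author (fk_book, fk_author) values (?, ?)");
        }
        return sql;
    }

    // Set-style removal: one delete that targets exactly the removed row.
    public static List<String> setRemoval() {
        List<String> sql = new ArrayList<>();
        sql.add("delete from book_author where fk_book=? and fk_author=?");
        return sql;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100}) {
            System.out.println(n + " authors: bag=" + bagRemoval(n).size()
                    + " statement(s), set=" + setRemoval().size() + " statement(s)");
        }
    }
}
```

For the Book with 2 Authors from the test case, the model yields 2 statements for the Bag (1 DELETE and 1 INSERT) and 1 for the Set, matching the logs. The gap grows linearly with the number of associated entities.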

Summary

As you’ve seen, mapping an association as a java.util.List can create problems that far outweigh the small performance gain you get compared to a java.util.Set. So, make sure to update your Hibernate version and to use a Set to model many-to-many associations.
