Best Practices for Many-to-Many Associations with Hibernate and JPA

Many-to-Many associations are among the most commonly used associations with JPA and Hibernate. You can find lots of examples for them in the real world, and you can map them with JPA and Hibernate as a uni- or bidirectional association in your domain model.

But you probably also know that these mappings come with several pitfalls. In this article, I will show you 5 best practices that will help you to avoid these pitfalls and to implement efficient mappings. You will learn:

The most efficient data type for your association
Why you need utility methods to manage your association
The right FetchType for an efficient mapping
When and how to use query-specific fetching
The CascadeType you should avoid at all costs

I will not dive into the details of a basic many-to-many mapping. If you are not exactly sure how to create such a mapping, please take a look at the many-to-many section in my association mapping guide.

The most efficient data type for your association

Most developers don’t give much thought to the data type of a to-many association. They just choose a java.util.List because it’s simple and doesn’t perform any checks to avoid duplicates.

That’s OK if you implement a basic Java class or if you model a One-to-Many/Many-to-One association. But you should never use a List if you model a Many-to-Many association.

@Entity
public class Book {

	// DON'T DO THIS!!!
	@ManyToMany
	@JoinTable(name = "book_author", 
			joinColumns = { @JoinColumn(name = "fk_book") }, 
			inverseJoinColumns = { @JoinColumn(name = "fk_author") })
	private List<Author> authors = new ArrayList<Author>();
	
	...
}

Hibernate handles remove operations on Many-to-Many relationships that are mapped to a java.util.List very inefficiently.

EntityManager em = emf.createEntityManager();
em.getTransaction().begin();

// Get Book entity with 2 Authors
Book b = em.find(Book.class, 1L);

// Remove one of the Authors (here simply the first one in the List)
Author a = b.getAuthors().get(0);
b.getAuthors().remove(a);

em.getTransaction().commit();
em.close();

It first removes all records from the association table before it inserts all remaining ones.

09:54:28,876 DEBUG [org.hibernate.SQL] - update Book set title=?, version=? where id=? and version=?
09:54:28,878 DEBUG [org.hibernate.SQL] - delete from book_author where fk_book=?
09:54:28,882 DEBUG [org.hibernate.SQL] - insert into book_author (fk_book, fk_author) values (?, ?)

You should instead model a many-to-many association as a java.util.Set.

@Entity
public class Book {

	@ManyToMany
	@JoinTable(name = "book_author", 
			joinColumns = { @JoinColumn(name = "fk_book") }, 
			inverseJoinColumns = { @JoinColumn(name = "fk_author") })
	private Set<Author> authors = new HashSet<Author>();
	
	...
}

Hibernate then handles remove operations on the association much better. It now only removes the expected records from the association and keeps the others untouched.

10:00:37,709 DEBUG [org.hibernate.SQL] - update Book set title=?, version=? where id=? and version=?
10:00:37,711 DEBUG [org.hibernate.SQL] - delete from book_author where fk_book=? and fk_author=?
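
One thing to keep in mind when you switch to a Set: a HashSet relies on equals() and hashCode(), so the entity should implement both in a way that stays stable when the primary key gets assigned. The following is only a minimal sketch of one common approach and is not taken from the mapping above: equality is based on the id, and hashCode() returns a constant per class. The JPA annotations appear as comments so that the class compiles without a persistence provider, and setId is a stand-in for the key generation Hibernate would perform.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java stand-in for the Author entity; the annotations are comments only.
// @Entity
class Author {

    // @Id @GeneratedValue
    private Long id;

    Long getId() {
        return id;
    }

    // Stand-in (assumption) for the id assignment Hibernate performs on persist.
    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Author)) {
            return false;
        }
        Author other = (Author) o;
        // Two entities without an id are only equal if they are the same instance.
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        // Constant per class, so the hash does not change when the id is assigned.
        return Author.class.hashCode();
    }
}

class SetEqualityDemo {
    public static void main(String[] args) {
        Set<Author> authors = new HashSet<>();
        Author a = new Author();
        authors.add(a);

        a.setId(1L); // simulate the id assignment during persist
        System.out.println("still found after id assignment: " + authors.contains(a));

        Author sameRow = new Author();
        sameRow.setId(1L);
        System.out.println("equal to another instance with the same id: " + a.equals(sameRow));
    }
}
```

The constant hash code puts all instances into the same bucket. That is usually acceptable for the small collections of a typical association, but it is a trade-off you should be aware of.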

Why you need utility methods to manage your association

Bidirectional associations are mapped to an entity attribute on both ends of the relationship. So, in the previous example, you have an authors attribute on the Book entity and a books attribute on the Author entity. That makes writing a JPQL query or CriteriaQuery very convenient because you can use these attributes to define a JOIN clause.

But adding or removing an association gets more complicated. You always need to perform the change on both ends of the association. For example, if you want to add a Book to an Author, you need to add it to the books attribute of the Author entity, and you also need to add the Author to the authors attribute on the Book entity. Otherwise, your current persistence context contains inconsistent data, which you will keep using until the end of your current transaction.

Book b = new Book();
b.setTitle("Hibernate Tips - More than 70 solutions to common Hibernate problems");
em.persist(b);

Author a = em.find(Author.class, 1L);
a.getBooks().add(b);
b.getAuthors().add(a);

Utility methods on your Author and Book entities make updating and removing much easier. Within these methods, you perform the required operations on both entities.

@Entity
public class Author {

	@ManyToMany(mappedBy = "authors")
	private Set<Book> books = new HashSet<Book>();

	...
	
	public void addBook(Book book) {
		this.books.add(book);
		book.getAuthors().add(this);
	}

	public void removeBook(Book book) {
		this.books.remove(book);
		book.getAuthors().remove(this);
	}
}
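
To see the effect of such utility methods without a database, here is a small plain-Java sketch. It repeats the addBook and removeBook methods from above and adds mirrored addAuthor and removeAuthor methods on Book. These two are hypothetical additions for illustration and not part of the original mapping. The JPA annotations are shown as comments so that the example compiles on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java stand-ins for the entities; the annotations are comments only.
// @Entity
class Book {

    // @ManyToMany @JoinTable(name = "book_author", ...)
    private final Set<Author> authors = new HashSet<>();

    Set<Author> getAuthors() {
        return authors;
    }

    // Hypothetical counterpart to Author.addBook
    void addAuthor(Author author) {
        this.authors.add(author);
        author.getBooks().add(this);
    }

    // Hypothetical counterpart to Author.removeBook
    void removeAuthor(Author author) {
        this.authors.remove(author);
        author.getBooks().remove(this);
    }
}

// @Entity
class Author {

    // @ManyToMany(mappedBy = "authors")
    private final Set<Book> books = new HashSet<>();

    Set<Book> getBooks() {
        return books;
    }

    void addBook(Book book) {
        this.books.add(book);
        book.getAuthors().add(this);
    }

    void removeBook(Book book) {
        this.books.remove(book);
        book.getAuthors().remove(this);
    }
}

class UtilityMethodsDemo {
    public static void main(String[] args) {
        Author a = new Author();
        Book b = new Book();

        // One call updates both ends of the association.
        a.addBook(b);
        System.out.println("linked on both sides: "
                + (a.getBooks().contains(b) && b.getAuthors().contains(a)));

        // Removing from the other side keeps both ends consistent as well.
        b.removeAuthor(a);
        System.out.println("unlinked on both sides: "
                + (a.getBooks().isEmpty() && b.getAuthors().isEmpty()));
    }
}
```

Because both collections are Sets, calling an add method twice for the same pair does not create a duplicate entry.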

The right FetchType for an efficient mapping

This is a quick one. You should always use FetchType.LAZY for your many-to-many associations. It tells your persistence provider not to fetch the associated entities from the database until you use them. That’s usually the case when you call its getter method for the first time.

Luckily, that’s the default for all to-many associations. So, please make sure that you don’t change it.

And if you want to learn more about JPA’s different FetchTypes, please take a look at my Introduction to JPA FetchTypes.

When and how to use query-specific fetching

If you’re using FetchType.LAZY, you need to know about query-specific fetching. Otherwise, your application will be very slow because initializing each lazily fetched association with its own query creates lots of n+1 select issues.

When you load an entity and use query-specific fetching, you tell Hibernate which mapped associations it shall initialize for each fetched entity. It then extends the SELECT clause of your query so that it includes the columns mapped by these other entities and initializes the associations. And because the associations are already initialized, Hibernate doesn’t need to perform an additional query when you access its getter method for the first time.
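
The difference in the number of statements can be illustrated with a toy simulation. This is not Hibernate code: FakeDatabase is a made-up in-memory class in which every method call stands for one SELECT statement, and the sample data is invented. The sketch only counts how many statements each access pattern would need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a database; every method call represents one SELECT statement.
class FakeDatabase {

    private final Map<Long, List<String>> bookTitlesByAuthorId = new LinkedHashMap<>();
    int selectCount = 0;

    void store(long authorId, List<String> titles) {
        bookTitlesByAuthorId.put(authorId, titles);
    }

    // Corresponds to: SELECT a FROM Author a
    List<Long> selectAuthorIds() {
        selectCount++;
        return new ArrayList<>(bookTitlesByAuthorId.keySet());
    }

    // Corresponds to the lazy initialization of one Author's books.
    List<String> selectBooksOf(long authorId) {
        selectCount++;
        return bookTitlesByAuthorId.get(authorId);
    }

    // Corresponds to: SELECT a FROM Author a JOIN FETCH a.books
    Map<Long, List<String>> selectAuthorsWithBooks() {
        selectCount++;
        return new LinkedHashMap<>(bookTitlesByAuthorId);
    }
}

class FetchingDemo {

    // Invented sample data: 3 authors with their book titles.
    static FakeDatabase sampleDatabase() {
        FakeDatabase db = new FakeDatabase();
        db.store(1L, List.of("Book A", "Book B"));
        db.store(2L, List.of("Book C"));
        db.store(3L, List.of("Book D"));
        return db;
    }

    // 1 statement for the authors plus 1 statement per author for the books.
    static int countSelectsWithLazyLoading(FakeDatabase db) {
        for (long authorId : db.selectAuthorIds()) {
            db.selectBooksOf(authorId);
        }
        return db.selectCount;
    }

    // Authors and books are read by the same statement.
    static int countSelectsWithJoinFetch(FakeDatabase db) {
        db.selectAuthorsWithBooks();
        return db.selectCount;
    }

    public static void main(String[] args) {
        System.out.println("lazy loading: " + countSelectsWithLazyLoading(sampleDatabase()) + " selects");
        System.out.println("join fetch:   " + countSelectsWithJoinFetch(sampleDatabase()) + " selects");
    }
}
```

With 3 authors, the lazy variant needs 1 + 3 statements, while the fetching variant needs only 1. The gap grows with every additional author you load.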

You can implement query-specific fetching in several different ways. The simplest one is a JOIN FETCH clause, which I will show you here. But you can also use a @NamedEntityGraph or an EntityGraph, which I explained in previous articles.

The definition of a JOIN FETCH clause is almost identical to a simple JOIN clause in a JPQL query. You just need to add the FETCH keyword.

Author a = em.createQuery("SELECT a FROM Author a JOIN FETCH a.books WHERE a.id = 1", Author.class).getSingleResult();

Even though a JOIN and a JOIN FETCH clause look very similar, the JOIN FETCH clause has a much bigger effect on the generated SQL query. It not only gets translated into a SQL JOIN, as is the case for a JPQL JOIN clause, it also forces your persistence provider to extend the SELECT clause by all columns that are mapped by the associated entity.

16:21:03,046 DEBUG SQL:94 - 
    select
        author0_.id as id1_0_0_,
        book2_.id as id1_1_1_,
        author0_.firstName as firstNam2_0_0_,
        author0_.lastName as lastName3_0_0_,
        author0_.version as version4_0_0_,
        book2_.format as format2_1_1_,
        book2_.publishingDate as publishi3_1_1_,
        book2_.title as title4_1_1_,
        book2_.version as version5_1_1_,
        books1_.author_id as author_i2_2_0__,
        books1_.book_id as book_id1_2_0__ 
    from
        Author author0_ 
    inner join
        book_author books1_ 
            on author0_.id=books1_.author_id 
    inner join
        Book book2_ 
            on books1_.book_id=book2_.id 
    where
        author0_.id=1

The CascadeType you should avoid at all costs

If you activate cascading on an association, your persistence provider applies the operations you perform on the entity to all associated entities. Whether it does that for all operations or just for a few selected ones depends on the configured CascadeType.

That might sound like an amazing idea that makes the implementation of your business logic much easier. And that’s not entirely wrong.

But please avoid the CascadeTypes REMOVE and ALL, which includes REMOVE, for many-to-many associations. In the best case, it only creates performance issues, but in the worst case, it might also remove more records than you intended.

I explained both pitfalls and their solutions in great detail in a previous article. Or, if you want to keep it simple, trigger the required operations programmatically on the associated entities. This might require a few more lines of code, but it avoids any unexpected side effects.
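
A programmatic removal could look roughly like the following plain-Java sketch. AuthorRemoval and detachFromAllBooks are hypothetical names, and the entities are simplified stand-ins without JPA annotations. The method removes the association on both ends and returns the Books that have no Author left, so that the caller can decide whether to delete them. With JPA, you would call em.remove on the Author and on any Book you want to delete afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the entities (no JPA annotations).
class Book {
    private final Set<Author> authors = new HashSet<>();

    Set<Author> getAuthors() {
        return authors;
    }
}

class Author {
    private final Set<Book> books = new HashSet<>();

    Set<Book> getBooks() {
        return books;
    }

    void addBook(Book book) {
        this.books.add(book);
        book.getAuthors().add(this);
    }

    void removeBook(Book book) {
        this.books.remove(book);
        book.getAuthors().remove(this);
    }
}

// Hypothetical helper that replaces CascadeType.REMOVE with explicit steps.
class AuthorRemoval {

    // Detaches the author from all books and returns the books without any author left.
    static List<Book> detachFromAllBooks(Author author) {
        List<Book> booksWithoutAuthor = new ArrayList<>();
        // Iterate over a copy because removeBook modifies author.getBooks().
        for (Book book : new ArrayList<>(author.getBooks())) {
            author.removeBook(book);
            if (book.getAuthors().isEmpty()) {
                booksWithoutAuthor.add(book);
            }
        }
        // With JPA: em.remove(author), and em.remove(book) for each returned book if desired.
        return booksWithoutAuthor;
    }

    public static void main(String[] args) {
        Author leaving = new Author();
        Author coAuthor = new Author();
        Book shared = new Book();
        Book solo = new Book();
        leaving.addBook(shared);
        coAuthor.addBook(shared);
        leaving.addBook(solo);

        List<Book> orphans = detachFromAllBooks(leaving);
        System.out.println("books without author: " + orphans.size());
        System.out.println("shared book keeps its co-author: " + shared.getAuthors().contains(coAuthor));
    }
}
```

The Book written together with another Author stays untouched, which is exactly what a cascaded remove would not guarantee.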

Conclusion

You can find lots of examples for many-to-many associations in the real world, and you can easily map them with JPA and Hibernate. Unfortunately, these simple mappings hide a few pitfalls which you can avoid by following these 5 best practices:

Model associations as a java.util.Set.
Provide utility methods to add or remove an entity from an association.
Always use FetchType.LAZY, which is the default, to avoid performance problems.
Apply query-specific fetching to avoid n+1 select issues.
Don’t use the CascadeTypes REMOVE and ALL.

7 Comments

ghasem says:

Thank you for this nice article.
Can you please explain a little more about query-specific fetching?
Suppose I'm using spring data JPA and hibernate as persistence provider.
Do I have to use @Query for select entities to provide special query, or can I use findBy… methods provided by the spring data JPA?
Keerthi says:

Given this mapping, what is the best way to remove an Author and the Books associated with the Author? Would you mind explaining?
1. Thorben Janssen says:
  
  Hi Keerthi,
  
  The best way to remove an Author with all associated Books is to implement the operation yourself. You need to get all Books of the Author, remove them, and then remove the Author.
  Please don’t use CascadeType.REMOVE because it would also remove all Books that this Author wrote together with other Authors. I explained that in great detail in https://thorben-janssen.com/avoid-cascadetype-delete-many-assocations/
  
  Regards,
  Thorben
frantisek says:

Hi,

in point 1 is used set:
@ManyToMany
@JoinTable(name = "book_author",
joinColumns = { @JoinColumn(name = "fk_book") },
inverseJoinColumns = { @JoinColumn(name = "fk_author") })
private Set<Author> authors = new HashSet<Author>();

but in point 2 is used list:
@ManyToMany(mappedBy = "authors")
private List<Book> books = new ArrayList<Book>();

Should not be also set here?
Thank you.
1. Thorben Janssen says:
  
  Fixed it. Thanks!
Yannick Majoros says:

Basically, you should best avoid *ToMany relationships. Unscalable at best, a pain to maintain. Using ad-hoc queries doesn’t come with these problems.
1. Thorben Janssen says:
  
  I don’t agree with that. If you follow a few rules, to-many associations work great. I explained the ones for many-to-many associations in this article.
