
Modeling self-referencing associations with Hibernate




When you model a hierarchical data structure, you often have to use self-referencing associations. Both ends of these associations are of the same type: the entity on which you define the association is also the entity it references. In your database, you model a to-one association of this kind with a foreign key column that references the primary key of the same table. A many-to-many association, like the parents of a Person in the following example, uses an association table whose two foreign key columns both reference that table.

Example Mappings

A typical example is a Person who has other Persons as their parents. You can model it as a uni- or bidirectional many-to-many association on the Person entity.

In the following code snippet, the parents and children attributes model both ends of a bidirectional many-to-many association.

@Entity
public class Person {
    
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String firstName;

    private String lastName;

    @ManyToMany
    private Set<Person> parents = new HashSet<>();

    @ManyToMany(mappedBy = "parents")
    private Set<Person> children = new HashSet<>();

    ...
}

Another example is a Category that is a subcategory of another Category. You can model that as a uni- or bidirectional many-to-one/one-to-many association.

The parentCategory and subCategories attributes of the following Category entity map it as a bidirectional association.

@Entity
public class Category {
    
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String name;

    @ManyToOne(fetch = FetchType.LAZY)
    private Category parentCategory;

    @OneToMany(mappedBy = "parentCategory")
    private Set<Category> subCategories = new HashSet<>();
	
    ...
}

During your requirements analysis, these kinds of associations often seem to be very simple. But when you implement your mappings and business code, you quickly recognize that these mappings introduce several pitfalls.

Common Pitfalls of Self-Referencing Associations

In this article, I want to show you the most common of these pitfalls. You can avoid most of them relatively easily in your application. But for others, you might want to consider using a totally different approach, e.g., a graph database.

Pitfall 1: Managing associations

As for all association mappings, you need to decide if you want to map it uni- or bidirectionally. A unidirectional mapping only models the association in one direction, e.g., from a child to their parents. A bidirectional mapping would also model the association from each parent to their children.

In general, unidirectional associations are easier to update. Bidirectional associations are easier to use in your queries and business code. For most applications, the number and complexity of read operations are a lot higher than for write operations. Because of that, a lot of developers prefer bidirectional mappings.

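If you decide to use a bidirectional mapping, you always need to update both ends of your association. Hibernate only persists changes to the owning side of the association. So, if you only update the side that’s mapped with mappedBy, Hibernate doesn’t write your change to the database. And if you only update the owning side, the entities in your current persistence context become outdated.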

The best way to do that for a self-referencing association is to provide utility methods. Their names should reflect the actual business operation, and they should perform all necessary validations.
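By default, Hibernate doesn’t synchronize the two ends of a bidirectional association in memory; that’s your job. The following sketch illustrates what a one-sided update leaves behind. It uses a hypothetical PersonNode class, a plain-Java stand-in for the Person entity without any JPA annotations, so you can run it without a persistence context. The hypothetical addParentOneSided method only updates the parents Set, while addParent updates both ends.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical plain-Java stand-in for the Person entity: no JPA annotations,
// just the two ends of the self-referencing parents/children association.
class PersonNode {

    private final String name;
    private final Set<PersonNode> parents = new HashSet<>();
    private final Set<PersonNode> children = new HashSet<>();

    PersonNode(String name) {
        this.name = name;
    }

    String getName() { return name; }
    Set<PersonNode> getParents() { return parents; }
    Set<PersonNode> getChildren() { return children; }

    // Updates only the parents end; the parent's children Set stays stale.
    void addParentOneSided(PersonNode parent) {
        parents.add(parent);
    }

    // Updates both ends of the association.
    void addParent(PersonNode parent) {
        parents.add(parent);
        parent.children.add(this);
    }

    public static void main(String[] args) {
        PersonNode mother = new PersonNode("Mother");
        PersonNode father = new PersonNode("Father");
        PersonNode child = new PersonNode("Child");

        child.addParentOneSided(mother);
        child.addParent(father);

        // prints "false": mother doesn't know about her child
        System.out.println(mother.getChildren().contains(child));
        // prints "true"
        System.out.println(father.getChildren().contains(child));
    }
}
```

In a real application, the stale end is what your business code sees for the rest of the session, which is why the utility methods in the following examples always update both ends.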

Example 1: Add a new Parent-Child Association

For the Person example in this article, that might be the method addParents(Person parent1, Person parent2). It adds the 2 provided Person objects to the parents association of a Person. Within that method, you could also check that there are no parents set for that Person and update the associations on all 3 Person objects.

@Entity
public class Person {

    ...
	
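    public void addParents(Person parent1, Person parent2) {
        // the parents of a Person can only be set once
        if (!this.parents.isEmpty()) {
            throw new IllegalStateException("The parents of this Person are already set");
        }

        // update both ends of the association for each parent
        this.parents.add(parent1);
        parent1.getChildren().add(this);
        this.parents.add(parent2);
        parent2.getChildren().add(this);
    }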
}

Or you could provide the method createChild(String firstName, String lastName, Person parent2). This method instantiates a new Person object using the provided firstName and lastName, adds it to the Set of children, and initializes the parents association on the newly created Person object. If you provide a second parent, it updates that parent’s children as well.

@Entity
public class Person {

    ...
	
    public Person createChild(String firstName, String lastName, Person parent2) {
        Person child = new Person();
        child.setFirstName(firstName);
        child.setLastName(lastName);
        this.children.add(child);
        child.getParents().add(this);

        if (parent2 != null) {
            parent2.getChildren().add(child);
            child.getParents().add(parent2);
        }
        return child;
    }
}

Example 2: Add a new Sub-Category

In the category example, implementing an addSubCategory(String categoryName) or a moveCategory(Category newParent) method requires different association handling.

The implementation of the addSubCategory(String categoryName) method is straightforward.

@Entity
public class Category {

    ...
    
    public Category addSubCategory(String categoryName) {
        Category sub = new Category();
        sub.setName(categoryName);
        this.subCategories.add(sub);
        sub.setParentCategory(this);
        return sub;
    }
}

You instantiate a new Category object, add it to the subCategories Set of the Category object on which you called the method, and set that object as the parent of the newly created Category.

Moving a category requires an extra step that often gets forgotten.

@Entity
public class Category {

    ...
    
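    public void moveCategory(Category newParent) {
        // remove this category from its old parent; a root category has none
        if (this.getParentCategory() != null) {
            this.getParentCategory().getSubCategories().remove(this);
        }
        this.setParentCategory(newParent);
        newParent.getSubCategories().add(this);
    }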
}

It’s not enough to add the Category object to the subCategories Set of the new parent and update the parentCategory reference of the moved sub-category. You also need to remove the sub-category from the subCategories Set of its old parent Category.
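The moveCategory method above doesn’t validate its input yet. In a hierarchy, one validation is easy to forget: you must not move a category below itself or one of its own sub-categories, because that would create a cycle. The following sketch adds that check to a hypothetical CategoryNode class, a plain-Java stand-in for the Category entity without JPA annotations. It’s one possible approach, assuming the existing hierarchy is already free of cycles.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical plain-Java stand-in for the Category entity (no JPA annotations),
// used to show a validating version of moveCategory.
class CategoryNode {

    private final String name;
    private CategoryNode parentCategory;
    private final Set<CategoryNode> subCategories = new HashSet<>();

    CategoryNode(String name) {
        this.name = name;
    }

    String getName() { return name; }
    CategoryNode getParentCategory() { return parentCategory; }
    Set<CategoryNode> getSubCategories() { return subCategories; }

    CategoryNode addSubCategory(String categoryName) {
        CategoryNode sub = new CategoryNode(categoryName);
        subCategories.add(sub);
        sub.parentCategory = this;
        return sub;
    }

    void moveCategory(CategoryNode newParent) {
        if (newParent == null) {
            throw new IllegalArgumentException("newParent must not be null");
        }
        // Walk up from the new parent. Reaching this category means the move would
        // place it below itself or one of its own sub-categories, i.e., create a cycle.
        for (CategoryNode c = newParent; c != null; c = c.parentCategory) {
            if (c == this) {
                throw new IllegalArgumentException(
                        "Cannot move " + name + " below itself or one of its sub-categories");
            }
        }
        // Update all 3 objects: old parent (if any), this category, new parent.
        if (parentCategory != null) {
            parentCategory.subCategories.remove(this);
        }
        parentCategory = newParent;
        newParent.subCategories.add(this);
    }
}
```

If you adapt this to the real entity, keep in mind that walking up from newParent touches every ancestor of the new parent, and with FetchType.LAZY that can trigger additional queries. Comparing ids instead of object identity is also safer when Hibernate proxies are involved.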

Pitfall 2: Fetching behavior

Choosing the wrong FetchType for your associations is always a huge performance risk. That’s especially the case for self-referencing associations. Picking the wrong FetchType affects not only a single association in your domain model, but also your entire entity hierarchy.

You should always use FetchType.LAZY for all of your associations. This is the default for all to-many associations, but you need to declare it on all of your to-one associations because they default to FetchType.EAGER.

Example: Default FetchType of a to-one association

Let’s use the category example to see the effect of a missing FetchType declaration on a to-one association. The following mapping doesn’t specify a FetchType for the parentCategory association, so it uses the default FetchType.EAGER.

@Entity
public class Category {
    
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String name;

    @ManyToOne
    private Category parentCategory;

    @OneToMany(mappedBy = "parentCategory")
    private Set<Category> subCategories = new HashSet<>();
	
    ...
}

If you load this entity from the database, Hibernate has to initialize the association to the parent Category. When it instantiates the Category object for the parent, it also has to initialize that object’s parent association. This process continues until it reaches the top of the hierarchy, where the parentCategory attribute is null.

Category cat = em.find(Category.class, 103L);

Depending on how you load the Category entity, Hibernate might optimize this process and join each Category with its immediate parent in 1 query, as the following log output shows.

16:37:56,607 DEBUG [org.hibernate.SQL] - 
    select
        category0_.id as id1_0_0_,
        category0_.name as name2_0_0_,
        category0_.parentCategory_id as parentca3_0_0_,
        category1_.id as id1_0_1_,
        category1_.name as name2_0_1_,
        category1_.parentCategory_id as parentca3_0_1_ 
    from
        Category category0_ 
    left outer join
        Category category1_ 
            on category0_.parentCategory_id=category1_.id 
    where
        category0_.id=?
16:37:56,627 DEBUG [org.hibernate.SQL] - 
    select
        category0_.id as id1_0_0_,
        category0_.name as name2_0_0_,
        category0_.parentCategory_id as parentca3_0_0_,
        category1_.id as id1_0_1_,
        category1_.name as name2_0_1_,
        category1_.parentCategory_id as parentca3_0_1_ 
    from
        Category category0_ 
    left outer join
        Category category1_ 
            on category0_.parentCategory_id=category1_.id 
    where
        category0_.id=?

But this still forces Hibernate to execute multiple queries to fetch all levels of the hierarchy. So, double-check all your association mappings and make sure you’re not using eager fetching.
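The recursion that eager fetching triggers is easy to underestimate. The following purely illustrative model doesn’t use Hibernate at all: a made-up Map stands in for the Category table, the hypothetical loadEager method mimics FetchType.EAGER by initializing the parent of every loaded Category, and loadLazy mimics FetchType.LAZY. It counts initialized entities, not SQL queries, because the number of queries depends on how Hibernate combines the joins.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of to-one fetching behavior; it does not use Hibernate.
class FetchSimulation {

    // Made-up Category rows: id -> parentCategory_id (null marks a root category).
    private final Map<Long, Long> parentIds;
    private int loadedEntities = 0;

    FetchSimulation(Map<Long, Long> parentIds) {
        this.parentIds = parentIds;
    }

    // Mimics FetchType.EAGER: initializing a Category also initializes its parent,
    // which initializes its parent, and so on up to the root.
    void loadEager(long id) {
        requireRow(id);
        loadedEntities++;
        Long parentId = parentIds.get(id);
        if (parentId != null) {
            loadEager(parentId);
        }
    }

    // Mimics FetchType.LAZY: only the requested Category is initialized;
    // its parent stays uninitialized until you access it.
    void loadLazy(long id) {
        requireRow(id);
        loadedEntities++;
    }

    int loadedEntities() {
        return loadedEntities;
    }

    private void requireRow(long id) {
        if (!parentIds.containsKey(id)) {
            throw new IllegalArgumentException("No Category with id " + id);
        }
    }

    public static void main(String[] args) {
        Map<Long, Long> rows = new HashMap<>();
        rows.put(100L, null);   // root
        rows.put(101L, 100L);
        rows.put(102L, 101L);
        rows.put(103L, 102L);   // leaf

        FetchSimulation eager = new FetchSimulation(rows);
        eager.loadEager(103L);
        System.out.println(eager.loadedEntities()); // prints 4

        FetchSimulation lazy = new FetchSimulation(rows);
        lazy.loadLazy(103L);
        System.out.println(lazy.loadedEntities());  // prints 1
    }
}
```

With eager fetching, the cost of loading a single Category grows with its depth in the hierarchy; with lazy fetching, it stays constant.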

Pitfall 3: Performing Queries on Your Entity Hierarchy

You can easily define a query that joins over 2-3 levels of your hierarchy. There is no difference between a join over a self-referencing and any other association. As long as you make sure to provide an index for your foreign key column, your database usually handles these queries easily.

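// LEFT JOINs keep categories with fewer than 2 ancestors in the result
TypedQuery<Category> q = em.createQuery(
	"SELECT c "
	+"FROM Category c "
		+"LEFT JOIN c.parentCategory p "
		+"LEFT JOIN p.parentCategory pp "
	+"WHERE c.name = :name "
		+"OR p.name = :name "
		+"OR pp.name = :name ",
	Category.class);
q.setParameter("name", "myCategory");
// the query can match several categories, so getSingleResult() could throw a NonUniqueResultException
List<Category> categories = q.getResultList();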

But for quite a few applications, this isn’t good enough. You often need to search all levels of your hierarchy, and depending on the application’s domain, that can be fewer than 5 or several hundred levels. If your application requires JOINs over a huge number of hierarchy levels, you might consider using a graph database. Graph databases are specifically designed for these kinds of use cases; relational databases and SQL are not well suited for them.
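A JPQL query can only join over a fixed number of levels. To see why that doesn’t scale, here’s a hypothetical helper, ancestorNameQuery, that builds a variant of the earlier query for any number of ancestor levels. It uses LEFT JOINs so that categories with fewer ancestors still match, and the aliases p1, p2, … are an arbitrary naming choice. Every additional level adds one join and one OR condition to the statement.

```java
// Hypothetical helper that generates the JPQL for an n-level ancestor search.
class HierarchyQueryBuilder {

    // Matches a Category if its own name or the name of one of its first
    // 'levels' ancestors equals :name. Each level adds one LEFT JOIN.
    static String ancestorNameQuery(int levels) {
        if (levels < 0) {
            throw new IllegalArgumentException("levels must not be negative");
        }
        StringBuilder from = new StringBuilder("SELECT c FROM Category c");
        StringBuilder where = new StringBuilder(" WHERE c.name = :name");
        String previous = "c";
        for (int i = 1; i <= levels; i++) {
            String alias = "p" + i;
            from.append(" LEFT JOIN ").append(previous).append(".parentCategory ").append(alias);
            where.append(" OR ").append(alias).append(".name = :name");
            previous = alias;
        }
        return from.append(where.toString()).toString();
    }

    public static void main(String[] args) {
        // prints the 2-level variant with two LEFT JOINs
        System.out.println(ancestorNameQuery(2));
    }
}
```

For a hierarchy with several hundred levels, the generated statement contains several hundred self-joins, which is exactly the kind of query relational databases struggle with.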

Initializing Self-Referencing Associations

JOIN FETCH clauses and EntityGraphs enable you to avoid n+1 select issues and to initialize your associations efficiently. This also works for self-referencing associations.

// LEFT JOIN FETCH also returns categories that have fewer than 2 ancestors
TypedQuery<Category> q = em.createQuery(
	"SELECT c "
	+"FROM Category c "
		+"LEFT JOIN FETCH c.parentCategory p "
		+"LEFT JOIN FETCH p.parentCategory "
	+"WHERE c.id = :id",
	Category.class);
q.setParameter("id", 103L);
Category cat = q.getSingleResult();

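But you should be aware of an issue that I see very often in my consulting projects: If you use multiple JOIN FETCH clauses for to-many associations or complex EntityGraphs, your SQL query returns a huge Cartesian product. Its result set contains one row for every combination of the associated records, and processing it often slows down your application.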

A lot of developers are aware of that but ignore it for self-referencing associations. That’s dangerous because you will experience the same performance issues as with any query that join fetches multiple associations. To fix that problem, you need to split your query into multiple ones, as I explained in a previous article.
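A little arithmetic shows how fast the product grows. Assuming a query like SELECT p FROM Person p LEFT JOIN FETCH p.parents LEFT JOIN FETCH p.children, the database combines every parent of a Person with every child of that Person. The following sketch, using made-up numbers and hypothetical method names, compares the rows such a query returns with the number of associated Person records you actually need to initialize both collections.

```java
// Rough row-count arithmetic for join fetching two independent to-many
// associations of the same entity, e.g., Person.parents and Person.children.
class CartesianProductEstimate {

    // Rows the LEFT JOINs produce for one Person: every parent is combined with
    // every child. An empty collection still contributes one row of NULLs.
    static long rowsFor(int parents, int children) {
        return (long) Math.max(1, parents) * Math.max(1, children);
    }

    // Associated Person records needed to initialize both collections.
    static long associatedEntities(int parents, int children) {
        return (long) parents + children;
    }

    public static void main(String[] args) {
        // Made-up example: a Person with 2 parents and 10 children.
        System.out.println(rowsFor(2, 10));            // prints 20
        System.out.println(associatedEntities(2, 10)); // prints 12
    }
}
```

The number of rows grows with the product of the collection sizes, while the data you need only grows with their sum. Each additional to-many JOIN FETCH multiplies the result again, which is why splitting the query helps.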

Conclusion

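There is not a huge difference between a self-referencing association and an association that references another entity class. But you should pay special attention to:

- managing both ends of bidirectional associations, ideally with utility methods that also validate the operation,
- the FetchType of your to-one associations, because FetchType.EAGER can force Hibernate to fetch your entire hierarchy,
- queries that join over many levels of your hierarchy, and
- the Cartesian product created by multiple JOIN FETCH clauses or complex EntityGraphs.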
