Database portability – Pitfalls when supporting multiple RDBMS with Hibernate


Take your skills to the next level!

The Persistence Hub is the place to be for every Java developer. It gives you access to all my premium video courses, monthly Java Persistence News, monthly coding problems, and regular expert sessions.


The JPA specification and Hibernate, as its most popular implementation, claim to provide database portability. That means you don’t need to change your persistence code or mapping annotations when connecting your application to a different RDBMS. They achieve that by providing an automated object-relational mapping, an API that abstracts from the executed SQL statements, and their own query language.

Based on this, developers often think they only need to change their JDBC driver and database connection to switch to a different RDBMS. But unfortunately, that’s usually not the case.

But JPA’s and Hibernate’s claim also isn’t entirely wrong. The provided abstractions hide most database-specific things and get you extremely close to a portable persistence layer. But there are a few pitfalls that often cause problems when you connect your application to a different RDBMS for the first time.

In this article, I will show you how Hibernate achieves database portability, which features usually cause problems, and how you can solve them.

How Hibernate achieves database portability

One of the main challenges when providing database portability is to handle the small differences between most RDBMS’s SQL dialects and supported features. Hibernate solves this by offering a huge set of different Dialect implementations.

The best part about this is that Hibernate can automatically pick and use a Dialect. But it’s nevertheless good to at least understand the basic concept. So, let’s take a quick look at it.

Hibernate’s database Dialects

Each Dialect implementation provides the required information to adjust the mapping of your entity classes and the generated SQL statements to a specific database. It defines things like:

  • the name of the column type to which your entity’s attribute type, e.g., Integer, gets mapped,
  • the name, parameters, and return type of the supported database functions,
  • how the database can generate primary key values,
  • the support for LIMIT and OFFSET clauses in the generated query statements,
  • the support for timeouts and database locks
  • and much more.

You can find Dialect implementations for almost all commonly used databases in the org.hibernate.dialect package.

Since Hibernate 6, each Dialect implementation handles all available versions of the corresponding database. Older versions often used separate Dialect implementations for different versions of the same RDBMS, e.g., PostgreSQL95Dialect, PostgreSQL94Dialect, and PostgreSQL93Dialect. Each of these version-specific Dialects extended the implementation of the previous version and added support for new column types, functions, and other things that were added in the corresponding release of the RDBMS.

Configuring or resolving the Dialect

Hibernate can pick the correct Dialect implementation automatically during startup. It does that by iterating through all DialectResolver implementations available on the classpath and calling their resolveDialect with the metadata information of the connected database. If a resolver knows how to handle this information, it instantiates a Dialect implementation.

These DialectResolver implementations are available for all of Hibernate’s standard Dialects. If you implement a custom Dialect, you either also need to provide an implementation of the DialectResolver interface or explicitly configure the Dialect implementation you want to use.

If the Dialect resolution fails or you want to override it, you need to configure the hibernate.dialect property in your persistence.xml configuration.

<persistence>
    <persistence-unit name="my-persistence-unit">
		...
        <properties>
 			<property name="hibernate.dialect" value="org.hibernate.dialect.PostgreSQLDialect" />

			<property name="jakarta.persistence.jdbc.driver" value="org.postgresql.Driver" />
			<property name="jakarta.persistence.jdbc.url" value="jdbc:postgresql://localhost:5432/myDb" />
			<property name="jakarta.persistence.jdbc.user" value="postgres" />
			<property name="jakarta.persistence.jdbc.password" value="postgres" />
       </properties>
    </persistence-unit>
</persistence>

Features that often break database portability

As I explained in the previous section, Hibernate’s Dialect implementations handle most differences between the various RDBMS. But there are a few features that often cause problems. That’s usually either because some databases don’t support a specific feature or because you use a Hibernate feature that doesn’t abstract from the underlying SQL statement.

Problem 1: Generating primary key values

You probably already know that you can use autoincremented database columns or database sequences to let your database generate unique primary key values. You can configure your preferred option by annotating the primary key attribute with a @GeneratedValue annotation and providing the GenerationType enum value for your preferred sequence.

@Entity
public class Author {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "author_seq")
    @SequenceGenerator(name = "author_seq", sequenceName = "author_seq")
	private Long id;
	
	...
}

For performance reasons, you should use GenerationType.SEQUENCE. It tells Hibernate to use a database sequence to generate unique primary key values. The main benefit of this approach is that it enables Hibernate to generate the primary key value without executing an SQL INSERT statement. And thanks to this separation, Hibernate can perform additional performance optimizations. I explain this in more detail in the Hibernate Performance Tuning course in the Persistence Hub.

Unfortunately, MySQL databases don’t support sequences. You need to use GenerationType.IDENTITY instead.

If you only need to support MySQL databases, you can annotate your primary key attribute with @GeneratedValue(strategy = GenerationType.IDENTITY) instead. But it gets a little more complicated if your application supports multiple RDBMS and you want to use GenerationType.SEQUENCE for some or most of them.

In that case, many developers decide to use GenerationType.AUTO and let Hibernate pick the strategy. But that causes severe performance problems. If your database doesn’t support sequences, GenerationType.AUTO uses the very inefficient GenerationType.TABLE and not as many developers expect to GenerationType.IDENTITY. So, you should better avoid it and use the following solution.

Solution: Override the generation strategy for primary keys

The best way to use the most efficient generation strategy for your primary key values for each database is to define a default strategy and override it when necessary.

I usually choose GenerationType.SEQUENCE as my default strategy. I annotate every primary key attribute of my entity classes with @GeneratedValue(GenerationType.SEQUENCE).

@Entity
public class Author {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "author_seq")
    @SequenceGenerator(name = "author_seq", sequenceName = "author_seq")
	private Long id;
	
	...
}

Then I create an orm.xml mapping file and put it next to my persistence.xml file. Hibernate will pick it up automatically.

<entity-mappings>
	<entity class="com.thorben.janssen.model.Author" name="Author">
		<attributes>
			<id name="id">
				<generated-value strategy="IDENTITY"/>
			</id>
		</attributes>
	</entity>
	...
</entity-mappings>

Hibernate will merge the mapping definitions defined via annotations and in the orm.xml file during the deployment. And if there is a conflict between these 2 definitions, it uses the mapping defined in the orm.xml file. You can learn more about all of this in my article Mapping Definitions in JPA and Hibernate – Annotations, XML or both?

Using this approach, you can override the generation strategy defined by your @GeneratedValue annotation and set it to GenerationType.IDENTITY.

Problem 2: Native queries

Native queries are another feature that often breaks database portability. And the reason for that is simple. Your JPA implementation, e.g., Hibernate, doesn’t parse and transform the SQL statement you provide when creating the native query. It simply executes it.

Author a = (Author) em.createNativeQuery("select a.id, a.firstName, a.lastName, a.version from Author a WHERE a.id = :id", Author.class)
					  .setParameter("id", author.getId())
					  .getSingleResult();

So, it’s up to you to provide an SQL statement that works on all databases your application needs to support. That’s often harder than it might sound. There are 3 common solutions to this problem.

Solution 1: Don’t use native queries

The easiest way to avoid this problem is to avoid native queries in general. As long as you only use JPQL queries or the Criteria API, Hibernate generates the SQL statement and uses a Dialect implementation to adjust it to the RDBMS.

Avoiding native queries might be the easiest approach, but it’s not very practical. Most applications require at least a few queries too complex for JPQL or the Criteria API. In those cases, you can try one of the following solutions.

Solution 2: Provide different sets of SQL statements

Ad-hoc native queries make it relatively easy to provide different SQL statements for each RDBMS. You only need to define multiple versions of a class that contains a String constant for every statement.

public class NativeQueryStrings {
    
    public static String AUTHOR__FIND_BY_ID = "select a.id, a.firstName, a.lastName, a.version from Author a WHERE a.id = :id";

    public static String BOOK__FIND_BY_ID = "select b.* from Book b WHERE b.id = :id";
}

When you create multiple versions of the NativeQueryStrings class, please remember that you will not need to adjust every statement. Especially the simpler ones might work on every RDBMS you need to support.

In that case, you can decide not to extract the query string into the NativeQueryStrings class. Or you create a class containing each query’s default version and extend that class to create a database-specific version.

In your persistence code, you then use these constants to create an ad-hoc native query.

Author a = (Author) em.createNativeQuery(NativeQueryStrings.AUTHOR__FIND_BY_ID, Author.class)
					  .setParameter("id", author.getId())
					  .getSingleResult();

When you package your application, you can pick the NativeQueryStrings class version that matches the RDBMS that you will use in production and include only that one in your deployment.

Depending on your application’s stack, you might also be able to include all of those classes in your application and configure which one you want to use at runtime. For example, if you’re using CDI, you can do this by using a feature called Alternatives.

Solution 3: Use @NamedNativeQuery and override the statement

If you don’t want to manage multiple classes that define your ad-hoc queries, you can define each native query as a @NamedNativeQuery.

@Entity
@NamedNativeQuery(name = "Author.selectByIdNative", query = "select a.id, a.firstName, a.lastName, a.version from Author a WHERE a.id = :id", resultClass = Author.class)
public class Author { ... }

After you define the named query, you can reference it by its name and instantiate it in your persistence code.

Author a = em.createNamedQuery("Author.selectByIdNative", Author.class)
             .setParameter("id", a.getId())
             .getSingleResult();

Similar to the previously discussed generation strategies for primary keys, you can override the statement of a @NamedNativeQuery in the orm.xml mapping file. That enables you to change the query’s statement without changing your persistence code.

<entity-mappings>
	<entity class="com.thorben.janssen.model.Author" name="Author">
		...
		<named-native-query name="Author.selectByIdNative"
			result-class="com.thorben.janssen.model.Author">
			<query><![CDATA[
                    select a.* 
                    from Author a 
                    WHERE a.id = :id
                ]]></query>
		</named-native-query>
	</entity>
</entity-mappings>

Problem 3: Custom functions or stored procedures

Even if you don’t use native queries, custom functions and stored procedures often cause problems if you need to support multiple RDBMS. But this time, it’s not a JPA or Hibernate problem.

The problem is the missing standardization of stored procedures and how you define database functions. That makes it often impossible to use the same code to deploy a stored procedure or database function on different RDBMS.

You only have 2 options to avoid this problem. You either provide a database-specific definition of every custom function and stored procedure or you avoid using these database features entirely.

Conclusion

RDBMS not only differ in the features and performance they provide. They also differ in the supported column types, the expected structure of the SQL statements, and the supported database functions.

Hibernate tries to handle these differences automatically. It uses a database-specific Dialect implementation that provides all the required information to adjust the entity mappings and generated SQL statements.

But if you’re using an API that doesn’t provide an abstraction of the executed query statements or if you use a feature that’s not supported by all databases, Hibernate can’t handle those problems for you. Typical examples are the generation of unique primary key values, native queries, and database functions. 

In those cases, you need to decide if you can avoid a specific feature entirely, e.g., only use JPQL or Criteria queries. Or you need to provide database-specific versions yourself. In that case, you can create an orm.xml file in which you adjust the mapping information you provided as annotations.