Opinionated JPA with Querydsl
Richard "Virgo" Richter

Preface

This book is about my experience with object-relational mapping in general and the Java Persistence API (JPA) in particular. It’s also about Querydsl, because ever since this library was recommended to me I have been completely in love with it. It has never let me down and sometimes it generated a feeling of sheer happiness – which is not that common in our programming profession.

Be warned – this book is, as the title suggests, very opinionated. Sometimes you may ask: “Why the hell does this guy still use JPA?!” It is not a typical tutorial book covering the technology systematically – I cherry-pick instead. It’s not a typical best practices book either. We use ORM technology but not quite as intended. We even left the safe waters of the JPA specification – although originally unintentionally, because the reference implementation let us.

We decided to take control over the SQL statements our system produces. You can’t rely much on lazy or eager fetch types, because often (as the specification says) these are mere hints, not real promises. Even with a rather SQL-like approach, JPA still provides big benefits over plain JDBC. We tuned down the ORM part of JPA a little bit and mapped our entities to resemble “raw” tables more. You may feel like shouting “that’s all wrong!” – which is OK. Let me answer the following question first…

Is this book for you?

This book is for anyone who actively uses JPA – preferably in version 2.1 (Java EE 7) – and is not completely happy with it. Its main focus is on the painful points related to relationship mapping, where benefits mix with unavoidable drawbacks. Even if you’re happy with JPA as it is, the book can still give you more options for solving your problems.

Some of the considerations are related to ORM in general and may be applicable to non-JPA ORM providers, but I can’t guarantee that. Some parts of the book are more generic and based on common sense (like “avoid N+1 select”), others are strictly JPA related (like how to achieve modularity). Some are probably complete answers, recipes even, others are open topics that only raise further questions. Some are definitely a bit unorthodox from a typical JPA perspective.

Last, but not least, I’ll use Querydsl throughout the book – and this library is still not covered in books as much as it deserves, I believe.

Not in the book

This is not an introductory book about JPA, see the next section about other JPA related books. I expect you know ORM and JPA at least superficially, you work with them for whatever reason (or will, or want to) and you are interested in something beyond reference material. It does not mean it is a difficult book though, as I always try to explain the problem I talk about.

This is not a book about architecture, at least not primarily, although patterns for using JPA are somewhere in the area. I take a kind of “Fowlerish” liberty of mixing design and architecture and blurring the line between them. I admit I have not yet worked on a project where thinking architecturally big would pay off. Instead I have seen architecture on other projects where the effort was wasted for various reasons. Some of them are easy to guess if you’re interested in agile or lean software development. That said, I’ll talk about some design considerations, for instance about the contract between the presentation and session layers.

This book does not cover any of these topics:

  • JPA support for native SQL – I don’t use it massively and I have nothing much to say about it.
  • Calling stored procedures – it works as expected.
  • Criteria API – because I prefer Querydsl instead.
  • Locking, concurrency, validation, transactions and other features not affected by the opinionated approach.
  • High performance – we mention performance, and the advice should not affect it negatively, but I don’t focus on high-performance tricks in particular.

We may mention some of them here and there, but there will be little focus on these. Most of them are covered elsewhere anyway. Let’s talk about some of the books available in this segment.

Other JPA books

JPA came with EJB 3, actually first as a part of JSR 220 “Enterprise JavaBeans 3.0”, but it soon proved to be a worthy API on its own because it is not tied to EJB in particular. In those days I’d have recommended Pro EJB 3: Java Persistence API, but I think the book is now dead (and yes, in 2016 it celebrated its 10th anniversary).

I can highly recommend Pro JPA 2 in its 2nd Edition [PJPA2], which is updated to JPA 2.1 (in line with Java EE 7). There is hardly anything I haven’t found in this one. Sure, it’s closing in on 500 pages, but it’s structured well. It has minimal overlap with the rest of Java EE in Chapter 3: Enterprise Applications, but even that may be useful. This is currently my go-to book when I need to refresh some JPA knowledge.

Then there is the Spring Data book [SData]. For the sake of confusion there are actually two books of this name – one published by O’Reilly and the other one by Packt, both published within a month late in 2012. If you are Spring positive, I can only recommend reading at least the first 50 pages of the one by O’Reilly (up to and including chapter 4), even if you don’t use Spring Data after all. It’s a nice quick overview of JPA and Querydsl with examples of Spring configuration both in XML and Java, and they even mention somewhat advanced things like orphanRemoval (although rather in passing). It’s definitely a little time well spent.

I don’t know of any other relevant books, but there is one more obvious resource – the Java Persistence 2.1 Specification itself [JPspec]. There are things you hardly find anywhere else, especially when it comes to corner cases like “use of DISTINCT with COUNT is not supported for arguments of embeddable types”. So when you encounter a problem it is good to search through the specification for keywords. Often the JPA provider works as specified, even when it does not make sense to us.

There are tons of blog posts, of course, some of them are very influential. I will mention the most important ones in chapter Love and hate for ORM.

Books on performance of the database and persistence layer

Any book about persistence gives at least some elementary advice on performance. You definitely should consult the documentation of your provider. I can also recommend two books on Java performance in general – both named quite plainly: Java Performance (2011) [JP] and Java Performance: The Definitive Guide (2014) [JPDG]. The first one has a final chapter named Java Persistence and Enterprise Java Beans Performance, the second one deals with the topic in its penultimate chapter Database Performance Best Practices. Finally, there is also a very recent book that covers performance of Java persistence specifically – High Performance Java Persistence [HPJP].

Looking back to Patterns of Enterprise Application Architecture

Martin Fowler wrote his Patterns of Enterprise Application Architecture [PoEAA] in 2002, trying to capture various patterns that were recurring in enterprise applications, not necessarily written in Java. Many patterns in the book cover the data access layer and many of those are clearly ORM related – starting with Data Mapper, going through Unit of Work or Lazy Load, to many more related to concrete mapping problems.

Reading material this old also gives you an idea how well the patterns stand the test of time. In 2002 Java was 7 years old, now it’s over 20. There were no annotations, and dependency injection was known in theory, but there was no widespread container supporting it (mind you, Spring 1.0 was released in 2004). Years later we still used various SessionHelper static classes to work with Hibernate Session in a thread-local manner, often forgetting to close the session because we were not able to cleanly put open and close into a single method with a try/finally block. That was before we read books like Clean Code… probably because it was before they were written.1

Closing his chapter Mapping to Relational Databases Martin Fowler wrote:

Today there are more books related to ORM, even if you narrow it down to Java world. My attempt tries to step away from many ORM concepts and go back to SQL roots to a degree. That does not change the fact that Fowler’s work organizing and describing all these (still applicable) patterns is remarkable.

Structure of the book

In the first part of the book we’ll quickly go through JPA highs and lows (all highly subjective, you know the disclaimer drill) with a bit of historical context as well. This part should prepare a backdrop for any further arguments.

In the second part of the book we will get opinionated. After an introduction to my case we will make a detour to meet Querydsl, because I’m not willing to go on without it. Afterwards we’ll see how to work around the features marked as questionable (where possible).

The third part talks about more common problems you encounter when working with JPA (like pagination and the N+1 select) and offers solutions for them.

Finally, appendices will provide short references for related technologies I used throughout the book (e.g. H2 database) and other additional material.

Using code examples

The book is written on GitHub and all the code is in the GitHub repo for the book.

You may download the whole repository or browse it online; the most important snippets of code are presented in the book. Because I present various JPA models and setups, the examples are split into many small projects, each with its own Maven pom.xml. Entity classes and other “production” code are in src/main/java (or resources) while demo code is in src/test/java. If you want to run it in an IDE, I recommend those that can create multiple run configurations and are able to run normal classes (with main) from the test sources as well – I suspect some prominent IDEs don’t support this. But you can always run the demo code using Maven like this (from the project directory, e.g. examples/querydsl-basic):

mvn test-compile exec:java -Dexec.mainClass="tests.DogQueryDemo"

There is also a neat way to download a subtree of a GitHub project using – surprisingly – SVN export. This makes running any example on the command line a breeze:

svn export \
  https://github.com/virgo47/opinionatedjpawithquerydsl/trunk/examples/querydsl-basic
cd querydsl-basic
mvn test-compile exec:java -Dexec.mainClass="tests.DogQueryDemo"

Example classes mentioned in the book also link directly to the sources on GitHub (unless you print the e-book, of course).

My writing background

I’ve written more than a few posts on my blog, mostly on technical topics, a couple of them about JPA. Actually, this book is partially based on those frustrating experiences. I always felt like a writer – originally just to share, although not everything is worth sharing, I know. Later I realized that writing helps me clarify a lot of concepts to myself. For the same reasons (sharing and learning) I gave courses and talks. Preparing those is also a kind of writing. So I hope that neither I nor this book is a lost cause.

Feedback

When you work on a technical book, especially alone, it happens that some of the information is not exactly right and some might be plain wrong. Obviously, that is not my intention – on the contrary, I try hard to avoid any confusion or leading anyone into a mistake. However, it may happen because the topic is not trivial. I study books and online resources, I test my solutions, I try to understand what is going on, but sometimes I may just understand something wrong. Sorry for that and please, if you know better, let me know.

Obviously, factual errors are probably most crucial, but other suggestions are welcome too.

I JPA Good Times, Bad Times

The biggest trouble with JPA (and ORM in general) is that it is hard. Harder than people realize. The topic is complex, yet the solutions somehow make it look easier than it really is. JPA is part of Java EE and Spring framework supports it very well too, so we simply can’t go wrong with it. Or can we? Various problems – especially complexity – are inherent to ORM solutions in general. But there is also no widespread alternative. JDBC? Too low level, although we may use some frameworks above it (like Spring’s JdbcTemplate). Anything else? Not widespread.

JPA/ORM also somehow makes people believe they don’t need to understand SQL. Or JPQL (or any QL) for that matter – and it shows (pain)fully both in the code and at runtime, typically in the performance aspects.

Technology like this deserves to be studied more deeply than it usually is. It cannot be just slapped on the application. Yet that is what I mostly see and that is what all those false implied promises deliver. JPA is now one of many embodiments of “EE” – or enterprise software. Applications are bigger and more complex than they need to be. And slower, of course. Developers map the entities but they don’t follow through, don’t investigate, and let entropy do the rest of the work instead.

JPA is not perfect and I could name a couple of things I’d love to see there. Many of them may not be right for the JPA philosophy, others would probably help tremendously, but in any case it is our responsibility to do our best to learn about such a key technology – that is, if we use it. It reads and writes data from/to our databases, which is a crucial thing for most enterprise applications. So let’s talk about the limits, let’s talk about implications, about lazy loads and their consequences, etc. Don’t “just fix it with OSIV” (Open Session in View).

In the following chapters we will talk about what is really good in JPA, what is missing and what is limiting or questionable. I tried to avoid “plain bad” for these parts because that would be an unfairly harsh statement and some of these features are… well, features. Just questionable on many occasions, that’s all. Because the questionable portion is naturally a more fertile topic, I let it overflow into separate chapters about caching and how people often feel about JPA.

1. Good Parts

First we will look at all the good things JPA offers to us. Why did I start to prefer JPA over a particular provider? Mind that this part is just an extended enumeration of various features, not a tutorial material.

The good parts are not absolute though. If we compare JPA as a standard with a concrete ORM we’ll get the standardization, but there are related drawbacks. Comparing JPA with SQL is not quite appropriate as JPA covers a much bigger stack. We may compare JPA with JDBC (read on for the good parts), or we may compare the Java Persistence Query Language (JPQL) with SQL to a degree.

JPA is standard

When we started with ORM, we picked Hibernate. There were not many other ORM options for Java, or at least they were not both high-profile and free. Yet even Hibernate creators are now glad when their tool is used via JPA interfaces:

Standard is not always necessarily good, but in case of JPA it is well established and respected. Now we can choose between multiple JPA 2.1 implementations or just deploy to any Java EE 7 application server. I’d not bet on the promise that we can easily swap one provider (or application server) for the other – my experiences are terrible in this aspect – but there is a common set of annotations for mapping our entities and the same core classes like EntityManager.

But standards also have their drawbacks, most typically that they are some common subset of the features offered in that area. JPA’s success is probably based on the fact that it emerged as a rock-solid subset of the features offered by major players in the Java ORM world. But starting this way it must have trailed behind them. JPA 1 did not offer any criteria API, can you imagine that? Now JPA 2.1 is somewhere else. The features cover nearly anything we need for typical applications talking to relational databases.

Even if something is missing we may combine it with proprietary annotations of a particular ORM provider – but only to a necessary degree. And perhaps we can switch those to JPA in its next version. In any case, JPA providers allow us to spice up JPA soup with their specifics easily. That way JPA does not restrain us.

Database independence

Let’s face it – it’s not possible to easily switch from one database to another when we have invested in one – especially if we use stored procedures or anything else beyond simple plain SQL. But JPA is not the problem here, on the contrary. It will require far fewer (if any) changes and the application will keep working.

One of the best examples of this is pagination. There has been a standard way to do it since SQL:2008 using the FETCH FIRST clause (along with OFFSET and ORDER BY) but only newer database versions support this. Before that, pagination required anything from a LIMIT/OFFSET combo (e.g. PostgreSQL) to an outer select (for Oracle before 12c). JPA makes this problem go away as it uses a database-specific dialect to generate the SQL. This portion of the database idiosyncrasies is really something we can live without and JPA delivers.

Hence, if we start another project and go for a different database our application programmers using JPA probably don’t have to learn specifics of its SQL. This cuts both ways though. I’ve heard that “you don’t have to learn SQL” (meaning “at all”) is considered a benefit of the JPA, but every time programmers didn’t understand SQL they didn’t produce good results.

Natural SQL-Java type mapping

JPA can persist anything that is so called persistable type. These are:

  • entity classes (annotated with @Entity) – these loosely match database tables;
  • mapped superclasses (@MappedSuperclass) – used when we map class hierarchies;
  • embeddable classes (@Embeddable) – used to group multiple columns into a value object;

These were all types on the class level, more or less matching whole database tables or at least groups of columns. Now we will move to the field level:

  • simple Java types (optionally annotated with @Basic) like primitive types, their wrappers, String, BigInteger and BigDecimal;
  • temporal types like java.util.Date/Calendar in addition to types from the java.sql package;
  • collections – these might be relations to other entities or embeddable types;
  • enum types – here we can choose whether it’s persisted by its name() or by ordinal().

Concepts in the first list are not covered by JDBC at all – these are the most prominent features of ORM. But we can compare JDBC and JPA in how easy it is to get the value of any column as our preferred type. Here JDBC locks us to low-level types without any good conversion capabilities, not even for omnipresent types like java.util.Date. In JPA we can declare a field as java.util.Date and just say that the column represents a time, date or timestamp using the @Temporal annotation. I feel no need to use the java.sql package anymore.
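
To illustrate, a minimal mapping sketch – Dog is the entity used throughout the book, while the born attribute is my own illustrative addition:

@Entity
public class Dog {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  private String name; // @Basic is implied for simple types

  @Temporal(TemporalType.DATE) // the column holds a date without time
  private java.util.Date born;

  // get/setters omitted
}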

Enums are also much easier, although this applies only to simple cases, not to mention that for evolving enums @Enumerated(EnumType.ORDINAL) is not an option (it should not be an option at all, actually). More in the chapter Mapping Java enum.

The bottom line is that mapping from SQL types to Java field types is much more natural with JPA than with JDBC. And we haven’t even mentioned two big guns JPA offers – custom type converters and large object support.

Convenient type conversion

Up to JPA 2.0 we could not define custom conversions, but JPA 2.1 changed this. Now we can simply annotate our field with @Convert and implement AttributeConverter<X, Y> with two very obvious methods:

  • Y convertToDatabaseColumn(X attribute)
  • X convertToEntityAttribute(Y dbData)

And we’re done! Of course, don’t forget to add the annotation @Converter on the converter itself, just like we annotate other classes with @Entity or @Embeddable. Even better, we can declare the converter as @Converter(autoApply = true) and do without @Convert on the field. This is extremely handy for java.time types from Java 8, because JPA 2.1 does not support them (it was released before Java SE 8, remember).
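
To sketch it out for Java 8’s LocalDate (the converter class is mine, the mechanism is from the specification):

import java.time.LocalDate;
import javax.persistence.AttributeConverter;
import javax.persistence.Converter;

@Converter(autoApply = true) // applies to every LocalDate attribute
public class LocalDateConverter
  implements AttributeConverter<LocalDate, java.sql.Date> {

  @Override
  public java.sql.Date convertToDatabaseColumn(LocalDate attribute) {
    return attribute != null ? java.sql.Date.valueOf(attribute) : null;
  }

  @Override
  public LocalDate convertToEntityAttribute(java.sql.Date dbData) {
    return dbData != null ? dbData.toLocalDate() : null;
  }
}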

Large objects

Compared with JDBC, JPA makes working with large objects (like the SQL CLOB and BLOB types) a breeze. Just annotate the field with @Lob and use a proper type, that is byte[] for BLOBs and String for CLOBs. We may also use Byte[] (why would we do that?) or Serializable for BLOBs and char[] or Character[] for CLOBs.
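
A mapping sketch (the photo and pedigree attributes are made up for illustration):

@Lob
private byte[] photo; // stored in a BLOB column

@Lob
private String pedigree; // stored in a CLOB column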

In theory we may annotate the @Lob field as @Basic(fetch=FetchType.LAZY), however this is a mere hint to the JPA provider and we can bet it will not be honored. More about lazy fetching on basic and to-one fields in a dedicated section.

Getting LOBs via JDBC from some databases may not be such a big deal, but if you’ve ever looped over LOB content using 4 KB buffer you will appreciate straightforward JPA mapping.

Flexible mapping

It had better be flexible, because flexible mapping is the most important reason for the whole ORM concept. Field-to-column mapping can be specified as annotations on the fields or on the getters – or in XML files. Annotation placement implies the access strategy, but it can be overridden using @Access on the class. In the book I’ll mostly use field access, but property access can be handy when we want to do something in get/setters. For instance, before JPA 2.1 we did simple conversions there: the field already held the converted value and we added extra get/setters annotated as @Transient, leaving the JPA-related accessors solely for JPA purposes.

Speaking of @Transient – sometimes we want additional fields in our JPA entities and this annotation clearly tells JPA not to touch them. There are arguments about how much or how little should be in a JPA entity; I personally prefer rather less than more – which on the other hand is criticized as an anemic domain model – and we will return to this topic shortly.

Unit of work

Unit of work is a pattern described in [PoEAA]. In Hibernate it was represented by its Session; in JPA the more descriptive (and arguably more EE-ish) name EntityManager was chosen. To make things just a little tougher on newcomers, the entity manager manages something called a persistence context – but we make no big mistake if we treat entity manager and persistence context as synonyms. Nowadays EntityManager is typically injected where needed, and combined with declarative transactions it is very natural to use. Funnily enough, it’s not injected with the standard @Inject but with @PersistenceContext.

To explain the pattern simply: Whatever we read or delete through it, or whatever new we persist with it, will be “flushed” at the end of a transaction and the relevant SQL queries will get executed. We get all of this from EntityManager, and while there is some ideal way to use it, it is flexible enough and can be “bent”. We can flush things earlier if we need to force some order of SQL statements, etc.
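
A minimal sketch of the pattern in action – notice there is no explicit “update” call, the flush during commit detects the change and generates the SQL:

em.getTransaction().begin();
Dog dog = em.find(Dog.class, 1); // SELECT executed, dog is now managed
dog.setName("Rexo"); // no SQL happens here
em.getTransaction().commit(); // flush: the UPDATE is generated and executed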

Because the unit of work remembers all the work related to it, it is sometimes called a “session cache” or “1st level cache”. This “cache” however does not survive the unit of work itself, and talking about a cache gives this pattern an additional – and confusing – meaning.

After all the years of experience with Session and EntityManager it was actually really refreshing to read about the unit of work pattern, to see what operations are typical for it, to have it explicitly stated and also to read Fowler’s speculation how it will be used and implemented – all written back in 2002, and most of it still valid!

Declarative transactions

JPA has basic transaction support, but when it comes to declarative transactions they are part of the Java Transaction API (JTA). JPA and JTA are so seamlessly integrated from the programmer’s point of view that we simply take @Transactional support for granted.

Programmatically we can get to the transaction using EntityManager’s getTransaction() method. This returns an EntityTransaction implementation that allows us to check whether a transaction is active or not, begin one, commit or rollback. This is all good for simple cases when using transaction-type RESOURCE_LOCAL in our persistence.xml – typical for SE deployment. But in the Java EE world (or when using Spring, even outside of an application server) we’ll probably use the @Transactional annotation for declarative transaction management, possibly joining a distributed transaction if necessary.
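
For the RESOURCE_LOCAL case, a sketch of the programmatic management:

EntityTransaction tx = em.getTransaction();
tx.begin();
try {
  // ... work with the entity manager ...
  tx.commit();
} catch (RuntimeException e) {
  if (tx.isActive()) {
    tx.rollback(); // don't leave the transaction hanging
  }
  throw e;
}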

We will not cover transaction management, there are better resources and plenty of blogs. I can recommend [PJPA2] which covers both @Transactional and @TransactionAttribute annotations, and much more. A bit of a warning for Spring users: we can use Spring’s @Transactional annotation – this way we can even mark a transaction as read-only when needed – but we cannot mix both @Transactional annotations at will as they work differently. It is advisable to use only Spring’s annotation in a Spring environment.

Other JPA 2.1 goodies

We talked about custom converters already, but JPA 2.1 brought much more. Let’s cover the most interesting points quickly:

  • As 2.0 brought the Criteria API, 2.1 extended its usage to updates/deletes as well (although I’d use Querydsl for all these cases).
  • We can now call stored procedures in our queries.
  • We can map results into constructors and create custom DTO objects for specific queries (again, Querydsl has it too and it works for older JPAs) – see the sketch after this list.
  • Feature called entity graph allows us to define what relations to load in specific situations and then to fetch relations marked as lazy when you know up-front you will need them. I’m not covering this in the book.
  • There’s also an option to call any custom FUNCTION from JPQL. This includes native functions of a particular database if needed. While it limits us to that vendor (unless we rewrite it), it allows us to perform duties above and beyond JPA’s built-in functions.
  • JPA 2.1 also specifies properties that allow us to generate database schema or run SQL scripts, which unifies this configuration for various JPA providers.
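
To give at least one concrete taste, the constructor expression for custom DTOs may look like this (DogBasics is a hypothetical DTO of ours with a matching constructor; JPQL requires its fully qualified name):

List<DogBasics> dogs = em.createQuery(
  "select new com.example.DogBasics(d.id, d.name) from Dog d",
  DogBasics.class)
  .getResultList();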

For the whole list see either this part of Java Persistence wikibook or this post covering the features even better.

2. The Missing Parts

In this part we’ll briefly compare the JPA standard with capabilities offered by various ORM providers and go through the things missing in the Java Persistence Query Language (JPQL) when compared with SQL.

Compared to ORM providers

A couple of years back this section would have been a long one – but JPA narrowed the gap with 2.0 and virtually closed it with 2.1. There are still things that may be missing when we compare it with ORM providers. The list of various ORM features not included in JPA is probably long. However, the right question is how much a feature is really missed, how often we use it and how serious an impact this omission has.

In most cases we can get out of the comfortable “standard” zone and use capabilities of a particular ORM without affecting parts that can be pure JPA. Maybe we add some provider-specific annotations in our mapping – so be it, let’s be pragmatic.

One particular feature I once missed a lot is something that allows me to go through a result set just like in JDBC, but using entities. These don’t have to be put into the persistence context as I can process them as they come. Actually – they must not be put there, because they would just bloat the heap without any real use. It’s like streaming the result set. This may be extremely handy when we produce some very long exports that are streamed straight to the client browser or a file. Hibernate offers ScrollableResults; I personally used JDBC with Spring’s JdbcTemplate instead and solved the problem without JPA altogether – obviously, I had to do the mapping myself, while Hibernate can do it for us. Even so, as mentioned in this StackOverflow answer, this may still cause an OutOfMemoryError or similar memory-related issue, this time not on the JPA/ORM level, but because of a silly JDBC driver (or even because of database limitations, but that’s rare).
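
For illustration, a row-by-row export sketch with Spring’s JdbcTemplate – the mapping is manual and nothing accumulates in memory on our side (writeExportLine is a hypothetical method of ours):

jdbcTemplate.query(
  "SELECT id, name FROM dog",
  rs -> {
    // called once per row as the result set is traversed
    writeExportLine(rs.getInt("id"), rs.getString("name"));
  });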

Another area that is not covered by JPA that much is caching. JPA is rather vague about the structure of caches, although it does specify some configuration options and annotations. But the ORM implementations can still differ significantly. We tackle this topic in a separate chapter.

Finally, with the introduction of the ON clause we could get much more low-level with our queries when it suits us. ON is intended for additional JOIN conditions, but it would give us some new ways to approach the relations, which can be bothersome from time to time. We could use ON to explicitly add our primary JOIN condition – but all this is for naught because JPA 2.1 does not allow using a root entity (representing the table itself) in the JOIN clause. More about this topic and how to do it with EclipseLink and Hibernate (5.1.0 and higher) in the chapter Removing to-one altogether.

Comparing JPQL to SQL

JPQL gets more and more powerful with every new specification version, but obviously it cannot match SQL with proprietary features. Being “object query language” it does have its expressiveness on its own though – for instance we may simply say “delete all the dogs where breed name is <this and this>” without explicitly using joins (although this relies on to-one mapping we will try to get rid of). When object relations are mapped properly, joins are implied using the mapping and JPA provider will take care of generating the proper SQL for us. We may also ask for dog owners with their dogs and ORM can load it in a single query and provide it as a list of owners, each with their list of dogs – this is the difference between relational and object view.

But there are some very useful constructs that are clearly missing. I personally never needed a right outer join; so far I was always able to choose the right entity to start with and then use left outer joins, but this one may hit us sometimes. There is also no full outer join, but this relates to the fact we’re working with objects, not with rows – although technically we may work with tuples (and with relations in the RDBMS meaning). This dumbs down the JPA capabilities a bit, but in many cases it may be a good way and actually simplify things, provided we understand SQL – which we should.

When compared to SQL, probably the most striking JPQL limitations are related to subqueries. Some scenarios can be replaced by JOINs, but some can’t. For instance, we cannot put subqueries into a SELECT clause – this would allow for aggregated results per row. We cannot put them into a FROM clause and use the select result as any other table (or, in the relational database sense, as a relation). This would allow us, among other things, to count rows for results that current JPA does not support.1

JPA offers a palette of favourite functions, but of course it does not provide all possible functions.2 Before JPA 2.1 we’d have to use ORM provider custom functionality to overcome this, or fall back to native SQL. Just because we are consciously sacrificing database portability does not mean we don’t want to use JPQL. JPA 2.1 provides us with the FUNCTION construct, where the name of the called function is the first argument with the other arguments following behind. Easy to use and very flexible – this effectively closes the gap for functions we can use.
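
For example, assuming the database provides a SOUNDEX function, we may call it from JPQL like this:

List<Dog> dogs = em.createQuery(
  "select d from Dog d" +
  " where function('soundex', d.name) = function('soundex', :name)",
  Dog.class)
  .setParameter("name", "Rex")
  .getResultList();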

Other missing parts

The best way how to find corner cases that are not supported in the specification [JPspec] is simply to search for “not supported” string. Some of these are related to embeddables, some are pretty natural (e.g. “applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined”).3

What I dearly miss in JPA is better control over fetching of to-one relations. The current solution tries to be transparent both in terms of object-relational mapping and Java language features, but it may kill our performance or require caching, potentially a lot of it. While to-many relations can be loaded lazily with a special collection implementation enabling it, to-one cannot work like this without bytecode modification. I believe, though, that developers should be allowed to decide that a particular to-one relationship will not be loaded and only detached entities with IDs will be provided. But let’s save this discussion for the opinionated part.

3. Questionable parts

JPA, as any ORM, is not without its drawbacks. Firstly, it is complex – much deeper than developers realize when they approach it. Secondly, it is not a perfect abstraction. The more you want to play it as a perfect abstraction, the worse it probably gets in marginal cases. And the margin is not that thin. You may solve 80% of cases easily, but there are still the 20% of hard cases where you go around your ORM, write native SQL, etc. If you try to avoid that you’ll probably suffer more than if you accepted it.

We can’t just stay on the JPA level, even for cases where ORM works well for us. There are some details we should know about the provider we use. For instance, let’s say we have an entity with auto-generated identifier based on IDENTITY (or AUTO_INCREMENT) column. We call persist on it and later we want to use its ID somewhere – perhaps just to log the entity creation. And it doesn’t work, because we’re using EclipseLink and we didn’t call em.flush() to actually execute that INSERT. Without it the provider cannot know what value for ID it should use. Maybe our usage of the ID value was not the right ORM way, maybe we should have used the whole entity instead, but the point is that if we do the same with Hibernate it just works. We simply cannot assume that the ID is set.4
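
A sketch of the scenario:

em.getTransaction().begin();
Dog dog = new Dog();
dog.setName("Rex");
em.persist(dog);
System.out.println("id = " + dog.getId()); // may be null on EclipseLink
em.flush(); // executes the INSERT, the IDENTITY value is fetched
System.out.println("id = " + dog.getId()); // reliable on both providers
em.getTransaction().commit();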

Lazy on basic and to-one fields

While we can map these fields as lazy the behaviour is actually not guaranteed according to the JPA specification [JPspec]. Its section 2.2, page 25, states:

And the attached footnote adds:

While ORMs generally have no problem making collections lazy (e.g. both to-many annotations), for to-one mappings this gets more complicated. [PoEAA] offers a couple of solutions for the lazy load pattern: lazy initialization, virtual proxy, value holder, and ghost. Not all are usable for to-one mappings.

The essential trouble is that such a field contains an entity directly. There is no indirection like the collection that can provide a lazy implementation in the case of to-many mappings. JPA does not offer any generic solution for an indirect value holder. A virtual proxy would require some interface to implement, or bytecode manipulation on the target class; a ghost would definitely require bytecode manipulation on the target class; and lazy initialization would require bytecode manipulation, or at least some special implementation, on the source class. The JPA design neither offers any reasonable way to introduce this indirection without advanced auto-magic solutions, nor a way to do it explicitly that a programmer can control.

Removing to-one mappings and replacing them with raw foreign key values is currently not possible with pure JPA – JPA 2.1 brought the ON clause for JPQL, but it does not allow root entities in JOINs. We will expand on this in the chapter Troubles with to-one relationships.

Generated SQL updates

A programmer using JPA should see the object side of the ORM mapping. This means that an object is also the level of granularity on which ORM works. If we change a single attribute on an object, the ORM can simply generate a full update of all columns (except for those marked updatable = false, of course). This by itself is probably not such a big deal performance-wise, but if nothing else it makes SQL debug output less useful for checking what really changed.

I’d not even expect ORM to eliminate the column from update when it’s equal, I’d rather expect them to include it only when it was set. But we are already in the domain of ORM auto-magic (again) as they somehow have to know what has changed. Our entities are typically enhanced somehow, either during the build of a project, or during class-loading. It would probably be more complex to store touched columns instead of marking the whole entity as “dirty”.

To be concrete, EclipseLink out of the box updates only modified attributes/columns, while Hibernate updates all of them except for the ID, which is part of the WHERE clause.5 There is a Hibernate-specific annotation @DynamicUpdate that changes this behaviour. We may even argue what is better and when. If we load an object in some state, change a single column and then commit the transaction, do we really want to follow the changes per attribute or do we expect the whole object to be “transferred” into the database as-is at the moment of the commit? If we don’t squeeze performance and our transactions are short-lived (and they had better be for most common applications) there is virtually no difference from the consistency point of view either.

All in all, this is just a minor annoyance when we’re used to logging generated queries, typically during development, and we simply cannot see what changed among dozens of columns. For this – and for the cases when we want to change many entities at once in the same way – we can use the UPDATE clause, also known as bulk update. But these tend to interfere with caches and with the persistence context. We will talk about that in the next chapter.

Unit of work vs queries

JPA without direct SQL-like capabilities (that is without JPQL) would be very limited. Sure, there are projects we can happily sail through with queries based on criteria API only, but those are the easy ones. I remember a project, it was an important budgeting system with hierarchical organizational structure with thousands of organizations. There were budget items and limits for them with multiple categories, each of them being hierarchical too. When we needed to recalculate some items for some category (possibly using wildcards) we loaded these items and then performed the tasks in memory.

Sometimes the update must be done entity by entity – rules may be complicated, various fields are updated in various ways, etc. But sometimes it doesn’t have to be this way. When the user approved the budget for an organization (and all its sub-units) we merely needed to set a flag on all these items. That’s what we can do with a bulk update. UPDATE and DELETE clauses have been in the JPA specification since day zero, and with the latest JPA 2.1 we can do this not only in JPQL, but also in the Criteria API.6
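
In JPQL such an approval could look like this (BudgetItem and its attributes are made up, loosely modelled on that project):

int approvedCount = em.createQuery(
  "update BudgetItem i set i.approved = true" +
  " where i.organizationId in :orgIds")
  .setParameter("orgIds", subUnitIds)
  .executeUpdate(); // returns the number of affected rows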

When we can use a single bulk update (we know what to SET and WHERE to set it) we can gain a massive performance boost. Instead of iterating and generating N updates we just send a single SQL statement to the database, and that way we can go down from a cycle taking minutes to an operation taking seconds. But there is one big “but” related to the persistence context (entity manager). If we have the entities affected by the bulk update in our persistence context, they will not be touched at all. Bulk updates go around the entity manager’s “cache” for the unit of work, which means we should not mix bulk updates with modification of entities attached to the persistence context, unless these are completely separate entities. In general, I try to avoid any complex logic with attached entities after I execute a bulk update/delete – and typically the scenario does not require it anyway.

To demonstrate the problem with a snippet of code:

BulkUpdateVsPersistenceContext.java
System.out.println("dog.name = " + dog.getName()); // Rex

new JPAUpdateClause(em, QDog.dog)
  .set(QDog.dog.name, "Dex")
  .execute();

dog = em.find(Dog.class, 1); // find does not do much here
System.out.println("dog.name = " + dog.getName()); // still Rex

em.refresh(dog); // this reads the real data now
System.out.println("after refresh: dog.name = " + dog.getName()); // Dex

The same problem applies to JPQL queries which happen after the changes to the entities within a transaction on the current persistence context. Here the behaviour is controlled by entity manager’s flush mode and it defaults to FlushModeType.AUTO.7 Flush mode AUTO enforces the persistence context to flush all updates into the database before executing the query. But with flush mode COMMIT we’d get inconsistencies just like in the scenario with bulk update. Obviously, flushing the changes is a reasonable option – we’d flush it sooner or later anyway. Bulk update scenario, on the other hand, requires us to refresh attached entities which is much more disruptive and also costly.

We can’t escape SQL and relational model

Somewhere under the cover there is SQL lurking and we had better know how it works. We will tackle this topic with more passion in the second part of the book – for now we will just demonstrate that not knowing does not work. Imagine we want a list of all dog owners, and because there are many of them we want to paginate it. This is 101 of any enterprise application. In JPA we can use the methods setFirstResult and setMaxResults on the Query object, which correspond to the SQL OFFSET and LIMIT clauses.8
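
A sketch of such a page query, assuming an Owner entity like in the scenario below:

List<Owner> page = em.createQuery(
  "select o from Owner o order by o.name", Owner.class)
  .setFirstResult(0) // offset of the page
  .setMaxResults(2) // page size
  .getResultList();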

Let’s have a model situation with the following owners (and their dogs): Adam (with dogs Alan, Beastie and Cessna), Charlie (no dog), Joe (with Rex and Lassie) and Mike (with Dunco). If we query for the first two owners ordered by name – pagination without order doesn’t make sense – we’ll get Adam and Charlie. However, imagine we want to display names of their dogs in each row. If we join dogs too, we’ll just get Adam twice, courtesy of his many dogs. We may select without the join and then select dogs for each row, which is our infamous N+1 select problem. This may not be a big deal for a page of 2, but for 30 or 100 we can see the difference. We will talk about this particular problem later in the chapter about N+1.

These are the effects of the underlying relational model and we cannot escape them. It’s not difficult to deal with them if we accept the relational world underneath. If we fight it, it fights back.

Additional layer

Reasoning about common structured, procedural code is quite simple for simple scenarios. We add higher-level concepts and abstractions to deal with ever more complex problems. When we use JDBC we know exactly where in the code our SQL is sent to the database; it’s easy to control it, debug it, monitor it, etc. With JPA we are one level higher. We still can try to measure performance of our queries – after all, a typical query is executed where we call it – but there are some twists.

First, it can be cached in a query cache – which may be good if it provides correct results – but it also significantly distorts any performance measurement. The JPA layer itself takes some time. A query has to be parsed (add Querydsl serialization to it when used) and entities created and registered with the persistence context, so they are managed as expected. This distorts the result for the worse, not to mention that for big results it may trigger some additional GC that plain JDBC would not have.

The best bet is to monitor performance of the SQL on the server itself. Most decent RDBMSs provide some capabilities in this respect. We can also use a JDBC proxy driver that wraps the real one and performs some logging on the way. Maybe our ORM provides this to a degree, at least in the logs if nowhere else. This may not be easy to process, but it’s still better than no visibility at all. A more sophisticated system may provide nice measurements, but can also add a performance penalty – which perhaps doesn’t affect the measured results too much, but it can still affect the overall application performance. Of course, monitoring overhead is not related only to JPA, we would get it using just plain JDBC as well.

I will not cover monitoring topics in the book – they are natural problems with any framework, although we can argue that access to RDBMS is kinda critical. Unit of work pattern causes real DB work to happen somewhere else than the domain code would indicate. For simple CRUD-like scenarios it’s not a problem, not even from performance perspective (mostly). For complex scenarios, for which the pattern was designed in the first place, we may need to revisit what we send to the database if we encounter performance issues. This may also affect our domain. Maybe there are clean answers for this, but I don’t know them. I typically rather tune down how I use the JPA.

All in all, the fact that JPA hides some complexity and adds another set of it is a natural aspect of any layer we add to our application – especially the technological one. This is probably one of the least questionable parts of JPA. We either want it and accept its “additional layer-ness” or choose not to use it. Know that when we discover any problem we will likely have to deal with the complexity of the whole stack.

Big unit of work

I believe that the unit of work pattern is really neat, especially when we have support from ORM tools. But there are legitimate cases when we can run into trouble because the context is simply too big. This may easily happen with complicated business scenarios and it may cause no problem. Often, though, users do see the problem. A request takes too long or a nightly scheduled task is still running late in the morning. The code looks good, it’s as ORM-ish as it can be – it’s just slow.

We can monitor or debug how many objects are managed, often we can see effects on the heap. When this happens something has to change, obviously. Sometimes we can deal with the problem within our domain model and let ORM shield us from the database. Sometimes it’s not possible and we have to let relational world leak into our code.

Let’s say we have some test cases creating some dog and breed objects. In an ideal case we would delete all of them between tests, but as it happens, we are working on a database that contains some fixed set of dogs and breeds as well (don’t ask). So we mark our test breeds with a ‘TEST’ description. Dog creation is part of the tested code, but we know the dogs will be of our testing breeds. To delete all the dogs created during the test we may then write:

delete from Dog d where d.breed.description = 'TEST'

That’s pretty clear JPQL. Besides the fact that it fails on Hibernate, it does not touch our persistence context at all and does its job. We can do the same with a subquery (works for Hibernate as well) – or we can fetch the testing breeds into a list (they are now managed by the persistence context) and then execute JPQL like this:

delete from Dog d where d.breed in :breeds

Here breeds would be a parameter whose value is a list containing the testing breeds. We may fetch a plain list of breed.id instead – this does not get managed, takes less memory and pulls less data from the database with the same effect; we just say where d.breed.id in :breedIds instead – if supported like this by our ORM, but it’s definitely OK with JPA. I’ve heard arguments that this is less object-oriented. I was like “what?”
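
Put together as a sketch (assuming numeric IDs):

List<Integer> breedIds = em.createQuery(
  "select b.id from Breed b where b.description = 'TEST'", Integer.class)
  .getResultList(); // plain values, nothing gets managed

em.createQuery("delete from Dog d where d.breed.id in :breedIds")
  .setParameter("breedIds", breedIds)
  .executeUpdate();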

Finally, what we can do is start with fetching the testing breeds, then fetch all the dogs with these breeds and call em.remove(dog) in a cycle. I hope this, object-oriented as it is, is considered a bit of a stretch even by OO programmers. But I saw it in teams where JPQL and bulk updates were not very popular (read also as “not part of the knowledge base”).

The persistence context (unit of work) has a lot to do when the transaction is committed. It needs to be flushed, which means it needs to check all the managed (or attached) entities and figure out whether they have changed or not (dirty checking). How the dirty checking is performed is beyond the scope of this book; typically some bytecode manipulation is used. The problem is that this takes longer when the persistence context is big. When it’s big for a reason, then we have to do what we have to do, but when it’s big because we didn’t harness bulk updates/deletes, then it’s simply wasteful. Often it also takes a lot of code when a simple JPQL statement says the same. I don’t accept an answer that reading JPQL or SQL is hard. If we use an RDBMS we must know it reasonably well.

Sometimes a bulk update is not a feasible option, but we still want to loop through some result, modify each entity without interacting with the others and flush it (or batch the update if possible). This sounds similar to the missing “cursor-like result set streaming” mentioned in the section covering missing features that some ORM providers do have – although that case covers read-only scenarios without any interaction with the persistence context. If we want to do something to each single entity as we loop through the results and hit a performance problem (mostly related to memory) we may try to load smaller chunks of the data, and after processing each chunk we flush the persistence context and also clear it with EntityManager.clear().
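
A sketch of such chunked processing (the concrete per-entity modification is illustrative):

int chunkSize = 100;
for (int first = 0; ; first += chunkSize) {
  List<Dog> dogs = em.createQuery(
    "select d from Dog d order by d.id", Dog.class)
    .setFirstResult(first)
    .setMaxResults(chunkSize)
    .getResultList();
  if (dogs.isEmpty()) {
    break;
  }
  for (Dog dog : dogs) {
    dog.setName(dog.getName().trim()); // whatever per-entity work is needed
  }
  em.flush(); // execute the UPDATEs for this chunk
  em.clear(); // detach everything, keeping the persistence context small
}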

Other entity manager gotchas

Consider the following snippet:

em.getTransaction().begin();
Dog dog = new Dog();
dog.setName("Toothless");
em.persist(dog);
dog.setAge(4);
em.getTransaction().commit();

We create the dog named Toothless and persist it, setting its age afterwards. Finally, we commit the transaction. What statements do we expect in the database? Hibernate does the following:

  1. On persist it obtains an ID. If that means querying a sequence, it will do it. If it needs to call INSERT because the primary key is an AUTO_INCREMENT or IDENTITY column (depending on the database), it will do so. If the insert is used, obviously, age is not set yet.
  2. When flushing, which happens during the commit(), it will call INSERT if it wasn’t called in the previous step (that is, when a sequence or another mechanism was used to get the ID). Interestingly enough, it will call it with the age column set to NULL.
  3. Next the additional UPDATE is executed, setting the age to 4. The transaction is committed.

The good thing is that whatever mechanism is used to get the ID value, we get a consistent sequence of insert and update. But why? Is it necessary?9

There is a reason I’m talking about Hibernate, as you might have guessed. EclipseLink simply reconciles it all into a single INSERT during the commit(). If it is absolutely necessary to insert the entity into the database in a particular state (maybe for some strange constraint reasons) and update it afterwards, we have to “push” it with an explicit flush() placed somewhere in between. That is probably the only reliable way to do it if we don’t want to rely on the behaviour of a particular JPA provider.10
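
Applied to the snippet above, it is a one-line change:

em.persist(dog);
em.flush(); // the INSERT happens now, with the current state of the entity
dog.setAge(4); // flushed later as an UPDATE, on both providers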

JPA alternatives?

When JPA suddenly stands in our way instead of helping us, we can still fall back gracefully to the JDBC level. Personally I don’t mix these within a single scenario, but we can. For that we have to know what provider we use, unwrap its concrete implementation of EntityManager and ask it for a java.sql.Connection in a non-portable manner. When I don’t mix scenarios, I simply ask Spring (or possibly another container) to inject the underlying javax.sql.DataSource and then I can access a Connection without using JPA at all. Talking about Spring, I definitely go for their JdbcTemplate to avoid all the JDBC boilerplate. Otherwise I prefer JPA for all the reasons mentioned in the Good Parts.
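
For illustration, the unwrapping may look like this – provider-specific, so treat it as a sketch and check the documentation of your version:

// Hibernate: unwrap the Session and use plain JDBC via its Work callback
Session session = em.unwrap(Session.class);
session.doWork(connection -> {
  // ... plain JDBC with java.sql.Connection here ...
});

// EclipseLink: within a transaction the Connection can be unwrapped directly
Connection connection = em.unwrap(Connection.class);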

We’ve lightly compared JPA with concrete ORMs already, but they are still the same concept – it is much more fun to compare it with something else altogether, let’s say a different language – like Groovy. We’re still firmly on the JVM, although it’s not very likely to do our persistence in Groovy and the rest of the application in Java. Firstly, the Groovy language also has its own ORM. It’s called GORM and while not built into the core project it is part of the Grails framework. I don’t have any experience with it, but I don’t expect a radical paradigm shift as it uses Hibernate ORM to access an RDBMS (although it also supports NoSQL solutions). Knowing Groovy I’m sure it brings some fun into the mix, but it still is ORM.

I often use core Groovy support for relational databases and I really like it. It is no ORM, but it makes working with a database really easy compared to JDBC. I readily use it to automate data population as it is much more expressive than SQL statements – you mostly just use syntax based on Groovy Maps. With a little support code you can create helper insert/update methods that provide reasonable defaults for columns you don’t want to specify every time, or insert whole aggregates (master-slave table structures). It’s convenient to assign returned auto-increment primary keys into Groovy variables and use them as foreign keys where needed. It’s also very easy to create repetitive data in a loop.

I use this basic database support in Groovy even on projects where I already have entities in JPA, but for whatever reason I don’t want to use them. We actually don’t need to map anything into objects, mostly a couple of methods will do. Sure, we have to rename columns in the code when we refactor, but column names are hidden in just a couple of places, often just a single one. Bottom line? Very convenient, natural type conversion (although not perfect, mind you), little to no boilerplate code, and it all plays nicely with Groovy syntax. It definitely doesn’t bring as much negative passion with it as JPA does – perhaps because the level of sophistication is not so high. But in many cases simple solutions are more than enough.

4. Caching considerations

I didn’t want to discuss caching too much, but I did spread a couple of sections about it throughout the text. Eventually I decided to concentrate them here and be (mostly) done with it for the sake of this book.

While the persistence context (EntityManager or session) is sometimes considered a cache too, it is merely a part of the unit-of-work pattern (an identity map). The real cache sits underneath and is shared on the level of the EntityManagerFactory – or even between more of them across various JVMs in the case of distributed caches. This is called the second-level cache.11 It is used to enhance performance, typically by avoiding round-trips to the database. But caching has consequences.

Caching control

[JPspec] doesn’t say much about caching. It says how to configure it – starting with shared-cache-mode in persistence.xml. But I’d probably study the caching documentation of a particular provider, because if we don’t care at all, we don’t even know whether and how we use the cache.

Without choosing shared-cache-mode it is up to the JPA provider and its defaults. This may render any use of @Cacheable annotations useless. Currently, Hibernate typically doesn’t cache by default, while EclipseLink caches everything by default. Being oblivious to the cache (not related to cache-oblivious algorithms at all) is rather dangerous, especially if our application is not the only one running against the same database. In that case setting shared-cache-mode explicitly to NONE is by far the best start. We may revisit our decisions later, but at least we know what is happening.
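
In persistence.xml the explicit setting looks like this (the unit name is made up):

<persistence-unit name="demo-unit">
  <shared-cache-mode>NONE</shared-cache-mode>
</persistence-unit>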

Probably the most important questions to consider are: Is our application the sole user of a particular database? Is it running in a single process? If yes, we may safely use the cache. But I’d not “just use the defaults” of the JPA provider – which may mean no cache as well. It’s not a good idea to use caching without any thought at all. I prefer not to use it until I feel the need. When we start using the cache and tuning it we must be prepared for a walk that may not be as easy as it seems.

An entity cache returning entities by ID really quickly sounds like a good idea because it makes the problem of eager loads of to-one relationships less serious. E.g. when we load a dog we may get its breed quickly from the cache. It doesn’t fix the problem though, as all the additional entities are part of the current persistence context whether we want them or not. E.g. we want to update a single attribute of a dog and we are not interested in its breed. We already mentioned that units of work bigger than necessary are not for free.

Second-level cache vs queries

Caching should be transparent, but just turning it on is a kind of premature optimization which – in virtually all cases – ends up being wrong. Any auto-magic can only go so far, and any caching leads to potential inconsistencies. I believe most JPA users don’t understand how the cache is structured (I’m talking from my own experience too, after all). This depends on a concrete ORM, but typically there is an entity cache and a query cache.

The entity cache helps with the performance of EntityManager.find, or generally with loading by the entity’s @Id attribute. But this will not help us if we accidentally obfuscate what we want with a query that would otherwise return the same entity. The provider has no way to know what entity (with what ID) will be loaded just by looking at arbitrary where conditions. This is what the query cache is for. Bulk updates and deletes using JPQL go around either of these caches, and the safest way to avoid inconsistent data is to evict all entities of the modified type from the caches. This is often performed by the ORM provider automatically (again, check documentation and settings).

If we only work with whole entities all the time and nothing else accesses the database, we can be pretty sure we always get the right result from the entity cache. You may wonder how this cache behaves in a concurrent environment (which any EE/Spring application inherently is). If you imagine it as a Map, even one with synchronized access, you may feel the horror of getting the same entity instance (a Dog with the same ID) in two concurrent persistence contexts (like concurrent HTTP requests) that subsequently modify various fields on the shared instance. Luckily, ORMs provide each thread with its own copy of the entity. Internally they typically keep entities in the cache in some “dehydrated” form.12

Explicit application-level cache

Previously we ignored the case when we cache some entities under multiple different keys, not necessarily retrieved by the same method. Imagine a component that returns some classifiers by various attributes (ID, name, code, whatnot) – a pretty realistic scenario. We code it as a managed component with a declarative cache. There are separate methods to obtain the classifier by specific attributes. If we retrieve the same classifier by three different attributes, we populate three different caches with – essentially – the same entity, stored under a different key (attribute value) in each of them.

Even if we ignore that the entities sometimes participate in the persistence context and sometimes don’t, this consumes more memory than necessary. It may still be acceptable though. Personally I believe the non-determinism regarding the attached/detached state is more serious, but let’s say these entities are only for reading and we may not care. Imagine further that we filter on these entities – like “give me a list of classifiers with name starting with BA”. Now we have even more variability in cache keys – any distinct filter is a key – and probably many more repeated entities in the results. And these are likely distinct objects even for the same logical entities. This may either explode our cache, or cause frequent evictions rendering the cache useless, probably burning more CPU in the process.

If the amount of underlying data is big we may have no other choice, but in the case of various administered static data, code books or classifiers the size of the table is typically small. Once our DB admins reported that we queried a table with 20k rows of static data 20 million times a day – while the traffic on our site was on the order of thousands of requests a day. It was an unnecessarily rich relational model and this part of the system would have been better represented in some kind of document/NoSQL store. We didn’t use the relations in that data much – and they actually prevented some needed data fixes because of an overly restrictive cobweb of foreign keys. But this was the design we had and we needed a fix. “Can’t we just cache it somehow?” The data were localization keys – not for the application itself but for some form templates – so they were part of the data model. We joined these keys based on the user’s language for each row of a possibly multi-row template (often with hundreds of rows).

First we needed to drop the joins. The plan was to ask a cache component for any of these localization keys based on its ID and the language. It took us some time to rewrite all the code that originally just joined the data in queries, but the result was worth it. The component simply read the whole table and created a map of maps keyed by language and ID. The DB guys were happy. We stopped executing some queries altogether and removed unnecessary joins from many others.
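
A minimal sketch of such a component might look as follows – LocalizationKey and its accessors are hypothetical stand-ins for the real entity, and a production version would also need a refresh strategy:

Full-preload cache keyed by language and ID (sketch)
 1 public class LocalizationCache {
 2   // language -> (key ID -> localized text)
 3   private final Map<String, Map<Integer, String>> texts = new HashMap<>();
 4   public LocalizationCache(EntityManager em) {
 5     List<LocalizationKey> all = em.createQuery(
 6       "select lk from LocalizationKey lk", LocalizationKey.class)
 7       .getResultList();
 8     for (LocalizationKey lk : all) {
 9       texts.computeIfAbsent(lk.getLanguage(), lang -> new HashMap<>())
10         .put(lk.getId(), lk.getText());
11     }
12   }
13   public String get(String language, Integer id) {
14     Map<Integer, String> forLanguage = texts.get(language);
15     return forLanguage != null ? forLanguage.get(id) : null;
16   }
17 }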

There are other situations when we may seriously consider coding our own cache explicitly instead of relying on a declarative one like JSR 107. A declarative cache doesn’t mean an unattended one anyway – we should limit it, set eviction policies, etc. It can be extremely handy when we get results cheaply, which typically happens when a limited subset of a possibly big chunk of data is used repeatedly with the same keys.

Programmatic (explicit) cache can shine in other areas:

  • When we work with a limited set of data and we need it often – and we want to pre-cache it. This may be reasonable scenario also for declarative cache if we can pre-fill it somehow and there is a single way how we obtain the data.
  • If we require the same data based on various attributes (different views). We can use multiple maps that point to the same actual instances, as shown in the sketch after this list. This works both when we cache as we go (when misses are possible) and when we preload the whole set (the ideal case when it’s not too big).
  • The cache can cooperate with any administrative features that modify the underlying data. Because we code it with full knowledge of the logic behind the data, we can selectively refresh or update the cache in a smart way. With a declarative cache we often have to evict it completely – although this can still be a good simple strategy even for programmatic caches, especially when the refresh requires a single select.
  • Full preload requires more memory up-front and slows down the startup of the application (though it can be done lazily on demand), but it deals with the DB once and for all. A declarative cache executes a query on demand for every miss but loads only the data that requires caching – potentially less efficiently than a full preload.
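
To illustrate the multiple-views point, here is a sketch of a component keeping two lookup maps over the same instances – Classifier and its getters are hypothetical:

Two lookup views sharing the same cached instances (sketch)
 1 public class ClassifierCache {
 2   private final Map<Integer, Classifier> byId = new HashMap<>();
 3   private final Map<String, Classifier> byCode = new HashMap<>();
 4   public ClassifierCache(List<Classifier> preloaded) {
 5     for (Classifier c : preloaded) {
 6       byId.put(c.getId(), c);     // both maps point to
 7       byCode.put(c.getCode(), c); // the same instance
 8     }
 9   }
10   public Classifier byId(Integer id) { return byId.get(id); }
11   public Classifier byCode(String code) { return byCode.get(code); }
12 }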

Of course, there are cases when we can’t use a fully pre-loaded cache. In general, ask yourself whether “cache this by this key” is the best (or a good enough) solution, or whether you can implement a component utilizing the logic behind the data better – and whether it’s all worth it.

Conclusion

Caching can help tremendously with the performance of our applications – but it can also hurt if done badly. We should be very aware of our cache setup and very clear about how we want to do it. In the code it may look automagical, but it must be explicit somewhere – our strategy must be well known to all the developers who may encounter it in any way (even unknowingly).

We always trade something for something – with caching it’s typically memory for speed.13 Memory can slow us down too, but in general we have plenty of it nowadays. While we are in a single process a JPA provider can typically manage the data consistency. If we have a distributed architecture we enter a different world altogether and I’d think twice before going there. We must feel the need for it and we must measure what we get – because we’ll definitely get the complexity and we have to think about consistency much more.

Don’t freely mix caching on multiple levels. The database cache is mostly transparent to us, but when we mix two declarative caches we often make matters worse – especially when we cache entities with a technology that is not aware of their lifecycle within the persistence context.

Finally, depending on what our keys for caching are, we may waste a lot of memory. An entity ID (like in the second-level cache) is a natural and good key. But if we key on many various selectors that may return the same entities (single ones or even whole collections), we may store many instances of the same entity in the cache. That wastes memory. Knowing more about the logic connecting the keys and values, we may get better results with our own explicit cache on the application level. It may require more effort but the pay-off may be significant.

In short:

  • Don’t cache mindlessly. Design what to cache (or not) and size the cache regions properly.
  • Realize that with caching we trade memory for CPU – and hope the GC doesn’t claim back the CPU we saved. Check the heap usage after changes.
  • Beware of caching on multiple levels, especially combining JPA and non-JPA caches inside JVM.
  • Consider implementing your own caching for selected sub-domains. The same data accessed by various criteria may be fit for this.
  • Measure the impact of any change related to caching.

5. Love and hate for ORM

ORM is typically seen as a good fit with domain-driven design (DDD). But ORM happened to become extremely popular and people thought it would solve all their database access problems without the need to learn SQL. This approach obviously failed and hurt ORM’s reputation a lot, too. I say it again: ORM is very difficult; even using a well documented ORM (like the JPA standard) is hard – there’s simply too much in it. We’re dealing with a complex problem, with a mismatch – and some kind of mismatch it is. And it’s not the basic principle that hurts; we always get burnt on the many, too many, details.

Vietnam of Computer Science

One of the best balanced texts critiquing ORM, and probably one of the most famous, is actually quite old. In 2006 Ted Neward wrote an extensive post with a fitting, if provocative, name: The Vietnam of Computer Science. If you seriously want to use ORM on any of your projects, you should read it – unless data access is not an important part of that project (who are we kidding, right?). You may skip the history of the Vietnam war, but definitely give the technical part the dive it deserves.

ORM wasn’t that young anymore in 2006 and various experiences had shown that it easily brought more problems than benefits – especially if approached with partial knowledge, typically based on the assumption that “our developers don’t need to know SQL”. When some kind of non-programming architect recommends this for a project and is long gone when the first troubles appear, it’s really easy for them to recommend it again. It was so easy to generate entities from our DDL, wasn’t it? Hopefully managers are too high up to be an audience for JPA/ORM recommendations, but technical guys can hurt themselves well enough. JPA, for instance, is still shiny – it’s a standard after all – and yeah, it is kinda Java EE-ish, but this is the good new EE, right?

Wrong. Firstly, JPA/ORM is so complex when it comes to details that using it as a tool for “cheap developers” who don’t need to learn SQL is as silly as it gets. I don’t know when learning became a bad thing in the first place, but some managers think they can save man-months/time/money when they skip training14. When things get messy – and they will – there is nobody around who really understands ORM and there is virtually no chance to rewrite the data access layer to get rid of it. The easiest thing to do is to blame ORM.

You may ask: What has changed since 2006? My personal take on the answer would be:

  • Nothing essential could have changed; we merely smoothed some rough edges, got a bit more familiar with an already familiar topic, and in the Java space we standardized the beast (JPA).
  • We added more options to the query languages to make the gap between them and SQL smaller. Funnily enough, JPA initially didn’t help with this as it lagged a couple of years behind the capabilities of the leading ORM solutions.
  • Query-by-API (mentioned in the post) is much better nowadays; state-of-the-art technologies like Querydsl have a very rich fluent API that is also very compact (definitely not “much more verbose than the traditional SQL approach”). Both type safety and testing practices are much more developed as well.

Other than that, virtually all the concerns Ted mentioned are still valid.

Not much love for ORM

Whatever was written back in 2006, ORMs have been on the rise ever since. Maybe the absolute number of ORM experts is higher now than it was then, but I’d bet the ratio of experts among its users has plummeted (no research, just my experience). ORM/JPA is easily available: Java EE supports it, Spring supports it, you can use it in Java SE easily, you can even generate CRUD scaffolding for your application with JPA using some rapid application development tools. That means a lot of developers are exposed to it. In many cases it seems deceptively easy when you start using it.

ORM has a bad reputation with our DBAs – and for good reasons too. It takes some effort to make it generate reasonable queries, not to mention that you often have to break the ORM abstraction to do so. It’s good to start with clean, untangled code, as it helps tremendously when you need to optimize some queries later. Optimization, however, can complicate the matter. If you explicitly name columns and don’t load whole entities you may get better performance, but it will be unfriendly to the entity cache. The same goes for bulk updates (e.g. “change this column for all entities where…”). There are right ways to do these things in the domain model, but learning the right path of OO and domain-driven design is probably even harder than starting with JPA – otherwise we’d see many more DDD-based projects around.

We talked about caching already, and I’d say that misunderstandings around ORM caching are the reason for a lot of performance problems and potentially for data corruption too. This is when it starts to hurt – and when you can’t get out easily, hate often comes with it. When you make a mistake with a UI library, you may convince someone to let you rewrite it – and they can see the difference. Rewriting the data access layer gives seemingly nothing to the client, unless the data access is really slow – but then the damage done is already quite big anyway.

But a lot of hate

When you need to bash some technology, ORM is a safe bet. I don’t know whether I should even mention ORM Is an Offensive Anti-Pattern, but as it is now the top result of a Google search for ORM, I do it anyway. It wouldn’t be fair to say the author doesn’t provide an alternative, but I had read a lot of his other posts (before I stopped) to see where “SQL-speaking objects” lead. I cannot see the single responsibility principle in it at all, and while SRP doesn’t have to be the single holy grail of OOP, putting everything into the domain object itself is not a good idea.15

There are other flaws in this particular post. Mapping is explained on a very primitive case, while ORM utilizes unit-of-work for cases where you want to execute multiple updates in a single transaction, possibly updates of the same object. If every elementary change on the object emits an SQL statement and we want to set many properties of the same underlying table in one transaction, we get performance even worse than non-tuned ORM! You can answer with an object exposing various methods to update this and that, which possibly leads to a combinatorial explosion. Further, in the age of dependency injection we are shown the most verbose way to do ORM, a way I haven’t seen for years.

“SQL-speaking objects” bring us to a more generic topic of modelling objects in our programs. We hardly model real-life objects as they act in real life, because in many cases we should not. Information systems allow changing data about things that normally cannot change, because someone might have entered the information incorrectly in the first place.

How should we model the behaviour of a tin can? Should it have an open method? Even in real life someone opens the tin with a tin opener – an interaction of three objects. Why do we insist on objects storing themselves then? It may still be a perfectly valid pattern – as I said, real life is not always a good answer to our modelling needs – but it is often overused. While in real life a human does it, we often have various helper objects for behaviour, often ending with -er. While I see why this is considered an antipattern, I personally dislike the generalization of rules like “objects (classes) ending with -er are evil”. Sure, they make me think – but TinOpener ends with “-er” too and it’s a perfectly valid (even real-life!) object.

In any case I agree with the point that we should not avoid SQL. If we use ORM we should also know its query language (JPQL for JPA) and how it maps to SQL. We generally should not be ignorant of what happens down the stack, especially when the abstraction is not perfect. And ORM, no question about it, is not a perfect abstraction.

To see a much better case against ORM, read ORM is an anti-pattern. There we can find a summary of all the bad things related to ORM; there is hardly anything to argue about, and if you read Ted Neward’s post too, you can easily map the problems from one post to the other. With that we come full circle back to Martin Fowler and his ORM Hate. We simply have to accept ORM as it is and either avoid it, or use it with its limitations, knowing that the abstraction is not perfect. If we avoid it, we have to choose either the relational or the object world, but we can hardly have both.

Is tuning-down a way out?

Not so long ago Uncle Bob wrote an article called Make the Magic go away. It nicely sums up many points related to using frameworks, and although ORM is not mentioned it definitely fits the story.

Using less of ORM and relying less on complex auto-magic features is the way I propose. It builds on the premise that we should use ORM where it helps us, avoid it where it does not, and know the consequences of both styles and their interactions. It may happen that using both ways adds complexity too, but in my experience this is not the case and not an inherent problem. JPA maps values from the database to objects much better than plain JDBC, so even if you used your entities as dumb DTOs it would still be worth it. It abstracts the concrete SQL flavour away, which has its benefits – and unless this is a real issue for more than a couple of queries, you can resolve the rest with either the native SQL support in JPA or a JDBC based solution.

Coming to relations, there may be many of them where to-one does not pose a problem. In that case make your life easier and use the mapping to objects. If cascade loading of to-one causes problems, you can try how well LAZY is supported by your ORM. Otherwise you have to live with it for non-critical cases and work around it for the critical ones with queries – we will get to this in the chapter Troubles with to-one relationships.

If your ORM allows it, you may go even lower on the abstraction scale and map raw foreign key values instead of related objects. While the mapping part is possible within the JPA standard, it is not enough, as JPA does not allow you to join on such relationships. EclipseLink offers the last missing ingredient; this solution is described in the chapter Removing to-one altogether.

This all renders ORM a bit lower-level tool than intended, but still an extremely useful one. It still allows you to generate the schema from objects if you have control over your RDBMS (sometimes you don’t), or even just to document your database with a class diagram of entities16. We still have to keep the unit-of-work and caching in check, but both are very useful if used well. I definitely don’t avoid using EntityManager – how could I?

And there is more

This list of potential surprises is far from complete, but for the rest of the book we will narrow our focus to relationships. We will review their mapping and how it affects querying and generated SQL queries.

I’d like to mention one more problem, a kind of natural outcome of real-life software development. JPA is implemented by a couple of projects and these projects have bugs. If a bug is generally critical, they fix it quite soon. If it’s critical only for your project, they may not. Most ORM projects are now open source and you may try to fix it yourself, although maintaining patches for an updated OSS project is rather painful.

It’s all about how serious your trouble is – if you can find a work-around, use it. For instance, due to a bug (now fixed) EclipseLink returned an empty stream() for lazy lists. Officially they didn’t support Java 8, while in fact they extended Vector improperly, breaking its invariants. We simply copied the list and called stream() on the copy – and we had a utility method for it. It wasn’t nice, but it worked and it was very easy to remove later. When they fixed the issue we simplified the code in the utility method, then inlined all its occurrences, and it looked like it had never happened.
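
The whole work-around could be as small as this sketch (the method name is made up; the copy constructor triggers the lazy load, and the copy streams safely):

Work-around utility for the broken stream() (sketch)
1 public static <T> Stream<T> lazySafeStream(List<T> list) {
2   // the copy constructor loads the lazy list; the copy streams fine
3   return new ArrayList<>(list).stream();
4 }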

You may think about switching your provider – and JPA 2.1 brought a lot of goodies to make that even easier, as many properties in persistence.xml are now non-proprietary. But you still have to go through the configuration (focus especially on caching) to make it work “as before” – and then you may get caught in a bug cross-fire, as I call it. I had been using Hibernate for many years when I joined another project using EclipseLink. After a few months we wanted to switch to Hibernate as we discovered that most of the team was more familiar with it. But some of our JPA queries didn’t work on Hibernate because of a bug.

So even something that should work in theory may be quite a horror in practice. I hope you have tests to catch any potential bug affecting your project.

II Opinionated JPA

To use ORM/JPA really well you need to know a lot about it: about the provider, its settings, the caching mechanisms and so on. And you should not be oblivious to SQL either. I’ve never worked on a system where ORM/JPA was used and at the same time everybody was really happy with the result. The system got slower, DBAs asked why we produce crappy SQL, and so on. We fought with ORM drawbacks before we could benefit from its features. It all comes down to the inherent complexity of ORM. Any pro can instantly blame us for “just not knowing”… but I think the problem goes deeper.

While I too complain about programmers’ laziness to learn more about the technologies they work with, JPA (plus the provider, mind you!) is a really complex topic. I’ve been studying it for years already. Sure, it was no 100% study time like when you prepare for a certification. On the other hand, most of my study was based on experience – and not only reactive (solutions to imminent problems) but also proactive (how to do it better on any level: as a coder, designer, architect, whatever).

I always welcome a chance to learn more about JPA. Not because it’s my favourite technology – I just happened to spend so much time with it. When we started one recent project we discussed what to use; I was quite fresh on the team and somehow got convinced to use JPA, although I would rather have risked Querydsl over SQL directly. But I didn’t mind that much, JPA was quite well known to me. Little did I know how many surprises it still had in store for me! The biggest of them all was probably the reason why I started to complain about JPA on my blog and, eventually, decided to put my twisted ideas of how to utilize JPA into this book.

Generally I don’t like ideas like “let’s just use a subset of this”. I don’t believe that hiring dumb Java programmers and forbidding them to use the latest Java 8 features is a solution. I rarely saw such a “subset” programmer write clean code – or they were bright enough to dislike such an arrangement and typically left soon for something better. But JPA is not a programming language; you’re working with an API and you may choose the patterns you want to use and those you don’t. This time I decided “let’s use SQL more” and “let’s not load a graph of objects (even if cached) when I just want to update a single one”. And that’s how it all started – the more I know about JPA, the less of it I use. But it still has benefits I don’t want to let go of.

So the question is: Can we tune the JPA down without crippling it – or ourselves?

6. Tuning ORM down

Full-blown object-relational mapping maps database foreign keys as relations between objects. Instead of Integer breedId you have Breed breed, where the mapping information provides all the necessary low-level details about the foreign key. This mapping information can be stored elsewhere (typically an XML mapping file) or as annotations directly on the entity (probably more popular nowadays). Relations can be mapped in any direction, even in reverse, for one or both directions and in any cardinality with the annotations @OneToOne, @OneToMany, @ManyToOne and @ManyToMany. If you map both directions you choose the owning side, which is kinda more “in control” of the relationship. Mapping foreign keys as relations is probably the most pronounced feature of ORM – and originally I wanted to make a case against it in this very book.

In an ironic twist I discovered that the way we bent JPA 2.1 works only on its reference implementation, that is EclipseLink1 – all this more than a year too late. After losing this most important showcase point I considered stopping any further work on the book, but in the end I decided to go on with a weaker version. We will describe the troubles with to-one relations and their possible solutions – including the one that is not JPA compliant – and there are also other helpful tips, many Querydsl examples throughout, and two chapters dedicated to Querydsl.

Most of the points I’m going to explain somehow revolve around two rules:

  • Be explicit. For instance, don’t leave the persistence context (session) open after you leave the service layer and don’t wait for the presentation layer to load something lazily. Make an explicit contract and fetch eagerly what you need for the presentation, just like you would with plain SQL (but still avoiding the ORM syndrome of “load the whole DB”). Also, don’t rely on caches blindly.
  • Don’t go lazy. Or at least not easily. This obviously stems from the previous point, but goes further. Lazy may save some queries here and there, but in practice we rely on it too much. In most cases we can be explicit, but we’re too lazy to be (so let’s not go lazy ourselves either). There are places where FetchType.LAZY means nothing to JPA. Oh, they will say it’s a hint for the provider, but it’s not guaranteed – and that’s nothing for me. Let’s face it: any to-one mapping is eager unless you add some complexity to your system to make it lazy. It’s not lazy just because you annotate it so; deal with it.

As a kind of shortcut I have to mention that Hibernate 5.x supports LAZY to-one relationships as you’d expect; more about it later in the part on how lazy is (not) guaranteed. When talking about the often vain LAZY on to-one, let me be clear that it is still the preferred fetching for collections – and quite reliable for them as well. This does not mean you should rely on these lazy to-many relations when you want to fetch them for multiple entities at once. For instance, if you want to display a page with a list of messages with attachments, possibly multiple for each message, you should not wait for lazy fetch. More about this in the chapter Avoid N+1 select.

Price for relations

For me the biggest problem is that JPA does not provide any convenient way to stop cascading loads for @ManyToOne and @OneToOne (commonly called to-one in this book, as you probably noticed) relationships. I don’t mind to-many relations – they have their twists, but at least their lazy works. But to-one typically triggers a find by ID. If you have a Dog that has an owner (type Person) and is of a specific Breed, you must load these two things along with the dog. Maybe they will be joined by the JPA provider, maybe not2, maybe they are already in the second-level cache and will be “loaded” nearly “for free”. All these options should be seriously considered, analyzed and proven before you can say that you know what is going on in your application.

And it only starts there, because Person has an Address which – in case of a rich system – may further point to District, County, State and so on. Once I wanted to change something like Currency in a treasury system. It loaded around 50 objects – all of them across these to-one relations. This may happen when you naively want to change a single attribute of that currency. In SQL terms, that’s an update of a single column in a single row.

When you insert or update a Dog you can use em.getReference(clazz, id) to get a Breed object containing only the ID of the breed. This effectively works as a wrapped foreign key (FK). Heck, you can just new up an empty Breed and set its ID; you don’t even have to ask em for the reference. In case of an update you will not save much, as that Dog is probably managed already anyway. You can either update the managed entity – which means it loaded previous values for all to-one FKs, cascading as far as needed – or you can try to merge the entity – but this works only if you overwrite it completely; it’s not usable for an update of a single column. Or should you just use a JPQL update and possibly evict all dogs from the second-level cache? How ORM is that?
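
To make the reference trick concrete, here is a sketch with this book’s Dog/Breed entities – the breedId variable and the setters are assumed:

Setting the FK without loading the Breed (sketch)
1 // no SELECT needed - getReference returns a reference/proxy by ID
2 Breed breedRef = em.getReference(Breed.class, breedId);
3 dog.setBreed(breedRef);
4 em.persist(dog);
5 // or without asking em at all:
6 Breed rawRef = new Breed();
7 rawRef.setId(breedId);
8 dog.setBreed(rawRef);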

Why doesn’t JPA provide better fetch control for finds? I want to work with this Dog object now; I’m not interested in its relations – just wrap those FK values into otherwise empty entity objects (like references) and let me do my stuff! How I wished I could just map the raw FK value instead of the related object… Actually, you can, but while you are able to load such a related object explicitly (find by ID), you can’t join on a relationship defined this way.

If you use EclipseLink you can, but that means you can’t reliably switch to other JPA 2.1 providers, not to mention that dropping the mapping is by no means a minor detour from the standard path. However, the problems with to-one ultimately led me to these extreme measures: I got rid of to-one mappings and never looked back. Because I based my work on the reference implementation of JPA 2.1 (EclipseLink) it took me more than a year to discover I had diverged from the standard – and even then only because I started to write tests for this book with both major JPA providers.

More about this in the chapter Troubles with to-one relationships. The other points are much less radical compared to this one. I learnt long ago that being explicit and less lazy is definitely better, and we will talk about that in other sections.

How does this affect my domain model?

I stated that this book would not be about architecture, but this is the part where we have to tackle it a bit. If we talk about a domain model, we probably also talk about domain-driven design (DDD), best described in [DDD]. I can’t claim experience with DDD because I never saw it in practice, but it reportedly works for some, especially for complex business domains with lots of rules. Reading [PoEAA] it is obvious that there’s a lot of synergy between DDD and ORM. One may even ask whether to use ORM without DDD at all. And I can’t answer that, sorry.

[PoEAA] recommends other patterns for simpler domain problems (Transaction Script, Table Module) with related data source patterns (Table Data Gateway, Row Data Gateway or Active Record). There are some newer architectural solutions as well, like Command Query Responsibility Segregation (CQRS) or Data, Context and Interaction (DCI), both of them usable with object-oriented languages, but not necessarily for every problem.

Back to domain, though. One antipattern often mentioned in relation to ORM/JPA is called Anemic domain model. Let’s quote master Martin Fowler a bit again:

I’m not able to slip out of this topic and make object purists happy, but there are many other styles of programming, and the question (far beyond the scope of this book) is whether being purely OO is the most important thing after all. In most projects I treated my entities like Java beans with just a little low-level logic in them – so it was mostly this “anemic domain model” song. But entity objects can be perfect data transfer objects (DTOs) in many situations. That is still far more advanced than the amorphous rows of a ResultSet. We have a dumb model, but we can utilize it in a domain if we need to.

I even contemplated something like a “domain over entity model” solution where the entity is as dumb as possible (it still has a responsibility for me: it maps to a database table) and I can wrap it with whatever logic I want in a domain object. This even allows me to reverse some dependencies if I wish – for instance, in the database my order items point to the order using a foreign key, but in my higher model it can be the other way around: the order can aggregate items that may not even know about it.

In any case, whatever we do, we can hardly get rid of all the notions of the particular data store we use, because any such “total abstraction” must lead to some mismatch – and that typically leads to complexity and often a significant performance penalty as well. The way I go in this book is trying to get as much benefit from using JPA as possible without incurring too much cost. Maybe I’m not going for the best approach, but I’m trying to avoid the biggest pain points while accepting that I have SQL somewhere down there.

One thing I respect about Fowler et al. is that they try to balance costs and benefits; they neither push ORM forward at all times nor criticize it constantly without offering real alternatives for the complex cases. However, many other smart and experienced people dislike ORM. Listening to “pure haters” does not bring much to a discussion, but some cases are well argued, offer options, and you can sense a deep understanding of the topic.

When to tune down and when not?

I’d say: “Just use it out of the box until it works for you.” There are good reasons not to rely on something we don’t understand, but in custom enterprise software there often is no other way. For many systems everything will be OK, but for slightly complicated database/domain models a lot of things can go wrong.

Performance problems

Most typically, performance degrades first. You definitely should start looking at the generated queries. Some things can be fixed in the database (typically with indexes), some queries can be rewritten in the application, but sometimes you have to adjust the mapping.

With to-one relations you either have an ORM with reliable lazy fetch, or you have to use the tricks described in Troubles with to-one relationships, or even change the mapping in a way that is not JPA compliant, as in Removing to-one altogether. There are also still JPA native queries and JDBC underneath – needless to say, all of this introduces new concepts into the code, and if we mix them too much the complexity rises.

To-many has fewer problems, but some mappers generate more SQL joins than necessary. This can often be mended by introducing an explicit entity mapping for the associative table. It does not raise conceptual complexity, but some query code may look awkward – more SQL-like than JPA-like. I argue that explicit joins are not that bad though.

Other typical problems involve the N+1 select problem, but this type of problem is easy to avoid if we are explicit with our service contracts and don’t let the presentation layer initiate lazy loads.

Strange persistence bugs

Most typical enterprise applications can keep the persistence context short – and we should. When ORM went mainstream a lot of people complained about lazy load exceptions (like Hibernate’s famous LazyInitializationException), but this results from misusing ORM. Programmers solved it by making the session (persistence context) longer and introduced the Open Session in View (OSIV) pattern. This is now widely considered an antipattern and there are many good reasons for that.

Using OSIV allowed back-end and front-end programmers to stay separated (with all the disadvantages of two minds doing what one mind could do as well), dealing only with some common service API – but this API was leaky when it didn’t load all the data and let the presentation fetch the rest on demand. Instead of making it explicit what data the service call provides for the view, we let the view3 perform virtually anything on the managed (“live”) entities. This means that the presentation programmers must be very careful about what they do.

There is a serious consequence when the service layer loses control over the persistence context: transaction boundaries stop working as one would expect. When the presentation layer combines multiple calls to the service layer, you may get unexpected results – results no one has ever consciously implemented.

  1. The presentation layer first fetches some entity A. The programmer knows it is a read-only call – which is not reliable across the board, but let’s say they know it works in their environment – and they use entity objects as DTOs.
  2. Based on the request being serviced, the presentation layer changes the entity – “knowing” it will not get persisted in the database.
  3. Then there is some transactional read-write service call working with a completely different type of entity, B.

The result without OSIV is quite deterministic: if we hit some lazy-load exception, then we should fetch all the necessary data in the first service call. With OSIV, however, the modified A gets persisted together with the changes to B – because they are both part of the same persistence context, which got flushed and committed.

There is a couple of wrong decisions made here. We may argue that even when the presentation layer is not remote we should still service it with one coarse-grained call. Some will scowl at entities used as DTOs and propose separate DTO classes (others will scowl at that in turn). But OSIV is the real culprit – the one that gives you unexpected behaviour spanning multiple service calls. Even though the example (a real-life one, mind you) may seem contrived, when we dropped OSIV from our project everything around transactions got much cleaner.

7. No further step without Querydsl

I do realize I’m breaking the flow of the book a little bit, but since I was introduced to this neat library I have become a huge fan and never looked back. It doesn’t affect JPA itself because it sits on top of it, but in your code it virtually replaces both JPQL and especially the Criteria API.

To put it simply, Querydsl is a library that – from a programmer’s perspective – works like a more expressive and readable version of the Java Persistence Criteria API. Internally it generates JPQL and the rest from there is handled by the JPA provider. Querydsl can actually talk to many more back-ends – SQL, Hibernate directly, and others – but we will focus on Querydsl over JPA here. Using it goes in these steps:

  • declare the dependencies on the Querydsl library and its annotation processor,
  • add a step to your build to generate the metamodel classes,
  • and write queries happily in a sort of criteria-like way.

But the Criteria API also lets you use a generated metamodel, so what’s the deal? Why would I introduce a non-standard third-party library when it does not provide any additional advantage? If you really hate this idea, then you probably can stop reading this book – or you can translate all my Querydsl code into JPQL or Criteria yourself, which is perfectly doable, of course! Querydsl does not disrupt the stack under it; it is “just” slapped over JPA to make it more convenient.

Querydsl brings in a domain-specific language (DSL) in the form of its fluent API. It happens to be a so-called internal DSL because it’s still embedded in Java; it’s not a different language per se. This fluent API is more concise, readable and convenient than the Criteria API.

A DSL is a language of some specific domain – in this case a query language very close to JPQL or SQL. It builds a lot on the generated metamodel. The API of this metamodel is well thought out and takes type safety to a higher level compared with the Criteria API. This not only gives us compile-time checks for our queries, but also offers even better auto-completion in IDEs.

Simple example with Querydsl

Let’s be concrete now – but also very simple. We have a class Dog, each dog has a name and we will query by this name. Assuming we got hold of EntityManager (variable em) the code goes like this:

Querydsl simple example
1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .where(QDog.dog.name.like("Re%"))
5 //.where(QDog.dog.name.startsWith("Re")) // alternative
6   .fetch();

Both where alternatives produce the same result in this case, but startsWith may communicate the intention better – unless you go for like("%any%"), in which case contains would be better. If you are provided input values for like, leave it as like. If you can tell from the logic that a more specific name for the operation fits better, go for it.

This is a very subtle thing, but we can see that this DSL contains variations that can communicate our intention more clearly. The Criteria API sticks to like only, because that is its internal model. Another thing is how beautifully the whole query flows. In version 4.x the fluent API got even closer to JPQL/SQL semantics: it starts with select (what) and ends with fetch, which is a mere signal to deliver the results. As with any other fluent API, you need a terminal operation. In previous versions there was no select because it was included in the terminal operation, e.g. list(QDog.dog). The newer version is one line longer, but closer to the target domain of query languages. The advanced chapter has a dedicated section on the differences between versions 3 and 4.

Comparison with Criteria API

Both Querydsl and the Criteria API are a natural fit for dynamic query creation. Doing this with JPQL is rather painful. Imagine a search form with separate fields for a person entity, so you can search by name, address, date of birth from–to, etc. We don’t want to add a search condition when the respective input field is empty. If you have done this before with any form of query string concatenation, then you probably know the pain. In extreme cases of plain JDBC with a prepared statement you even have to write all the ifs twice – first to add the where condition (or and for any subsequent one) and second to set the parameters. Technically you can embed the parameter values into the query, but let’s help the infamous injection vulnerability get off the top of the OWASP Top 10 list.
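
For a taste of how such optional conditions compose in Querydsl, here is a sketch using BooleanBuilder – the form fields and the QPerson metamodel are hypothetical:

Dynamic where with BooleanBuilder (sketch)
 1 BooleanBuilder where = new BooleanBuilder();
 2 if (nameFilter != null && !nameFilter.isEmpty()) {
 3   where.and(QPerson.person.name.startsWith(nameFilter));
 4 }
 5 if (bornAfter != null) {
 6   where.and(QPerson.person.birthdate.goe(bornAfter));
 7 }
 8 List<Person> people = new JPAQuery<>(em)
 9   .select(QPerson.person)
10   .from(QPerson.person)
11   .where(where) // an empty builder means no restriction at all
12   .fetch();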

Let’s see how the query for our dogs looks with the Criteria API – again starting from em:

Criteria API simple example
 1 CriteriaBuilder cb = em.getCriteriaBuilder();
 2 CriteriaQuery<Dog> query = cb.createQuery(Dog.class);
 3 Root<Dog> dog = query.from(Dog.class);
 4 query.select(dog)
 5   // this is the only place where we can use metamodel in this example
 6   .where(cb.like(dog.get(Dog_.name), "Re%"));
 7   // without metamodel it would be:
 8 //.where(cb.like(dog.<String>get("name"), "Re%"));
 9 List<Dog> dogs = em.createQuery(query)
10   .getResultList();

Let’s observe now:

  • First you need to get a CriteriaBuilder from the existing em. You might “cache” it in a field, but that may not play well with the EE component model, so I’d rather get it before each use. This should not be a heavy operation; in most cases the entity manager holds the builder already and merely gives it to you (hence get and not new or create).
  • Then you create an instance of CriteriaQuery.
  • From this you need to get a Root object representing the content of the from clause.
  • Then you build the query in a nearly fluent fashion. The version with the metamodel is presented, with the alternative without it in a comment.
  • Finally, you use em again to get a TypedQuery based on the CriteriaQuery and ask it for the results.

While with the Criteria API you don’t need to generate a metamodel from the entity classes, in Querydsl this is not optional. But using the metamodel is advantageous in the Criteria API anyway, so the need to generate it for Querydsl using an annotation processor can hardly be considered a drawback. It can easily be integrated into a Maven or other build, as demonstrated in the companion sources to this book or documented on the Querydsl site.

For another comparison of Querydsl and Criteria API, you can also check the original blog post from 2010. Querydsl was much younger then (version 1.x) but the difference was striking already.

Comparison with JPQL

Comparing Querydsl with the Criteria API was rather easy as they are in the same ballpark. Querydsl, however, with its fluency can be compared to JPQL as well. After all, JPQL is a non-Java DSL, even though it is typically embedded in Java code. Let’s see JPQL in action first to finish our side-by-side comparisons:

JPQL simple example
1 List<Dog> dogs = em.createQuery(
2   "select d from Dog d where d.name like :name", Dog.class)
3   .setParameter("name", "Re%")
4   .getResultList();

This is it! Straight to the point, and you can even call it fluent! Probably the best we can do with Querydsl is to add one line introducing a shorter “alias” like this:

Querydsl simple example with alias
1 QDog d = new QDog("d1");
2 List<Dog> dogs = new JPAQuery<>(em)
3   .select(d)
4   .from(d)
5   .where(d.name.startsWith("Re"))
6   .fetch();

Using aliases is very handy, especially for longer and/or more complicated queries. We could use QDog.dog as the value, but here I introduced a new query variable and named it d1. This name will appear in the generated JPQL, which looks a little bit different from the JPQL in the example above:

1 select d1 from Dog d1 where d1.name like ?1 escape '!'

There is a subtle difference in how Querydsl generates the like clause – which, by the way, is fully customizable using Querydsl templates. But you can see that the alias appears in the JPQL, although neither EclipseLink nor Hibernate bothers to carry it over to the generated SQL for your convenience.

Now, if we compare both code snippets above (alias creation included) we get a surprising result – there are more characters in the JPQL version! Although this is just nit-picking (a line or a character up or down), it’s clear that Querydsl can express JPQL extremely well (especially in version 4.x) while also allowing for easy dynamic query creation.

You may ask about named queries. Here I admit right away that Querydsl necessarily introduces overhead (see the next section on disadvantages), but when it comes to query reuse from the programmer’s perspective, Querydsl offers so-called detached queries. While these are not covered in the reference manual (as of March 2016), we will talk more about them in the chapter about advanced Querydsl topics.
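
Just as a teaser before that chapter, the idea is roughly this – define the query once without an EntityManager and clone it with one when executing (a sketch, assuming the Querydsl 4 API):

Detached query (sketch)
1 JPAQuery<Dog> detached = new JPAQuery<Dog>()
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .where(QDog.dog.name.startsWith("Re"));
5 // later, with a concrete EntityManager at hand
6 List<Dog> dogs = detached.clone(em).fetch();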

What about the cons?

Before we move to other Querydsl topics I’d like to go over its disadvantages. For one, you’re definitely adding some overhead. I don’t know exactly how big; maybe with the SQL backend (using JDBC directly) it would be more pronounced, because with JPA the ORM itself is a big overhead anyway. In any case, there is an object graph of the query in memory before it is serialized into JPQL – and from there on it’s on the same baseline as using JPA directly.

This performance factor obviously is not a big deal for many Querydsl users (some of them really big names) and it mostly does not add much to the SQL execution itself, but in any case – if you are interested, you have to measure it yourself. Also realize that without Querydsl you either have to mess with JPQL strings before you have the complete dynamic WHERE part, or you should compare it with the Criteria API, which does not go through JPQL but also creates some object graph in memory (richer or not? it depends).

The documentation, while good, is definitely not complete. For instance, detached queries are not mentioned in the Querydsl Reference Guide, and there is more you need to find out yourself. You’ll probably need to find your own best practices, but that is not difficult with Querydsl. Based (not only) on my own experience, it is literally a joy to work with Querydsl and explore what it can provide – for me it worked like this from day zero. It’s also very easy to find answers on StackOverflow or the mailing list, very often provided directly by the Querydsl developers. This makes any lack of documentation easy to overcome.

Finally, you’re adding another dependency to the project, which may be a problem for some. For the teams I worked with it was well compensated by the code clarity Querydsl gave us.

Be explicit with aliases

With attribute paths at our disposal it is easy to request some data without explicitly using joins. Let’s consider the following entity chain:

[Figure: class diagram of our entity model – a chain of to-one relations EntityA → EntityB → EntityC → EntityD]

We will start all the following queries at EntityA and first we will want the list of related EntityC objects. We may approach it like this:

Query with two-level implicit join
1 List<EntityC> result = new JPAQuery<>(em)
2   .select(QEntityA.entityA.entityB.entityC)
3   .from(QEntityA.entityA)
4   .fetch();

This works and Hibernate generates the following SQL:

Generated SQL (formatted)
1 select
2   entityc2_.id as id1_5_,
3   entityc2_.entityD_id as entityD_3_5_,
4   entityc2_.name as name2_5_
5 from
6   EntityA entitya0_,
7   EntityB entityb1_ inner join
8   EntityC entityc2_ on entityb1_.entityC_id=entityc2_.id
9 where entitya0_.entityB_id=entityb1_.id

Effectively these are both inner joins and the result is as expected. But what happens if we want to traverse to EntityD?

Three-level implicit join
1 List<EntityD> result = new JPAQuery<>(em)
2   .select(QEntityA.entityA.entityB.entityC.entityD)
3   .from(QEntityA.entityA)
4   .fetch();

After running this we end up with a NullPointerException on the line with the select method. What happened? The core problem is that it is not feasible to generate infinitely deep paths using final fields. Querydsl offers some solutions to this, as discussed in the reference documentation. You can either ask the generator to initialize the paths you need with the @QueryInit annotation, or you can mark the entity with @Config(entityAccessors=true), which generates accessor methods instead of final fields. In the latter case you’d simply use this annotation on EntityC and call entityD() in the select, which creates the path on the fly. This way you can traverse relations ad lib. It requires some Querydsl annotations on the entities – which you may already use, for instance for custom constructors.
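
For illustration, the @QueryInit variant might look like this on our entity chain – a sketch following the pattern shown in the Querydsl reference documentation:

Deeper path initialization with @QueryInit (sketch)
1 @Entity
2 public class EntityA {
3   // initialize two more path levels under entityB, so that
4   // QEntityA.entityA.entityB.entityC.entityD is usable
5   @QueryInit("entityC.entityD")
6   @ManyToOne
7   private EntityB entityB;
8 }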

However, instead of customizing the generation I’d advocate being explicit with the aliases. If there is a join, let it show! We can get around the limitation of max-two-level paths using just a single join:

Taming deep paths with a join
1 List<EntityD> result = new JPAQuery<>(em)
2   .select(QEntityB.entityB.entityC.entityD)
3   .from(QEntityA.entityA)
4   .join(QEntityA.entityA.entityB, QEntityB.entityB)
5   .fetch();

Here we lowered the nesting back to two levels by making EntityB explicit, using its existing default alias QEntityB.entityB. This makes the code run, but it looks… unclean, really. Let’s go all the way and make all the joins explicit:

Explicit joins with aliases
1 List<EntityD> result = new JPAQuery<>(em)
2   .select(QEntityD.entityD)
3   .from(QEntityA.entityA)
4   // second parameter is alias for the path in the first parameter
5   .join(QEntityA.entityA.entityB, QEntityB.entityB)
6   // first parameter uses alias from the previous line
7   .join(QEntityB.entityB.entityC, QEntityC.entityC)
8   .join(QEntityC.entityC.entityD, QEntityD.entityD)
9   .fetch();

Now this looks long, but it says exactly what it does – no surprises. We may shorten it in a couple of ways, but the easiest one is to introduce the aliases upfront:

Explicit upfront aliases
 1 QEntityA a = new QEntityA("a");
 2 QEntityB b = new QEntityB("b");
 3 QEntityC c = new QEntityC("c");
 4 QEntityD d = new QEntityD("d");
 5 List<EntityD> result = new JPAQuery<>(em)
 6   .select(d)
 7   .from(a)
 8   .join(a.entityB, b)
 9   .join(b.entityC, c)
10   .join(c.entityD, d)
11   .fetch();

I like this version most – it is a couple of lines longer but very clean, and the query is very easy to read. Anytime I work with joins I go for explicit aliases.

Another related problem is joining the same entity multiple times (in different roles). Newbie programmers often fall into the trap of using the same alias – typically the default one offered by each Q-class – for multiple joins on the same entity. There can be two distinct local variables representing these aliases, but they both point to the same object instance – hence it’s still the same alias. Whenever I join an entity multiple times in a query I always create the aliases explicitly. When I’m sure an entity appears in a query just once I may use the default alias. It really pays off to have a strict and clean strategy for using aliases and the local variables pointing at them.
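
As a sketch of that strategy, here is a hypothetical query pairing dogs of the same breed – note the two explicitly created aliases for the same entity:

Same entity twice under two aliases (sketch)
1 QDog dog = new QDog("dog");
2 QDog buddy = new QDog("buddy");
3 List<Tuple> pairs = new JPAQuery<>(em)
4   .select(dog.name, buddy.name)
5   .from(dog, buddy) // the same entity in two different roles
6   .where(dog.breed.eq(buddy.breed), dog.id.lt(buddy.id))
7   .fetch();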

Functions and operations

Let’s see what data we will work on in the next sections so we can understand the returned results.

Content of Breed table

id | name
---|----------------
 1 | collie
 2 | german shepherd
 3 | retriever

Content of Dog table (id is not important)

name   | age | breed_id
-------|-----|---------------------
Lassie |   7 | 1 (collie)
Rex    |   6 | 2 (german shepherd)
Ben    |   4 | 2 (german shepherd)
Mixer  |   3 | NULL (unknown breed)

The following examples can be found on GitHub in FunctionsAndOperations.java.

When we want to write some operation we simply follow the fluent API. You can write QDog.dog.age. and hit auto-complete – this offers all the possible operations and functions we can use on the attribute. These are based on the known type of the attribute, so we only get the list of functions that make sense (from a technical point of view, not necessarily from a business perspective, of course).

This naturally makes sense for binary operations. If we want to see dog’s age halved we can do it like this:

1 List<Tuple> dogsGettingYounger = new JPAQuery<>(em)
2   .select(QDog.dog.name, QDog.dog.age.divide(2))
3   .from(QDog.dog)
4   .fetch();

We will cover Tuples soon; for now just know they can be printed easily and the output would be [[Lassie, 3], [Rex, 3], [Ben, 2], [Mixer (unknown breed), 1]]. We are not limited to constants and can add together two columns or multiply an amount by a rate, etc. You may also notice the integer results of the division – this can be mended if you select the age as a double, like this:

1 // rest is the same
2 .select(QDog.dog.name, QDog.dog.age.castToNum(Double.class).divide(2))

Unary functions are less natural as they are appended to the attribute expression just like binary operations – they just don’t take any arguments:

1 List<Tuple> dogsAndNameLengths = new JPAQuery<>(em)
2   .select(QDog.dog.name, QDog.dog.name.length())
3   .from(QDog.dog)
4   .fetch();

This returns [[Lassie, 6], [Rex, 3], [Ben, 3], [Mixer (unknown breed), 21]]. JPA 2.1 allows the usage of any function using the FUNCTION construct. We will cover this later in the advanced chapter on Querydsl.

Aggregate functions

Using aggregate functions works the same as using any other function – but as we know from SQL, we need to group by all the other non-aggregated columns. For instance, to see the counts of dogs for each age we can write:

1 List<Tuple> dogCountByAge = new JPAQuery<>(em)
2   .select(QDog.dog.age, QDog.dog.count())
3   .from(QDog.dog)
4   .groupBy(QDog.dog.age)
5   .fetch();

Our example data has each dog at a different age, so the results are boring.

If we want to see the average age of dogs per breed we use this:

1 List<Tuple> dogAvgAgeByBreed = new JPAQuery<>(em)
2   .select(QBreed.breed.id, QBreed.breed.name, QDog.dog.age.avg())
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.breed, QBreed.breed)
5   .groupBy(QBreed.breed.id, QBreed.breed.name)
6   .orderBy(QBreed.breed.name.asc())
7   .fetch();

Because of the leftJoin this returns also the line [null, null, 3.0]. With innerJoin (or just join) we would get only dogs with a defined breed. On screen I’m probably interested in the breed’s name, but relying on names requires their uniqueness – which may not be the case. To get the “true” results per breed I added breed.id into the select clause. We can always display just the names, and if they are not unique the user will at least be curious about it. But that’s beyond our discussion for now.

Querydsl supports the having clause as well if the need arises. We also demonstrated the orderBy clause, just by the way, as it hardly deserves special treatment in this book.
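
If we did need having, it would follow groupBy just as naturally – a sketch that keeps only ages shared by more than one dog (empty with our sample data):

Having clause (sketch)
1 List<Tuple> crowdedAges = new JPAQuery<>(em)
2   .select(QDog.dog.age, QDog.dog.count())
3   .from(QDog.dog)
4   .groupBy(QDog.dog.age)
5   .having(QDog.dog.count().gt(1)) // condition on the aggregate
6   .fetch();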

Subqueries

Here we will use the same data as in the previous section on functions and operations. The following examples can be found on GitHub in Subqueries.java. You can also check the Querydsl reference documentation.

Independent subquery

If we want to find all the dogs of a breed with a long name, we can do this:

Dogs with a long breed name
 1 List<Dog> dogsWithLongBreedName = new JPAQuery<>(em)
 2   .select(QDog.dog)
 3   .from(QDog.dog)
 4   .where(QDog.dog.breed.in(
 5     new JPAQuery<>() // no EM here
 6       .select(QBreed.breed)
 7       .from(QBreed.breed)
 8       // no fetch on subquery
 9       .where(QBreed.breed.name.length().goe(10))))
10   .fetch();

This returns Ben and Rex, both german shepherds. Note that the subquery looks like a normal query, but we don’t provide an EntityManager to it and we don’t call fetch on it. If you do, the subquery will be executed on its own first and its results will be fetched into the in clause just as any other collection would. This is not what we want in most cases – especially when the code leads us to the idea of a subquery. The previous result can be achieved with a join as well – and in such cases joins are almost always preferable.

To make our life easier we can use JPAExpressions.select(...) instead of new JPAQuery<>().select(...). There is also a neat shortcut for a subquery with select and from using the same expression:

1 JPAExpressions.selectFrom(QBreed.breed) ...

We will prefer using JPAExpressions from now on. Another example finds the average age of all dogs and returns only those above it:

Dogs older than average
1 List<Dog> dogsOlderThanAverage = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .where(QDog.dog.age.gt(
5     JPAExpressions.select(QDog.dog.age.avg())
6       .from(QDog.dog)))
7   .fetch();

This is perhaps on the edge regarding the usage of the same alias QDog.dog for both the inner and the outer query, but I dare say it does not matter here. In both previous examples the same result can be obtained by executing the inner query first, because it always returns the same value. What if we need to drive some conditions of the inner query by the outer one?

Correlated subquery

When we use an object from the outer query in the subquery it is called a correlated subquery. For example, we want to know for which breeds we have no dogs:

Breeds with no dogs
1 List<Breed> breedsWithoutDogs = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .where(
5     JPAExpressions.selectFrom(QDog.dog)
6       .where(QDog.dog.breed.eq(QBreed.breed))
7       .notExists())
8   .fetch();

We used a subquery with exists/notExists and we could omit the select, although it can be used. This returns a single breed – retriever. An interesting aspect is that the inner query uses something from the outer select (here QBreed.breed). Compared to the subqueries from the previous section this one can have a different result for each row of the outer query.

This one can actually be done with a join as well, but in this case I’d not recommend it. Depending on the mapping you use, various things work on various providers. When you have @ManyToOne Breed breed then this works on EclipseLink:

Breeds with no dogs
1 List<Breed> result = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .leftJoin(QDog.dog).on(QBreed.breed.eq(QDog.dog.breed))
5   .where(QDog.dog.isNull())
6   // distinct would be needed for isNotNull (breeds with many dogs)
7   // not really necessary for isNull as those lines are unique
8   .distinct()
9   .fetch();

Alas, it fails on Hibernate (5.2.2) with InvalidWithClauseException: with clause can only reference columns in the driving table. It seems to be an open bug mentioned in this StackOverflow answer – and the suggested solution of using plain keys instead of objects indeed works:

1 // the rest is the same
2 .leftJoin(QDog.dog).on(QBreed.breed.id.eq(QDog.dog.breedId))

For this we need to switch the Breed breed mapping to a plain FK mapping, or use both of them with one marked as read-only (updatable/insertable = false):

1 @Column(name = "breed_id", updatable = false, insertable = false)
2 private Integer breedId;

You may be tempted to avoid this mapping and do an explicit equals on the keys like this:

1 // the rest is the same
2 .leftJoin(QDog.dog).on(QBreed.breed.id.eq(QDog.dog.breed.id))

But this has its own issues. EclipseLink joins the breed twice, the second time using an implicit join after the left join – a construct that many databases don’t really like (H2 or SQL Server, to name just those where I noticed it quite often) and it fails on the JDBC/SQL level. Hibernate also generates the superfluous join to the breed but at least orders the joins more safely. Still, if you use EclipseLink simply don’t add explicit IDs to your paths, and if you’re using Hibernate at least know it’s not efficient join-wise and also not really portable across JPA providers.

There is another JPA query that works and also arguably reads most naturally:

1 List<Breed> result = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .where(QBreed.breed.dogs.isEmpty())
5   .fetch();

Internally this generates a subquery with count or (not) exists anyway, there is no way around it. But for this to work we also have to add the mapping to Breed, which was not necessary for the previous non-JPA joins:

1 @OneToMany(mappedBy = "breed")
2 private Set<Dog> dogs;

This mapping may be rather superfluous in some cases (why should a breed know the list of its dogs?) but it should not hurt.

We somehow foreshadowed themes from the following chapters on various mapping problems, but we’re not done with subqueries (or Querydsl) just yet. Let’s now try to find dogs that are older than average – but this time within each breed:

Dogs older than breed average
1 QDog innerDog = new QDog("innerDog");
2 List<Dog> dogsOlderThanBreedAverage = new JPAQuery<>(em)
3   .select(QDog.dog)
4   .from(QDog.dog)
5   .where(QDog.dog.age.gt(
6     JPAExpressions.select(innerDog.age.avg())
7       .from(innerDog)
8       .where(innerDog.breed.eq(QDog.dog.breed))))
9   .fetch();

Notice the use of the innerDog alias, which is very important. Had you used QDog.dog in the subquery it would have returned the same results as dogsOlderThanAverage above. This query returns only Rex, because only german shepherds have more than a single dog – and a single (or no) dog can’t be older than average.

This section is not about going deep into subquery theory (I’m not the right person for it in the first place), but to demonstrate how easy and readable it is to write subqueries with the Querydsl API. We’re still limited by JPQL, which means no subqueries in from or select clauses. When writing subqueries be careful not to “reuse” an alias from the outer query.

Pagination

Very often we need to cut our results into smaller chunks, typically pages for some user interface, but pagination is often used with various APIs too. To do it manually, we can simply specify offset (which record is the first) and limit (how many records):

Fetching a single page
1 List<Breed> breedsPageX = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .where(QBreed.breed.id.lt(17)) // random where
5   .orderBy(QBreed.breed.name.asc())
6   .offset((page - 1) * pageSize)
7   .limit(pageSize)
8   .fetch();

I used the variables page (starting at one, hence the correction) and pageSize to calculate the right values. Any time you paginate the results you want to use orderBy, because the SQL specification does not guarantee any order without it. Sure, databases typically give you results in some natural order (like “by ID”), but it is dangerous to rely on it.

The content of the page is good, but often we want to know the total count of all the results. We use the same query with the same conditions, we just call .fetchCount first, like this:

Fetching a single page and total count
 1 JPAQuery<Breed> breedQuery = new JPAQuery<>(em)
 2   .select(QBreed.breed)
 3   .from(QBreed.breed)
 4   .where(QBreed.breed.id.lt(17))
 5   .orderBy(QBreed.breed.name.asc());
 6 
 7 long total = breedQuery.fetchCount();
 8 List<Breed> breedsPageX = breedQuery
 9   .offset((page - 1) * pageSize)
10   .limit(pageSize)
11   .fetch();

We can even add an if that skips the second query when the count is 0, as we can just create an empty list. A minimal sketch of such a guard, reusing breedQuery from above:
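
long total = breedQuery.fetchCount();
List<Breed> breedsPageX = total > 0
  ? breedQuery.offset((page - 1) * pageSize).limit(pageSize).fetch()
  : Collections.emptyList(); // java.util.Collections

And because this scenario is so typical Querydsl provides a shortcut for it: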

Pagination with Querydsl fetchResults
 1 QueryResults<Breed> results = new JPAQuery<>(em)
 2   .select(QBreed.breed)
 3   .from(QBreed.breed)
 4   .where(QBreed.breed.id.lt(17))
 5   .orderBy(QBreed.breed.name.asc())
 6   .offset((page - 1) * pageSize)
 7   .limit(pageSize)
 8   .fetchResults();
 9 System.out.println("total count: " + results.getTotal());
10 System.out.println("results = " + results.getResults());

QueryResults wraps all the information we need for a paginated result, including offset and limit. It does not help us with the reverse calculation of the page number from an offset, but I guess we probably track the page number in some filter object anyway, as we needed it as the input in the first place.

And yes, it does not execute the second query when the count is zero.4

Tuple results

When you call fetch() a List is returned. When you call fetchOne() or fetchFirst() a single object is returned. The question is – what type of object? This depends on the select clause, which can take a single expression that determines the type, or a list of expressions, in which case a Tuple is returned. That is, a query with select(QDog.dog) will return Dog (or List<Dog>) because QDog.dog is of type QDog that extends EntityPathBase<Dog> which eventually implements Expression<Dog> – hence the Dog.

But when a query returns multiple expressions we need to wrap them somehow. In JPA you will get an Object[] (array of objects) for each row. That works, but is not very convenient. The Criteria API brings Tuple for that reason, which is much better. Querydsl uses the Tuple idea as well. What’s the deal?

Tuples of attributes (columns)
 1 List<Tuple> result = new JPAQuery<>(em)
 2   .select(QDog.dog.name, QBreed.breed.name)
 3   .from(QDog.dog)
 4   .join(QDog.dog.breed, QBreed.breed)
 5   .fetch();
 6 result.forEach(t -> {
 7   String name = t.get(QDog.dog.name);
 8   String breed = t.get(1, String.class);
 9   System.out.println("Dog: " + name + " is " + breed);
10 });

As you can see in the forEach block, we can extract columns either by using the expression that was used in the select (here QDog.dog.name) or by index. Needless to say, the first way is preferred. You can also extract the underlying array of objects using toArray().

Because Tuple works with any expression we can use the whole entities too:

Tuples of entities
 1 List<Tuple> result = new JPAQuery<>(em)
 2   .select(QDog.dog, QBreed.breed)
 3   .from(QDog.dog)
 4   .join(QDog.dog.breed, QBreed.breed)
 5   .fetch();
 6 result.forEach(t -> {
 7   Dog dog = t.get(QDog.dog);
 8   Breed breed = t.get(QBreed.breed);
 9   System.out.println("\nDog: " + dog);
10   System.out.println("Breed: " + breed);
11 });

And of course, you can combine entities with columns ad-lib.

There are other ways to combine multiple expressions into one object. See the reference guide for how to populate beans or how to use projection constructors.
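
Just for a taste – a minimal sketch of bean population, with DogView as a hypothetical bean having name and breedName properties:

List<DogView> views = new JPAQuery<>(em)
  .select(Projections.bean(DogView.class,
    QDog.dog.name,                       // matches property "name"
    QBreed.breed.name.as("breedName")))  // aliased to property "breedName"
  .from(QDog.dog)
  .join(QDog.dog.breed, QBreed.breed)
  .fetch();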

Fetching to-many eagerly

In the following example we select all the breeds and we want to print the collection of the dogs of that breed:

Naive approach to to-many
 1 // mapping on Breed
 2 @OneToMany(mappedBy = "breed")
 3 private Set<Dog> dogs;
 4 
 5 // query
 6 List<Breed> breed = new JPAQuery<>(em)
 7   .select(QBreed.breed)
 8   .from(QBreed.breed)
 9   .fetch();
10 breed.forEach(b ->
11   System.out.println(b.toString() + ": " + b.getDogs()));

This executes three queries. First the obvious one and then one for each breed when we print its dogs. Actually, EclipseLink does not produce the additional queries and merely prints {IndirectSet: not instantiated} instead of the collection content. You can nudge EclipseLink with something like b.getDogs().size() or another meaningful collection operation. It seems toString() isn’t meaningful enough for EclipseLink.

We can force the eager fetch when we adjust the mapping:

1 @OneToMany(mappedBy = "breed", fetch = FetchType.EAGER)
2 private Set<Dog> dogs;

However, while the fetch is eager, EclipseLink still does it with three queries (1+N in general). Hibernate uses one query with a join. So yes, it is eager, but not necessarily efficient. There is a potential problem with paging when using a join to a to-many relationship – a problem we mentioned already and will return to later. What happens when we add offset and limit?

1 List<Breed> breeds = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .offset(1)
5   .limit(1)
6   .fetch();

Because EclipseLink uses a separate query for breeds, nothing changes and we get the second result (by the way, we forgot to order it!). Hibernate is smart enough to use a separate query too as soon as it smells the offset/limit combo. It makes sense, because the query says “gimme breeds!” – although not everything in JPA is always free of surprises. In any case, eager collections are not recommended. There may be notable exceptions – maybe some embeddables or other really primitive collections – but I’d never risk eager fetching of a collection of other entities as it eventually asks for fetching the whole database.

What if we write a query that joins breed and dogs ourselves? We assume no pagination again.

1 List<Breed> breeds = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .join(QBreed.breed.dogs, QDog.dog)
5   .distinct()
6   .fetch();
7 breeds.forEach(b ->
8   System.out.println(b.toString() + ": " + b.getDogs()));

If you check the logs, the queries are indeed generated with joins, but it is not enough. EclipseLink still prints the uninitialized indirect list (a relevant operation would trigger the select), Hibernate prints it nicely but needs the additional N selects as well. We need to say explicitly which collection we want initialized, using fetchJoin like so:

Fetch join
1 List<Breed> breeds = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .join(QBreed.breed.dogs).fetchJoin()
5   .distinct()
6   .fetch();

You use fetchJoin() just after the join with a collection path, and in this case you don’t need the second parameter with an alias, unless you want to use it in where or select, etc.5

Because we joined the to-many relation ourselves, JPA assumes we want all the results (5 with our demo data defined previously). That’s why we also added distinct() – this ensures we get only unique results from the select clause. But these unique results will still have their collections initialized, because SQL still returns 5 rows and JPA “distincts” them in post-processing. This means we cannot paginate queries with such joins, distinct or not. This is something we will tackle in a dedicated section.

Querydsl and code style

Querydsl is a nice API and DSL but may lead to horrific monsters like:

 1 List<Integer> trnIds = new JPAQuery<>(em)
 2   .select(QClientAgreementBankAccountTransaction
 3     .clientAgreementBankAccountTransaction.transactionId)
 4   .from(QClientAgreement.clientAgreement)
 5   .join(QClientAgreementBankAccount.clientAgreementBankAccount).on(
 6     QClientAgreementBankAccount.clientAgreementBankAccount.clientAgreementId
 7       .eq(QClientAgreement.clientAgreement.id))
 8   .join(QClientAgreementBankAccountTransaction.clientAgreementBankAccountTransaction)
 9     .on(QClientAgreementBankAccountTransaction.
10       clientAgreementBankAccountTransaction.clientAgreementBankAccountId
11         .eq(QClientAgreementBankAccount.clientAgreementBankAccount.id))
12   .where(QClientAgreement.clientAgreement.id.in(clientAgreementsToDelete))
13   .fetch();

The problems here are caused by long entity class names and often there is nothing you can do about them. Querydsl exacerbates it with the duplication of the name in QClassName.className. There are many ways to tackle the problem.

We can introduce local variables, even with very short names, as they are very close to the query itself. In most queries I’ve seen, acronym aliases are used, like cabat for QClientAgreementBankAccountTransaction. This clears up the query significantly, especially when the same alias is repeated many times:

 1 QClientAgreement ca = QClientAgreement.clientAgreement;
 2 QClientAgreementBankAccount caba =
 3   QClientAgreementBankAccount.clientAgreementBankAccount;
 4 QClientAgreementBankAccountTransaction cabat =
 5   QClientAgreementBankAccountTransaction.clientAgreementBankAccountTransaction;
 6 List<Integer> trnIds = new JPAQuery<>(em)
 7   .select(cabat.transactionId)
 8   .from(ca)
 9   .join(caba).on(caba.clientAgreementId.eq(ca.id))
10   .join(cabat).on(cabat.clientAgreementBankAccountId.eq(caba.id))
11   .where(ca.id.in(clientAgreementsToDelete))
12   .fetch();

You give up some explicitness in the query with those acronyms but you get a much higher signal-to-noise ratio, which can easily outweigh the burden of mentally mapping the local variables.

Another option is to use static imports, which cuts the QNames in half:

 1 List<Integer> trnIds = new JPAQuery<>(em)
 2   .select(clientAgreementBankAccountTransaction.transactionId)
 3   .from(clientAgreement)
 4   .join(clientAgreementBankAccount).on(
 5     clientAgreementBankAccount.clientAgreementId.eq(clientAgreement.id))
 6   .join(clientAgreementBankAccountTransaction).on(
 7     clientAgreementBankAccountTransaction.clientAgreementBankAccountId
 8       .eq(clientAgreementBankAccount.id))
 9   .where(clientAgreement.id.in(clientAgreementsToDelete))
10   .fetch();

This is clearly somewhere between the first (totally unacceptable) version and the second one where the query was very brief at the cost of using acronyms for aliases. Using static names is great when they don’t collide with anything else. But while QDog itself clearly says it is a metadata type, the name of its static field dog collides with any dog variable of the real Dog type you may have in your code. Still, we use static imports extensively in various infrastructural code where no confusion is possible.

We also often use a field or variable named $ that represents the root of our query. I never use that name for anything else and long-term experiments showed that it brought no confusion into our team whatsoever. A snippet of our DAO class may look like this:

 1 public class ClientDao {
 2   public static final QClient $ = QClient.client;
 3 
 4   public Client findByName(String name) {
 5     return queryFrom($) // our shortcut for new/select/from
 6       .where($.name.eq(name))
 7       .fetchOne();
 8   }
 9 
10   public List<Client> listByRole(ClientRole clientRole) {
11     return queryFrom($)
12       .where($.roles.any().eq(clientRole))
13       .fetch();
14   }
15 ...

As a bonus you can see how any() is used on the collection path $.roles. You can even go on to a specific property – for instance, with our demo data we can obtain a list of breeds where any dog is named Rex like this:

1 List<Breed> breedsWithRex = new JPAQuery<>(em)
2   .select(QBreed.breed)
3   .from(QBreed.breed)
4   .where(QBreed.breed.dogs.any().name.eq("Rex"))
5   .fetch();

This produces SQL with an EXISTS subquery, although the same can be written with join and distinct.
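
A sketch of the join-based equivalent, using the same demo data:

List<Breed> breedsWithRex = new JPAQuery<>(em)
  .select(QBreed.breed)
  .from(QBreed.breed)
  .join(QBreed.breed.dogs, QDog.dog)
  .where(QDog.dog.name.eq("Rex"))
  .distinct()
  .fetch();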

More

That’s it for our first tour around Querydsl. We will see more in the chapter on more advanced features (or perhaps just features used less often). In the course of this chapter there were plenty of links to other resources too – let’s sum them up:

  • Querydsl Reference Guide is the first stop when you want to learn more. It is not the last stop though, as it does not even cover all the stuff from this chapter. It still is a useful piece of documentation.
  • Querydsl is also popular enough on StackOverflow with Timo Westkämper himself often answering the questions.
  • Finally there is their mailing-list (querydsl@googlegroups.com) – with daily activity, where you usually get what you need.

For an introduction you can also check this slide deck that covers the older version 3, but except for the fluent API changes the concepts are the same.

8. Troubles with to-one relationships

Now we can get back to the topics I foreshadowed in the introduction to this opinionated part. We will show an example of @ManyToOne mapping and analyze our options for loading an entity with such a relationship when we are not interested in its target relation.

Simple example of @ManyToOne

We will continue experimenting on dogs in this demonstration of to-one mapping. The code is available in the GitHub sub-project many-to-one-eager. This particular demo project is also referenced in the appendix Project example where the particulars of build files and persistence.xml are described. We will skip these, but because this is our first serious experiment in the book we will still include more of the code listings to establish a kind of baseline.

Our model is simple, but there is a twist to it. We have Dog and Breed entities where a Dog points to its Breed – and to make things just a little bit tricky the Breed is a hierarchy where each Breed points to its parent via an attribute called derivedFrom.

Class diagram of our entity model

Our entities are quite plain – Dog looks like this:

Dog.java
 1 package modeltoone;
 2 
 3 import javax.persistence.*;
 4 
 5 @Entity
 6 public class Dog {
 7 
 8   @Id
 9   @GeneratedValue(strategy = GenerationType.IDENTITY)
10   private Integer id;
11 
12   private String name;
13 
14   @ManyToOne
15   private Breed breed;

The rest are getters/setters – complete code is here. Referenced Breed is mapped like this (full source code):

Breed.java
 1 package modeltoone;
 2 
 3 import javax.persistence.*;
 4 
 5 @Entity
 6 public class Breed {
 7 
 8   @Id
 9   @GeneratedValue(strategy = GenerationType.IDENTITY)
10   private Integer id;
11 
12   private String name;
13 
14   @ManyToOne
15   private Breed derivedFrom;

There is nothing special about these entities, but let’s see what happens when we load a dog. In the process we will also compare Hibernate and EclipseLink – that’s why we have two nearly equivalent persistence units in our persistence.xml. We will prepare data like this:

Preparing a dog
 1 EntityManagerFactory emf = Persistence
 2   .createEntityManagerFactory(persistenceUnitName);
 3 try {
 4   EntityManager em = emf.createEntityManager();
 5   em.getTransaction().begin();
 6 
 7   Breed wolf = new Breed();
 8   wolf.setName("wolf");
 9   em.persist(wolf);
10 
11   Breed germanShepherd = new Breed();
12   germanShepherd.setName("german shepherd");
13   germanShepherd.setDerivedFrom(wolf);
14   em.persist(germanShepherd);
15 
16   Breed collie = new Breed();
17   collie.setName("collie");
18   collie.setDerivedFrom(germanShepherd);
19   em.persist(collie);
20 
21   Dog lassie = new Dog();
22   lassie.setName("Lassie");
23   lassie.setBreed(collie);
24   em.persist(lassie);
25 
26   em.getTransaction().commit();
27   em.clear();
28   emf.getCache().evictAll();
29 
30   // here comes "demo" code
31 } finally {
32   emf.close();
33 }

Sorry for the copy/pasting – eventually the breed and dog creation would end up in separate methods, but for our demo this code will do. Also the breed tree is probably far from reality, but that’s not the point. Let’s say that the collie was bred from the shepherd which descends from the wolf.

Notice the clearing of the entity manager; alternatively we could close and reopen it. We are also clearing the second-level cache. We may argue what the real case is, but unless you have all your entities cached all the time an empty cache simulates a real-life scenario. Now we just find the dog by its ID – we know it’s 1:

1 Dog dog = em.find(Dog.class, 1);

What happens at this line? Let’s see the console for Hibernate (manually wrapped):

Hibernate find dog by id
 1 Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
 2   dog0_.name as name2_1_0_, breed1_.id as id1_0_1_, breed1_.derivedFrom_id
 3   as derivedF3_0_1_, breed1_.name as name2_0_1_, breed2_.id as id1_0_2_,
 4   breed2_.derivedFrom_id as derivedF3_0_2_, breed2_.name as name2_0_2_
 5  from Dog dog0_
 6  left outer join Breed breed1_ on dog0_.breed_id=breed1_.id
 7  left outer join Breed breed2_ on breed1_.derivedFrom_id=breed2_.id
 8  where dog0_.id=?
 9 Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as
10   derivedF3_0_0_, breed0_.name as name2_0_0_, breed1_.id as id1_0_1_,
11   breed1_.derivedFrom_id as derivedF3_0_1_, breed1_.name as name2_0_1_
12  from Breed breed0_
13  left outer join Breed breed1_ on breed0_.derivedFrom_id=breed1_.id
14  where breed0_.id=?

Two selects that load the dog and all three breeds – not that bad. If you now traverse up the breed tree, there will be no more selects. On the other hand, if you only care about the actual breed of this dog and not about what that breed is derived from, the other breeds were fetched unnecessarily. There is no way to tell JPA that a dummy breed with the shepherd’s ID would do on the collie entity and that you don’t want this to-one fetching to propagate any further.

Let’s look at EclipseLink now:

EclipseLink find dog by id
1 SELECT ID, NAME, BREED_ID FROM DOG WHERE (ID = ?)
2 	bind => [1]
3 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
4 	bind => [3]
5 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
6 	bind => [2]
7 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
8 	bind => [1]

This is as obvious as it gets. Each entity is loaded with a separate query. The effect is the same as with Hibernate, EclipseLink just needs more round-trips to the database. Personally I don’t care how many selects it takes in the end as this is part of the JPA provider strategy. We are not going to fine-tune Hibernate or EclipseLink. We’re going to cut off this loading chain.

How about going lazy?

Let’s be a bit naive (I was!) and mark the relationships as lazy:

Marking relationships lazy
1 @ManyToOne(fetch = FetchType.LAZY)
2 private Breed breed;

And the same for Breed.derivedFrom. Our testing code is like this:

Testing code
1 System.out.println("\nfind");
2 Dog dog = em.find(Dog.class, 1);
3 System.out.println("\ntraversing");
4 Breed breed = dog.getBreed();
5 while (breed.getDerivedFrom() != null) {
6   breed = breed.getDerivedFrom();
7 }
8 System.out.println("breed = " + breed.getName());

This way we should see what selects are executed when. Hibernate out of the box works as expected:

Hibernate lazy find and traversing
 1 find
 2 Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
 3   dog0_.name as name2_1_0_ from Dog dog0_ where dog0_.id=?
 4 
 5 traversing
 6 Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
 7   breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
 8 Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
 9   breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
10 Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
11   breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
12 breed = wolf

Selects are executed as needed (lazily). For long-term Hibernate users this often gives an impression that it should work like this. Let’s migrate to EclipseLink then:

EclipseLink lazy find and traversing
 1 find
 2 SELECT ID, NAME, BREED_ID FROM DOG WHERE (ID = ?)
 3 	bind => [1]
 4 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
 5 	bind => [3]
 6 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
 7 	bind => [2]
 8 SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
 9 	bind => [1]
10 
11 traversing
12 breed = wolf

Just like before, EclipseLink generates four selects and executes them all right away when you find the entity. How is this possible? Firstly, from the [JPspec] perspective (page 25), the LAZY strategy is only a hint to the persistence provider that the data should be fetched lazily when it is first accessed – not a guarantee.

Footnote [4] of the specification is important here, obviously. So “it should” but it’s just “a hint”. These results can be recreated by running SingleEntityReadLazy (in the test classpath) from the GitHub sub-project many-to-one-lazy.

How bulletproof is Hibernate’s solution?

Pretty much bulletproof, I have to say. As presented here, we annotated fields, which means we implied access type FIELD. Hibernate can’t get away with just intercepting getters/setters, it has to do better. Let’s try a bit of reflection:

Reading breed with reflection
1 System.out.println("\nfind");
2 Dog dog = em.find(Dog.class, 1);
3 System.out.println("\nhacking");
4 Field breedField = dog.getClass().getDeclaredField("breed");
5 System.out.println(breedField.getType());
6 breedField.setAccessible(true);
7 Object val = breedField.get(dog);
8 System.out.println("val = " + val);

And now the console output mixed with logs:

Hibernate output for reflection access
1 find
2 Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
3   dog0_.name as name2_1_0_ from Dog dog0_ where dog0_.id=?
4 
5 hacking
6 class modeltoone.Breed
7 Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
8   breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
9 val = Breed{id=3, name='collie'}

This is perfectly lazy. The solution seems almost magical when you think about it.

Lazy to-one not guaranteed

Doing lazy collections (@OneToMany or @ManyToMany) is not a problem; we would probably mess around a bit but eventually we would create some collection implementation that does it right. The trick is that the actual entities are wrapped in a collection that works as an intermediary. We don’t need to perform any bytecode voodoo on the entity class.

With to-one the situation changes dramatically, as you have no intermediary. It would be interesting if JPA at least offered some wrapper to enforce lazy behaviour when you really need it regardless of the JPA provider, but that would give away that you’re using some ORM (like you don’t know) and even more voices would talk about that “leaking abstraction”.

While my tests with Hibernate looked great, the results are not portable across JPA providers as we saw. Also, I’d not bet that you’ll get the same results, because there are still a lot of questions about how to make Hibernate lazy for to-one relationships, many of them quite new.

So the whole trouble with to-one is that you rely on out-of-language features (bytecode manipulation) for LAZY support or suffer potentially very costly cascade of relation loading. The cost may be alleviated by a cache somewhat, but you just trade CPU/network cost for memory (and CPU for GC, mind you) – even if you don’t need most of the loaded data at all (maybe ever).

Also, LAZY can work perfectly for a WAR application in a Java EE container but it may suddenly stop working when switching to an embedded container. I’m not saying that LAZY is not worth the trouble – on the contrary, I believe that lazy relationships are essential for both to-many (as default) and to-one (in most circumstances) – but it indeed can be a lot of trouble during build time or runtime, especially in a Java SE environment. Java EE with its default provider may be nice to us, but it still isn’t guaranteed.

Dual relationship mapping

When I say dual mapping I mean mapping of the relationship both as an object and foreign key value. It looks like this:

Dual relationship mapping
1 @Entity
2 public class Dog {
3 ...
4   @ManyToOne
5   @JoinColumn(name = "breed_id", updatable = false, insertable = false)
6   private Breed breed;
7 
8   @Column(name = "breed_id")
9   private Integer breedId;

I can’t find an example of this anywhere in the [JPspec], nor can I find anything that would forbid it (although I’m not really sure) – but it works in both major JPA providers. The key is marking one of these attributes as read-only, which is what updatable/insertable=false does. The other mapping is primary – the one used for updates. I prefer when the raw foreign key is primary because then I don’t need any Breed object for an update when I already have its ID (and it’s easy to extract the ID from the object if I have one). Getters return the values of their respective fields, setters are designed in such a way that the persisted field (here the foreign key value) is always updated:

Accessors for dual mapping
 1 public Breed getBreed() {
 2   return breed;
 3 }
 4 
 5 public void setBreed(Breed breed) {
 6   this.breed = breed;
 7   breedId = breed.getId();
 8 }
 9 
10 public Integer getBreedId() {
11   return breedId;
12 }
13 
14 public void setBreedId(Integer breedId) {
15   this.breedId = breedId;
16 }

In this case it means merely setting the breedId field when breed is updated. We don’t care about breed in the other setter. The programmer has to know that the foreign key value is the primary breed attribute here and act accordingly. For instance, this setup is not transparent when persisting new entities. If we persist a fresh breed there is no guarantee its ID is already available, as autoincrement fields are filled in after the actual INSERT is flushed to the database. With EclipseLink we have to force the flush if we need the ID – calling setBreed(newBreed) would otherwise set the dog’s FK to the breed to null. Hibernate is rather eager with INSERTs, which is convenient for this scenario.
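
A minimal sketch of the EclipseLink workaround, assuming a managed dog and an open transaction:

Breed newBreed = new Breed();
newBreed.setName("terrier");
em.persist(newBreed);
em.flush(); // force the INSERT so the generated ID is available
dog.setBreed(newBreed); // breedId now gets a real value, not null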

The reverse case – with breed as the persisted field and breedId as the read-only one – is less useful and I cannot find any reason to use it instead of the traditional JPA relationship mapping. You can get the foreign key value anytime by calling breed.getId().

In any case, there are no “savings” during the load phase, but we can store new entities with raw foreign key values if we have those (and for whatever reason don’t have the entities loaded). Previously we mentioned the possibility of using entities as wrappers for the foreign keys, but we also showed that it doesn’t play well with cascade = PERSIST.

If we go back to dreaming – imagine we could request em.find for a dog by ID with a hint not to load read-only (or otherwise designated) mappings. This would save us the hassle during the loading phase as well and would make the whole read-update scenario (part of typical CRUD) based on foreign key values, with the opportunity to load the relationships – if we so wished. Any of this could be specified on the entity level based on typical patterns of usage. Imagine all the people / fetching like they want. Oh-oh… (Lennon/Richter)

Dual mapping like this has one big advantage compared to the pure foreign key mapping – with the breed mapping we can construct JPA-compliant JPQL queries without waiting for the next version of JPA – if it brings the option of joining root entities at all, that is. But currently dual mapping also means that we can only avoid the fetch cascade if we:

  • Rely on LAZY hint – not guaranteed by JPA, but if provided by a provider, you can consider it solved.
  • Or we query the entity with all the columns named explicitly (e.g. include breedId, but skip breed) – this, however, cripples the caching (if you use it), you don’t get an attached entity and (above all) it is annoying to code. We will talk more about this in the following sections.

So, dual mapping is an interesting tool but does not allow us to avoid all the eager problems.

Using projections for fetching

While we cannot control em.find(...) much, we can try to select only the columns we are interested in using a query. The result handling when we “project” selected columns into an object is called a projection. Querydsl can project results using a constructor or bean accessors, both applicable to entity objects or DTOs.6

Let’s try it:

Selecting specific columns
1 Tuple tuple = new JPAQuery<>(em)
2   .select(QDog.dog.id, QDog.dog.name, QDog.dog.breed.id)
3   .from(QDog.dog)
4   .where(QDog.dog.id.eq(1))
5   .fetchOne();

Our demo code returns a Querydsl Tuple but it could also return a DTO using a constructor expression (a sketch follows after the generated SQL). We’re selecting a single entity, but it works the same way for lists as well. To include an expression returning the value of the breed_id column we have to use an association path – and that raises questions. Let’s see what SQL is generated for this query.

EclipseLink query for a Dog with breed_id
1 SELECT t0.ID, t0.NAME, t1.ID FROM DOG t0, BREED t1
2   WHERE ((t0.ID = ?) AND (t1.ID = t0.BREED_ID))

EclipseLink does exactly what the JPQL says – but is this the best way?

Hibernate query for a Dog with breed_id
1 select dog0_.id as col_0_0_, dog0_.name as col_1_0_, dog0_.breed_id as col_2_0_
2 from Dog dog0_ where dog0_.id=?

Now this is cool – and exactly what I expect from a smart ORM. Of course the breed_id is directly on the Dog table, there is absolutely no need to join Breed just for its ID. I’d actually use the dual mapping described previously and select QDog.dog.breedId directly, which avoids the join with any provider.
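
As for the constructor expression mentioned above – a minimal sketch, with DogDto as a hypothetical class having an (Integer, String, Integer) constructor, using the dual-mapped breedId:

DogDto dto = new JPAQuery<>(em)
  .select(Projections.constructor(DogDto.class,
    QDog.dog.id, QDog.dog.name, QDog.dog.breedId))
  .from(QDog.dog)
  .where(QDog.dog.id.eq(1))
  .fetchOne();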

This seems usable and it definitely does not trigger any cascading eager fetch, unless we include an entity expression in the select. Using QDog.dog.breed would ruin the party, as it loads the whole content of the Breed entity – and if that contains any to-one relationship, the fetch cascade may get loose again. The downside of this approach is obvious – we have to name all the attributes (columns) we need and it may get tedious if we have wide tables.

I really did not enjoy doing it for 60+ columns and that’s when you start thinking about code generation or some other kind of framework. If you don’t have to, you don’t want to complicate your code with such ideas, so if you are sure you can rely on your ORM to load lazily even for to-one, then do that. If you can’t, you have to decide when to use the ORM as-is, probably relying on the second-level cache to offset the performance hit caused by the triggered loads, and when to explicitly cherry-pick the columns you want to work with.

Speaking of caches, we also have to realize that listing columns will not use the entity cache, but it may use the query cache. We’re getting closer to traditional SQL, just in JPQL shape. If the persistence context contains unflushed changes and the flush mode is set to COMMIT, the query will not see those changes either. Typically though, flush mode AUTO is used, which flushes pending changes before executing queries.
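
For completeness, a one-line sketch of switching the flush mode on an EntityManager:

em.setFlushMode(FlushModeType.COMMIT); // pending changes are flushed only at commit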

Entity update scenario

Most applications need to update entities as well – and projections are not very useful in this case. There is a simple and a complex way to update an entity:

  • In the simple case we take whatever came to the service layer (or its equivalent) and simply push it to the database. We can use merge or JPQL (Querydsl) update clauses for this.
  • In many cases complex business rules require us to consult the current state of the entity first and apply the changes to it in a more subtle way than JPA merge does. In this case we have to read the entity, change what needs to be changed and let JPA update it. Alternatively, we may read just a selection of the columns (a projection) and execute an update clause.

If the business rules don’t require it and performance is not an issue I’d recommend using merge as it’s very easy to understand and does everything for us. It executes the SELECT and UPDATE and we don’t have to care about details. For a single entity update in some typical CRUD-based information system this may be the perfect solution that doesn’t involve convoluted code.

If we need to read the entity first I’d go for em.find(id), applying the changes from an incoming DTO to it, complying with all the necessary business rules. This also involves one SELECT and one UPDATE, except that the select may trigger additional finds across to-one relationships even if we don’t need them in this update scenario. The whole select stage can be much faster if we use the second-level cache, as it avoids one round-trip to the database completely.

To avoid this we can read the information as a projection (or a Tuple) and later execute a surgical UPDATE clause. This is nicer to our database although we simply cannot get under the two roundtrips (read and update), unless we use some conversation scope in memory. This also avoids the interaction with the persistence context, hence takes less memory, avoids dirty checking at the end of the transaction, etc. The resulting code, naturally, is more complex, reflecting the complexity of the scenario.
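
A minimal sketch of this approach, inside a transaction and with made-up values (JPAUpdateClause is Querydsl’s update counterpart to JPAQuery):

Tuple current = new JPAQuery<>(em)
  .select(QDog.dog.id, QDog.dog.name)
  .from(QDog.dog)
  .where(QDog.dog.id.eq(1))
  .fetchOne();
// ... apply the business rules to the values read ...
new JPAUpdateClause(em, QDog.dog)
  .set(QDog.dog.name, "Rexie")
  .where(QDog.dog.id.eq(1))
  .execute(); // returns the number of affected rows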

If you don’t mind using the persistence context and you’re fine with using em.find(...) and later implicit update at the end of transactions you may use entity view described in the next section to eliminate traversing to-one relationships you’re not interested in.

Alternative entity “views”

If we want to solve the painful read/update scenario (assuming we know the ID of the entity) and we need to work with only some attributes of the entity, we can try to map only those. We will leave the full mapping in some “master” class and create another entity that gives us a partial view of the selected attributes. I’ll use the name entity view for such a class.

The full Dog mapping looks like this:

 1 @Entity
 2 public class Dog {
 3   @Id
 4   @GeneratedValue(strategy = GenerationType.IDENTITY)
 5   private Integer id;
 6 
 7   private String name;
 8 
 9   @ManyToOne
10   private Breed breed;
11 ...

We will create alternative view of this entity named DogBasicView (naming sure is hard):

1 @Entity
2 @Table(name = "Dog")
3 public class DogBasicView {
4   @Id
5   @Column(updatable = false)
6   private Integer id;
7 
8   private String name;

The difference is we didn’t map the breed because, presumably, we don’t want to touch it in some “Edit dog’s basic attributes” form. In real life it doesn’t have to be just a name – there can be plenty of attributes, just without the ones you don’t need.

Now if we read a DogBasicView, change it and wait for the end of the transaction, only the columns in the entity view will be modified. We have a guarantee that the UPDATE will not mention any unmapped columns. That means you can use various entity views for various update scenarios. Such an entity view should result in much less code, as you don’t have to name any columns in a projection (and the related constructor) – you simply map them in a particular entity view.
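
A sketch of the scenario, assuming the usual accessors on DogBasicView and our demo dog with ID 1:

em.getTransaction().begin();
DogBasicView dog = em.find(DogBasicView.class, 1);
dog.setName("Rexie");
em.getTransaction().commit(); // the UPDATE mentions only the mapped columns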

Using an entity view for pure reading is sub-optimal compared to projections as mentioned previously (memory, dirty-checking, …), but for straightforward update scenarios where the original state of the entity is still important, e.g. for validation and access permission purposes, the impact on the persistence context may be within your tolerance. Using a projection and an explicit update instead can be an overkill making the code unnecessarily complicated.

There are a couple of points you have to keep in mind though:

  • Never combine various “views” of the same entity in the same persistence context. They all live their own life and you cannot expect that dogBasic.setName(...) will affect the state of a different entity instance of the different type Dog, even though they both represent the same dog in the database.
  • Our DogBasicView will have its own space not only in the “first-level cache” (persistence context) but also in the second-level cache. I’d be very careful about caching an entity with a couple of different classes representing the same table. (Read: I’d not cache it at all, but that’s my personal taste. You definitely want to check how the caching works in such situations.)

Using entity views in JOINs

Because of the missing foreign keys these entity views are unusable when we want to show a table of results from a joined query (e.g. dogs and their breeds). While we could use a subquery in an IN clause for filtering, we still need a JOIN to actually get the data. For displaying tables, DTOs and things like constructor projections seem to be the right answer, however cumbersome. But if we fetch some data only for reading (e.g. to view them) it is recommended to bypass the persistence context anyway, especially for performance reasons. And we’re not talking about some marginal optimization, we’re talking about common sense and a significant waste of resources.7

If we have raw foreign key values mapped in the entity view we can use non-standard approach supported by all major JPA providers – we will discuss this in the next chapter. However, for pure reads you really should prefer projections.

Entity views and auto-generated schema

In most cases we don’t have to think about schema generation but what if we do? What if we have lines like this in our persistence.xml?

1 <property name="javax.persistence.schema-generation.database.action" value="create"/>

We utilized the property standardized in JPA 2.1 and we expect the JPA provider to generate the schema in the database for us. This is particularly handy when we run various tests or short demos. Now what happens when we declare two entities for the same table in persistence.xml?

1 <persistence-unit ...>
2   ...
3   <class>modeltoone.Dog</class>
4   <class>modeltoone.DogBasicView</class>
5   ...

When we run any demo using this persistence unit we can check the log and see the CREATE statements it uses. With EclipseLink there will be a single one for our Dog:

1 CREATE TABLE DOG (ID INTEGER IDENTITY NOT NULL, NAME VARCHAR,
2  BREED_ID INTEGER, PRIMARY KEY (ID))

And it contains all the necessary columns. But if we switch the order in persistence.xml, EclipseLink generates the table based on DogBasicView and it will miss the breed_id column. Hibernate scans for the classes even in an SE environment (but only within the JAR file containing persistence.xml) and if you want to try the effect of the order of <class> elements in persistence.xml you also have to include this:

1 <exclude-unlisted-classes>true</exclude-unlisted-classes>

But the order does not affect Hibernate – it somehow picks the “widest” class for the table and creates the table based on it. It works this way without putting the classes in the XML as well. What happens if the main Dog entity does not have all the columns mapped and one of the entity views has an additional column? Let’s say DogWithColor has a color column that is not present in the Dog class.

This is not a good situation – neither EclipseLink nor Hibernate can deal with it. EclipseLink generates the table based on whatever entity is first in the list and Hibernate picks some candidate. I’m not exactly sure about Hibernate’s algorithm as it’s not an alphabetical order either, somehow it seems to find the best candidate in my limited test cases. But it does not union all the columns from all the entities for the same table and that means it has to choose between the Dog and the DogWithColor – and that means some columns will not be generated.

Summary

With the current state of the standard the following options are available:

  • Leave to-one as EAGER and compensate with caches.
  • Configure your ORM to make it LAZY. This may be easy, it may even be provided out of the box, but it can also change with an upgrade or depend on the environment (Java SE/EE). It is simply not guaranteed by the JPA standard.
  • Select only the columns you’re interested in. This is very cumbersome for read/update scenarios of a single entity and bearable but still inconvenient for selecting lists for tables, etc.
  • Use entity views, that is to map the same table with multiple classes based on the use case you want to cover. This is very practical for read/update scenarios when you can fetch by entity’s ID.

In any of these cases you can also use dual mapping – mapping both the foreign key value and the relationship as an object. When mapping more attributes to a single column we have to designate the other mappings as read-only. If we could just explicitly ask JPA to fetch the object – or ask it not to fetch – we would have much better control over the fetch cascade going wild.

Dual mapping foreshadows the next chapter in which I’m going to present an alternative that is not JPA compliant, but tries to do without to-one relations altogether. It can be used by all major JPA 2.1 providers but if you insist on pure JPA you are free to just skip those pages.

9. Removing to-one altogether

To get full control over the fetching – and to do it without any high-tech solution – we have to drop the relationship mapping and map the raw foreign key instead. Before we go on, we will discuss the important addition of the ON keyword in JPA 2.1.

Why do we need ON anyway?

ON was dearly missing for other practical reasons. While this does not relate to to-one mapping, let’s see an example:

Class diagram of our entity model for ‘ON’ demo

Here the Breed does not have a name, only some synthetic code, and names are offered in multiple languages thanks to BreedLocalizedName. Demo code is here. Our favourite breeds are:

  • wolf, with code WLF and localized names for English and Slovak (wolf and vlk respectively – and yes, I know a wolf is not a dog),
  • collie, with code COL with localized name available only for Slovak (kólia).

Now imagine we want to get a list of all breeds with English names. If the localized name is not available for the requested language, we still want to see the breed with its code in the list in any case. This may seem ridiculous to SQL guys, but it’s a real problem for some Java programmers. The first naive attempt may be:

Naive JOIN to localized names
1 QBreedLocalizedName qbn = QBreedLocalizedName.breedLocalizedName;
2 List<Tuple> breedEnNames = new JPAQuery<>(em)
3   .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
4   .from(QBreed.breed)
5   .join(QBreed.breed.names, qbn)
6   .where(qbn.language.eq("en"))
7   .fetch();

But this produces unsatisfactory results containing only [1, WLF, wolf]. “When I join the localized names to the breed list some results are gone!” our programmer complains. Ok, let’s introduce the LEFT JOIN concept to them. Next attempt:

LEFT JOIN without ON clause
1 breedEnNames = new JPAQuery<>(em)
2   .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
3   .from(QBreed.breed)
4   .leftJoin(QBreed.breed.names, qbn)
5   .where(qbn.language.eq("en"))
6   .fetch();

It seems promising at first, but the result is again only a single lone wolf. There is one difference though, although the WHERE clause wipes it out. If there was any breed without any localized name at all it would be preserved after the JOIN; WHERE eliminates it only because the value of the language column would be NULL and not “en”. We could try to overcome it with an additional .or(qbn.language.isNull()) in the WHERE part, but this would work only for breeds with no localized names at all – not for those that merely miss the “en” localization.

The trick is that the condition must be part of the LEFT JOIN – and that’s what ON is all about. Let’s do it right after all:

LEFT JOIN with ON clause
1 breedEnNames = new JPAQuery<>(em)
2   .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
3   .from(QBreed.breed)
4   .leftJoin(QBreed.breed.names, qbn).on(qbn.language.eq("en"))
5   .fetch();

And, voila, the result is [1, WLF, wolf], [2, COL, null] – with collie as well, although without the name localized to English. Now we could add another condition to filter these cases and report them – .where(qbn.name.isNull()) would do the trick, this time as a WHERE clause.

But this is the essence of ON. It’s a special WHERE that works very closely with the related JOIN, and while the difference is virtually none for inner joins, it is essential for the outer ones.

Dropping to-one annotations

Having accepted that we’re out of pure JPA land it is otherwise easy to get rid of to-one relationships. Instead of:

To-one as we know it
1 @ManyToOne
2 private Breed breed;

We can simply map the raw value of the foreign key:

Mapping raw foreign key
1 @Column(...)
2 private Integer breedId;

If you want to load a Breed by its ID, you use EntityManager.find explicitly. This has domain consequences, because unless we give the entities some way to get to the entity manager, such code cannot live directly on the entity. We will try to mend it and bridge this gap that JPA automagic normally covers for our convenience – which is a good thing, and the better part of the abstraction. The point is there is no technological problem in obtaining the referenced entity by its ID.
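
In code that has the EntityManager at hand this is a one-liner (assuming the usual getter for the breedId attribute):

Breed breed = em.find(Breed.class, dog.getBreedId());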

Queries are where the problem lies. Firstly – and this cannot be overcome with current JPA – we need to use a root entity in the JOIN clause; both EclipseLink and Hibernate now allow this. This enriched experience cuts both ways, of course, as we may believe we’re still in JPA world when we’re in fact not (my story). The second thing we need is the ON clause, which is something JPA 2.1 provides. As discussed at the end of the previous sections, ON clauses were generated for joins before – but only as implied by the mapping and the particular association path used in the join. We need to make this implicit ON condition explicit. Let’s see an example:

JOIN to root entity and explicit ON
1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QBreed.breed).on(QBreed.breed.id.eq(QDog.dog.breedId))
5   .where(QBreed.breed.name.contains("ll"))
6   .fetch();

The key part is how the leftJoin is made. Querydsl doesn’t mind at all that we’re breaking JPA rules here – it allows a root entity and produces JPQL that works the same as the following code:

JPQL equivalent
1 List<Dog> dogs = em.createQuery(
2   "select dog from Dog dog left join Breed breed on breed.id = dog.breedId" +
3   " where breed.name like '%ll%'", Dog.class).getResultList();

So this leaves us covered. Queries are just a bit more explicit, but otherwise nothing you would not write into your SQL. We can argue that this way we can compare that breedId with any other integer and we’re losing the safety that ORM mapping gives us. In the mapping you specify which columns are related in a single place and not in each query. This is true, indeed, although not really a big deal. When talking to your DB admins they will probably see even better what you do, although smart DBAs have no problem understanding JPQL either. But by dropping the mapping we’re losing this comfort too. In theory we could leave some logical mapping information on the attribute, but without redesigning JPA we cannot benefit from it in our queries anyway. We will try to tackle the problem of missing metadata on the plain FK-typed attribute later in this chapter.

Ad hoc joins across tables

It’s not that rare to see tables with a shared ID value. In the following picture Person has an ID column (its primary key) and the other two tables share the same value as both their PK and an FK into the Person table:

Three entities with common ID column

Now imagine a situation where we want some information from the NaturalPerson table and its residence but we don’t need anything from the Person table itself. With JPA/ORM you need to join all three tables, or you need to map direct relationships across the tables as well – which requires dual mapping to two different object types and is really superfluous. With ad hoc joins there is no problem at all – we can simply join the tables on the side using an ON condition for equality of their id attributes.

Even beyond this shared ID example there are many cases when ad hoc join is practical. Sometimes you need to build a report with join on a column that is not PK/FK pair – although there may be better ways than ORM to produce a report.

Loading the relationship from the entity

Queries are typically executed in a context outside of the object itself – in a repository or DAO, you name it. There is no problem getting hold of an EntityManager in these places. But if you want some loading from your entity, things necessarily get a little messy. Before we get to the messy part let’s step back a little, because we have two broad options for how to do it:

  • We can have a real object attribute, but @Transient, and we can load it when the scenario requires it. This is actually pretty handy because you can load the entity with its relations with an optimized query – especially if you don’t expect that data in the cache, or don’t want to use the cache at all, etc. Very explicit, but does exactly what it says. This does not require us to put any infrastructure code into the entity as we fill the attribute from the outside in a code that loads the whole object graph.
  • We can load it lazily when and if needed. But we have to make explicit what JPA does under the surface and what we don’t see on the entity – or we can develop our solution, but I’m not sure it’s worth it for this problem.

The first solution is obvious, so let’s see what we can do about the other one. Let’s start with adding the method returning our entity – in our case we have a Dog with breedId foreign key mapped as a raw value of type Integer. To load the actual entity we may try something like:

Getter loading entity by id
1 public Breed getBreed() {
2   return DaoUtils.load(Breed.class, breedId);
3 }

Ok, we cheated a bit, I admit – we moved the problem to that DaoUtils, but for a good reason. We will need this mechanism in multiple places so we want it centralized. The load method then looks like this:

Loading the entity
1 public static <T> T load(Class<T> entityClass, Serializable entityId) {
2   EntityManager em = ...; // get it somehow (depends on the project)
3   return em.find(entityClass, entityId);
4 }

Ouch – two examples, cheating on both occasions. But here I cannot tell you how to obtain that entity manager. The method can reach to a Spring singleton component that makes itself accessible via a static field, for instance. It isn’t nice, but it works. The point is that the entity is not a managed component so we cannot use injection as usual. Static access seems ugly, but it also frees the entity from a reference to some infrastructural component and leaves it only with a dependency on some utility class.
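
A sketch of such a component, assuming Spring (the shape is made up, only the static trick matters; Spring injects its shared, thread-bound entity manager proxy here):

@Component
public class DaoUtils {

  private static EntityManager em;

  @PersistenceContext
  public void setEntityManager(EntityManager entityManager) {
    em = entityManager; // static backdoor for non-managed code
  }

  public static <T> T load(Class<T> entityClass, Serializable entityId) {
    return em.find(entityClass, entityId);
  }
}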

It is important to be familiar with the implementation of an injected EntityManager – in the case of Spring its shared entity manager helps us to get to the right entity manager even if we reach the component via a static path. I would have to consult the Java EE component model to be sure the same is possible there and whether it is reliable across application servers.

Another option for smuggling in the entity manager utilizes a ThreadLocal variable, but this means someone has to populate it, which implies life-cycle management – and that means additional trouble.
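
If we went the ThreadLocal route, a minimal holder could look like the following sketch – assuming some filter or interceptor calls set and clear around each unit of work:

import javax.persistence.EntityManager;

public final class EntityManagerHolder {
  private static final ThreadLocal<EntityManager> CURRENT = new ThreadLocal<>();

  public static void set(EntityManager em) { CURRENT.set(em); }

  public static EntityManager get() {
    EntityManager em = CURRENT.get();
    if (em == null) {
      throw new IllegalStateException("No EntityManager bound to this thread");
    }
    return em;
  }

  // must be called when the unit of work ends – this is the life-cycle part
  public static void clear() { CURRENT.remove(); }
}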

In any case, it is possible to add a getter for a relationship on the entity, but it calls for some rough solutions – so if you can keep such access somewhere else (in a data access object, for instance) do it that way. Otherwise we're getting dangerously close to the Active record pattern, which by itself is legitimate but doesn't fit JPA. True – a JPA provider can cheat somehow under cover, but we rather should not.

If you decide to load the relationship in some arbitrary code and you only have access to the entity, perhaps you can introduce a DAO to that code. If you feel you're making this decision too high in the layer model (on the presentation layer maybe?), perhaps it should be pushed lower, where the DAO is available. I'm not totally against a static backdoor from non-managed code to a managed component, but at least we should be aware of its cost and compare it with the cost of not doing it.

Losing information without mapping annotations

When we map a foreign key instead of the relationship we often lose an important piece of information – namely what type it leads to. Sometimes it is clear, e.g. with Integer breedId you can easily guess it leads to Breed. But there is no type declaration anywhere around to click on in the IDE. Even worse, the foreign key name often describes the role instead of the type – which I personally prefer, especially when there are multiple relationships of the same type. So instead of terrible client1Id and client2Id you have ownerId and counterpartyId. The names are descriptive, but the type information is gone altogether.

For this – and also as a means to automatically check the validity of these IDs for newly stored or updated entities – we developed a custom @References annotation. With it the mapping for a foreign key may look like this:

Using @References annotation
1 @References(type = Breed.class, required = true)
2 private Integer breedId;
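
The annotation itself can be declared along these lines – a sketch only, our real version may differ in details:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface References {
  Class<?> type();

  boolean required() default false;
}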

I believe the documentation factor is the most important one here, but we can also check the references on an entity like this:

Using ReferenceChecker
1 new ReferenceChecker(em).checkReferences(lassie);

This either does nothing when everything is OK, or it fails with some exception of your choice, depending on how you implement it. Our version is similar to this example on GitHub. If you go through the code you notice that it can do more than just checking – it can also push the real entities to some transient fields, for instance. Such a mechanism can easily be incorporated into the validation framework you use, if you wish.
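
A minimal sketch of what such a checker can do – assuming the @References annotation sketched above and plain field access:

import java.lang.reflect.Field;
import javax.persistence.EntityManager;

public class ReferenceChecker {
  private final EntityManager em;

  public ReferenceChecker(EntityManager em) {
    this.em = em;
  }

  public void checkReferences(Object entity) {
    for (Field field : entity.getClass().getDeclaredFields()) {
      References references = field.getAnnotation(References.class);
      if (references == null) continue;
      field.setAccessible(true);
      Object id;
      try {
        id = field.get(entity);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
      if (id == null) {
        if (references.required()) {
          throw new IllegalStateException(field.getName() + " is required");
        }
        continue;
      }
      // existence check – the exception type is your choice
      if (em.find(references.type(), id) == null) {
        throw new IllegalStateException(
          "Invalid reference in " + field.getName() + ": " + id);
      }
    }
  }
}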

Wrapping it up

I presented a solution to our to-one problems – using raw foreign key values. This bends the JPA rules a bit and builds on the availability of the ON clause introduced in JPA 2.1. However, the JPA specification currently does not allow us to use root entity paths in JOINs – which such queries require; only association paths are allowed now. This feature may come in later versions and it is already supported by all the major players in the field – Hibernate, EclipseLink and DataNucleus.

Mapping a raw foreign key value may lead to some logical information loss, which can easily be mended with a custom annotation. I'd like to see a standard JPA annotation for this, otherwise schema generation cannot be fully functional.

Another solution is to use a single mapping like today and have a hint for JPA to fetch only the ID portion of the related object. This may not work well with cascade=PERSIST when persisting new entities pointing to existing ones, although the specification allows it to work (but also allows throwing an exception, thanks to the magical usage of "may" in the specification text). This solution preserves the current query mechanics, and is actually the least intrusive of them all.

Finally, the JPA specification could mandate LAZY – this is something where Hibernate seems quite reliable out of the box without any configuration hassle. On the other hand, I have seen many questions asking how to make this relationship lazy in Hibernate, so it is probably not universally true. Promoting LAZY from a hint to a guaranteed thing would require JPA providers to use some bytecode magic (which is not clean from the Java language perspective), but it would also end a lot of debates about how to do it right.

10. Modularity, if you must

I’m convinced that people who designed and specified JPA were not aware (or were ignorant) of modern modularity trends.10 For a long time I’ve been trying to put things into domain-named packages (sometimes low-level technical domains, as happens with complex layered software) and as a next step I naturally started to split big JARs into smaller ones across vertical (domain) lines rather than putting everything from one layer in a single JAR.

If you have all your entities in a single JAR you have no problem at all. But if you fancy splitting the entities into multiple JARs – including their definitions in persistence.xml – then you’re venturing beyond the JPA specification. [JPspec], section 8.2 Persistence Unit Packaging, defines a persistence unit in terms of a single persistence.xml, packaged in the META-INF directory of the JAR or directory that forms “the root of the persistence unit”.

This does not explicitly forbid using multiple persistence.xml files, but I believe it implies we should not do it. It’s actually the other way around – we can have multiple persistence units in one XML; that’s a piece of cake.

Everybody knows layers

I decided to experiment with the modularization of applications after I read Java Application Architecture: Modularity Patterns with Examples Using OSGi [JAA].11 After that I couldn’t sleep that well with the typical mega-JARs containing all the classes of a particular architectural level. Yeah, we modularize – but only by levels.

I’ll use one of the projects from my past as an example. We had one JAR with all entity classes, sometimes it was called domain, sometimes jpa-entities (admitting the anemic domain model).

Then we had our @Service classes in a different JAR – but all of them in one – accessing entities via DAOs. The service layer was used by a presentation layer. Complex logic was pushed to specific @Components somewhere under the service layer. It was not domain-driven, but it made sense for us and the dependencies flowed clearly from the top down.

Components based on features

But how about dependencies between parts of the system at the same level? When we have a lot of entity classes, some are pure business (Clients, their Transactions, financial Instruments) and some are rather infrastructural (meta model, localization, audit trail). Why can’t these be separated? Most of the stuff depends on the meta model, localization is quite independent in our case, and the audit trail needs the meta model and the permission module. It all clicks when one thinks about it – and it is more or less in line with modularity based on features, not on technology or layers. Sure, we can still use layer separation and have permission-persistence and permission-service as well (later we will see an example where it makes a lot of sense).

Actually, this is quite a recurring question: Should we base packages on features or on layer/technology/pattern? It seems that a consensus has been reached – start with features – which can be parts of your business domain. If stuff gets big you can split it into layers too, unless you find even more suitable sub-domains.

Multiple JARs with JPA entities

So we carefully try to put different entity classes into different JAR modules. Sure, we could keep them all in the same JAR and merely check how tangled they are with Sonar, but it is recommended to enforce the separation and make the dependencies explicit (not only in the [JAA] book). My experience confirms this quite clearly – contracts of any kind must be enforced.

And here comes the problem when your persistence is based on JPA, because JPA clearly wasn’t designed to have a single persistence unit spanning multiple persistence.xml files. So what are these problems actually?

  1. How to distribute persistence.xml across these JARs? Do we even have to?
  2. What classes need to be mentioned where? E.g., we need to mention classes from upstream JARs in persistence.xml if they are used in relations (which breaks the DRY principle).
  3. When we have multiple persistence.xml files, how do we merge them in our persistence unit configuration?
  4. What about the configuration in persistence.xml? Which properties are used from which file? (Little spoiler: we just don’t know reliably!) Where to put them so we don’t have to repeat ourselves again?
  5. How to use EclipseLink static weaving for all these modules? How about a module with only an abstract mapped superclass (some dao-common module)?

That’s quite a lot of problems for an age when modularity is mentioned so often – too many for a technology that comes from an “enterprise” stack. And they are mostly phrased as questions, because the answers are not readily available.

Distributing persistence.xml – do we need it?

This one is difficult and may depend on the provider you use and the way you use it. We use EclipseLink and its static weaving, which requires persistence.xml. Sure, we may try to keep it together in some neutral module (or on any path, as it can be configured for the weaving plugin), but that goes against the modularity quest. What options do we have?

  • We can create some union persistence.xml in a module that depends on all the needed JARs. This would be OK if we had just one such module – typically some downstream module like a WAR or a runnable JAR. But we have many, and if we made a persistence.xml for each they would contain a lot of repetition. We would also reference a downstream resource, which is unacceptable.
  • We can have a dummy upstream module, or a path outside of all modules, with a union persistence.xml. This would keep things simple, but it would make it more difficult to develop the modules independently, maybe even with different teams.
  • We can keep a persistence.xml in each JAR with the related classes. This seems best from the modularity point of view, but it means we need to merge multiple persistence.xml files when the persistence unit starts.
  • Or can we have a different persistence unit for each persistence.xml? This is OK if they truly are in different databases (different JDBC URLs), otherwise it doesn’t make sense. In our case we have one rich DB and any module can see the part it is interested in – that is, entities from the JARs it has on the classpath. If you have data in different databases already, you’re probably sporting microservices anyway.

We went for the third option – especially because EclipseLink’s weaving plugin likes it and we didn’t want to redirect to a non-standard path to persistence.xml – but it also seems to be the right logical way. However, there is nothing like a dependency between persistence.xml files. So if we have b.jar that uses a.jar, and there is an entity class B in b.jar that contains a @ManyToOne to the A entity from a.jar, we have to mention the A class in the persistence.xml in b.jar. Yes, the class is already mentioned in a.jar, of course. Again, clearly, the engineers of JPA didn’t even think about the possibility of using multiple JARs in a really modular way.

In any case – this works, compiles, and weaves our classes during the build – and it more or less answers questions 1 and 2 from our problem list.

It doesn’t start anyway

When we have a single persistence.xml it gets found as a unique resource, typically in META-INF/persistence.xml – in any JAR, actually (the one JPA refers to as “the root of the persistence unit”). But when we have more of them they don’t all get picked up and merged magically – and the application fails during startup. We need to merge all those persistence.xml files during the initialization of our persistence unit. Now we’re tackling questions 3 and 4 at once, for they are linked.

To merge all the configuration XMLs into one unit we can use the following configuration of the PersistenceUnitManager in Spring (using MergingPersistenceUnitManager is the key):

1 @Bean
2 public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
3   MergingPersistenceUnitManager persistenceUnitManager =
4     new MergingPersistenceUnitManager();
5   persistenceUnitManager.setDefaultDataSource(dataSource);
6   persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
7   // default persistence.xml location is OK, goes through all classpath*
8   return persistenceUnitManager;
9 }

Before I unveil the whole configuration we should talk about the configuration that was in the original single persistence.xml – it looked something like this:

 1 <exclude-unlisted-classes>true</exclude-unlisted-classes>
 2 <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
 3 <properties>
 4   <property name="eclipselink.weaving" value="static"/>
 5   <property name="eclipselink.allow-zero-id" value="true"/>
 6   <!--
 7   Without this there were other corner cases when a field change was ignored.
 8   This can be worked around by calling a setter, but that sucks.
 9   -->
10   <property name="eclipselink.weaving.changetracking" value="false"/>
11 </properties>

The biggest question here is: What is used during the build (e.g. by static weaving) and what can be put into runtime configuration somewhere else? Why somewhere else? Because we don’t want to repeat these properties in all the XMLs.

Now we will take a little detour to shared-cache-mode, which displays the problem with merging persistence.xml files in the most bizarre way.

Shared cache mode

We wanted to enable cached entities selectively (to ensure that @Cacheable annotations have any effect). But I made a big mistake when I created another persistence.xml file – I forgot to mention shared-cache-mode there. My persistence unit picked up both XMLs (using MergingPersistenceUnitManager), but my caching went completely nuts. It cached more than expected and I was totally confused. The trouble here is that persistence.xml files don’t really get merged. The lists of classes in them do, but the configurations do not. Somehow my second persistence XML became dominant (one always does!) and because there was no shared-cache-mode specified, it used the default – which is whatever EclipseLink thinks is best. No blame there, just another manifestation that JPA people didn’t even think about these setup scenarios.

If you really want hard evidence of how things are in your setup, put a breakpoint anywhere you can reach your configured EntityManagerFactory and, when it stops there, dig deeper to find out what your cache mode is. Or anything else – you can check the list of known entity classes, JPA properties, anything really. It’s much faster than guessing.
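
A quick programmatic alternative to the debugger – a sketch assuming we can reach the configured EntityManagerFactory somewhere in our code:

// effective properties of the persistence unit
System.out.println(entityManagerFactory.getProperties());
// entity classes that were actually picked up
entityManagerFactory.getMetamodel().getEntities()
  .forEach(e -> System.out.println(e.getJavaType().getName()));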

Debugging JPA EntityManagerFactory setup

In the picture above we can see the following:

  1. We’re using Spring’s MergingPersistenceUnitManager.
  2. Persistence unit name is as expected.
  3. This is the path of “the root of the persistence unit”. It’s a directory here, but it would be a JAR in production.
  4. Our managed classes (collapsed here).
  5. Shared cache mode is set to ENABLE_SELECTIVE – as required.
  6. Other configured properties.

Luckily, we don’t rely on the persistence.xml root anymore, at least not for runtime configuration.

Putting the Spring configuration together

Now we’re ready to wrap up our configuration and move some of the repeated configuration snippets from persistence.xml into the Spring configuration – here we will use Java-based configuration (XML works too, of course).

Most of our properties were related to EclipseLink. I had read their weaving manual, but I still didn’t understand what works when and how. I had to debug some of the stuff to be really sure.

It seems that eclipselink.weaving is the crucial property namespace that should stay in our persistence.xml, because it gets used by the plugin performing the static weaving. I debugged the Maven build and the plugin definitely uses the property eclipselink.weaving.changetracking (we set it to false, which is not the default). Funnily enough, it doesn’t need eclipselink.weaving itself, because running the plugin implies we wish for static weaving. During startup it gets picked up though, so EclipseLink knows it can treat the classes as statically weaved – which means this one can be pushed into programmatic configuration.

The rest of the properties (and shared cache mode) are clearly used at the startup time. Spring configuration may then look like this:

 1 @Bean public DataSource dataSource(...) { /* as usual */ }
 2 
 3 @Bean
 4 public JpaVendorAdapter jpaVendorAdapter() {
 5   EclipseLinkJpaVendorAdapter jpaVendorAdapter = new EclipseLinkJpaVendorAdapter();
 6   jpaVendorAdapter.setShowSql(true);
 7   jpaVendorAdapter.setDatabase(
 8     Database.valueOf(env.getProperty("jpa.dbPlatform", "SQL_SERVER")));
 9   return jpaVendorAdapter;
10 }
11 
12 @Bean
13 public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
14   MergingPersistenceUnitManager persistenceUnitManager =
15     new MergingPersistenceUnitManager();
16   persistenceUnitManager.setDefaultDataSource(dataSource);
17   persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
18   persistenceUnitManager.setSharedCacheMode(
19     SharedCacheMode.ENABLE_SELECTIVE);
20   // default persistence.xml location is OK, goes through all classpath*
21 
22   return persistenceUnitManager;
23 }
24 
25 @Bean
26 public FactoryBean<EntityManagerFactory> entityManagerFactory(
27   PersistenceUnitManager persistenceUnitManager, JpaVendorAdapter jpaVendorAdapter)
28 {
29   LocalContainerEntityManagerFactoryBean emfFactoryBean =
30     new LocalContainerEntityManagerFactoryBean();
31   emfFactoryBean.setJpaVendorAdapter(jpaVendorAdapter);
32   emfFactoryBean.setPersistenceUnitManager(persistenceUnitManager);
33 
34   Properties jpaProperties = new Properties();
35   jpaProperties.setProperty("eclipselink.weaving", "static");
36   jpaProperties.setProperty("eclipselink.allow-zero-id",
37     env.getProperty("eclipselink.allow-zero-id", "true"));
38   jpaProperties.setProperty("eclipselink.logging.parameters",
39     env.getProperty("eclipselink.logging.parameters", "true"));
40   emfFactoryBean.setJpaProperties(jpaProperties);
41   return emfFactoryBean;
42 }

Clearly, we can set the database platform, shared cache mode and all runtime-relevant properties programmatically – and we can do it in a single place. This is not a problem for a single persistence.xml, but in any case it offers better control. We can now use Spring’s @Autowired private Environment env; and override whatever we want with property files or even -D JVM arguments – and still fall back to default values – just as we do for the database property of the JpaVendorAdapter. Or we can use SpEL. This is the flexibility that persistence.xml simply cannot provide.

And of course, all the things mentioned in the configuration can now be removed from all our persistence.xml files.

I’d love to get rid of eclipselink.weaving.changetracking in the XML too, but I don’t see any way to provide it as a configuration option of the Maven plugin, which we have neatly unified in our parent POM. That would eliminate some more of the repetition.

Common base entity classes in a separate JAR

Question 5 from our problem list is no real problem after all the previous work, just a nuisance. EclipseLink refuses to weave our base class regardless of the @MappedSuperclass usage. But as mentioned in one SO question/answer, we just add a dummy concrete @Entity class and we are done. You’ll never use it, so it does no harm at all. And you can vote for this bug.
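
The workaround can be as dumb as this sketch – the names are illustrative, with BaseEntity being the @MappedSuperclass in question:

// never used at runtime – it only exists so the weaving plugin
// has a concrete entity to process in this module
@Entity
public class DummyEntity extends BaseEntity {
}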

This may be no issue for load-time weaving or for Hibernate, but I haven’t tried that so I can’t claim it reliably.

Cutting cyclic dependencies

Imagine we have two modules – users and permissions – where permissions depends on the users module. The users module contains the User and Role entities, while the permissions module contains Permission and the RolePermission association, which is mapped explicitly because it carries some additional information. See the following picture:

Modules `users` and `permissions`

Obviously, Role cannot depend on RolePermission because that would cross the module boundary in the wrong direction. But when a Role is deleted we also want to remove all the related RolePermissions (but not Permissions, of course). Imagine we have RoleDao and RolePermissionDao, both derived from a layer supertype named BaseDao. This handy class implements all the common functionality, including a delete(...) method that calls a common preDelete(entity) method in the process. For convenience this method is already pre-implemented to do nothing.

Were all the classes in a single module, our RoleDao could contain this:

1 @Override
2 protected void preDelete(Role entity) {
3   QRolePermission rp = QRolePermission.rolePermission;
4   new JPADeleteClause(entityManager, rp)
5     .where(rp.roleId.eq(entity.getId()))
6     .execute();
7 }

With separate modules this is not possible, because QRolePermission (just like RolePermission) is not available in the users module.

Sure, we can argue whether we have the entities split into modules properly – for instance, is Role really a concept belonging to User without considering permissions? However, after moving Role to the permissions module we would have a problem with the @ManyToMany relationship named roles on the User class – which is naturally the owning side of the relationship – as it would go against the module dependency.

When such a cycle occurs it may also indicate that one of the modules is too big. In our case we used User in the permissions module, but it is not really needed in the entities directly. It is used by upper layers of the module, so maybe we should split the modules like this:

Reversing `users` and `permissions` dependency by introducing another module.

Now the users module can depend on permission-persistence and bring in RolePermission where it really belongs.

In the following cases we will focus on the relationship between the entities Role and Permission – this time with the users module always depending on the permissions module (which more or less matches permission-persistence from the previous picture). Let’s see how we can model that many-to-many relationship and how it can cross module boundaries.

Unidirectional `@ManyToMany` follows the dependency.

First we use an implicit role_permission table without an explicit entity. Role is the natural owner of the information about its Permissions, and a Permission doesn’t necessarily need the list of roles using it. It gets interesting when we introduce RolePermission explicitly as an entity – perhaps because there are additional attributes on that association.

Explicit `RolePermission` in `users`.

Notice that the limitations of relational database design, dictating where to put the foreign key, are actually quite useful for deciding which way the dependency goes. The module knowing about the RolePermission table is the one that depends on the other one. If it weren’t that way we would get an invalid case with a cyclic dependency, like this:

Explicit `RolePermission` in `permissions` forms a cycle.

We either need to reverse the dependency between the modules or split one module into two, as shown previously with permissions. If we absolutely needed this dependency from users to permissions and also the “red arrow”, then something else in the users module depends on permissions – which is a case for splitting the users module. (This is rather unnatural; the previous case for splitting permissions was much better.)

In any of these cases we still have a problem when we delete the “blind” side of the relationship. In the pictures of the valid cases it was the Permission entity. Deletion of a Permission somehow needs to signal to the users module that it has to remove the relations too. For that we can use the Observer pattern, which allows us to register an arbitrary callback with a DAO.

It will be implemented in the already mentioned BaseDao to make it generally available. The following example demonstrates how it can be used with an explicit RolePermission entity:

 1 // in some dao-base module
 2 public abstract class BaseDao<T> ... {
 3   private List<PreDeleteHook<T>> preDeleteHooks = new ArrayList<>();
 4 
 5   public void delete(T entity) {
 6     preDeleteHooks.forEach(h -> h.preDelete(entity));
 7     entityManager.remove(entity);
 8     ...
 9   }
10 
11   public interface PreDeleteHook<E> {
12     void preDelete(E entity);
13   }
14 
15   public void addPreDeleteHook(PreDeleteHook<T> hook) {
16     preDeleteHooks.add(hook);
17   }
18   ...
19 }
20 
21 // in users module
22 public class RolePermissionDao extends BaseDao<RolePermission> {
23   public static final QRolePermission $ = QRolePermission.rolePermission;
24 
25   @Autowired PermissionDao permissionDao; // from downstream permissions module, OK
26 
27   @PostConstruct public void init() {
28     permissionDao.addPreDeleteHook(this::deleteByPermission);
29   }
30 
31   // this method "implements" PreDeleteHook<Permission>
32   public void deleteByPermission(Permission permission) {
33     new JPADeleteClause(entityManager, $)
34       .where($.permissionId.eq(permission.getId()))
35       .execute();
36   }
37   ...
38 }
39 
40 // no special changes in PermissionDao, it can't know about the users module anyway

Using an unmanaged association table with @ManyToMany would require finding all the Roles having this permission and removing it from their permissions collections. I, again, vouch for the more explicit way, as “delete all relations that use this permission” is simpler and likely more efficient.

Conclusion

  • JPA clearly wasn’t designed with modularity in mind – especially not when modules form a single persistence unit, which is a perfectly legitimate usage. Just because we use one big legacy database doesn’t mean we don’t want to factor our code into smaller modules.
  • It is possible to distribute persistence classes into multiple JARs, and then:
    • We can go either with a single union persistence.xml, which can be downstream or upstream – this depends on whether we need it only at runtime or during the build too.
    • I believe it is more proper to pack a partial persistence.xml into each JAR, especially if we need it during the build. Unfortunately, there is no escape from repeating some upstream classes in the module again just because they are referenced in relations (a typical culprit when some error makes you scream “I don’t understand how this is not an entity, when it clearly is!”).
  • If we have multiple persistence.xml files, it is possible to merge them using Spring’s MergingPersistenceUnitManager. I don’t know whether it can be used in non-Spring applications, but I saw this idea reimplemented and it wasn’t that hard. (If I had to reimplement it, I’d try to merge the configuration part too!)
  • When we are merging persistence.xml files it is recommended to minimize the configuration in them so it doesn’t have to be repeated. E.g., for EclipseLink we leave in only the stuff necessary for build-time static weaving; the rest is set programmatically in our Spring @Configuration class.

There are still some open questions, but I think they lead nowhere. Can I use multiple persistence units with a single data source? This way each persistence.xml could be a separate unit. But I doubt relationships would work across them, and the same goes for transactions (without XA, that is).

III Common problems

After the previous part this one will be more relaxed. We will talk about a couple of typical problems related to the use of ORM/JPA, some of them stemming from the underlying relational model. We will try to load our data in an appropriate manner, avoiding the execution of many unnecessary statements or adding superfluous joins where possible.

Finally, we will wrap it all up with more advanced Querydsl topics.

Unlike in the previous part there is no recommended order of the chapters; you can read them in any order you like.

11. Avoid N+1 select

While performance tuning is not the main goal of this book, we should follow some elementary performance common sense. If we can avoid an unnecessary query we should do so. With ORM/JPA we can generate a lot of needless queries without even realizing it. In this chapter we will cover the most pronounced problem, called the N+1 select.

Anatomy of N+1 select

I’d prefer to call this problem 1+N because it mostly starts with one query that returns N rows and induces up to N additional queries. While addition is commutative – hence 1+N is the same as N+1 – I’ll stick to N+1 as usually used in the literature. The typical scenarios when the N+1 problem appears are:

  • Query for N entities that have an eager to-one relationship – or more of them – and the provider is not smart enough to use joins.
  • Query for N entities that have an eager to-many relationship and the provider is either not smart enough to use the join (again) or it is not possible to use it for other reasons, like pagination. We will cover pagination of entities with to-many later in this chapter.
  • Query for N entities with a lazy relationship that is triggered later, e.g. in the view, as usual with the Open Session in View (OSIV) pattern.

There are probably more scenarios, but these are the most typical ones. First let’s look at the eager examples.

Eager to-one without joins

If you recall our simple example with @ManyToOne from the chapter Troubles with to-one relationships, you know that to-one relationships may trigger additional fetching. These fetches may result in DB queries or may be satisfied from the cache – depending on your settings – and all of this must be taken into consideration.

For the next sections let’s use the following data for dogs and their owners:

Content of Owner table
id name
1 Adam
2 Charlie
3 Joe
4 Mike
Content of Dog table
id name owner_id
1 Alan 1 (Adam)
2 Beastie 1 (Adam)
3 Cessna 1 (Adam)
4 Rex 3 (Joe)
5 Lassie 3 (Joe)
6 Dunco 4 (Mike)
7 Goro NULL

Our mapping for the Dog looks like this:

 1 @Entity
 2 public class Dog {
 3 
 4   @Id
 5   @GeneratedValue(strategy = GenerationType.IDENTITY)
 6   private Integer id;
 7 
 8   private String name;
 9 
10   @ManyToOne(fetch = FetchType.EAGER)
11   @JoinColumn(name = "owner_id")
12   private Owner owner;

And for the Owner:

 1 @Entity
 2 public class Owner implements Serializable {
 3 
 4   @Id
 5   @GeneratedValue(strategy = GenerationType.IDENTITY)
 6   private Integer id;
 7 
 8   private String name;
 9 
10   @OneToMany(mappedBy = "owner")
11   private Set<Dog> dogs;

Now let’s list all the dogs with this code:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .fetch();

We get seven dogs with three different owners (one dog is not owned), but what happened on the SQL level? Both Hibernate and EclipseLink do something like this (output from EclipseLink):

1 SELECT ID, NAME, OWNER_ID FROM DOG
2 SELECT ID, NAME FROM OWNER WHERE (ID = 1)
3 SELECT ID, NAME FROM OWNER WHERE (ID = 3)
4 SELECT ID, NAME FROM OWNER WHERE (ID = 4)

That classifies as an N+1 problem, although the N may be lower than the count of selected rows thanks to the persistence context. JPA providers may be persuaded to use a JOIN to fetch the information, but this is beyond the current version of the JPA specification. EclipseLink offers @JoinFetch(JoinFetchType.OUTER) and Hibernate has @Fetch(FetchMode.JOIN) (also using an outer join). In a related demo I tried both and, to my surprise, EclipseLink obeyed but Hibernate did not – which went against my previous experience that Hibernate generally tries to optimize queries better.
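
For completeness, this is roughly how these provider-specific hints sit on the mapping (assuming the respective provider’s annotations are on the classpath):

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "owner_id")
// EclipseLink:
@org.eclipse.persistence.annotations.JoinFetch(
  org.eclipse.persistence.annotations.JoinFetchType.OUTER)
// or Hibernate:
// @org.hibernate.annotations.Fetch(org.hibernate.annotations.FetchMode.JOIN)
private Owner owner;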

Now, what if we don’t need any data for the owner? You may try to use lazy fetch for the data you don’t need – if you know you can rely on LAZY – or try some other technique described in Troubles with to-one relationships. Here entity views come to mind, but projections may be even better.

Anytime I wanted to load dogs with their owners I’d go for an explicit JOIN. Let’s see how to do that properly. Even though we don’t use the owners in the select, it is not sufficient to construct the query like this:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner)
5   .fetch();

This results in an invalid JPQL query:

1 select dog
2 from Dog dog
3   left join dog.owner

While this runs on Hibernate, it fails on EclipseLink with a syntax error: An identification variable must be defined for a JOIN expression. This, indeed, is necessary according to the specification and we must add an alias like so:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner, QOwner.owner)
5   .fetch();

This results in a valid JPQL query:

1 select dog
2 from Dog dog
3   left join dog.owner as owner

But without using it in the select – which we don’t want because we don’t want a list of Tuples – we end up with a query containing our join, yet the data for the owners is still not fetched and N additional queries are executed just like before. Had we used it in the select it would be fetched, of course.

The right way to do it, if we insist on the result typed as List<Dog>, is this:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner).fetchJoin()
5   .fetch();

This results in a valid and correct JPQL query:

1 select dog
2 from Dog dog
3   left join fetch dog.owner

Notice we haven’t used an alias this time; looking at the BNF (Backus–Naur form) notation from [JPspec], section 4.4.5 Joins, I believe identification_variable is allowed only for the join rule and not for fetch_join. Nonetheless, both Hibernate and EclipseLink tolerate it.

Eager to-many relationships

Using EAGER on collections by default is rather risky. I’d personally not use it and would use explicit joins when needed instead. In our example we will use an OwnerEager entity that uses the same table as Owner (hence the same prepared data) but maps the dogs collection as:

1 @OneToMany(mappedBy = "owner", fetch = FetchType.EAGER)
2 private Set<DogEager> dogs;

Now we run the following code:

1 List<OwnerEager> owners = new JPAQuery<>(em)
2   .select(QOwnerEager.ownerEager)
3   .from(QOwnerEager.ownerEager)
4   .fetch();
5 System.out.println("\nowners = " + owners);
6 for (OwnerEager owner : owners) {
7   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
8 }

The SQL produced at the point fetch() is executed looks like this (output from EclipseLink, Hibernate is similar):

1 SELECT ID, NAME FROM Owner
2 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
3 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
4 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
5 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)

And output of the subsequent print statements (manually wrapped):

1 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
2  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
3 Adam's dogs = [Dog{id=1, name='Alan', owner.id=1}, Dog{id=2,
4  name='Beastie', owner.id=1}, Dog{id=3, name='Cessna', owner.id=1}]
5 Charlie's dogs = []
6 Joe's dogs = [Dog{id=4, name='Rex', owner.id=3},
7  Dog{id=5, name='Lassie', owner.id=3}]
8 Mike's dogs = [Dog{id=6, name='Dunco', owner.id=4}]

Everything is OK, except that it can all be done in a single query. Had we added a single fetch join line to the query we would have had it in one go:

1 List<OwnerEager> owners = new JPAQuery<>(em)
2   .select(QOwnerEager.ownerEager)
3   .from(QOwnerEager.ownerEager)
4   .leftJoin(QOwnerEager.ownerEager.dogs).fetchJoin()
5   .fetch();

The join reaches for the data – notice we have to use leftJoin if the collection may be empty – and fetchJoin takes care of putting them into the collection. This is probably the best we can explicitly do with an eager collection mapping in place. The resulting SQL (here EclipseLink):

1 SELECT t1.ID, t1.NAME, t0.ID, t0.NAME, t0.owner_id
2  FROM {oj Owner t1 LEFT OUTER JOIN Dog t0 ON (t0.owner_id = t1.ID)}

In general, if an eager fetch on a collection is not executed using a join in the background, it is not worth it. We can get the same N+1 behavior with lazy loads, although that may require some explicit code to actually fetch the collection when we want it – e.g. in the service layer instead of later in the presentation layer where the persistence context is no longer available.

There is one crucial difference between N+1 across to-one and to-many relationships. In the case of to-one, the JPA provider may utilize the entity cache as the ID of the needed entity is already available. In the case of to-many, on the other hand, we have to execute an actual select. This may use the query cache – but setting that one up appropriately is definitely trickier than the entity cache. Without caches this difference is blurred away and the only difference is that for to-one we’re accessing rows in the DB by their primary key, while for to-many we’re using the foreign key. A PK is virtually always indexed, an FK not necessarily so – but I guess it should be when we’re using this access pattern.

Perhaps there are some settings or custom annotations that make a JPA provider perform the join, but it would have to be smart enough not to use it when limit/offset is in play. As this capability is not available in JPA – and because I believe eager collections are even worse than eager to-one – we will not delve into it any more.

Lazy relationships triggered later

Lazy fetch is a reasonable default for mapping collections – and it is the default according to the specification. Because a collection can be implemented in a custom way, all JPA providers offer lazy behaviour on collections without the need for any bytecode magic.

Mapping on the Owner is natural:

1 @OneToMany(mappedBy = "owner")
2 private Set<Dog> dogs;

Now we run code very similar to that of the previous section (the entity classes are different, but they work on the same tables):

1 List<Owner> owners = new JPAQuery<>(em)
2   .select(QOwner.owner)
3   .from(QOwner.owner)
4   .fetch();
5 System.out.println("\nowners = " + owners);
6 for (Owner owner : owners) {
7   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
8 }

EclipseLink produces the following output (mixing SQL and standard output):

1 SELECT ID, NAME FROM OWNER
2 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
3  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
4 Adam's dogs = {IndirectSet: not instantiated}
5 Charlie's dogs = {IndirectSet: not instantiated}
6 Joe's dogs = {IndirectSet: not instantiated}
7 Mike's dogs = {IndirectSet: not instantiated}

Is that all? This may seem a bit disturbing and leads us to the question: What should toString on the lazy collection do?

Hibernate treats toString differently and prints the content as expected:

 1 Hibernate: select owner0_.id as id1_1_, owner0_.name as name2_1_
 2  from Owner owner0_
 3 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
 4  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
 5 Hibernate: select dogs0_.owner_id as owner_id3_0_0_,
 6   dogs0_.id as id1_0_0_, ...
 7  from Dog dogs0_ where dogs0_.owner_id=?
 8 Adam's dogs = [Dog{id=1, name='Alan', owner.id=1}, Dog{id=2,
 9  name='Beastie', owner.id=1}, Dog{id=3, name='Cessna', owner.id=1}]
10 ...and more for other owners

This toString triggers the lazy-load process, just like the other methods that do the same on EclipseLink. This means the N+1 problem, of course. Lazy load is supposed to prevent loading data that is not needed – sometimes we say that the best queries are the ones we don’t have to perform. But what if we want to display a table of owners with the list of their dogs in each row? Postponing the load into the view is the worst thing we can do, for a couple of reasons:

  • We actually don’t gain anything; we’re still stuck with the N+1 problem while there are better ways to load the same data.
  • We know what data is needed, yet the responsibility for loading it is split between the backend and presentation layers.
  • To allow the presentation layer to load the data we need to keep the persistence context open longer. This leads to the infamous Open Session in View (OSIV) pattern – or rather antipattern.

When we need the data in the collection we should use a join, just like we used it for the eager collection – there is actually no difference when we use the join explicitly. But as we demonstrated previously there is the issue with pagination. And that’s what we’re going to talk about next.

Paginating with to-many

I mentioned previously that we can’t escape the SQL underneath. If we join another table across a to-many relationship we change the number of fetched rows, and with that we cannot paginate reliably. We can demonstrate this with SQL or even with Querydsl (that is, JPQL).

We will try to obtain owners and their dogs ordered by the owner’s name. We will set the offset to 1 and the limit to 2, which should return Charlie and Joe. I realize that an offset that is not a multiple of the limit (that is, the page size) does not make the best sense, but I want these two particular items for the following reasons:

  • I want to draw from the middle of our owner list, not the beginning or the end.
  • I want to see the effect of Charlie’s empty collection.
  • I want to see how Joe’s two dogs affect the results.

With our limited data this limit (2) and offset (1) demonstrate all of that.

We already saw that both eager and lazy collections allow us to query the master table and let N other queries pull the data:

 1 List<Owner> owners = new JPAQuery<>(em)
 2   .select(QOwner.owner)
 3   .from(QOwner.owner)
 4   .orderBy(QOwner.owner.name.asc())
 5   .offset(1)
 6   .limit(2)
 7   .fetch();
 8 for (Owner owner : owners) {
 9   owner.getDogs().isEmpty(); // to trigger the lazy load even on EclipseLink
10   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
11 }

Join leads to incorrect results

We can’t fix the N+1 problem with JOIN FETCH anymore:

1 List<Owner> owners = new JPAQuery<>(em)
2   .select(QOwner.owner)
3   .from(QOwner.owner)
4   .leftJoin(QOwner.owner.dogs).fetchJoin()
5   .orderBy(QOwner.owner.name.asc())
6   .offset(1)
7   .limit(2)
8   .fetch();

With the same print loop this produces:

1 Adam's dogs = {[Dog{id=1, name='Alan', owner.id=1}, Dog{id=2, ...
2 Adam's dogs = {[Dog{id=1, name='Alan', owner.id=1}, Dog{id=2, ...

The same happens if we want to be explicit. Here we even use the “raw” versions of the entities, an ad hoc join (to a root entity) and an explicit ON:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<Tuple> results = new JPAQuery<>(em)
 4   .select(o, d)
 5   .from(o)
 6   .leftJoin(d).on(d.ownerId.eq(o.id))
 7   .orderBy(o.name.asc(), d.name.asc())
 8   .offset(1)
 9   .limit(2)
10   .fetch();
11 for (Tuple row : results) {
12   DogRaw dog = row.get(d);
13   System.out.println(row.get(o).getName() + ", " +
14     (dog != null ? dog.getName() : null));
15 }

We had to change the print loop, because now we get the results as tuples:

1 Adam, Beastie
2 Adam, Cessna

We could use just select(o), but it would still return two rows of Adam. Ok, what does the whole result set look like and what are we selecting with our limit/offset?

owner dog
Adam Alan
Adam Beastie
Adam Cessna
Charlie null
Joe Lassie
Joe Rex
Mike Dunco

This makes it more obvious. Offset and limit are not smart and don’t care about what we want to paginate. With values 1 and 2 respectively they give us lines 2 and 3 of whatever result we apply them to. And that is very important.

Using native SQL

Is it even possible to paginate with a single query? Yes, but we have to use a subquery in a FROM or JOIN clause, neither of which is available in JPQL (as of JPA 2.1). We can try a native query:

 1 // select * would be enough for EclipseLink
 2 // Hibernate would complain about a duplicated SQL alias
 3 Query nativeQuery = em.createNativeQuery(
 4   "SELECT o.name AS oname, d.name AS dname" +
 5     " FROM (SELECT * FROM owner LIMIT 2 OFFSET 1) o" +
 6     " LEFT JOIN dog d ON o.id=d.owner_id");
 7 List<Object[]> resultList = nativeQuery.getResultList();
 8 
 9 System.out.println("resultList = " + resultList.stream()
10   .map(row -> Arrays.toString(row))
11   .collect(Collectors.toList()));

The returned resultList has three rows:

1 resultList = [[Charlie, null], [Joe, Rex], [Joe, Lassie]]

But all rows are related to our two selected owners. We got all the data in relational form – there is no other way – and it’s up to us to group the data as we need. For this we may want to add a transient dogs collection to our “raw” variant of the Owner class.

JPA/Querydsl solutions

Because of the current limitations of JPQL we cannot do it all in one go, but we can do it with a single additional query:

  1. A query with pagination is executed. This may return some data from the master table or perhaps only the IDs of the rows. An order must be applied for the pagination to work correctly. Any where conditions are applied here as well – more about that in a moment.
  2. Fetch the missing data, limiting the results to the scope of the previous result. The concrete implementation of this step differs depending on the nature of the first query.

Let’s say the first query just returns IDs like this:

1 QOwnerRaw o = QOwnerRaw.ownerRaw;
2 List<Integer> ownerIds = new JPAQuery<>(em)
3   .select(o.id)
4   .from(o)
5   .orderBy(o.name.asc())
6   // WHERE ad lib here
7   .offset(1)
8   .limit(2)
9   .fetch();

In this case we can get the final result like this:

 1 QDogRaw d = QDogRaw.dogRaw;
 2 Map<OwnerRaw, List<DogRaw>> ownerDogs = new JPAQuery<>(em)
 3   .select(o, d)
 4   .from(o)
 5   .leftJoin(d).on(d.ownerId.eq(o.id))
 6   .where(o.id.in(ownerIds))
 7   .orderBy(o.name.asc()) // use the same order as in select #1
 8   .orderBy(d.name.desc()) // dogs in each list ordered DESC
 9   // no limit/offset, where took care of it
10   .transform(groupBy(o).as(list(d)));
11 System.out.println("ownerDogs = " + ownerDogs);

We will talk more about the transform method later, in the chapter on advanced Querydsl and its section on result transformation. Obviously it is the trick that allows us to return a Map instead of a List.

Using the same order as in the first query is essential; additional orders are allowed. An important question is whether you need to join any to-many relationship in the first query as well. If we need to (e.g. for WHERE conditions) we have to use DISTINCT, which means we also have to bring the expression from ORDER BY into the SELECT clause. This is mandated by SQL, because DISTINCT must be applied before the results can be ordered. This complicates the first select:

 1 List<Integer> ownerIds = new JPAQuery<>(em)
 2   // If distinct is used SQL requires o.name here because it is used to
 3   // order which happens after distinct. EclipseLink can handle this,
 4   // Hibernate generated SQL fails. JPA spec is not specific on this.
 5   .select(o.id, o.name)
 6   // or perhaps just use o, that encompasses o.name
 7   // .select(o)
 8   .distinct() // needed if join across to-many is used to allow WHERE
 9   .from(o)
10   .orderBy(o.name.asc())
11   // WHERE ad lib here
12   .offset(1)
13   .limit(2)
14   .fetch()
15   .stream()
16   .map(t -> t.get(o.id))
17   // .map(Owner::getId) // if .select(o) was used
18   .collect(Collectors.toList());

We could stop at fetch and deal with the List<Tuple>, but that’s not what we wanted – hence the additional mapping to the list of IDs. The commented version using just the o alias and the more straightforward mapping to the ID is probably better, but it still raises the question: Do we really need to join across to-many to add the where conditions? Can’t we work around it with a subquery? For example, to get owners with some dogs we can do this with a join across to-many:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<OwnerRaw> owners = new JPAQuery<>(em)
 4   .select(o)
 5   .distinct()
 6   .from(o)
 7   .leftJoin(d).on(d.ownerId.eq(o.id))
 8   .where(d.isNotNull())
 9   .orderBy(o.name.asc())
10   .offset(1)
11   .limit(2)
12   .fetch();

This skips Charlie and returns Joe and Mike. We can get the same result without JOIN and DISTINCT:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<OwnerRaw> owners = new JPAQuery<>(em)
 4   .select(o)
 5   .from(o)
 6   .where(new JPAQuery<>()
 7     .select(d)
 8     .from(d)
 9     .where(d.ownerId.eq(o.id))
10     .exists())
11   .orderBy(o.name.asc())
12   .offset(1)
13   .limit(2)
14   .fetch();

The subquery reads a bit longer, but if we wanted to select o.id directly it would save us a lot of the trouble we would otherwise run into because of the distinct/orderBy combo.

Personally I’m not a fan of the first select returning only IDs, although there may be scenarios when it’s the best solution. Now that we know we can filter the list of owners by the dogs’ attributes (that is, across the to-many relationship), either using subqueries or the distinct clause, I’d suggest returning whole owners. We can safely join any to-one associations and return a Tuple of several entities, or we can enumerate all the desired columns and return a custom DTO (a projection). A projection would further relieve the persistence context if the data merely flows through the service layer to the presentation layer.
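
For illustration, such a projection might look like this – OwnerDto is a hypothetical class with a constructor matching the listed expressions (Projections comes from com.querydsl.core.types):

QOwnerRaw o = QOwnerRaw.ownerRaw;
List<OwnerDto> page = new JPAQuery<>(em)
  .select(Projections.constructor(OwnerDto.class, o.id, o.name))
  .from(o)
  .orderBy(o.name.asc())
  .offset(1)
  .limit(2)
  .fetch();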

In the second query we don’t have to load the actual results anymore; the query is needed only to fetch the data of the to-many relationship. But we fetch all of it for all the items on the page currently being displayed.

To-many relationship fetcher

To demonstrate how we can implement this fetching pattern I decided to build a “quick and dirty” fluent API for the second query. It not only executes the fetch of the to-many relationship for all the items on the page, but also merges the fetched data into the master objects – which is the kind of post-processing we glossed over.

I’m not saying it can’t be done better, but I think that if it looked like this it would be really useful:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 List<OwnerRaw> owners = new JPAQuery<>(em)
 3   .select(o)
 4   .from(o)
 5   .orderBy(o.name.asc())
 6   .offset(1)
 7   .limit(2)
 8   .fetch();
 9 
10 QDogRaw d = QDogRaw.dogRaw;
11 List<OwnerRaw> ownersWithDogs = ToManyFetcher.forItems(owners)
12   .by(OwnerRaw::getId)
13   .from(d)
14   .joiningOn(d.ownerId)
15   .orderBy(d.name.desc())
16   .fetchAndCombine(em, OwnerRaw::setDogs);

This implies some transient dogs collection on the owner and an existing setter:

1 @Transient
2 private List<DogRaw> dogs;
3 
4 //...
5 
6 public void setDogs(List<DogRaw> dogs) {
7   this.dogs = dogs;
8 }

Or we can combine both in some DTO like this (the first query stays the same):

 1 // first query up to QDogRaw d = ... here
 2 List<OwnerWithDogs> ownersWithDogs = ToManyFetcher.forItems(owners)
 3   .by(OwnerRaw::getId)
 4   .from(d)
 5   .joiningOn(d.ownerId)
 6   .orderBy(d.name.desc())
 7   .fetchAs(em, OwnerWithDogs::new);
 8 
 9 // elsewhere...
10 public class OwnerWithDogs {
11   public final OwnerRaw owner;
12   public final List<DogRaw> dogs;
13 
14   OwnerWithDogs(OwnerRaw owner, List<DogRaw> dogs) {
15     this.owner = owner;
16     this.dogs = dogs;
17   }
18 }

Ok, now how do we construct such an API? We can use Querydsl paths – although we don’t use their type-safety capabilities to a perfect result, we rather go for something useful without putting too many parametrized types in the code. Also, in order to keep the API fluent while introducing additional parametrized types along the way, I broke my personal rules about code nesting – in this case the nesting of inner classes. Sorry for that; it’s up to you to do it your way, this really is just a proof of concept:

  1 import com.querydsl.core.types.*;
  2 import com.querydsl.core.types.dsl.*;
  3 import com.querydsl.jpa.impl.JPAQuery;
  4 
  5 import javax.persistence.EntityManager;
  6 import java.util.*;
  7 import java.util.function.*;
  8 import java.util.stream.Collectors;
  9 
 10 import static com.querydsl.core.group.GroupBy.*;
 11 
 12 public class ToManyFetcher<T> {
 13 
 14   private final List<T> rows;
 15 
 16   private ToManyFetcher(List<T> rows) {
 17     this.rows = rows;
 18   }
 19 
 20   public static <T> ToManyFetcher<T> forItems(List<T> rows) {
 21     return new ToManyFetcher<>(rows);
 22   }
 23 
 24   public <PK> ToManyFetcherWithIdFunction<PK> by(Function<T, PK> idFunction) {
 25     return new ToManyFetcherWithIdFunction<>(idFunction);
 26   }
 27 
 28   public class ToManyFetcherWithIdFunction<PK> {
 29     private final Function<T, PK> idFunction;
 30 
 31     public ToManyFetcherWithIdFunction(Function<T, PK> idFunction) {
 32       this.idFunction = idFunction;
 33     }
 34 
 35     public <TMC> ToManyFetcherWithFrom<TMC> from(
 36       EntityPathBase<TMC> toManyEntityPathBase)
 37     {
 38       return new ToManyFetcherWithFrom<>(toManyEntityPathBase);
 39     }
 40 
 41     public class ToManyFetcherWithFrom<TMC> {
 42       private EntityPathBase<TMC> toManyEntityPathBase;
 43       private Path<PK> fkPath;
 44       private OrderSpecifier orderSpecifier;
 45 
 46       public ToManyFetcherWithFrom(EntityPathBase<TMC> toManyEntityPathBase)
 47       {
 48         this.toManyEntityPathBase = toManyEntityPathBase;
 49       }
 50 
 51       public ToManyFetcherWithFrom<TMC> joiningOn(Path<PK> fkPath) {
 52         this.fkPath = fkPath;
 53         return this;
 54       }
 55 
 56       public ToManyFetcherWithFrom<TMC> orderBy(OrderSpecifier orderSpecifier) {
 57         this.orderSpecifier = orderSpecifier;
 58         return this;
 59       }
 60 
 61       public <R> List<R> fetchAs(
 62         EntityManager em, BiFunction<T, List<TMC>, R> combineFunction)
 63       {
 64         Map<PK, List<TMC>> toManyResults = getToManyMap(em);
 65 
 66         return rows.stream()
 67           .map(row -> combineFunction.apply(row,
 68             toManyResults.getOrDefault(
 69               idFunction.apply(row), Collections.emptyList())))
 70           .collect(Collectors.toList());
 71       }
 72 
 73       public List<T> fetchAndCombine(EntityManager em,
 74         BiConsumer<T, List<TMC>> combiner)
 75       {
 76         Map<PK, List<TMC>> toManyResults = getToManyMap(em);
 77 
 78         rows.forEach(row -> combiner.accept(row,
 79           toManyResults.getOrDefault(
 80             idFunction.apply(row), Collections.emptyList())));
 81 
 82         return rows;
 83       }
 84 
 85       private Map<PK, List<TMC>> getToManyMap(EntityManager em) {
 86         List<PK> ids = rows.stream()
 87           .map(idFunction)
 88           .collect(Collectors.toList());
 89         JPAQuery<TMC> tmcQuery = new JPAQuery<>(em)
 90           .select(toManyEntityPathBase)
 91           .from(toManyEntityPathBase)
 92           .where(Expressions.booleanOperation(
 93             Ops.IN, fkPath, Expressions.constant(ids)));
 94         if (orderSpecifier != null) {
 95           tmcQuery.orderBy(orderSpecifier);
 96         }
 97         return tmcQuery
 98           .transform(groupBy(fkPath).as(list(toManyEntityPathBase)));
 99       }
100     }
101   }
102 }

First we start with ToManyFetcher, which preserves the type of the master entity (OwnerRaw in our example). I could also store the EntityManager here – it is eventually needed for query execution – but I chose to put it into the terminal operation (a rather arbitrary decision). Method by returns ToManyFetcherWithIdFunction to preserve the type of the master entity’s primary key (ID) and also to store the function that maps the entity to it – for that we need to return a different type. Next, from preserves the type of the EntityPathBase of the detail entity (that is, the entity Q-class) – and again returns an instance of another type that wraps all the accumulated knowledge.

Further, joiningOn and orderBy don’t add any parametrized types, so they just return this. While orderBy is optional, joiningOn is very important and we could construct the API in a way that enforces its call – which did not happen in my demo. Finally, we can choose from two handy fetch operations depending on whether we want to combine the result on the master entity class (using a setter, for instance) or in another type (like a DTO).

Java 8 is used nicely here, especially when the API is used. It can certainly be done without streams, but using inner classes instead of lambdas would make the by and fetch calls arguably unwieldy. Trying to do it in a single type would require defining all the parametrized types at once – either explicitly using the <...> notation or using a call with multiple arguments.

I tried both and this second fluent approach is much better, as the names of the fluent calls introduce the parameters and make the code virtually self-documenting. I hope this demonstrates how nicely we can use the Querydsl API (and Java 8!) for recurring problems.

Wrapping up N+1 problem

N+1 problem can occur when the load of multiple (N) entities subsequently triggers the load of their related data one by one. The relations may be to-one or to-many and there may be multiple of these. The worst case is when the data loaded by up to N additional queries is not used at all. This happens often with to-one relationships, which I personally solve with raw FK values – as proposed in the chapter Removing to-one altogether. To-many relationships should be declared as lazy in most cases to avoid “loading the whole DB”.

Related data we need should be loaded explicitly, that is with joins. Scenarios where a conscious lazy load is appropriate are rather rare. Scenarios where people either don’t know or are too lazy to think are less rare – and that’s what leads to N+1 so often. N queries are not necessary in most cases and don’t scale well with the page size.

For data across to-one relationships there are rarely any negative implications when we join them. With to-many situation gets a little bit messy, especially if we need data across multiple such relationships. If we don’t need pagination we may use join and then group the data appropriately, but with pagination we have to ensure that the driving query on the master entity is paginated correctly. With JPA/JPQL this is typically not possible in a single select and each to-many relationship requires an additional query – but it is possible to keep query count constant and not driven by number of items.

In any case, letting lazy loading do the job we should do ourselves – in cases when the data gets loaded anyway – is sloppy. Using Open Session in View (OSIV) on top of it, instead of having a proper service layer contract, is plain hazardous.

12. Mapping Java enum

This chapter is based on my two blog posts.

I adapted them for the book and our typical model of dogs and also refined them a bit – there is no need to read the original posts.

Before we embark on our journey we should mention that Java enums are not necessarily the best way to model various code lists and other enumerations. We will tackle this at the end of the chapter.

The typical way an enumeration is stored in a DB is a numeric column mapped to the enum. We all know that using EnumType.ORDINAL is one of the worse solutions, as values returned by ordinal() get out of control quickly when we evolve the enum – and most of them do evolve. EnumType.STRING is not loved by our DB admins (if you have them) – even though it’s easy to read, for sure. Normalization suffers and enum constant renaming becomes a problem too. So we want to map it to a numeric (for instance) but not ordinal value. In the entity class we can map it as a raw Integer or as an enum type – one way or the other we will need to convert back and forth.

We will map the attribute as an enum and demonstrate JPA 2.1 converters in the process – AttributeConverter<X,Y> to be precise.

Only the final version of the code, as we get to it at the end of this chapter, is included in the repository.

Naive approach

Let’s have our simple enum:

1 public enum Gender {
2   MALE,
3   FEMALE,
4   OTHER
5 }

And we have our entity mapping a numeric column to our enum:

1 @Entity
2 public class Dog {
3   @Id private Integer id;
4 
5 //..
6   @Convert(converter = GenderConverter.class)
7   @Column(name = "gender")
8   private Gender gender;
9 }

This entity will be the same throughout all of our solutions, so we will not repeat it. The important line is the one with the @Convert annotation that allows us to do the conversion. All we have to do now is to implement the converter:

 1 import javax.persistence.AttributeConverter;
 2 import javax.persistence.Converter;
 3 
 4 /**
 5  * This is coupled too much to enum and you always have to change both
 6  * classes in tandem. That's a big STOP (and think) sign in any case.
 7  */
 8 @Converter
 9 public class GenderConverter implements AttributeConverter<Gender, Integer> {
10   @Override
11   public Integer convertToDatabaseColumn(Gender someEntityType) {
12     switch (someEntityType) {
13       case MALE:
14         return 0;
15       case FEMALE:
16         return 1;
17       default:
18         // do we need this? it catches a forgotten case when the enum is modified
19         throw new IllegalArgumentException("Invalid value " + someEntityType);
20         // the value is valid, just this externalized switch sucks of course
21     }
22   }
23 
24   @Override
25   public Gender convertToEntityAttribute(Integer dbValue) {
26     switch (dbValue) {
27       case 0:
28         return Gender.MALE;
29       case 1:
30         return Gender.FEMALE;
31       case 2:
32         return Gender.OTHER;
33     }
34     // now what? an exception would probably be better, to warn the programmer
35     return null;
36   }
37 }

I revealed the problems in the comments. There may be just one reason to do it this way – when we need the enum independent from that dbValue mapping. This may be reasonable if the enum is used in many other contexts. But if storing it in the database is typical, we should go for a more cohesive solution. And in this case we will be just fine with encapsulation, putting the stuff that changes in one place – into the enum.

Encapsulated conversion

We still need to implement AttributeConverter because it is the glue between JPA and our class. But there is no reason to leave the actual mapping in this infrastructure class. So let’s enhance the enum to keep the converter simple. The converter we want to see looks like this:

 1 @Converter
 2 public class GenderConverter implements AttributeConverter<Gender, Integer> {
 3   @Override
 4   public Integer convertToDatabaseColumn(Gender gender) {
 5     return gender.toDbValue();
 6   }
 7 
 8   @Override
 9   public Gender convertToEntityAttribute(Integer dbValue) {
10     // this can still return null unless it throws IllegalArgumentException
11     // which would be in line with enums static valueOf method
12     return Gender.fromDbValue(dbValue);
13   }
14 }

Much better – we don’t have to think about this class at all when we add values to the enum. The complexity is now here, but it’s all contained in a single class:

 1 public enum Gender {
 2   MALE(0),
 3   FEMALE(1),
 4   OTHER(-1);
 5 
 6   private final Integer dbValue;
 7 
 8   Gender(Integer dbValue) {
 9     this.dbValue = dbValue;
10   }
11 
12   public Integer toDbValue() {
13     return dbValue;
14   }
15 
16   public static final Map<Integer, Gender> dbValues = new HashMap<>();
17 
18   static {
19     for (Gender value : values()) {
20       dbValues.put(value.dbValue, value);
21     }
22   }
23 
24   public static Gender fromDbValue(Integer dbValue) {
25     // this returns null for invalid value,
26     // check for null and throw exception if you need it
27     return dbValues.get(dbValue);
28   }
29 }

I also saw some half-solutions without the static reverse resolving, but I hope we all agree it belongs in the enum. For a two-value enum you may start with a switch in fromDbValue, but that’s just another thing to think about – and one static map will not kill anyone.
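
For completeness, the switch alternative inside the enum might look like this sketch – note it must be kept in sync with the constructor values manually, which is exactly what the static map avoids:

public static Gender fromDbValue(Integer dbValue) {
  // beware: auto-unboxing throws NPE for a null dbValue
  switch (dbValue) {
    case 0: return MALE;
    case 1: return FEMALE;
    case -1: return OTHER;
    default: return null; // or throw IllegalArgumentException
  }
}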

Now this works, so let’s imagine we need this for many enums. Can we find some common ground here? I think we can.

Conversion micro-framework

Let’s say we want to have order in these things so we will require the method named toDbValue. Our enums will implement interface ConvertedEnum:

 1 /**
 2  * Declares this enum as converted into database, column value of type Y.
 3  *
 4  * In addition to implementing {@link #toDbValue()}, a converted enum should
 5  * also provide a static method for reverse conversion, for instance
 6  * {@code X fromDbValue(Y)}. This one should throw {@link
 7  * IllegalArgumentException} just as {@link Enum#valueOf(Class, String)} does.
 8  * Check {@link EnumAttributeConverter} for helper methods that can be used
 9  * during reverse conversion.
10  */
11 public interface ConvertedEnum<Y> {
12     Y toDbValue();
13 }

It’s parametrized, hence flexible. The Javadoc says it all – we can’t enforce the static stuff, because that’s how Java works. While I suggest that the reverse fromDbValue should throw IllegalArgumentException, I’ll leave it returning null for now – just know that I’m aware of this. I’d personally go strictly for the exception, but maybe you want to use this method elsewhere in the code and null works fine for you. We will enforce the exception in our converter instead.

What are the changes in the enum? Minimal, really – just add implements ConvertedEnum<Integer> and you can add @Override to the toDbValue method. Not worth the listing. Now to utilize all this we need a base class implementing AttributeConverter – here it goes:

 1 /**
 2  * Base implementation for converting enums stored in DB.
 3  * Enums must implement {@link ConvertedEnum}.
 4  */
 5 public abstract class EnumAttributeConverter<X extends ConvertedEnum<Y>, Y>
 6   implements AttributeConverter<X, Y>
 7 {
 8   @Override
 9   public final Y convertToDatabaseColumn(X enumValue) {
10     return enumValue != null ? enumValue.toDbValue() : null;
11   }
12 
13   @Override
14   public final X convertToEntityAttribute(Y dbValue) {
15     return dbValue != null ? notNull(fromDbValue(dbValue), dbValue) : null;
16   }
17 
18   protected abstract X fromDbValue(Y dbValue);
19 
20   private X notNull(X x, Y dbValue) {
21     if (x == null) {
22       throw new IllegalArgumentException("No enum constant" +
23         (dbValue != null ? (" for DB value " + dbValue) : ""));
24     }
25     return x;
26   }
27 }

With this, our concrete converters only need to provide the reverse conversion, which requires a static method call to fromDbValue:

1 @Converter
2 public class GenderConverter extends EnumAttributeConverter<Gender, Integer> {
3   @Override
4   protected Gender fromDbValue(Integer dbValue) {
5     return Gender.fromDbValue(dbValue);
6   }
7 }

What a beauty suddenly! The converter translates null to null, which works perfectly with optional attributes (nullable columns). Null checking should be done on the database level, with optional validation in the application as necessary. However, if an invalid non-null value is provided, we will throw the aforementioned IllegalArgumentException.

Refinement?

And now the last push. What is repeating? And what can we do about it?

It would be cool to have just a single converter class – but this is impossible, because there is no way to instruct the converter about its types. Especially the method convertToEntityAttribute is immune to any approach, because there is nothing at runtime that can tell you what the expected enum type is. No reflection or anything helps here, it seems.

So we have to have separate AttributeConverter classes – but can we pull convertToEntityAttribute into our EnumAttributeConverter? Not easily, really, but we’ll try something.

How about the static resolving? Can we get rid of that static initialization block in our enum? It is static, so it seems difficult to abstract it away – but indeed, we can do something about it.

Let’s try to hack our converters first. We need to get the type information into the instance of the superclass. It can be a protected field like this:

1 public abstract class EnumAttributeConverter<X extends ConvertedEnum<Y>, Y>
2   implements AttributeConverter<X, Y>
3 {
4   protected Class<X> enumClass;

And subclass would initialize it this way:

1 public class GenderConverter extends EnumAttributeConverter<Gender, Integer> {
2   {
3     enumClass = Gender.class;
4   }

But this is not enforced in any way! Rather, use an abstract method that must be implemented. In the abstract converter:

1 protected abstract Class<X> enumClass();

And in concrete class:

1 public class GenderConverter extends EnumAttributeConverter<Gender, Integer> {
2   @Override
3   protected Class<Gender> enumClass() {
4     return Gender.class;
5   }
6 }

But we’re back to 3 lines of code (excluding @Override) – and we haven’t even gotten to the ugly unified convertToEntityAttribute yet:

 1 @Override
 2 public X convertToEntityAttribute(Y dbValue) {
 3   try {
 4     Method method = enumClass().getMethod("fromDbValue", dbValue.getClass());
 5     return (X) method.invoke(null, dbValue);
 6   } catch (IllegalAccessException | InvocationTargetException |
 7     NoSuchMethodException e)
 8   {
 9     throw new IllegalArgumentException("...this really doesn't make sense", e);
10   }
11 }

Maybe I missed something along the path, but this doesn’t sound like a good solution. It would be if it led to a unified converter class, but it did not. There may be one more problem with the hunt for a unified solution. While the concrete implementation contains methods with concrete parameter and return types, the unified abstract implementation doesn’t. They use the right types at runtime, but the methods wouldn’t tell you that if you used reflection. Imagine JPA checking this. Right now I know that the unified public final Y convertToDatabaseColumn(X x) {...} works with EclipseLink, but maybe we’re asking for problems. Let’s check it really:

1 // throws NoSuchMethodException
2 // Method method = converter.getClass().getMethod(
3 //   "convertToDatabaseColumn", Integer.class);
4 Method method = converter.getClass().getMethod(
5   "convertToDatabaseColumn", Object.class);
6 System.out.println("method = " + method);

This prints:

1 method = public java.lang.Object
2   EnumAttributeConverter.convertToDatabaseColumn(java.lang.Object)

If something strictly matches the method types with column/enum types, we may have a misunderstanding because our method claims it converts Object to Object. Sometimes too smart may be just that – way too smart.

Simplified static resolving

Anyway, let’s look into that enum’s static resolving. This is actually really useful. Without further ado, this is how the enum part may look:

1 // static resolving:
2 public static final ConvertedEnumResolver<Gender, Integer> resolver =
3   new ConvertedEnumResolver<>(Gender.class);
4 
5 public static Gender fromDbValue(Integer dbValue) {
6     return resolver.get(dbValue);
7 }

So we got rid of the static map and static initializer. But the code must be somewhere:

 1 package support;
 2 
 3 import java.util.HashMap;
 4 import java.util.Map;
 5 
 6 /**
 7  * Helps reverse resolving of {@link ConvertedEnum} from a DB value back to
 8  * an enum instance. Enums that can be resolved this way must have a unified
 9  * interface in order to obtain {@link ConvertedEnum#toDbValue()}.
10  *
11  * @param <T> type of an enum
12  * @param <Y> type of DB value
13  */
14 public class ConvertedEnumResolver<T extends ConvertedEnum<Y>, Y> {
15 
16   private final String classCanonicalName;
17   private final Map<Y, T> dbValues = new HashMap<>();
18 
19   public ConvertedEnumResolver(Class<T> enumClass) {
20     classCanonicalName = enumClass.getCanonicalName();
21     for (T t : enumClass.getEnumConstants()) {
22       dbValues.put(t.toDbValue(), t);
23     }
24   }
25 
26   public T get(Y dbValue) {
27     T enumValue = dbValues.get(dbValue);
28     if (enumValue == null) {
29       throw new IllegalArgumentException("No enum constant for dbValue " +
30         dbValue + " in " + classCanonicalName);
31     }
32     return enumValue;
33   }
34 }

And this I actually really like. Here I went for strict checking, throwing an exception. Without it, or without needing/wanting the type name, it could be even shorter. But you write this one once and it saves lines in each enum.

So there are three players in this “framework”:

  • ConvertedEnum – interface for your enums that are converted in this way.
  • ConvertedEnumResolver – wraps the reverse mapping and saves you most of the static lines in each converted enum.
  • EnumAttributeConverter – implements both-way conversion in a unified way, we just need to implement its abstract fromDbValue method. Just be aware of potential problems if something introspects the method parameter and return types, because these are “generified”.

Alternative conversions within entities

So far we were demonstrating this all in the context of JPA 2.1. I’d like to at least mention alternative solutions for older JPA versions.

If you use AccessType.FIELD then the fields must be strictly of a supported type (that is some raw type, e.g. Integer for an enum), but accessors can do the conversion. If you use AccessType.PROPERTY then the fields can be of your desired type and you can have two sets of accessors – one for JPA and the other working with the type you desire. You need to mark the latter as @Transient (or break the get/set naming convention).

A long time ago we used this technique to back Dates by long fields (or Long if nullable). This way various tools (like Sonar) didn’t complain about the silly mutable Date breaking encapsulation when used directly in get/set methods, because it was not stored directly anymore. JPA used get/set for the Date, and we had @Transient get/set for the millis long available. We actually preferred comparing millis to the before/after methods on Date, but that’s another story. The same can be used for mapping enums – just have a JPA compatible type mapped for JPA and @Transient get/set with the enum type (see the sketch below). Most of the stuff about enum encapsulation and static resolving still applies.
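
For enums the same trick may look like this sketch – property access is assumed and the accessor names are made up for illustration:

@Entity
@Access(AccessType.PROPERTY)
public class Dog {
  private Integer id;
  private Gender gender; // field in the type we actually want

  @Id
  public Integer getId() { return id; }
  public void setId(Integer id) { this.id = id; }

  // this pair is what JPA maps to the numeric column
  @Column(name = "gender")
  public Integer getGenderDbValue() {
    return gender != null ? gender.toDbValue() : null;
  }
  public void setGenderDbValue(Integer dbValue) {
    this.gender = dbValue != null ? Gender.fromDbValue(dbValue) : null;
  }

  // this pair is for the rest of the code, hidden from JPA
  @Transient
  public Gender getGender() { return gender; }
  public void setGender(Gender gender) { this.gender = gender; }
}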

Java 8 and reverse enum resolution

I focused mostly on conversion to values and back in the context of JPA 2.1 converters. Now we’ll focus on the part that helped us with “reverse resolution”. We have the value – for instance an int, but not the ordinal number, of course! – that represents a particular enum instance in the database and we want that enum instance. Mapping between these values and enum instances is a bijection, of course.

Our solution so far works fine but there are two limitations:

  • ConvertedEnumResolver depends on the common interface ConvertedEnum our enums must implement hence it is implicitly tied to the conversion framework.
  • If we need another representation (mapping) to different set of values we have to develop new resolver class.

The first one may not be a big deal. The second one is a real thing though. I worked on a system where an enum was represented in the DB as an int and elsewhere as a String – but we didn’t want to tie it to the Java enum constant names. A new resolver class was developed – but is another class really necessary?

Everything in the resolver class is the same – except for the instance method getting the value from the enum. We didn’t want to use any reflection, but we had recently switched to Java 8 and I’d heard it has these method references! If we can pass one to the resolver constructor… is it possible?

Those who know Java 8 know that the answer is indeed positive. Our new resolver will look like this (I renamed it since the last time):

 1 /**
 2  * Helps reverse resolving of enums from any value back to an enum instance.
 3  * The resolver uses a provided function that obtains the value from an enum.
 4  *
 5  * @param <T> type of an enum
 6  * @param <Y> type of a value
 7  */
 8 public final class ReverseEnumResolver<T extends Enum, Y> {
 9   private final String classCanonicalName;
10   private final Map<Y, T> valueMap = new HashMap<>();
11 
12   public ReverseEnumResolver(Class<T> enumClass,
13     Function<T, Y> toValueFunction)
14   {
15     classCanonicalName = enumClass.getCanonicalName();
16     for (T t : enumClass.getEnumConstants()) {
17       valueMap.put(toValueFunction.apply(t), t);
18     }
19   }
20 
21   public T get(Y value) {
22     T enumVal = valueMap.get(value);
23 
24     if (enumVal == null) {
25       throw new IllegalArgumentException("No enum constant for '" +
26         value + "' in " + classCanonicalName);
27     }
28     return enumVal;
29   }
30 }

There is no conversion mentioned anymore – just reverse resolving. So our new enum does not have to be “Converted” anymore – unless you still need this interface for JPA conversion (which you may). Let’s show the complete listing of our Gender enum that this time can be resolved from two sets of values. Notice those method references:

 1 public enum Gender implements ConvertedEnum<Integer> {
 2   MALE(0, "mal"),
 3   FEMALE(1, "fem"),
 4   OTHER(-1, "oth");
 5 
 6   private final Integer dbValue;
 7   private final String code;
 8 
 9   Gender(Integer dbValue, String code) {
10     this.dbValue = dbValue;
11     this.code = code;
12   }
13 
14   @Override
15   public Integer toDbValue() {
16     return dbValue;
17   }
18 
19   public String toCode() {
20     return code;
21   }
22 
23   // static resolving:
24   public static final ReverseEnumResolver<Gender, Integer> resolver =
25     new ReverseEnumResolver<>(Gender.class, Gender::toDbValue);
26 
27   public static Gender fromDbValue(Integer dbValue) {
28     return resolver.get(dbValue);
29   }
30 
31   // static resolving to string:
32   public static final ReverseEnumResolver<Gender, String> strResolver =
33     new ReverseEnumResolver<>(Gender.class, Gender::toCode);
34 
35   public static Gender fromCode(String code) {
36     return strResolver.get(code);
37   }
38 }

So this is how we can have two different values (dbValue and code) resolved with the same class utilizing Java 8 method references.

If we wanted to store dbValue in one DB table and code in another table we would need two different converters. At least one of them wouldn’t be able to utilize the ConvertedEnum contract anymore. Being on Java 8, I’d again use method references in the converter subclass to specify both conversion methods.

Let’s say our universal new AbstractAttributeConverter looks like this:

 1 /** Base implementation for converting values in DB, not only for enums. */
 2 public abstract class AbstractAttributeConverter<X, Y>
 3   implements AttributeConverter<X, Y>
 4 {
 5   private final Function<X, Y> toDbFunction;
 6   private final Function<Y, X> fromDbFunction;
 7 
 8   public AbstractAttributeConverter(
 9     Function<X, Y> toDbFunction, Function<Y, X> fromDbFunction)
10   {
11     this.toDbFunction = toDbFunction;
12     this.fromDbFunction = fromDbFunction;
13   }
14 
15   @Override
16   public final Y convertToDatabaseColumn(X enumValue) {
17     return enumValue != null ? toDbFunction.apply(enumValue) : null;
18   }
19 
20   @Override
21   public final X convertToEntityAttribute(Y dbValue) {
22     return dbValue != null ? fromDbFunction.apply(dbValue) : null;
23   }
24 }

Notice that I dropped the previous notNull enforcing method because our reverse resolver does that already (the converter itself still allows null values). The converter for the Gender enum would be:

1 @Converter
2 public class GenderConverter extends AbstractAttributeConverter<Gender, Integer> {
3   public GenderConverter() {
4     super(Gender::toDbValue, Gender::fromDbValue);
5   }
6 }

The only thing the abstract superclass does now (after dropping notNull) is the ternary expression treating null values. So you may consider it overkill as of now. On the other hand, it prepares everything in such a way that the concrete converter is really declarative and as concise as possible (considering Java syntax). Also notice that we made ConvertedEnum obsolete.

In this last section we underlined how handy some Java 8 features can be. Method references allowed us to create a more flexible reverse resolver which we can use for multiple mappings, they allowed us to declare concrete attribute converters in a super short manner where nothing is superfluous, and we could get rid of the contract interface for our enums.

Relational concerns

Mapping enumerated values from Java to DB is a more complex issue than just the conversion. The enumeration can be based on an existing dedicated table or it can exist exclusively in Java world. It may or may not change over time. All these factors play a role in how complicated things can get.

If the DB is “just a store” and the application is driving the design, you may be happy with encoding enums into a DB column as discussed throughout this chapter. But if we want to query the data using meaningful values (like MALE instead of 0), or we want to get this meaningful description into the outputs of SQL queries, we’re asking for complications.

We may use enum names directly, which JPA supports with EnumType.STRING, but this has a couple of disadvantages:

  • It takes more space than necessary.
  • In case we decide to rename a constant we have to synchronize the renaming with the code deployment. Alternatively we can use a converter to support both values in the DB for some time (see the sketch after this list) – this may be necessary when we’re not able to rename the data quickly, e.g. there’s too much of it.
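
Such a transitional converter may look like this sketch – enum names stored as strings, with the legacy value UNSPECIFIED (made up for illustration) still resolvable while the data is being migrated:

@Converter
public class GenderNameConverter implements AttributeConverter<Gender, String> {
  @Override
  public String convertToDatabaseColumn(Gender gender) {
    return gender != null ? gender.name() : null; // always writes the new name
  }

  @Override
  public Gender convertToEntityAttribute(String dbValue) {
    if (dbValue == null) {
      return null;
    }
    if ("UNSPECIFIED".equals(dbValue)) { // legacy value still present in DB
      return Gender.OTHER;
    }
    return Gender.valueOf(dbValue);
  }
}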

Actually, if we mention renaming in front of our DB admins they will want to put enum values into another table, as it makes renaming a non-issue (at least from the DB perspective). We use an FK instead of the names. We can also use a single lookup table for all enum types, or an even more sophisticated meta-data facility. In every case it comes with a cost, though, and it virtually always violates the DRY principle (don’t repeat yourself).

Often we are extending a legacy application where the DB schema is a given and we have to deal with its design. For some code list tables we may mirror the entries in an enum because the list does not change much (e.g. gender). This will save us a lot of trouble in the Java code, giving us all the power of enums. Other lists are not good candidates (e.g. countries or currencies) as they tend to change occasionally and are also much longer. These should be mapped normally as entities.

In this chapter we focused only on mapping enums to arbitrary values and back, but before mapping something that looks like an enumeration into Java code, always think twice whether a Java enum is the best way. Especially if that enumeration has its own dedicated table.

13. Advanced Querydsl

At the beginning of this chapter dedicated to some of the more advanced points of Querydsl, I’d like to reiterate that Querydsl has quite good documentation. It’s not perfect and does not cover all the possibilities of the library – even in this chapter we will talk about some things that are not presented there. I tried to verify these not only by personal experience but also by discussing some of them in the Querydsl Google Group. Other problems were solved thanks to StackOverflow, often thanks to answers from the authors of the library themselves.

As mentioned in the first chapter about Querydsl, you can find the description of a demonstration project in the appendix Project example. Virtually all code examples on GitHub for this book are based on that project layout, so it should be easy to replicate them.

Introducing handy utility classes

Most of the time when using Querydsl we work with our generated Q-classes and the JPAQuery class. There are cases, however, when the fluent API is not enough and we need to add something else. This typically happens in the where part, where it helps us put predicates together, but there are other cases too. Let’s quickly mention the classes we should know – our starting points for various minor cases. Some of them will be just mentioned here, others will be discussed later in this chapter.

  • Expressions and ExpressionUtils are probably the most useful classes, allowing creation of various Expressions. These may be functions (like the current date) or various Predicates (Predicate being an Expression subtype).
  • JPADeleteClause and JPAUpdateClause (both implementing DMLClause) are natural complement to our well-known JPAQuery when we want to modify the data.
  • BooleanBuilder – a mutable predicate expression for building AND or OR groups from an arbitrary number of predicates – discussed later in Groups of AND/OR predicates.
  • GroupByBuilder is a fluent builder for GroupBy transformer instances. This class is not to be used directly, but via GroupBy – see Result transformation.
  • CaseBuilder and its special case CaseForEqBuilder – not covered in this book but quite easy to use (see it in reference documentation).
  • PathBuilder is an extension to EntityPathBase for dynamic path construction.

Builders are to be instantiated and offer a fluent API where it is not naturally available – like right after an opening parenthesis. They don’t necessarily follow the builder pattern, as they often implement what they are supposed to build and don’t require a final build method call. The other classes offer a bunch of convenient static methods.

Tricky predicate parts

Writing static queries in Querydsl is nice, but Querydsl is very strong beyond that. We can use Querydsl expressions (implementations of com.querydsl.core.types.Expression) to create dynamic predicates, or to define dynamic lists of returned columns. It’s easy to build your own “framework” around these – easy and useful, as many of the problems occur again and again.

In a small project we may be satisfied with distinct code for various queries. Let’s say we have three screens with lists and a couple of filtering criteria for each. There is hardly a need to invest into some universal framework. We simply have some base query and we just slap a couple of conditions onto it like so:

1 if (filter.getName() != null) {
2   query.where(d.name.eq(filter.getName()));
3 }
4 if (filter.getBirth() != null) {
5   query.where(d.birthdate.isNull()
6     .or(d.birthdate.goe(LocalDate.now())));
7 }

Each where call adds another condition into an implicit top-level AND group. The condition for birthdate will be enclosed in parentheses to preserve the intended meaning.

You can toString the query (or just its where part, or any Querydsl expression for that matter); this one will produce the following output:

1 select dog
2 from Dog dog
3 where dog.name = ?1 and (dog.birthdate is null or dog.birthdate >= ?2)

But what if we have a project with many list/table views with configurable columns and filtering conditions for each column? This definitely calls for a framework of sorts. We may still have predefined query objects, but predicate construction should be unified. We may even drop query objects and have a dynamic way to add joins to a root entity. This, however, may be a considerable endeavour, even though it may be well worth the effort in the end. My experiences in this area are quite extensive and I know that Querydsl will support you greatly.1
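
As a taste of such unification, here is a tiny sketch – the alias map and the eqCondition helper are made up for illustration; predicates produced this way can be combined with the tools from the next section:

// exposed aliases, typically built once per query object
private static final Map<String, SimpleExpression<?>> ALIASES = new HashMap<>();
static {
  ALIASES.put("name", QDog.dog.name);
  ALIASES.put("birthdate", QDog.dog.birthdate);
}

// translates one filter field to a predicate; null means "no condition"
@SuppressWarnings("unchecked")
private static Predicate eqCondition(String alias, Object value) {
  SimpleExpression<Object> path = (SimpleExpression<Object>) ALIASES.get(alias);
  return path != null ? path.eq(value) : null;
}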

Groups of AND/OR predicates

Sometimes the structure of a filter gives us multiple values. If we try to find a dog with any of the provided names we can use a single in, but often we have to create a predicate that joins multiple conditions with or or and. How do we do that?

If we want to code it ourselves, we can try something like (not a good version):

 1 private static Predicate naiveOrGrouping(List<Predicate> predicates) {
 2   if (predicates.isEmpty()) {
 3     return Expressions.TRUE; // no condition, accepting all
 4   }
 5 
 6   Predicate result = null;
 7   for (Predicate predicate : predicates) {
 8     if (result == null) {
 9       result = predicate;
10     } else {
11       result = Expressions.booleanOperation(Ops.OR, result, predicate);
12     }
13   }
14 
15   return result;
16 }

The first thing to consider is what to return when the list is empty. When thrown into a where clause, TRUE seems to be the neutral option – but only if combined with other predicates using AND. If OR was used it would break the condition. However, Querydsl is smart enough to treat null as a neutral value, so we can simply remove the first three lines:

 1 private static Predicate naiveOrGrouping(List<Predicate> predicates) {
 2   Predicate result = null;
 3   for (Predicate predicate : predicates) {
 4     if (result == null) {
 5       result = predicate;
 6     } else {
 7       result = Expressions.booleanOperation(Ops.OR, result, predicate);
 8     }
 9   }
10 
11   return result;
12 }

Try it in various queries and see that it does what you’d expect, with an empty list not changing the condition.

Notice that we use Expressions.booleanOperation to construct the OR operation. A fluent or is available on BooleanExpression but not on its supertype Predicate. You could cast to the subtype, but Expressions.booleanOperation takes care of it for us.

There is a shorter version of the loop:

1 private static Predicate naiveOrGroupingSimplified(List<Predicate> predicates) {
2   Predicate result = null;
3   for (Predicate predicate : predicates) {
4     result = ExpressionUtils.or(result, predicate);
5   }
6   return result;
7 }

This utilizes ExpressionUtils.or, which is extremely useful because – as you may guess from the code – it treats the null cases for us in a convenient way. Sometimes this leniency may not be desirable (use the stricter Expressions.booleanOperation then), but here it’s exactly what we want.

Now this all seems to be working fine, but we’re creating a tree of immutable expressions and using some expression utils “magic” in the process (good to learn and un-magic it if you are serious about Querydsl). Because joining multiple predicates into AND or OR groups is so common, there is a dedicated tool for it – BooleanBuilder. Compared to what we know we don’t save that much:

1 private static Predicate buildOrGroup(List<Predicate> predicates) {
2   BooleanBuilder bb = new BooleanBuilder();
3   for (Predicate predicate : predicates) {
4     bb.or(predicate);
5   }
6   return bb;
7 }

The good thing is that the builder also communicates our intention better. It actually uses ExpressionUtils internally, so we’re still creating all those objects behind the scenes. If you use this method in a fluent syntax you may rather return BooleanBuilder so you can chain other calls on it.

However, in all the examples we used a loop for something trivial – a collection of predicates comes in and we want to join them with a simple boolean operation. Isn’t there an even better way? Yes, there is:

1 Predicate orResult = ExpressionUtils.anyOf(predicates);
2 Predicate andResult = ExpressionUtils.allOf(predicates);

There are both collection and vararg versions, so we’re pretty much covered. The other previously mentioned constructs, especially BooleanBuilder, are still useful for cases when the predicates are created on the fly inside a loop, but if we have the predicates ready, ExpressionUtils is there for us.

Operator precedence and expression serialization

Let’s return to the first condition from this chapter, as produced when both ifs are executed. The result was:

1 where dog.name = ?1 and (dog.birthdate is null or dog.birthdate >= ?2)

If we wanted to write the same condition in one go, fluently, how would we do it?

1 System.out.println("the same condition fluently (WRONG):\n" + dogQuery()
2   .where(QDog.dog.name.eq("Rex")
3     .and(QDog.dog.birthdate.isNull())
4     .or(QDog.dog.birthdate.goe(LocalDate.now()))));

The previous code produces a result where AND would be evaluated first on the SQL level:

1 where dog.name = ?1 and dog.birthdate is null or dog.birthdate >= ?2

We can easily fix this. We open the parenthesis after and in the Java code and nest the whole OR inside:

1 System.out.println("\nthe same condition fluently:\n" + dogQuery()
2   .where(QDog.dog.name.eq("Rex")
3     .and(QDog.dog.birthdate.isNull()
4       .or(QDog.dog.birthdate.goe(LocalDate.now())))));

This gives us what we expected, again with the OR grouped together.

1 where dog.name = ?1 and (dog.birthdate is null or dog.birthdate >= ?2)

Now imagine we wanted to write the OR group first:

1 System.out.println("\nthe same condition, different order:\n" + dogQuery()
2   .where(QDog.dog.birthdate.isNull()
3     .or(QDog.dog.birthdate.goe(LocalDate.now()))
4     .and(QDog.dog.name.eq("Rex"))));

This means we started our fluent predicate with an operator of lower precedence. The result:

1 where (dog.birthdate is null or dog.birthdate >= ?1) and dog.name = ?2

Wow! That went well – but you may wonder how. In the Java code it flows seemingly on the same level, first OR then AND, but the generated query didn’t put it on the same level. In the previous example we could nest the OR inside the AND on the Java syntactic level (using parentheses) – but how does it work here?

The answer is actually pretty simple and it stems from Java syntax. If we add redundant parentheses around the OR it becomes absolutely clear:

1 .where(
2   (QDog.dog.birthdate.isNull()
3     .or(QDog.dog.birthdate.goe(LocalDate.now()))
4   ).and(QDog.dog.name.eq("Rex"))));

Now we know what happened. As the calls are chained, whatever comes before another call on the same level is already grouped together as an expression. We can imagine the same parentheses in the generated JPQL as well, but if they are redundant they will be removed – this is the case when AND goes first, but not the case when we start with OR. Parentheses were added in the JPQL in order to preserve the evaluation order.

In the following code all “expression-snakes” mean the same:

1 e1.or(e2).and(e3).or(e4)
2 (e1.or(e2)).and(e3).or(e4)
3 ((e1.or(e2)).and(e3)).or(e4)

We just have to realize that the first one is not the same as this SQL:

1 e1 OR e2 AND e3 OR e4

AND has higher precedence than OR and is evaluated first. In Querydsl it’s very easy to play with expressions, as you don’t need any entities at all and you don’t have to touch the database either (don’t mind the BooleanTemplate, it’s just a kind of Querydsl Expression):

1 BooleanTemplate e1 = Expressions.booleanTemplate("e1");
2 BooleanTemplate e2 = Expressions.booleanTemplate("e2");
3 BooleanTemplate e3 = Expressions.booleanTemplate("e3");
4 BooleanTemplate e4 = Expressions.booleanTemplate("e4");
5 System.out.println("\ne1.or(e2).and(e3).or(e4) = " +
6   e1.or(e2).and(e3).or(e4));
7 System.out.println("\ne1.or(e2).and(e3).or(e4) = " + new JPAQuery<>()
8   .where(e1.or(e2).and(e3).or(e4)));

This produces:

1 e1.or(e2).and(e3).or(e4) = (e1 || e2) && e3 || e4
2 
3 e1.or(e2).and(e3).or(e4) = where (e1 or e2) and e3 or e4

The first line is not very SQL-like, but if we create just an empty query with a where we get a nice SQL-like result (although the whole query is obviously invalid). We see that the first OR group is enclosed in parentheses, just as we saw in one of the examples before.

Finally, let’s try something even more complicated – let’s say we need:

1 ((e1 OR e2) AND (e3 OR e4)) OR (e5 AND e6)

For that we write this in Java:

1 .where(e1.or(e2).and(e3.or(e4)).or(e5.and(e6))))

Let’s check the output:

1 (e1 or e2) and (e3 or e4) or e5 and e6

Yup, it’s the one – unnecessary parentheses around the AND expressions are omitted, but it will work. Knowing this we can now write expressions confidently and visualize the resulting JPQL/SQL.

The bottom line is: Precedence as we know it from SQL does not matter – Java expression evaluation order does.

Querydsl Expression hierarchy

We talk about Expressions, Paths and Predicates a lot, but what is their relation? You may guess there is some hierarchy involved, and it is a rich one indeed. Let’s take a look at a snippet of it from the top:

Expression with a few of its subtypes

Most of the things we work with in Querydsl are somehow subtypes of the Expression interface. The picture shows just a couple of examples – be it Constant used to wrap constants in expressions like age.add(4) (here internally), or Path representing properties of our entity classes (among other things), or Operation for results of combining other expressions with operators.

The list of direct sub-interfaces is longer, and there are also three direct abstract sub-classes that provide another “path” from a concrete expression implementation to the top of this hierarchy. In the picture above I used NumberOperation as an example. It implements the Operation interface, but also follows a longer path that adds a lot of expected methods to the type – here from NumberExpression, which has many more methods than listed in the picture.

Creators of Querydsl had to deal both with single class inheritance and with the general problems of building taxonomy trees, and sometimes we can question whether they got it right. For instance: Isn’t a number constant a literal? Why doesn’t it extend LiteralExpression? Other times you expect some operation on a common supertype but you find it defined in two separate branches of the hierarchy. But these problems are expected in any complex hierarchy and designers must make some arbitrary decisions. It’s not really difficult to find the proper solutions for our Querydsl problems.2

Let’s focus on another example of the expression hierarchy, BooleanPath – this time from the bottom to the top:

BooleanPath and its many supertypes

Here we followed the extends line all the way up (any T is effectively Boolean). BooleanExpression implements the expected operations (I omitted andAnyOf(Predicate...) and orAllOf(Predicate...) from the picture), any of them returning BooleanOperation, which is itself a subclass of BooleanExpression.

LiteralExpression is a common supertype for other types of expressions: StringExpression (with StringPath and StringOperation), TemporalExpression with subtypes DateExpression, TimeExpression and DateTimeExpression (and their paths), and EnumExpression. With all these subtypes having their Path and Operation implementations, this part of the hierarchy seems pretty regular and logical. (There are also Template implementations, but I don’t cover those in the book.) LiteralExpression doesn’t add methods we would use that much, so let’s go one step higher.

ComparableExpression adds tons of operation methods: between, notBetween, goe (greater or equal), gt, loe and lt, the last four with ALL and ANY variants (e.g. goeAll). We will get to the missing eq and similar methods higher in the hierarchy.

Let’s go up to ComparableExpressionBase, which adds the asc and desc methods returning the OrderSpecifier we use in the orderBy clause. It also contains the coalesce method for the SQL function COALESCE that returns the first non-null argument. As the picture indicates, this is also the point where the NumberExpression hierarchy starts (with NumberPath and NumberOperation) – it does not sit at the same place as string, boolean and date/time expressions. It also implements all the comparison functions (like gt) on its own, which means that any infrastructure code for comparing expressions has to have two separate branches. Never mind, let’s go up again.

SimpleExpression adds eq and ne (with ALL and ANY variants), isNull, isNotNull, count, countDistinct, in, notIn, nullif and when for simple CASE expressions with equals – for more complicated cases you want to utilize the more universal CaseBuilder.
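
The equals-based form, modeled on the reference documentation, may look like this sketch:

// 1 for Rex, 0 for any other name – a simple CASE with equals
Expression<Integer> rexFlag = QDog.dog.name
  .when("Rex").then(1)
  .otherwise(0);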

Finally, DslExpression gives us support for expression aliases using the as method. (These are different from the aliases for joins we used until now.) Having climbed all the way up, we now have all the necessary methods we can use on BooleanPath – and the same applies to StringPath or, although a bit sideways, to NumberPath and other expressions as well.
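
For instance, a sketch of an expression alias in a projection:

// select the lower-cased name under an alias
List<Tuple> rows = new JPAQuery<>(em)
  .select(QDog.dog.name.lower().as("lowerName"), QDog.dog.birthdate)
  .from(QDog.dog)
  .fetch();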

Constant expressions

When we start using expressions in a more advanced fashion (e.g. for the sake of some filtering framework), we soon need to convert a constant (literal) to a Querydsl Expression. Consider a simple predicate like this:

1 QDog.dog.name.eq("Axiom")

The fluent API offers two options for the type of the parameter of eq (and similar operations). String is the most natural here, and the API knows it should be a string because QDog.dog.name is of type StringPath – that is Path<String>, which itself is an Expression<String>. And Expression<? super String> (not that String can have subtypes) is the second option we can use as the type of the parameter for the eq method.

But when we try to use the Expressions utilities to construct an Operation, we can only use expressions, so we need to convert any literal to an expression of the respective type – like this:

1 Expressions.booleanOperation(Ops.EQ,
2   QDog.dog.name, Expressions.constant("Axiom"))

Equals being used quite often there is an alternative shortcut as well:

1 ExpressionUtils.eqConst(QDog.dog.name, "Axiom")

With this we can construct virtually any expression using Querydsl utility classes in a very flexible way.

Using FUNCTION template

JPA 2.1 allows us to call any function beyond the (quite limited) set of those supported by the JPA specification – which is a nice extension point. Let’s see how to do that in JPQL:

1 em.createQuery(
2   "select d.name, function('dayname', d.died) from Dog d ", Object[].class)
3   .getResultList()
4   .stream()
5   .map(Arrays::toString)
6   .forEach(System.out::println);

This may print (depending on the actual data):

1 [Lassie, Tuesday]
2 [Rex, null]
3 [Oracle, Wednesday]

Now how can we do the same with Querydsl? While it doesn’t provide this FUNCTION construct directly, it is very flexible with its templates. We already met BooleanTemplate in one of our previous experiments; now let’s try to get this DAYNAME function working:

 1 QDog d = QDog.dog;
 2 List<Tuple> result = new JPAQuery<>(em)
 3   .from(d)
 4   .select(d.name, dayname(d.died))
 5   .fetch();
 6 System.out.println("result = " + result);
 7 // ...
 8 
 9 private static StringTemplate dayname(Expression<LocalDate> date) {
10   return Expressions.stringTemplate("FUNCTION('dayname', {0})", date);
11 }

Of course, Expressions.stringTemplate can be inlined, but we made it reusable this way. Parameters following the template string are either an Object... vararg or a List<?>. This is flexible enough but raises the question whether we want to be type-safe at the level of our method call or not. If I chose Object for the date parameter (or even the more universal Object...) I could freely call it with a java.util.Date literal as well (not LocalDate, as the H2 database does not support these directly).

The type of the template – which happens to be some subtype of TemplateExpression – can be anything. The most common types of templates are easy to create using specific methods on Expressions – like stringTemplate or dateTemplate – and then there is template (or simpleTemplate) where we specify the type parameter as the first argument.

Using it in a query itself is then very easy as demonstrated above.
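
A numeric variant may look like this sketch – assuming H2’s dayofyear function is available to FUNCTION:

private static NumberTemplate<Integer> dayOfYear(Expression<LocalDate> date) {
  return Expressions.numberTemplate(Integer.class,
    "FUNCTION('dayofyear', {0})", date);
}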

Update/delete clauses and exception handling

Very often we need some common exception handling for insert/update/delete clauses – especially for the cases of constraint violations. We achieve this with some common BaseDao (also known as a layer supertype). It’s easy to have a single common persist method, but we often need various specific update/delete clauses (these are the parts that bypass the persistence context to a degree).

So instead of a block like this (which would have to be wrapped in a try/catch and the exception handling would be repeated over and over):

1 // delete holidays by calendar ID
2 new JPADeleteClause(entityManager, $)
3   .where($.calendarId.eq(calendar.getId()))
4   .execute();

We use this with exception handling hidden inside the call:

1 execute(deleteFrom($)
2   .where($.calendarId.eq(calendar.getId())));

This requires us to introduce the following methods in our common BaseDao:

 1 protected final JPADeleteClause deleteFrom(EntityPath<?> path) {
 2   return new JPADeleteClause(entityManager, path);
 3 }
 4 
 5 protected long execute(DMLClause dmlClause) {
 6   try {
 7     return dmlClause.execute();
 8   } catch (PersistenceException e) {
 9     throw convertPersistenceException(e);
10   }
11 }

This is just an example and we don’t have to use a common layer supertype, but we definitely want to put the common exception handling in a single place. I didn’t present JPAUpdateClause in this section, but the need for common exception handling relates to it as well – a possible sketch follows.
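
A sketch of the update counterpart in the same BaseDao (the set call and the newCalendarId value are made up for illustration):

protected final JPAUpdateClause update(EntityPath<?> path) {
  return new JPAUpdateClause(entityManager, path);
}

// hypothetical usage with the same exception translation, $ being a Q-alias
execute(update($)
  .set($.calendarId, newCalendarId)
  .where($.calendarId.eq(calendar.getId())));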

JPAQueryFactory

The Querydsl reference documentation, in the part about querying JPA, recommends JPAQueryFactory as the preferred way to obtain JPAQuery instances.

Personally, I’ve been using JPAQuery directly for ages and never missed the factory, but there are some minor advantages:

  • JPAQueryFactory can be pre-created once per managed component with thread-safe persistence context (EntityManager that is). This also means that in non-managed context we perhaps don’t use it that much.
  • If we need to customize JPQLTemplates we don’t have to repeat it every time we create the query but only when we create the JPAQueryFactory.
  • The factory also has handy combo methods like selectFrom and other convenient shortcuts.

Other than this it doesn’t really matter. Looking at the implementation, the factory is so lightweight that if you want to use something like selectFrom you may create it even short-lived. But you’re perfectly fine without it as well, not missing anything crucial.
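
For illustration, the factory version of a query we used before:

// created once per component with a thread-safe EntityManager
JPAQueryFactory queryFactory = new JPAQueryFactory(em);

List<Dog> dogs = queryFactory.selectFrom(QDog.dog)
  .where(QDog.dog.name.startsWith("Re"))
  .fetch();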

In the end we typically implement some “mini-framework” in our data layer superclass anyway. It may use the factory or not, and still offer convenient methods for query creation and more.

Detached queries

Typically we create the JPAQuery with an EntityManager parameter, but sometimes it feels more natural to put queries into constants and reuse them later – something resembling the named queries facility of JPA. This is possible using so called “detached queries” – simply create a query without an entity manager and later clone it with an entity manager provided:

 1 private static QDog DOG_ALIAS = new QDog("d1");
 2 private static Param<String> DOG_NAME_PREFIX =
 3   new Param<String>(String.class);
 4 private static JPAQuery<Dog> DOG_QUERY = new JPAQuery<>()
 5   .select(DOG_ALIAS)
 6   .from(DOG_ALIAS)
 7   .where(DOG_ALIAS.name.startsWith(DOG_NAME_PREFIX));
 8 
 9 //... and somewhere in a method
10 List<Dog> dogs = DOG_QUERY.clone(em)
11   .set(DOG_NAME_PREFIX, "Re")
12   .fetch();

The benefit is questionable though. We need to name the query well enough so it expresses what it does, while seeing the query sometimes says it better. We have to walk the distance to introduce parameter constants and use them explicitly after the clone. There is also hardly any performance benefit, because cloning and creating a query from scratch are essentially the same activity.

Perhaps there are legitimate cases when to use this feature, but I personally used it only in query objects wrapping a complicated query where I had a lot of aliases around already – and even then mostly without the where part, which I’d rather add dynamically later. But the same can be achieved by a method call – preferably placed in a designated query object:

 1 public class DogQuery {
 2   private static QDog DOG_ALIAS = new QDog("d1");
 3 
 4   private final EntityManager em;
 5 
 6   public DogQuery(EntityManager em) {
 7     this.em = em;
 8   }
 9 
10   public JPAQuery<Dog> query() {
11     return new JPAQuery<>(em)
12       .select(DOG_ALIAS)
13       .from(DOG_ALIAS);
14   }
15 
16   public JPAQuery<Dog> nameStartsWith(String namePrefix) {
17     return query()
18       .where(DOG_ALIAS.name.startsWith(namePrefix));
19   }
20 }

The first method, query, does not have to be exposed, but it often comes in handy; the next one produces the query from the previous example. To use it we need an EntityManager, just like we needed it in the clone example above.

1 List<Dog> dogs = new DogQuery(em)
2   .nameStartsWith("Re")
3   .fetch();

We can use the builder pattern if we want to; many options open up when the query has a place to live.

Working with dates and java.time API

Because Querydsl 4.x still supports Java versions before 8, it does not have native support for the new date/time types. But with JPA we can use any other type that can capture date and time information, so technically Querydsl should be prepared for any class representing date/time. This is achieved by DateTimeExpression being a parametrized type, which covers our needs as long as the class we use implements Comparable.

Then there are static methods on DateTimeExpression representing SQL functions like current_date. These by default return an expression parametrized to Date, but we can specify the requested date/time type using an additional Class parameter:

 1 QDog d = QDog.dog;
 2 List<Dog> liveDogs = new JPAQuery<>(em)
 3   .select(d)
 4   .from(d)
 5 // client-side option - no problem there with generics
 6 //.where(d.birthdate.before(LocalDate.now()))
 7 //.where(d.died.after(LocalDate.now()).or(d.died.isNull()))
 8 
 9   // using function on the database side
10   .where(d.birthdate.before(DateExpression.currentDate(LocalDate.class))
11     .and(d.died.after(DateExpression.currentDate(LocalDate.class))
12       .or(d.died.isNull())))
13   .fetch();

Alas, even with Querydsl having us covered there can still be striking problems with the support of Java 8 types (or other converted types) in concrete JPA implementations. For instance, EclipseLink ignores date/time converters in coalesce.

To mix dates with timestamps we might need to use custom functions described earlier. In general JPA does not support much of temporal arithmetic.

Result transformation

We already know how to select an entity, a single attribute (column), or tuples of entities and/or attributes (columns). We also saw some groupBy examples previously. But very often we want to get a Map instead of a List. For instance, instead of this:

1 List<Tuple> tuple = new JPAQuery<>(em)
2   .select(QDog.dog.name, QDog.dog.id.count())
3   .from(QDog.dog)
4   .groupBy(QDog.dog.name)
5   .fetch();
6 System.out.println("name and count (as list, not map) = " + tuple);

We would prefer this:

1 Map<String, Long> countByName = new JPAQuery<>(em)
2   .from(QDog.dog)
3   .groupBy(QDog.dog.name)
4   .transform(groupBy(QDog.dog.name).as(QDog.dog.id.count()));
5 System.out.println("countByName = " + countByName);

Couple of things are clear from the code:

  • We don’t need select as it is implied by transform arguments and called internally.
  • There is also no need to call an explicit fetch – transform is a terminal operation and calls it internally.

The method groupBy is statically imported from the provided GroupBy class. You can return a list of multiple results in the as part and you can use aggregation functions like sum, avg, etc. – see the sketch below.
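
For instance, with the statically imported GroupBy.list we can map each name to all the dogs sharing it – a small sketch:

Map<String, List<Dog>> dogsByName = new JPAQuery<>(em)
  .from(QDog.dog)
  .transform(groupBy(QDog.dog.name).as(list(QDog.dog)));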

Transform itself does not affect the generated query, it only post-processes its results. It is in no way tied to the groupBy call, although they often come together. Very often it is used to get a map of something unique to the values. We can, for instance, map IDs to their entities (a Map of entities with the ID as a key) or IDs to a particular attribute:

1 Map<Integer, Dog> dogsById = new JPAQuery<>(em)
2   .from(QDog.dog)
3   .transform(groupBy(QDog.dog.id).as(QDog.dog));
4 System.out.println("dogsById = " + dogsById);

I personally miss the simple list and map terminal operations from Querydsl 3, which leads us to the final part of this chapter.

Note about Querydsl 3 versus 4

I’ve used major versions 2 and 3 in my projects and started to use version 4 only when I started to write this book. After an initial investigation I realized that I’ll hardly switch from version 3 to version 4 easily in any reasonably sized project. Another question is whether I even want to. I don’t want to put version 4 down – it does a lot to get the DSL closer to SQL semantics – but that’s the question: Is it really necessary?

Let’s compare a query from version 4 with the same query from version 3 – let’s start with 4:

Querydsl version 4 query
1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .where(QDog.dog.name.like("Re%"))
5   .fetch();

Next example shows the same in Querydsl version 3:

Querydsl version 3 query
1 List<Dog> dogs = new JPAQuery(em)
2   .from(QDog.dog)
3   .where(QDog.dog.name.like("Re%"))
4   .list(QDog.dog);

Personally I like the latter more, even though the first one is a more SQL-like notation. Version 3 is one line shorter – that purely technical fetch() call is pure noise. Moreover, fetch() was used in version 3 to declare fetching of the joined entity; in version 4 you have to use fetchAll() for that. This means that the fetch*() methods are not part of one family – which is far from ideal from an API/DSL point of view.

Convenient map is gone

In Querydsl 3 you could also use the handy map(key, value) method instead of list – and it returned a Map exactly as you’d expect:

Querydsl version 3 query returning map
1 Map<Integer, Dog> dogsById = new JPAQuery(em)
2   .from(QDog.dog)
3   .map(QDog.dog.id, QDog.dog);

In Querydsl 4 you are left with something more abstract, more elusive and definitely more talkative:

Querydsl version 4 query returning map
1 Map<Integer, Dog> dogsById = new JPAQuery<>(em)
2   .from(QDog.dog)
3   .transform(groupBy(QDog.dog.id).as(QDog.dog));

A solution with transform is available in Querydsl 3 as well and also covers cases map does not (see the previous section on result transformation). You don’t need select here because transform specifies what to return, and you don’t need fetch either, as transform works as a terminal operation – just like list in version 3. So in order to get a list you have to use a different flow of the fluent API than for a map.

Placement of select

As for the fluency, most programmers in our team agreed they prefer finishing with the list(expressions...) call, as the last call clearly says what gets returned. With the SQL-like approach you do this first, then add various JOINs – but this all goes against the typical Java-like programming mindset. For me personally, version 3 hit the sweet spot perfectly – it gave me a very SQL-like style of queries to the degree I needed, and the terminal operation (e.g. list) perfectly expressed what I want to return without any superfluous dangling fetch.

We can also question what changes more often. Sometimes we change the list of expressions, which means we have to construct the whole query again – and the part that changes the most, the list of resulting expressions, goes right at the start. I cannot have my query template with FROM and JOINs ready anymore. All I had to do before was clone it (so we don’t change the template), add where parts based on some filter and declare what expressions we want as results, based on the well-known available aliases, columns, etc.

Sure, you have to have all the JOIN aliases thought out beforehand anyway, so it’s not such a big deal to create all the queries dynamically and add JOINs after you “use them” in the SELECT part, because the aliases are the common ground and probably available as constants. But there is another good scenario for a pre-created template query – you can inspect its metadata and do some preparation based on it.

We used this for our filter framework where the guys from the UI know exactly what aliases we offer, because we create a Map<String, SimpleExpression<?>> to get to the paths representing the aliases by name very quickly. We can still do this with Querydsl 4. We create one query selecting the entity used in the FROM clause (makes sense), extract the map of aliases from this one and discard this “probe query” afterwards. Not a big deal, but it still supports the idea that after the WHERE clause it is the SELECT part that is the most flexible, and using it right at the start of the “sentence” may sound natural, but it is not programmatically convenient.
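
A minimal sketch of such a probe query – the loop over getJoins() uses real Querydsl metadata API, but collecting plain Path instances (instead of the SimpleExpression mentioned above) is our simplification:

Extracting aliases from a probe query (sketch)
import com.querydsl.core.JoinExpression;
import com.querydsl.core.types.Path;
import java.util.HashMap;
import java.util.Map;

JPAQuery<Dog> probe = new JPAQuery<>(em)
  .select(QDog.dog)
  .from(QDog.dog);
// ...plus all the JOINs the filter framework offers

// the FROM entity and every JOIN target appear in the query metadata
Map<String, Path<?>> aliases = new HashMap<>();
for (JoinExpression join : probe.getMetadata().getJoins()) {
  if (join.getTarget() instanceof Path) {
    Path<?> path = (Path<?>) join.getTarget();
    aliases.put(path.getMetadata().getName(), path);
  }
}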

Rather accidentally we found out that in Querydsl 4 we actually can move select to the end and finish with select(...).fetch(). The fetch here looks even more superfluous than before, and the question is whether we want to use both styles in one project. With Querydsl 3 this question never arose.
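
This late-select style combines nicely with the template idea – a minimal sketch relying on JPAQuery’s clone(EntityManager) method:

Query template with late select (sketch)
JPAQuery<?> template = new JPAQuery<>(em)
  .from(QDog.dog);
// ...JOINs shared by all uses of the template would go here

// per request: clone the template, add WHERE, declare the projection last
List<String> names = template.clone(em)
  .where(QDog.dog.name.like("Re%"))
  .select(QDog.dog.name)
  .fetch();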

Subqueries

In Querydsl 3 we used JPASubQuery for subqueries – in Querydsl 4 we use an ordinary JPAQuery, we just don’t provide it with an entity manager and we don’t use terminal operations like fetch that would trigger it immediately. This change is not substantial from the developer’s perspective, either of these works without problems.
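
A minimal sketch in version 4 terms – JPAExpressions is Querydsl’s static factory for subqueries, the max-id condition itself is contrived just for the example:

Querydsl version 4 subquery (sketch)
QDog inner = new QDog("inner");
List<Dog> latest = new JPAQuery<>(em)
  .select(QDog.dog)
  .from(QDog.dog)
  // the subquery gets no entity manager and no terminal operation
  .where(QDog.dog.id.eq(
    JPAExpressions.select(inner.id.max()).from(inner)))
  .fetch();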

IV Final thoughts

So here we are after the ride through the highs and lows of JPA with some bits of common-sense advice. Perhaps the story wasn’t that good because – at least for me – it culminated quite soon around the troubles with to-one relationships, not to mention that my proposed solution is not pure JPA.

JPA/ORM – yes or no?

Would I use JPA or ORM again? Voluntarily? Yes – especially for simple cases. I’d use it even for bigger projects, of course, but I’d tune it down, not use all its features and avoid the problematic mappings.

This is a paradox – and a problem too. ORM ideas were developed mostly to help us with rich domain models, so using ORM for simple systems and domains you pay the cost of a technology intended for complex ones. But in my experience ORM gets unwieldy for larger domains – especially if we use more of the mapping features. All the programmers must understand those, unless we make the data layer fixed and isolated, but then some developers can’t develop features top-to-bottom.

I can’t speak much about how DDD helps with ORM problems. I recently saw a presentation by Eric Evans, the DDD guy. He admitted that proper DDD may be easier in systems where context boundaries are not only logical but also physical – like microservices. I agree, because implicit is never as solid as explicit, as I often stressed in this book. I have seen this over and over again in many software problems.

On the other hand, we must also agree that many implicit and automagic solutions make our lives easier. But they must be supported by knowledge. When the people maintaining the system change and the new ones don’t understand those implicit parts, all the benefits are gone.

If we take the ORM shortcut for a simple case it may still happen that the simple problem gets more complex, the application grows a lot and we suddenly have to live with a lot of ORM investment and a very difficult retreat path (the business will not be able to pay for it, for sure). So if I know I would rather use some other solution, I’d think a couple of times about the possibility of this small project/product becoming big.

There is another way to treat JPA and ORM – especially for rich domains. Understand it fully – really, really fully – tune the caches, use reliable lazy loading where appropriate and reap the benefits. Personally, I doubt this is a good (or even realistic) solution, but it can be OK for the command part of a CQRS application. ORM simply falls short in the area of queries beyond the most primitive levels of sophistication.

JPA or concrete ORM provider?

Do I prefer JPA or a concrete ORM provider? For many years the answer was pretty clear – I always strove to stay within the JPA realm. Accidentally, though, I went down the path with EclipseLink and its capability to JOIN with entity roots1 – which I believe would be an incredible addition to JPA 2.2. The more I work with JPA the less I believe in an easy transition from one provider to another – except for simple cases. The more complex the queries, the more I’d try to find some alternative. Maybe the first step would be staying with ORM and utilizing the full power of a single provider – whatever that provider is. That means checking the bugs on their issue tracking system when you encounter one, reporting new ones and so on.

Sticking with EclipseLink for a while – and diving deep – leaves you enough time to encounter bugs, some of them quite painful. I decided to collect all the bugs we found (not necessarily being the first to report them) and you can check them in the appendix Bugs discovered while writing this book. Would I find more Hibernate bugs had I worked with it? I’m pretty sure I would, as the few Hibernate bugs listed there were found during our attempt to switch JPA providers quite early in the project (before we left the JPA path with root entities after JOINs) or during experiments with JPA 2.1 features for the sake of this book. That cannot be compared to what EclipseLink went through on our project.

What bothers me are some very serious bugs that are completely unattended by the project maintainers, often not even answered. You create a test case for a concurrency problem with CASE (read “don’t use CASE at all if the query may be executed in parallel”), the issue tracker tells you it sent emails to half a dozen people – and no response at all. To be honest, I lost quite a lot of my faith in EclipseLink quality. Before Hibernate 5.1 we were stuck with EclipseLink, but even now, with ad hoc joins generally available, it’s still mainly about trading one set of bugs for another (less known) set of bugs. Or perhaps we can try some other JPA provider – like DataNucleus, which also allows a JOIN to another root entity.

This all means you have to know your provider one way or the other. I always try to use as much of JPA’s capabilities as possible (mapping annotations, JPQL), but when the time comes I go beyond. It’s very easy with a feature like ad hoc joins, where all the providers understand its usefulness (although I doubt they did it to promote my raw FK mapping style). With something more specific it’s up to you. Will you ever switch? That depends. If you’re pure JPA, using whatever some application server provides, you have to be super compliant and super compatible – avoiding bugs included. That sounds rather unlikely to me.

JPA, Querydsl and beyond

So, will I use ORM on bigger systems again? Probably – as I don’t design every system I work on. Would I try something else if I could? Definitely. I have much more faith in Querydsl itself, for instance. You can use it over SQL directly, bypassing JPA altogether. The main Querydsl contributor, Timo Westkämper, answers virtually any question anywhere2 (groups, issues, StackOverflow)… the only risk here is that it seems to be a predominantly one-man show even though it is backed by a company. You can also check jOOQ, though there you can’t avoid commercial licences if you want to use it with commercial databases. Both these projects – and their popularity – show that JPA is far from the last word in the Java persistence landscape and also not the only way to do it.

In any case, I always put Querydsl as another layer over JPA. I recently tried a JPA 2.0 project without it and the experience was terrible once you got used to it before. JPA 2.0 itself was nothing to be happy about either, and the missing ad hoc joins proved to be an incredible pain, as some tables were already mapped with raw FKs – without ad hoc joins you’re out of luck and have to remap them properly with all the consequences.

JPA 2.2 coming

JPA itself is now progressing slowly. JPA 2.1 was a massive improvement over JPA 2.0, but that was already 4 years ago. JPA 2.2, coming with Java EE 8, will be merely a maintenance release. Even so it should bring a couple of good things:

  • Support for Java 8 Date and Time types.
  • Ability to stream a query result.

The first one is sweet – especially as EclipseLink currently has a bug when COALESCE is used on converted types (not that they shouldn’t fix it regardless, of course).

The second one is much more interesting, but I don’t know what to expect. Will it be true streaming of results from a cursor? Will the entities be created lazily as we consume them? Will it allow us to avoid OutOfMemoryError for longer queries (e.g. hundreds of thousands of lines exported to CSV)? Or is it just stream() called on the list returned from getResultList()? I hope for the former, as that would be a massive improvement for some less typical cases.
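
The pessimistic reading would add nothing over what we can already write in JPA 2.1 – a sketch of that trivial variant:

Stream over a fully loaded result list (JPA 2.1 style)
// the whole result list is materialized first – no memory benefit at all
Stream<Dog> dogs = em
  .createQuery("select d from Dog d", Dog.class)
  .getResultList()
  .stream();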

This book

Finally, about this book. It was a much bigger story for me in the end, especially as I discovered I’d been working outside of the JPA realm for more than a year. That obviously ruined all my plans, my wannabe climax was suddenly dull and I wasn’t sure whether to go on at all. But I also learnt a lot about JPA (and I really had thought I was quite solid already!) and about the two most popular providers. I hope it was helpful for you too, even though I may have raised more questions than provided answers. At least I tried to keep it reasonably short.

I’m considering another edition discussing other typical problems (why to map the many-to-many association entity explicitly, why to avoid Open Session in View, etc.) but right now I need a break. :-) With the next version after JPA 2.2 probably far away I have plenty of time for it.

Appendices

H2 Database

H2 (project page, or on Wikipedia) is a SQL database implemented in Java with a couple of features that are very handy especially for testing:

  • Firstly, H2 is all packed in a single JAR which weighs way under 2 MB and contains features you would not expect in such a cute packaging.
  • It can store its tables both in-memory and on the disk. Using in-memory storage is great for tests, because it’s fast (for reasonable database size) and also naturally disappears after the test is run.
  • It supports both embedded mode – running in-process with the application itself – and client-server mode where it runs standalone and you can connect to it from a different process. For automated tests I prefer the embedded mode, but when I experiment I use the client-server mode, so I can check the state of my database independently of my demo application (see the connection URL sketch after this list).
  • It contains various tools – the most prominent of which is the web-based SQL console. This is what you can use to check the database in server mode while it’s being modified by an application.
  • Naturally, it contains JDBC driver.
  • SQL support is rich enough to mimic much more evolved/expensive databases. I used it to test JPA applications that were originally developed for Oracle or Microsoft SQL Server – no changes besides the JPA configuration were required.
  • While it does not support any scripting (like Oracle’s PL/SQL or PostgreSQL’s PL/pgSQL) you can plug in triggers and procedures developed in Java (see the trigger sketch after this list). Maybe I’d not use it if the database contained a lot of code, but when you need to handle a couple of simple procedures and some triggers, it can be done.
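
A minimal sketch of the two modes in plain JDBC – both URL shapes follow the H2 documentation, the database name demo is arbitrary:

H2 connection URLs (sketch)
// embedded, in-memory; DB_CLOSE_DELAY=-1 keeps the database alive
// as long as the JVM runs, even between connections
Connection embedded = DriverManager.getConnection(
  "jdbc:h2:mem:demo;DB_CLOSE_DELAY=-1", "sa", "");

// client-server mode against a standalone H2 server on localhost
Connection remote = DriverManager.getConnection(
  "jdbc:h2:tcp://localhost/~/demo", "sa", "");

And a minimal sketch of a Java trigger – org.h2.api.Trigger is H2’s real SPI, but the class, table and trigger names here are made up:

H2 trigger written in Java (sketch)
import java.sql.Connection;
import java.sql.SQLException;

public class AuditTrigger implements org.h2.api.Trigger {

  @Override
  public void init(Connection conn, String schemaName, String triggerName,
      String tableName, boolean before, int type) {
    // called once when the trigger is created
  }

  @Override
  public void fire(Connection conn, Object[] oldRow, Object[] newRow)
      throws SQLException {
    // oldRow/newRow hold the column values of the affected row
  }

  @Override
  public void close() {}

  @Override
  public void remove() {}
}

It is then registered in SQL with CREATE TRIGGER DOG_AUDIT AFTER INSERT ON DOG FOR EACH ROW CALL "tests.AuditTrigger".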

Starting it

You can download it as a Windows installer or a multi-platform ZIP, but all you really need is the JAR file. If you manage your application’s dependencies you probably have it somewhere on the disk already. So just locate it and run it. If everything is set up properly, pressing Enter on the JAR in some file manager will do. If not, try the following command:

$ java -jar $HOME/.m2/repository/com/h2database/h2/1.4.190/h2-1.4.190.jar

Of course, adjust the JAR location and name, and if you don’t have java on your PATH (yet!) do something about it. When everything clicks, your default browser opens and you can log into your default database (user sa, empty password). You can learn more in the Quickstart. I probably forgot to mention that it is well documented too!

Using it as a devel/test DB

I have a long relationship with H2 and I can only recommend it. In 100% of the projects where we used it as a developer database – if for nothing else then at least for running fast commit tests – everything was better and smoother. There is some overhead involved in managing an alternative schema and test data for H2 as well, but a lot of it can be automated (Groovy is my language of choice for this). This overhead was always offset by the benefits for testing, and the fact that we looked at our schema through the perspective of two different RDBMS (the production one and H2) actually often helped too.

Project example

Setting up a project with JPA, Querydsl and H2 as a database may seem a daunting task for a newcomer, but it’s not really that difficult. I will show both Maven 3.x and Gradle 4.x projects using Java 8.

Querydsl and Maven

Maven is supported out of the box with a plugin, as documented here.

First we need to declare our dependencies (the provided scope is enough for the APT JAR).

Maven: Querydsl dependencies
<dependency>
  <groupId>com.querydsl</groupId>
  <artifactId>querydsl-apt</artifactId>
  <version>${querydsl.version}</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>com.querydsl</groupId>
  <artifactId>querydsl-jpa</artifactId>
  <version>${querydsl.version}</version>
</dependency>

<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.6.1</version>
</dependency>

Next we need to configure the Maven APT plugin.

Maven: Querydsl generator plugin
<project>
  <build>
  <plugins>
    ...
    <plugin>
      <groupId>com.mysema.maven</groupId>
      <artifactId>apt-maven-plugin</artifactId>
      <version>1.1.3</version>
      <executions>
        <execution>
          <goals>
            <goal>process</goal>
          </goals>
          <configuration>
            <outputDirectory>target/generated-sources/java</outputDirectory>
            <processor>com.querydsl.apt.jpa.JPAAnnotationProcessor</processor>
          </configuration>
        </execution>
      </executions>
    </plugin>
    ...
  </plugins>
  </build>
</project>

This ensures that the Q-classes are generated during the build. There is also an alternative processor (com.querydsl.apt.hibernate.HibernateAnnotationProcessor) that generates these from Hibernate annotations. Most example projects for this book have pom.xml files like this.

Unrelated to Querydsl is the plugin configuration for generating the JPA static metamodel. Here you can use annotation processors from various providers. In querydsl-basic/pom.xml I use EclipseLink’s annotation processor.

IDE support

As per instructions, for Eclipse to recognize the generated classes one should run:

mvn eclipse:eclipse

I have no idea how the regeneration of Q-classes works as I don’t use Eclipse.

In IntelliJ IDEA the generated classes are recognized when the Maven project is imported. To regenerate the classes after an entity source changes we need to set up the annotation processor in Settings > Build, Execution, Deployment > Compiler > Annotation Processors.

Querydsl and Gradle

Next I will show an example of a Gradle project. There are Querydsl plugins for Gradle out there, but the situation around Gradle and its plugins is much more volatile, hence I decided to do it without a plugin. If you’re familiar with Gradle it should be easy to modify the example scripts.

I experimented with various scripts in the querydsl-basic project that – as mentioned above with Maven – also generates the JPA metamodel. Probably the most typical Gradle script would be build-traditional.gradle – let’s go over it bit by bit:

plugins {
  id 'java'
}

repositories {
  jcenter()
}

configurations {
  querydslApt
  jpaMetamodelApt
}

So far nothing special, the java plugin will be sufficient, we just need an additional configuration for each generator task (querydslApt and jpaMetamodelApt).

sourceCompatibility = 1.8
targetCompatibility = 1.8

tasks.withType(JavaCompile) {
  options.encoding = 'UTF-8'
}

Here we just configure the Java version and encoding for compilation (this should always be explicit). Next we’ll set up the dependencies:

ext {
  querydslVersion = '4.1.4'
  hibernateVersion = '5.2.2.Final'
  eclipseLinkVersion = '2.6.2'
  h2Version = '1.4.190'
  logbackVersion = '1.2.3'
  testNgVersion = '6.11'
}

dependencies {
  compileOnly 'javax:javaee-api:7.0'
  compile "com.querydsl:querydsl-jpa:$querydslVersion"
  compile "org.hibernate:hibernate-entitymanager:$hibernateVersion"
  compile "org.eclipse.persistence:org.eclipse.persistence.jpa:$eclipseLinkVersion"
  compile "com.h2database:h2:$h2Version"
  compile "ch.qos.logback:logback-classic:$logbackVersion"

  testCompile "org.testng:testng:$testNgVersion"

  querydslApt "com.querydsl:querydsl-apt:$querydslVersion"
  jpaMetamodelApt "org.eclipse.persistence:" +
    "org.eclipse.persistence.jpa.modelgen.processor:$eclipseLinkVersion"
}

Dependencies needed only for the generator tasks are in their respective configurations – we don’t need these to run our programs after the build.

sourceSets {
  generated {
    java {
      srcDirs = ["$buildDir/generated-src"]
    }
  }
  test {
    // This is required for tests to "see" generated classes as well
    runtimeClasspath += generated.output
  }
}

Here we defined a new source set for the generated classes. I’m not sure what the best practice is – whether a different set for each generator would be better or not, and why – so I’ll leave it like this. Next the generator tasks – they are both very similar and could definitely be refactored and put into plugins, but that’s beyond the scope of this book and my current experience:

task generateQuerydsl(type: JavaCompile, group: 'build',
  description: 'Generates the QueryDSL query types')
{
  source = sourceSets.main.java
  classpath = configurations.compile + configurations.querydslApt
  options.compilerArgs = [
    '-proc:only',
    '-processor', 'com.querydsl.apt.jpa.JPAAnnotationProcessor'
  ]
  destinationDir = sourceSets.generated.java.srcDirs.iterator().next()
}

task generateJpaMetamodel(type: JavaCompile, group: 'build',
  description: 'Generates metamodel for JPA Criteria (not QueryDSL)')
{
  source = sourceSets.main.java
  classpath = configurations.compile + configurations.jpaMetamodelApt
  options.compilerArgs = [
    '-proc:only',
    '-processor',
    'org.eclipse.persistence.internal.jpa.modelgen.CanonicalModelProcessor',
    '-Aeclipselink.persistencexml=src/main/resources/META-INF/persistence.xml',
    '-Aeclipselink.persistenceunits=demo-el'
  ]
  destinationDir = sourceSets.generated.java.srcDirs.iterator().next()
}

Finally we want these tasks to interplay nicely with the rest of the build:

compileJava {
  dependsOn generateQuerydsl
  dependsOn generateJpaMetamodel
  source generateQuerydsl.destinationDir
  source generateJpaMetamodel.destinationDir
}

compileGeneratedJava {
  dependsOn generateQuerydsl
  dependsOn generateJpaMetamodel
  options.warnings = false
  classpath += sourceSets.main.runtimeClasspath
}

test {
  useTestNG()
}

That’s it. This all works, but I encountered a minor problem with Hibernate if we rely on automatic entity discovery – it happens only within the single JAR in which the persistence.xml was found. However, Gradle builds projects to a different structure than Maven and keeps resources and classes in separate directories – and a directory as a classpath entry effectively means a separate JAR. So if you download the project, go to manuscript/examples/querydsl-basic and run:

gradle -b build-traditional.gradle clean build

You will see a failure of DemoTest (a very simple TestNG-based test class) for the Hibernate test. This can be fixed by listing the classes in persistence.xml – which would actually be compliant with how JPA works in a Java SE environment – or you need to somehow make build/resources and build/classes one directory.
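
The first fix is a one-liner per entity – a minimal sketch, the unit name and entity class are made up:

Explicit class listing in persistence.xml (sketch)
<persistence-unit name="demo-hibernate" transaction-type="RESOURCE_LOCAL">
  <!-- with classes listed explicitly no classpath scanning is needed -->
  <class>tests.Dog</class>
  <exclude-unlisted-classes>true</exclude-unlisted-classes>
  ...
</persistence-unit>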

The second fix actually is not that complicated either and that’s what build.gradle is like in the end (including a long explanatory comment in it). All we need to do is add three lines into the source sets:

sourceSets {
  // these three lines are the only difference against build-traditional.gradle
  main {
    output.resourcesDir = "$buildDir/classes/java/main"
  }
  //... the rest is the same

This is a very subtle change, but very effective, and I didn’t notice any negative side effects. I can’t comment on the effect of this change on more complicated big projects though. I further tried to make an even more drastic change and declare the generated source directory as part of the Java main source set:

sourceSets {
  main {
    // This is actually not recommended, read on...
    java.srcDir file(generatedSrc)

This also works, sort of, but the build is not able to detect changes properly and always runs the generation, even when everything is up to date. You can try this build, it’s in build-extreme.gradle, but it is obviously not a well-behaved Gradle build.

IntelliJ IDEA support

Gradle support in IntelliJ IDEA is constantly getting better, and depending on the versions of the tools used (I used Gradle 4.0, 4.1 and 4.2 on IDEA 2017) it’s not always easy to figure out how well it should work. Recently I saw a breathtaking presentation of Kotlin DSL support in Gradle (build file named build.gradle.kts) but I simply wasn’t able to reproduce it on my projects built from scratch. I remember the same about reasonable support for the Gradle DSL years ago, and this seems to be much better now.

We can either import existing sources as a new IDEA module from the external Gradle model or create a new Gradle module. In either case, after a Gradle project refresh IDEA understands the project model quite well. There is a Settings option for Gradle called “Create separate module per source set” which is enabled by default, but I personally like it disabled to get just one IDEA module per demo (like with Maven).

There is also an interesting setting in Gradle/Runner called “Delegate IDE build/run actions to gradle” and this one I actually prefer. It completely bypasses IDEA’s internal build and calls the Gradle build for any build/compilation and to run the application or tests as well.

The hiccup with Hibernate’s scan for entities we discussed affects IDEA’s internal build mechanism even with the right Gradle build. While with Maven IDEA builds into Maven’s target directory, with Gradle IDEA uses its own out directory, because reportedly Gradle’s and IDEA’s ways of building are so different they would not work if they stepped on each other’s toes. This is exactly what the “Delegate…” setting fixes. Builds are pretty fast (it’s Gradle after all!) and it feels good to know that you use the same build in the IDE and on your CI as well.

Bugs discovered while writing this book

To demonstrate the dark side of JPA, I’ll point to a couple of bugs I discovered while writing this book (or those that led to it). With a single ORM you have no option to switch – with many you always trade some bugs (most of the relevant ones known to you already) for new bugs. If you have good test coverage, you may be able to make a qualified decision. Otherwise you’ll find the bugs later and randomly (or your users will).

Some bugs are closed already – I added them to show it makes sense to report them. Other bugs, on the other hand, give you the feeling it doesn’t make sense to report anything – they have been open for years and nothing is happening. ORM and JPA projects are no different from any others, and with an open-source implementation you get at least some transparency. Not all the bugs were originally reported by me, but if I participated, commented and created a test case, they are included. In any case, these are all bugs that were encountered on a single Java project.

  • JPA 2.1 FUNCTION not supported? (Hibernate) I reported this bug after trying this JPA 2.1 feature successfully on EclipseLink. Hibernate provides custom functions in HQL/JPQL, but this requires additional configuration. Whether this goes against standard JPA 2.1 is questionable (section 4.6.17.3 on custom functions does not say “no additional configuration must be required” after all), but Hibernate’s grammar also has a bug and does not support function calls without arguments.
  • Wrong SQL generated for delete with where crossing tables (Hibernate) Opened in April 2015, but an older linked bug had been open since May 2012. Works on EclipseLink.
  • Bad SQL result for method countDistinct with join from criteria API This got reported for EclipseLink; however, it is not a JPA 2.1 bug, as the specification explicitly says: “The use of DISTINCT with COUNT is not supported for arguments of embeddable types or map entry types.” It is just so unexpected it got reported by someone. This is a big blow for many reasons. Either you cannot use embeddables, or you cannot count such cases in the query. This means you actually have to list the results (you can use DISTINCT then) and check the size of the list – which may not be possible for long lists. Finally, while in SQL you can work around this using a SELECT in the FROM clause, JPA does not allow this (yet). By the way, Hibernate supports this (I mean DISTINCT in COUNT, not SELECT in the FROM clause), even against the explicit “not supported”. It seems to be a violation of the JPA specification (although a handy one) and it may become a problem when migrating to another provider. (The question is what “not supported” means exactly in terms of words like MUST or MUST NOT used in RFCs. I understand leaving the feature in your ORM, but maybe some compatibility switch/property could control the behaviour.)
  • Stream API gets no results (EclipseLink) This one is fixed already. It caused streams based on lazy lists to be empty. It was easy to work around – just stream a copy of the list – but it was annoying and unexpected. The reason for the bug was the fact that IndirectList (the lazy list implementation) extended Vector (probably for some legacy reasons; JPA does not explicitly mention Vector as a specially supported collection) but ignored the invariants of the superclass and its protected fields. This shows two things – first, when you extend something you should follow through completely with its contract, and second, Vector was badly designed back in 1995 or so. Oh, and third, prefer delegation to extension, of course. That way you can often avoid both previous pitfalls.
  • Abstract MappedSuperclass in a separate JAR is not weaved statically (EclipseLink) This relates to modularity and may be fixed in version 2.7.
  • JPQL Case operator is not thread-safe (EclipseLink) CASE in queries randomly causes either an ArrayIndexOutOfBoundsException in the EclipseLink stack or an SQLException because the CASE clause is serialized wrongly into the SQL. As a concurrency problem it shows very randomly, more likely for bigger/longer queries. No response from the EclipseLink team so far, so we just synchronize around queries where we absolutely need to use CASE (see the sketch after this list).
  • H2 dialect generates FOR UPDATE before LIMIT/OFFSET (EclipseLink) This one is about a particular database vendor’s SQL support. In H2 you need to say FOR UPDATE after the LIMIT/OFFSET combo, but EclipseLink does not respect this syntactic detail. A fix is currently available only as a patch; you can implement your own custom H2Platform (a very small class) and use that one.
  • Deleting in a ManyToManyRelation with a JoinTable with referencedColumn != name in JoinColumn is not working (EclipseLink) This one is rather tricky and we hit it when we migrated one table from a composite PK to a single-column one. First we just introduced a new IDENTITY column and marked it as @Id in the entity, and everything worked fine… except for a @ManyToMany collection that was mapped using the two @JoinColumns from the original ID (and in the DB it still was an ID). This collection did not delete the entries from the association table (not mapped as an entity). This actually did not relate to the number of columns, merely to the fact that they were not @Id columns anymore. The JPA specification does not mention any exception like this.
  • ON for LEFT JOIN across association tables not generated in SQL (EclipseLink) When using additional ON conditions for a LEFT JOIN, these don’t get generated into the SQL if the LEFT JOIN follows a @ManyToMany or @OneToMany with an association table. It works fine if the collection is mapped by a single FK on the entity contained in the collection. The workaround is to map the association table explicitly or to use a different way to get to the result – possibly more queries, because a LEFT JOIN is difficult to rewrite without losing some results. It should work according to the specification and works fine with Hibernate.
  • Coalesce ignores converters (EclipseLink) When you want to coalesce the value of a converted column, like LocalDate, to some default value (e.g. LocalDate.now()) the query fails at the JDBC driver level (jTDS), or later when it encounters a row where it needs to apply the default (in H2), or possibly succeeds if no such row is found – which is probably even sneakier. In any case, the value is passed into JDBC without conversion and is treated as a raw object, not as a date type. Funnily enough, if we try to call it with java.sql.Date instead, EclipseLink complains right away that the types in the coalesce are not compatible. Works fine with Hibernate.
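
As for the CASE workaround mentioned in the list, it is as crude as it sounds – a minimal sketch, where the lock object and the contrived CASE query are made up:

Serializing CASE-bearing queries (sketch)
private static final Object CASE_QUERY_LOCK = new Object();
...
List<String> labels;
synchronized (CASE_QUERY_LOCK) {
  // only queries containing CASE need this treatment
  labels = new JPAQuery<>(em)
    .from(QDog.dog)
    .select(QDog.dog.name.when("Rex").then("king").otherwise("commoner"))
    .fetch();
}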

List of acronyms

  • API – application programming interface
  • APT – Java Annotation Processing Tool (can be used to generate sources from metadata)
  • BLOB – binary large object
  • CLOB – character large object
  • CPU – central processing unit (processor)
  • CQRS – command query responsibility segregation (pattern)
  • CRUD – create, read, update and delete
  • DB – database
  • DBA – database administrator
  • DCI – data, context and interaction (pattern)
  • DDD – domain-driven design
  • DDL – data definition language or data description language (e.g. CREATE statements)
  • DML – data manipulation language (e.g. SELECT and UPDATE statements)
  • DRY – don’t repeat yourself (principle)
  • DSL – domain-specific language
  • DTO – data transfer object
  • EE – enterprise edition (typically meaning Java EE or older J2EE)
  • EJB – Enterprise JavaBean
  • EM – entity manager (or class EntityManager)
  • EMF – entity manager factory (or class EntityManagerFactory)
  • E-R – entity-relationship (model)
  • FK – foreign key
  • GC – garbage collection (or collector, from context)
  • GORM – Grails Object Relational Mapper
  • HTTP – Hypertext Transfer Protocol
  • ID – identifier
  • IDE – integrated development environment
  • JAR – Java Archive
  • JDBC – Java Database Connectivity
  • JPA – Java Persistence API
  • JPQL – Java Persistence Query Language
  • JSR – Java Specification Request
  • JTA – Java Transaction API
  • JVM – Java virtual machine
  • LOB – large object
  • OO – object-oriented
  • OOP – object-oriented programming
  • ORM (also O/RM, or O/R mapping) – object-relational mapping
  • OSGi – Open Service Gateway Initiative
  • OSIV – Open Session in View (antipattern)
  • OSS – open-source software
  • OWASP – Open Web Application Security Project
  • PK – primary key
  • POM – Project Object Model (in Maven)
  • QL – query language (in general)
  • RDBMS – relational database management system
  • SpEL – Spring Expression Language
  • SQL – Structured Query Language
  • SRP – single responsibility principle
  • SVN – Subversion
  • UI – user interface
  • URL – Uniform Resource Locator
  • WAR – Web application ARchive
  • XML – Extensible Markup Language

Resources

Books

[DDD] Eric Evans. Domain-Driven Design, Addison-Wesley Professional, 2003.

[HPJP] Vlad Mihalcea. High Performance Java Persistence, Leanpub, 2017. Book site

[JAA] Kirk Knoernschild. Java Application Architecture: Modularity Patterns with Examples Using OSGi, Prentice Hall, 2012.

[JP] Charlie Hunt, Binu John. Java Performance, Addison-Wesley, 2012.

[JPDG] Scott Oaks. Java Performance: The Definitive Guide, O’Reilly, 2014.

[JPspec] Java Persistence 2.1 Expert Group. JSR 338: Java Persistence API, Version 2.1, Oracle, 2013. Download

[PoEAA] Martin Fowler. Patterns of Enterprise Application Architecture, Addison-Wesley, 2002.

[ProJPA2] Mike Keith, Merrick Schincariol. Pro JPA 2, Apress Media, LLC, 2013.

[SData] Mark Pollack, Oliver Gierke, Thomas Risberg, Jon Brisbin, and Michael Hunger. Spring Data, O’Reilly Media, Inc., 2011.

[SQLPE] Markus Winand. SQL Performance Explained, 2012. Book site

Other online resources

  • Querydsl: http://www.querydsl.com/
  • DataNucleus: http://www.datanucleus.org/
  • jOOQ: http://www.jooq.org
  • H2 database: http://h2database.com
  • Groovy, Working with a relational database

Notes

Preface

1However, The Pragmatic Programmer (1999) was already out there. After many years of being aware this book exists I decided to buy it and read it. It is beyond any recommendation, and tip 25, How to Balance Resources, tackles exactly this. Personally I’d say this book is more essential than Clean Code, but both should be read in any case.

JPA Good Times, Bad Times

1My favourite is [JPspec], section 4.8.5, mentioning that “use of DISTINCT with COUNT is not supported for arguments of embeddable types or map entry types”. In general, using embeddables may limit our options, which is a pity and may force design decisions not to use them where we otherwise would.

2One especially big pain is no support for date arithmetic like GETDATE() + 1.

3For paginating with fetch across to-many relations see the chapter Avoid N+1 select, concretely this section.

4You can see this demonstrated in examples/querydsl-basic if you run: mvn test-compile exec:java -Dexec.mainClass="tests.GeneratedIdSettingDemo"

5Run the code from this GitHub example and see the console output.

6We will, however, prefer more expressive Querydsl that generates JPQL.

7See [JPspec], section 3.10.8.

8In many popular databases, that is. Standard SQL pagination is quite young and I doubt all database vendors have agreed to implement it.

9You can check a video with this and more JPA Gotchas. Note that many of them are related to the use of Hibernate and are not necessarily general for other JPA providers.

10You can try it yourself when you watch the console output while debugging and stepping through the class tests.PersistAndSetDemo from querydsl-basic module.

11Second-level cache is the most popular term, used also in [JPspec]. It appears in [ProJPA2] too, but in-memory cache is used more often there.

12Tested with Hibernate and EclipseLink.

13Books on Java performance often mention that memory versus CPU is not such a clear-cut trade-off in the JVM, as more memory may affect the garbage collector in such a way that we actually pay with both memory and CPU. Another reason to be really aware of what we do – we should not rely on automagic and we should always measure the impact as well.

14They may think programmers learn on their own – many of them do. Sometimes I think, though, that the managers expecting self-learning the most are those who don’t learn anything themselves.

15In a different post, domain objects were “enriched” with all kinds of serialization needed, effectively binding possibly all domain objects to possibly all the technologies you need. I believe OOP had been there already, maybe not before I was born, but definitely long before I started programming. The author later used various decorators to keep classes focused and small, but this added complexity (software is all about trade-offs).

16While a class diagram does not say exactly the same as an E-R diagram, I used it successfully to communicate the design of tables to my DBA, who had absolutely no problem understanding it, even in cases where a many-to-many associative table was implied by a named line. An E-R diagram was much easier to generate ex post for documentation purposes.

Opinionated JPA

1This was late March 2016. What I didn’t check was the newest Hibernate version at the time. The Hibernate 5.1 release notes from Feb 10, 2016 brought the news about entity joins (or ad hoc joins) as the first item. I would find out four months later. Of course it still is not JPA 2.1 compliant, but with two major ORMs backing this idea it definitely becomes much more practical.

2Hibernate sometimes does it out of the box and sometimes does not do it even when the relationship is marked with its proprietary annotation as @Fetch(FetchMode.JOIN). EclipseLink does not do it by default, but can be persuaded with its own annotation @JoinFetch(JoinFetchType.OUTER).

3View from the perspective of a service layer; it could be the model in MVC at the presentation layer. We can also talk about a client of the service layer, but client may be just as confusing out of context.

4Example source code: https://github.com/virgo47/opinionatedjpawithquerydsl/blob/master/manuscript/examples/querydsl-basic/src/test/java/tests/Pagination.java

5There is also the Hibernate-specific fetchAll() that loads lazy single-valued properties, but this is beyond the JPA specification.

6http://www.querydsl.com/static/querydsl/4.1.4/reference/html_single/#result_handling

7See [HPJP], introduction of the chapter Fetching in part JPA and Hibernate.

8The whole Java EE 8 is kind of dormant as of 2016.

9This chapter is based on my blog post: https://virgo47.wordpress.com/2015/05/05/jpa-modularity-denied

10Trends in software are not necessarily always progressive – we all know plenty of useless buzzwords, and sometimes what is en vogue is not the best idea available.

11Amazon reviews of this book mostly settle on 4 stars but there are many lower scores for various reasons. Some say it doesn’t cover OSGi enough, some that it covers it too much. I think it covers it just fine – not too much, mostly for examples, as it focuses on the principles of modularity in the first place.

Common problems

1If I see a demand for the second edition of this book I definitely plan to elaborate on an example of such a framework.

2Perhaps default methods in interfaces could help with some of it, but these came with Java 8 and it’s unlikely to affect existing versions of Querydsl.

Final thoughts

1Available in Hibernate starting with version 5.1.0, released early in 2016.

2The Querydsl project seems sleepier in 2017 but it still delivers and there is a lot of interest in it (e.g. Atlassian is using it a lot and they wish to get involved).