II Opinionated JPA
To use ORM/JPA really well you need to know a lot about it, about the provider, its settings, about caching mechanisms and so on. And you should not be oblivious to SQL either. I’ve never worked on a system where ORM/JPA was used and at the same time everybody was really happy with the result. The system got slower, DBAs asked why we produce crappy SQL and so on. We fought with ORM drawbacks before we could benefit from its features. It all comes down to the inherent complexity of ORM. Any pro can instantly blame us for “just not knowing”… but I think the problem goes deeper.
While I too complain about programmers’ laziness to learn more about the technologies they work with, JPA (plus the provider, mind you!) is a really complex topic. I’ve been studying it for years already. Sure, there was no 100% study time like when you prepare for a certification. On the other hand, most of my study was based on experience – and not only reactive (solutions to imminent problems) but also proactive (how to do it better on any level, as a coder, designer, architect, whatever).
I always welcome the chance to learn more about JPA. Not because it’s my favourite technology, I just happened to spend so much time with it. When we started one recent project we discussed what to use; I was quite fresh to the team and somehow got convinced to use JPA although I would rather have risked Querydsl over SQL directly. But I didn’t mind that much, JPA was quite well known to me. Little did I know how many surprises it still had for me! The biggest of them all was probably the reason why I started to complain about JPA on my blog and, eventually, decided to put my twisted ideas on how to utilize JPA into this book.
Generally I don’t like ideas like “let’s just use a subset of this”. I don’t believe that hiring dumb Java programmers and forbidding them to use the latest Java 8 features is the solution. I rarely saw such a “subset” programmer write clean code – or they were bright enough to dislike such an arrangement and typically left soon for something better. But JPA is not a programming language; you’re working with an API and you may choose the patterns you want to use and those you don’t. This time I decided “let’s use SQL more” and “let’s not load a graph of objects (even if cached) when I just want to update this single one”. And that’s how it all started – the more I know about JPA the less of it I use. But it still has benefits I don’t want to let go.
So the question is: Can we tune the JPA down without crippling it – or ourselves?
6. Tuning ORM down
Full-blown object-relational mapping maps database foreign keys as relations between objects.
Instead of Integer breedId you have Breed breed where mapping information provides all the
necessary low-level information about the foreign key. This mapping information can be stored
elsewhere (typically an XML mapping file) or as annotations directly on the entity (probably more
popular nowadays). Relations can be mapped in any direction, even in reverse, for one or both
directions and in any cardinality with the annotations @OneToOne, @OneToMany, @ManyToOne
and @ManyToMany. If you map both directions you choose the owner’s side, which is kinda more
“in control” of the relationship. Mapping foreign keys as relations is probably the most
pronounced feature of ORM – and originally I wanted to make a case against it in this very book.
In an ironic twist I discovered that the way we bent JPA 2.1 works only on its reference implementation, that is EclipseLink – and all of this more than a year too late. I considered stopping any further work on the book after losing this most important showcase point, but in the end I decided to go on with a weaker version. We will describe the troubles with to-one relations and their possible solutions – including the one that is not JPA compliant – and there are also other helpful tips, many Querydsl examples throughout and two chapters dedicated to Querydsl.
Most of the points I’m going to explain somehow revolve around two rules:
- Be explicit. For instance, don’t leave the persistence context (session) open after you leave the service layer and don’t wait for the presentation layer to load something lazily. Make an explicit contract and fetch eagerly what you need for presentation, just like you would with plain SQL (but still avoiding the ORM syndrome “load the whole DB”). Also, don’t rely on caches blindly.
- Don’t go lazy. Or at least not easily. This obviously stems from the previous point, but goes further. Lazy may save some queries here and there, but in practice we rely on it too much. In most cases we can be explicit, but we’re lazy to be so (so let’s not go lazy ourselves either). There are places where FetchType.LAZY means nothing to JPA. Oh, they will say it’s a hint for a provider, but it’s not guaranteed. That’s nothing for me. Let’s face it – any to-one mapping is eager, unless you add some complexity to your system to make it lazy. It’s not lazy because you annotate it so, deal with it.
As a kind of shortcut I have to mention that Hibernate 5.x supports LAZY to-one relationships
as you’d expect; more about it later in the part on how lazy is (not) guaranteed.
When talking about the often vain LAZY on to-one, let me be clear that it is still the preferred
fetching for collections – and quite reliable for them as well. This does not mean you should
rely on these lazy to-many relations when you want to fetch them for multiple entities at once.
For instance if you want to display a page with the list of messages with attachments, possibly
multiple of them for each message, you should not wait for lazy fetch. More about this in the
chapter Avoid N+1 select.
Price for relations
For me the biggest problem is that JPA does not provide any convenient way to stop cascading loads for @ManyToOne and @OneToOne (commonly named to-one in this book, as you probably noticed) relationships. I don’t mind to-many relations; they have their twists, but at least their lazy works. But to-one typically triggers a find by id. If you have a Dog that has an owner (of type Person) and is of a specific Breed you must load these two things along with the dog. Maybe they will be joined by the JPA provider, maybe not, maybe
they are already in second-level cache and will be “loaded” nearly “for free”. All these options
should be seriously considered, analyzed and proved before you can say that you know what is going
on in your application.
And it just starts there, because Person has an Address which – in case of a rich system –
may further point to District, County, State and so on. Once I wanted to change something
like Currency in a treasury system. It loaded around 50 objects – all of them across these
to-one relations. This may happen when you naively want to change a single attribute in that
currency. In SQL terms, that’s an update of a single column for a single row.
When you insert or update a Dog you can use em.getReference(clazz, id) to get a Breed object
containing only the id of the breed. This effectively works as a wrapped foreign key (FK). Heck, you can just new an empty Breed and set its id; you don’t even have to ask em for the reference.
In case of an update you will not save much as that Dog is probably managed already anyway. You can either update the managed entity – which means it loaded previous values for all to-one FKs, cascading as far as needed – or you can try to merge the entity – but this works only if you overwrite it completely; it’s not usable for an update of a single column. Or should you just use a JPQL update and possibly remove all dogs from the second-level cache? How ORM is that?
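To make this concrete, here is a minimal sketch of the reference approach – assuming the Dog/Breed model used throughout this book and an open transaction; whether the provider really skips the SELECT for the reference is provider-specific:

```java
em.getTransaction().begin();
Dog dog = em.find(Dog.class, 1); // the dog itself gets loaded and managed
// getReference returns an uninitialized reference – typically no SELECT:
Breed newBreed = em.getReference(Breed.class, 2);
// alternatively, without asking em at all:
// Breed newBreed = new Breed(); newBreed.setId(2);
dog.setBreed(newBreed); // only the FK column changes on flush
em.getTransaction().commit();
```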
Why doesn’t JPA provide better fetch control for finds? I want to work with this Dog object now,
I’m not interested in its relations, just wrap those FK values into otherwise empty entity objects
(like references) and let me do my stuff! How I wished I could just map the raw FK value instead of the related object… actually, you can, but while you are able to load such a related object explicitly (find by id), you can’t join on a relationship defined this way.
If you use EclipseLink you can do it, but that means you can’t switch to other JPA 2.1 providers reliably, not to mention that dropping the mapping is by no means a minor detour from the standard path. However, the problems with to-one ultimately led me to these extreme measures: I got rid of to-one mappings and never looked back. Because I based my work on the reference implementation of JPA 2.1 (EclipseLink) it took me more than a year to discover I had diverged from the standard – and even then only because I started to write tests for this book with both major JPA providers.
More about this in the chapter Troubles with to-one relationships. Other points are much less radical compared to this one. I learnt long ago that being explicit and less lazy is definitely better and we will talk about it in other sections.
How does this affect my domain model?
I stated that this book will not be about architecture, but this is the part where we have to tackle it a bit. If we talk about a domain model, we probably also talk about domain-driven design (DDD), best described in [DDD]. I can’t claim experience with DDD because I never saw it in practice, but reportedly it works for some, especially for complex business domains with lots of rules, etc. Reading [PoEAA] it is obvious that there’s a lot of synergy between DDD and ORM. One may even ask whether to use ORM without DDD at all. And I can’t answer that, sorry.
[PoEAA] recommends other patterns for simpler domain problems (Transaction Script, Table Module) with related data source patterns (Table Data Gateway, Row Data Gateway or Active Mapper). There are some newer architectural solutions as well, like Command Query Responsibility Segregation (CQRS) or Data, context and interaction (DCI), both of them usable with object-oriented languages, but not necessarily for every problem.
Back to domain, though. One antipattern often mentioned in relation to ORM/JPA is called Anemic domain model. Let’s quote master Martin Fowler a bit again:
I’m not able to slip out of this topic and make object purists happy, but there are many other styles of programming and the question (far beyond the scope of this book) is whether being purely OO is the most important thing after all. In most projects I treated my entities like Java beans with just a little low-level logic to them – so it was mostly this “anemic domain model” song. But entity objects can be perfect data transfer objects (DTOs) in many situations. That is still far more advanced compared to the amorphous rows of a ResultSet. We have a dumb model, but we can utilize it in a domain if we need to.
I even contemplated something like a “domain over entity model” solution where the entity is as dumb as possible (it still has a responsibility for me: it maps to a database table) and I can wrap it with whatever logic I want in a domain object. This even allows me to reverse some dependencies if I wish – for instance, in a database my order items point to the order using a foreign key, but in my higher model it can be the other way around: the order can aggregate items that may not even know about it.
In any case, whatever we do, we can hardly get rid of all the notions of a particular data store we use, because any such “total abstraction” must lead to some mismatch – and that typically leads to complexity and often significant performance penalty as well. The way I go with this book is trying to get as much benefit from using the JPA without incurring too much cost. Maybe I’m not going for the best approach, but I’m trying to avoid the biggest pain points while accepting that I have SQL somewhere down there.
One thing I respect about Fowler et al. is that they try to balance costs and benefits – they neither push ORM forward at all times nor criticize it all the time without offering real alternatives for the complex cases. However, many other smart and experienced people dislike ORM. Listening to “pure haters” does not bring much to a discussion, but some cases are well argued, offering options, and you can sense a deep understanding of the topic.
When to tune down and when not?
I’d say: “Just use it out of the box as long as it works for you.” There are good reasons not to rely on something we don’t understand but in custom enterprise software there often is no other way. For many systems everything will be OK, but for slightly complicated database/domain models a lot of things can go wrong.
Performance problems
Most typically the performance degrades first. You definitely should start looking at the generated queries. Something can be fixed in the database (typically with indexes), some queries can be rewritten in the application but sometimes you have to adjust the mapping.
With to-one relations you either have an ORM with reliable lazy fetch or you have to do the tricks described in Troubles with to-one relationships, or even change the mapping in a way that is not JPA compliant, as in Removing to-one altogether. There are also still JPA native queries and JDBC underneath – needless to say all of this introduces new concepts into the code and if we mix them too much the complexity is on the rise.
To-many has fewer problems but some mappers generate more SQL joins than necessary. This can often be mended by introducing an explicit entity mapping for the associative table. This does not raise conceptual complexity but some query code may look awkward and more SQL-like instead of JPA-like. I argue that explicit joins are not that bad though.
Other typical problems involve the N+1 select problem, but this type of problem is easy to avoid if we are explicit with our service contracts and don’t let the presentation layer initiate lazy loads.
Strange persistence bugs
Most typical enterprise applications can keep the persistence context short – and we should. When ORM went mainstream a lot of people complained about lazy load exceptions (like Hibernate’s famous LazyInitializationException) but this results from misusing ORM. Programmers solved it by making the session (persistence context) longer and introduced the Open Session in View (OSIV) pattern. This is now widely considered an antipattern and there are many good reasons for that.
Using OSIV allowed back-end and front-end programmers to stay separated (with all the disadvantages when two minds do what one mind can do as well), dealing only with some common service API, but this API was leaky when it didn’t load all the data and let the presentation fetch the rest on demand. Instead of making it explicit what data the service call provides for the view, we let the view perform virtually anything on the managed (“live”) entities. This means that the presentation programmers must be very careful what they do.
There is a serious consequence when the service layer loses control over the persistence context: transaction boundaries will stop working as one would expect. When the presentation layer combines multiple calls to the service layer you may get unexpected results – results no one has ever consciously implemented. Consider this sequence:
- The presentation layer first fetches some entity A. The programmer knows it is a read-only call – which is not reliable across the board, but let’s say they know it works for their environment – and because of that uses entity objects as DTOs.
- Based on the request we are servicing, it changes the entity – knowing it will not get persisted in the database.
- There is some transactional read-write service call working with a completely different type of entities B.
The result without OSIV is quite deterministic and if we hit some lazy-load exception then we should fetch all the necessary data in the first service call. With OSIV, however, the modified A gets persisted together with the changes to B – because they are both part of the same persistence context which got flushed and committed.
There are a couple of wrong decisions made here. We may argue that even when the presentation layer is not remote we should still service it with one coarse-grained call. Some will scowl at entities used as DTOs and will propose separate DTO classes (others will scowl at that in turn). But OSIV is the real culprit, the one that gives you unexpected behaviour spanning multiple service calls. Even if the example (a real-life one, mind you) seems contrived, when we dropped OSIV from our project everything around transactions got much cleaner.
7. No further step without Querydsl
I do realize I’m breaking the flow of the book a little bit, but since I was introduced to this neat library I have been a huge fan and never looked back. It doesn’t affect JPA itself because it sits above it, but in your code it virtually replaces both JPQL and especially the Criteria API.
To put it simply, Querydsl is a library that – from the programmer’s perspective – works like a more expressive and readable version of the Java Persistence Criteria API. Internally it first generates JPQL and the rest from there is handled by the JPA provider. Querydsl can actually talk to many more back-ends: SQL, Hibernate directly and others, but we will focus on Querydsl over JPA here. Using it takes these steps:
- declare the dependencies on the Querydsl library and its annotation processor,
- add a step in your build to generate metamodel classes,
- and write queries happily in sort of criteria-like way.
But the Criteria API also lets you use a generated metamodel, so what’s the deal? Why would I introduce a non-standard third-party library when it does not provide any additional advantage? If you outright hate this idea, then you can probably stop reading this book – or you can translate all my Querydsl code into JPQL or Criteria yourself, which is perfectly doable, of course! Querydsl does not disrupt the stack under it; it is “just” slapped over the JPA to make it more convenient.
Querydsl brings in a domain-specific language (DSL) in the form of its fluent API. It happens to be a so-called internal DSL because it’s still embedded in Java; it’s not a different language per se. This fluent API is more concise, readable and convenient than the Criteria API.
A DSL is a language of some specific domain – in this case it’s a query language very close to JPQL or SQL. It builds a lot on the generated metamodel. The API of this metamodel is well thought out and takes type-safety to a higher level compared with the Criteria API. This not only gives us compile-time checks for our queries, but also offers even better auto-completion in IDEs.
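To give an idea where the type-safety comes from, this is roughly what the annotation processor generates for a Dog entity – a simplified sketch, the real Q-class has more constructors and metadata:

```java
public class QDog extends EntityPathBase<Dog> {

    // the default alias used in queries as QDog.dog
    public static final QDog dog = new QDog("dog");

    // typed paths for the entity attributes
    public final NumberPath<Integer> id = createNumber("id", Integer.class);
    public final StringPath name = createString("name");
    public final NumberPath<Integer> age = createNumber("age", Integer.class);

    public QDog(String variable) {
        super(Dog.class, PathMetadataFactory.forVariable(variable));
    }
}
```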
Simple example with Querydsl
Let’s be concrete now – but also very simple. We have a class Dog, each dog has a name
and we will query by this name. Assuming we got hold of EntityManager (variable em) the code
goes like this:
1 List<Dog> dogs = new JPAQuery<>(em)
2 .select(QDog.dog)
3 .from(QDog.dog)
4 .where(QDog.dog.name.like("Re%"))
5 //.where(QDog.dog.name.startsWith("Re")) // alternative
6 .fetch();
Both where alternatives produce the same result in this case, but startsWith may communicate the intention better, unless you go for like("%any%") in which case contains would be better. If the input values for like are provided from elsewhere (wildcards and all), leave it as is. If you can tell from the logic that a more specific name for the operation is better, go for it.
This is a very subtle thing, but we can see that this DSL contains variations that can communicate our intention more clearly. The Criteria API sticks to like only, because that is its internal model. Another thing is how beautifully the whole query flows. In version 4.x the fluent API got even closer to JPQL/SQL semantics: it starts with select (what) and ends with fetch, which is a mere signal to deliver the results. As with any other fluent API, you need a terminal operation. In previous versions you would have no select because it was included in a terminal operation, e.g. list(QDog.dog). The newer version is one line longer, but closer to the target domain of query languages. The advanced chapter has a dedicated section on the differences between versions 3 and 4.
Comparison with Criteria API
Both Querydsl and the Criteria API are a natural fit for dynamic query creation. Doing this with JPQL is rather painful. Imagine a search form with separate fields for a person entity, so you can search by name, address, date of birth from–to, etc. We don’t want to add the search condition when the respective input field is empty. If you have done this before with any form of query string concatenation then you probably know the pain. In extreme cases of plain JDBC with a prepared statement you even have to write all the ifs twice – first to add the where condition (or and for any subsequent one) and second to set the parameters. Technically you can embed the parameter values into the query, but let’s help the infamous injection vulnerability get off the top of the OWASP Top 10 list.
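To make the contrast tangible before we look at the Criteria API, here is a hedged sketch of such a dynamic search with Querydsl – QPerson, the searched fields and their types are made up for illustration:

```java
QPerson person = QPerson.person;
BooleanBuilder where = new BooleanBuilder(); // conjunction built incrementally
if (name != null && !name.isEmpty()) {
    where.and(person.name.containsIgnoreCase(name));
}
if (bornFrom != null) {
    where.and(person.birthdate.goe(bornFrom));
}
if (bornTo != null) {
    where.and(person.birthdate.loe(bornTo));
}
List<Person> people = new JPAQuery<>(em)
    .select(person)
    .from(person)
    .where(where) // an empty builder simply produces no WHERE clause
    .fetch();
```

Each if adds a condition only when the input is present – no string concatenation, and the parameters are bound for us.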
Let’s see how the query for our dogs looks with the Criteria API – again starting from em:
1 CriteriaBuilder cb = em.getCriteriaBuilder();
2 CriteriaQuery<Dog> query = cb.createQuery(Dog.class);
3 Root<Dog> dog = query.from(Dog.class);
4 query.select(dog)
5 // this is the only place where we can use metamodel in this example
6 .where(cb.like(dog.get(Dog_.name), "Re%"));
7 // without metamodel it would be:
8 //.where(cb.like(dog.<String>get("name"), "Re%"));
9 List<Dog> dogs = em.createQuery(query)
10 .getResultList();
Let’s observe now:
- First you need to get a CriteriaBuilder from the existing em. You might “cache” this into a field but it may not play well with the EE component model, so I’d rather get it before using. This should not be a heavy operation; in most cases the entity manager holds this builder already and merely gives it to you (hence get and not new or create).
- Then you create an instance of CriteriaQuery.
- From this you need to get a Root object representing the content of a from clause.
- Then you use the query in a nearly-fluent fashion. The version with the metamodel is presented with an alternative without it in the comment.
- Finally, you use em again to get a TypedQuery based on the CriteriaQuery and we ask it for the results.
While in the case of the Criteria API you don’t need to generate a metamodel from the entity classes, in Querydsl this is not optional. But using the metamodel in the Criteria API is advantageous anyway, so the need to generate the metamodel for Querydsl using an annotation processor can hardly be considered a drawback. It can be easily integrated with Maven or another build as demonstrated in the companion sources to this book or documented on the Querydsl site.
For another comparison of Querydsl and Criteria API, you can also check the original blog post from 2010. Querydsl was much younger then (version 1.x) but the difference was striking already.
Comparison with JPQL
Comparing Querydsl with the Criteria API was rather easy as they are in the same ballpark. Querydsl, however, with its fluency can be compared to JPQL as well. After all, JPQL is a non-Java DSL, even though it typically is embedded in Java code. Let’s see JPQL in action first to finish our side-by-side comparisons:
1 List<Dog> dogs = em.createQuery(
2 "select d from Dog d where d.name like :name", Dog.class)
3 .setParameter("name", "Re%")
4 .getResultList();
This is it! Straight to the point, and you can even call it fluent! Probably the best we can do with Querydsl is adding one line to introduce shorter “alias” like this:
1 QDog d = new QDog("d1");
2 List<Dog> dogs = new JPAQuery<>(em)
3 .select(d)
4 .from(d)
5 .where(d.name.startsWith("Re"))
6 .fetch();
Using aliases is very handy, especially for longer and/or more complicated queries. We could use QDog.dog as the value, but here I introduced a new query variable and named it d1. This name will appear in the generated JPQL, which looks a little bit different from the JPQL in the example above:
1 select d1 from Dog d1 where d1.name like ?1 escape '!'
There is a subtle difference in how Querydsl generates the like clause – which, by the way, is fully customizable using Querydsl templates. But you can see that the alias appears in the JPQL, although neither EclipseLink nor Hibernate bothers to translate it to the generated SQL for your convenience.
Now, if we compare both code snippets above (alias creation included) we get a surprising result – there are more characters in the JPQL version! Although this is just nit-picking (a line/char up or down), it’s clear that Querydsl can express JPQL extremely well (especially in version 4.x) and at the same time it allows for easy dynamic query creation.
You may ask about named queries. Here I admit right away that Querydsl necessarily introduces overhead (see the next section on disadvantages), but when it comes to query reuse from the programmer’s perspective Querydsl offers so-called detached queries. While these are not covered in their reference manual (as of March 2016), we will talk more about them in a chapter about advanced Querydsl topics.
What about the cons?
Before we move to other Querydsl topics I’d like to go over its disadvantages. For one, you’re definitely adding some overhead. I don’t know exactly how big; maybe with the SQL backend (using JDBC directly) it would be more pronounced, because ORM itself is a big overhead anyway. In any case, there is an object graph of the query in memory before it is serialized into JPQL – and from there it’s on the same baseline as using JPA directly.
This performance factor obviously is not a big deal for many Querydsl users (some of them are really big names); it mostly does not add too much to the SQL execution itself, but in any case – if you are interested you have to measure it yourself. Also realize that without Querydsl you either have to mess with JPQL strings before you have the complete dynamic WHERE part, or you should compare it with the Criteria API, which does not have to go through JPQL but also creates some object graph in memory (richer or not? depends).
Documentation, while good, is definitely not complete. For instance, detached queries are not mentioned in the Querydsl Reference Guide and there is much more you need to find out yourself. You’ll probably need to find your own best practices, but that is not so difficult with Querydsl. Based (not only) on my own experience, it is literally a joy to work with Querydsl and explore what it can provide – for me it worked like this from day zero. It’s also very easy to find answers on StackOverflow or their mailing-list, very often provided directly by the Querydsl developers. This makes any lack of documentation easy to overcome.
Finally, you’re adding another dependency to the project, which may be a problem for some. For the teams I worked with it was well compensated by the code clarity Querydsl gave us.
Be explicit with aliases
With attribute paths at our disposal it is easy to require some data without explicitly using joins. Let’s consider the following entity chain: EntityA → EntityB → EntityC → EntityD, where each entity points to the next one via a to-one relation.
We will start all the following queries at EntityA and first we will want the list of related
EntityC objects. We may approach it like this:
1 List<EntityC> result = new JPAQuery<>(em)
2 .select(QEntityA.entityA.entityB.entityC)
3 .from(QEntityA.entityA)
4 .fetch();
This works and Hibernate generates the following SQL:
1 select
2 entityc2_.id as id1_5_,
3 entityc2_.entityD_id as entityD_3_5_,
4 entityc2_.name as name2_5_
5 from
6 EntityA entitya0_,
7 EntityB entityb1_ inner join
8 EntityC entityc2_ on entityb1_.entityC_id=entityc2_.id
9 where entitya0_.entityB_id=entityb1_.id
Effectively these are both inner joins and the result is as expected. But what happens if we
want to traverse to EntityD?
1 List<EntityD> result = new JPAQuery<>(em)
2 .select(QEntityA.entityA.entityB.entityC.entityD)
3 .from(QEntityA.entityA)
4 .fetch();
After running this we end up with a NullPointerException on the line with the select method. What happened? The core problem is that it is not feasible to generate infinitely deep paths using final fields. Querydsl offers some solutions to this as discussed in the reference documentation.
You can either ask the generator to initialize the path you need with the @QueryInit annotation or you can mark the entity with @Config(entityAccessors=true), which generates accessor methods instead of final fields. In the latter case you’d simply use this annotation on EntityC and in the select call use entityD(), which would create the property on the fly. This way you can traverse relations ad lib. It requires some Querydsl annotations on the entities – which you may already use, for instance for custom constructors, etc. Both options are sketched below.
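A hedged sketch of both options, with the annotated fragments trimmed to the relevant parts:

```java
// Option 1: force deeper path initialization from the referencing attribute:
@ManyToOne
@QueryInit("entityC.entityD")
private EntityB entityB;

// Option 2: generate accessor methods instead of final fields for EntityC,
// so that entityC.entityD() creates the path on the fly:
@Config(entityAccessors = true)
@Entity
public class EntityC {
    // ...
}
```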
However, instead of customizing the generation I’d advocate being explicit with the aliases. If there is a join let it show! We can go around the limitation of max-two-level paths using just a single join:
1 List<EntityD> result = new JPAQuery<>(em)
2 .select(QEntityB.entityB.entityC.entityD)
3 .from(QEntityA.entityA)
4 .join(QEntityA.entityA.entityB, QEntityB.entityB)
5 .fetch();
Here we lowered the nesting back to two levels by making EntityB explicit. We used the existing default alias QEntityB.entityB. This query makes the code run, but looks… unclean, really.
Let’s go all the way and make all the joins explicit:
1 List<EntityD> result = new JPAQuery<>(em)
2 .select(QEntityD.entityD)
3 .from(QEntityA.entityA)
4 // second parameter is alias for the path in the first parameter
5 .join(QEntityA.entityA.entityB, QEntityB.entityB)
6 // first parameter uses alias from the previous line
7 .join(QEntityB.entityB.entityC, QEntityC.entityC)
8 .join(QEntityC.entityC.entityD, QEntityD.entityD)
9 .fetch();
Now this looks long, but it says exactly what it does – no surprises. We may help it in a couple of ways but the easiest one is to introduce the aliases upfront:
1 QEntityA a = new QEntityA("a");
2 QEntityB b = new QEntityB("b");
3 QEntityC c = new QEntityC("c");
4 QEntityD d = new QEntityD("d");
5 List<EntityD> result = new JPAQuery<>(em)
6 .select(d)
7 .from(a)
8 .join(a.entityB, b)
9 .join(b.entityC, c)
10 .join(c.entityD, d)
11 .fetch();
I like this one the most – it is a couple of lines longer but very clean and the query is very easy to read. Anytime I work with joins I go for explicit aliases.
Another related problem is joining the same entity multiple times (but in different roles). Newbie programmers also often fall into the trap of using the same alias – typically the default one offered on each Q-class – for multiple joins on the same entity. There can be two distinct local variables representing these aliases but they both point to the same object instance – hence it’s still the same alias. Whenever I join an entity multiple times in a query I always go for created aliases, as sketched below. When I’m sure that an entity is in a query just once I may use the default alias. It really pays off to have a strict and clean strategy of how to use the aliases and the local variables pointing at them.
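As a sketch of that strategy, joining the same entity twice in two different roles could look like this – assuming a self-referencing derivedFrom mapping on Breed like the one used in a later chapter:

```java
QBreed breed = new QBreed("breed");   // the breed itself
QBreed parent = new QBreed("parent"); // the same entity in its parent role
List<Tuple> breedWithParent = new JPAQuery<>(em)
    .select(breed.name, parent.name)
    .from(breed)
    // reusing a single alias for both sides here would be plain wrong
    .join(breed.derivedFrom, parent)
    .fetch();
```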
Functions and operations
Let’s see what data we will work on in the next sections so we can understand the returned results – first the Breed table, then the Dog table:
| id | name |
|---|---|
| 1 | collie |
| 2 | german shepherd |
| 3 | retriever |

| name | age | breed_id |
|---|---|---|
| Lassie | 7 | 1 (collie) |
| Rex | 6 | 2 (german shepherd) |
| Ben | 4 | 2 (german shepherd) |
| Mixer (unknown breed) | 3 | NULL |
The following examples can be found on GitHub in FunctionsAndOperations.java.
When we want to write some operation we simply follow the fluent API. You can write QDog.dog.age. and hit auto-complete – this offers all the possible operations and functions we can use on the attribute. These are based on the known type of that attribute, so we only get the list of functions that make sense (from a technical point of view, not necessarily from a business perspective, of course).
This naturally makes sense for binary operations. If we want to see dog’s age halved we can do it like this:
1 List<Tuple> dogsGettingYounger = new JPAQuery<>(em)
2 .select(QDog.dog.name, QDog.dog.age.divide(2))
3 .from(QDog.dog)
4 .fetch();
We will cover Tuples soon; for now just know they can be printed easily and the output would be [[Lassie, 3], [Rex, 3], [Ben, 2], [Mixer (unknown breed), 1]]. We are not limited to constants and can add together two columns or multiply an amount by a rate, etc. You may also notice the integer results of the division – this can be mended if you select age as a double – like this:
1 // rest is the same
2 .select(QDog.dog.name, QDog.dog.age.castToNum(Double.class).divide(2))
Unary functions are less natural as they are appended to the attribute expression just like binary operations – they just don’t take any arguments:
1 List<Tuple> dogsAndNameLengths = new JPAQuery<>(em)
2 .select(QDog.dog.name, QDog.dog.name.length())
3 .from(QDog.dog)
4 .fetch();
This returns [[Lassie, 6], [Rex, 3], [Ben, 3], [Mixer (unknown breed), 21]]. JPA 2.1 allows
the usage of any function using the FUNCTION construct. We will cover this
later in the advanced chapter on Querydsl.
Aggregate functions
Using aggregate functions is the same as using any other function – but as we know from SQL, we need to group by all other non-aggregated columns. For instance, to see the counts of dogs for each age we can write:
1 List<Tuple> dogCountByAge = new JPAQuery<>(em)
2 .select(QDog.dog.age, QDog.dog.count())
3 .from(QDog.dog)
4 .groupBy(QDog.dog.age)
5 .fetch();
Our example data has each dog with a different age, so the results are boring.
If we want to see the average age of dogs per breed we use this:
1 List<Tuple> dogAvgAgeByBreed = new JPAQuery<>(em)
2 .select(QBreed.breed.id, QBreed.breed.name, QDog.dog.age.avg())
3 .from(QDog.dog)
4 .leftJoin(QDog.dog.breed, QBreed.breed)
5 .groupBy(QBreed.breed.id, QBreed.breed.name)
6 .orderBy(QBreed.breed.name.asc())
7 .fetch();
Because of the leftJoin this also returns the line [null, null, 3.0]. With innerJoin (or just join) we would get only dogs with a defined breed. On screen I’m probably interested in the breed’s name but that relies on their uniqueness – which may not be the case. To get the “true” results per breed I added breed.id into the select clause. We can always display just the names and if they are not unique the user will at least be curious about it. But that’s beyond the discussion for now.
Querydsl supports the having clause as well if the need arises – a small sketch follows. We also demonstrated the orderBy clause, just by the way, as it hardly deserves special treatment in this book.
Subqueries
Here we will use the same data as in the previous section on functions and operations. The following examples can be found on GitHub in Subqueries.java. You can also check the Querydsl reference documentation.
Independent subquery
If we want to find all the dogs that are of breed with a long name, we can do this:
1 List<Dog> dogsWithLongBreedName = new JPAQuery<>(em)
2 .select(QDog.dog)
3 .from(QDog.dog)
4 .where(QDog.dog.breed.in(
5 new JPAQuery<>() // no EM here
6 .select(QBreed.breed)
7 .from(QBreed.breed)
8 // no fetch on subquery
9 .where(QBreed.breed.name.length().goe(10))))
10 .fetch();
This returns Ben and Rex, both german shepherds. Note that a subquery looks like a normal query, but we don’t provide an EntityManager to it and we don’t use any fetch. If you do, the subquery will be executed on its own first and its results will be fetched into the in clause just as any other collection would. This is not what we want in most cases – especially when the code leads us to the idea of a subquery. The previous result can be achieved with a join as well – and in such cases joins are almost always preferable.
To make our life easier we can use JPAExpressions.select(...) instead of new JPAQuery<>().select(...). There is also a neat shortcut for a subquery whose select and from use the same expression:
1 JPAExpressions.selectFrom(QBreed.breed) ...
We will prefer using JPAExpressions from now on. Another example finds the average age of all dogs and returns only those above it:
1 List<Dog> dogsOlderThanAverage = new JPAQuery<>(em)
2 .select(QDog.dog)
3 .from(QDog.dog)
4 .where(QDog.dog.age.gt(
5 JPAExpressions.select(QDog.dog.age.avg())
6 .from(QDog.dog)))
7 .fetch();
This is perhaps on the edge regarding the usage of the same alias QDog.dog for both the inner and the outer query, but I dare say it does not matter here. In both previous examples the same result can be obtained when we execute the inner query first, because it always provides the same result. What if we need to drive some conditions of the inner query by the outer one?
Correlated subquery
When we use an object from the outer query in the subquery it is called a correlated subquery. For example, we want to know for which breeds we have no dogs:
1 List<Breed> breedsWithoutDogs = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(
5 JPAExpressions.selectFrom(QDog.dog)
6 .where(QDog.dog.breed.eq(QBreed.breed))
7 .notExists())
8 .fetch();
We used a subquery with exists/notExists and we could omit the select, although it can be used. This returns a single breed – retriever. An interesting aspect is that the inner query uses something from the outer select (here QBreed.breed). Compared to the subqueries from the previous section this one can have a different result for each row of the outer query.
This one can actually be done with a join as well, but in this case I’d not recommend it. Depending on the mapping you use, various things work on various providers. When you have @ManyToOne Breed breed then this works on EclipseLink:
1 List<Breed> result = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .leftJoin(QDog.dog).on(QBreed.breed.eq(QDog.dog.breed))
5 .where(QDog.dog.isNull())
6 // distinct would be needed for isNotNull (breeds with many dogs)
7 // not really necessary for isNull as those lines are unique
8 .distinct()
9 .fetch();
Alas, it fails on Hibernate (5.2.2) with InvalidWithClauseException: with clause can only reference columns in the driving table. It seems to be an open bug mentioned in this StackOverflow answer – and the suggested solution of using plain keys instead of objects indeed works:
1 // the rest is the same
2 .leftJoin(QDog.dog).on(QBreed.breed.id.eq(QDog.dog.breedId))
For this we need to switch the Breed breed mapping to a plain FK mapping, or use both of them with one marked as not updatable/insertable:
1 @Column(name = "breed_id", updatable = false, insertable = false)
2 private Integer breedId;
You may be tempted to avoid this mapping and do explicit equals on keys like this:
1 // the rest is the same
2 .leftJoin(QDog.dog).on(QBreed.breed.id.eq(QDog.dog.breed.id))
But this has its own issues. EclipseLink joins the breed twice, the second time using an implicit join after the left join, which is a construct that many databases don’t really like (H2 or SQL Server, to name just those where I noticed it quite often) and it fails on the JDBC/SQL level. Hibernate also generates the superfluous join to the breed but at least orders the joins more safely. Still, if you use EclipseLink simply don’t add explicit IDs to your paths, and if you’re using Hibernate at least know it’s not efficient join-wise and also not really portable across JPA providers.
There is another JPA query that works and also arguably reads most naturally:
1 List<Breed> result = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(QBreed.breed.dogs.isEmpty())
5 .fetch();
Internally this generates a subquery with count or (not) exists anyway; there is no way around it. But for this to work we also have to add a mapping to Breed which was not necessary for the previous non-JPA joins:
1 @OneToMany(mappedBy = "breed")
2 private Set<Dog> dogs;
This mapping may be rather superfluous for some cases (why should breed know the list of dogs?) but it should not hurt.
We somehow foreshadowed themes from the following chapters on various mapping problems but we’re not done with subqueries (or Querydsl) just yet. Let’s now try to find dogs that are older than average – but this time for each breed:
1 QDog innerDog = new QDog("innerDog");
2 List<Dog> dogsOlderThanBreedAverage = new JPAQuery<>(em)
3 .select(QDog.dog)
4 .from(QDog.dog)
5 .where(QDog.dog.age.gt(
6 JPAExpressions.select(innerDog.age.avg())
7 .from(innerDog)
8 .where(innerDog.breed.eq(QDog.dog.breed))))
9 .fetch();
Notice the use of the innerDog alias, which is very important. Had you used QDog.dog in the subquery it would have returned the same results as dogsOlderThanAverage above. This query returns only Rex, because only german shepherds have more than a single dog – and a single (or no) dog can’t be older than average.
This section is not about going deep into subquery theory (I’m not the right person for it in the first place), but about demonstrating how easy and readable it is to write subqueries with the Querydsl API. We’re still limited by JPQL, which means no subqueries in from or select clauses. When writing subqueries be careful not to “reuse” an alias from the outer query.
Pagination
Very often we need to cut our results into smaller chunks, typically pages for some user interface, but pagination is often used with various APIs as well. To do it manually, we can simply specify offset (which record is the first) and limit (how many records):
1 List<Breed> breedsPageX = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(QBreed.breed.id.lt(17)) // random where
5 .orderBy(QBreed.breed.name.asc())
6 .offset((page - 1) * pageSize)
7 .limit(pageSize)
8 .fetch();
I used the variables page (starting at one, hence the correction) and pageSize to calculate the right values. Any time you paginate the results you want to use orderBy, because the SQL specification does not guarantee any order without it. Sure, databases typically give you results in some natural order (like “by ID”), but it is dangerous to rely on it.
The content of the page is good, but often we want to know the total count of all the results. We use the same query with the same conditions, we just call .fetchCount first, like this:
1 JPAQuery<Breed> breedQuery = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(QBreed.breed.id.lt(17))
5 .orderBy(QBreed.breed.name.asc());
6
7 long total = breedQuery.fetchCount();
8 List<Breed> breedsPageX = breedQuery
9 .offset((page - 1) * pageSize)
10 .limit(pageSize)
11 .fetch();
We can even add an if that skips the second query when the count is 0, as we can just create an empty list.
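That could look like this – a small sketch reusing breedQuery from the snippet above:

```java
long total = breedQuery.fetchCount();
List<Breed> breedsPageX = total == 0
    ? Collections.emptyList() // no point in asking for an empty page
    : breedQuery
        .offset((page - 1) * pageSize)
        .limit(pageSize)
        .fetch();
```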
And because this scenario is so typical Querydsl provides a shortcut for it – fetchResults:
1 QueryResults<Breed> results = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(QBreed.breed.id.lt(17))
5 .orderBy(QBreed.breed.name.asc())
6 .offset((page - 1) * pageSize)
7 .limit(pageSize)
8 .fetchResults();
9 System.out.println("total count: " + results.getTotal());
10 System.out.println("results = " + results.getResults());
QueryResults wraps all the information we need for a paginated result, including offset and limit. It does not help us with the reverse calculation of the page number from an offset, but I guess we probably track the page number in some filter object anyway, as we needed it as the input in the first place.
And yes, it does not execute the second query when the count is zero.
Tuple results
When you call fetch() a List is returned. When you call fetchOne() or fetchFirst() a single object is returned. The question is – what type of object? This depends on the select clause, which can have a single parameter that determines the type, or you can use a list of expressions for select, in which case a Tuple is returned. That is, a query with select(QDog.dog) will return Dog (or List<Dog>) because QDog.dog is of type QDog, which extends EntityPathBase<Dog>, which eventually implements Expression<Dog> – hence the Dog.
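For illustration, the single-result terminal operations on our demo data – note that fetchOne() throws if more than one row matches, while fetchFirst() effectively adds limit(1):

```java
Dog lassie = new JPAQuery<>(em)
    .select(QDog.dog)
    .from(QDog.dog)
    .where(QDog.dog.name.eq("Lassie"))
    .fetchOne(); // NonUniqueResultException for more than one match

Dog anyDog = new JPAQuery<>(em)
    .select(QDog.dog)
    .from(QDog.dog)
    .fetchFirst(); // first row only, null when there is none
```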
But when we return multiple results we need to wrap them somehow. In JPA you get Object[] (an array of objects) for each row. That works, but is not very convenient. The Criteria API brings in Tuple for that reason, which is much better. Querydsl uses the Tuple idea as well. What’s the deal?
1 List<Tuple> result = new JPAQuery<>(em)
2 .select(QDog.dog.name, QBreed.breed.name)
3 .from(QDog.dog)
4 .join(QDog.dog.breed, QBreed.breed)
5 .fetch();
6 result.forEach(t -> {
7 String name = t.get(QDog.dog.name);
8 String breed = t.get(1, String.class);
9 System.out.println("Dog: " + name + " is " + breed);
10 });
As you can see in the forEach block, we can extract columns either by using the same expression that was used in the select (here QDog.dog.name) or by index. Needless to say, the first way is preferred. You can also extract the underlying array of objects using toArray().
Because Tuple works with any expression we can use the whole entities too:
1 List<Tuple> result = new JPAQuery<>(em)
2 .select(QDog.dog, QBreed.breed)
3 .from(QDog.dog)
4 .join(QDog.dog.breed, QBreed.breed)
5 .fetch();
6 result.forEach(t -> {
7 Dog dog = t.get(QDog.dog);
8 Breed breed = t.get(QBreed.breed);
9 System.out.println("\nDog: " + dog);
10 System.out.println("Breed: " + breed);
11 });
And of course, you can combine entities with columns ad-lib.
There are other ways to combine multiple expressions into one object. See the reference guide on how to populate beans or how to use projection constructors – a small sketch of the latter follows.
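For instance, a constructor projection might look like this – a hedged sketch where DogBreedDto is a hypothetical class with a matching (String, String) constructor:

```java
List<DogBreedDto> rows = new JPAQuery<>(em)
    .select(Projections.constructor(DogBreedDto.class,
        QDog.dog.name, QBreed.breed.name)) // matched to the constructor
    .from(QDog.dog)
    .join(QDog.dog.breed, QBreed.breed)
    .fetch();
```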
Fetching to-many eagerly
In the following example we select all the breeds and we want to print the collection of dogs for each breed:
1 // mapping on Breed
2 @OneToMany(mappedBy = "breed")
3 private Set<Dog> dogs;
4
5 // query
6 List<Breed> breed = new JPAQuery<>(em)
7 .select(QBreed.breed)
8 .from(QBreed.breed)
9 .fetch();
10 breed.forEach(b ->
11 System.out.println(b.toString() + ": " + b.getDogs()));
This executes three queries. First the obvious one and then one for each breed when you want to print the dogs. Actually, EclipseLink does not produce the other queries and merely prints {IndirectSet: not instantiated} instead of the collection content. You can nudge EclipseLink with something like b.getDogs().size() or another meaningful collection operation. It seems toString() isn’t meaningful enough for EclipseLink.
We can force the eager fetch when we adjust the mapping:
1 @OneToMany(mappedBy = "breed", fetch = FetchType.EAGER)
2 private Set<Dog> dogs;
However, while the fetch is eager, EclipseLink still does it with three queries (1+N in general). Hibernate uses one query with a join. So yes, it is eager, but not necessarily efficient. There is a potential problem with paging when using a join to to-many – the problem we mentioned already and will return to later. What happens when we add offset and limit?
1 List<Breed> breeds = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .offset(1)
5 .limit(1)
6 .fetch();
Because EclipseLink uses a separate query for the breeds, nothing changes and we get the second result (by the way, we forgot to order it!). Hibernate is smart enough to use a separate query too, as soon as it smells the offset/limit combo. It makes sense, because the query says “gimme breeds!” – although not everything in JPA is always free of surprises. In any case, an eager collection is not recommended. There may be notable exceptions – maybe some embeddables or other really primitive collections – but I’d never risk fetching a collection of other entities as it eventually asks for fetching the whole database.
What if we write a query that joins breed and dogs ourselves? We assume no pagination again.
1 List<Breed> breeds = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .join(QBreed.breed.dogs, QDog.dog)
5 .distinct()
6 .fetch();
7 breeds.forEach(b ->
8 System.out.println(b.toString() + ": " + b.getDogs()));
If you check the logs the queries are indeed generated with joins, but it is not enough. EclipseLink still prints the uninitialized indirect list (and a relevant operation would trigger the select), Hibernate prints it nicely, but needs the additional N selects as well. We need to say explicitly what collection we want initialized, using fetchJoin like so:
1 List<Breed> breeds = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .join(QBreed.breed.dogs).fetchJoin()
5 .distinct()
6 .fetch();
You use fetchJoin() just after the join with a collection path and in this case you don’t need to use a second parameter with an alias, unless you want to use it in where or select, etc.
Because we used the join with a to-many relation ourselves, JPA assumes we want all the results (5 with our demo data defined previously). That’s why we also added distinct() – this ensures we get only unique results from the select clause. But these unique results will still have their collections initialized because SQL still returns 5 rows and JPA “distincts” them in post-processing. This means we cannot paginate queries with such joins, distinct or not. This is something we will tackle in a dedicated section.
Querydsl and code style
Querydsl is a nice API and DSL but may lead to horrific monsters like:
1 List<Integer> trnIds = new JPAQuery<>(em)
2 .select(QClientAgreementBankAccountTransaction
3 .clientAgreementBankAccountTransaction.transactionId)
4 .from(QClientAgreement.clientAgreement)
5 .join(QClientAgreementBankAccount.clientAgreementBankAccount).on(
6 QClientAgreementBankAccount.clientAgreementBankAccount.clientAgreementId
7 .eq(QClientAgreement.clientAgreement.id))
8 .join(QClientAgreementBankAccountTransaction.clientAgreementBankAccountTransaction)
9 .on(QClientAgreementBankAccountTransaction.
10 clientAgreementBankAccountTransaction.clientAgreementBankAccountId
11 .eq(QClientAgreementBankAccount.clientAgreementBankAccount.id))
12 .where(QClientAgreement.clientAgreement.id.in(clientAgreementsToDelete))
13 .fetch();
The problems here are caused by long entity class names but often there is nothing you can do about it. Querydsl exacerbates it with the duplication of the name in QClassName.className. There are many ways to tackle the problem.
We can introduce local variables, even with very short names, as they are very close to the query itself. In most queries I saw acronym aliases used, like cabat for QClientAgreementBankAccountTransaction. This clears up the query significantly, especially when the same alias is repeated many times:
1 QClientAgreement ca = QClientAgreement.clientAgreement;
2 QClientAgreementBankAccount caba =
3 QClientAgreementBankAccount.clientAgreementBankAccount;
4 QClientAgreementBankAccountTransaction cabat =
5 QClientAgreementBankAccountTransaction.clientAgreementBankAccountTransaction;
6 List<Integer> trnIds = new JPAQuery<>(em)
7 .select(cabat.transactionId)
8 .from(ca)
9 .join(caba).on(caba.clientAgreementId.eq(ca.id))
10 .join(cabat).on(cabat.clientAgreementBankAccountId.eq(caba.id))
11 .where(ca.id.in(clientAgreementsToDelete))
12 .fetch();
You give up some explicitness in the query with those acronyms but you get a much higher signal-to-noise ratio, which can easily outweigh the burden of mental mapping to the local variables.
Another option is to use static imports, which cuts the Q-names in half:
1 List<Integer> trnIds = new JPAQuery<>(em)
2 .select(clientAgreementBankAccountTransaction.transactionId)
3 .from(clientAgreement)
4 .join(clientAgreementBankAccount).on(
5 clientAgreementBankAccount.clientAgreementId.eq(clientAgreement.id))
6 .join(clientAgreementBankAccountTransaction).on(
7 clientAgreementBankAccountTransaction.clientAgreementBankAccountId
8 .eq(clientAgreementBankAccount.id))
9 .where(clientAgreement.id.in(clientAgreementsToDelete))
10 .fetch();
This is clearly somewhere between the first (totally unacceptable) version and the second, where the query was very brief at the cost of using acronyms for aliases. Using static names is great when they don’t collide with anything else. But while QDog itself says clearly it is a metadata type, the name of its static field dog clearly collides with any dog variable of the real Dog type you may have in your code. Still, we use static imports extensively in various infrastructural code where no confusion is possible.
We also often use a field or variable named $ that represents the root of our query. I never ever use that name for anything else and long-term experiments showed that it brought no confusion into our team whatsoever. A snippet of our DAO class may look like this:
1 public class ClientDao {
2 public static final QClient $ = QClient.client;
3
4 public Client findByName(String name) {
5 return queryFrom($) // our shortcut for new/select/from
6 .where($.name.eq(name))
7 .fetchOne($);
8 }
9
10 public List<Client> listByRole(ClientRole clientRole) {
11 return queryFrom($)
12 .where($.roles.any().eq(clientRole))
13 .list($);
14 }
15 ...
As a bonus you can see how any() is used on the collection path $.roles. You can even go on to a specific property – for instance, with our demo data we can obtain the list of breeds where any dog is named Rex like this:
1 List<Breed> breedsWithRex = new JPAQuery<>(em)
2 .select(QBreed.breed)
3 .from(QBreed.breed)
4 .where(QBreed.breed.dogs.any().name.eq("Rex"))
5 .fetch();
This produces SQL with an EXISTS subquery, although the same can be written with a join and distinct, as sketched below.
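The join-based alternative could look like this – same result, different SQL:

```java
List<Breed> breedsWithRex = new JPAQuery<>(em)
    .select(QBreed.breed)
    .from(QBreed.breed)
    .join(QBreed.breed.dogs, QDog.dog)
    .where(QDog.dog.name.eq("Rex"))
    .distinct() // a breed with more matching dogs would repeat otherwise
    .fetch();
```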
More
That’s it for our first tour around Querydsl. We will see more in the chapter on more advanced features (or perhaps just features used less). In the course of this chapter there were plenty of links to other resources too – let’s sum them up:
- Querydsl Reference Guide is the first stop when you want to learn more. It is not the last stop though, as it does not even cover all the stuff from this chapter. It still is a useful piece of documentation.
- Querydsl is also popular enough on StackOverflow with Timo Westkämper himself often answering the questions.
- Finally there is their mailing-list (querydsl@googlegroups.com) – with daily activity where you get what you need.
For an introduction you can also check this slide deck covering the older version 3; except for the fluent API changes the concepts are the same.
8. Troubles with to-one relationships
Now we can get back to the topics I foreshadowed in the introduction to this opinionated part. We will show an example of a @ManyToOne mapping and analyze our options for loading an entity with such a relationship when we are not interested in its target relation.
Simple example of @ManyToOne
We will continue experimenting on dogs in this demonstration of to-one mapping. The code is available in the GitHub sub-project many-to-one-eager. This particular demo project is also referenced in the appendix Project example where the particulars of the build files and persistence.xml are described. We will skip these, but because this is our first serious experiment in the book we will still include more of the code listing to establish a kind of baseline.
Our model is simple, but there is a twist to it. We have Dog and Breed entities where a Dog points to its Breed – and to make things just a little bit tricky the Breed is a hierarchy where each Breed points to its parent via an attribute called derivedFrom.
Our entities are quite plain – Dog looks like this:
1 package modeltoone;
2
3 import javax.persistence.*;
4
5 @Entity
6 public class Dog {
7
8 @Id
9 @GeneratedValue(strategy = GenerationType.IDENTITY)
10 private Integer id;
11
12 private String name;
13
14 @ManyToOne
15 private Breed breed;
The rest are getters/setters – complete code is here.
Referenced Breed is mapped like this (full source code):
1 package modeltoone;
2
3 import javax.persistence.*;
4
5 @Entity
6 public class Breed {
7
8 @Id
9 @GeneratedValue(strategy = GenerationType.IDENTITY)
10 private Integer id;
11
12 private String name;
13
14 @ManyToOne
15 private Breed derivedFrom;
There is nothing special about these entities, but let’s see what happens when we load some dog. In the process we will also compare Hibernate and EclipseLink; that’s why we have two nearly equivalent persistence units in our persistence.xml. We will prepare the data like this:
1 EntityManagerFactory emf = Persistence
2 .createEntityManagerFactory(persistenceUnitName);
3 try {
4 EntityManager em = emf.createEntityManager();
5 em.getTransaction().begin();
6
7 Breed wolf = new Breed();
8 wolf.setName("wolf");
9 em.persist(wolf);
10
11 Breed germanShepherd = new Breed();
12 germanShepherd.setName("german shepherd");
13 germanShepherd.setDerivedFrom(wolf);
14 em.persist(germanShepherd);
15
16 Breed collie = new Breed();
17 collie.setName("collie");
18 collie.setDerivedFrom(germanShepherd);
19 em.persist(collie);
20
21 Dog lassie = new Dog();
22 lassie.setName("Lassie");
23 lassie.setBreed(collie);
24 em.persist(lassie);
25
26 em.getTransaction().commit();
27 em.clear();
28 emf.getCache().evictAll();
29
30 // here comes "demo" code
31 } finally {
32 emf.close();
33 }
Sorry for the copy/pasting; eventually the breed and dog creation would end up in separate methods, but for our demo this code will do. Also the breed tree is probably far from reality, but that’s not the point. Let’s say that the collie was bred from the shepherd which descends from the wolf.
Notice the clearing of the entity manager; alternatively we could close and reopen one. Another thing is that we are clearing the second-level cache as well. We may argue what the real case is, but unless you have all your entities cached all the time, an empty cache simulates a real-life scenario. And we’re just finding the dog by its ID – we know it’s 1:
Dog dog = em.find(Dog.class, 1);
What happens at this line? Let’s see the console for Hibernate (manually wrapped):
Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
dog0_.name as name2_1_0_, breed1_.id as id1_0_1_, breed1_.derivedFrom_id
as derivedF3_0_1_, breed1_.name as name2_0_1_, breed2_.id as id1_0_2_,
breed2_.derivedFrom_id as derivedF3_0_2_, breed2_.name as name2_0_2_
from Dog dog0_
left outer join Breed breed1_ on dog0_.breed_id=breed1_.id
left outer join Breed breed2_ on breed1_.derivedFrom_id=breed2_.id
where dog0_.id=?
Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as
derivedF3_0_0_, breed0_.name as name2_0_0_, breed1_.id as id1_0_1_,
breed1_.derivedFrom_id as derivedF3_0_1_, breed1_.name as name2_0_1_
from Breed breed0_
left outer join Breed breed1_ on breed0_.derivedFrom_id=breed1_.id
where breed0_.id=?
Two selects that load the dog and all three breeds – not that bad. If you now traverse up the breed tree, there will be no more selects. On the other hand, if you care only about the actual breed of this dog and not about what that breed is derived from, we fetched the rest unnecessarily. There is no way to tell JPA that a dummy breed containing just the shepherd's ID would do on the collie entity and that you don't want this to-one fetching to propagate any further.
Let’s look at EclipseLink now:
SELECT ID, NAME, BREED_ID FROM DOG WHERE (ID = ?)
  bind => [1]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [3]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [2]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [1]
This is as obvious as it gets: each entity is loaded with a separate query. The effect is the same as with Hibernate, EclipseLink just needs more round-trips to the database. Personally I don't care how many selects it takes in the end as this is part of the JPA provider's strategy. We are not going to fine-tune Hibernate or EclipseLink – we're going to cut off this loading chain.
How about going lazy?
Let’s be a bit naive (I was!) and mark the relationships as lazy:
@ManyToOne(fetch = FetchType.LAZY)
private Breed breed;
And the same for Breed.derivedFrom. Our testing code is like this:
System.out.println("\nfind");
Dog dog = em.find(Dog.class, 1);
System.out.println("\ntraversing");
Breed breed = dog.getBreed();
while (breed.getDerivedFrom() != null) {
  breed = breed.getDerivedFrom();
}
System.out.println("breed = " + breed.getName());
This way we should see what selects are executed when. Hibernate out of the box works as expected:
find
Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
dog0_.name as name2_1_0_ from Dog dog0_ where dog0_.id=?

traversing
Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
breed = wolf
Selects are executed as needed (lazily). For long-term Hibernate users this often gives the impression that it simply should work like this everywhere. Let's migrate to EclipseLink then:
find
SELECT ID, NAME, BREED_ID FROM DOG WHERE (ID = ?)
  bind => [1]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [3]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [2]
SELECT ID, NAME, DERIVEDFROM_ID FROM BREED WHERE (ID = ?)
  bind => [1]

traversing
breed = wolf
Just like before, EclipseLink generates four selects and executes them right away when you find the entity. How is this possible? Firstly, from the [JPspec] perspective (section 2.2), lazy fetching is merely a hint to the persistence provider. Footnote [4] is important here, obviously. So "it should" be lazy, but it's just "a hint".
These results can be recreated by running SingleEntityReadLazy (in the test classpath) from the GitHub sub-project many-to-one-lazy.
How bulletproof is Hibernate’s solution?
Pretty much bulletproof, I have to say. As presented here, we annotated fields, which means we
implied @Access(AccessType.FIELD). Hibernate can't get away with just intercepting getters/setters; it has
to do better. Let's try a bit of reflection:
System.out.println("\nfind");
Dog dog = em.find(Dog.class, 1);
System.out.println("\nhacking");
Field breedField = dog.getClass().getDeclaredField("breed");
System.out.println(breedField.getType());
breedField.setAccessible(true);
Object val = breedField.get(dog);
System.out.println("val = " + val);
And now the console output mixed with logs:
find
Hibernate: select dog0_.id as id1_1_0_, dog0_.breed_id as breed_id3_1_0_,
dog0_.name as name2_1_0_ from Dog dog0_ where dog0_.id=?

hacking
class modeltoone.Breed
Hibernate: select breed0_.id as id1_0_0_, breed0_.derivedFrom_id as derivedF3_0_0_,
breed0_.name as name2_0_0_ from Breed breed0_ where breed0_.id=?
val = Breed{id=3, name='collie'}
This is perfectly lazy. The solution seems almost like magic when you think about it.
Lazy to-one not guaranteed
Doing a lazy collection (@OneToMany or @ManyToMany) is not a problem – we would probably
mess around a bit, but eventually we would create some collection implementation
that does it right. The trick is that the actual entities are wrapped in a collection that works
as an intermediary; we don't need to perform any bytecode voodoo on the entity class.
With to-one the situation changes dramatically as you have no intermediary. It would be interesting if JPA at least offered some wrapper to enforce lazy behaviour when you really need it, regardless of the JPA provider – but that would give away that you're using some ORM (as if you didn't know) and even more voices would talk about that "leaking abstraction".
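Just to illustrate, here is a minimal sketch of what such a wrapper might look like – LazyRef is purely my invention, not any JPA API, and it also shows why nobody would love it: the attribute type changes and an EntityManager must be passed to the getter:

import javax.persistence.EntityManager;

// a hand-rolled lazy reference that needs no bytecode magic
public class LazyRef<T> {

  private final Class<T> entityClass;
  private final Object id;
  private T value; // loaded on first access

  public LazyRef(Class<T> entityClass, Object id) {
    this.entityClass = entityClass;
    this.id = id;
  }

  // loads the target entity on the first call, returns the cached value afterwards
  public T get(EntityManager em) {
    if (value == null) {
      value = em.find(entityClass, id);
    }
    return value;
  }
}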
While my tests with Hibernate looked great, the results are not portable across JPA providers, as we saw. Also, I would not bet that you'll get the same results, because there are still a lot of questions about how to make Hibernate lazy for to-one relationships, many of them quite recent.
So the whole trouble with to-one is that you either rely on out-of-language features (bytecode
manipulation) for LAZY support or suffer a potentially very costly cascade of relation loading.
The cost may be alleviated somewhat by a cache, but you just trade CPU/network cost for memory
(and CPU for GC, mind you) – even if you don't need most of the loaded data at all (maybe ever).
Also, LAZY can work perfectly for a WAR application in a Java EE container but may suddenly stop
working when switching to an embedded container. I'm not saying that LAZY is not worth the trouble
– on the contrary, I believe that lazy relationships are essential both for to-many (as the default)
and for to-one (in most circumstances) – but it indeed can be a lot of trouble during build time or
runtime, especially in a Java SE environment. Java EE with its default provider may be nice to us,
but it still isn't guaranteed.
Dual relationship mapping
When I say dual mapping I mean mapping the relationship both as an object and as a raw foreign key value. It looks like this:
@Entity
public class Dog {
  ...
  @ManyToOne
  @JoinColumn(name = "breed_id", updatable = false, insertable = false)
  private Breed breed;

  @Column(name = "breed_id")
  private Integer breedId;
I can’t find example of this anywhere in the [JPspec], neither can I find anything
that would forbid it (although I’m not really sure) – but it works in both major JPA providers.
The key is marking one of these attributes as readonly, that’s what updatable/insertable=false
does. The other mapping is primary – the one used for updates. I prefer when the raw foreign key
is primary because now I don’t need any Breed object for update when I have its ID already (it’s
easy to extract ID from the object if I have it too). Getters are returning values for respective
fields, setters are designed in such a way that the persisted field (here foreign key value) is
always updated:
public Breed getBreed() {
  return breed;
}

public void setBreed(Breed breed) {
  this.breed = breed;
  // keep the persisted FK value in sync (null-safe)
  breedId = breed != null ? breed.getId() : null;
}

public Integer getBreedId() {
  return breedId;
}

public void setBreedId(Integer breedId) {
  this.breedId = breedId;
}
In this case it means merely setting the breedId field when breed is updated. We don't care
about breed in the other setter. The programmer has to know that the foreign key value is the primary
breed attribute here and act accordingly. For instance, this setup is not transparent when
persisting new entities. If we persist a fresh breed there is no guarantee its ID is already
available, as autoincrement fields are filled in only after the actual INSERT is flushed to the
database. With EclipseLink we have to force that flush if we need the ID – otherwise calling
setBreed(newBreed) would set the dog's FK to null. Hibernate is rather eager with INSERTs, which
is convenient for this scenario.
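A minimal sketch of that EclipseLink workaround (the concrete breed is arbitrary, the dog is the dual-mapped one from above):

Breed newBreed = new Breed();
newBreed.setName("chihuahua");
em.persist(newBreed);
em.flush(); // force the INSERT so the IDENTITY-generated ID is available
dog.setBreed(newBreed); // breedId now gets a real value instead of null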
The reverse case – with breed as the persisted field and breedId as the read-only one – is less
useful and I cannot find any reason to use it instead of the traditional JPA relationship mapping.
You can get the foreign key value anytime by calling breed.getId().
In any case, there are no “savings” during the load phase, but we can store new entities with
raw foreign key values if we have those (and for whatever reason don’t have the entities loaded).
Previously we mentioned the possibility of using entities as wrappers for the foreign keys, but we
also showed that it doesn’t play well with cascade = PERSIST.
If we go back to dreaming – imagine we could request em.find for a dog by ID with a hint not
to load read-only (or otherwise designated) mappings. This would save us the hassle during the
loading phase as well and make the whole read-update scenario (part of typical CRUD) based on
foreign key values, with the opportunity to load the relationships if we wished so. Any of this
could be specified on the entity level based on typical patterns of usage. Imagine all the people
/ fetching like they want. Oh-oh… (Lennon/Richter)
Dual mapping like this has one big advantage compared to the pure foreign key mapping – with
the breed mapping we can construct JPA compliant JPQL queries without waiting for the next
version of JPA – if they bring the option of joining root entities at all, that is. But currently
dual mapping also means that we can only avoid the fetch cascade if we:
- Rely on the LAZY hint – not guaranteed by JPA, but if your provider honours it, you can consider it solved.
- Or query the entity with all the columns named explicitly (e.g. include breedId, but skip breed) – this, however, cripples the caching (if you use it), you don't get an attached entity and (above all) it is annoying to code. We will talk more about this in the following sections.
So, dual mapping is an interesting tool but does not allow us to avoid all the eager problems.
Using projections for fetching
While we cannot control em.find(...) much, we can try to select only the columns we are
interested in using a query. Handling the result by "projecting" selected columns into an object
is called a projection. Querydsl can project results using a constructor or bean accessors,
both applicable to entity objects or DTOs.6
Let’s try it:
Tuple tuple = new JPAQuery<>(em)
  .select(QDog.dog.id, QDog.dog.name, QDog.dog.breed.id)
  .from(QDog.dog)
  .where(QDog.dog.id.eq(1))
  .fetchOne();
Our demo code
returns a Querydsl Tuple, but it could also return a DTO using a constructor expression. We're
selecting a single entity, but it works the same way for lists as well. I included an expression
returning the value of the column breed_id – we have to use an association path for it, and that
raises questions. Let's see what SQL is generated for this query.
SELECT t0.ID, t0.NAME, t1.ID FROM DOG t0, BREED t1
WHERE ((t0.ID = ?) AND (t1.ID = t0.BREED_ID))
EclipseLink does exactly what the JPQL says – but is this the best way? Hibernate answers:
select dog0_.id as col_0_0_, dog0_.name as col_1_0_, dog0_.breed_id as col_2_0_
from Dog dog0_ where dog0_.id=?
Now this is cool – and exactly what I expect from a smart ORM. Of course the breed_id is
directly on the Dog table; there is absolutely no need to join Breed just for its ID. I'd
actually use the dual mapping described previously and select QDog.dog.breedId directly, which
avoids the join with any provider.
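For completeness, a sketch of the same query projecting into a DTO via a constructor expression – DogDto and its (Integer, String, Integer) constructor are my assumptions; Projections comes from com.querydsl.core.types:

DogDto dto = new JPAQuery<>(em)
  .select(Projections.constructor(DogDto.class,
    QDog.dog.id, QDog.dog.name, QDog.dog.breedId))
  .from(QDog.dog)
  .where(QDog.dog.id.eq(1))
  .fetchOne();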
This seems usable and it definitely does not trigger any cascading eager fetch – unless we include
an entity expression in the select. Using QDog.dog.breed would ruin the party, as it loads the
whole content of the Breed entity, and if that contains any to-one relationship the fetch cascade
may get loose again. The downside of this approach is obvious: we have to name all the attributes
(columns) we need and it may get tedious if we have wide tables.
I really did not enjoy doing it for 60+ columns – that's when you start thinking about code generation or some other kind of framework. If you don't have to, you don't want to complicate your code with such ideas. So if you are sure you can rely on your ORM to load lazily even for to-one, do that. If you can't, you have to decide when to use the ORM as-is, probably relying on the second-level cache to offset the performance hit caused by the triggered loads, and when to explicitly cherry-pick the columns you want to work with.
Speaking of caches, we also have to realize that listing columns will not use the entity cache, though
it may use a query cache. We're getting closer to traditional SQL, just in JPQL shape. If the persistence
context contains unflushed changes and the flush mode is set to COMMIT, the query will not see those
changes either. Typically though, flush mode AUTO is used, which flushes pending changes before
executing queries.
Entity update scenario
Most applications need to update entities as well – and projections are not very useful in this case. There is a simple and a complex way to update an entity:
- In the simple case we take whatever came to the service layer (or its equivalent) and simply push it to the database. We can use merge or JPQL (Querydsl) update clauses for this.
- In many cases complex business rules require us to consult the current state of the entity first and apply the changes to it in a more subtle way than JPA merge does. In this case we have to read the entity, change what needs to be changed and let JPA update it. Alternatively, we may read just a selection of the columns (a projection) and execute an update clause.
If the business rules don't require it and performance is not an issue, I'd recommend using merge
as it's very easy to understand and does everything for us. It executes the SELECT and UPDATE and
we don't have to care about the details. For a single entity update in some typical CRUD-based
information system this may be the perfect solution that doesn't involve convoluted code.
If we need to read the entity first, I'd go for em.find(id), applying the changes from an
incoming DTO to it while complying with all the necessary business rules. This also involves one SELECT
and one UPDATE, except that the select may trigger additional finds across to-one
relationships even though we don't need them in this update scenario. The whole select stage can be
much faster if we use the second-level cache, as it may avoid the round-trip to the database
completely.
To avoid this we can read the information as a projection (or a Tuple) and later execute a
surgical UPDATE clause. This is nicer to our database, although we simply cannot get under
two round-trips (read and update) unless we use some conversation scope in memory. It also avoids
interaction with the persistence context, hence takes less memory, avoids dirty checking at the
end of the transaction, etc. The resulting code, naturally, is more complex, reflecting the
complexity of the scenario.
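A sketch of this read-projection-then-update flow, assuming the dual-mapped breedId from before (JPAUpdateClause comes from com.querydsl.jpa.impl):

// read only what the business rules need
Tuple dogInfo = new JPAQuery<>(em)
  .select(QDog.dog.name, QDog.dog.breedId)
  .from(QDog.dog)
  .where(QDog.dog.id.eq(1))
  .fetchOne();

// ... apply the business rules to the read values ...

// surgical UPDATE touching only the named columns
new JPAUpdateClause(em, QDog.dog)
  .set(QDog.dog.name, "Lassie Jr.")
  .where(QDog.dog.id.eq(1))
  .execute();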
If you don’t mind using the persistence context and you’re fine with using em.find(...) and later
implicit update at the end of transactions you may use entity view described in the next section
to eliminate traversing to-one relationships you’re not interested in.
Alternative entity “views”
If we want to solve the painful read/update scenario (assuming we know the ID of the entity) and we need to work with only some attributes of the entity, we can try to map only those. We will leave the full mapping in some "master" class and create another entity that gives us a partial view of the selected attributes. I'll use the name entity view for such a class.
The full Dog mapping looks like this:
@Entity
public class Dog {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  private String name;

  @ManyToOne
  private Breed breed;
  ...
We will create an alternative view of this entity named DogBasicView (naming sure is hard):
@Entity
@Table(name = "Dog")
public class DogBasicView {
  @Id
  @Column(updatable = false)
  private Integer id;

  private String name;
The difference is we didn't map the breed because, presumably, we don't want to touch it in some "Edit dog's basic attributes" form. In real life it doesn't have to be just a name; the view can contain plenty of attributes and omit just the ones you don't need.
Now if we read a DogBasicView, change it and wait for the end of the transaction, only the columns
in the entity view will be modified. We have a guarantee that the UPDATE will not mention any
unmapped columns. That means you can use various entity views for various update scenarios.
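A minimal usage sketch (assuming the usual getters/setters on DogBasicView):

em.getTransaction().begin();
DogBasicView dog = em.find(DogBasicView.class, 1);
dog.setName("Lassie Sr.");
// no explicit update call needed – dirty checking kicks in on commit
// and the UPDATE names only the columns mapped in DogBasicView
em.getTransaction().commit();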
Such an entity view should result in much less code as you don’t have to name any columns in a
projection (and the related constructor), you simply map them in a particular entity view.
Using an entity view for pure reading is sub-optimal compared to projections, as mentioned previously (memory, dirty-checking, …), but for straightforward update scenarios where the original state of the entity is still important, e.g. for validation and access permission purposes, the impact on the persistence context may be within your tolerance. Using a projection and an explicit update instead can be overkill, making the code unnecessarily complicated.
There are a couple of points you have to keep in mind though:
- Never combine various "views" of the same entity in the same persistence context. They all live their own life and you cannot expect that dogBasic.setName(...) will affect the state of a different entity instance of the different type Dog, even though they both represent the same dog in the database.
- Our DogBasicView will have its own space not only in the "first-level cache" (persistence context) but also in the second-level cache. I'd be very careful about caching an entity with a couple of different classes representing the same table. (Read: I'd not cache it at all, but that's my personal taste. You definitely want to check how the caching works in such situations.)
Using entity views in JOINs
Because of the missing foreign keys these entity views are unusable when we want to show a table
of results from a joined query (e.g. dogs and their breeds). While we could use a subquery in an
IN clause for filtering, we still need a JOIN to actually get the data. For displaying tables,
DTOs and things like constructor projections seem to be the right answer, however cumbersome.
But if we fetch some data only for reading (e.g. to view them) it is recommended to bypass the
persistence context anyway, especially for performance reasons. And we’re not talking about some
marginal optimization, we’re talking about common sense and a significant waste of
resources.7
If we have raw foreign key values mapped in the entity view we can use a non-standard approach supported by all major JPA providers – we will discuss this in the next chapter. However, for pure reads you really should prefer projections.
Entity views and auto-generated schema
In most cases we don’t have to think about schema generation but what if we do? What if we have
lines like this in our persistence.xml?
<property name="javax.persistence.schema-generation.database.action" value="create"/>
We utilized the property standardized in JPA 2.1 and we expect the JPA provider to generate the
schema in the database for us. This is particularly handy when we run various tests or short demos.
Now what happens when we declare two entities for the same table in persistence.xml?
<persistence-unit ...>
  ...
  <class>modeltoone.Dog</class>
  <class>modeltoone.DogBasicView</class>
  ...
When we run any demo using this persistence unit we can check the log and see the CREATE
statements it uses. With EclipseLink there will be a single one for our Dog:
CREATE TABLE DOG (ID INTEGER IDENTITY NOT NULL, NAME VARCHAR,
BREED_ID INTEGER, PRIMARY KEY (ID))
And it contains all the necessary columns. But if we switch the order in persistence.xml
EclipseLink generates the table based on DogBasicView and it will miss the breed_id column.
Hibernate scans for the classes even in an SE environment (but only within the JAR file containing
persistence.xml), so if you want to try the effect of the order of <class> elements in
persistence.xml you also have to include this:
<exclude-unlisted-classes>true</exclude-unlisted-classes>
But the order does not affect Hibernate; it somehow picks the "widest" class for the table and
creates the table based on it. It works this way without putting the classes in the XML as well. What
happens if the main Dog entity does not have all the columns mapped and one of the entity views
has an additional column? Let's say DogWithColor has a color column that is not present in the Dog
class.
This is not a good situation – neither EclipseLink nor Hibernate can deal with it. EclipseLink
generates the table based on whatever entity is first in the list and Hibernate picks some
candidate. I'm not exactly sure about Hibernate's algorithm, as it's not alphabetical order
either; somehow it seems to find the best candidate in my limited test cases. But it does not union
all the columns from all the entities for the same table, and that means it has to choose between
the Dog and the DogWithColor – so some columns will not be generated.
Summary
With the current state of the standard the following options are available:
- Leave to-one as EAGER and compensate with caches.
- Configure your ORM to make it LAZY. This may be easy, it may even be provided out of the box, but it can also change with an upgrade or depend on the environment (Java SE/EE). It is simply not guaranteed by the JPA standard.
- Select only the columns you're interested in. This is very cumbersome for read/update scenarios of a single entity and bearable but still inconvenient for selecting lists for tables, etc.
- Use entity views, that is, map the same table with multiple classes based on the use case you want to cover. This is very practical for read/update scenarios when you can fetch by the entity's ID.
In any of these cases you can also use dual mapping – mapping both the foreign key value and the relationship as an object. When mapping more attributes to a single column we have to designate all but one of the mappings as read-only. If we could just explicitly ask JPA to fetch the object – or ask it not to – we would have much better control over the fetch cascade going wild.
Dual mapping foreshadows the next chapter in which I’m going to present an alternative that is not JPA compliant, but tries to do without to-one relations altogether. It can be used by all major JPA 2.1 providers but if you insist on pure JPA you are free to just skip those pages.
9. Removing to-one altogether
To get full control over the fetching – and to do it without any high-tech solution – we have to
drop the relationship mapping and map the raw foreign key instead. Before we go on, we will discuss the
important addition of the ON keyword in JPA 2.1.
Why do we need ON anyway?
The ON keyword was dearly missing for other practical reasons as well. While this does not relate to to-one
mapping, let's see an example:
Here the Breed does not have a name, only some synthetic code, and names are offered in multiple
languages thanks to BreedLocalizedName. Demo code is
here.
Our favourite breeds are:
- wolf, with code WLF and localized names for English and Slovak (wolf and vlk respectively – and yes, I know a wolf is not a dog),
- collie, with code COL and a localized name available only for Slovak (kólia).
Now imagine we want to get a list of all breeds with their English names. If the localized name is not available for the requested language, we still want to see the breed in the list, with its code, in any case. This may seem ridiculous to SQL guys, but it's a real problem for some Java programmers. The first naive attempt may be:
JOIN to localized names

QBreedLocalizedName qbn = QBreedLocalizedName.breedLocalizedName;
List<Tuple> breedEnNames = new JPAQuery<>(em)
  .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
  .from(QBreed.breed)
  .join(QBreed.breed.names, qbn)
  .where(qbn.language.eq("en"))
  .fetch();
But this produces unsatisfactory results containing only [1, WLF, wolf]. "When I join the
localized names to the breed list some results are gone!" our programmer complains. Ok, let's
introduce the LEFT JOIN concept to them. Next attempt:
LEFT JOIN without ON clause

breedEnNames = new JPAQuery<>(em)
  .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
  .from(QBreed.breed)
  .leftJoin(QBreed.breed.names, qbn)
  .where(qbn.language.eq("en"))
  .fetch();
It seems promising at first, but the result is again only a single lone wolf. There is one
difference though, although the WHERE clause wipes it out: if there was any breed without any
localized name at all, it would be preserved after the JOIN. WHERE eliminates it only because
the value of the language column would be NULL and not "en". We could try to overcome this with an
additional .or(qbn.language.isNull()) in the WHERE part, but that would work only for breeds
with no localized name at all – not for those merely missing the "en" localization.
The trick is that the condition must be part of the LEFT JOIN – and that’s what ON is all
about. Let’s do it right after all:
LEFT JOIN with ON clause

breedEnNames = new JPAQuery<>(em)
  .select(QBreed.breed.id, QBreed.breed.code, qbn.name)
  .from(QBreed.breed)
  .leftJoin(QBreed.breed.names, qbn).on(qbn.language.eq("en"))
  .fetch();
And, voila, the result is [1, WLF, wolf], [2, COL, null] – with collie as well, although
without the name localized to English. Now we could add another condition to filter these cases
and report them – .where(qbn.name.isNull()) would do the trick, this time as a WHERE clause.
But this is the essence of ON. It's a special WHERE that works very closely with its related
JOIN, and while the difference is virtually none for inner joins, it is essential for outer
ones.
Dropping to-one annotations
Having accepted that we’re out of pure JPA land it is otherwise easy to get rid of to-one relationships. Instead of:
@ManyToOne
private Breed breed;
We can simply map the raw value of the foreign key:
@Column(...)
private Integer breedId;
If you want to load the Breed by its ID, you use EntityManager.find explicitly. This has domain
consequences, because unless we give the entities some way to get to the entity manager, such code
cannot live directly on the entity. We will try to mend this and bridge the gap that JPA automagic
normally covers for our convenience – which is a good thing, and the better part of the abstraction.
The point is there is no technological problem with obtaining the referenced entity by its ID.
Queries are where the problem lies. Firstly – and this cannot be overcome with current JPA – we need
to use a root entity in the JOIN clause; both EclipseLink and Hibernate now allow this.
This enriched experience cuts both ways, of course, as we may believe we're still in the JPA world
when in fact we're not (my story). The second thing we need is the ON clause, which is something JPA 2.1
provides. As discussed in the previous section, ON conditions were generated for joins
before – but only as implied by the mapping and the particular association path used in the join.
We need to make this implicit ON condition explicit. Let's see an example:
List<Dog> dogs = new JPAQuery<>(em)
  .select(QDog.dog)
  .from(QDog.dog)
  .leftJoin(QBreed.breed).on(QBreed.breed.id.eq(QDog.dog.breedId))
  .where(QBreed.breed.name.contains("ll"))
  .fetch();
The key part is how the leftJoin is made. Querydsl doesn't mind at all that we're breaking JPA
rules here; it allows a root entity and produces JPQL that works the same as the following code:
List<Dog> dogs = em.createQuery(
  "select dog from Dog dog left join Breed breed on breed.id = dog.breedId" +
  " where breed.name like '%ll%'", Dog.class).getResultList();
So this leaves us covered. Queries are just a bit more explicit, but otherwise nothing you would
not write into your SQL. We can argue that this way we can compare that breedId with any other
integer and we're losing the safety that ORM mapping gives us. In the mapping you specify which
columns are related in a single place, not in each query. This is true, indeed, although not
really a big deal. When talking to your DB admins they will probably see even better what you do,
although smart DBAs have no problem understanding JPQL either. But by dropping the mapping we're
losing this comfort too. In theory we could leave some logical mapping information on the
attribute, but without redesigning JPA we cannot benefit from it in our queries anyway. We will try to
tackle the problem of missing metadata on the plain FK-typed attribute later in this chapter.
Ad hoc joins across tables
It’s not that rare to see tables with a shared ID value. In the following picture Person has
an ID column (its primary key) and other two tables share the same value as both their PK and FK
into the Person table:
Now imagine a situation where we want some information from the NaturalPerson table and its residence,
but we don't need anything from the Person table itself. With JPA/ORM you need to join all three
tables, or you need to map direct relationships across the tables as well – which requires dual
mapping to two different object types and is really superfluous. With ad hoc joins there is
no problem at all: we can simply join the tables on the side, using an ON condition for equality of
their id attributes.
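A sketch of such an ad hoc join with Querydsl – QNaturalPerson and QResidence are hypothetical query types standing in for two tables sharing the Person ID, and the selected attributes are made up too:

QNaturalPerson np = QNaturalPerson.naturalPerson;
QResidence residence = QResidence.residence;
List<Tuple> rows = new JPAQuery<>(em)
  .select(np.firstName, np.lastName, residence.city)
  .from(np)
  // ad hoc join on the shared ID value, no mapped relation needed
  .leftJoin(residence).on(residence.id.eq(np.id))
  .fetch();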
Even beyond this shared-ID example there are many cases where an ad hoc join is practical. Sometimes you need to build a report with a join on a column that is not a PK/FK pair – although there may be better ways than ORM to produce a report.
Loading the relationship from the entity
Queries are typically executed in a context outside the object itself – in a repository or DAO,
you name it. There is no problem getting hold of the EntityManager in these places. But if you want
some loading from your entity, things necessarily get a little messy. Before we get to the
messy part let's step back a little, because we have two broad options for how to do it:
- We can have a real object attribute marked @Transient and load it when the scenario requires it. This is actually pretty handy because you can load the entity with its relations in an optimized query – especially if you don't expect the data to be in the cache, or don't want to use the cache at all, etc. Very explicit, but it does exactly what it says. This does not require us to put any infrastructure code into the entity, as we fill the attribute from the outside, in the code that loads the whole object graph.
- We can load it lazily when and if needed. But we have to make explicit what JPA does under the surface and what we don't see on the entity – or we can develop our own solution, but I'm not sure it's worth it for this problem.
The first solution is obvious, so let's see what we can do about the other one. Let's start by
adding a method returning our entity – in our case we have a Dog with a breedId foreign key
mapped as a raw value of type Integer. To load the actual entity we may try something like:
public Breed getBreed() {
  return DaoUtils.load(Breed.class, breedId);
}
Ok, we cheated a bit, I admit – we moved the problem to that DaoUtils, but for a good reason.
We will need this mechanism in multiple places so we want it centralized. That load method then
looks like this:
public static <T> T load(Class<T> entityClass, Serializable entityId) {
  EntityManager em = ...; // get it somehow (depends on the project)
  return em.find(entityClass, entityId);
}
Ouch – two examples, cheating on both occasions. But here I cannot tell you how to obtain that entity manager. The method can reach out to a Spring singleton component that makes itself accessible via a static field, for instance. It isn't nice, but it works. The point is that an entity is not a managed component, so we cannot use injection as usual. Static access seems ugly, but it also frees the entity from a reference to some infrastructural component and leaves it with a dependency on a mere utility class.
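For illustration, a minimal sketch of such a static backdoor with Spring – one possible arrangement, certainly not the only one:

import java.io.Serializable;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Component;

// a Spring singleton that exposes the container-managed (shared, thread-safe)
// EntityManager proxy through a static field for non-managed code like entities
@Component
public class DaoUtils {

  private static EntityManager staticEm;

  @PersistenceContext
  public void setEntityManager(EntityManager em) {
    DaoUtils.staticEm = em;
  }

  public static <T> T load(Class<T> entityClass, Serializable entityId) {
    return staticEm.find(entityClass, entityId);
  }
}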
It is important to be familiar with the implementation of an injected EntityManager – in the case of
Spring, its shared entity manager proxy helps us get to the right entity manager even if we reach
the component via a static path. I would probably have to consult the Java EE component model to be sure
whether it is possible there and whether it is reliable across application servers.
Another option for smuggling in the entity manager utilizes a ThreadLocal variable, but this means
someone has to populate it, which means life-cycle – and that means additional trouble.
In any case, it is possible to add a getter for a relationship on the entity, but it calls for some rough solutions – so if you can keep such access somewhere else (a data access object, for instance), do it that way. Otherwise we're getting dangerously close to the Active record pattern, which by itself is legitimate but doesn't fit JPA. True – the JPA provider can cheat somehow under the cover, but we rather should not.
If you decide to load the relationship in some arbitrary code and you have access only to the entity, perhaps you can introduce a DAO to that code. If you feel you're making this decision too high in the layer model (on the presentation layer maybe?), perhaps it should be pushed lower, where the DAO is available. I'm not totally against a static backdoor from non-managed code to a managed component, but at least we should be aware of the cost and compare it with the cost of not doing it.
Losing information without mapping annotations
When we map a foreign key instead of the relationship we often lose an important piece of information
– namely what type it leads to. Sometimes it is clear, e.g. with Integer breedId you can easily
guess it leads to Breed. But there is no type declaration anywhere around to click on when using an
IDE. Even worse, the foreign key often describes the role instead of the type – which I personally
prefer, especially when there are multiple relationships of the same type. So instead of terrible
client1Id and client2Id you have ownerId and counterpartyId. The names are descriptive, but
the type information is gone altogether.
For this – and also as a means to automatically check the validity of these IDs for newly stored or
updated entities – we developed a custom @References annotation. With it the mapping for a foreign
key may look like this:
@References annotation

@References(type = Breed.class, required = true)
private Integer breedId;
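The annotation declaration itself can be as simple as this sketch (derived from the usage above; the real one may carry more attributes):

import java.lang.annotation.*;

@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface References {
  Class<?> type();
  boolean required() default false;
}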
I believe the documentation factor is the most important one here, but we can also check the references on an entity like this:
ReferenceChecker

new ReferenceChecker(em).checkReferences(lassie);
This either does nothing when everything is OK, or it fails with some exception of your choice depending on how you implement it. Our version is similar to this example on GitHub. If you go through the code you'll notice that it can do more than just checking – it can also push the real entities into some transient fields, for instance. Such a mechanism can easily be incorporated into the validation framework you use, if you wish.
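A minimal sketch of what such a checker can do, under the assumptions above (declared fields only, exceptions of my choosing – the linked example does more):

import java.lang.reflect.Field;
import javax.persistence.EntityManager;

public class ReferenceChecker {

  private final EntityManager em;

  public ReferenceChecker(EntityManager em) {
    this.em = em;
  }

  public void checkReferences(Object entity) {
    for (Field field : entity.getClass().getDeclaredFields()) {
      References references = field.getAnnotation(References.class);
      if (references == null) {
        continue; // not a declared foreign key attribute
      }
      field.setAccessible(true);
      Object fkValue;
      try {
        fkValue = field.get(entity);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
      if (fkValue == null) {
        if (references.required()) {
          throw new IllegalStateException(field.getName() + " is required");
        }
        continue;
      }
      // the FK value must point to an existing row of the declared type
      if (em.find(references.type(), fkValue) == null) {
        throw new IllegalStateException(field.getName() + "=" + fkValue
          + " does not reference an existing " + references.type().getSimpleName());
      }
    }
  }
}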
Wrapping it up
I presented a solution to our to-one problems – using raw foreign key values. This bends the
JPA rules a bit and builds on the availability of the ON clause introduced in JPA 2.1. However,
the JPA specification currently does not allow us to use root entity paths in JOINs – which is
required for such queries; only association paths are allowed now. This feature may come in later
versions and it is already supported by all the major players in the field – Hibernate,
EclipseLink and DataNucleus.
Mapping a raw foreign key value may lead to some logical information loss, which can easily be mended by a custom annotation. I'd like to see a standard JPA annotation for this, otherwise schema generation cannot be fully functional.
Another solution would be to use a single mapping like today and have a hint for JPA to fetch only the ID
portion of the related object. This may not work well with cascade=PERSIST when persisting new
entities pointing to existing ones, although the specification allows it to work (but also allows
providers to throw an exception, thanks to the magical usage of "may" in the specification text). This
solution preserves the current query mechanics and is actually the least intrusive of them all.
Finally, the JPA specification could mandate LAZY – this is something where Hibernate seems
quite reliable out of the box without any configuration hassle. On the other hand, I have seen many
questions asking how to make to-one relationships lazy in Hibernate, so it probably is not universally
true. Promoting LAZY from a hint to a guaranteed thing would require JPA providers to use some
bytecode magic (which is not clean from the Java language perspective), but it would also end a lot of
debates about how to do it right.
10. Modularity, if you must
I’m convinced that people who designed and specified JPA were not aware (or were ignorant) of modern modularity trends.10 For a long time I’ve been trying to put things into domain named packages (sometimes low-level technical domains, as it happens with complex layered software) and as a next step I naturally started to split big JARs into smaller ones rather across vertical (domain) lines instead of putting everything from one layer in a single JAR.
If you have all your entities in a single JAR you have no problem at all. But if you fancy
splitting the entities into multiple JARs – including their definitions in
persistence.xml – then you're venturing beyond the JPA specification. [JPspec],
section 8.2 Persistence Unit Packaging, defines a persistence unit by a single persistence.xml file,
with the JAR or directory whose META-INF contains it being "the root of the persistence unit".
This does not explicitly forbid using multiple persistence.xml files, but I believe it implies we
should not do it. It's actually the other way around – we can have multiple persistence units
in one XML, that's a piece of cake.
Everybody knows layers
I decided to experiment with modularization of applications after I had read Java Application Architecture: Modularity Patterns with Examples Using OSGi [JAA].11 After that I couldn’t sleep that well with typical mega-JARs containing all the classes at a particular architectural level. Yeah, we modularize but only by levels.
I’ll use one of the projects from my past as an example. We had one JAR with all entity classes,
sometimes it was called domain, sometimes jpa-entities (admitting the anemic domain
model).
Then we had our @Service classes in a different JAR – but all of them in one – accessing
entities via DAOs. The service layer was used by the presentation layer. Complex logic was pushed to
specific @Components somewhere under the service layer. It was not domain-driven but it made
sense for us and the dependencies flowed clearly from the top down.
Components based on features
But how about dependencies between parts of the system at the same level? When we have a lot of
entity classes, some are pure business (Clients, their Transactions, financial Instruments) and some
are rather infrastructural (meta model, localization, audit trail). Why can't these be separated?
Most of the stuff depends on the meta model; localization is quite independent in our case; audit trail
needs the meta model and the permission module. It all clicks when one thinks about
it – and it is more or less in line with modularity based on features, not on technology or
layers. Sure, we can still use layer separation and have permission-persistence and
permission-service as well (later we will see an example where it makes a lot of sense).
Actually, this is quite a recurring question: should we base packages on features or on layer/technology/pattern? It seems that a consensus was reached – start by feature – which can be a part of your business domain. If stuff gets big you can split it into layers too, unless you find even more suitable sub-domains.
Multiple JARs with JPA entities
So we carefully try to put different entity classes into different JAR modules. Sure, we could just repackage them in the same JAR and check how tangled they are with Sonar, but it is recommended to enforce the separation and make dependencies explicit (not only in the [JAA] book). My experience confirms this quite clearly – contracts of any kind must be enforced.
And here comes the problem when your persistence is based on JPA. Because JPA clearly wasn’t
designed to have a single persistence unit across multiple persistence.xml files. So what are
these problems actually?
- How to distribute persistence.xml across these JARs? Do we even have to?
- What classes need to be mentioned where? E.g., we need to mention classes from upstream JARs in persistence.xml if they are used in relations (which breaks the DRY principle).
- When we have multiple persistence.xml files, how do we merge them in our persistence unit configuration?
- What about the configuration in persistence.xml? What properties are used from which file? (Little spoiler: we just don't know reliably!) Where to put them so we don't have to repeat ourselves again?
- How to use EclipseLink static weaving for all these modules? How about a module with only an abstract mapped superclass (some dao-common module)?
That’s quite a lot of problems for an age when modularity is so often mentioned, too many for a technology that comes from an “enterprise” stack. And they are mostly phrased as questions – because the answers are not readily available.
Distributing persistence.xml – do we need it?
This one is difficult and may depend on the provider you use and the way you use it. We use
EclipseLink and its static weaving. This requires persistence.xml. Sure, we may try to keep it
together in some neutral module (or any path, as it can be configured for the weaving plugin), but
that goes against the modularity quest. What options do we have?
- We can create some union persistence.xml in a module that depends on all the needed JARs. This would be OK if we had just one such module – typically some downstream module like a WAR or a runnable JAR. But we have many. If we made a persistence.xml for each, they would contain a lot of repetition. And we'd reference a downstream resource, which is unacceptable.
- We can have a dummy upstream module, or a path outside all modules, with the union persistence.xml. This would keep things simple, but it would make it more difficult to develop the modules independently, maybe even with different teams.
- Keep a persistence.xml in each JAR covering its related classes. This seems best from the modularity point of view, but it means we need to merge multiple persistence.xml files when the persistence unit starts.
- Or can we have a different persistence unit for each persistence.xml? This is OK if they truly are different databases (different JDBC URLs), otherwise it doesn't make sense. In our case we have a rich DB and each module sees the part it is interested in – that is, entities from the JARs it has on the classpath. If you have data in different databases already, you're probably sporting microservices anyway.
We went for the third option – especially because EclipseLink's weaving plugin likes it and we
didn't want to redirect to a non-standard path to persistence.xml – but it also seems to be the
logically right way. However, there is nothing like a dependency between persistence.xml files. So if
we have b.jar that uses a.jar, and there is an entity class B in b.jar that contains a
@ManyToOne to the A entity from a.jar, we have to mention the A class in the persistence.xml in
b.jar. Yes, the class is already mentioned in a.jar, of course. Again, clearly, the engineers of
JPA didn't even think about the possibility of using multiple JARs in a really modular way.
In any case – this works, compiles, weaves our classes during build – and more or less answers questions 1 and 2 from our problem list.
It doesn’t start anyway
When we have a single persistence.xml it will get found as a unique resource, typically in
META-INF/persistence.xml – in any JAR actually (the one JPA refers to as "the root of the
persistence unit"). But when we have more of them, they don't all get picked up and merged magically
– and the application fails during startup. We need to merge all those persistence.xml files
during the initialization of our persistence unit. Now we're tackling questions 3 and 4 at once,
for they are linked.
To merge all the configuration XMLs into one unit we can use this configuration for the
PersistenceUnitManager in Spring (using MergingPersistenceUnitManager is the key):
@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
  MergingPersistenceUnitManager persistenceUnitManager =
    new MergingPersistenceUnitManager();
  persistenceUnitManager.setDefaultDataSource(dataSource);
  persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
  // default persistence.xml location is OK, goes through all classpath*
  return persistenceUnitManager;
}
Before I unveil the whole configuration we should talk about the configuration that was in
the original singleton persistence.xml – which looked something like this:
<exclude-unlisted-classes>true</exclude-unlisted-classes>
<shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
<properties>
  <property name="eclipselink.weaving" value="static"/>
  <property name="eclipselink.allow-zero-id" value="true"/>
  <!--
  Without this there were other corner cases when a field change was ignored.
  This can be worked around by calling the setter, but that sucks.
  -->
  <property name="eclipselink.weaving.changetracking" value="false"/>
</properties>
The biggest question here is: What is used during build (e.g. by static weaving) and what can be put into runtime configuration somewhere else? Why somewhere else? Because we don’t want to repeat these properties in all XMLs.
Now we will take a little detour to shared-cache-mode, which displays the problem with merging
persistence.xml files in the most bizarre way.
Shared cache mode
We wanted to enable cached entities selectively (to ensure that @Cacheable annotations have any
effect). But I made a big mistake when I created another persistence.xml file – I forgot to
mention shared-cache-mode there. My persistence unit picked up both XMLs (using
MergingPersistenceUnitManager) but my caching went completely nuts. It cached more than expected
and I was totally confused. The trouble here is that persistence.xml files don't really get merged.
The lists of classes in them do, but the configurations do not. Somehow my second persistence XML
became dominant (one always does!) and because there was no shared-cache-mode specified, it
used the default – which is whatever EclipseLink thinks is best. No blame there, just another
manifestation that the JPA people didn't even think about these setup scenarios.
If you really want some hard evidence of how things are in your setup, put a breakpoint
anywhere you can reach your configured EntityManagerFactory and when it stops there, dig deeper
to find what your cache mode is. Or anything else – you can check the list of known entity
classes, JPA properties, anything really. It's much faster than guessing.
In the picture above we can see the following:
- We’re using Springs
MergingPersistenceUnitManager. - Persistence unit name is as expected.
- This is the path of “the root of the persistence unit”. It’s a directory here, but it would be a JAR in production.
- Our managed classes (collapsed here).
- Shared cache mode is set to
ENABLE_SELECTIVE– as required. - Other cofigured properties.
Luckily, we don’t rely on the persistence.xml root anymore, at least not for runtime
configuration.
Putting the Spring configuration together
Now we’re ready to wrap up our configuration and move some of repeated configuration snippets from
persistence.xml into Spring configuration – here we will use Java-based configuration (XML works
too, of course).
Most of our properties were related to EclipseLink. I had read their weaving manual, but I still didn’t understand what works when and how. I had to debug some of the stuff to be really sure.
It seems that eclipselink.weaving is the crucial property namespace that should stay in our
persistence.xml, because it gets used by the plugin performing the static weaving. I debugged the
Maven build and the plugin definitely uses the property eclipselink.weaving.changetracking
(we set it to false, which is not the default). Funnily enough, it doesn't need eclipselink.weaving
itself, because running the plugin implies we wish for static weaving. During startup it gets
picked up though, so EclipseLink knows it can treat the classes as statically weaved. This means it
can be pushed into the programmatic configuration.
The rest of the properties (and the shared cache mode) are clearly used at startup time. The Spring configuration may then look like this:
@Bean public DataSource dataSource(...) { /* as usual */ }

@Bean
public JpaVendorAdapter jpaVendorAdapter() {
  EclipseLinkJpaVendorAdapter jpaVendorAdapter = new EclipseLinkJpaVendorAdapter();
  jpaVendorAdapter.setShowSql(true);
  jpaVendorAdapter.setDatabase(
    Database.valueOf(env.getProperty("jpa.dbPlatform", "SQL_SERVER")));
  return jpaVendorAdapter;
}

@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
  MergingPersistenceUnitManager persistenceUnitManager =
    new MergingPersistenceUnitManager();
  persistenceUnitManager.setDefaultDataSource(dataSource);
  persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
  persistenceUnitManager.setSharedCacheMode(
    SharedCacheMode.ENABLE_SELECTIVE);
  // default persistence.xml location is OK, goes through all classpath*

  return persistenceUnitManager;
}

@Bean
public FactoryBean<EntityManagerFactory> entityManagerFactory(
  PersistenceUnitManager persistenceUnitManager, JpaVendorAdapter jpaVendorAdapter)
{
  LocalContainerEntityManagerFactoryBean emfFactoryBean =
    new LocalContainerEntityManagerFactoryBean();
  emfFactoryBean.setJpaVendorAdapter(jpaVendorAdapter);
  emfFactoryBean.setPersistenceUnitManager(persistenceUnitManager);

  Properties jpaProperties = new Properties();
  jpaProperties.setProperty("eclipselink.weaving", "static");
  jpaProperties.setProperty("eclipselink.allow-zero-id",
    env.getProperty("eclipselink.allow-zero-id", "true"));
  jpaProperties.setProperty("eclipselink.logging.parameters",
    env.getProperty("eclipselink.logging.parameters", "true"));
  emfFactoryBean.setJpaProperties(jpaProperties);
  return emfFactoryBean;
}
Clearly, we can set the database platform, shared cache mode and all runtime-relevant properties
programmatically – and we can do it in a single place. This is not a problem for a single
persistence.xml, but in any case it offers better control. We can now use Spring's
@Autowired private Environment env; and override whatever we want with property files or even
-D JVM arguments – and still fall back to default values – just as we do for the database property
of the JpaVendorAdapter. Or we can use SpEL.
And of course, all the things mentioned in the configuration can now be removed from all our
persistence.xml files.
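After this cleanup each partial persistence.xml can shrink to roughly the class list plus the build-time weaving bits – a sketch under these assumptions (unit and class names are illustrative):

<persistence-unit name="you-choose">
  <class>modeltoone.Dog</class>
  <exclude-unlisted-classes>true</exclude-unlisted-classes>
  <properties>
    <property name="eclipselink.weaving" value="static"/>
    <property name="eclipselink.weaving.changetracking" value="false"/>
  </properties>
</persistence-unit>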
I’d love to get rid of eclipselink.weaving.changetracking in the XML too, but I don’t see any way
how to provide this as the Maven plugin configuration option, which we have neatly unified in our
parent POM. That would also eliminate some repeating.
Common base entity classes in separate JAR
Question 5 from our problem list is, after all the previous work, no real problem – just a nuisance.
EclipseLink refuses to weave our base class regardless of the @MappedSuperclass usage. But as
mentioned in one SO question/answer, just add a dummy
concrete @Entity class and we are done. You'll never use it; it is no problem at all. And you can
vote for this bug.
This may be no issue for load-time weaving or for Hibernate, but I haven’t tried that so I can’t claim it reliably.
Cutting cyclic dependencies
Imagine we have two modules – users and permissions – where permissions depends on the
users module. Module users contains the User and Role entities while the permissions module contains
Permission and the RolePermission association, which is mapped explicitly because it carries some
additional information. See the following picture:
users and permissions

Obviously, Role cannot depend on RolePermission because that would cross module boundaries
in the wrong direction. But when a Role is deleted we also want to remove all related
RolePermissions (but not Permissions, of course). Imagine we have RoleDao and
RolePermissionDao, both derived from some layer supertype
named BaseDao. This handy class implements all the common functionality, including a delete(...)
method that calls a common preDelete(entity) method in the process. For convenience this is
already pre-implemented, but does nothing.
Were all the classes in a single module our RoleDao could contain this:
@Override
protected void preDelete(Role entity) {
  QRolePermission rp = QRolePermission.rolePermission;
  new JPADeleteClause(entityManager, rp)
    .where(rp.roleId.eq(entity.getId()))
    .execute();
}
With separate modules it’s not possible because QRolePermission (just like RolePermission)
is not available in the users module.
Sure, we can argue whether we have the entities split into modules properly – for instance, is
Role really a concept belonging to User without considering permissions? However, after moving
Role to the permissions module we would have a problem with the @ManyToMany relationship named
roles on the User class – which is naturally the owning side of the relationship – as it
would go against the module dependency.
When the cycle occurs it may also indicate that one of the modules is too big. In our case we
used User in the permissions module, but it is not really necessary in the entities directly.
It is used by the upper layers of the module, so maybe we should split the modules like this:
users and permissions dependency after introducing another module.

Now the users module can depend on permission-persistence and bring RolePermission where it
really belongs.
In the following cases we will focus on the relationship between the entities Role and
Permission – this time with the users module always depending on the permissions module (which
more or less matches permissions-persistence from the previous picture). Let's see how we can
model that many-to-many relationship and how it can cross module boundaries.
@ManyToMany follows the dependency.

First we use an implicit role_permission table without an explicit entity. Role is the natural owner
of the information about its Permissions and Permission doesn't necessarily need the list of
the roles using it. It gets interesting when we introduce RolePermission explicitly as an entity
– perhaps because there are additional attributes on that association.
RolePermission in users.

Notice that the limitations of relational database design, dictating where to put the foreign key,
are actually quite useful for deciding which way the dependency goes. The module knowing about
the RolePermission table is the one that depends on the other one. If it wasn't that way
we would get an invalid case with a cyclic dependency like this:
RolePermission in permissions forms a cycle.

We either need to reverse the dependency of the modules or split one module into two, as shown
previously with permissions. If we absolutely needed this dependency from users to
permissions and also the "red arrow", then something else in the users module depends on
permissions – which is a case for splitting the users module. (This is rather unnatural; the previous
case for splitting permissions was much better.)
In any of these cases we still have a problem when we delete the "blind" side of the relationship.
In the pictures of the valid cases it was the Permission entity. Deletion of a Permission somehow
needs to signal to the users module that it needs to remove the relations too. For that we can use
the Observer pattern, which allows us to register
an arbitrary callback with a DAO.
It will be implemented by the already mentioned BaseDao to make it generally available. The following
example demonstrates how it can be used with an explicit RolePermission entity:
// in some dao-base module
public abstract class BaseDao<T> ... {
  private List<PreDeleteHook<T>> preDeleteHooks = new ArrayList<>();

  public void delete(T entity) {
    preDeleteHooks.forEach(h -> h.preDelete(entity));
    entityManager.remove(entity);
    ...
  }

  public interface PreDeleteHook<E> {
    void preDelete(E entity);
  }

  public void addPreDeleteHook(PreDeleteHook<T> hook) {
    preDeleteHooks.add(hook);
  }
  ...
}

// in users module
public class RolePermissionDao extends BaseDao<RolePermission> {
  public static final QRolePermission $ = QRolePermission.rolePermission;

  @Autowired PermissionDao permissionDao; // from downstream permissions module, OK

  @PostConstruct public void init() {
    permissionDao.addPreDeleteHook(this::deleteByPermission);
  }

  // this method "implements" PreDeleteHook<Permission>
  public void deleteByPermission(Permission permission) {
    new JPADeleteClause(entityManager, $)
      .where($.permissionId.eq(permission.getId()))
      .execute();
  }
  ...
}

// no special changes in PermissionsDao, it can't know about users module anyway
Using the unmanaged association table with @ManyToMany would require finding all the Roles having
this permission and removing it from their permissions collections. I, again, vouch for the more
explicit way, as "delete all relations that use this permission" is simpler and likely more
efficient.
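For comparison, a sketch of that implicit variant (assuming the plain @ManyToMany mapping with a generated QRole and the usual accessors):

// find all roles still holding the permission...
List<Role> roles = new JPAQuery<>(em)
  .select(QRole.role)
  .from(QRole.role)
  .where(QRole.role.permissions.contains(permission))
  .fetch();
// ...and remove it; each modified Role results in a DELETE
// from the association table when the context is flushed
roles.forEach(role -> role.getPermissions().remove(permission));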
Conclusion
- JPA clearly wasn’t designed with modularity in mind – especially not when modules form a single persistence unit, which is perfectly legitimate usage. Just because we use one big legacy database it doesn’t mean we don’t want to factor our code into smaller modules.
- It is possible to distribute persistence classes into multiple JARs, and then:
- We can go either with a single union
persistence.xml, which can be downstream or upstream – this depends if we need it only in runtime or during build too. - I believe it is more proper to pack partial
persistence.xmlinto each JAR, especially if we need it during build. Unfortunately, there is no escape from repeating some upstream classes in the module again just because they are referenced in relations (typical culprit when some error makes you scream “I don’t understand how this is not an entity, when it clearly is!”).
- We can go either with a single union
- If we have multiple
persistence.xmlfiles, it is possible to merge them using Spring’sMergingPersistenceUnitManager. I don’t know if we can use it for non-Spring applications, but I saw this idea reimplemented and it wasn’t that hard. (If I had to reimplement it, I’d try to merge the configuration part too!) - When we are merging
persistence.xmlfiles it is recommended to minimize configuration in them so it doesn’t have to be repeated. E.g., for Eclipselink we leave only stuff necessary for built-time static weaving, the rest is set programmatically in our Spring@Configurationclass.
There are still some open questions, but I think they lead nowhere. Can I use multiple persistence
units with a single data source? That way I could have each persistence.xml as a separate unit.
But I doubt relationships would work across them, and the same goes for transactions (without XA,
that is).