Leanpub: Publish Early, Publish Often

11. Avoid N+1 select

While performance tuning is not the main goal of this book we should follow some elementary performance common sense. If we can avoid an unnecessary query we should do so. With ORM/JPA we can generate a lot of needless queries without even realizing. In this chapter we will cover the most pronounced problem called N+1 select.

Anatomy of N+1 select

I’d prefer to call this problem 1+N because it mostly starts with one query that returns N rows and induces up to N additional queries. While addition is commutative, hence 1+N is the same like N+1, I’ll stick to N+1 as usually used in literature. The typical scenarios when the N+1 problem appears are:

Query for N entities that have eager to-one relationship – or more of them – and the provider is not smart enough to use joins.
Query for N entities that have eager to-many relationship and the provider is either not smart enough to use the join (again) or it is not possible to use it for other reasons like pagination. We will cover paginating of entities with to-many later in this chapter.
Query for N entities with lazy relationship that is triggered later, e.g. in the view as usual with the Open Session in View (OSIV) pattern.

There are probably more scenarios, but these are the most typical ones. First let’s look at the eager examples.

Eager to-one without joins

If you recall our simple example with @ManyToOne from the chapter Troubles with to-one relationships you know that to-one relationships may trigger additional fetching. These may result in DB queries or they can be found in the cache – depends on your setting – and this all must be taken into consideration.

For the next sections let’s use the following data for dogs and their owners:

Content of **Owner** table
id	name
1	Adam
2	Charlie
3	Joe
4	Mike

Content of **Dog** table
id	name	owner_id
1	Alan	1 (Adam)
2	Beastie	1 (Adam)
3	Cessna	1 (Adam)
4	Rex	3 (Joe)
5	Lassie	3 (Joe)
6	Dunco	4 (Mike)
7	Goro	`NULL`

Our mapping for the Dog looks like this:

 1 @Entity
 2 public class Dog {
 3 
 4   @Id
 5   @GeneratedValue(strategy = GenerationType.IDENTITY)
 6   private Integer id;
 7 
 8   private String name;
 9 
10   @ManyToOne(fetch = FetchType.EAGER)
11   @JoinColumn(name = "owner_id")
12   private Owner owner;

And for the Owner:

 1 @Entity
 2 public class Owner implements Serializable {
 3 
 4   @Id
 5   @GeneratedValue(strategy = GenerationType.IDENTITY)
 6   private Integer id;
 7 
 8   private String name;
 9 
10   @OneToMany(mappedBy = "owner")
11   private Set<Dog> dogs;

Now let’s list all the dogs with this code:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .fetch();

We get seven dogs for three different owners (one dog is not owned) but what happened on the SQL level? Both Hibernate and EclipseLink do something like this (output from EclipseLink):

1 SELECT ID, NAME, OWNER_ID FROM DOG
2 SELECT ID, NAME FROM OWNER WHERE (ID = 1)
3 SELECT ID, NAME FROM OWNER WHERE (ID = 3)
4 SELECT ID, NAME FROM OWNER WHERE (ID = 4)

That classifies as N+1 problem, although the N may be lower than the count of selected rows thanks to the persistence context. JPA providers may be persuaded to use JOIN to fetch the information but this is beyond the current version of JPA specification. EclipseLink offers @JoinFetch(JoinFetchType.OUTER) and Hibernate has @Fetch(FetchMode.JOIN) (also uses outer select). In a related demo I tried both and to my surprise EclipseLink obeyed but Hibernate did not – which went against my previous experiences that Hibernate tries to optimize queries better in general.

Now, what if we don’t need any data for owner? You may try to use lazy fetch for the data you don’t need if you know you can rely on LAZY or try some other technique described in Troubles with to-one relationships. Here entity views come to mind, but projections may be even better.

Anytime I wanted to load dogs with their owners I’d go for explicit JOIN. Let’s see how to do that properly. Even though we don’t use the owners in select it is not sufficient construct query like this:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner)
5   .fetch();

This results in an invalid JPQL query:

1 select dog
2 from Dog dog
3   left join dog.owner

While this runs on Hibernate, it fails on EclipseLink with a syntax error: An identification variable must be defined for a JOIN expression. This, indeed, is necessary according to the specification and we must add an alias like so:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner, QOwner.owner)
5   .fetch();

This results in a valid JPQL query:

1 select dog
2 from Dog dog
3   left join dog.owner as owner

But without using it in select – which we don’t want because we don’t want the list of Tuples – we end up with a query with our join, but the data for owners is still not fetched and N additional queries are executed just like before. Had we used it in the select it would be fetched, of course.

The right way to do it if we insist on the result typed as List<Dog> is this:

1 List<Dog> dogs = new JPAQuery<>(em)
2   .select(QDog.dog)
3   .from(QDog.dog)
4   .leftJoin(QDog.dog.owner).fetchJoin()
5   .fetch();

This results in a valid and correct JPQL query:

1 select dog
2 from Dog dog
3   left join fetch dog.owner

Notice we haven’t used alias this time and looking at BNF (Backus–Naur form) notation from [JPspec], section 4.4.5 Joins I believe identification_variable is allowed only for join and not for fetch_join rules. Nonetheless, both Hibernate and EclipseLink tolerate this.

Eager to-many relationships

Using EAGER on collections by default is rather risky. I’d personally not use it and use explicit joins when needed instead. In our example we will use OwnerEager entity that uses the same table like Owner (hence the same prepared data) but maps dogs collection as:

1 @OneToMany(mappedBy = "owner", fetch = FetchType.EAGER)
2 private Set<DogEager> dogs;

Now we run the following code:

1 List<OwnerEager> owners = new JPAQuery<>(em)
2   .select(QOwnerEager.ownerEager)
3   .from(QOwnerEager.ownerEager)
4   .fetch();
5 System.out.println("\nowners = " + owners);
6 for (OwnerEager owner : owners) {
7   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
8 }

SQL produced at the point fetch() is executed is (output from EclipseLink, Hibernate is similar):

1 SELECT ID, NAME FROM Owner
2 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
3 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
4 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)
5 SELECT ID, NAME, owner_id FROM Dog WHERE (owner_id = ?)

And output of the subsequent print statements (manually wrapped):

1 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
2  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
3 Adam's dogs = [Dog{id=1, name='Alan', owner.id=1}, Dog{id=2,
4  name='Beastie', owner.id=1}, Dog{id=3, name='Cessna', owner.id=1}]
5 Charlie's dogs = []
6 Joe's dogs = [Dog{id=4, name='Rex', owner.id=3},
7  Dog{id=5, name='Lassie', owner.id=3}]
8 Mike's dogs = [Dog{id=6, name='Dunco', owner.id=4}]

Everything is OK, except it can all be done in a single query. Had we added a single fetch join line in the query we would have had it in a single go:

1 List<OwnerEager> owners = new JPAQuery<>(em)
2   .select(QOwnerEager.ownerEager)
3   .from(QOwnerEager.ownerEager)
4   .leftJoin(QOwnerEager.ownerEager.dogs).fetchJoin()
5   .fetch();

Join reaches for the data – notice we have to use leftJoin if the collection may be empty – and fetchJoin takes care of putting them in the collection. This is probably the best what we can explicitly do with eager collection mapping in place. Resulting SQL (here EclipseLink):

1 SELECT t1.ID, t1.NAME, t0.ID, t0.NAME, t0.owner_id
2  FROM {oj Owner t1 LEFT OUTER JOIN Dog t0 ON (t0.owner_id = t1.ID)}

In general, if eager fetch on collection is not executed using a join on the background it is not worth it. We can get the same N+1 behavior with lazy loads although it may require some explicit code to actually fetch the collection when you want – e.g. in the service layer instead of later in presentation layer with persistence context already not available.

There is one crucial difference between N+1 across to-one and to-many relationships. In case of to-one JPA provider may utilize entity cache as the ID of the needed entity is already available. In case of to-many, on the other hand, we have to execute actual select. This may use query cache – but setting that one appropriately is definitely more tricky than the entity cache. Without caches this difference is blurred away and the only difference is that for to-one we’re accessing rows in DB by its primary key, while for to-many we’re using the foreign key. PK is virtually always indexed, FK not necessarily so – but I guess it should be when we’re using this access pattern.

Perhaps there are some settings or custom annotations that make JPA provider perform the join, but it must be smart enough not to use it when limit/offset is in play. As this capability is not available in JPA – and because I believe eager collections are even worse than eager to-one – we will not delve into it any more.

Lazy relationships triggered later

Lazy fetch is a reasonable default for mapping collections – and it is default according to the specification. Because a collection can be implemented in a custom way all JPA providers offer lazy behaviour without the need for any bytecode magic.

Mapping on the Owner is natural:

1 @OneToMany(mappedBy = "owner")
2 private Set<Dog> dogs;

Now we run the code very similar to the previous section (entity classes are different, but they work on the same tables):

1 List<Owner> owners = new JPAQuery<>(em)
2   .select(QOwner.owner)
3   .from(QOwner.owner)
4   .fetch();
5 System.out.println("\nowners = " + owners);
6 for (Owner owner : owners) {
7   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
8 }

EclipseLink produces the following output (mixing SQL and standard output):

1 SELECT ID, NAME FROM OWNER
2 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
3  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
4 Adam's dogs = {IndirectSet: not instantiated}
5 Charlie's dogs = {IndirectSet: not instantiated}
6 Joe's dogs = {IndirectSet: not instantiated}
7 Mike's dogs = {IndirectSet: not instantiated}

Is that all? This may seem a bit disturbing and leads us to the question: What should toString on the lazy collection do?

Hibernate treats toString differently and prints the content as expected:

 1 Hibernate: select owner0_.id as id1_1_, owner0_.name as name2_1_
 2  from Owner owner0_
 3 owners = [Person{id=1, name=Adam}, Person{id=2, name=Charlie},
 4  Person{id=3, name=Joe}, Person{id=4, name=Mike}]
 5 Hibernate: select dogs0_.owner_id as owner_id3_0_0_,
 6   dogs0_.id as id1_0_0_, ...
 7  from Dog dogs0_ where dogs0_.owner_id=?
 8 Adam's dogs = [Dog{id=1, name='Alan', owner.id=1}, Dog{id=2,
 9  name='Beastie', owner.id=1}, Dog{id=3, name='Cessna', owner.id=1}]
10 ...and more for other owners

This toString triggers the lazy load process, just like other methods that do the same for EclipseLink. This means N+1 problem, of course. Lazy load is supposed to prevent loading the data that is not needed – sometimes we say that the best queries are the ones we don’t have to perform. But what if want to display a table of owners and list of their dogs in each row? Postponing the load into the view is the worst thing we can do for couple of reasons:

We actually don’t gain anything, we’re still stuck with N+1 problem while there are better ways to load the same data.
We know what data are needed and the responsibility for their loading is split between backend layer and presentation layer.
To allow presentation layer to load the data we need to keep persistence context open longer. This leads to the infamous Open Session in View (OSIV) pattern, or rather an antipattern.

Using OSIV may lead to situations that for some simple cases the changes are needed only at the presentation layer. While this may sound appealing it is, in fact, rather appalling. Not only we need more queries and your site may not scale well, especially for pages with more items per page. It also opens holes in the contract between the persistence and service layer, leading to all kinds of mysterious bugs, e.g. unexpected updates of objects for which no actual code for update exists.

This is the price for pulling attached entity objects up to the presentation layer. This is not about utilizing lazy load to avoid some queries anymore – this is the consequence of being lazy when designing the system. This kind of laziness is not pushing the progress forward. (And yes, I’ve been there and I’ve sinned too.)

When we need the data in the collection we should use join just like we used it for an eager collection – there is actually no difference if we use the join explicitly. But as we demonstrated previously there is the issue with paginating. And that’s what we’re going to talk about next.

Paginating with to-many

I mentioned previously that we can’t escape SQL underneath. If we join another table across to-many relationship we change the number of fetched rows and with that we cannot paginate reliably. We can demonstrate this with SQL or even with Querydsl (that is JPQL).

We will try to obtain owners and their dogs ordered by owner’s name. We will set offset to 1 and limit to 2 which should return Charlie and Joe. I realize that offset that is not multiple of limit (that is a page size) does not make best sense, but I want these two particular items for the following reasons:

I want to draw from the middle of our owner list, not the beginning or the end.
I want to see the effect of Charlie’s empty collection.
I want to see how Joe’s two dogs affect the results.

With our limited data this limit (2) and offset (1) demonstrates all that.

We already saw that both eager and lazy collection allows us to query the master table and let N other queries pull the data:

 1 List<Owner> owners = new JPAQuery<>(em)
 2   .select(QOwner.owner)
 3   .from(QOwner.owner)
 4   .orderBy(QOwner.owner.name.asc())
 5   .offset(1)
 6   .limit(2)
 7   .fetch();
 8 for (Owner owner : owners) {
 9   owner.getDogs().isEmpty(); // to assure lazy load even on EclipseLink
10   System.out.println(owner.getName() + "'s dogs = " + owner.getDogs());
11 }

Join leads to incorrect results

We can’t fix the N+1 problem with JOIN FETCH anymore:

1 List<Owner> owners = new JPAQuery<>(em)
2   .select(QOwner.owner)
3   .from(QOwner.owner)
4   .leftJoin(QOwner.owner.dogs).fetchJoin()
5   .orderBy(o.name.asc(), d.name.asc())
6   .offset(1)
7   .limit(2)
8   .fetch();

With the same print loop this produces:

1 Adam's dogs = {[Dog{id=1, name='Alan', owner.id=1}, Dog{id=2, ...
2 Adam's dogs = {[Dog{id=1, name='Alan', owner.id=1}, Dog{id=2, ...

The same happens if we want to be explicit. Here we even use “raw” versions of entities, ad hoc join (to root entity) and explicit ON:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<Tuple> results = new JPAQuery<>(em)
 4   .select(o, d)
 5   .from(o)
 6   .leftJoin(d).on(d.ownerId.eq(o.id))
 7   .orderBy(o.name.asc(), d.name.asc())
 8   .offset(1)
 9   .limit(2)
10   .fetch();
11 for (Tuple row : results) {
12   DogRaw dog = row.get(d);
13   System.out.println(row.get(o).getName() + ", " +
14     (dog != null ? dog.getName() : null));
15 }

Now we had to change the print loop and we put the results into tuple:

1 Adam, Beastie
2 Adam, Cessna

We could just select(o) but it would still two rows of Adam. Ok, how does the whole result set looks like and what are we selecting with our limit/offset?

owner	dog
Adam	Alan
Adam	Beastie
Adam	Cessna
Charlie	null
Joe	Lassie
Joe	Rex
Mike	Dunco

This makes it more obvious. Offset and limit are not smart and they don’t care about what we want to paginate. With values 1 and 2 respectively they will give us lines 2 and 3 of whatever result we apply it to. And that is very important.

Using native SQL

Is it even possible to paginate with a single query? Yes, but we have to use subquery in a FROM or JOIN clause, neither of which is available in JPQL (as of JPA 2.1). We can try native query:

 1 // select * would be enough for EclipseLink
 2 // Hibernate would complains about duplicated sql alias
 3 Query nativeQuery = em.createNativeQuery(
 4   "SELECT o.name AS oname, d.name AS dname" +
 5     " FROM (SELECT * FROM owner LIMIT 2 OFFSET 1) o" +
 6     " LEFT JOIN dog d ON o.id=d.owner_id");
 7 List<Object[]> resultList = nativeQuery.getResultList();
 8 
 9 System.out.println("resultList = " + resultList.stream()
10   .map(row -> Arrays.toString(row))
11   .collect(Collectors.toList()));

Returned resultList has three rows:

1 resultList = [[Charlie, null], [Joe, Rex], [Joe, Lassie]]

But all rows are related to our two selected owners. We got all the data in relational form, there is no other way, it’s up to us to group the data as we need. For this we may want to add a transient dogs collection in our “raw” variant of the Owner class.

JPA/Querydsl solutions

Because of current JPQL disabilities we cannot do it all in one go, but we can do it in a single additional query:

1) Query with pagination is executed. This may return some data from the master table or perhaps only IDs of the rows. Order must be applied for pagination to work correctly. Any where conditions are applied here as well, more about it in a moment. 2) Fetch the missing data limiting the results to the scope of the previous result. The concrete implementation of this step differs depending on the nature of the first query.

Let’s say the first query just returns IDs like this:

1 QOwnerRaw o = QOwnerRaw.ownerRaw;
2 List<Integer> ownerIds = new JPAQuery<>(em)
3   .select(o.id)
4   .from(o)
5   .orderBy(o.name.asc())
6   // WHERE ad lib here
7   .offset(1)
8   .limit(2)
9   .fetch();

In this case we can get the final result like this:

 1 QDogRaw d = QDogRaw.dogRaw;
 2 Map<OwnerRaw, List<DogRaw>> ownerDogs = new JPAQuery<>(em)
 3   .select(o, d)
 4   .from(o)
 5   .leftJoin(d).on(d.ownerId.eq(o.id))
 6   .where(o.id.in(ownerIds))
 7   .orderBy(o.name.asc()) // use the same order as in select #1
 8   .orderBy(d.name.desc()) // dogs in each list ordered DESC
 9   // no limit/offset, where took care of it
10   .transform(groupBy(o).as(list(d)));
11 System.out.println("ownerDogs = " + ownerDogs);

We will talk more about transform method later in the chapter on advanced Querydsl, and its section result transformation. Obviously it is the trick that allows us to return Map instead of List.

Using the same order like in the first query is essential, additional orders are allowed. Important question is whether you need to join any to-many relationship for the first query as well. If we need to (e.g. for WHERE conditions) we have to use DISTINCT which means we also have to bring expression from ORDER BY into the SELECT clause. This is mandated by SQL, because DISTINCT must be applied before we can order the results. This complicates the first select:

 1 List<Integer> ownerIds = new JPAQuery<>(em)
 2   // If distinct is used SQL requires o.name here because it is used to
 3   // order which happens after distinct. EclipseLink can handle this,
 4   // Hibernate generated SQL fails. JPA spec is not specific on this.
 5   .select(o.id, o.name)
 6   // or perhaps just use o, that encompasses o.name
 7   // .select(o)
 8   .distinct() // needed if join across to-many is used to allow WHERE
 9   .from(o)
10   .orderBy(o.name.asc())
11   // WHERE ad lib here
12   .offset(1)
13   .limit(2)
14   .fetch()
15   .stream()
16   .map(t -> t.get(o.id))
17   // .map(Owner::getId) // if .select(o) was used
18   .collect(Collectors.toList());

We could stop at fetch and deal with the List<Tuple> but that’s not what we wanted hence the additional mapping to the list of IDs. The commented version using just o alias and more straightforward mapping to ID is probably better, but it still raises the question: Do we really need to join across to-many to add where conditions? Can’t we work around it with subquery? For example, to get owners with some dogs we can do this with join across to-many:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<OwnerRaw> owners = new JPAQuery<>(em)
 4   .select(o)
 5   .distinct()
 6   .from(o)
 7   .leftJoin(d).on(d.ownerId.eq(o.id))
 8   .where(d.isNotNull())
 9   .orderBy(o.name.asc())
10   .offset(1)
11   .limit(2)
12   .fetch();

This skips Charlie and returns Joe and Mike. We can get the same without JOIN and DISTINCT:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 QDogRaw d = QDogRaw.dogRaw;
 3 List<OwnerRaw> owners = new JPAQuery<>(em)
 4   .select(o)
 5   .from(o)
 6   .where(new JPAQuery<>()
 7     .select(d)
 8     .from(d)
 9     .where(d.ownerId.eq(o.id))
10     .exists())
11   .orderBy(o.name.asc())
12   .offset(1)
13   .limit(2)
14   .fetch();

Subquery reads a bit longer but if we wanted to select o.id directly it would save a lot of troubles we would encounter because of the distinct/orderBy combo.

Personally I’m not a friend of the first select returning only IDs, although there may be scenarios when it’s the best solution. Now we know we can filter list of owners by dog’s attributes (that is across to-many relationship), either using subqueries or distinct clause, I’d suggest returning whole owners. We can safely join any to-one associations and return Tuple of many entities or we can enumerate all the desired columns and return custom DTO (projection). Projection would further relieve the persistence context if the data merely flow through the service layer to the presentation layer.

In the second query we don’t have to load the actual results anymore, the query is needed only for fetching the data in to-many relationship. But we fetch all of it for all the items on the page currently being displayed.

To-many relationship fetcher

To demonstrate how we can implement this fetching pattern I decided to build “quick and dirty” fluent API for the second query. It not only executes the fetch for the to-many relationship for all the items in the page, but it also merges them into a single object – which is a kind of post-processing we glossed over.

I’m not saying it can’t be done better but I think if it looked like this it would be really useful:

 1 QOwnerRaw o = QOwnerRaw.ownerRaw;
 2 List<OwnerRaw> owners = new JPAQuery<>(em)
 3   .select(o)
 4   .from(o)
 5   .orderBy(o.name.asc())
 6   .offset(1)
 7   .limit(2)
 8   .fetch();
 9 
10 QDogRaw d = QDogRaw.dogRaw;
11 List<OwnerRaw> ownersWithDogs = ToManyFetcher.forItems(owners)
12   .by(OwnerRaw::getId)
13   .from(d)
14   .joiningOn(d.ownerId)
15   .orderBy(d.name.desc())
16   .fetchAndCombine(em, OwnerRaw::setDogs);

This implies some transient dogs collection on the owner and existing setter:

1 @Transient
2 private List<DogRaw> dogs;
3 
4 //...
5 
6 public void setDogs(List<DogRaw> dogs) {
7   this.dogs = dogs;
8 }

Or we can combine both in some DTO like this (first query is still the same):

 1 // first query up to QDogRaw d = ... here
 2 List<OwnerWithDogs> ownersWithDogs = ToManyFetcher.forItems(owners)
 3   .by(OwnerRaw::getId)
 4   .from(d)
 5   .joiningOn(d.ownerId)
 6   .orderBy(d.name.desc())
 7   .fetchAs(em, OwnerWithDogs::new);
 8 
 9 // elsewhere...
10 public class OwnerWithDogs {
11   public final OwnerRaw owner;
12   public final List<DogRaw> dogs;
13 
14   OwnerWithDogs(OwnerRaw owner, List<DogRaw> dogs) {
15     this.owner = owner;
16     this.dogs = dogs;
17   }
18 }

Ok, now how to construct such an API? We can use paths of Querydsl – although we don’t use its type-safety capabilities for perfect result – rather we go for something useful without putting too many parametrized type in the code. Also, in order to make it fluent with the need of adding additional parametrized types, I broke my personal rules about code nesting – in this case the nesting of inner classes. Sorry for that, it’s up to you to do it your way, this really is just a proof of concept:

  1 import com.querydsl.core.types.*;
  2 import com.querydsl.core.types.dsl.*;
  3 import com.querydsl.jpa.impl.JPAQuery;
  4 
  5 import javax.persistence.EntityManager;
  6 import java.util.*;
  7 import java.util.function.*;
  8 import java.util.stream.Collectors;
  9 
 10 import static com.querydsl.core.group.GroupBy.*;
 11 
 12 public class ToManyFetcher<T> {
 13 
 14   private final List<T> rows;
 15 
 16   private ToManyFetcher(List<T> rows) {
 17     this.rows = rows;
 18   }
 19 
 20   public static <T> ToManyFetcher<T> forItems(List<T> rows) {
 21     return new ToManyFetcher<>(rows);
 22   }
 23 
 24   public <PK> ToManyFetcherWithIdFunction<PK> by(Function<T, PK> idFunction) {
 25     return new ToManyFetcherWithIdFunction<>(idFunction);
 26   }
 27 
 28   public class ToManyFetcherWithIdFunction<PK> {
 29     private final Function<T, PK> idFunction;
 30 
 31     public ToManyFetcherWithIdFunction(Function<T, PK> idFunction) {
 32       this.idFunction = idFunction;
 33     }
 34 
 35     public <TMC> ToManyFetcherWithFrom<TMC> from(
 36       EntityPathBase<TMC> toManyEntityPathBase)
 37     {
 38       return new ToManyFetcherWithFrom<>(toManyEntityPathBase);
 39     }
 40 
 41     public class ToManyFetcherWithFrom<TMC> {
 42       private EntityPathBase<TMC> toManyEntityPathBase;
 43       private Path<PK> fkPath;
 44       private OrderSpecifier orderSpecifier;
 45 
 46       public ToManyFetcherWithFrom(EntityPathBase<TMC> toManyEntityPathBase)
 47       {
 48         this.toManyEntityPathBase = toManyEntityPathBase;
 49       }
 50 
 51       public ToManyFetcherWithFrom<TMC> joiningOn(Path<PK> fkPath) {
 52         this.fkPath = fkPath;
 53         return this;
 54       }
 55 
 56       public ToManyFetcherWithFrom<TMC> orderBy(OrderSpecifier orderSpecifier) {
 57         this.orderSpecifier = orderSpecifier;
 58         return this;
 59       }
 60 
 61       public <R> List<R> fetchAs(
 62         EntityManager em, BiFunction<T, List<TMC>, R> combineFunction)
 63       {
 64         Map<PK, List<TMC>> toManyResults = getToManyMap(em);
 65 
 66         return rows.stream()
 67           .map(row -> combineFunction.apply(row,
 68             toManyResults.getOrDefault(
 69               idFunction.apply(row), Collections.emptyList())))
 70           .collect(Collectors.toList());
 71       }
 72 
 73       public List<T> fetchAndCombine(EntityManager em,
 74         BiConsumer<T, List<TMC>> combiner)
 75       {
 76         Map<PK, List<TMC>> toManyResults = getToManyMap(em);
 77 
 78         rows.forEach(row -> combiner.accept(row,
 79           toManyResults.getOrDefault(
 80             idFunction.apply(row), Collections.emptyList())));
 81 
 82         return rows;
 83       }
 84 
 85       private Map<PK, List<TMC>> getToManyMap(EntityManager em) {
 86         List<PK> ids = rows.stream()
 87           .map(idFunction)
 88           .collect(Collectors.toList());
 89         JPAQuery<TMC> tmcQuery = new JPAQuery<>(em)
 90           .select(toManyEntityPathBase)
 91           .from(toManyEntityPathBase)
 92           .where(Expressions.booleanOperation(
 93             Ops.IN, fkPath, Expressions.constant(ids)));
 94         if (orderSpecifier != null) {
 95           tmcQuery.orderBy(orderSpecifier);
 96         }
 97         return tmcQuery
 98           .transform(groupBy(fkPath).as(list(toManyEntityPathBase)));
 99       }
100     }
101   }
102 }

First we start with ToManyFetcher that preserves the type for the master entity (OwnerRaw in our example). I could also store EntityManager here – it is eventually needed for query execution, but I chose to put it into the terminal operation (rather arbitrary decision here). Method by returns ToManyFetcherWithIdFunction to preserve the type of the master entity primary key (ID) and also to store the function that maps entity to it. For that we need to return different type. Next from preserves the type of EntityPathBase of the detail entity (that is entity Q-class) – and again returns instance of another type that wraps all the accumulated knowledge.

Further joiningOn and orderBy don’t add any parametrized types so they just return this. While orderBy is optional, joiningOn is very important and we may construct API in a way that enforces its call – which did not happen in my demo. Finally we can choose from two handy fetch operations depending on whether we want to combine the result on the master entity class (using a setter for instance) or in another type (like DAO).

Java 8 is used nicely here, especially when the API is used. It can certainly be done without streams but using inner classes instead of lambdas would by and fetch calls arguably unwieldy. Trying to do it in a single type would require defining all the parametrized types at once – either explicitly using <....> notation or using a call with multiple arguments.

I tried both and this second fluent approach is much better as the names of fluent calls introduce the parameter and make the code virtually self-documented. I hope this demonstrates how nicely we can use Querydsl API (and Java 8!) for recurring problems.

Wrapping up N+1 problem

N+1 problem can occur when the load of multiple (N) entities subsequently triggers load of their related data one by one. The relations may be to-one or to-many and there may be be multiple of these. The worst case is when the data loaded by up to N additional queries are not used at all. This happens often with to-one relationships which I personally solve with raw FK values – as proposed in the chapter Removing to-one altogether. To-many relationships should be declared as lazy for most cases to avoid “loading the whole DB”.

Needed related data should be loaded explicitly, that is with joins, when we need the data. Scenarios where conscious lazy load is appropriate are rather rare. Scenarios where people either don’t know or are lazy to think are less rare and that’s what leads to N+1 so often. N queries are not necessary in most cases and don’t scale very well with the page size.

For data across to-one relationships there are rarely any negative implications when we join them. With to-many situation gets a little bit messy, especially if we need data across multiple such relationships. If we don’t need pagination we may use join and then group the data appropriately, but with pagination we have to ensure that the driving query on the master entity is paginated correctly. With JPA/JPQL this is typically not possible in a single select and each to-many relationship requires an additional query – but it is possible to keep query count constant and not driven by number of items.

In any case, letting lazy loading doing the job we should do in cases when the data gets loaded anyway is sloppy. Using Open Session in View (OSIV) on top of it instead of having proper service layer contract is plain hazardous.

Up next

12. Mapping Java enum