10. Modularity, if you must

I’m convinced that the people who designed and specified JPA were not aware of (or chose to ignore) modern modularity trends. For a long time I’ve been trying to put things into domain-named packages (sometimes low-level technical domains, as it happens with complex layered software) and as a next step I naturally started to split big JARs into smaller ones along vertical (domain) lines instead of putting everything from one layer into a single JAR.

If you have all your entities in a single JAR you have no problem at all. But if you want to split the entities into multiple JARs – including their definitions in persistence.xml – then you’re venturing beyond the JPA specification. JPspec, section 8.2 Persistence Unit Packaging, defines a persistence unit by a single persistence.xml file located in the META-INF directory of “the root of the persistence unit”.

This does not explicitly forbid using multiple persistence.xml files, but I believe it implies we should not do it. The other way around is fine, though – we can have multiple persistence units in one XML file, that’s a piece of cake.
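
For illustration, a minimal sketch of a single persistence.xml declaring two units (unit, package and class names are made up):

<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <persistence-unit name="main-pu" transaction-type="RESOURCE_LOCAL">
    <class>com.example.client.Client</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
  </persistence-unit>
  <persistence-unit name="audit-pu" transaction-type="RESOURCE_LOCAL">
    <class>com.example.audit.AuditEntry</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
  </persistence-unit>
</persistence>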

Everybody knows layers

I decided to experiment with the modularization of applications after I had read Java Application Architecture: Modularity Patterns with Examples Using OSGi [JAA]. After that I couldn’t sleep so well with the typical mega-JARs containing all the classes at a particular architectural level. Yeah, we modularize – but only by layers.

I’ll use one of the projects from my past as an example. We had one JAR with all the entity classes; sometimes it was called domain, sometimes jpa-entities (admitting the anemic domain model).

Then we had our @Service classes in a different JAR – but all of them in one – accessing entities via DAOs. The service layer was used by the presentation layer. Complex logic was pushed to specific @Components somewhere under the service layer. It was not domain-driven, but it made sense for us and the dependencies flowed clearly from the top down.

Components based on features

But how about dependencies between parts of the system at the same level? When we have a lot of entity classes, some are pure business (Clients, their Transactions, financial Instruments) and some are rather infrastructural (meta model, localization, audit trail). Why can’t these be separated? Most of the stuff depends on the meta model, localization is quite independent in our case, and the audit trail needs the meta model and the permission module. It all clicks when one thinks about it – and it is more or less in line with modularity based on features, not on technology or layers. Sure, we can still use layer separation and have permission-persistence and permission-service as well (later we will see an example where it makes a lot of sense).

Actually, this is quite a recurring question: Should we organize packages by feature or by layer/technology/pattern? It seems a consensus has been reached – start with features – which may correspond to parts of your business domain. If a feature gets big you can split it into layers too, unless you find even more suitable sub-domains.

Multiple JARs with JPA entities

So we carefully try to put different entity classes into different JAR modules. Sure, we could keep them in one JAR, merely repackage them, and check how tangled they are with Sonar, but it is recommended to enforce the separation and make the dependencies explicit (and not only in the [JAA] book). My experience confirms this quite clearly – contracts of any kind must be enforced.

And here comes the problem when your persistence is based on JPA, because JPA clearly wasn’t designed to have a single persistence unit spread across multiple persistence.xml files. So what are these problems actually?

  1. How to distribute persistence.xml across these JARs? Do we even have to?
  2. What classes need to be mentioned where? E.g., we need to mention classes from upstream JARs in persistence.xml if they are used in relations (which breaks the DRY principle).
  3. When we have multiple persistence.xml files, how to merge them in our persistence unit configuration?
  4. What about the configuration in persistence.xml? What properties are used from what file? (Little spoiler: we just don’t know reliably!) Where to put them so we don’t have to repeat ourselves again?
  5. How to use EclipseLink static weaving for all these modules? And how about a module with only an abstract mapped superclass (some dao-common module)?

That’s quite a lot of problems for an age when modularity is mentioned so often – too many for a technology that comes from an “enterprise” stack. And they are mostly phrased as questions, because the answers are not readily available.

Distributing persistence.xml – do we need it?

This one is difficult and may depend on the provider you use and the way you use it. We use EclipseLink and its static weaving, which requires persistence.xml. Sure, we may try to keep it together in some neutral module (or on any path, as it can be configured for the weaving plugin), but that goes against the modularity quest. What options do we have?

  • We can create some union persistence.xml in a module that depends on all the needed JARs. This would be OK if we had just one such module – typically some downstream module like a WAR or a runnable JAR. But we have many, and if we made a persistence.xml for each they would contain a lot of repetition. And we’d be referencing a downstream resource, which is unacceptable.
  • We can have a dummy upstream module, or a path outside of any module, with the union persistence.xml. This would keep things simple, but it would make it more difficult to develop the modules independently, maybe even with different teams.
  • We can keep persistence.xml in the JAR with the related classes. This seems best from the modularity point of view, but it means we need to merge multiple persistence.xml files when the persistence unit starts.
  • Or can we have a different persistence unit for each persistence.xml? This is OK if they truly represent different databases (different JDBC URLs), otherwise it doesn’t make sense. In our case we have one rich DB and any module can see the part it is interested in – that is, the entities from the JARs it has on the classpath. If you have data in different databases already, you’re probably sporting microservices anyway.

We went for the third option – especially because EclipseLink’s weaving plugin likes it and we didn’t want to redirect it to a non-standard path for persistence.xml – but it also seems to be the right logical way. However, there is no such thing as a dependency between persistence.xml files. So if we have b.jar that uses a.jar, and there is an entity class B in b.jar with a @ManyToOne to the A entity from a.jar, we have to mention the A class in the persistence.xml in b.jar – even though the class is already listed in a.jar, of course. Again, clearly, the engineers of JPA didn’t even think about the possibility of using multiple JARs in a really modular way.
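
To make the repetition concrete, here is a minimal sketch of what the META-INF/persistence.xml in b.jar might contain (package names are made up):

<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <persistence-unit name="you-choose" transaction-type="RESOURCE_LOCAL">
    <!-- B lives in this JAR -->
    <class>com.example.b.B</class>
    <!-- A lives in a.jar and is listed there as well, but because B references
         it in a relation we have to repeat it here -->
    <class>com.example.a.A</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
  </persistence-unit>
</persistence>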

In any case – this works, compiles, weaves our classes during build – and more or less answers questions 1 and 2 from our problem list.

It doesn’t start anyway

When we have a single persistence.xml it gets found as a unique resource, typically in META-INF/persistence.xml – in any JAR actually (the one JPA refers to as “the root of the persistence unit”). But when we have more of them, they don’t all get picked up and merged magically – and the application fails during startup. We need to merge all those persistence.xml files during the initialization of our persistence unit. Now we’re tackling questions 3 and 4 at once, for they are linked.

To merge all the configuration XMLs into one unit we can use this configuration of the PersistenceUnitManager in Spring (using MergingPersistenceUnitManager is the key):

@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
  MergingPersistenceUnitManager persistenceUnitManager =
    new MergingPersistenceUnitManager();
  persistenceUnitManager.setDefaultDataSource(dataSource);
  persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
  // default persistence.xml location is OK, goes through all classpath*
  return persistenceUnitManager;
}

Before I unveil the whole configuration we should talk about the settings that were in the original single persistence.xml – which looked something like this:

<exclude-unlisted-classes>true</exclude-unlisted-classes>
<shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode>
<properties>
  <property name="eclipselink.weaving" value="static"/>
  <property name="eclipselink.allow-zero-id" value="true"/>
  <!--
  Without this there were other corner cases when a field change was ignored.
  This can be worked around by calling the setter, but that sucks.
  -->
  <property name="eclipselink.weaving.changetracking" value="false"/>
</properties>

The biggest question here is: What is used during the build (e.g. by static weaving) and what can be put into runtime configuration somewhere else? Why somewhere else? Because we don’t want to repeat these properties in all the XMLs.

Now we will take a little detour to shared-cache-mode, which demonstrates the problem with merging persistence.xml files in the most bizarre way.

Shared cache mode

We wanted to enable cached entities selectively (to ensure that @Cacheable annotations have any effect). But I made a big mistake when I created another persistence.xml file – I forgot to mention shared-cache-mode there. My persistence unit picked up both XMLs (using MergingPersistenceUnitManager) but my caching went completely nuts. It cached more than expected and I was totally confused. The trouble here is that persistence.xml files don’t really get merged. The lists of classes in them do, but the configurations do not. Somehow my second persistence XML became dominant (one always does!) and because there was no shared-cache-mode specified, it used the default – which is whatever EclipseLink thinks is best. No blame there, just another manifestation that the JPA people didn’t even think about this kind of setup.

If you really want hard evidence of how things are in your setup, put a breakpoint anywhere you can reach your configured EntityManagerFactory and, when it stops there, dig deeper to find out what your cache mode is. Or anything else – you can check the list of known entity classes, the JPA properties, anything really. And it’s much faster than guessing.
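
If you prefer code over the debugger, the same information can be printed from any Spring bean – a minimal sketch, assuming the EntityManagerFactory was created by Spring (its proxy then also implements EntityManagerFactoryInfo); the helper class name is mine:

// quick runtime check of the merged persistence unit configuration
import javax.persistence.EntityManagerFactory;
import javax.persistence.metamodel.EntityType;
import org.springframework.orm.jpa.EntityManagerFactoryInfo;

public class PersistenceUnitSanityCheck {

  public static void dump(EntityManagerFactory emf) {
    // standard JPA: runtime properties and all managed entity types
    System.out.println("JPA properties: " + emf.getProperties());
    for (EntityType<?> entityType : emf.getMetamodel().getEntities()) {
      System.out.println("Known entity: " + entityType.getJavaType().getName());
    }
    // Spring-specific: the merged PersistenceUnitInfo, including the shared cache mode
    if (emf instanceof EntityManagerFactoryInfo) {
      EntityManagerFactoryInfo info = (EntityManagerFactoryInfo) emf;
      System.out.println("Shared cache mode: "
        + info.getPersistenceUnitInfo().getSharedCacheMode());
    }
  }
}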

Debugging JPA EntityManagerFactory setup

In the picture above we can see the following:

  1. We’re using Spring’s MergingPersistenceUnitManager.
  2. Persistence unit name is as expected.
  3. This is the path of “the root of the persistence unit”. It’s a directory here, but it would be a JAR in production.
  4. Our managed classes (collapsed here).
  5. Shared cache mode is set to ENABLE_SELECTIVE – as required.
  6. Other configured properties.

Luckily, we don’t rely on the persistence.xml root anymore, at least not for runtime configuration.

Putting the Spring configuration together

Now we’re ready to wrap up our configuration and move some of the repeated configuration snippets from persistence.xml into the Spring configuration – here we will use Java-based configuration (XML works too, of course).

Most of our properties were related to EclipseLink. I had read their weaving manual, but I still didn’t understand what works when and how. I had to debug some of the stuff to be really sure.

It seems that eclipselink.weaving is the crucial property namespace that should stay in our persistence.xml, because it gets used by the plugin performing the static weaving. I debugged the Maven build and the plugin definitely uses the property eclipselink.weaving.changetracking (we set it to false, which is not the default). Funny enough, it doesn’t need eclipselink.weaving itself, because running the plugin implies we want static weaving. During startup that property does get picked up, though, so EclipseLink knows it can treat the classes as statically weaved – which means it can be pushed into the programmatic configuration.

The rest of the properties (and the shared cache mode) are clearly used at startup time. The Spring configuration may then look like this:

@Bean public DataSource dataSource(...) { /* as usual */ }

@Bean
public JpaVendorAdapter jpaVendorAdapter() {
  EclipseLinkJpaVendorAdapter jpaVendorAdapter = new EclipseLinkJpaVendorAdapter();
  jpaVendorAdapter.setShowSql(true);
  jpaVendorAdapter.setDatabase(
    Database.valueOf(env.getProperty("jpa.dbPlatform", "SQL_SERVER")));
  return jpaVendorAdapter;
}

@Bean
public PersistenceUnitManager persistenceUnitManager(DataSource dataSource) {
  MergingPersistenceUnitManager persistenceUnitManager =
    new MergingPersistenceUnitManager();
  persistenceUnitManager.setDefaultDataSource(dataSource);
  persistenceUnitManager.setDefaultPersistenceUnitName("you-choose");
  persistenceUnitManager.setSharedCacheMode(
    SharedCacheMode.ENABLE_SELECTIVE);
  // default persistence.xml location is OK, goes through all classpath*

  return persistenceUnitManager;
}

@Bean
public FactoryBean<EntityManagerFactory> entityManagerFactory(
  PersistenceUnitManager persistenceUnitManager, JpaVendorAdapter jpaVendorAdapter)
{
  LocalContainerEntityManagerFactoryBean emfFactoryBean =
    new LocalContainerEntityManagerFactoryBean();
  emfFactoryBean.setJpaVendorAdapter(jpaVendorAdapter);
  emfFactoryBean.setPersistenceUnitManager(persistenceUnitManager);

  Properties jpaProperties = new Properties();
  jpaProperties.setProperty("eclipselink.weaving", "static");
  jpaProperties.setProperty("eclipselink.allow-zero-id",
    env.getProperty("eclipselink.allow-zero-id", "true"));
  jpaProperties.setProperty("eclipselink.logging.parameters",
    env.getProperty("eclipselink.logging.parameters", "true"));
  emfFactoryBean.setJpaProperties(jpaProperties);
  return emfFactoryBean;
}

Clearly, we can set the database platform, the shared cache mode and all the runtime-relevant properties programmatically – and we can do it in a single place. This is not a problem when there is a single persistence.xml, but in any case it offers better control. We can now use Spring’s @Autowired private Environment env; and override whatever we want with property files or even -D JVM arguments – and still fall back to default values – just as we do for the database property of the JpaVendorAdapter. Or we can use SpEL. This is flexibility that persistence.xml simply cannot provide.

And of course, all the things mentioned in the configuration can now be removed from all our persistence.xml files.

I’d love to get rid of eclipselink.weaving.changetracking in the XML too, but I don’t see any way to provide it as a configuration option of the Maven plugin, which we have neatly unified in our parent POM. That would eliminate some more repetition.
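
After this cleanup, each module’s persistence.xml shrinks to roughly the class list plus the build-time weaving properties. A sketch of what remains (class names are made up):

<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <persistence-unit name="you-choose" transaction-type="RESOURCE_LOCAL">
    <class>com.example.permission.Permission</class>
    <class>com.example.permission.RolePermission</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
      <!-- needed by the static weaving plugin during the build -->
      <property name="eclipselink.weaving.changetracking" value="false"/>
    </properties>
  </persistence-unit>
</persistence>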

Common base entity classes in separate JAR

Question 5 from our problem list is no real problem after all of the above, just a nuisance. EclipseLink refuses to weave our base class regardless of its @MappedSuperclass usage. But as mentioned in one SO question/answer, just add a dummy concrete @Entity class and we are done. You’ll never use it, so it is no problem at all. And you can vote for this bug.
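
A minimal sketch of such a dao-common module (the class names are mine, not from the project; javax.persistence imports omitted):

// BaseEntity.java - the only "real" content of the dao-common module
@MappedSuperclass
public abstract class BaseEntity {
  @Id protected Integer id;

  public Integer getId() { return id; }
}

// DummyWeavingEntity.java - never used at runtime; its only purpose is to give
// the static weaver a concrete @Entity in this module so BaseEntity gets woven too
@Entity
public class DummyWeavingEntity extends BaseEntity {
}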

This may not be an issue for load-time weaving or for Hibernate, but I haven’t tried that, so I can’t claim it reliably.

Cutting cyclic dependencies

Imagine we have two modules – users and permissions – where permissions depends on the users module. The users module contains the User and Role entities, while the permissions module contains Permission and the RolePermission association, which is mapped explicitly because it carries some additional information. See the following picture:

Modules `users` and `permissions`

Obviously, Role cannot depend on RolePermission because that would cross the module boundary in the wrong direction. But when a Role is deleted we also want to remove all the related RolePermissions (but not the Permissions, of course). Imagine we have RoleDao and RolePermissionDao, both derived from a layer supertype named BaseDao. This handy class implements all the common functionality, including a delete(...) method that calls a common preDelete(entity) method in the process. For convenience this is already pre-implemented, but it does nothing.

Were all the classes in a single module, our RoleDao could contain this:

@Override
protected void preDelete(Role entity) {
  QRolePermission rp = QRolePermission.rolePermission;
  new JPADeleteClause(entityManager, rp)
    .where(rp.roleId.eq(entity.getId()))
    .execute();
}

With separate modules this is not possible, because QRolePermission (just like RolePermission) is not available in the users module.

Sure, we can argue whether we have split the entities into modules properly – for instance, is Role really a concept belonging to User without considering permissions? However, after moving Role to the permissions module we would have a problem with the @ManyToMany relationship named roles on the User class – which is naturally the owning side of the relationship – as it would go against the module dependency.

When such a cycle occurs it may also indicate that one of the modules is too big. In our case we used User in the permissions module, but the entities themselves don’t really need it directly. It is used by the upper layers of the module, so maybe we should split the modules like this:

Reversing the `users` and `permissions` dependency by introducing another module.

Now the users module can depend on permission-persistence and bring RolePermission to where it really belongs.

In the following cases we will focus on the relationship between the Role and Permission entities – this time with the users module always depending on the permissions module (which more or less matches permission-persistence from the previous picture). Let’s see how we can model that many-to-many relationship and how it can cross module boundaries.

Unidirectional `@ManyToMany` follows the dependency.

First we use the implicit role_permission table without an explicit entity. Role is the natural owner of the information about its Permissions, and Permission doesn’t necessarily need the list of roles using it. It gets interesting when we introduce RolePermission explicitly as an entity – perhaps because there are additional attributes on that association.
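
Before making RolePermission explicit, the implicit variant from the first picture might look like this (a sketch; field and column names are illustrative, imports omitted):

// Role.java (users module) - owns the unidirectional relationship,
// the role_permission join table has no entity of its own
@Entity
public class Role {
  @Id private Integer id;
  private String name;

  @ManyToMany
  @JoinTable(name = "role_permission",
    joinColumns = @JoinColumn(name = "role_id"),
    inverseJoinColumns = @JoinColumn(name = "permission_id"))
  private Set<Permission> permissions = new HashSet<>();
  // ...
}

// Permission.java (permissions module) - knows nothing about Role
@Entity
public class Permission {
  @Id private Integer id;
  private String name;
  // ...
}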

Explicit `RolePermission` in `users`.
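
The explicit association entity then lives in the users module. A sketch consistent with the DAO code shown later (the raw roleId/permissionId columns are my assumption; @ManyToOne references would work as well):

// RolePermission.java (users module) - the explicit association entity
// owns both foreign keys of the role_permission table
@Entity
@Table(name = "role_permission")
public class RolePermission {
  @Id private Integer id;

  @Column(name = "role_id") private Integer roleId;
  @Column(name = "permission_id") private Integer permissionId;

  // ... plus the additional attributes that justified the explicit mapping
}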

Notice that the limitations of relational database design, dictating where to put the foreign key, are actually quite useful for deciding which way the dependency goes. The module knowing about the RolePermission table is the one that depends on the other one. If it weren’t that way, we would get an invalid case with a cyclic dependency like this:

Explicit `RolePermission` in `permissions` forms a cycle.

We either need to reverse the dependency between the modules or split one module into two, as shown previously with permissions. If we absolutely needed this dependency from users to permissions and also the “red arrow”, then something else in the users model depends on permissions – which is a case for splitting the users module. (This is rather unnatural; the previous case for splitting permissions was much better.)

In any of these cases we still have a problem when we delete the “blind” side of the relationship. In the pictures of the valid cases it was the Permission entity. Deletion of a Permission somehow needs to signal to the users module that it should remove the relations too. For that we can use the Observer pattern, which allows us to register an arbitrary callback with a DAO.

It will be implemented in the already mentioned BaseDao to make it generally available. The following example demonstrates how it can be used with the explicit RolePermission entity:

// in some dao-base module
public abstract class BaseDao<T> ... {
  private List<PreDeleteHook<T>> preDeleteHooks = new ArrayList<>();

  public void delete(T entity) {
    preDeleteHooks.forEach(h -> h.preDelete(entity));
    entityManager.remove(entity);
    ...
  }

  public interface PreDeleteHook<E> {
    void preDelete(E entity);
  }

  public void addPreDeleteHook(PreDeleteHook<T> hook) {
    preDeleteHooks.add(hook);
  }
  ...
}

// in users module
public class RolePermissionDao extends BaseDao<RolePermission> {
  public static final QRolePermission $ = QRolePermission.rolePermission;

  @Autowired PermissionDao permissionDao; // from downstream permissions module, OK

  @PostConstruct public void init() {
    permissionDao.addPreDeleteHook(this::deleteByPermission);
  }

  // this method "implements" PreDeleteHook<Permission>
  public void deleteByPermission(Permission permission) {
    new JPADeleteClause(entityManager, $)
      .where($.permissionId.eq(permission.getId()))
      .execute();
  }
  ...
}

// no special changes in PermissionDao, it can't know about the users module anyway
Using an unmanaged association table with @ManyToMany would require finding all the Roles having this permission and removing it from their permissions collections. I, again, vouch for the more explicit way, as “delete all relations that use this permission” is simpler and likely more efficient.
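
For contrast, a sketch of what the pre-delete hook would have to do with the unmanaged @ManyToMany (assuming Querydsl 4 and a generated QRole with a permissions collection path; names are mine):

// pre-delete hook for the unmanaged @ManyToMany variant (users module)
public void deleteByPermission(Permission permission) {
  QRole role = QRole.role;
  List<Role> roles = new JPAQuery<>(entityManager)
    .select(role)
    .from(role)
    .where(role.permissions.contains(permission))
    .fetch();
  for (Role r : roles) {
    // removing from the owning side's collection deletes the join table row on flush
    r.getPermissions().remove(permission);
  }
}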

Conclusion

  • JPA clearly wasn’t designed with modularity in mind – especially not the case when multiple modules form a single persistence unit, which is a perfectly legitimate use. Just because we use one big legacy database doesn’t mean we don’t want to factor our code into smaller modules.
  • It is possible to distribute persistence classes into multiple JARs, and then:
    • We can go with a single union persistence.xml, either downstream or upstream – this depends on whether we need it only at runtime or during the build too.
    • I believe it is more proper to pack a partial persistence.xml into each JAR, especially if we need it during the build. Unfortunately, there is no escape from repeating some upstream classes in the module again just because they are referenced in relations (a typical culprit when some error makes you scream “I don’t understand how this is not an entity, when it clearly is!”).
  • If we have multiple persistence.xml files, it is possible to merge them using Spring’s MergingPersistenceUnitManager. I don’t know whether it can be used in non-Spring applications, but I saw the idea reimplemented and it wasn’t that hard. (If I had to reimplement it, I’d try to merge the configuration part too!)
  • When we are merging persistence.xml files it is recommended to minimize the configuration in them so it doesn’t have to be repeated. E.g., for EclipseLink we leave in only the stuff necessary for build-time static weaving; the rest is set programmatically in our Spring @Configuration class.

There are still some open questions, but I think they lead nowhere. Can I use multiple persistence units with a single data source? That way I could have each persistence.xml as a separate unit. But I doubt relationships would work across them, and the same goes for transactions (without XA, that is).