
Internal vs External Models

Up to this point in the book, the focus has been on handling versioning issues of internal models. In practice, however, it is common for a system to have both internal and external models, each serving different needs. The versioning patterns apply to both, but external models differ materially from internal models in how they are handled.

External models do not necessarily imply dealing with, say, a third party. It is quite common in an organization for a group of services to interact with each other through an internal model but to interact with all other integrations through a less granular external model.

An example of this can be seen in a medical system. Pharmacy, Medical Record Data, and Patient Data may all be implemented as individual services internally but the rest of the system interacts with them as a group through an external model.

The other systems may even be developed by the same teams. Often the choice between an internal and an external model is not team based but focused on attributes such as the stability of the contract and knowledge hiding, though there are other reasons an external model may be chosen.

External Integrations

One of the most common reasons to add an external integration model is to conform the system to an existing standardized integration model. It is not always possible to choose the integration model that others will consume.

At the same time, when you integrate with an external model it is not always a good idea to use the external model as the system’s internal model. Often there are things that the system needs to support that are not in the shared integration model. As shown later in the chapter, the external model will also often want things at a different granularity than the internal system does.

Examples of external integration models include the Global Justice XML model, HL7v2 for medical data in the UK, or FIX, which is one protocol among many in financial systems.

These protocols tend to be created by committee. They are often massively complex and try to handle every possible situation regardless of how unlikely it is. The complexity of trying to normalize all of the medical coding in disparate health care systems is easy to imagine.

//TODO add example

Supporting these protocols can offer a lot of benefit overall. They allow for modularity between systems and let other systems integrate easily with yours. Often there are even off-the-shelf tools that can be leveraged if you support the canonical protocol they support.

Not all external models come from external integrations, however. It is quite common to treat integrations with other services outside of a tight service group as an external integration because of the differences in how internal and external models are treated.

Granularity

One of the largest differences between internal and external models is the level of granularity of the events that are provided for the model. Internal and external models have drastically different needs and when looking through the API published via each there will be large differences in the granularity of information provided on an event.

Internal models tend to have very fine-grained events. This is largely due to the separation between services and the focus on a use-case being represented by an event.

OrderPlaced : {
    orderId : 8282,
    customerId : 4432,
    products : [5757, 3321],
    total : 17.95
}

In the simplified example of an internal OrderPlaced event above, much of the information provided is normalized. The customer and the products are represented only by identifiers. For a consumer to interact with this event, it must either already know who customer 4432 is or not be interested in that information.

When dealing with internal models, the service boundaries chosen for the system tend to be visible within the events in the model. Generally, identifiers are used across service boundaries rather than denormalization. The inverse is also true, which is why using an external model as your internal model has a tendency to lead towards a more monolithic system.

A consumer of this model would need to listen to multiple events such as ProductCreated and CustomerCreated. It would then likely require some form of intermediate storage to remember who customer 4432 was if it wanted to also look into customer information associated with the order. A common place in systems to see such a behaviour off an internal model is a reporting database.
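A minimal sketch of such a consumer, assuming events arrive as a type name plus a dictionary payload. The `handle` function, the `name` field on CustomerCreated, and the in-memory `customers` dictionary are hypothetical and only illustrate the intermediate storage involved:

```python
# Hypothetical reporting consumer; event shapes beyond OrderPlaced are assumptions.
customers = {}      # intermediate storage keyed by customerId
report_rows = []    # stands in for a reporting database table


def handle(event_type, data):
    if event_type == "CustomerCreated":
        customers[data["customerId"]] = data["name"]
    elif event_type == "OrderPlaced":
        # OrderPlaced carries only the id, so the name comes from storage.
        name = customers.get(data["customerId"], "unknown")
        report_rows.append((data["orderId"], name, data["total"]))


handle("CustomerCreated", {"customerId": 4432, "name": "John Doe"})
handle("OrderPlaced", {"orderId": 8282, "customerId": 4432,
                       "products": [5757, 3321], "total": 17.95})
```

The consumer has to see CustomerCreated before OrderPlaced to produce a useful row, which is exactly the burden an external consumer would rather not carry.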

External models generally prefer more coarse-grained events. An external model will generally denormalize relevant information onto the event. The reasoning behind this is simple: the external consumer wants to listen to a single event and be able to act upon it. It prefers not to have to listen to multiple events while storing intermediary information.

OrderPlaced : {
    orderId : 8282,
    customer : {
        customerId : 4432,
        customerName : 'John Doe',
        postalCode : 'EC1-001',
        customerAge : 35,
        customerStatus : 'Gold'
    },
    products : [
        {name : 'razor blade ice cream', productId : 6565},
        {name : 'thing you should never eat', productId : 4242}
    ],
    total : 17.95
}

In the example of the external model event instead of only providing ids, the information has been denormalized into the event itself. There is not only a customerId but the information this system deems to be relevant about the customer. Note that this is just an example and I hope such a system would not be as naive as this one.

A consumer however can subscribe to this event and act directly upon it without the need to subscribe to other events. For an outside consumer this is far simpler than needing to subscribe to and understand the interactions of multiple events.

The external model is providing a level of indirection from the internal model. It is also hiding the details of how the internal model works. Much like with any other form of abstraction it allows the internal details of the system to change without breaking the consumer. If the internal model wanted to add a new use case for a ManagerialOverridedProductName, the external consumers of the API would not need to be aware of it as they would still receive just the OrderPlaced with the product information associated.

At the same time, the internal system is free to change its service boundaries or how it handles its internal eventing without affecting the external consumers. Said simply, the external model provides a level of indirection from the internal model.

Rate of Change

Levels of indirection are often useful and are used regularly in code. An interface, a function being passed, and a facade all represent methods of indirection. All can provide reusability as well as information hiding.

As with most cases of indirection, the associated contract generally requires more design work. Changes to such a contract tend to be expensive in comparison to other types of changes in software, because a change to the contract requires a change in every user of the contract as well. When all of the consumers are in the same codebase this can be automated with tools such as IntelliJ, but when they are not, the expense can escalate quickly.

External models have the same tradeoffs associated with them. Changes to external models will possibly affect all consumers of the external model. For this reason external models generally are more formalized than internal models and changes to external models generally go through further diligence than changes to internal models.

External models will generally be designed up front with the idea of extensibility. They will normally be formalized to media types or described in a schema language. Interactions will generally be well documented for consumers.

Changes to external models often happen at a glacial pace unless the model has been specifically designed for extensibility. Breaking changes to external models rarely if ever happen. I have seen numerous projects utilizing an external model with embarrassing spelling mistakes in it that the team will keep forever because they loathe the idea of introducing a possibly breaking change into their model. The costs of a breaking change far outweigh the benefits of removing the embarrassing spelling mistakes.

Due to the worry of breaking changes, many external models are built using weak-schema, so that adding things to the external model will not break consumers. External models are almost never built using the type-based versioning discussed earlier, though many frameworks push developers to do this.
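One way to picture weak-schema handling is sketched here with JSON and a hypothetical `read_order_placed` function: the consumer picks out the fields it knows by name, supplies a default for anything missing, and ignores everything else. The `currency` and `giftWrap` fields and the default value are invented for the illustration.

```python
import json


def read_order_placed(raw):
    """Map a JSON OrderPlaced onto the fields this consumer cares about."""
    data = json.loads(raw)
    return {
        "orderId": data["orderId"],                # required by this consumer
        "total": data["total"],                    # required by this consumer
        "currency": data.get("currency", "GBP"),   # later addition, so defaulted
    }


old_message = '{"orderId": 8282, "total": 17.95}'
new_message = '{"orderId": 8282, "total": 17.95, "currency": "EUR", "giftWrap": true}'

old_order = read_order_placed(old_message)   # missing field falls back to the default
new_order = read_order_placed(new_message)   # unknown giftWrap field is ignored
```

Both the older and the newer message are readable by the same consumer, which is why additive changes are not breaking under this approach.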

When an event is pushed through a stage that either produces a similar event for the next stage or enriches the same event, deserialization to a type is generally avoided. Instead, a rule is put in place that “things that you do not understand” must be copied to your output. This avoids a change being required in the stage when an additive change is made.

It is common to use a negotiation model, such as the one in the Hypermedia chapter, in association with an external model to further isolate external consumers from changes to the external model.
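The copy-through rule can be sketched as follows, assuming JSON messages. The `enrich_order` function and the `loyaltyPoints` field are hypothetical; the point is only that the stage works on a generic mapping rather than a fixed type, so fields it has never heard of survive.

```python
import json


def enrich_order(raw, customer_status):
    """Hypothetical stage: add customerStatus, pass everything else through."""
    event = json.loads(raw)          # plain dict, not a fixed type
    enriched = dict(event)           # copies every field, understood or not
    enriched["customerStatus"] = customer_status
    return json.dumps(enriched)


# loyaltyPoints was added upstream after this stage was written.
incoming = '{"orderId": 8282, "total": 17.95, "loyaltyPoints": 120}'
outgoing = json.loads(enrich_order(incoming, "Gold"))
```

Had the stage deserialized into a class with only `orderId` and `total`, the `loyaltyPoints` field would have been silently dropped for every consumer downstream.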

Internal models are the opposite of this. If you have a small cluster of services that are under control of the same group, changes are not nearly as dire as when dealing with a large number of external consumers.

The introduction of an external model to the system can prune many of the consumers of the internal model. This pruning allows more agility in dealing with the internal model, as there are fewer coupling points to consider. A closely related set of services and a read model or two can evolve quite quickly together without having drastic effects on the majority of downstream consumers.

How to Implement

When faced with an internal vs. an external model decision, many developers want to stick to a single model. It is natural to assume that having one model will be simpler than maintaining two. The focus is generally on either making the external model the internal model or vice versa.

In practice this can introduce a large amount of complexity to the system. The two models have different purposes and quite different levels of granularity. Another common issue arises here: what if the system needs to support multiple external models, say HL7v2 and HL7v3 or some other protocol for medical information?

A common implementation to introduce an external model is to introduce a service that acts as a Message Translator. This service listens to events happening on the internal model. It may or may not have intermediary storage in order to denormalize information to the external model.
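A rough sketch of such a translator, under stated assumptions: the `OrderTranslator` class, its `publish` callback, and the `name` fields on CustomerCreated and ProductCreated are hypothetical, and the intermediary storage is a pair of in-memory dictionaries rather than a real database.

```python
class OrderTranslator:
    """Hypothetical Message Translator from internal to external events."""

    def __init__(self, publish):
        self.publish = publish      # callable that sends an external event
        self.customers = {}         # intermediary storage for denormalizing
        self.products = {}

    def on_internal_event(self, event_type, data):
        if event_type == "CustomerCreated":
            self.customers[data["customerId"]] = data
        elif event_type == "ProductCreated":
            self.products[data["productId"]] = data
        elif event_type == "OrderPlaced":
            self.publish("OrderPlaced", self._to_external(data))
        # Any other internal event is ignored and never reaches external consumers.

    def _to_external(self, order):
        customer = self.customers[order["customerId"]]
        return {
            "orderId": order["orderId"],
            "customer": {
                "customerId": customer["customerId"],
                "customerName": customer["name"],
            },
            "products": [
                {"name": self.products[p]["name"], "productId": p}
                for p in order["products"]
            ],
            "total": order["total"],
        }


published = []
translator = OrderTranslator(lambda event_type, data: published.append((event_type, data)))
translator.on_internal_event("CustomerCreated", {"customerId": 4432, "name": "John Doe"})
translator.on_internal_event("ProductCreated", {"productId": 5757, "name": "razor blade ice cream"})
translator.on_internal_event("ProductCreated", {"productId": 3321, "name": "thing you should never eat"})
translator.on_internal_event("ManagerialOverridedProductName", {"productId": 5757})
translator.on_internal_event("OrderPlaced", {"orderId": 8282, "customerId": 4432,
                                             "products": [5757, 3321], "total": 17.95})
```

Only one coarse-grained OrderPlaced leaves the translator. The internal ManagerialOverridedProductName event is absorbed here, so external consumers never need to learn of it.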

There are a few benefits to using this strategy. The first is that all of the code associated with the translation to the external model is located in a single service. The rest of the services know nothing of the external model. A secondary benefit is that it is quite reasonable to introduce several of these services to support multiple external models.

Supporting multiple external models is then just a matter of adding another translator. For systems that are deployed on a per-client basis, the translator services often become optional components of the system, where zero or many may be deployed for a given client.

Another consideration to keep in mind is whether to buy or build the translator. For widespread protocols there are often existing adapters built into existing messaging products. These adapters often handle the trickier aspects of the external model, such as validation and normalization of data within the messages. I know of a client who uses BizTalk solely for the HL7 integration. They don’t actually use it for handling their own messaging, only as a translator.

Summary

There are many differences between internal and external models. Smaller systems can usually get away with only using an internal model. As the system's size and complexity grow, however, it is often necessary to introduce levels of indirection via an external model. External models are often used even for other services maintained by the same team. The distinction is how the model is used, not who uses it.

Internal and external models operate at quite different levels of granularity. Internal models tend to be very fine-grained while external models are more coarse. This difference is due to how consumers of the model view them and how they wish to use the model. Attempting to make a more coarse-grained internal model to also be used as an external model will likely start to blur your service boundaries.

Rate of change and overall risk of change are quite different between the two models as well. External models tend to be much more conservative in how they approach change compared to internal models. A common pattern is to introduce an external model to allow the internal model to be more agile with regard to change.
