General Versioning Concerns

There are many general concerns around creating and managing events that are orthogonal to any form of serialization. Over the years I have seen many teams get into trouble: although they had a good grasp of the issues that come up with varying forms of serialization and interchange, they made mistakes in other, more general areas.

This chapter will focus on those more general areas of versioning in an Event Sourced system. Many apply to messaging systems in general, but some become more important when faced with the need to rebuild the history of a system by replaying its events. These areas apply whichever serialization mechanism is chosen, though many can be fixed after the fact using patterns found later in the book.

Versioning of Behaviour

The versioning of behaviour is probably the most common question people new to Event Sourcing run into. Often teams will push forward on a project without having understood an appropriate answer to this question, which can lead to interesting places. The question is typically phrased in terms similar to “How do I version the behaviour of my system as my domain logic changes?”

The thought here is that the domain logic used when hydrating state to validate a command can change over time. If you replay events to get to current state, how do you ensure that the same logic that was used when producing an event is also used when applying it to hydrate a piece of state?
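To make the question concrete, hydration in an Event Sourced system is just a left fold over the stream: start from empty state and apply each event in order. The sketch below is illustrative only; the event type and the tax logic baked into the handler are assumptions for the example, and they set up exactly the problem discussed next.

```csharp
using System;
using System.Collections.Generic;

// A hypothetical event type for illustration.
public record ItemSold(decimal Subtotal);

public class Order {
    public decimal Tax { get; private set; }

    // Hydration is a left fold over the stream: start from empty
    // state and apply each event in order.
    public static Order Hydrate(IEnumerable<ItemSold> events) {
        var order = new Order();
        foreach (var e in events) order.Apply(e);
        return order;
    }

    public void Apply(ItemSold e) {
        // Domain logic lives in the handler - if this rate ever
        // changes, replaying history will silently rewrite the past.
        Tax = e.Subtotal * 0.18m;
    }
}
```

Because the calculation runs on every replay, any change to it retroactively changes what every historical event means, which is the trap explored below.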

The answer is: you don’t.

The basis of the question misses a lot. The quintessential example of such a case is a tax calculation. When hydrating your piece of state you could use something like the following, written in a standard OO style of structuring the domain object.

public void Apply(ItemSold e) {
	_tax = e.Subtotal * .18;
}

This will work fine for handling a tax of 18%. It is, however, a time bomb waiting to go off. What will happen when it is decided that as of Jan 1, 2017 the tax will be 19%? The easy answer would be to go back and change your method.

public void Apply(ItemSold e) {
	_tax = e.Subtotal * .19;
}

When a projection is replayed, what will happen? Remember that the domain object is a projection, the same as a read model; it just happens to be building a piece of state used in the domain model. This issue affects all projections. The next time an event gets replayed through this handler, it will take a transaction that occurred on June 17, 2016 and say the tax is 19% of the subtotal. Do you call up all your customers from previous years and ask them to send the difference in tax? The rule starts from Jan 1, 2017; it does not apply to previous events.

Many people fall into the trap of putting logic for this into their handler, trying to version the business logic.

public void Apply(ItemSold e) {
	if(e.Date < Convert.ToDateTime("1/1/2017")) {
		_tax = e.Subtotal * .18;
	} else {
		_tax = e.Subtotal * .19;
	}
}

The logic now looks at the datetime of the event and decides which business logic should be used. What about after five years? What about seven rules? This code very quickly spirals out of control. It is also important to remember that this code needs to be duplicated in every projection that wants to look at tax on this particular event.

In single-temporal systems this is usually a bad idea. In multi-temporal systems such as accounting, which is bi-temporal (transactions do not necessarily apply at the time they are accepted), there are times where it is required to implement such things.

Instead of doing this, enrich the event with the result of the tax calculation at the time it is created. In other words, put a Tax property on the event and set it when creating the event (in the command context, or in the input-event context if the system deals only in events).

public void SellItem(...) {
	//snip
	var tax = subtotal * .18;
	Apply(new ItemSold(_id,....,tax));
}

Later, when the tax changes, a new version of the software can be released to support the new tax rate.

public void SellItem(...) {
	//snip
	var tax = subtotal * .19;
	Apply(new ItemSold(_id,....,tax));
}

Anything that wants to process the event now just looks at its Tax property and will receive the appropriate value without the need to recalculate it.
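With the tax enriched onto the event, an Apply handler (and every other projection) becomes a simple copy with no business logic left to version. A minimal sketch, assuming a hypothetical event shape with the Tax property already on it:

```csharp
using System;

// Hypothetical enriched event: Tax was calculated when the event was created.
public record ItemSold(Guid Id, decimal Subtotal, decimal Tax);

public class OrderProjection {
    public decimal Tax { get; private set; }

    public void Apply(ItemSold e) {
        // No recalculation and no date-based branching: just read the
        // value that was true at the time the event occurred.
        Tax = e.Tax;
    }
}
```

Replaying this projection is now deterministic regardless of how many times the tax rate changes in the software.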

Note that in multi-temporal systems it is sometimes necessary to version domain logic, and it may be required to make calls to external systems as events are processed. This is due to the nature of an event not necessarily applying to “now” when it is created. I cannot, for example, always determine what a lookup value will be in the future and may need to ask at that future time. When integrating with such systems it is best if those systems are temporal in nature as well.

Exterior Calls

Very closely related to the versioning of domain logic is dealing with calls to external systems. Most developers figure out quickly that invoking actions on external services in the context of any projection is a really bad idea. Try to imagine charging somebody’s credit card; now let’s replay that projection! External calls that merely request information have a tendency to slip through, though.

If you have an external call getting information from another service and you later do a replay, will the projection produce the same output? What if the data in the external service has changed?

Handling external calls is quite similar to the versioning of domain logic. Make the call at the time of event creation and enrich the event with the information returned from the call. This allows for deterministic replays, as the projection (whether a domain object or a read model) is self-contained again. All the information needed to build the state is located in the event stream. Every replay will come out the same.

public void SellItem(...) {
	//snip
	var customerScore = _externalService.GetCustomerScoreFor(customer);
	Apply(new ItemSold(_id,....,customerScore));
}

This is the ideal world, but you can’t always do this …

There are times with external integrations where the cost of storing the information may be high, or where due to legal restrictions the information is not allowed to be stored with the event. I have seen a few such cases over the years. The best example was customer credit information: it was contractually disallowed to store the information that came back from a credit request with the event. As such, there were places that when viewed would need to call out to get the credit score. The projection itself was replayable, but from a user’s perspective the information was non-deterministic. This was understood and accepted due to the contractual obligation.

A related problem happens when projections start doing lookups into other projections. As an example, there could be a customer table managed by a CustomerProjection and an orders table managed by an OrdersProjection. Instead of having the OrdersProjection listen to customer events, the decision was made to have it look up the current Customer Name in the CustomerProjection to copy into the orders table. This can save time and duplication in the OrdersProjection code.

On a replay of only the Orders projection, all of the orders will end up with the current Customer Name, not the Customer Name from the time the original order was placed. A replay will produce different results than the original. This is the tradeoff for the reduced code: the two projections now have a coupling.

When sharing information in this way, the two projections can no longer be replayed independently; they instead need to be replayed as a group. It is quite easy to end up with an unmanageable web of dependencies like this, though it can save a lot of code in some circumstances.

A common pattern to get around this is to consider the entire read model a single projection and replay it as a whole. Replaying all the projections for a given database together returns the situation to being deterministic. While you may be rebuilding things you don’t need to, in most systems the performance hit is far less costly than the conceptual cost of understanding the web of dependencies between things. Remember as well that the replay is asynchronous; even if it takes 12 hours, it is not that big of a deal.

In a very large system, replaying the entire model might be too costly; imagine a 10 TB database. In this case the choice is forced between adding complexity to each projection to remove all dependencies between them, or managing the dependencies. The latter is normally done by placing projections into Projection Groups for replay, but a dependency can easily be overlooked, so this should be treated as a last resort. It is easier and cleaner in most cases to remove the dependencies between the projections.
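The Projection Group idea can be sketched as making the group, not the individual projection, the unit of replay, so coupled projections are always reset and rebuilt together. The interface and class names here are illustrative, not a standard API:

```csharp
using System;
using System.Collections.Generic;

public interface IProjection {
    void Reset();          // drop this projection's tables/state
    void Apply(object e);  // handle a single event
}

// A group is replayed as one unit so lookups between its members
// always see data rebuilt from the same point in the stream.
public class ProjectionGroup {
    private readonly List<IProjection> _members = new();
    public void Add(IProjection p) => _members.Add(p);

    public void Replay(IEnumerable<object> allEvents) {
        foreach (var p in _members) p.Reset();
        foreach (var e in allEvents)
            foreach (var p in _members) p.Apply(e);
    }
}

// A trivial member used to illustrate usage.
public class CountingProjection : IProjection {
    public int Count { get; private set; }
    public void Reset() => Count = 0;
    public void Apply(object e) => Count++;
}
```

The cost is exactly what the text warns about: the group boundary must be maintained by hand, and a forgotten dependency silently reintroduces non-determinism.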

As mentioned when dealing with tax in the versioning of behaviour, a place to watch out for when changing the results of calculations is that doing so may also change the semantic meaning of a given property of an event.

Changing Semantic Meaning

One important aspect of versioning is that semantic meaning cannot change between versions of software. There is no good way for a downstream consumer to understand a semantic meaning change.

The quintessential example of a semantic meaning change is that of a thermostat sensor.

  • 27 degrees.
  • 27 degrees.
  • 27 degrees.
  • 80.6 degrees.
  • 80.6 degrees.

Did it just get very hot around the thermostat, or did the thermostat switch its measurements from Celsius to Fahrenheit? How would a downstream consumer be able to tell the difference?

When a producer changes the semantic meaning of a property of an event, all downstream consumers must be updated to support the new semantic meaning or to be able to differentiate between the multiple meanings. This becomes far worse in Event Sourced systems, as replays will need to happen in the future.

In the temperature example, how could a future projection replay that series of events to derive some new information or a new read model? In order to do so it would need to understand that at a certain point in time the temperature value changed from Celsius to Fahrenheit. It would need an if statement similar to:

if(event.Date < DateItChanged) {
	realTemp = event.Temperature;
} else {
	realTemp = (event.Temperature - 32) * 5/9;
}

To stress the point, this would need to be in every consumer of the thermostat events. When there is only one it does not seem that horrific; what happens when there are five? How easy would it be to subtly forget one of them? Very quickly this becomes completely unmanageable, and a more extreme measure will need to be taken, such as the Copy-Replace pattern discussed later in the text.

Snapshots

One area where developers tend to run into versioning issues is dealing with the versioning of snapshots. In Event Sourced systems it is common to think of all state as being transient. Domain objects or state in a functional system can be rebuilt by replaying a stream of events.

When the stream of events gets large, say 1,000 events, it is beneficial to begin to save the state of a replay at a given point. This allows for faster replays. As an example, a snapshot could be taken at event 990 of the stream; a replay to get current state would then only involve loading the snapshot at event 990 and playing forward 10 events, as opposed to replaying all 1,000.
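The mechanics can be sketched as: load the latest snapshot, then apply only the events recorded after it. This is a minimal illustration; the event, snapshot shape, and state tracked are assumptions for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event and snapshot shapes for illustration.
public record ItemSold(decimal Tax);
public record Snapshot(long Version, decimal TotalTax);

public class Order {
    public decimal TotalTax { get; private set; }
    public long Version { get; private set; }

    // Start from the snapshot's state, then play forward only the
    // events that occurred after the snapshot was taken.
    public static Order Load(Snapshot snapshot, IReadOnlyList<ItemSold> stream) {
        var order = new Order { TotalTax = snapshot.TotalTax, Version = snapshot.Version };
        foreach (var e in stream.Skip((int)snapshot.Version))
            order.Apply(e);
        return order;
    }

    private void Apply(ItemSold e) {
        TotalTax += e.Tax;
        Version++;
    }
}
```

Note that the snapshot is just a cached fold of the stream up to some version; it can always be thrown away and rebuilt from the events, which is what the versioning discussion below relies on.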

It is worth noting that due to the conceptual and operational costs associated with persisted snapshots they are often not worth implementing. Even replaying 1000 events can be done within an acceptable response time for many systems.

Snapshots, however, have all the typical versioning problems found when versioning structural data. If the requirements on a snapshot change over time, it is likely that it will not be able to be upgraded but will instead need to be rebuilt. Suppose that previously a state used by the domain contained

{
   "id" : 848484,
   "activated" : false
}

and a change in logic requires that the state must now also contain the name of the inventory item.

{
   "id" : 848484,
   "activated" : false,
   "name" : "Gran Canaria Shirt"
}

The information about the name is in the events that have been previously written to the stream; the new state as of event 990 would differ from the old snapshot at event 990, provided the name was being tracked in the events previously. There is no way to “upgrade” the state as of event 990; instead, the entire stream must be replayed to build the new state that also tracks the name.

The normal way of handling this is to rebuild the entire snapshot and later delete the previously generated one. On a system where downtime is acceptable this is relatively easy to do: bring the system down, recreate the snapshots, bring the system back up. This can also be done via a Big Flip.

When running side-by-side versions, the versioning of snapshots can be a bit more complex, as it is required to maintain both version 1 and version 2 snapshots at the same time. The general strategy is first to build the new versions of the snapshots next to the old versions. Once the snapshots are built, the new version of the software is released. The new version of the software runs next to the old, and each uses its own version of snapshots.
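One way to sketch this side-by-side arrangement is to make the snapshot schema version part of the snapshot's identity, so v1 and v2 of the software each read and write their own copies. The store below is illustrative naming, not a prescribed API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Snapshots are keyed by (stream, schema version), so two software
// versions running side by side never clobber each other's state.
public class SnapshotStore {
    private readonly Dictionary<(string Stream, int SchemaVersion), string> _store = new();

    public void Save(string stream, int schemaVersion, string json) =>
        _store[(stream, schemaVersion)] = json;

    public string? TryGet(string stream, int schemaVersion) =>
        _store.TryGetValue((stream, schemaVersion), out var json) ? json : null;

    // Deprecating an old software version later frees its snapshots.
    public void DeleteVersion(int schemaVersion) {
        foreach (var key in _store.Keys.Where(k => k.SchemaVersion == schemaVersion).ToList())
            _store.Remove(key);
    }
}
```

DeleteVersion is the step the following paragraphs caution about: it should only run once nothing depends on that schema version, including any planned downgrade.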

At some point later, the old version of the software and its associated snapshots will be deprecated. The software version dependent on the snapshots is removed first; once nothing is using the snapshots any longer, the snapshots can be deleted as well.

One thing to watch out for is how quickly to deprecate the snapshots. Once the snapshot data is deleted, it can become a very expensive operation to rebuild it. The rebuilding is also a prerequisite for attempting to downgrade to an older version, as the older version is dependent on the snapshots it uses. It is often a good idea to keep older snapshots around for a while, if only to allow the ability to downgrade to an older version of the software in case of bugs, etc.

Another issue worth considering is that maintaining multiple versions of snapshots not only carries some complexity but also requires storing multiple copies of the information. While most systems will not run into a storage problem from keeping multiple snapshot versions, it may become an issue in some systems.

Avoid ‘And’

For people who are new to building Event Sourced or event-based systems in general, it is common to want to use the word “And” in an event name. However, this is an anti-pattern that leads to temporal coupling and is likely to become a versioning issue in the future.

A common example of this would be “TicketPaidForAndIssued”. Today the system will handle the payment settlement and the issuing of a ticket at the same time. But will it tomorrow? What would happen if these two things were to become temporally separated?

Developers new to Event Sourced systems have a tendency to think about these systems as being input->output. Under most circumstances, this is actually true. The system receives the command “PlaceOrder” and will then produce an event of “OrderPlaced”. This is not, however, always the case. Event Sourced systems have two sets of Use Cases: those you can tell the system to do, and those the system has said it has done. They do not always align.

A common example of this is looking at a stock market (or any other type of system where orders are matched). We can tell the system that we would like to “PlaceOrder”. The system also has a “TradeOccurred” event that occurs when orders match up based on price. There is, however, no “PlaceTrade” command we can send to the system.

For instance, let’s say we’re given a certain number of people who want to buy something:

Volume  Price  Instrument
100     $1.00  something
500     $0.99  something
500     $0.99  something
800     $0.98  something

In the stock market there are existing orders waiting to be filled. When you place an order, the system checks the orders that exist within the market to see if any match based upon price. If there are existing orders that match based upon price, trades will occur. As an example, if there is an order to sell 100@$1.00 and an order is placed to buy 100@$1.00, a trade will occur for 100@$1.00.

If we were to execute a command to sell 400@$0.99 of something against the book above, the system would first cross the order with the 100@$1.00 (at a price of $0.99, depending on the market). After crossing 100 of the 400 we wanted to sell, it would then cross 300@$0.99 with the next order. As such, from an outside perspective, it would be viewed as:

  • PlaceOrder 400@$0.99 something
  • TradeOccurred 100@$1.00 something
  • TradeOccurred 300@$0.99 something

We could also end up in a slightly different scenario:

  • PlaceOrder 400@$0.99 something
  • TradeOccurred 100@$1.00 something
  • OrderPlaced 300@$0.99 something

If fewer shares were available at $0.99:

  • PlaceOrder 400@$0.99 something
  • TradeOccurred 100@$1.00 something
  • OrderCancelled
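The crossing walk-through above can be sketched as a simple price-priority match against resting buy orders. This is a toy illustration of why one input can yield several outputs, not a real matching engine; all names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record RestingBuy(int Volume, decimal Price);
public record TradeOccurred(int Volume, decimal Price);

public static class Matcher {
    // Cross an incoming sell order against resting buys, best price first.
    public static List<TradeOccurred> Sell(int volume, decimal limit, List<RestingBuy> book) {
        var trades = new List<TradeOccurred>();
        foreach (var buy in book.OrderByDescending(b => b.Price)) {
            if (volume == 0 || buy.Price < limit) break; // no more crossing possible
            var fill = Math.Min(volume, buy.Volume);
            // Trading at the resting order's price here; real markets vary.
            trades.Add(new TradeOccurred(fill, buy.Price));
            volume -= fill;
        }
        // Any remainder would become an OrderPlaced (or an OrderCancelled,
        // depending on the order type): one command, several possible events.
        return trades;
    }
}
```

Running this against the book from the text with a sell of 400@$0.99 yields the two TradeOccurred events shown in the first scenario.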

It is important to remember that stimulus and output are not necessarily the same set of use cases. There exist many cases where the inputs and outputs of the system differ both in number and in name. Under usual circumstances they tend to be aligned, but this is not a given.

Coming back to the use of “And” in an event name: it forces a temporal coupling between the two things that are happening. If, later, these two concepts are split apart, there will be a messy versioning issue, likely requiring a Copy-Replace (to be discussed later) to be made. In short, do not use the word “And” in an event name; use two events instead.