
Basic Type Based Versioning

Throughout this chapter, we will look at how to version events in an Event Sourced system. We will, however, take on a few constraints that make the problem both more difficult and more unrealistic. We will assume that we are forced to use a serializer that has no support for versioning (such as a .NET or Java binary serializer).

Even in such an unusual case, we can still handle versioning, but it may result in unusual conditions, depending on the circumstances.

In order to look through a concrete example, I will use the SimpleCQRS example. In particular, we will focus on how to version a change to the Deactivate behavior.

Getting Started

Most developers have dealt with versioning at some level before. For instance, when releasing a new version of software, it is common to upgrade a SQL database from one schema to another. When dealing with such changes, developers need to consider the differences between breaking and non-breaking changes. Consider the following changes:

Adding a table
Renaming a column
Deleting a column
Adding a view
Deleting a stored procedure
Adding a stored procedure

Normally, additive changes are non-breaking, though I once heard a funny story about a table named “foo” that put all the stored procedures into a special “debug mode”. We should strive, however, to make additive changes non-breaking.

As such, when dealing with versioning, we will insist on only having additive changes as opposed to destructive ones. We will always add a new version of an event. Accordingly, both destructive changes, such as the renaming of a property, as well as additive changes can be made when adding a new version of an event, since the addition of that new event still represents a change that is additive overall.

In the code example, we start with an event that looks like:

public class InventoryItemDeactivated : Event {
    public readonly Guid Id;

    public InventoryItemDeactivated(Guid id)
    {
        Id = id;
    }
}

The requested change is to record the reason why the user deactivated the inventory item, and we will also rename the Id property to ItemId. This is allowed as it is still an additive change, since it introduces a new version of the event and does not change the existing one.

To start, let’s add a new version of the event. Note that if we were to have started the system with this type of versioning in mind, we would have used the versioned naming format from the beginning.

public class InventoryItemDeactivated_v1 : Event {
    public readonly Guid Id;

    public InventoryItemDeactivated_v1(Guid id)
    {
        Id = id;
    }
}

public class InventoryItemDeactivated_v2 : Event {
    public readonly Guid ItemId;
    public readonly string Reason;

    public InventoryItemDeactivated_v2(Guid id, string reason)
    {
        ItemId = id;
        Reason = reason;
    }
}

There are now two versions of the event. Version 1, the original version, contains just an ID, while version 2 contains an ID (renamed as ItemId) and a reason why the user deactivated this item.

Next to edit is the domain model, which will change from something like:

public void Deactivate()
{
    if(!_activated) throw new InvalidOperationException("already deactivated");
    ApplyChange(new InventoryItemDeactivated_v1(_id));
}

to:

public void Deactivate(string reason)
{
    if(!_activated) throw new InvalidOperationException("already deactivated");
    ApplyChange(new InventoryItemDeactivated_v2(_id, reason));
}

We will also add a new Apply method to the InventoryItem class to support the second version of the event.

private void Apply(InventoryItemDeactivated_v2 e)
{
    _activated = false;
}

And we’re done. Our aggregate will write out version 2 of the event in the future and supports replaying of version 1 and version 2 events.
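To make the replay side concrete, here is a minimal, self-contained sketch of an aggregate that can rebuild its state from either version. It is not the SimpleCQRS class itself: InventoryItemSketch, LoadFromHistory, and the Activated property are names assumed for illustration, and the type switch stands in for whatever dispatch mechanism the real aggregate base class uses.

```csharp
using System;
using System.Collections.Generic;

public abstract class Event { }

public class InventoryItemDeactivated_v1 : Event
{
    public readonly Guid Id;
    public InventoryItemDeactivated_v1(Guid id) { Id = id; }
}

public class InventoryItemDeactivated_v2 : Event
{
    public readonly Guid ItemId;
    public readonly string Reason;
    public InventoryItemDeactivated_v2(Guid id, string reason) { ItemId = id; Reason = reason; }
}

// Hypothetical, stripped-down aggregate used only to illustrate replay.
public class InventoryItemSketch
{
    private bool _activated = true;
    public bool Activated { get { return _activated; } }

    // Rebuilds state from history; every event version needs its own case.
    public void LoadFromHistory(IEnumerable<Event> history)
    {
        foreach (var e in history)
        {
            switch (e)
            {
                case InventoryItemDeactivated_v1 v1: Apply(v1); break;
                case InventoryItemDeactivated_v2 v2: Apply(v2); break;
                default:
                    throw new InvalidOperationException("no handler for " + e.GetType().Name);
            }
        }
    }

    private void Apply(InventoryItemDeactivated_v1 e) { _activated = false; }
    private void Apply(InventoryItemDeactivated_v2 e) { _activated = false; }
}
```

Note that the two Apply methods are identical apart from the parameter type, which is exactly the duplication that grows with every new version.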

However, there are still a few issues here. What happens when we get to InventoryItemDeactivated_v17? What if there are five behaviors and we have now versioned all of them?

Method explosion. I have always joked in class that this is why the concept of code folding/regions was added to editors. We will just hide that code.

Define a Version of an Event

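In our description of a new version of an event above, an important detail was left out: a new version of an event must be convertible from the old version of the event. If it is not, it is not a new version of the event but rather a new event.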

This rule is quite important. It also implies that the latest version of an event should be reachable by conversion from any earlier version of that event. If you find yourself trying to figure out how to convert your old event (Evel Knievel jumping over a school bus on a motorcycle) to your new version of the event (a monkey eating a banana) and you can’t, this is because you have a new event and not a new version of the old event. A new version of an event is by definition convertible from the last version of the event.

Provided we can always convert the old version of the event to the new version, we could upcast the version of the event as we read it from the Event Store. The event could be passed through some converters that move it forward in terms of version.

InventoryItemDeactivated_v2 ConvertFrom(InventoryItemDeactivated_v1 e) {
    return new InventoryItemDeactivated_v2(e.Id, "Before we cared about reason.");
}

The code will now upgrade the event to version 2 of InventoryItemDeactivated. As such, the code will no longer see an InventoryItemDeactivated_v1 event, and the handler in the domain object can be removed. You may notice we have to pass a default for a reason; this is the same decision that gets made when adding a non-nullable column to a table in a SQL database where there are existing rows.
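One way the "pass it through some converters" step could be wired up is sketched below. The Upcaster class, its Converters table, and the Upcast method are assumptions made for illustration and are not part of SimpleCQRS. The idea is that each registered converter lifts an event by exactly one version, and the loop keeps applying converters until the event is at the newest version the code knows about.

```csharp
using System;
using System.Collections.Generic;

public abstract class Event { }

public class InventoryItemDeactivated_v1 : Event
{
    public readonly Guid Id;
    public InventoryItemDeactivated_v1(Guid id) { Id = id; }
}

public class InventoryItemDeactivated_v2 : Event
{
    public readonly Guid ItemId;
    public readonly string Reason;
    public InventoryItemDeactivated_v2(Guid id, string reason) { ItemId = id; Reason = reason; }
}

// Hypothetical helper that would sit between the Event Store and the handlers.
public static class Upcaster
{
    // Maps an old event type to the function that moves it one version forward.
    private static readonly Dictionary<Type, Func<Event, Event>> Converters =
        new Dictionary<Type, Func<Event, Event>>
        {
            { typeof(InventoryItemDeactivated_v1),
              e => ConvertFrom((InventoryItemDeactivated_v1)e) }
        };

    public static InventoryItemDeactivated_v2 ConvertFrom(InventoryItemDeactivated_v1 e)
    {
        return new InventoryItemDeactivated_v2(e.Id, "Before we cared about reason.");
    }

    // Applies converters repeatedly until no converter exists for the current type.
    public static Event Upcast(Event e)
    {
        Func<Event, Event> convert;
        while (Converters.TryGetValue(e.GetType(), out convert))
        {
            e = convert(e);
        }
        return e;
    }
}
```

With this shape, adding a version 3 would mean registering one more converter from version 2, and old version 1 events would then flow through both steps without the domain object needing a handler for either older version.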

There are, however, additional and more sinister problems with this style of versioning.

What happens when we have a deployment where multiple versions of the software are running side by side? If we upgrade node A to version 2 while node B is still on version 1, can node B read a version 2 event that node A has written to the event stream? Unfortunately, this style of serialization requires every node to have the type for the event, as the type describes the schema of the event. If a consumer does not have the type, it will not even be able to deserialize the event. One might try to ship the event types around to every consumer, or we could move away from using the type system as our schema.

This applies to all consumers: projections, other version nodes, and downstream consumers.

Some implementations try to move the schema resolution to a central service. This can have pros and cons. On the one hand, it can remove some of the consumer update problems that versioning causes, as consumers will request the schema on demand from the centralized service and apply it dynamically. On the other hand, it can introduce significant complexity, the need for a framework, and a possible availability problem to the system. I have not worked enough with production systems doing this to give a proper weighing of these concerns, though, and tend to lean towards other mechanisms at the time of writing.

Double Write

There is a common way in which people try to avoid the issues that arise from having old subscribers that do not understand the new version of the event. The idea is to have the new version of the producer write both the _v1 and the _v2 versions of the event when it writes. This is also known as Double Publish.

As an example, when version 2 of the software is released in this scenario, it will write both version 1 and version 2 of the deactivated event to the stream. When the other node, still running version 1, reads from the stream, it will ignore the version 2 instance of the event, which it does not understand, and will process version 1 of the event, which it does.

A rule must be followed in order to make this work. At any given point in time, you must only handle the version of the event you understand and ignore all others.
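A small sketch of the double write and of a consumer that follows this rule might look like the following. Everything here is assumed for illustration: StoredEvent models a serialized event as a type name plus an opaque payload, ProducerV2 stands in for the upgraded node, and ConsumerV1 for a node that only has the version 1 type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stored form of an event: a type name plus an opaque payload.
public class StoredEvent
{
    public readonly string TypeName;
    public readonly string Payload;
    public StoredEvent(string typeName, string payload) { TypeName = typeName; Payload = payload; }
}

public static class ProducerV2
{
    // Double write: one deactivation produces both the old and the new version.
    public static List<StoredEvent> Deactivate(Guid id, string reason)
    {
        return new List<StoredEvent>
        {
            new StoredEvent("InventoryItemDeactivated_v1", id.ToString()),
            new StoredEvent("InventoryItemDeactivated_v2", id + "|" + reason)
        };
    }
}

public class ConsumerV1
{
    public readonly List<Guid> Deactivated = new List<Guid>();

    public void Handle(StoredEvent e)
    {
        // Process only the version this node understands and ignore all others.
        if (e.TypeName != "InventoryItemDeactivated_v1") return;
        Deactivated.Add(Guid.Parse(e.Payload));
    }
}
```

The old consumer sees one deactivation per double write, because the version 2 copy is skipped rather than treated as an error.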

Over time, the old version of the event will be deprecated. This gives subscribers time to catch up.

This methodology is not unusual in a distributed system, where it works reasonably well. It does not, however, work that well in an Event Sourced system. In an Event Sourced system, it will work fine in the steady state, but will fail when you need to replay a projection.

When replaying a projection that currently supports version 3 of the event, what do we do when we receive a version 2 of the event that was written before the version 3 existed? A matching version 3 of the event will not be received if we ignore the version 2. To complicate matters further, there are some places where a version 2 will be written with a matching version 3 event and others where it will not be. This is due to the running of version 2 and version 3 side by side. These same issues exist when trying to hydrate state off of the event stream.
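The replay problem can be reproduced in a few lines. In this sketch, DeactivatedRecord, Version3Projection, and ReplayDemo are hypothetical names, and an event is reduced to an item ID plus a version number. The stream contains one deactivation written before version 3 existed and one written afterwards by a double-writing producer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one stored deactivation event and its schema version.
public class DeactivatedRecord
{
    public readonly Guid ItemId;
    public readonly int Version;
    public DeactivatedRecord(Guid itemId, int version) { ItemId = itemId; Version = version; }
}

// A projection that follows the double-write rule: handle version 3, ignore the rest.
public class Version3Projection
{
    public readonly HashSet<Guid> DeactivatedItems = new HashSet<Guid>();

    public void Handle(DeactivatedRecord e)
    {
        if (e.Version != 3) return;
        DeactivatedItems.Add(e.ItemId);
    }
}

public static class ReplayDemo
{
    // Replays a stream in which itemA was deactivated before version 3 existed
    // and itemB afterwards, then returns the rebuilt projection.
    public static Version3Projection Replay(Guid itemA, Guid itemB)
    {
        var stream = new List<DeactivatedRecord>
        {
            new DeactivatedRecord(itemA, 2), // old producer: no matching version 3
            new DeactivatedRecord(itemB, 2), // newer producer double writes...
            new DeactivatedRecord(itemB, 3)  // ...version 2 and version 3
        };
        var projection = new Version3Projection();
        foreach (var e in stream) projection.Handle(e);
        return projection;
    }
}
```

After the replay, the projection knows about itemB but has silently lost the deactivation of itemA, even though that event is still in the stream.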

There are some ways to work around these issues. For instance, when receiving a version 2, you could look ahead to see if the next event is a version 3. If so, ignore the version 2; if not, upconvert it to a version 3. These mechanisms, however, quickly become quite complex.

Another option here is to make absolutely sure that everything is idempotent (including your domain objects, which can be a bit weird) and write the multiple versions with the same ID. This is feasible, but can be very costly. If you insist on going down this road, the easiest solution is to make sure the producer puts out all the versions with the same ID. When reading forward, look at the next event to see if it has the same ID, and process the current event only if it does not. This is the least hacky way of handling it.
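One reading of the same-ID approach is sketched below, under two assumptions that the text does not state: the producer writes the versions of one logical event next to each other in ascending version order, and the reader understands (or can upconvert) whichever version comes last in each group. VersionedEvent and SameIdFilter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event envelope: all versions of one logical event share EventId.
public class VersionedEvent
{
    public readonly Guid EventId;
    public readonly int Version;
    public VersionedEvent(Guid eventId, int version) { EventId = eventId; Version = version; }
}

public static class SameIdFilter
{
    // Keeps an event only when the next event in the stream has a different ID,
    // so each group of same-ID events is reduced to its last member.
    public static List<VersionedEvent> LastOfEachGroup(IList<VersionedEvent> stream)
    {
        var result = new List<VersionedEvent>();
        for (int i = 0; i < stream.Count; i++)
        {
            bool nextHasSameId = i + 1 < stream.Count
                && stream[i + 1].EventId == stream[i].EventId;
            if (!nextHasSameId) result.Add(stream[i]);
        }
        return result;
    }
}
```

For a stream holding a lone version 2 followed by a version 2 and version 3 pair that share an ID, the filter keeps the lone version 2 and the version 3, so every logical event is processed once.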

Although this mechanism works well under some circumstances for event-based systems, it is not recommended for Event Sourced systems due to the issues it introduces when replaying a projection.

But…

You should generally avoid versioning your system via types in this way. Using types to specify schema leads to the inevitable conclusion that all consumers must be updated to understand the schema before a producer is. While this may seem reasonable when you have three consumers, this completely falls apart when you have 300.

Accepting the constraint that our serializer has absolutely no versioning support seems an odd place to start when almost all serializers have some level of versioning support (JSON, XML, protobufs, etc.). Accordingly, let’s try removing this constraint.

Up next

Weak Schema