Weak Schema

The serialization formats discussed up until now operate based on what is known as strong schema. In the type example for the .NET and binary serializer, the schema is stored in the type. This is also known as out-of-band schema, meaning that the schema is not included with the message but is held outside it.

The problem with strong schema, especially when using out-of-band schema such as types, is that without the schema you will not be able to deserialize a given message. This leads to the previously described problem: a consumer must be updated before the producer is, because otherwise the consumer cannot deserialize the new message. In many situations that ordering constraint is unacceptable.

Mapping

Most systems today do not use this method of serialization for exactly these reasons. Instead, they will use something like json or xml, combined with what is known as weak-schema or hybrid-schema, to serialize their messages. While this entails more rules that must be followed, it also offers more flexibility, provided the rules are followed.

{
    "foo": "hello",
    "bar": 3.9
}

Handed this with no more information than that it is json, a consumer can still parse it and see that it contains a key, foo, with a string value of hello, and a key, bar, with a numeric value of 3.9. We can use this property to our advantage when dealing with the versioning of messages.

What if, instead of being deserialized to a type, the message were mapped onto one? The rules for mapping are simple. When mapping, you look at the json and at the instance.

Exists in json and on instance -> value from json
Exists in json but not on instance -> NOP
Exists on instance but not in json -> default value

So, given json

{
    "number": 3.9,
    "str": "hello",
    "other": 15
}

when mapped to a type

class Foo {
    public decimal Number;
    public string Str;
}

it would produce an output of Foo { Number=3.9, Str="hello" }; other is ignored. If, however, it were mapped to a type such as

class Bar {
    public decimal Foo;
    public string Str;
}

it would produce an output of Bar { Foo=0.0, Str="hello" }, because the json has no foo key and the field keeps its default value.
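The three rules can be sketched in a few lines of reflection. This is a minimal illustration rather than a production mapper: SimpleMapper is a hypothetical name, it relies on System.Text.Json (.NET 6 or later), and it assumes json keys match field names case-insensitively. The second type is called Bar here because C# does not allow a field to share the name of its enclosing class.

```csharp
using System;
using System.Reflection;
using System.Text.Json;

const string json = "{ \"number\": 3.9, \"str\": \"hello\", \"other\": 15 }";

// "other" has no matching field on either type, so it is skipped.
var foo = SimpleMapper.Map<Foo>(json);
Console.WriteLine($"Foo {{ Number={foo.Number}, Str={foo.Str} }}");

// Bar.Foo has no matching json key, so it keeps its default value.
var bar = SimpleMapper.Map<Bar>(json);
Console.WriteLine($"Bar {{ Foo={bar.Foo}, Str={bar.Str} }}");

public class Foo { public decimal Number; public string Str; }
public class Bar { public decimal Foo; public string Str; }

// Hypothetical helper, not part of any library.
public static class SimpleMapper
{
    public static T Map<T>(string json) where T : class, new()
    {
        // Rule 3: every field starts at its default value.
        var instance = new T();
        using var doc = JsonDocument.Parse(json);
        foreach (var property in doc.RootElement.EnumerateObject())
        {
            var field = typeof(T).GetField(property.Name,
                BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase);
            // Rule 2: present in the json but not on the instance, so do nothing.
            if (field == null) continue;
            // Rule 1: present on both, so take the value from the json.
            field.SetValue(instance, property.Value.Deserialize(field.FieldType));
        }
        return instance;
    }
}
```

Nothing in the mapper refers to a version number; tolerance for extra and missing values comes from the rules themselves.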

Such a mapping system is no more than an hour of work, even with a nice fluent interface for handling defaults. However, this concept is especially powerful when dealing with versioning of messages.

In the previous chapter, adding to or modifying an event required a new version of the event, introduced as a new type.

public class InventoryItemDeactivated_v1 : Event {
    public readonly Guid Id;

    public InventoryItemDeactivated_v1(Guid id)
    {
        Id = id;
    }
}

public class InventoryItemDeactivated_v2 : Event {
    public readonly Guid ItemId;
    public readonly string Reason;

    public InventoryItemDeactivated_v2(Guid id, string reason)
    {
        ItemId = id;
        Reason = reason;
    }
}

When using mapping, there is no longer an addition of a new version of the event. Instead, you just edit the event already in place.

public class InventoryItemDeactivated : Event {
    public readonly Guid Id;
    public readonly string Reason;

    public InventoryItemDeactivated(Guid id, string reason)
    {
        Id = id;
        Reason = reason;
    }
}

The mapping handles the rest. If an InventoryItemDeactivated was written in the first version and is mapped by code expecting the second version, it will still work, but Reason will be set to a default value.

If a producer were to begin putting out the second version, a consumer that understood the first version would continue working, seeing as it could map it, and Reason would just be ignored. There are, however, two factors that must be remembered here.

The first is that you are no longer allowed to rename something. If the change were from Id to ItemId, it would not work, as a consumer on the first version would no longer receive Id. You can get around this by supporting both Id and ItemId, but this can quickly become annoying, especially with an Event Sourced system, where you cannot just deprecate the old name but must carry it forward into the future.

The second is that there will often be programmatic checks to ensure that what you expect to be in the message after the mapping is in fact present. While it may be expected that Id is still there, this is not a given. Rather than writing each of these checks by hand, it is common to support them in the mapper.

Map.To<InventoryItemDeactivated>()
    .WhenAbsent("Reason", "no reason at all")
    .WhenAbsentError("Id");
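One way such a fluent mapper could be built is sketched below. The chapter names only Map.To, WhenAbsent, and WhenAbsentError; the Mapper<T> class, the From method, and the choice of InvalidOperationException are assumptions made for illustration. The event is also simplified here to mutable fields with a default constructor so that plain reflection can populate it, and System.Text.Json (.NET 6 or later) does the parsing.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text.Json;

var mapper = Map.To<InventoryItemDeactivated>()
    .WhenAbsent("Reason", "no reason at all")
    .WhenAbsentError("Id");

// A first-version message: "reason" is absent, so the configured default applies.
var evt = mapper.From("{ \"id\": \"24570fb7-a26c-42e8-a9cd-f6bf8060790e\" }");
Console.WriteLine($"{evt.Id}: {evt.Reason}");

// A message without "id" is rejected instead of being silently defaulted.
try { mapper.From("{ \"reason\": \"broken\" }"); }
catch (InvalidOperationException e) { Console.WriteLine(e.Message); }

// Simplified stand-in for the event (assumption: mutable fields, default constructor).
public class InventoryItemDeactivated
{
    public Guid Id;
    public string Reason;
}

public static class Map
{
    public static Mapper<T> To<T>() where T : class, new() => new Mapper<T>();
}

// Hypothetical mapper: collects the rules first, applies them in From.
public class Mapper<T> where T : class, new()
{
    private const BindingFlags Flags =
        BindingFlags.Public | BindingFlags.Instance | BindingFlags.IgnoreCase;
    private readonly Dictionary<string, object> _defaults = new(StringComparer.OrdinalIgnoreCase);
    private readonly HashSet<string> _required = new(StringComparer.OrdinalIgnoreCase);

    public Mapper<T> WhenAbsent(string name, object value) { _defaults[name] = value; return this; }
    public Mapper<T> WhenAbsentError(string name) { _required.Add(name); return this; }

    public T From(string json)
    {
        var instance = new T();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        using var doc = JsonDocument.Parse(json);
        foreach (var property in doc.RootElement.EnumerateObject())
        {
            var field = typeof(T).GetField(property.Name, Flags);
            if (field == null) continue;              // unknown key: ignore it
            field.SetValue(instance, property.Value.Deserialize(field.FieldType));
            seen.Add(field.Name);
        }
        foreach (var name in _required)               // required values must be present
            if (!seen.Contains(name))
                throw new InvalidOperationException($"Required value '{name}' is absent.");
        foreach (var (name, value) in _defaults)      // fill in configured defaults
            if (!seen.Contains(name))
                typeof(T).GetField(name, Flags)?.SetValue(instance, value);
        return instance;
    }
}
```

Keeping the checks in the mapper configuration means each consumer states its expectations in one place instead of scattering null checks through its handlers.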

While these types of validations can be defined in code, it might also be worth looking at defining them either through JSON Schema or XSD.

Another option is to use hybrid-schema, where some things are required and some things are not, the latter being treated as being mapped. Protocol Buffers (protobuf) is an example of a format that supports this; its proto2 syntax lets a field be declared required or optional. The general rule of thumb when working with hybrid-schema is to require the things without which the event would make absolutely no sense while leaving everything else optional. In the example of InventoryItemDeactivated, it would make no sense if the Id of the item were not included ("It was deactivated, but I won't tell you which one!"). The Reason, however, would remain optional. If no Id were present, a deserialization error would occur, while an absent Reason would just result in a null value, with the field flagged as not present.

Wrapper

Less common than mapping to a type, another option with events is to write/generate a wrapper over the serialized form. While this tends to be more work, and often requires some additional rules to be followed to deal with schema evolution, it can also have some significant advantages.

For example, for the InventoryItemDeactivated event, instead of mapping the data to a new object, a wrapper could be created to wrap the underlying json.

using System;
using Newtonsoft.Json.Linq;

public class InventoryItemDeactivatedWrapper {
    // Parsed json tree that backs every property.
    private readonly JObject _json;

    public Guid Id {
        get { return Guid.Parse((string)_json["id"]); }
        set { _json["id"] = value.ToString(); }
    }

    public string Reason {
        get { return (string)_json["reason"]; }
        set { _json["reason"] = value; }
    }

    public InventoryItemDeactivatedWrapper(string json) {
        _json = JObject.Parse(json);
    }
}

The class is then not directly mapping to and from the json; it is instead wrapping it and providing typed access to it. Certain serializers, FlatBuffers for instance, will even generate the wrappers from schema definitions to avoid the overhead of manually creating this type of code.

One of the main advantages of writing a wrapper as opposed to a mapping is that, if you are enriching something, you can move it forward without understanding everything in the message. This “ility” is normally more important in document-style messaging, but it can also be useful in some circumstances with events (for an example, see internal vs external models). Consider the InventoryItemDeactivated event, but where another consumer had a different version that included a userId.

{
    "id": "24570fb7-a26c-42e8-a9cd-f6bf8060790e",
    "reason": "none at all",
    "userId": 42
}

If this json were mapped to a Deactivated event containing Id, Reason, and AuthCode and then serialized again, the userId would be lost in the mapping process. By using a wrapper, the userId can be preserved even if it is not understood.
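The difference can be seen by sending the same message through both approaches. The sketch below uses System.Text.Json and its JsonObject type (.NET 6 or later) in place of JObject so that it runs without extra packages; Deactivated and DeactivatedWrapper are hypothetical names, and the new reason text is made up for the demonstration.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

const string incoming =
    "{ \"id\": \"24570fb7-a26c-42e8-a9cd-f6bf8060790e\", \"reason\": \"none at all\", \"userId\": 42 }";

// Mapping round trip: only what the type declares survives re-serialization.
var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
var mapped = JsonSerializer.Deserialize<Deactivated>(incoming, options);
var viaMapping = JsonSerializer.Serialize(mapped);

// Wrapper round trip: change one value in place and leave the rest untouched.
var wrapper = new DeactivatedWrapper(incoming);
wrapper.Reason = "damaged in transit";
var viaWrapper = wrapper.ToJson();

Console.WriteLine(viaMapping.Contains("userId"));   // False: userId was dropped
Console.WriteLine(viaWrapper.Contains("userId"));   // True: userId passed through

// Mapped type: knows Id, Reason, and AuthCode, but nothing about userId.
public class Deactivated
{
    public Guid Id { get; set; }
    public string Reason { get; set; }
    public string AuthCode { get; set; }
}

// Wrapper: typed access over the parsed json, which it keeps whole.
public class DeactivatedWrapper
{
    private readonly JsonObject _json;

    public DeactivatedWrapper(string json)
    {
        _json = JsonNode.Parse(json)?.AsObject()
            ?? throw new ArgumentException("Expected a json object.", nameof(json));
    }

    public string Reason
    {
        get => (string)_json["reason"];
        set => _json["reason"] = value;
    }

    public string ToJson() => _json.ToJsonString();
}
```

The wrapper never enumerates the fields it does not expose, which is why values it knows nothing about survive the round trip.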

Although the pseudo-code above is not memory-friendly (creating, as it does, an entire tree of JObjects representing the json), other serializers can generate wrappers that operate in place over a byte[] or other such structure without building an object tree. Some can even parse on demand as opposed to deserializing the entire buffer. This can become very important in some high-performance systems, as they can operate with zero allocations.
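As a rough illustration of parsing on demand, the sketch below scans a UTF-8 buffer with Utf8JsonReader from System.Text.Json and stops as soon as it finds the requested value, without building a tree. LazyReader is a hypothetical helper, and the sketch is not allocation-free: the final GetString call still creates a string. Generated wrappers in formats designed for in-place access can avoid even that.

```csharp
using System;
using System.Text;
using System.Text.Json;

byte[] buffer = Encoding.UTF8.GetBytes(
    "{ \"id\": \"24570fb7-a26c-42e8-a9cd-f6bf8060790e\", \"reason\": \"none at all\", \"userId\": 42 }");

Console.WriteLine(LazyReader.ReadString(buffer, "reason"));   // none at all

// Hypothetical helper: reads one top-level string value straight from the bytes.
public static class LazyReader
{
    public static string ReadString(ReadOnlySpan<byte> utf8Json, string name)
    {
        var reader = new Utf8JsonReader(utf8Json);
        while (reader.Read())
        {
            // Only consider property names directly inside the root object.
            if (reader.TokenType == JsonTokenType.PropertyName
                && reader.CurrentDepth == 1
                && reader.ValueTextEquals(name))
            {
                reader.Read();              // advance to the property's value
                return reader.GetString();  // null if the value is a json null
            }
        }
        return null;                        // property not present
    }
}
```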

Overall Considerations

For most systems, a simple human-readable format such as json is fine for handling messages. Most production systems use mapping with either XML or json. Mapping with weak-schema will also remove many of the versioning problems associated with type-based strong schema discussed in the previous chapter.

There are scenarios where, either for performance reasons or in order to enable data to “pass through” without being understood, a wrapper may be a better solution than a mapping, but it will require more code. This methodology is, however, more common with messages, where an enrichment process is occurring, such as in a document-based system.

Overall, though, there are very few reasons to choose the type-based mechanism. Once a message is brought out of your immediate system and put into a serialized form, the type system does very little. The upcasting system is often difficult to maintain amongst consumers; even with a centralized schema repository, it requires another service. In most scenarios where messages are being passed to/from disparate services, late-binding tends to be a better option.
