Why can’t I update an event?
It continues to amaze me that, with every architectural or data style, there always comes a central question that defines it. When looking at document databases, the question that defines them is, “How do I write these two documents in a transaction?”
This question is actually quite reasonable from the perspective of someone whose career has been spent working with SQL databases, but, at the same time, it is completely unreasonable from the perspective of document databases. If you are trying to update multiple documents in a transaction, it most likely means that your model is wrong. Once you have a sharded system, attempting to update two documents in a transaction has many trade-offs in terms of transaction coordination. It affects everything.
Similarly, there is also such a question for those coming into Event Sourcing and dealing with an Event Store: “Why can’t I update an event?”
In dealing with support for EventStore (note in this text I will use Event Store as a general term and EventStore to discuss GetEventStore), we get this question often. The notion of not editing data is confusing for many people. Such systems are not, generally speaking, harder or easier to work with than other systems, but they are different.
Yet, however many times this question comes up, it is in fact the wrong one. While at face value it may seem obvious that you would want to be able to edit an existing event in your system, in reality, there are many circumstances where you should avoid it at all costs. “I put some bad data in this event, so I’m going to edit it quickly” may seem like a simple way to solve the issue. In many systems people have dealt with previously, an admin may simply log into Toad and issue an update statement against a table. In certain systems, however, this is a no-no.
Immutability
Much is gained in terms of simplicity as well as scalability in log-based systems because the data is immutable. Many developers find the same thing when moving to functional programming; it takes a while to learn how to work in an immutable way because it is different. However, many problems go away with immutability.
It is an old joke that there are only two hard problems left in Computer Science:
- Naming Things
- Cache Invalidation
- Off-by-one errors
How many times have you had bad data in a cache? Ever asked a user to delete their local cache? This issue can be a difficult one, often with complex, hacky invalidation schemes put into place to try to mitigate the risk of serving stale data.
Cache invalidation as an issue goes away when data is immutable. It is, by definition, infinitely cacheable. This allows systems that focus on immutable data to scale in many situations much more simply than other systems. It can allow for the use of off-the-shelf commoditized tools such as reverse proxies and CDNs for scalability, as well as geographic distribution.
And if you allow a single update… Well, your data is now definitely maybe immutable. Also known as mutable. A single update results in all data needing to be treated as mutable as there’s no knowing if or what may have been updated.
How do caches get invalidated on an update? In fact, all consumers, including, say, an in-memory domain object or a projection off to a SQL database, face this same issue.
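To make the caching argument concrete, here is a minimal sketch in Python. The names (Event, AppendOnlyStore, EventCache) are hypothetical and are not the API of EventStore or any other product; the only assumption is that an event is addressed by its stream and its number within that stream. Because nothing behind that address can change, the cache needs no invalidation logic at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """An immutable fact; frozen=True makes attribute assignment raise."""
    stream: str
    number: int   # position within the stream
    payload: str  # serialized body, e.g. JSON


class AppendOnlyStore:
    """Hypothetical store: events can be appended and read, never changed."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Event]] = {}

    def append(self, stream: str, payload: str) -> Event:
        events = self._streams.setdefault(stream, [])
        event = Event(stream, len(events), payload)
        events.append(event)
        return event

    def read(self, stream: str, number: int) -> Event:
        return self._streams[stream][number]


class EventCache:
    """Caches by (stream, number). There is deliberately no invalidate()
    method: without updates, an entry can never become stale."""

    def __init__(self, store: AppendOnlyStore) -> None:
        self._store = store
        self._entries: Dict[Tuple[str, int], Event] = {}
        self.misses = 0

    def get(self, stream: str, number: int) -> Event:
        key = (stream, number)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._store.read(stream, number)
        return self._entries[key]


store = AppendOnlyStore()
store.append("user-gregyoung", '{"type": "UserCreated", "username": "gregyoung"}')

cache = EventCache(store)
first = cache.get("user-gregyoung", 0)   # goes to the store
second = cache.get("user-gregyoung", 0)  # served from the cache
```

The moment the store grows an update method, EventCache has no way to learn that an entry changed, and every copy held by a reverse proxy, a CDN, or an in-memory domain object has the same problem.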
Consumers
If you edit an event, how will the consumers of that event be notified that the event has changed? As a concrete example, what if you had a projection that was interested in this event? It had received it previously, and had updated a column in a SQL database because of the event. If and when you changed the event, how would it be notified of the change that had occurred?
Other examples can be seen in other types of consumers. It is quite common to have a consumer/consumers listening to a stream that then takes action based upon the events. For instance, a consumer might decide to email a customer a welcome message based on a UserCreated event. Such an email would read something like:
Dear Greg,
Thank you for registering an account with our site. We are grateful for your business. If you need to manage your account, please visit http://oursite.com/accounts/gregyoung/manage
Thanks,
Your beloved business
If we were to go back and edit the username, should an additional email be sent? Is the URI in the original email still valid? How will we manage the process of generating a follow-up email to the user to notify them of the edited information?
Beyond all of this, what happens if you now replay a projection? It will come up with a different answer than the one it had previously. What about everything that made a decision based on the original data? While this can be tracked, it is not a trivial problem. This leads us to the next major issue.
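A small Python sketch of the consumer problem, under stated assumptions: events are plain dicts so that an in-place edit can be simulated, and project_usernames and welcome_email are hypothetical stand-ins for the SQL projection and the email consumer described above.

```python
from typing import Dict, List

# Events as plain dicts so that an in-place edit can be simulated.
log: List[dict] = [
    {"type": "UserCreated", "user_id": 1, "username": "gregyoung"},
]


def project_usernames(events: List[dict]) -> Dict[int, str]:
    """Hypothetical projection: user_id -> username, as a SQL column might hold."""
    table: Dict[int, str] = {}
    for event in events:
        if event["type"] == "UserCreated":
            table[event["user_id"]] = event["username"]
    return table


def welcome_email(event: dict) -> str:
    """Hypothetical consumer: builds the link that goes into the welcome email."""
    return "http://oursite.com/accounts/%s/manage" % event["username"]


# Consumers process the event once, when it first arrives.
read_model = project_usernames(log)
sent_emails = [welcome_email(e) for e in log if e["type"] == "UserCreated"]

# Someone "fixes" the event in place. Nothing notifies the consumers.
log[0]["username"] = "gyoung"

# A replay now disagrees with what was built from the original event.
replayed = project_usernames(log)
```

After the edit, read_model still holds the old username, replayed holds the new one, and the email that already went out points at a URI derived from data that, according to the log, never existed.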
Audit
Your audit log is now suspect. You have no idea at any given point whether your projections to read models actually match up to what your events are.
A large number of Event Sourced systems are Event Sourced specifically because they need an audit trail. If you can edit your audit trail, is it actually an audit trail? Event Sourced systems address this need.
Over the years, I have dealt with many companies that, for legal reasons, had this requirement. Most, however, did not actually meet the legal requirements of being able to submit their audit log as evidence in a court case. The bar for being able to submit your audit log to a court of law is that you must be able to rebuild your current state from your audit log.
Consider for a moment that you disagree with your bank on your balance and they put forward a ledger that does not match the balance they say you should have nor the one you say you have. Would a juror accept this log as having anything to do with fact?
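The test described here, that the current state must be rebuildable from the log, can be sketched in a few lines of Python. The entry types ("Deposited", "Withdrawn") and the function names are assumptions made for illustration, not taken from any real banking system.

```python
from decimal import Decimal
from typing import List, Tuple

# Hypothetical ledger entries: (entry type, amount).
Ledger = List[Tuple[str, Decimal]]


def rebuild_balance(ledger: Ledger) -> Decimal:
    """Derive the current balance from nothing but the ledger."""
    balance = Decimal("0")
    for kind, amount in ledger:
        if kind == "Deposited":
            balance += amount
        elif kind == "Withdrawn":
            balance -= amount
        else:
            raise ValueError(f"unknown ledger entry: {kind}")
    return balance


def ledger_supports(ledger: Ledger, claimed: Decimal) -> bool:
    """True only if replaying the ledger yields exactly the claimed balance."""
    return rebuild_balance(ledger) == claimed


ledger: Ledger = [
    ("Deposited", Decimal("100.00")),
    ("Withdrawn", Decimal("30.00")),
    ("Deposited", Decimal("12.50")),
]
```

If ledger_supports returns False for the balance the bank claims, the ledger proves nothing about that balance, which is exactly the position an edited log puts you in.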
The moment you allow a single edit of an event, maintaining a proper audit log becomes impossible. Immutability is immutable. The moment you allow a single edit, everything becomes suspect.
Worms
The gold standard for maintaining an audit log is to run on disks that do not allow editing, commonly known as WORM (Write Once Read Many) drives. When running on such a device, it is not possible to edit/rewrite a piece of information. This is by design.
The use of WORM drives is not only focused on the auditability of a system but also on its security. They are particularly useful in dealing with an attack vector known as a “super-user attack”, which is where we assume a rogue developer or system administrator is attacking our system and they have root access to the system.
Years ago, I worked at a company called Autotote (currently called Scientific Games) where such an attack happened. At the time, the company handled about 70% of the world’s pari-mutuel betting market and made slot machines. This was obviously a system that would be of interest to an attacker, as it controlled all the money involved in wagers.
In the next cubicle over from me sat a gentleman named Chris Harn (Chris, if you are reading this, please drop me an email). Chris decided that the system was ripe for hacking. We ran a pool known as a pick-6, which is a specialization of a pick-N. The concept is that you pick the winner of N races, with a pick-6 requiring the bettor to pick the winners of six consecutive races.
It is a very difficult bet to win and often punters would bet (and lose) hundreds of tickets on it. Given the volume of wagers on this type of pool, we would leave the wager at the remote track without shipping all of the details to the host track. At the end of the fourth race in the pick-6 example, the remote system would scan the tickets and then ship over the ones that could still possibly win. By the scan at the end of the fourth race, over 95% of the tickets had been removed.
Chris, however, had a different idea. What he would do is place a bet 1/2/3/4/ALL/ALL. He would then jump on the developer maintenance line, watch the first four races, and edit the bet on disk with a hex editor to be 3/7/9/2/ALL/ALL, or whatever happened to come in. The scan would occur at the end of the fourth race and would pick up his bet as a “possible winner” (while it is actually an assured winner). The system would then ship his bet over to the host system.
Chris got caught. I have always joked that, when dealing with super-user attacks, the culprit gets caught not because they are stupid but because they are unlucky. Chris was very unlucky.
This all happened during the 2002 Breeders’ Cup. For those unfamiliar with American horse race gambling, the Breeders’ Cup is the second largest racing day in America, and this was the largest scandal in American betting in the last century. They put through a bet 1/2/3/4/ALL/ALL (or similar, likely with the favourites in each leg), then edited it as discussed. Smooth enough, but there was a problem. A 43-1 horse won the Classic, which made their tickets the only winning tickets, valued at $3,100,000. Normally, there would be 20 or 30 winners at $100,000 each; instead, there was one winner at $3,100,000. At that level, you can be sure the people running the pool will look into the winning ticket!
Needless to say, Chris was not in the office on Monday morning, though we did meet lots of new friends from the FBI. Meanwhile, there is a Criminal Masterminds episode documenting the whole scandal. You can also read more about it on Wikipedia.
This entire situation could have been avoided by using WORM drives. The ability to rewrite an event often opens up massive holes in security.
Crime
Chris certainly placed himself on the wrong side of the law by editing an event. There are many other cases where editing an event will likely land you in prison.
In police information systems, financial/accounting systems, and even such simple things as border control, the editing of an event is not only considered morally wrong, it is likely a federal crime that comes with a prison sentence greater than five years for anyone found guilty of it. Whether we talk about fraud or embezzlement or discuss the possibility of moving a rogue agent into a state, there are a vast number of existing systems where the concept of editing an event is an impossibility.
Given all the reasons why we may not want to be able to edit an event, the question before us is: how do we handle an Event Sourced system, given such constraints? How can we run on a WORM drive, keep a proper audit log, and retain the ability to handle changes over time? How can we avoid “editing an event”?