Nov 20, 2019

Summary: Replicated Data Consistency Explained Through Baseball

This post was written as part of my database design course (CMPE 226) project. It summarizes my understanding of the paper Replicated Data Consistency Explained Through Baseball.

As distributed system architectures have become widely used in cloud computing in recent years, they bring different needs for keeping data consistent across nodes under different conditions.

Ideally, all replicas of a distributed database would hold the same content at the same time, as if they were a single node. However, not every system needs this highest level of consistency, which can come at the cost of performance and/or availability. This is where the choice among consistency models comes into play.

The paper introduces six consistency guarantees ranging from Eventual Consistency to Strong Consistency, each with different trade-offs among consistency, performance, and availability. In addition, separating the consistency model from its implementation keeps the discussion from going too deep into unnecessary details.

Strong Consistency, as its name suggests, is the strongest of the guarantees. It ensures that every read sees the effects of all writes that completed before it. Because a read cannot be served until the replica answering it reflects all of those writes, performance and availability are the worst among all the consistency guarantees.

If that constraint is not needed, Eventual Consistency can be used to achieve better performance and availability for reads and writes. It is the weakest guarantee but does best on performance and availability. A read may return stale data, since it is not guaranteed to reflect every earlier write, but once writes stop, all replicas will eventually converge to the same value.
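The contrast between the two ends of the spectrum can be sketched with a toy store. This is only a minimal sketch under stated assumptions, not how any real system works: `ReplicatedStore`, `read_strong`, `read_eventual`, and `sync` are hypothetical names, there is just one primary and one secondary, and replication happens only when `sync()` is called so that the staleness is easy to see.

```python
class ReplicatedStore:
    """Toy store: one primary plus one secondary that lags until sync() runs."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes accepted by the primary but not yet replicated

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_strong(self, key):
        # Always served by the primary, so it sees every completed write.
        return self.primary.get(key, 0)

    def read_eventual(self, key):
        # Served by the secondary, which may be missing recent writes.
        return self.secondary.get(key, 0)

    def sync(self):
        # Replicate pending writes in order; afterwards the replicas agree.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("home", 1)
store.write("home", 2)
stale = store.read_eventual("home")      # secondary has not caught up yet
fresh = store.read_strong("home")        # primary reflects both writes
store.sync()
converged = store.read_eventual("home")  # replicas agree after sync
```

In this sketch the strong read has only one place to go, which is the availability cost, while the eventual read can be answered by a replica that has not yet heard about the latest writes.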

Additional constraints can be layered on top of Eventual Consistency to strengthen it. The following four guarantees each provide part of what Strong Consistency offers without sacrificing too much performance and availability.

Bounded Staleness ensures that a read sees every write older than a given period of time, so the result is at most that far out of date. Consistent Prefix guarantees that a read sees the result of some initial sequence of the writes, applied in order. The value may not be the latest, but it must be a state that actually existed at some point. Monotonic Reads is a similar idea, except that it does not guarantee that each value read corresponds to a state that existed in the past. It does ensure, however, that each read by a client returns a result at least as new as that client’s previous reads. Read My Writes guarantees that a client sees the effects of its own writes. It is the strategy to choose when a single client performs all the writes, because that writer then always gets the newest results, but it offers no guarantees to other readers.
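One way to compare the six guarantees is to enumerate which scores a read of both teams is allowed to return after a given sequence of writes. The sketch below does that under several assumptions of mine: the write log is an illustrative one, staleness is counted in missing writes rather than in time, Bounded Staleness is modeled as also returning only states that existed, and all function names are hypothetical.

```python
from itertools import product

# Assumed write log for one game, in the order the scorekeeper wrote it.
WRITES = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3),
          ("visitors", 2), ("home", 4), ("home", 5)]


def prefix_states(writes):
    """Return the (visitors, home) score after each prefix of the log."""
    score = {"visitors": 0, "home": 0}
    states = [(0, 0)]
    for key, value in writes:
        score[key] = value
        states.append((score["visitors"], score["home"]))
    return states


STATES = prefix_states(WRITES)


def strong():
    # Only the state after all writes.
    return {STATES[-1]}


def eventual():
    # Each key may independently return any value it has ever held.
    visitors = {v for v, _ in STATES}
    home = {h for _, h in STATES}
    return set(product(visitors, home))


def consistent_prefix():
    # Any state that actually existed at some point.
    return set(STATES)


def bounded_staleness(max_missing_writes):
    # At most the last max_missing_writes writes may be missing.
    return set(STATES[-(max_missing_writes + 1):])


def monotonic_reads(last_read):
    # Neither score may go backward relative to the client's previous read.
    last_v, last_h = last_read
    return {(v, h) for v, h in eventual() if v >= last_v and h >= last_h}


def read_my_writes(is_writer):
    # The writer sees all of its own writes; other clients get no extra guarantee.
    return strong() if is_writer else eventual()
```

With this log, `eventual()` contains (2, 0), a score that never existed during the game, while `consistent_prefix()` excludes it. Similarly, `monotonic_reads((1, 3))` allows (1, 5), which is newer than the previous read but was never the actual score.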

To demonstrate how each guarantee fits different requirements, the paper examines a few roles in a baseball game to see which guarantees their needs call for.

The official scorekeeper is the single entity responsible for recording the most up-to-date score, so each read must return the most recently written score. This narrows the options down to Strong Consistency and Read My Writes. Considering that the scorekeeper does not need to visit multiple replicas and that no one else writes the score, Read My Writes gives the same result as Strong Consistency and is the most appropriate choice.
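The scorekeeper's read-then-increment pattern can be sketched with a session that remembers its own last write. This is a hedged illustration rather than the paper's implementation: `Replica`, `Session`, and the version counter are hypothetical, and the sketch assumes that secondaries apply the primary's writes in the same order, so that a simple count of applied writes is comparable across replicas.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.version = 0  # number of writes applied so far

    def apply(self, key, value):
        self.data[key] = value
        self.version += 1


class Session:
    """Read My Writes: read only from replicas that have this session's writes."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.last_write_version = 0

    def write(self, key, value):
        self.primary.apply(key, value)
        self.last_write_version = self.primary.version

    def read(self, key):
        # Use a secondary if it has caught up with this session's writes,
        # otherwise fall back to the primary.
        for replica in self.secondaries:
            if replica.version >= self.last_write_version:
                return replica.data.get(key, 0)
        return self.primary.data.get(key, 0)


primary, secondary = Replica(), Replica()
scorekeeper = Session(primary, [secondary])
for _ in range(3):  # the visitors score three runs
    score = scorekeeper.read("visitors")
    scorekeeper.write("visitors", score + 1)

final = scorekeeper.read("visitors")
other = secondary.data.get("visitors", 0)  # a client reading the lagging replica
```

Because the scorekeeper is the only writer, the session always reads the latest score, while a different client reading the lagging replica directly may still see an old one.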

The umpire determines the final result of the game, as well as when to end it. The scores of the two teams must be fully up to date when the umpire reads them. Unlike the scorekeeper, the umpire never writes, so Read My Writes does not help. Thus, Strong Consistency is required for the umpire to do the job.

The radio reporter periodically reports the score over the radio. The reporter’s readings should be chronologically correct and should not contradict earlier ones. This makes Consistent Prefix together with Monotonic Reads necessary, and nothing stronger is needed.
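The reporter's two requirements can be illustrated together. In the sketch below, each replica is assumed to hold a prefix of an illustrative write log, so a snapshot of it is always a score that really existed (Consistent Prefix), and the session refuses replicas that are behind what it has already reported (Monotonic Reads). The names `snapshot` and `ReporterSession`, and the idea of identifying a replica by its prefix length, are assumptions made for the example.

```python
# Assumed write log; each replica has applied some prefix of it.
WRITES = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3)]


def snapshot(prefix_len):
    """Score (visitors, home) after the first prefix_len writes."""
    score = {"visitors": 0, "home": 0}
    for key, value in WRITES[:prefix_len]:
        score[key] = value
    return score["visitors"], score["home"]


class ReporterSession:
    def __init__(self):
        self.last_seen = 0  # longest prefix this session has read so far

    def read(self, replica_prefix_len):
        """Read both scores from a replica that applied replica_prefix_len writes.

        Returns None if the replica is behind an earlier read, in which case
        the caller should try another replica.
        """
        if replica_prefix_len < self.last_seen:
            return None  # reporting this would make the score go backward
        self.last_seen = replica_prefix_len
        return snapshot(replica_prefix_len)


reporter = ReporterSession()
first = reporter.read(3)   # replica with three writes applied
second = reporter.read(1)  # a staler replica is rejected
third = reporter.read(4)   # a fresher replica is accepted
```

The reported scores may lag behind the real game, but listeners never hear a score that did not exist or one that is older than the previous report.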

The sportswriter doesn’t care about the result until the game is over. All that matters is that the data is up to date by the time the writer starts writing about the game, so Bounded Staleness is perfect for this purpose.
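A time-based staleness bound can be sketched as a simple check on how far a replica has synced. The timestamps (in minutes), the write log, and the `read_scores` function are all hypothetical. The sketch assumes a replica knows the time up to which it has received every write.

```python
# Assumed write log: (minute of the game, key, value).
WRITES = [(10, "home", 1), (45, "visitors", 1), (175, "home", 2)]


def read_scores(replica_synced_until, now, bound):
    """Read (visitors, home) from a replica under a staleness bound.

    The replica is acceptable only if it holds every write made at or
    before now - bound.
    """
    if replica_synced_until < now - bound:
        raise RuntimeError("replica too stale for the requested bound")
    score = {"visitors": 0, "home": 0}
    for minute, key, value in WRITES:
        if minute <= replica_synced_until:
            score[key] = value
    return score["visitors"], score["home"]


# During the game: a replica synced through minute 100 is within a 60-minute
# bound at minute 120, but it is missing the late run.
mid_game = read_scores(replica_synced_until=100, now=120, bound=60)

# After dinner, at minute 260, the same bound requires a replica synced through
# minute 200, which is after the last write, so the final score is returned.
after_dinner = read_scores(replica_synced_until=205, now=260, bound=60)
```

If the sportswriter starts writing more than the bound after the last play, any replica that satisfies the bound necessarily holds the final score.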

The statistician tracks the team’s statistics throughout the season. The statistician reads the previously written stat before writing the new value. As discussed for the scorekeeper, Read My Writes is sufficient for that. The statistician also needs to read today’s score in order to compute the new value, which requires Strong Consistency or Bounded Staleness, depending on the time frame.

There are also stat watchers, who only chat with friends about the stats and do not need to always get the latest results. Eventual Consistency should be enough for them.

After analyzing how the guarantees apply to this baseball game, it turns out that, even within such a simple application, all six guarantees are useful, and different participants have their own needs. Since the discussion does not touch the implementation layer, it should be straightforward to use a modular implementation for the different strategies. A client that wants to switch to another guarantee could then do so with little effort.
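The role-by-role conclusions in this post can be written down as a small table and checked mechanically. The dictionary below only restates the choices made above, and the helper names are hypothetical. For the statistician it lists Strong Consistency, with Bounded Staleness as the alternative noted in the comment.

```python
ALL_GUARANTEES = {
    "Strong Consistency", "Eventual Consistency", "Consistent Prefix",
    "Bounded Staleness", "Monotonic Reads", "Read My Writes",
}

# Guarantee(s) chosen for each role, as summarized in this post.
ROLE_GUARANTEES = {
    "official scorekeeper": {"Read My Writes"},
    "umpire": {"Strong Consistency"},
    "radio reporter": {"Consistent Prefix", "Monotonic Reads"},
    "sportswriter": {"Bounded Staleness"},
    # Today's score could also use Bounded Staleness, depending on the time frame.
    "statistician": {"Read My Writes", "Strong Consistency"},
    "stat watcher": {"Eventual Consistency"},
}


def unused_guarantees(roles=ROLE_GUARANTEES):
    """Guarantees that no role asks for."""
    used = set().union(*roles.values())
    return ALL_GUARANTEES - used


def roles_needing(guarantee, roles=ROLE_GUARANTEES):
    """Roles that ask for the given guarantee, in alphabetical order."""
    return sorted(role for role, needs in roles.items() if guarantee in needs)
```

Here `unused_guarantees()` returns an empty set, which is the point of the paper's example. Keeping the guarantee as a per-role setting like this also means that switching a role to a different guarantee is a one-line change.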