March 19, 2024

Overview of Conflict-free Replicated Data Type (CRDT) in HarperDB 4.3


Introduction to CRDT in 4.3

HarperDB 4.3 includes new support for conflict-free replicated data types (CRDTs) and operations. At a high level, CRDTs define a way of updating data that can be executed concurrently and independently on different nodes in a distributed environment, without any locking. These updates can then be replicated and merged on each node in a deterministic manner, so that every node in the cluster arrives at the same result.

CRDTs can take many forms, and a broad spectrum of data types and operations can be merged according to the principles of CRDT. Many of these data types and operations are planned for future releases, but for now, HarperDB 4.3 includes basic CRDT capabilities. Specifically, it supports merging independent property updates and incrementing (or decrementing) property values.

NEW | Property Update Merging

Merging separate property updates allows a single property in a record to be updated and merged with other record updates that have affected different properties. This type of fine-grained update is now the default behavior for application code that updates properties. For example, if we had defined a post handler that could update various properties:

export class Product extends tables.Product {
  post(data) {
    // Each action assigns a single property, so only that property is updated
    if (data.action === 'update-name')
      this.name = data.value;
    else if (data.action === 'update-inventoryCount')
      this.inventoryCount = data.value;
  }
}

In this example, each action updates a different individual property (the change is automatically saved and committed to the database when the method finishes). Now, if an update to the name is issued on node A and an update to the inventoryCount is issued on node B, both updates can be merged consistently across all nodes as part of the replication process. Note that if the same property is updated on different nodes, the existing last-writer-wins rules determine the final value (and this will also be consistent across the cluster).
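
The merge rule described above can be illustrated with a small simulation. This is only a sketch of the idea, not HarperDB's implementation: the `mergePropertyUpdates` function, the shape of the update objects, and the numeric timestamps are all assumptions made for illustration, and ties between equal timestamps are ignored.

```javascript
// Simulation only: models per-property merging; it does not call any HarperDB API.
// Each update lists the properties it changed plus a timestamp (hypothetical shape).
function mergePropertyUpdates(base, updates) {
  const merged = { ...base };
  const lastWrite = {}; // property name -> timestamp of the winning write
  for (const update of updates) {
    for (const [property, value] of Object.entries(update.changes)) {
      // Last-writer-wins is applied per property, not per record.
      if (!(property in lastWrite) || update.timestamp > lastWrite[property]) {
        merged[property] = value;
        lastWrite[property] = update.timestamp;
      }
    }
  }
  return merged;
}

const base = { id: 'p1', name: 'Widget', inventoryCount: 5 };
const fromNodeA = { timestamp: 100, changes: { name: 'Deluxe Widget' } };
const fromNodeB = { timestamp: 101, changes: { inventoryCount: 7 } };

// The two nodes receive the updates in different orders but end with the same record.
const onNodeA = mergePropertyUpdates(base, [fromNodeA, fromNodeB]);
const onNodeB = mergePropertyUpdates(base, [fromNodeB, fromNodeA]);
console.log(onNodeA, onNodeB);
```

Both calls keep the new name and the new inventoryCount, because the two updates touched different properties and neither overwrites the other.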

This can also be used directly from our REST interface. Whenever we use the PATCH method to update a record, the updates are recorded and replicated as individual property updates. On the other hand, you can choose to do full record updates with the PUT method, which follows the standard last-writer-wins rules for the entire record (no properties will be merged with a PUT update).
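
As a rough sketch of the difference, the two requests might be built as follows. The host, port, record id, and the `/Product/<id>` path are assumptions for illustration, as is the `buildRequest` helper; authentication is omitted, and the `fetch` call is left commented out because it needs a running instance.

```javascript
// Assumptions: the address and record id are placeholders, and the Product
// table is assumed to be exposed at the /Product/<id> REST path.
const BASE_URL = 'http://localhost:9926'; // hypothetical instance address

// Hypothetical helper that describes a request without sending it.
function buildRequest(method, id, body) {
  return {
    url: `${BASE_URL}/Product/${encodeURIComponent(id)}`,
    options: {
      method,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// PATCH: only the listed property is sent, so it can merge with other property updates.
const patchRequest = buildRequest('PATCH', 'p1', { inventoryCount: 7 });

// PUT: the whole record is sent and replaces the stored record (last-writer-wins).
const putRequest = buildRequest('PUT', 'p1', { id: 'p1', name: 'Widget', inventoryCount: 7 });

// To send one of these against a running instance (not executed here):
// fetch(patchRequest.url, patchRequest.options).then((response) => console.log(response.status));
console.log(patchRequest, putRequest);
```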

NEW | CRDT Distributed Incrementation

This example also leads to the next capability: incrementation. If we are dealing with inventory counts, we probably want to use our new incrementation/decrementation capabilities. If the inventory is increased by 5 on node A and decreased by 1 on node B, plain property updates are not enough. Suppose node A starts with an inventoryCount of 5, increases it to 10, and saves the value 10, while node B starts with an inventoryCount of 5 and decreases it to 4. Whichever write wins, the final value will be 10 or 4, and neither reflects the sum of both changes. Instead, we want to use the new addTo method to explicitly increase and decrease the inventoryCount. We will update our method to:

export class Product extends tables.Product {
  post(data) {
    if (data.action === 'update-name')
      this.name = data.value;
    else if (data.action === 'update-inventoryCount')
      // Record a relative change (positive or negative) instead of a replacement value
      this.addTo('inventoryCount', data.value);
  }
}

Now, we are explicitly indicating that we are increasing/decreasing the value of inventoryCount instead of just replacing it. And now, if we start with an inventoryCount of 5 and we issue an update to increase the count by 5 on node A and decrease the count by 1 on node B, the resulting inventoryCount after replication and merging will be 9, just as it should be.
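
The arithmetic can be checked with a short simulation that contrasts the two approaches. This is a sketch only and does not use HarperDB itself: the `lastWriterWins` and `mergeIncrements` functions, the timestamps, and the shape of the operation objects are assumptions made for illustration.

```javascript
// Simulation only: contrasts replacing a value with merging increments.
const startingCount = 5;

// Node A adds 5 and node B subtracts 1, each starting from the same count.
const deltas = [
  { node: 'A', timestamp: 100, delta: 5 },
  { node: 'B', timestamp: 101, delta: -1 },
];

// Plain property updates: each node writes its own final value and the latest write wins.
function lastWriterWins(start, ops) {
  const writes = ops.map((op) => ({ timestamp: op.timestamp, value: start + op.delta }));
  writes.sort((a, b) => a.timestamp - b.timestamp);
  return writes[writes.length - 1].value;
}

// Increment merging: every delta is applied, and addition does not depend on order.
function mergeIncrements(start, ops) {
  return ops.reduce((count, op) => count + op.delta, start);
}

console.log(lastWriterWins(startingCount, deltas)); // 4
console.log(mergeIncrements(startingCount, deltas)); // 9
console.log(mergeIncrements(startingCount, [...deltas].reverse())); // 9
```

Because addition is commutative, each node can apply the increments in whatever order they arrive and still reach 9.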

Enhanced Capabilities for Distributed Applications

This incrementation capability opens up powerful possibilities for tracking quickly changing counts across a cluster. We also intend to use this capability to drive rate-limiting functionality, as it is the foundation of accurate and distributed count tracking.

HarperDB’s CRDT support is exciting new functionality that pushes the limits of what is possible with a distributed application database platform.