Application Note: ConfD Active-Active Fully Synchronous HA Clusters

Coming from a real-time operating systems background, I have spent the past four years at Tail-f learning how network-wide transactions are done through ConfD. Even so, I was still blissfully unaware of the complexity of distributed databases and the CAP theorem when I started researching and writing the “ConfD Active-Active Fully Synchronous HA Clusters” application note. I am relieved to find that at least one assumption held up as expected: ConfD’s built-in transaction engine makes it easy to solve the complex consistency issues.

The CAP theorem, formulated 15 or so years ago by Eric Brewer, with the mathematical proof published in 2002, states that a database shared across a network, such as our ConfD database, can have at most two of three desirable properties:

  • Consistency (C) equivalent to having a single up-to-date copy of the data (not to be confused with Consistency in ACID);
  • High availability (A) of that data (for updates); and
  • Tolerance to network partitions (P).

You might call ConfD’s transactional configuration datastore “traditional” in that it focuses on C. Think of swiping your credit card at an online terminal: the terminal checks your balance before approving the transaction and, if the balance cannot be retrieved, declines it, sacrificing A. The “new” part here is using those transactions for NETCONF network-wide transactions. ConfD’s operational datastore focuses more on A in order to provide telemetry, which can tolerate a “relaxed” C, so an “eventually consistent” strategy is acceptable when it is set up in a distributed manner.

So, for most if not all use cases that need an active-active ConfD setup in a programmable network, likely using NETCONF, we need P. Since most configuration data in a network also needs C, we have to sacrifice some A. This holds even for NFV applications and cloud-native microservices use cases.

Some of the devices or virtual instances affected by a network-wide transaction may hold only a tiny piece of the configuration for a service that an operator provides to its customers. Since the service must be set up successfully, every participant in the service setup will likely need to be able to promise C when the configuration is set.

Thanks to the CAP theorem, I at least didn’t have to feel too bad about sacrificing some A by locking the ConfD active-active cluster’s databases while synchronizing a transaction through a cluster-wide transaction. Instead, making the A sacrifice as small as possible, using only the existing ConfD APIs and keeping the synchronizing application simple, was an inspiring and fun challenge when developing a tiny example.
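
To make the trade-off concrete, here is a minimal sketch of the lock, prepare, commit pattern described above. It is a plain in-memory simulation, not the application note's code and not the ConfD API: the Node class, its methods, and cluster_commit are hypothetical names invented for illustration. The point is only that every node is locked for the duration of the cluster-wide transaction (the A we give up) so that the change lands on all nodes or on none (the C we keep).

```python
class Node:
    """Hypothetical in-memory stand-in for one cluster member (not a ConfD API)."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable  # False simulates a network partition
        self.config = {}
        self.locked = False
        self._pending = None

    def lock(self):
        # Refuse the lock if the node is partitioned away or already locked.
        if not self.reachable or self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        self.locked = False

    def prepare(self, changes):
        # Stage the changes without applying them yet.
        if not self.reachable:
            return False
        self._pending = dict(changes)
        return True

    def commit(self):
        self.config.update(self._pending)
        self._pending = None

    def abort(self):
        self._pending = None


def cluster_commit(nodes, changes):
    """Apply changes on every node or on none. Returns True on success."""
    locked = []
    prepared = []
    try:
        # Phase 0: lock every database. Availability is sacrificed from here on.
        for node in nodes:
            if not node.lock():
                return False
            locked.append(node)
        # Phase 1: every node must accept the staged changes.
        for node in nodes:
            if not node.prepare(changes):
                for p in prepared:
                    p.abort()
                return False
            prepared.append(node)
        # Phase 2: all nodes agreed, so apply everywhere.
        for node in nodes:
            node.commit()
        return True
    finally:
        # Release locks as early as possible to keep the A sacrifice small.
        for node in locked:
            node.unlock()
```

With two reachable nodes, cluster_commit([a, b], {"mtu": 1500}) updates both. If b.reachable is then set to False, the next call returns False and leaves a.config untouched. The sketch deliberately ignores failures between the two phases, which a real implementation would have to handle.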

I look forward to more examples of how to minimize the A sacrifice, and to exploring, for example, mixing “eventual C” and “strict C” for different parts of the configuration and operational data, depending on the requirements for lists, presence containers, leafs, etc.

But first, Happy Holidays! Here’s to all the CAP theorem contributors.


