Cassandra and Consistency - A Simplified Explanation

Cassandra is architected for read-write (RW) anywhere: any client can connect to any node in any datacenter (DC). There is no primary or single master node in Cassandra. This dramatically improves RW performance, since no single master node has to absorb all the traffic. The flip side is that there is no single source of truth (SSOT) as you would have in an RDBMS such as MySQL, where you assume the master or primary is always correct. Cassandra has mechanisms to manage this, though.

REPLICATION FACTOR (RF): Determines how many copies of the data exist.
CONSISTENCY LEVEL (CL): Determines how many nodes must acknowledge a read or write before the write is confirmed or the data is returned. Examples are ALL, ONE, QUORUM, etc. Effectively, it is a level of assurance that a write occurred or that your read is against the freshest data.
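To make the interplay concrete, here is a minimal Python sketch (my own illustration, not real driver code) mapping common consistency levels to the number of replica acknowledgements required for a given RF:

```python
def acks_required(cl: str, rf: int) -> int:
    """Replica acknowledgements needed to satisfy a consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,  # a majority of the RF replicas
        "ALL": rf,              # every replica must respond
    }
    return levels[cl]

# With RF=3: ONE needs 1 ack, QUORUM needs 2, ALL needs all 3.
print(acks_required("QUORUM", 3))  # → 2
```
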

Let’s look at a two-node cluster as an example.

WRITES: If the data is simply split evenly between the two nodes and one of them goes down, you have effectively lost half your data. To overcome this for writes, you could set the CL to ALL (or TWO) to ensure data is written to both nodes, or set it to ONE with an RF of TWO to ensure the data is copied to the other node. At first this seems like a plausible strategy, but there are a couple of issues. If the CL is set to the total number of nodes (or ALL) and one goes down, you no longer meet the CL and the operation fails. That leaves only the option of a CL of ONE with an RF of TWO, which does work for writes.
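The write trade-off above can be sketched with a toy availability check (plain Python with hypothetical names, not Cassandra code):

```python
def write_succeeds(acks_needed: int, replicas_up: int) -> bool:
    """A write commits only if enough replicas are up to acknowledge it."""
    return replicas_up >= acks_needed

# Two-node cluster, RF=2, one node down (replicas_up=1):
print(write_succeeds(2, 1))  # CL=ALL/TWO → False, the write fails
print(write_succeeds(1, 1))  # CL=ONE     → True, the write still lands
```
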

READS: The problem with the CL=ONE, RF=TWO strategy shows up on reads. With a CL of ONE, when an object is updated it will replicate, but a read on that data may occur before the replication happens. If the update occurs on NODE A and the client reads from NODE B, how does it know whether the data is fresh or still waiting on replication? The only way is to compare timestamps with another node to confirm consistency. But again, if you use a CL of ALL or TWO, a single node outage prevents that comparison and disables reads.
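The read side can be modeled the same way: each replica carries a write timestamp, a read must gather CL responses, and the freshest value wins. A hypothetical sketch, not driver code:

```python
def read(replica_responses, acks_needed):
    """Gather responses from live replicas; fail if the CL cannot be met,
    otherwise return the value with the newest timestamp."""
    up = [r for r in replica_responses if r is not None]  # None = node down
    if len(up) < acks_needed:
        raise RuntimeError("cannot satisfy consistency level")
    return max(up, key=lambda r: r["ts"])["value"]

node_a = {"value": "new", "ts": 2}  # received the update
node_b = {"value": "old", "ts": 1}  # replication still pending

print(read([node_a, node_b], 2))    # → 'new' (timestamps compared)
# With NODE A down, a CL of TWO cannot be met and the read fails:
# read([None, node_b], 2) raises RuntimeError
```
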

So with just two nodes, if a single node goes down due to a failure or maintenance, you must sacrifice either redundancy or consistency.

QUORUM: A majority of nodes.

CL does not just support ALL or a fixed number; it also supports QUORUM. This can be a local majority (LOCAL_QUORUM) or a cluster-wide majority, which matters when you have regionally separated, fully redundant DCs. With a three-node cluster, the majority, or quorum, is two. This allows for assured redundancy and consistency, even in the event of a single node outage.
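Why a quorum works: if both reads and writes touch a majority of the RF replicas, any read set must overlap any prior write set in at least one node, so a read always reaches at least one fresh copy. A small sketch of that arithmetic:

```python
def quorum(rf: int) -> int:
    """Smallest majority of RF replicas."""
    return rf // 2 + 1

def reads_see_latest_write(read_acks: int, write_acks: int, rf: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_acks + write_acks > rf

rf = 3
print(quorum(rf))                                          # → 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # → True
```
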


Why three? Because it is the minimum number of nodes that provides redundancy, consistency, and the ability to form a quorum. With Cassandra, more nodes are generally better. If you have the resources to stand up five Cassandra nodes in a DC, there is no downside that I know of.
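That minimum-of-three claim can be checked numerically: a cluster doing QUORUM operations tolerates RF minus quorum node failures, which is zero at two nodes and one at three (assuming RF equals the node count, as in the examples above):

```python
def quorum(rf: int) -> int:
    return rf // 2 + 1

for nodes in (2, 3, 5):
    tolerated = nodes - quorum(nodes)  # failures survivable at QUORUM
    print(f"{nodes} nodes → survives {tolerated} failure(s)")
```
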
