
Cassandra.Link


10/10/2018

Reading time: 2 min

About Data Partitioning in Cassandra — Apache Cassandra 1.1.x documentation

by John Doe


A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a hash function for computing the token (its hash) of a row key. Each row of data is uniquely identified by a row key and distributed across the cluster by the value of the token.
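As a rough sketch of the idea (not Cassandra's actual code), a RandomPartitioner-style token can be derived from an MD5 hash of the row key, reduced to the ring's token range; the key literal below is made up for illustration:

    import hashlib

    RING_SIZE = 2 ** 127  # RandomPartitioner token range: 0 .. 2**127 - 1

    def token_for(row_key: bytes) -> int:
        # Hash the row key and reduce the digest to the ring's token range.
        digest = hashlib.md5(row_key).digest()
        return int.from_bytes(digest, "big") % RING_SIZE

    # The same key always produces the same token, so it always maps to
    # the same position on the ring.
    print(token_for(b"user:1001"))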

Data Distribution in the Ring

In Cassandra, the total amount of data managed by the cluster is represented as a ring. The ring is divided into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the data. Before a node can join the ring, it must be assigned a token. The token value determines the node's position in the ring and its range of data. Column family data is partitioned across the nodes based on the row key. To determine the node where the first replica of a row will live, the ring is walked clockwise until it locates the node with a token value greater than that of the row key. Each node is responsible for the region of the ring between itself (inclusive) and its predecessor (exclusive). With the nodes sorted in token order, the last node is considered the predecessor of the first node; hence the ring representation.

For example, consider a simple four node cluster where all of the row keys managed by the cluster are numbers in the range of 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75. The first node, the one with token 0, is responsible for the wrapping range (75-0). The node with the lowest token also accepts row keys less than the lowest token and more than the highest token.
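A minimal sketch of that clockwise walk, reusing the four example tokens above (the helper name and the use of Python's bisect module are illustrative, not Cassandra internals):

    from bisect import bisect_left

    def replica_token(key_token, node_tokens):
        # Walk the ring clockwise: the first node whose token is >= the key's
        # token owns the key; anything past the highest token wraps around
        # to the node with the lowest token.
        tokens = sorted(node_tokens)
        i = bisect_left(tokens, key_token)
        return tokens[0] if i == len(tokens) else tokens[i]

    nodes = [0, 25, 50, 75]
    print(replica_token(83, nodes))  # 0  -- wraps to the lowest token
    print(replica_token(26, nodes))  # 50
    print(replica_token(25, nodes))  # 25 -- a node's own token is inclusive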

[Figure: ring_partitions.png — token ranges in the four-node ring]

Understanding the Partitioner Types

When you deploy a Cassandra cluster, you must choose a partitioner and assign each node an initial_token value so that each node is responsible for roughly an equal amount of data (load balancing). DataStax strongly recommends using the RandomPartitioner (the default) for all cluster deployments.

To calculate the tokens for nodes in a single data center cluster, you divide the range by the total number of nodes in the cluster. In multiple data center deployments, you calculate the tokens such that each data center is individually load balanced. See Generating Tokens for the different approaches to generating tokens for nodes in single and multiple data center clusters.
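For a single data center using the RandomPartitioner, that calculation amounts to dividing the 2**127 token range evenly across the nodes; a small sketch (the helper name is hypothetical):

    def generate_tokens(node_count, ring_size=2 ** 127):
        # Evenly spaced initial_token values: divide the token range by the
        # number of nodes so each node owns roughly the same amount of data.
        return [i * (ring_size // node_count) for i in range(node_count)]

    for i, token in enumerate(generate_tokens(4)):
        print(f"node {i}: initial_token = {token}")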

Unlike almost every other configuration choice in Cassandra, the partitioner may not be changed without reloading all of your data. Therefore, it is important to choose and configure the correct partitioner before initializing your cluster. You set the partitioner in the cassandra.yaml file.
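A cassandra.yaml excerpt along these lines (RandomPartitioner is the default for this release; the initial_token value shown is only a placeholder for the first node):

    # cassandra.yaml (excerpt) -- the partitioner cannot be changed later
    # without reloading all of your data
    partitioner: org.apache.cassandra.dht.RandomPartitioner

    # one value per node, e.g. from the token calculation above
    initial_token: 0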

Cassandra offers the following partitioners:

RandomPartitioner
ByteOrderedPartitioner
