
10/5/2018

Reading time: 5 min

Optimizing Cassandra Performance: Sometimes Two Writes Are Better Than One | SignalFx

by John Doe


SignalFx is a modern monitoring service that ingests, stores and performs real-time streaming analytics on high-volume, high-resolution metric data from companies all over the world.

Providing real-time streaming analytics means that we ingest tens of billions of points of time series data per day, and we give our customers the capability to send data at one second resolution. All of this data ends up in Cassandra, which we use as the backend of our time series database (or TSDB).

We chose Cassandra for scalability and read and write performance at extremely high load. For operational efficiency, we’ve gone through multiple stages of performance optimization. And we came to the counterintuitive conclusion: sometimes two writes perform better than one.

Measuring Overall Performance

We built a test environment with a load simulator to measure the difference in Cassandra performance as we moved through each optimization stage. For a constant simulated load measured in data points per second, we monitored and measured:

  • Write volume per second
  • Write latency in milliseconds
  • Host CPU load utilization
  • Host disk writes in bytes per second

We compared Cassandra performance across these metrics as we transitioned from one stage to the next and show these comparisons in a handful of key charts.

Stage 1: Vertical Writes

Our Cassandra schema is what you would expect. Each data point consists of a key name (or a time series ID key), a timestamp and a value for that timestamp. Each time series has its own row in the table, and we create tables representing distinct time ranges.

TSDB Schema
CREATE TABLE table_0 (
  timeseries text,
  time timestamp,
  value blob,
  PRIMARY KEY (timeseries, time)
) WITH COMPACT STORAGE;

As each datapoint comes in, we write it to the appropriate row and column for that time series.
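For illustration, each incoming point becomes one INSERT against this table; the time series key, timestamp, and value below are hypothetical:

-- One write per incoming data point, every second, for every active time series.
INSERT INTO table_0 (timeseries, time, value)
VALUES ('tsid_42', '2018-10-05 12:00:01+0000', 0x3ff0000000000000);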

[Figure: Cassandra 1]

It turns out that writing each data point individually is very expensive: touching every row, every second, is a costly load pattern. So we decided to buffer data in memory and write multiple points for each time series in a single batch statement.

Stage 2: Buffered Writes

In this version of the ingest system, we write new data into a memory-tier. A migrator process periodically reads data from the memory-tier, writes it to Cassandra, and, once it’s safely in Cassandra, removes it from the memory-tier. In other words, the migrator picks up a time range of data and moves it as a batch into Cassandra.
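As a rough sketch, the migrator’s flush for a single time series can be expressed as one batch of INSERTs; because they all target the same partition, an unlogged batch is a natural fit (the batch type and literal values here are assumptions, not taken from the original):

-- Hypothetical migrator flush: several buffered points for one time series
-- written to Cassandra as a single batch (single-partition, so unlogged).
BEGIN UNLOGGED BATCH
  INSERT INTO table_0 (timeseries, time, value) VALUES ('tsid_42', '2018-10-05 12:00:01+0000', 0x01);
  INSERT INTO table_0 (timeseries, time, value) VALUES ('tsid_42', '2018-10-05 12:00:02+0000', 0x02);
  INSERT INTO table_0 (timeseries, time, value) VALUES ('tsid_42', '2018-10-05 12:00:03+0000', 0x03);
APPLY BATCH;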

The TSDB is now effectively two-tiered. The memory-tier is essentially a higher performance backend for the most recent data. It knows whether data for a specific time range belongs in the memory-tier or on Cassandra, and therefore routes reads and writes appropriately.

[Figure: Cassandra 2]

There are two independent operations here:

  1. New points are being ingested on the right side (same as the non-buffered Cassandra case)
  2. Batches of points are being written to Cassandra on the left side

Buffered writes show an efficiency improvement for Cassandra compared to vertical writes. While the write pattern is choppier, Cassandra is doing many fewer writes than in the previous stage. These writes are larger, but host CPU utilization decreases significantly.

[Figure: Cassandra stage 1 to 2]

So buffering improved performance, although there was more we could do. In this stage, writing data point-by-point means that a column is created for each data point, which has implications for storage overhead in Cassandra.

Stage 3: Packed Writes

The next optimization stage is to have the migrator pack the contents of each batch of points into a single block which it writes to Cassandra. This reduces the number of columns and write operations in each row, which has a larger benefit for storage than CPU.
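A minimal sketch of a packed write, assuming the migrator serializes all points in the batch into one blob and keys it by the start time of the range it covers (the encoding and key choice are assumptions):

-- One column per packed block instead of one column per point.
-- The blob holds every point in the covered time range, serialized by the migrator.
INSERT INTO table_0 (timeseries, time, value)
VALUES ('tsid_42', '2018-10-05 12:00:00+0000', 0x0102030405060708090a0b0c);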

[Figure: Cassandra 3]

This packed write operation is essentially the same as the previous, buffered case. However, we are writing fewer, bigger objects to Cassandra, and the write rate has dropped tremendously. Latency also improves with fewer writes, and disk writes drop as well because the blocks are more compact.

Stage 4: Persistent Logs

While the above changes represent significant performance improvements, they introduce a big problem: if a memory-tier server crashes, we lose all of the buffered data. Obviously that is not acceptable. We’ll solve that by writing data to Cassandra as we receive it. Why is that a good idea now when it performed so poorly before?

The answer is that we now use a write pattern that is more favorable for Cassandra. The log schema has a row for each timestamp. Data points for different time series that arrive at the same time typically carry similar timestamps, so we can write them across a small number of rows in Cassandra instead of a row per time series as we did originally.

TSDB Schema
CREATE TABLE table_0 (
  timeseries text,
  time timestamp,
  value blob,
  PRIMARY KEY (timeseries, time)
) WITH COMPACT STORAGE;

Log Schema
CREATE TABLE table_0 (
  stamp text,
  sequence bigint,
  value blob,
  PRIMARY KEY (stamp, sequence)
) WITH COMPACT STORAGE;

We write data points for different time series in the order they arrive so that we can get them onto persistent storage as quickly as possible. Because this order of arrival is effectively non-deterministic, there’s no efficient way to retrieve a datapoint for a specific time series.
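As a sketch, log writes might look like the following, assuming the partition key is a coarse time bucket and the sequence number reflects arrival order (the bucket format and values are hypothetical):

-- Points from many different time series share a partition and are ordered
-- only by arrival sequence, not by time series.
INSERT INTO table_0 (stamp, sequence, value) VALUES ('2018-10-05T12:00', 1, 0x0a);
INSERT INTO table_0 (stamp, sequence, value) VALUES ('2018-10-05T12:00', 2, 0x0b);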

This is not a problem, because we only read this data when we need to rebuild the memory-tier of an ingest server; random reads are served from the memory-tier itself.

[Figure: Cassandra 4]

The data is migrated from the memory-tier just as before. Once it’s been migrated, we can also remove the log data from Cassandra simply by truncating the table in which we store it.
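Clearing the migrated log data is then a single statement against the log table; truncating also avoids the per-row tombstones that individual deletes would create:

-- Discard all migrated log data in one operation.
TRUNCATE table_0;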

[Figure: Cassandra 3 to 4]

With this process, there are clearly more write operations and an increase in disk I/O. However, there are no adverse effects on write latency or on CPU load, and, of course, our data is now protected from a crash.

Ongoing Optimizations

We’ve learned a lot about how to incrementally improve Cassandra performance based on these optimization stages. Our analysis shows that CPU load utilization is much more dependent on the rate of writes than on the volume of data being written. For our very write-heavy workload, we saw a very large efficiency improvement by doing fewer, larger writes. This lets us get much better utilization from our Cassandra cluster.


