Cassandra is awesome at time series
Cassandra’s data model works well with data in a sequence. That data can be variable in size,
and Cassandra handles large amounts of data well. When data is written to Cassandra, it is
sorted and written sequentially to disk. When you retrieve data by row key and then by range,
you get a fast, efficient access pattern because the disk needs only minimal seeks. Time series
data is an excellent fit for this pattern. For these examples, we'll use a weather station that
records a temperature reading every minute. You will see how the row key and the sorted order of
columns together make a powerful data modeling tool.
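The write path described above can be illustrated with a toy model. This is a simplified, hypothetical sketch, not Cassandra's actual storage engine: the class `ToyWriteBuffer` and the second station ID `5678EFGH` are invented for illustration. The buffer loosely stands in for Cassandra's in-memory memtable. Writes arrive in any order and are buffered in memory, then flushed as a single run sorted by row key and column name.

```python
from typing import Dict, List, Tuple

# A flushed cell: ((row_key, column_name), value)
Cell = Tuple[Tuple[str, str], str]


class ToyWriteBuffer:
    """Hypothetical toy model: buffer writes, flush them as one sorted run."""

    def __init__(self) -> None:
        # (row_key, column_name) -> value; a later write to the same cell wins.
        self._cells: Dict[Tuple[str, str], str] = {}

    def write(self, row_key: str, column_name: str, value: str) -> None:
        self._cells[(row_key, column_name)] = value

    def flush(self) -> List[Cell]:
        # Sort once by (row key, column name), then emit the whole run in order,
        # standing in for a single sequential write to disk.
        run = sorted(self._cells.items())
        self._cells.clear()
        return run


if __name__ == "__main__":
    buf = ToyWriteBuffer()
    # Writes arrive interleaved and out of order ("5678EFGH" is a made-up station).
    buf.write("1234ABCD", "2013-04-03 07:02:00", "73F")
    buf.write("5678EFGH", "2013-04-03 07:01:00", "65F")
    buf.write("1234ABCD", "2013-04-03 07:01:00", "72F")
    # The flushed run is ordered, with each station's readings adjacent.
    for (row_key, column_name), value in buf.flush():
        print(row_key, column_name, value)
```

Because each station's readings sit next to each other in the flushed run, reading one row key and then a range of column names amounts to one seek followed by a sequential scan. The timestamps are kept as `YYYY-MM-DD HH:MM:SS` strings here, which sort lexicographically in chronological order.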
Single device per row - Time Series Pattern 1
The simplest model for storing time series data is to create a wide row of data for each source. In
this first example, we will use the weather station ID as the row key. The timestamp of the
reading will be the column name and the temperature the column value (figure 1). Since each
column is dynamic, our row will grow as needed to accommodate the data. We also get
Cassandra’s built-in sorting to keep everything in order. In the CQL table below, weatherstation_id
is the partition (row) key and event_time is the clustering column that orders each row.
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
Now we can insert a few data points for our weather station.
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
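To make the wide-row layout concrete, here is a hypothetical in-memory sketch of Time Series Pattern 1. `WideRowTable` and its methods are invented for illustration and model only the ordering behavior, not how Cassandra stores data. Each station ID maps to a row whose column names (timestamps) are kept sorted on every insert, so the row grows as readings arrive and always reads back in time order.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Tuple


class WideRowTable:
    """Hypothetical model of Pattern 1: one wide row per weather station.

    Row key = station ID, column name = reading timestamp,
    column value = temperature.
    """

    def __init__(self) -> None:
        # Rows are created on first write and grow as columns are added.
        self._names: DefaultDict[str, List[datetime]] = defaultdict(list)
        self._values: DefaultDict[str, List[str]] = defaultdict(list)

    def insert(self, row_key: str, column_name: datetime, value: str) -> None:
        names = self._names[row_key]
        values = self._values[row_key]
        # Binary search for the column's position so the row stays sorted.
        i = bisect.bisect_left(names, column_name)
        if i < len(names) and names[i] == column_name:
            # A column name appears once per row: overwrite its value.
            values[i] = value
        else:
            names.insert(i, column_name)
            values.insert(i, value)

    def row(self, row_key: str) -> List[Tuple[datetime, str]]:
        # Columns come back in timestamp order; unknown rows are empty.
        return list(zip(self._names.get(row_key, []), self._values.get(row_key, [])))


if __name__ == "__main__":
    table = WideRowTable()
    # Inserted out of order on purpose.
    table.insert("1234ABCD", datetime(2013, 4, 3, 7, 2), "73F")
    table.insert("1234ABCD", datetime(2013, 4, 3, 7, 1), "72F")
    # Printed in timestamp order.
    for event_time, temperature in table.row("1234ABCD"):
        print(event_time, temperature)
```

Keeping names and values in parallel lists lets a binary search locate each new column's position, which mirrors the point of the pattern: ordering is maintained at write time, so reads never need to sort.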
A simple query for all data from a single weather station:
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
A range query for data between two points in time. This is also known as a slice, since it reads
a contiguous sequence of data from disk.
SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
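Why a slice is cheap can be sketched over a single row whose columns are already sorted by timestamp. This is a hypothetical illustration: `slice_row` is an invented helper, and Python's `bisect` stands in for however Cassandra locates the start of a range. Both bounds are exclusive, matching a query of the form `event_time > start AND event_time < end`.

```python
import bisect
from datetime import datetime
from typing import List, Sequence, Tuple


def slice_row(
    names: Sequence[datetime],
    values: Sequence[str],
    start: datetime,
    end: datetime,
) -> List[Tuple[datetime, str]]:
    """Hypothetical slice over one sorted wide row, with exclusive bounds."""
    # One binary search finds the first column strictly after `start` (the "seek")...
    lo = bisect.bisect_right(names, start)
    # ...and a second finds where columns stop being strictly before `end`.
    hi = bisect.bisect_left(names, end)
    # Everything in between is one contiguous run, read in order.
    # If end <= start, lo >= hi and the slice is empty.
    return list(zip(names[lo:hi], values[lo:hi]))


if __name__ == "__main__":
    # The four readings inserted earlier for station 1234ABCD.
    names = [datetime(2013, 4, 3, 7, m) for m in (1, 2, 3, 4)]
    values = ["72F", "73F", "73F", "74F"]
    for event_time, temperature in slice_row(
        names, values, datetime(2013, 4, 3, 7, 1), datetime(2013, 4, 3, 7, 4)
    ):
        print(event_time, temperature)
```

With exclusive bounds at 07:01 and 07:04, only the 07:02 and 07:03 readings fall inside the slice. The cost is a search for the starting position plus a scan proportional to the number of columns returned, not to the size of the whole row.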
Partitioning to limit row size - Time Series Pattern 2
In some cases, the amount of data gathered for a single device isn't practical to fit onto a single
row. Cassandra can store up to 2 billion columns per row, but if we're storing data every second