Dzone Refcard 153 Apache Cassandra 2020
Apache Cassandra

CONTENTS
∙ Introduction
∙ Who is Using Cassandra?
∙ Data Model Overview
∙ Schema
∙ Table
∙ Rows
∙ Columns
∙ Partitioning
∙ Replication
∙ And More...

WRITTEN BY BRIAN O’NEIL
ARCHITECT, IRON MOUNTAIN
TYPE            DESCRIPTION                                         SIZE
text, varchar   Stores text as UTF-8                                UTF-8
timeuuid        Version 1 UUID only                                 16 bytes
uuid            Suitable for UUID storage                           16 bytes
frozen          A frozen value serializes multiple components       N/A
                into a single value. Non-frozen types allow
                updates to individual fields. Cassandra treats
                the value of a frozen type as a blob. The entire
                value must be overwritten.
inet            IP address string in IPv4 or IPv6 format, used      N/A
                by the python-cql driver and CQL native protocols
list            A collection of one or more ordered elements:       N/A
                [literal, literal, literal]
map             A JSON-style array of literals: { literal :         N/A
                literal, literal : literal ... }
set             A collection of one or more elements: { literal,    N/A
                literal, literal }
tuple           A group of 2-3 fields                               N/A

ROWS
Cassandra 3.x supports tables defined with composite primary keys. The first part of the primary key is a partition key. The remaining columns are clustering columns and define the order of the data on disk. For example, let's say there is a table called users_by_location with the following primary key:

((country, town), birth_year, user_id)

In that case, the (country, town) pair is a partition key (a composite one). All users with the same (country, town) values will be stored together on a single node and replicated together based on the replication factor. The rows within the partition will be ordered by birth_year and then by user_id. The user_id column provides uniqueness for the primary key.

If the partition key is not separated by parentheses, then the first column in the primary key is considered the partition key. For example, if the primary key is defined by (country, town, birth_year, user_id), then country would be the partition key, and town would be a clustering column.

COLUMNS
A column is a triplet: key, value, and timestamp. The validation and comparator on the column family define how Cassandra sorts and stores the bytes in column keys. The timestamp portion of the column is used to sequence mutations. The timestamp is defined and specified by the client. Newer versions of Cassandra drivers provide this functionality out of the box. Client application servers should have synchronized clocks.

Columns may optionally have a time-to-live (TTL), after which Cassandra asynchronously deletes them. Note that TTLs are defined per cell, so each cell in the row has an independent time-to-live and is handled by Cassandra independently.

HOW DATA IS STORED ON DISK
Using the sstabledump tool, you can inspect how the data is stored on disk. This is very important if you want to develop intuition about data modeling, reads, and writes in Cassandra. Given the table defined by:

CREATE TABLE IF NOT EXISTS symbol_history (
  symbol text,
  year int,
  month int,
  day int,
  volume bigint,
  close double,
  open double,
  low double,
  high double,
  idx text static,
  PRIMARY KEY ((symbol, year), month, day)
) WITH CLUSTERING ORDER BY (month DESC, day DESC);

the data (when deserialized into JSON using the sstabledump tool) is stored on disk in this form:

[
  {
    "partition" : {
      "key" : [ "CORP", "2016" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "static_block",
        "position" : 48,
        "cells" : [
          { "name" : "idx", "value" : "NASDAQ",
            "tstamp" : 1457484225583260, "ttl" : 604800,
            "expires_at" : 1458089025, "expired" : false }
        ]
      },
      {
        "type" : "row",
        "position" : 48,
        "clustering" : [ "1", "5" ],
        "deletion_info" : { "deletion_time" :
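The way a composite primary key splits rows into partitions and orders them by clustering columns (as described under ROWS) can be sketched in plain Python. This is a toy model with made-up example rows, not how Cassandra stores data internally:

```python
from collections import defaultdict

# Toy model of PRIMARY KEY ((country, town), birth_year, user_id):
# rows sharing (country, town) land in one partition; within a
# partition, rows are ordered by birth_year, then user_id.
rows = [
    ("RS", "Novi Sad", 1985, "u3"),
    ("RS", "Novi Sad", 1979, "u1"),
    ("US", "Boston",   1990, "u2"),
    ("RS", "Novi Sad", 1985, "u0"),
]

partitions = defaultdict(list)
for country, town, birth_year, user_id in rows:
    # (country, town) is the composite partition key
    partitions[(country, town)].append((birth_year, user_id))

# Clustering columns define the on-disk order inside each partition
for key in partitions:
    partitions[key].sort()

print(partitions[("RS", "Novi Sad")])
# All Novi Sad users live in a single partition, ordered by
# birth_year and then user_id, so a slice query on birth_year
# is a sequential read within one partition.
```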
With Cassandra's storage model, where each node owns the preceding token space, this results in the following storage allocation based on the tokens:

ROW KEY   MD5 HASH   NODE
lisa      1          2

Hot tip: If possible, it is best to design your data model to use Murmur3Partitioner to take advantage of the automatic load balancing and decreased administrative overhead of manually managing token assignment.

With OPP, range queries are simplified, and a query may not need to consult each node in the ring. This seems like an advantage, but it comes at a price. Since the partitioner is preserving order, the ring may become unbalanced unless the row keys are naturally distributed across the token space.

NETWORKTOPOLOGYSTRATEGY
The NetworkTopologyStrategy is useful when deploying to multiple datacenters. It ensures data is replicated across datacenters. With blue nodes deployed to one datacenter (DC1), green nodes deployed to another datacenter (DC2), and a replication factor of two in each datacenter, one row will be replicated twice in Data Center 1 (R1, R2) and twice in Data Center 2 (R3, R4).

Note: Cassandra attempts to write data simultaneously to all target nodes, then waits for confirmation from the relevant number of nodes needed to satisfy the specified consistency level.

CONSISTENCY LEVELS
One of the unique characteristics of Cassandra that sets it apart from other databases is its approach to consistency. Clients can specify the consistency level on both read and write operations, trading off between high availability, consistency, and performance.

WRITE
LEVEL          EXPECTATION
ANY            The write was written in at least one node's commit log.
               Provides low latency and a guarantee that a write never
               fails. Delivers the lowest consistency and continuous
               availability.

READ
LEVEL          EXPECTATION
TWO            Returns the most recent data from two of the closest
               replicas.
THREE          Returns the most recent data from three of the closest
               replicas.
QUORUM         Returns the record after a quorum (n/2 + 1) of replicas
               from all datacenters has responded.
LOCAL_QUORUM   Returns the record after a quorum of replicas in the
               current datacenter, as the coordinator has reported,
               has responded. Avoids the latency of communication
               among datacenters.
EACH_QUORUM    Not supported for reads.
ALL            The client receives the most current data once all
               replicas have responded.

NETWORK TOPOLOGY
As input into the replication strategy and to efficiently route communication, Cassandra uses a snitch to determine the datacenter and rack of the nodes in the cluster. A snitch is a component that detects and informs Cassandra about the network topology of the deployment. The snitch dictates what is used in the strategy options to identify replication groups when configuring replication for a keyspace.

SNITCH                DESCRIPTION
EC2Snitch             Specify the region name in the keyspace strategy
                      options and dc_suffix in
                      cassandra-rackdc.properties.
Ec2MultiRegionSnitch  Specify the region name in the keyspace strategy
                      options and dc_suffix in
                      cassandra-rackdc.properties.

SIMPLESNITCH
The SimpleSnitch provides Cassandra no information regarding the network topology.

THE RACKINFERRINGSNITCH
The RackInferringSnitch infers network topology by convention. From the IPv4 address (e.g., 9.100.47.75), the snitch uses the following convention to identify the datacenter and rack:

OCTET   EXAMPLE   INDICATES
3       47        Rack
4       75        Node
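The quorum arithmetic behind the QUORUM and LOCAL_QUORUM levels above is simple majority math; a minimal sketch, assuming integer division for n/2 + 1:

```python
def quorum(replicas: int) -> int:
    # QUORUM requires a majority of replicas: floor(n / 2) + 1
    return replicas // 2 + 1

def local_quorum(rf_per_dc: int) -> int:
    # LOCAL_QUORUM only counts replicas in the coordinator's
    # datacenter, so the majority is taken over the per-DC factor.
    return quorum(rf_per_dc)

# With a replication factor of 3, two replicas must respond.
assert quorum(3) == 2
# With RF 3 per datacenter across two datacenters (6 total),
# QUORUM needs 4 responses, but LOCAL_QUORUM still needs only 2.
assert quorum(6) == 4
assert local_quorum(3) == 2
```

This is why LOCAL_QUORUM avoids cross-datacenter latency: the coordinator can acknowledge after hearing from a majority in its own datacenter alone.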
Secondary indexes are built in the background automatically without blocking reads or writes. Creating a secondary index using CQL is straightforward. For example, you can define a table of data about movie fans, then create a secondary index of states where they live:

CREATE TABLE fans ( watcherID uuid, favorite_actor text, address text, zip int, state text, PRIMARY KEY (watcherID) );

CREATE INDEX watcher_state ON fans (state);

Hot tip: Try to avoid indexes whenever possible. It is (almost) always a better idea to denormalize data and create a separate table that satisfies a particular query than it is to create an index.

INDEX PATTERNS
There are a few design patterns to implement indexes. Each services different query patterns. The patterns leverage the fact that Cassandra columns are always stored in sorted order and all columns for a single row reside on a single host.

INVERTED INDEXES
First, let's consider the inverted index pattern. In an inverted index, columns in one row become row keys in another. Consider a dataset in which user IDs are row keys and each user row stores a zip code. Inverting it produces a table in which the zip code is the partition key and the user IDs are the columns:

PARTITION KEY   ROWS/COLUMNS
15283           { user_id : BONE42 }

Such an index can quickly return all user IDs within a single zip code by returning all rows within a single partition. Cassandra simply goes to a single host based on the partition key (zip code) and returns the contents of that single partition.

With an order-preserving partitioner, given a start key and an end key, it's possible for Cassandra to compute the two relevant tokens, locating the range and retrieving the rows/tokens in that range.

Hot tip: Try to avoid queries with multiple partitions whenever possible. The data should be partitioned based on the access patterns, so it is a good idea to group the data in a single partition (or several) if such queries exist. If you have too many range queries that cannot be satisfied by looking into several partitions, you may want to rethink whether Cassandra is the best solution for your use case.

RANGE QUERIES WITH RANDOM PARTITIONING
The RandomPartitioner provides no guarantees of any kind between keys and tokens. In fact, ideally, row keys are distributed around the token ring evenly. Thus, the corresponding tokens for a start key and end key are not useful when trying to retrieve the relevant rows from tokens in the ring with the RandomPartitioner. Consequently, Cassandra must consult all nodes to retrieve the result. Fortunately, there are well-known design patterns to accommodate range queries. These are described next.

TIME SERIES DATA
When working with time series data, consider partitioning data by time unit (hourly, daily, weekly, etc.), depending on the rate of events. That way, all the events in a single period (e.g., one hour) are grouped together and can be fetched and/or filtered based on the clustering columns. TimeWindowCompactionStrategy is specifically designed to work with time series data and is recommended in this scenario. The TimeWindowCompactionStrategy compacts all the SSTables in a single partition per time unit. This allows for extremely fast reads of the data in a single time unit because it guarantees that only one SSTable will be read.
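The time-unit partitioning described above amounts to deriving a bucket from the event timestamp and including it in the partition key. A minimal sketch; the sensor_id and daily-bucket scheme are illustrative assumptions, not from the refcard:

```python
from datetime import datetime, timezone

def day_bucket(ts: datetime) -> str:
    """Derive a daily bucket so all of one day's events share a
    partition, e.g., PRIMARY KEY ((sensor_id, day), event_time)."""
    return ts.strftime("%Y-%m-%d")

event_time = datetime(2020, 3, 14, 15, 9, 26, tzinfo=timezone.utc)
partition_key = ("sensor-42", day_bucket(event_time))
print(partition_key)  # ('sensor-42', '2020-03-14')
```

Choosing the bucket size is the key design decision: a bucket should be small enough that one partition never grows unbounded, yet large enough that a typical query touches only one or a few partitions.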
DENORMALIZATION
Finally, it is worth noting that each of the indexing strategies as presented would require two steps to service a query if the request requires the actual column data (e.g., user name). The first step would retrieve the keys out of the index. The second step would fetch each relevant column by row key. We can skip the second step if we denormalize the data.

HINTED HANDOFF
Similar to read repair, hinted handoff is a background process that ensures data integrity and eventual consistency. If a replica is down in the cluster, the remaining nodes will collect and temporarily store the data that was intended to be stored on the downed node. If the downed node comes back online soon enough (configured by the max_hint_window_in_ms option in cassandra.yaml), the other nodes will "hand off" the data to it. This way, Cassandra smooths over short network or other outages out of the box.

To prevent improper data propagation in the cluster, you will want to ensure that you have consistency before tombstones get purged. To ensure consistency, run:

>$CASSANDRA_HOME/bin/nodetool repair

The repair command replicates any updates missed due to downtime or loss of connectivity. This command ensures consistency across the cluster and obviates the tombstones. You will want to do this periodically on each node in the cluster (within the window before tombstone purge). The repair process is greatly simplified by using a tool called Cassandra Reaper (originally developed and open sourced by Spotify but since taken over and improved by The Last Pickle).

cp $MX4J_HOME/lib/mx4j-tools.jar $CASSANDRA_HOME/lib

The following are key attributes to track per column family:

ATTRIBUTE       PROVIDES
Read Count      Frequency of reads against the column family.
Read Latency    Latency of reads against the column family.
Write Count     Frequency of writes against the column family.
Write Latency   Latency of writes against the column family.

To take a snapshot, run:

$CASSANDRA_HOME/bin/nodetool snapshot

This will create a snapshot directory in each keyspace data directory. Restoring the snapshot is then a matter of shutting down the node, deleting the commitlogs and the data files in the keyspace, and copying the snapshot files back into the keyspace directory.
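The hint-window decision described under HINTED HANDOFF can be sketched as follows. The helper function is hypothetical; only max_hint_window_in_ms corresponds to an actual configuration option (its default is three hours):

```python
# Nodes accumulate hints for a downed replica only while its downtime
# stays within max_hint_window_in_ms (hypothetical decision sketch).
MAX_HINT_WINDOW_IN_MS = 3 * 60 * 60 * 1000  # default: three hours

def should_store_hint(downtime_ms: int) -> bool:
    # Once a node has been down longer than the window, hints stop
    # accumulating, and a repair is needed to restore consistency.
    return downtime_ms <= MAX_HINT_WINDOW_IN_MS

assert should_store_hint(5 * 60 * 1000)           # down 5 minutes: hint
assert not should_store_hint(4 * 60 * 60 * 1000)  # down 4 hours: repair
```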
CLIENT     DESCRIPTION
Pycassa    Pycassa is the most well-known Python library for
           Cassandra: github.com/pycassa/pycassa

REST
CLIENT     DESCRIPTION
Virgil     Virgil is a Java-based REST client for Cassandra:
           github.com/hmsonline/virgil

RUBY
CLIENT     DESCRIPTION
Ruby Gem   Ruby has support for Cassandra via a gem:
           rubygems.org/gems/cassandra

CQL
Cassandra provides an SQL-like query language called the Cassandra Query Language (CQL). The CQL shell allows you to interact with Cassandra as if it were a SQL database. Start the shell with:

$CASSANDRA_HOME/bin/

DataStax has a reference card for CQL available here: https://www.datastax.com/sites/default/files/content/technical-guide/2019-09/cqltop10.final_.pdf

COMMAND LINE INTERFACE (CLI)
Cassandra also provides a Command Line Interface (CLI), through which you can perform all schema-related changes. It also allows you to manipulate data. DataStax provides reference cards for both CQL and nodetool, available here: https://docs.datastax.com/en/dse/6.7/cql/cql/cqlQuickReference.html
Milan Milosevic is Lead Data Engineer at SmartCat, where he leads a team of data engineers and data scientists implementing end-to-end machine learning and data-intensive solutions. He is also responsible for designing, implementing, and automating highly available, scalable, distributed cloud architectures (AWS, Apache Cassandra, Ansible, CloudFormation, Terraform, MongoDB). His primary focus is on Apache Cassandra and the monitoring, performance tuning, and automation around it.