Cassandra_Complete_Notes
1 What is Cassandra?
Apache Cassandra is a high-performance, distributed NoSQL database system designed to
handle large volumes of structured, semi-structured, and unstructured data across many
servers with no single point of failure.
2 Features of Cassandra
• Decentralized: No master-slave hierarchy; all nodes are equal. Any node can service any request.
• Scalable: Add more nodes easily without downtime. Linear horizontal scalability.
• High Availability: Data replication ensures constant availability. No single point of
failure.
• Fault Tolerance: Handles node failures gracefully. Commit log ensures durability of
writes.
• Tunable Consistency: Configure the consistency level per request to balance
availability against consistency.
• High Performance: Optimized for fast write operations. Read throughput can improve
with replication, because any replica can serve a read.
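• Flexible Schema (often described as schema-less): Rows are sparse, so a row stores only
the columns it has values for. With CQL, tables have a declared schema, but columns can be
added later with ALTER TABLE, which makes the data structure easy to evolve.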
• Support for CQL (Cassandra Query Language): SQL-like language. Simplifies
querying and schema design.
• Efficient Storage: Uses SSTables and compaction strategies. Handles write-heavy
workloads efficiently.
• Support for Time-Series Data: Natural fit for IoT, sensor, and event data. Fast insertion
and retrieval of timestamped values.
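The tunable consistency feature listed above can be made concrete with a little arithmetic. A read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The following is a minimal sketch in plain Python, not Cassandra driver code: the function names are hypothetical, and only the ONE, QUORUM, and ALL levels are modeled.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when R + W > RF, so every read set overlaps every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for the illustration
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        overlap = reads_see_latest_write(read_level, write_level, rf)
        print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {overlap}")
```

Lower levels such as ONE favor availability and latency, while QUORUM on both reads and writes trades some of that for overlap between read and write sets.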
3 Cassandra Architecture
Cassandra uses a peer-to-peer distributed architecture. All nodes are equal, and data is
partitioned across them using consistent hashing.
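To illustrate how consistent hashing spreads partitions over equal peers, here is a toy token ring in Python. It is a sketch under stated assumptions, not Cassandra's implementation: the Ring class, the node names, and the number of virtual nodes are hypothetical, and MD5 is used only because it is in the standard library (Cassandra's default partitioner is Murmur3-based).

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an assumption of this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring: each node owns several tokens (virtual nodes)."""

    def __init__(self, nodes, vnodes_per_node=8):
        pairs = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in pairs]
        self._owners = [node for _, node in pairs]

    def replicas(self, partition_key, replication_factor=1):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        found = []
        for offset in range(len(self._tokens)):
            node = self._owners[(start + offset) % len(self._tokens)]
            if node not in found:
                found.append(node)
            if len(found) == replication_factor:
                break
        return found


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"])
    for user_id in ("123", "456", "789"):
        print(user_id, "->", ring.replicas(user_id, replication_factor=2))
```

The useful property of this layout is that adding a node only moves the keys that fall into the new node's token ranges; every other key keeps its previous owner.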
• Cassandra Query Language (CQL): SQL-like syntax to interact with Cassandra.
Supports DDL and DML operations. Easy to use and familiar for developers.
Here, user_id is the partition key, determining which node stores the row. Each row can
have additional columns added dynamically (e.g., phone_number).
5.5 How It Works
• Partitioning: The partition key (e.g., user_id) is hashed to determine the node(s)
where the data resides.
• Clustering: Clustering columns (if any) sort data within a partition.
• Dynamic Columns: A row for user_id=123 might have columns username, email,
while another for user_id=456 might include phone.
• Timestamps: Each column's timestamp ensures that the latest write wins in case of conflicts.
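The points above can be tied together with a small in-memory model: rows are grouped by partition key, ordered by a clustering key, stored sparsely, and reconciled per column by timestamp. This is a hedged sketch only. ToyTable and its methods are hypothetical names, the clustering key (a date string) is an assumption for illustration, and on equal timestamps this sketch simply keeps the existing value.

```python
from collections import defaultdict


class ToyTable:
    """partition key -> clustering key -> column -> (value, timestamp)."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def write(self, partition_key, clustering_key, column, value, timestamp):
        row = self._partitions[partition_key].setdefault(clustering_key, {})
        current = row.get(column)
        # Last write wins: keep the cell with the highest timestamp.
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)

    def read_partition(self, partition_key):
        """Return rows sorted by clustering key, with timestamps stripped."""
        rows = self._partitions.get(partition_key, {})
        return [
            (ck, {col: cell[0] for col, cell in rows[ck].items()})
            for ck in sorted(rows)
        ]


if __name__ == "__main__":
    table = ToyTable()
    # Two conflicting writes to the same column: the newer timestamp wins.
    table.write(123, "2024-01-02", "email", "old@example.com", timestamp=1)
    table.write(123, "2024-01-02", "email", "new@example.com", timestamp=2)
    # Rows are sparse: this row has a column the other one lacks.
    table.write(123, "2024-01-01", "username", "alice", timestamp=1)
    table.write(456, "2024-01-01", "phone", "555-0100", timestamp=1)
    print(table.read_partition(123))
    print(table.read_partition(456))
```

Resolving conflicts per column rather than per row means two concurrent updates to different columns of the same row can both survive.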
8 Real-world Example Use Cases
• Messaging Applications: Services such as WhatsApp and Messenger handle millions of
messages per second, the kind of write-heavy workload Cassandra is designed for.
• IoT & Sensor Data: Collect time-series data from millions of devices.
• E-commerce: Real-time product catalog updates. Customer behavior tracking.
9 Advantages of Cassandra
• No single point of failure.
• Handles massive amounts of writes.
• High availability and fault tolerance.
• Linear scalability.
• Flexible data model.
• SQL-like query language (CQL).
10 Disadvantages of Cassandra
• Not suitable for complex joins and aggregations.
• Eventual consistency may not fit all use cases.
• Higher operational complexity.
• Schema changes need caution.
• Learning curve for data modeling.