NoSQL Databases: P. Krishna Reddy, IIIT Hyderabad
P.Krishna Reddy
IIIT Hyderabad
Introduction
• The term “database” had become synonymous with
SQL.
value
❑ execute(key, operation, parameters) -- Invoke an
Key 2
  Name          Value
  name          Little Giant Do-It-Yourself Rocket-Sled Kit
  toon          Beep Prepared
  inventoryQty  4
  brakes        false

Key 3
  Name          Value
  name          Acme Jet Propelled Unicycle
  toon          Hot Rod and Reel
  inventoryQty  1
  wheels        1
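The two catalog entries above carry different attributes (brakes versus wheels), which is exactly what a schema-less key/value store permits. A minimal Python sketch, assuming a hypothetical in-memory KeyValueStore class: put, get and delete are illustrative assumptions, while execute mirrors the execute(key, operation, parameters) call listed above, with the parameters passed as extra positional arguments. decrement_inventory is a hypothetical operation, not part of any real product.

```python
from typing import Any, Callable, Dict


class KeyValueStore:
    """Hypothetical in-memory store: each key maps to a dict of attributes,
    and different keys may carry different attributes (no fixed schema)."""

    def __init__(self) -> None:
        self._data: Dict[Any, Dict[str, Any]] = {}

    def put(self, key: Any, value: Dict[str, Any]) -> None:
        # Create or overwrite the whole value stored under key.
        self._data[key] = dict(value)

    def get(self, key: Any) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._data[key])

    def delete(self, key: Any) -> None:
        self._data.pop(key, None)

    def execute(self, key: Any, operation: Callable[..., Any], *parameters: Any) -> Any:
        # Apply an operation to the stored value in place, inside the store.
        return operation(self._data[key], *parameters)


def decrement_inventory(value: Dict[str, Any], amount: int) -> int:
    # Hypothetical operation: reduce inventoryQty, refusing to go negative.
    if value["inventoryQty"] < amount:
        raise ValueError("insufficient inventory")
    value["inventoryQty"] -= amount
    return value["inventoryQty"]


store = KeyValueStore()
store.put(2, {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
              "toon": "Beep Prepared", "inventoryQty": 4, "brakes": False})
store.put(3, {"name": "Acme Jet Propelled Unicycle",
              "toon": "Hot Rod and Reel", "inventoryQty": 1, "wheels": 1})

print(store.execute(2, decrement_inventory, 1))             # 3
print("wheels" in store.get(2), "wheels" in store.get(3))  # False True
```

Running the operation inside the store means the client never has to read, modify and write back the whole value itself.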
NoSQL Data Storage: Classification
◼ Uninterpreted key/value or ‘the big hash table’.
❑ Amazon S3 (Dynamo)
◼ Flexible schema
❑ BigTable, Cassandra, HBase (ordered keys, semi-structured data)
❑ Sherpa/PNUTS (unordered keys, JSON)
❑ MongoDB (based on JSON)
❑ CouchDB (name/value in text)
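The classification above separates ordered keys (BigTable, HBase) from a plain hash table. A rough Python sketch of why that matters for range queries, using hypothetical names (hash_range, OrderedStore) and made-up keys: with unordered keys a range query has to scan every key, while keeping keys sorted allows a binary search followed by a contiguous scan.

```python
import bisect
from typing import Dict, List, Tuple

# Unordered keys (hash table): point lookups are cheap, but hash order says
# nothing about key order, so a range query must examine every key.
hash_table: Dict[str, str] = {"user:17": "ann", "user:03": "bob", "user:42": "cy"}


def hash_range(table: Dict[str, str], lo: str, hi: str) -> List[Tuple[str, str]]:
    return sorted((k, v) for k, v in table.items() if lo <= k < hi)


class OrderedStore:
    """Keys kept sorted, so [lo, hi) is a binary search plus a contiguous scan."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._vals: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def range(self, lo: str, hi: str) -> List[Tuple[str, str]]:
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[start:end]]


ordered = OrderedStore()
for k, v in hash_table.items():
    ordered.put(k, v)

print(ordered.range("user:00", "user:20"))  # [('user:03', 'bob'), ('user:17', 'ann')]
print(hash_range(hash_table, "user:00", "user:20") == ordered.range("user:00", "user:20"))  # True
```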
PNUTS Data Storage Architecture
CAP Theorem
◼ Three properties of a system
❑ Consistency (all copies have the same value)
❑ Availability (the system can keep running even if parts have failed), achieved via replication
❑ Partition tolerance (the network can break into two or more parts, each
with active systems that can’t talk to the other parts, and the system keeps working)
◼ Brewer’s CAP “Theorem”: You can have at most two
of these three properties for any system
◼ Very large systems will partition at some point
❑ ➔ Choose either consistency or availability
❑ Traditional databases choose consistency
❑ Most Web applications choose availability
◼ Except for specific parts such as order processing
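To make the consistency-versus-availability choice concrete, here is a toy Python model of three replicas split by a partition. Everything in it (the Cluster class, the majority rule, the "CP" and "AP" mode names) is an illustrative assumption, not how any particular system works: in CP mode a write is refused unless the coordinating replica can reach a majority, while in AP mode every side keeps accepting writes and the copies diverge.

```python
from typing import Dict, List


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}


class Cluster:
    def __init__(self, names: List[str], mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {n: Replica(n) for n in names}
        # Which replicas each replica can reach; initially everyone.
        self.side = {n: set(names) for n in names}

    def partition(self, *groups: List[str]) -> None:
        # Split the network: replicas can only talk within their own group.
        for group in groups:
            for n in group:
                self.side[n] = set(group)

    def write(self, via: str, key: str, value: str) -> bool:
        reachable = self.side[via]
        if self.mode == "CP" and len(reachable) <= len(self.replicas) // 2:
            # Consistency first: without a majority, refuse the write so the
            # two sides can never accept conflicting values.
            return False
        # AP (or CP with a majority): update every copy we can reach.
        for n in reachable:
            self.replicas[n].data[key] = value
        return True

    def values(self, key: str) -> Dict[str, str]:
        return {n: r.data.get(key, "<missing>") for n, r in self.replicas.items()}


for mode in ("CP", "AP"):
    c = Cluster(["a", "b", "c"], mode)
    c.write("a", "x", "v0")
    c.partition(["a", "b"], ["c"])
    ok_major = c.write("a", "x", "v1")  # majority side
    ok_minor = c.write("c", "x", "v2")  # minority side
    print(mode, ok_major, ok_minor, c.values("x"))
```

Running it prints CP True False {'a': 'v1', 'b': 'v1', 'c': 'v0'} and then AP True True {'a': 'v1', 'b': 'v1', 'c': 'v2'}. The sketch only models writes; a CP system would apply the same majority rule to reads, otherwise replica c could still serve the stale v0.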
About Eventual Consistency
• To address this, most NoSQL solutions choose to relax the notion of
complete consistency to something called “eventual consistency”
• This allows each system to make updates to data and learn of other
updates made by other systems within a short period of time, without
being totally consistent at all times
• As changes are made, tools such as vector clocks are used to record
enough information to reason about the ordering of those changes based
on their causality: which updates happened before others, and which were concurrent.
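A minimal sketch of vector clocks in Python, assuming clocks are plain dicts from node name to counter (a representation chosen here for illustration). Comparing two clocks tells whether one update causally precedes the other or whether they are concurrent; concurrent updates are the ones the system or the application must reconcile.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    # A node bumps its own counter before each local update.
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    # Element-wise maximum: the clock of a state that has seen both histories.
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    # Missing entries count as 0.
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "A")   # A writes first
v2 = increment(v1, "B")   # B saw v1, then updated
v3 = increment(v1, "C")   # C also saw v1 and updated independently of B
merged = increment(merge(v2, v3), "A")  # A reconciles both branches

print(compare(v1, v2))        # before
print(compare(v2, v3))        # concurrent
print(sorted(merged.items()))  # [('A', 2), ('B', 1), ('C', 1)]
print(compare(v3, merged))    # before
```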
Eventual Consistency
◼ When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
◼ For a given accepted update and a given node, eventually either
the update reaches the node or the node is removed from
service
◼ Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
❑ Soft state: copies of a data item may be inconsistent
❑ Eventually Consistent – copies become consistent at some later
time if there are no more updates to that data item
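The convergence guarantee can be illustrated with a small Python simulation. It assumes a last-writer-wins rule with (timestamp, writer) stamps and a simple pull-based anti-entropy round; both are assumptions for illustration, and real systems may reconcile conflicts differently (for example with the vector clocks discussed earlier).

```python
from typing import Dict, Tuple

# Each stored entry is (value, (timestamp, writer)); the stamp orders writes totally.
Entry = Tuple[str, Tuple[int, str]]


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Entry] = {}

    def local_write(self, key: str, value: str, ts: int) -> None:
        self._apply(key, (value, (ts, self.name)))

    def _apply(self, key: str, entry: Entry) -> None:
        current = self.store.get(key)
        # Last-writer-wins: keep the entry with the larger (timestamp, writer) stamp.
        if current is None or entry[1] > current[1]:
            self.store[key] = entry

    def sync_from(self, other: "Replica") -> None:
        # One anti-entropy step: pull every entry the other replica holds.
        for key, entry in other.store.items():
            self._apply(key, entry)


def cart_values(replicas):
    return [r.store["cart"][0] if "cart" in r.store else "<none>" for r in replicas]


replicas = [Replica(n) for n in ("r1", "r2", "r3")]
replicas[0].local_write("cart", "book", ts=1)
replicas[2].local_write("cart", "book+pen", ts=2)
print(cart_values(replicas))  # soft state: ['book', '<none>', 'book+pen']

# No further updates arrive; a full round of pairwise syncs brings every copy together.
for a in replicas:
    for b in replicas:
        a.sync_from(b)
print(cart_values(replicas))  # ['book+pen', 'book+pen', 'book+pen']
```

Note that last-writer-wins silently discards the older "book" write; that is a property of this particular rule, not of eventual consistency in general.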
Common Advantages of NoSQL Systems
◼ Cheap, easy to implement (open source)
◼ Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
❑ When data is written, the latest version is on at least
one node and then replicated to other nodes
❑ No single point of failure
◼ Easy to distribute
◼ Don't require a schema
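One common way to both partition and replicate data, used by Amazon's Dynamo, is consistent hashing: nodes and keys are hashed onto a ring, and each key is stored on the next N distinct nodes clockwise. The Python sketch that follows is a simplified version under stated assumptions: MD5 as the ring hash, one ring position per node (no virtual nodes), and hypothetical node names.

```python
import bisect
import hashlib
from typing import List


def ring_position(label: str) -> int:
    # Stable hash of a label onto the ring (first 8 bytes of its MD5 digest).
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], replicas: int) -> None:
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def preference_list(self, key: str) -> List[str]:
        # Walk clockwise from the key's position and take the next nodes,
        # wrapping around the end of the ring.
        start = bisect.bisect(self._positions, ring_position(key))
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
for key in ("user:17", "cart:9", "order:5"):
    print(key, ring.preference_list(key))
```

Because every key lives on three distinct nodes, losing any single node still leaves two copies, which is the sense in which there is no single point of failure; adding or removing a node only moves the keys adjacent to it on the ring.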
What does NoSQL Not Provide?
◼ Joins
◼ Group by
❑ But PNUTS provides an interesting materialized-view
approach to joins/aggregation.
◼ ACID transactions
◼ SQL
◼ Integration with applications that are based on
SQL
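Since the store provides neither joins nor group-by, applications typically perform them in their own code. A minimal Python sketch with hypothetical customer and order records stored as key/value maps: the join fetches each order's customer by key, and the aggregation sums amounts per customer name.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical key/value "tables": keys are record ids, values are attribute maps.
customers: Dict[str, Dict[str, Any]] = {
    "c1": {"name": "Ann"},
    "c2": {"name": "Bob"},
}
orders: Dict[str, Dict[str, Any]] = {
    "o1": {"customer": "c1", "amount": 30},
    "o2": {"customer": "c2", "amount": 15},
    "o3": {"customer": "c1", "amount": 5},
}


def join_orders_with_customers(orders: Dict[str, Dict[str, Any]],
                               customers: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    rows = []
    for order_id, order in sorted(orders.items()):
        customer = customers.get(order["customer"])  # one key lookup per order
        if customer is not None:  # inner join: skip orders whose customer is missing
            rows.append({"order": order_id, "name": customer["name"], "amount": order["amount"]})
    return rows


def total_by_name(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    # Application-side GROUP BY name with SUM(amount).
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["name"]] += row["amount"]
    return dict(totals)


rows = join_orders_with_customers(orders, customers)
totals = total_by_name(rows)
print(rows)
print(totals)
```

Here rows pairs o1 and o3 with Ann and o2 with Bob, and totals is {'Ann': 35, 'Bob': 15}. The cost is one lookup per order issued by the application, which is the work a SQL engine would otherwise plan and execute.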
Should I be using NoSQL Databases?
◼ NoSQL data storage systems make sense for
applications that need to deal with very large volumes of
semi-structured data
❑ Log Analysis