NoSQL databases provide an alternative to traditional SQL databases that can scale to meet the unprecedented demands of large internet companies. Key-value stores and document databases relax the ACID properties of SQL to achieve higher scalability and availability. This allows data to be partitioned across many servers and replicated for fault tolerance without complex transactions. NoSQL databases have gained popularity due to their ability to handle massive datasets and workloads.

NoSQL Databases

P.Krishna Reddy
IIIT Hyderabad
Introduction
• The term “database” had become synonymous with SQL.
• Recently there has been a shift in the database landscape. When considering options for data storage, there is a new game in town: NoSQL databases.

• In this talk, I will cover
– NoSQL
– Application of NoSQL with DBMS
How much data?
• Google processes 20 PB a day (2008)
• Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
Evolution
• NoSQL databases first started out as in-house solutions to real problems in
companies such as
– Amazon Dynamo, Google BigTable, LinkedIn Voldemort, Twitter FlockDB, Facebook Cassandra, Yahoo! PNUTS, and others.
• These companies didn’t start off by rejecting SQL and relational
technologies; they tried them and found that they didn’t meet their
requirements.
• In particular, these companies faced three primary issues:
– unprecedented transaction volumes,
– expectations of low-latency access to massive datasets, and
– nearly perfect service availability while operating in an unreliable
environment
• Initially, companies tried the traditional approach: they added more hardware or upgraded to faster machines.
• When that didn’t work, they tried to scale existing relational
solutions by
– simplifying their database schema, de-normalizing the schema,
– relaxing durability and referential integrity,
– introducing various query caching layers,
– separating read-only from write-dedicated replicas, and, finally,
– data partitioning in an attempt to address these new requirements.
• Although each of these techniques extended the functionality of existing relational technologies, none fundamentally addressed the core limitations, and they all introduced additional overhead and technical tradeoffs. In other words, these were good band-aids but not cures.
• Existing SQL systems assumed:
– A single machine will serve the requests.
– Replication is not required.
– A master-slave approach is adequate.

• It was the common architecture and design assumptions underlying most relational databases that failed to address the scalability, latency, and availability requirements of many of the largest sites during the massive growth of the Internet.
• A growing number of companies were still hitting the scalability and performance wall even when using the best practices and the most advanced technologies of the time.
• Database architects had sacrificed many of the most central aspects of a relational database, such as joins and fully consistent data, while introducing many complex and fragile pieces into the operations puzzle.
• Schemas devolved from many interrelated, fully expressed tables to something much more like a simple key/value look-up.
• Deployments of expensive servers were not able to keep up with demand.
• At this point these companies had taken relational databases so far outside their intended use cases that it was no wonder that they were unable to meet performance requirements.
• It quickly became clear to them that they could do much better by building something in-house that was tailored to their particular workloads.
• These in-house custom solutions are the inspiration behind the many NoSQL products we now see on the market.
NoSQL Foundations
• Companies needed a solution that would scale, be resilient,
and be operationally efficient
• They had been able to scale the Web (HTTP) and dynamic
content generation and business logic layers (Application
Servers), but the database continued to be the system’s
bottleneck.
• Engineers wanted databases to scale like Web servers—simply
add more commodity systems and expect things to speed up
at a nearly linear rate—but to do that they would have to
make a number of tradeoffs
• Luckily, due to the large number of compromises made when
attempting to scale their existing relational databases, these
tradeoffs were not so foreign or distasteful as they might have
been
Why Cloud Data Stores

◼ Explosion of social media sites (Facebook, Twitter) with large data needs
◼ Explosion of storage needs in large web sites such as Google, Yahoo
❑ Much of the data is not files
◼ Rise of cloud-based solutions such as Amazon S3
(simple storage solution)
◼ Shift to dynamically-typed data with frequent
schema changes
Parallel Databases and Data Stores
◼ Web-based applications have huge demands on data storage
volume and transaction rate
◼ Scalability of application servers is easy, but what about the
database?
◼ Approach 1: memcache or other caching mechanisms to
reduce database access
❑ Limited in scalability
◼ Approach 2: Use existing parallel databases
❑ Expensive, and most parallel databases were designed for decision support, not OLTP
◼ Approach 3: Build parallel stores with databases underneath
Scaling RDBMS - Partitioning
◼ “Sharding”
❑ Divide data amongst many cheap databases (MySQL/PostgreSQL)
❑ Manage parallel access in the application
❑ Scales well for both reads and writes
❑ Not transparent; the application needs to be partition-aware
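The application-side routing that sharding requires can be sketched as follows. This is a minimal hash-based example; the `shards` pool and the key format are hypothetical, and plain dicts stand in for real per-shard database connections.

```python
import hashlib

# Hypothetical pool of shard "connections" (e.g. one per MySQL/PostgreSQL
# instance). Plain dicts stand in for real database connections here.
shards = [dict() for _ in range(4)]

def shard_for(key: str) -> dict:
    """Route a key to a shard. The application, not the database,
    owns this mapping -- it must be partition-aware."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return shards[h % len(shards)]

def put(key: str, value) -> None:
    shard_for(key)[key] = value

def get(key: str):
    return shard_for(key).get(key)

put("user:42", {"name": "Ada"})
print(get("user:42"))  # the same key always routes to the same shard
```

Note one design consequence: with modulo hashing, adding or removing a shard remaps most keys, which is why real systems often use consistent hashing instead.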


Parallel Key-Value Data Stores
◼ Distributed key-value data storage systems allow key-
value pairs to be stored (and retrieved on key) in a
massively parallel system
❑ E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..
◼ Partitioning, high availability, etc. are completely transparent to the application
◼ Sharding systems and key-value stores don’t support
many relational features
❑ No join operations (except within partition)
❑ No referential integrity constraints across partitions
❑ etc.
What is NoSQL?
◼ Stands for “No SQL” or “Not Only SQL” (the expansion is debated)
◼ Class of non-relational data storage systems
❑ E.g. BigTable, Dynamo, PNUTS/Sherpa, ..
◼ Usually do not require a fixed table schema
nor do they use the concept of joins
◼ Distributed data storage systems
◼ All NoSQL offerings relax one or more of the
ACID properties (will talk about the CAP
theorem)
ACID Properties
To preserve integrity of data, the database system must ensure:

• Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves
the consistency of the database.
• Isolation. Although multiple transactions may execute
concurrently, each transaction must be unaware of other
concurrently executing transactions. Intermediate transaction
results must be hidden from other concurrently executed
transactions.
– That is, for every pair of transactions Ti and Tj, it appears to Ti that
either Tj, finished execution before Ti started, or Tj started execution
after Ti finished.
• Durability. After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures.
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Consistency requirement – the sum of A and B is
unchanged by the execution of the transaction.
• Atomicity requirement — if the transaction fails after
step 3 and before step 6, the system should ensure that
its updates are not reflected in the database, else an
inconsistency will result.
Example of Fund Transfer (Cont.)
• Durability requirement — once the user has been
notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to
the database by the transaction must persist despite
failures.
• Isolation requirement — if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).
– Isolation can be ensured trivially by running transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.
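The transfer above can be tried out against any transactional engine. Here is a minimal sketch using Python's built-in sqlite3 module; the table layout and the simulated failure between the debit (step 3) and the credit (step 6) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

def transfer(frm, to, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, frm))
            # Simulate a crash between write(A) and write(B):
            if to == "CRASH":
                raise RuntimeError("failure mid-transaction")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, to))
    except RuntimeError:
        pass  # atomicity: the partial debit of A was rolled back

transfer("A", "B", 50)       # succeeds
transfer("A", "CRASH", 50)   # fails mid-way: balances unchanged
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(total)  # consistency: A + B is still 200
```

Using the connection as a context manager is what provides the atomicity here: an exception anywhere inside the `with` block rolls the whole transaction back.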
Relaxing ACID
• Anyone familiar with databases will know the acronym ACID, which
outlines the fundamental elements of transactions: atomicity, consistency,
isolation, and durability Together, these qualities define the basics of any
transaction
• As NoSQL solutions developed it became clear that in order to deliver
scalability it might be necessary to relax or redefine some of these
qualities, in particular consistency and durability
• Complete consistency in a distributed environment requires a great deal of
communication involving locks, which force systems to wait on each other
before proceeding to mutate shared data
• Even in cases where multiple systems are generally not operating on the
same piece of data, there is a great deal of overhead that prevents
systems from scaling
Data Representation
• There are three main data representation
camps within NoSQL: document, key/value,
and graph
– Key/value
– JavaScript Object Notation (JSON)
• https://en.wikipedia.org/wiki/JSON
– Binary JSON (BSON)
– Big Table
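As a small illustration of the document/JSON camp: a record is a self-describing object whose field names travel with the data, so no external schema is needed. The field names below are illustrative.

```python
import json

# A self-describing document: the "schema" travels with the data.
doc = json.loads("""
{
  "name": "Rocket-Powered Roller Skates",
  "toon": "Ready, Set, Zoom",
  "inventoryQty": 5,
  "brakes": false
}
""")
print(doc["inventoryQty"])  # 5
```

BSON stores the same document model in a binary encoding, which is what MongoDB uses on the wire and on disk.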
Typical NoSQL API
◼ Basic API access:
❑ get(key) -- Extract the value given a key
❑ put(key, value) -- Create or update the value given its key
❑ delete(key) -- Remove the key and its associated value
❑ execute(key, operation, parameters) -- Invoke an operation on the value (given its key), which is a special data structure (e.g. List, Set, Map, etc.)
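A single-node toy implementation can make this interface concrete. A real NoSQL store distributes and replicates the data across many machines; this sketch only shows the access methods.

```python
class KVStore:
    """Minimal single-node sketch of the typical NoSQL key-value API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """Extract the value given a key."""
        return self._data.get(key)

    def put(self, key, value):
        """Create or update the value given its key."""
        self._data[key] = value

    def delete(self, key):
        """Remove the key and its associated value."""
        self._data.pop(key, None)

    def execute(self, key, operation, *parameters):
        """Invoke an operation on the value, which is a data
        structure such as a List, Set, or Map."""
        return getattr(self._data[key], operation)(*parameters)

store = KVStore()
store.put("cart:7", [])
store.execute("cart:7", "append", "rocket-skates")
print(store.get("cart:7"))  # ['rocket-skates']
```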
Flexible Data Model
ColumnFamily: Rockets

Key 1:
    name: Rocket-Powered Roller Skates
    toon: Ready, Set, Zoom
    inventoryQty: 5
    brakes: false

Key 2:
    name: Little Giant Do-It-Yourself Rocket-Sled Kit
    toon: Beep Prepared
    inventoryQty: 4
    brakes: false

Key 3:
    name: Acme Jet Propelled Unicycle
    toon: Hot Rod and Reel
    inventoryQty: 1
    wheels: 1

Note that row 3 has a wheels column where rows 1 and 2 have brakes: rows in a column family need not share the same columns.
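In code, a column family like this is essentially a map from row keys to per-row column maps; a minimal sketch:

```python
# ColumnFamily "Rockets": each row key maps to its own set of columns.
rockets = {
    1: {"name": "Rocket-Powered Roller Skates",
        "toon": "Ready, Set, Zoom",
        "inventoryQty": 5, "brakes": False},
    2: {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
        "toon": "Beep Prepared",
        "inventoryQty": 4, "brakes": False},
    3: {"name": "Acme Jet Propelled Unicycle",
        "toon": "Hot Rod and Reel",
        "inventoryQty": 1, "wheels": 1},  # different columns: no fixed schema
}
print(sorted(rockets[3].keys()))  # ['inventoryQty', 'name', 'toon', 'wheels']
```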
NoSQL Data Storage: Classification
◼ Uninterpreted key/value or ‘the big hash table’.
❑ Amazon S3 (Dynamo)
◼ Flexible schema
❑ BigTable, Cassandra, HBase (ordered keys, semi-
structured data),
❑ Sherpa/PNUTS (unordered keys, JSON)
❑ MongoDB (based on JSON)
❑ CouchDB (name/value in text)
PNUTS Data Storage Architecture
[Figure: PNUTS data storage architecture (diagram not reproduced)]
CAP Theorem
◼ Three properties of a system
❑ Consistency (all copies have same value)
❑ Availability (system can run even if parts have failed)
❑ Via replication
❑ Partitions (network can break into two or more parts, each
with active systems that can’t talk to other parts)
◼ Brewer’s CAP “Theorem”: You can have at most two
of these three properties for any system
◼ Very large systems will partition at some point
❑ ➔ Choose one of consistency or availability
❑ Traditional databases choose consistency
❑ Most Web applications choose availability
◼ Except for specific parts such as order processing
About Eventual Consistency
• To address this, most NoSQL solutions choose to relax the notion of complete consistency to something called “eventual consistency.”
• This allows each system to make updates to data and learn of other updates made by other systems within a short period of time, without being totally consistent at all times.
• As changes are made, tools such as vector clocks are used to provide enough information to reason about the ordering of those changes based on an understanding of the causality of the updates.
• For the majority of systems, knowing that the latest consistent information will eventually arrive at all nodes is likely to be enough to satisfy design requirements.
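A minimal sketch of the vector clocks mentioned above: each replica keeps a per-node counter, and comparing two clocks tells whether one update causally precedes another or the two are concurrent (and must be reconciled). The replica names X, Y, Z are hypothetical.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def happened_before(a, b):
    """True if the update stamped a causally precedes the update stamped b."""
    return all(a.get(n, 0) <= b.get(n, 0) for n in a) and a != b

# Replica X writes; replica Y sees X's write and updates on top of it;
# replica Z updates independently from X's version, concurrently with Y.
v1 = increment({}, "X")         # {'X': 1}
v2 = increment(v1, "Y")         # {'X': 1, 'Y': 1} -- causally after v1
v3 = increment(v1, "Z")         # {'X': 1, 'Z': 1} -- concurrent with v2
print(happened_before(v1, v2))  # True
print(happened_before(v2, v3))  # False
print(happened_before(v3, v2))  # False  (neither precedes: a conflict)
```

When neither clock dominates the other, as with v2 and v3, the store has detected concurrent updates and must reconcile them (e.g. by application logic, as Dynamo does).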
Availability

◼ Traditionally, thought of as the server/process


available five 9’s (99.999 %).
◼ However, for large node system, at almost any
point in time there’s a good chance that a node
is either down or there is a network disruption
among the nodes.
❑ Want a system that is resilient in the face of

network disruption
Eventual Consistency
◼ When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
◼ For a given accepted update and a given node, eventually either
the update reaches the node or the node is removed from
service
◼ Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
❑ Soft state: copies of a data item may be inconsistent
❑ Eventually Consistent – copies becomes consistent at some later
time if there are no more updates to that data item
Common Advantages of NoSQL Systems
◼ Cheap, easy to implement (open source)
◼ Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
❑ When data is written, the latest version is on at least
one node and then replicated to other nodes
❑ No single point of failure
◼ Easy to distribute
◼ Don't require a schema
What does NoSQL Not Provide?

◼ Joins
◼ Group by
❑ But PNUTS provides interesting materialized
view approach to joins/aggregation.
◼ ACID transactions
◼ SQL
◼ Integration with applications that are based on
SQL
Should I be using NoSQL Databases?
◼ NoSQL data storage systems make sense for applications that need to deal with very large volumes of semi-structured data
❑ Log analysis
❑ Social networking feeds
◼ Most of us work on organizational databases, which are not that large and have low update/query rates
❑ Regular relational databases are the correct solution for such applications
Further Reading
◼ Lots of material on the Web
❑ E.g. a nice presentation on NoSQL by Perry Hoekstra
◼ Some material in this talk is from the above presentation
❑ Use a search engine to find information on data storage systems such as
◼ BigTable (Google), Dynamo (Amazon), Cassandra (Facebook/Apache), PNUTS/Sherpa (Yahoo), CouchDB, MongoDB, …
◼ Several of the above are open source
