
Azure Cosmos DB

This document provides an overview of Cosmos DB, Microsoft's globally distributed multi-model database service. It discusses some of the key benefits of Cosmos DB such as being fully managed, globally distributed, scalable with unlimited throughput, and supporting multiple data models and APIs. It also covers some example use cases for Cosmos DB and how to optimize performance through techniques like choosing an appropriate partition key and enabling automatic indexing.

Uploaded by

rajasekhardulam

COSMOS DB

Eshant Garg
Azure Data Engineer, Architect, Advisor
Cosmos DB
Azure NoSQL Database
Why Cosmos DB?
What were traditional databases lacking?
Challenges with globally distributed databases

• Takes a long time
• Takes a lot of effort
• Requires your own infrastructure
• Teams
• Data centers, etc.
How did Cosmos DB evolve?
Why Cosmos DB?

FULLY MANAGED

• Database as a service (DBaaS)
• Serverless architecture
• No operational overhead
• No schema or index management

CONSISTENCY CHOICES

• Five consistency levels: strong, bounded staleness, session, consistent prefix, and eventual.

GLOBALLY DISTRIBUTED

• Turnkey global distribution

SCALABLE

• Unlimited scale for both storage and throughput.

MULTI-MODEL & MULTI-LANGUAGE

• Supports JSON documents, table, graph, and columnar data models
• Java, .NET, Python, Node.js, JavaScript, etc.

HIGHLY AVAILABLE, RELIABLE & SECURE

• Always on
• 99.999% SLA
• < 10 ms latency
Use case – IoT
Use case – Retail and Marketing
Use case – Gaming
Use case – Web and mobile
SQL API vs MongoDB API
Cluster Types
SQL (Core) API

• JSON documents
• Microsoft's original DocumentDB platform
• Supports a server-side programming model
• You can use a SQL-like language to query JSON documents

MongoDB API

• BSON documents
• Implements the MongoDB wire protocol
• Fully compatible with MongoDB application code
• Migrate existing MongoDB applications to Cosmos DB without much change of logic

Use SQL (Core) API for new development

Cosmos DB Table API

• Key-Value store
• Premium offering for Azure Table Storage
• Existing Table Storage customers can migrate to the Cosmos DB Table API
• Row values can be simple types, such as a number or a string
• A row cannot store an object
Cosmos DB Cassandra API

• Wide-column NoSQL database
• Name and format of columns can vary from row to row.
• Simply migrate your Cassandra application to the Cosmos DB Cassandra API by changing the connection string.
• Interact using:
  • Cassandra-based tools
  • Data Explorer
  • Programmatically, using an SDK (CassandraCSharpDriver)
Cosmos DB Gremlin API
• Graph Data Model
• Real world data connected with each other
• Graph database can persist relationships in the storage layer
Graph Model
• Use cases
  • Social networks
  • Recommendation engines
  • Geospatial
  • Internet of things
• Migrate existing apps to Cosmos DB Gremlin API
• Gremlin is a graph traversal language
Analyze the decision criteria
Azure Table storage vs Cosmos DB Table API

Azure Table Storage

➢ Geo-replication is restricted
  • Only 1 additional paired region
➢ Support for primary key lookups only
➢ Price optimized for cold storage
➢ Lower performance
  • Throughput is capped
  • Latency is higher
➢ No consistency options

Cosmos DB Table API

➢ Geo-replication across any number of regions of your choice
➢ Secondary index support for lookups across multiple dimensions
➢ Better performance
➢ Unlimited and predictable throughput
➢ Latency is lower
➢ 5 consistency options
Database Containers and Items

Azure Cosmos entity | SQL API | Cassandra API | MongoDB API | Gremlin API | Table API
Azure Cosmos database | Database | Keyspace | Database | Database | NA
Azure Cosmos container | Container | Table | Collection | Graph | Table
Azure Cosmos item | Document | Row | Document | Node or edge | Item


Measuring Performance
Introducing Request Units
Reserving request units

• Provision request units per second (RU/s)
  • How many request units (not requests) per second are available to your application
• Exceeding reserved throughput limits
  • Requests are “throttled” (HTTP 429)
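
The relationship between a reserved RU/s budget and throttling can be sketched with a toy model. This is only an illustration under simplifying assumptions (a budget that resets in full every second, and made-up request charges of 150 RUs); the class and method names are hypothetical and it does not describe how the service accounts for throughput internally.

```python
from dataclasses import dataclass


@dataclass
class ThroughputBudget:
    """Toy model of provisioned throughput: a budget of RUs per second."""
    provisioned_rus: float      # RU/s reserved for the container (assumed value)
    remaining: float = 0.0      # RUs still available in the current second

    def __post_init__(self) -> None:
        self.remaining = self.provisioned_rus

    def start_new_second(self) -> None:
        # Assumption: the whole budget becomes available again each second.
        self.remaining = self.provisioned_rus

    def execute(self, request_charge: float) -> int:
        """Return an HTTP-like status: 200 if the charge fits, 429 if throttled."""
        if request_charge > self.remaining:
            return 429
        self.remaining -= request_charge
        return 200


budget = ThroughputBudget(provisioned_rus=400)
# Three requests in the same second, 150 RUs each: the third exceeds 400 RU/s.
statuses = [budget.execute(150) for _ in range(3)]
print(statuses)  # [200, 200, 429]

budget.start_new_second()
print(budget.execute(150))  # 200
```

The point of the model is that the limit is on request units, not on the number of requests: many cheap requests can fit in a second where a few expensive ones cannot.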
Horizontally Scalable

Unlimited Storage Unlimited Throughput


Partitioning

• Partitioning: the items in a container are divided into distinct subsets called logical partitions.
• Partition key is the value by which Azure organizes your data into logical divisions.
• Logical partitions are formed based on the value of a partition key that is associated with each item in a container.
• Physical partitions: internally, one or more logical partitions are mapped to a single physical partition.
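
A minimal sketch of the two levels described above, assuming a hypothetical `userId` partition key and three physical partitions. SHA-256 is used here only as a stand-in for a stable hash; the actual hashing and placement are managed by the service and are not shown in the slides.

```python
import hashlib

PHYSICAL_PARTITION_COUNT = 3  # assumed; the service decides this number itself


def physical_partition_for(partition_key_value):
    """Map a partition key value to a physical partition with a stable hash."""
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITION_COUNT


items = [
    {"id": "1", "userId": "john"},
    {"id": "2", "userId": "brian"},
    {"id": "3", "userId": "john"},
    {"id": "4", "userId": "tom"},
]

# Logical partitions: items grouped by their partition key value.
logical_partitions = {}
for item in items:
    logical_partitions.setdefault(item["userId"], []).append(item)

# Physical partitions: each logical partition lands, as a whole, on exactly one.
placement = {key: physical_partition_for(key) for key in logical_partitions}

for key, partition in placement.items():
    print(f"logical partition {key!r} -> physical partition {partition}")
```

Because the mapping depends only on the partition key value, all items that share a key always end up together, while several different keys may share one physical partition.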
Dedicated vs Shared throughput
[Diagram: a database containing three containers: Container 1, Container 2, Container 3]

• You can set throughput at:
  • Database level – Shared throughput
  • Container level – Dedicated throughput
• It is recommended to set throughput at the container level.
• Requests beyond the provisioned throughput are rate-limited
• Choose between shared and dedicated throughput at the time of creation
Avoiding Hot Partitions

Container (10,000 RUs)

• Logical Partition 1 (2500 RUs)
• Logical Partition 2 (2500 RUs)
• Logical Partition 3 (2500 RUs)
• Logical Partition 4 (2500 RUs)


Avoid hot partitions on storage

Avoid hot partitions on throughput

• Bad partition key choice: current time (all writes arriving at the same moment land on the same partition)
• Good partition key choices: User ID, Product ID
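
The difference between the two kinds of key can be made concrete with a small count. The workload below is hypothetical (1,000 writes arriving within the same minute from 100 users, with made-up field names); the sketch only measures what share of the writes lands on the busiest partition key value.

```python
from collections import Counter

# Hypothetical workload: 1,000 writes in the same minute, spread over 100 users.
writes = [
    {"userId": f"user-{i % 100}", "minute": "2024-01-01T10:00"}
    for i in range(1000)
]


def busiest_share(writes, key):
    """Fraction of all writes that land on the single busiest partition key value."""
    counts = Counter(w[key] for w in writes)
    return max(counts.values()) / len(writes)


print(busiest_share(writes, "minute"))  # 1.0  -> every write hits one hot partition
print(busiest_share(writes, "userId"))  # 0.01 -> load is spread over 100 key values
```

With the time-based key, a single partition has to absorb the whole write load while the others sit idle; with the user-based key, no partition receives more than 1% of it.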
Single-partition query
[Diagram: a container with one partition per username: John, Brian, Tom, Mark]

SELECT * FROM c WHERE c.username = 'Brian'


Cross-partition queries (fan-out queries)
[Diagram: a container with one partition per username: Amit, Brian, Tom, Mark]

SELECT * FROM c WHERE c.favoritecolor = 'Blue'
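
Why the second query fans out can be sketched with an in-memory stand-in for a container partitioned by `username`. The sample items and their colors are invented for illustration, and the functions are hypothetical helpers, not part of any SDK; each returns the matches together with the number of partitions it had to read.

```python
# Hypothetical container partitioned by username; each key holds that user's items.
container = {
    "Amit": [{"username": "Amit", "favoritecolor": "Red"}],
    "Brian": [{"username": "Brian", "favoritecolor": "Blue"}],
    "Tom": [{"username": "Tom", "favoritecolor": "Blue"}],
    "Mark": [{"username": "Mark", "favoritecolor": "Green"}],
}


def query_by_username(name):
    """Filter on the partition key: the query is routed to a single partition."""
    return container.get(name, []), 1


def query_by_color(color):
    """Filter on another property: every partition has to be checked (fan-out)."""
    matches = [
        item
        for partition_items in container.values()
        for item in partition_items
        if item["favoritecolor"] == color
    ]
    return matches, len(container)


items, partitions_read = query_by_username("Brian")
print(len(items), partitions_read)  # 1 1

items, partitions_read = query_by_color("Blue")
print(len(items), partitions_read)  # 2 4
```

The fan-out query still returns the right answer; it simply costs more, because the work grows with the number of partitions instead of staying constant.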


Composite Key
[Diagram: a container with one partition per customer name: Amit, Brian, Tom, Mark]

Composite key: CustomerName-mmddyyyy
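
A composite key of this shape could be built as in the sketch below. The function name is hypothetical, and storing the result in a dedicated property of each item is an assumption; the slide only gives the `CustomerName-mmddyyyy` format.

```python
from datetime import date


def composite_partition_key(customer_name: str, day: date) -> str:
    """Build a key in the CustomerName-mmddyyyy shape."""
    return f"{customer_name}-{day.strftime('%m%d%Y')}"


print(composite_partition_key("Brian", date(2024, 3, 7)))  # Brian-03072024
```

Adding the date splits one customer's data into a new logical partition per day, which keeps any single partition from growing without bound.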


Choosing a Partition key
• Evenly distribute storage
  • Pick a partition key that doesn't result in hot spots within your application
• Have a high cardinality
  • Don’t be afraid of choosing a partition key that has a large number of values
  • Examples: User ID and Product ID
• Evenly distribute requests
  • RUs are distributed evenly across all partitions
• Review the WHERE clause of your top queries
• Consider document and partition limits while designing the partition key
  • Max document size – 2 MB
  • Max logical partition size – 20 GB

Question: Your organization is planning to use Azure Cosmos DB to store vehicle telemetry data generated from millions of vehicles every second. Which of the following options for your Partition Key will optimize storage distribution?

Answer choices:
1. Vehicle model
Most auto manufacturers only have a couple dozen models. This option is potentially the least granular, will create a fixed number of logical partitions, and may not distribute data evenly across all physical partitions.

2. Vehicle Identification Number (VIN) which looks like WDDEJ9EB6DA032037
Every vehicle has its own VIN, so this option has a very high cardinality and will create a more balanced distribution of storage across partition key values.
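
The two answer choices can be compared with a small count. The fleet below is made up (5,000 vehicles, 24 models, synthetic VIN strings, one reading per vehicle), so the numbers only illustrate the difference in cardinality, not a real workload.

```python
from collections import Counter

MODELS = [f"model-{i}" for i in range(24)]  # "a couple dozen" models

# Made-up fleet: synthetic 17-character VINs, models assigned round-robin.
readings = [
    {"vin": f"VIN{n:014d}", "model": MODELS[n % len(MODELS)]}
    for n in range(5000)
]


def partition_stats(readings, key):
    """Return (number of logical partitions, size of the largest one)."""
    counts = Counter(r[key] for r in readings)
    return len(counts), max(counts.values())


print(partition_stats(readings, "model"))  # (24, 209)
print(partition_stats(readings, "vin"))    # (5000, 1)
```

Keyed by model, the data can never spread over more than 24 logical partitions, however large the fleet grows; keyed by VIN, the number of partitions grows with the number of vehicles.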
Automatic Indexing

• Indexes all data without requiring index management

• Every property of every record is automatically indexed

• The index is updated synchronously as you create, update, or delete items

• Not specific to SQL; available for all APIs


Automatic Indexing

SELECT location
FROM location IN company.locations
WHERE location.country = 'France'
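
What the query above computes can be sketched in plain Python for a single document. The document shape is an assumption (the slides do not show the stored data), and the sketch says nothing about how the index is used to answer the query.

```python
# Hypothetical document; only the "locations" array matters for the query.
company = {
    "id": "contoso",
    "locations": [
        {"city": "Paris", "country": "France"},
        {"city": "Berlin", "country": "Germany"},
        {"city": "Lyon", "country": "France"},
    ],
}

# FROM location IN company.locations  -> iterate over the array elements
# WHERE location.country = 'France'   -> keep the matching elements
# SELECT location                     -> return each element under the key "location"
results = [
    {"location": location}
    for location in company["locations"]
    if location["country"] == "France"
]

for row in results:
    print(row["location"]["city"])  # Paris, then Lyon
```

The filter is on a property nested inside an array, and no index had to be declared for it beforehand; that is the situation automatic indexing is meant to cover.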
Time to Live (TTL)

• You can set the expiry time for Cosmos DB data items
• Time to live value is configured in seconds.
• The system will automatically delete the expired items based on the TTL value
  • Consumes only leftover request units
  • Data deletion is delayed if there are not enough request units
  • Though the data deletion is delayed, data is not returned by any queries (by any API) after the TTL has expired.
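
The expiry rule can be sketched as a predicate over an item's last-modified timestamp. The sketch assumes the `_ts` (last modified, epoch seconds) and `ttl` item properties and a container-level default, with -1 meaning "never expire"; the function names are hypothetical and the exact boundary comparison is an assumption.

```python
from typing import Optional


def is_expired(item: dict, container_default_ttl: Optional[int], now: int) -> bool:
    """Return True when an item has outlived its time to live (values in seconds)."""
    if container_default_ttl is None:
        return False                     # TTL is not enabled on the container
    ttl = item.get("ttl", container_default_ttl)  # item value overrides the default
    if ttl == -1:
        return False                     # -1 means "never expire"
    return now - item["_ts"] >= ttl      # _ts: last-modified time, epoch seconds


def visible_items(items, container_default_ttl, now):
    """Expired items are hidden from reads even if not yet physically deleted."""
    return [i for i in items if not is_expired(i, container_default_ttl, now)]


items = [
    {"id": "a", "_ts": 100},             # uses the container default
    {"id": "b", "_ts": 100, "ttl": 10},  # expires 10 seconds after its last write
    {"id": "c", "_ts": 100, "ttl": -1},  # never expires
]

print([i["id"] for i in visible_items(items, container_default_ttl=60, now=120)])
# ['a', 'c']
```

Filtering at read time is what lets the background deletion run on leftover request units: an expired item may still occupy storage for a while, but no query returns it.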
Global Distribution benefits

Performance
• Ensures high availability within a region
• Across regions, brings data closer to the consumer.

Business continuity
• Keeps data available in the event of a major failure or natural disaster


Data consistency

Replication from West US to East US takes 5 seconds.

• West US, at 10:00:00 AM: Update CreditScore = 750
• East US, at 10:00:02 AM: Read CreditScore (the update has not been replicated yet)
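
The stale read in this example can be reproduced with a few lines. The 5-second lag and the value 750 come from the slide; the earlier value of 700 is invented so that there is something stale to read, and times are expressed as seconds after 10:00:00 AM.

```python
REPLICATION_LAG_SECONDS = 5  # West US -> East US, as in the example above

# Each write is (time written in West US, value); times are seconds after 10:00:00.
writes = [
    (-3600, 700),  # hypothetical earlier value, written an hour before
    (0, 750),      # the 10:00:00 AM update
]


def read_west_us(at):
    """The write region sees every write made up to the read time."""
    return [value for t, value in writes if t <= at][-1]


def read_east_us(at):
    """The replica only sees writes that have had time to replicate."""
    return [value for t, value in writes if t + REPLICATION_LAG_SECONDS <= at][-1]


print(read_west_us(2))  # 750
print(read_east_us(2))  # 700 (stale: replication has not finished)
print(read_east_us(5))  # 750
```

Between 10:00:00 and 10:00:05 the two regions disagree; the consistency level chosen for the account determines whether a client is allowed to observe that disagreement.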
CAP Theorem

• There is a risk of some data becoming unavailable
• Client may read inconsistent data
• Data may not be immediately available
Five consistency Levels
Strong → Bounded Staleness → Session → Consistent Prefix → Eventual
Stronger consistency → Weaker consistency
Higher latency → Lower latency
Lower availability → Higher availability

Strong: No dirty reads, highest latency, highest cost, closest to RDBMS

Bounded staleness: Dirty reads possible, but bounded by time and number of updates

Session: No dirty reads for the writer (within the same session); dirty reads possible for other users

Consistent prefix: Dirty reads possible, but the sequence is maintained; reads never see out-of-order writes

Eventual: No guaranteed order, but all replicas eventually converge


Setting the consistency level

Set default for entire account
• Can be changed any time

Override at request level
• Any request can weaken the default consistency level
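
The "weaken only" rule can be sketched as a comparison over the five levels ordered from strongest to weakest. The enum and the function are hypothetical illustrations of the rule, not SDK types.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """The five levels, ordered from strongest (0) to weakest (4)."""
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def effective_level(account_default, requested=None):
    """A request may keep or weaken the account default, never strengthen it."""
    if requested is None:
        return account_default
    if requested < account_default:
        raise ValueError(
            f"{requested.name} is stronger than the account default "
            f"{account_default.name}"
        )
    return requested


print(effective_level(Consistency.SESSION).name)                        # SESSION
print(effective_level(Consistency.SESSION, Consistency.EVENTUAL).name)  # EVENTUAL
```

Asking for `STRONG` on an account whose default is `SESSION` raises an error in this sketch, which mirrors the rule that a request can only move to the right on the spectrum.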

Azure CLI
Azure CLI – Create Database example
Azure CLI – Example question
