Azure Cosmos DB
Eshant Garg
Azure Data Engineer, Architect, Advisor
Cosmos DB
Azure NoSQL Database
Why Cosmos DB?
What were traditional databases lacking?
Challenges with globally distributed Databases
• Takes a long time to build
• Takes a lot of effort
• Requires your own infrastructure
• Teams
• Data centers, etc.
How did Cosmos DB evolve?
Why Cosmos DB?
• Key-value store
• Premium offering for Azure Table Storage
• Existing Table Storage customers can migrate to the Cosmos DB Table API
• A row value can be a simple type, such as a number or a string
• A row cannot store an object
Cosmos DB Cassandra API
Azure Table Storage
➢ Geo replication is restricted
• Only 1 additional paired region
➢ Support for primary key lookups only
➢ Price optimized for cold storage
➢ Lower performance
• Throughput is capped
• Latency is higher
➢ No consistency options

Cosmos DB Table API
➢ Geo replication across your choice of any number of regions
➢ Secondary index support for lookups across multiple dimensions
➢ Better performance
• Unlimited and predictable throughput
• Latency is lower
➢ 5 consistency options
Databases, Containers, and Items
Azure Cosmos entity     | SQL API   | Cassandra API | MongoDB API | Gremlin API | Table API
Azure Cosmos database   | Database  | Keyspace      | Database    | Database    | NA
Azure Cosmos container  | Container | Table         | Collection  | Graph       | Table
Avoiding hot partitions on storage
Avoiding hot partitions on throughput
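A hot partition on throughput can be illustrated with a small simulation. This is only a sketch: the partition count, the provisioned RU/s, the cost per request, and the MD5-based placement are assumptions made for illustration, not the service's actual hashing scheme. It relies on the fact that a container's provisioned throughput is divided evenly across its physical partitions.

```python
from collections import Counter
import hashlib

PHYSICAL_PARTITIONS = 4                                  # assumed partition count
PROVISIONED_RU = 4000                                    # assumed container throughput (RU/s)
RU_PER_PARTITION = PROVISIONED_RU // PHYSICAL_PARTITIONS  # even split per partition


def physical_partition(key: str) -> int:
    """Map a partition key value to a physical partition (illustrative hash only)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS


def throttled_ru(request_keys, ru_per_request=10):
    """For one second of traffic, return the RU/s above each partition's share."""
    load = Counter()
    for key in request_keys:
        load[physical_partition(key)] += ru_per_request
    return {p: max(0, ru - RU_PER_PARTITION) for p, ru in load.items()}


# 300 requests in one second that all carry the same key value: one hot partition.
hot = throttled_ru(["2024-01-01"] * 300)

# 300 requests in one second spread over 300 distinct key values.
spread = throttled_ru([f"device-{i}" for i in range(300)])

print("throttled RU/s, single key:", sum(hot.values()))
print("throttled RU/s, many keys: ", sum(spread.values()))
```

With a single key value, all 3000 RU/s land on one partition that is entitled to only 1000 RU/s, so most of the traffic is throttled while the other partitions sit idle.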
Question: Your organization is planning to use Azure Cosmos DB to store vehicle telemetry data
generated from millions of vehicles every second. Which of the following options for your Partition
Key will optimize storage distribution?
Answer choices:
1. Vehicle model
2. Vehicle Identification Number (VIN) which looks like WDDEJ9EB6DA032037
Choosing a Partition key
1. Vehicle model
Most auto manufacturers have only a couple dozen models. This option is potentially the least granular: it creates a
fixed number of logical partitions and may not distribute data evenly across all physical partitions.
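The effect of key granularity on storage distribution can be sketched as follows. Everything here is hypothetical: the model names, the VIN-like identifiers, the skewed popularity weights, the partition count, and the SHA-256 placement stand in for real data and for the service's own hashing. The sketch compares how much of the data ends up on the fullest physical partition for a low-cardinality key versus a high-cardinality key such as the VIN.

```python
from collections import Counter
import hashlib
import random

PHYSICAL_PARTITIONS = 10  # assumed for illustration


def physical_partition(key: str) -> int:
    """Place a partition key value on a physical partition (illustrative hash only)."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % PHYSICAL_PARTITIONS


def fullest_partition_share(keys) -> float:
    """Fraction of all items stored on the fullest physical partition."""
    counts = Counter(physical_partition(k) for k in keys)
    return max(counts.values()) / len(keys)


random.seed(0)
models = [f"model-{i}" for i in range(24)]        # hypothetical: two dozen models
vins = [f"VIN{i:014d}" for i in range(5000)]      # hypothetical VIN-like identifiers

# One key per telemetry item; a few popular models dominate the fleet.
weights = [1 / (i + 1) for i in range(len(models))]
by_model = random.choices(models, weights=weights, k=50_000)
by_vin = random.choices(vins, k=50_000)

model_share = fullest_partition_share(by_model)
vin_share = fullest_partition_share(by_vin)

print(f"fullest partition, keyed by model: {model_share:.0%}")
print(f"fullest partition, keyed by VIN:   {vin_share:.0%}")
```

An even spread over ten partitions would put about 10% of the items on each. Keying by VIN stays close to that, while keying by model leaves at least one partition holding a much larger share.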
SELECT location
FROM location IN company.locations
WHERE location.country = 'France'
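The query above iterates over the locations array inside each item of the company container and keeps only the elements whose country is France. A rough in-memory equivalent, assuming a hypothetical sample item (the id and the cities are made up), might look like this; the service may wrap each result differently, so the sketch simply returns the matching array elements.

```python
# Hypothetical item stored in the "company" container.
company_item = {
    "id": "contoso",
    "locations": [
        {"city": "Paris", "country": "France"},
        {"city": "Lyon", "country": "France"},
        {"city": "Berlin", "country": "Germany"},
    ],
}


def locations_in(items, country):
    """Emulate FROM location IN company.locations WHERE location.country = <country>."""
    return [
        location
        for item in items                          # every item in the container
        for location in item.get("locations", [])  # every element of its array
        if location.get("country") == country      # the WHERE filter
    ]


french = locations_in([company_item], "France")
print([location["city"] for location in french])
```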
Time to Live (TTL)
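As a minimal sketch of how TTL expiry is decided, assuming the usual rules: TTL is off unless the container sets a default, an item-level value overrides the container default, -1 means "never expire", and the clock starts at the item's last-modified time. The function and parameter names below are hypothetical, not part of any SDK.

```python
from typing import Optional


def is_expired(last_modified: int, now: int,
               container_default_ttl: Optional[int],
               item_ttl: Optional[int] = None) -> bool:
    """Decide whether an item has outlived its TTL (all times in seconds)."""
    if container_default_ttl is None:
        return False   # TTL not enabled on the container; item-level ttl is ignored
    ttl = item_ttl if item_ttl is not None else container_default_ttl
    if ttl == -1:
        return False   # -1 means the item never expires
    return now - last_modified >= ttl


# Container default of 60 seconds, item last modified at t=0, checked at t=100.
print(is_expired(0, 100, container_default_ttl=60))               # expired
print(is_expired(0, 100, container_default_ttl=60, item_ttl=-1))  # kept by item override
```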
Performance
• Ensures high availability within a region
• Across regions, brings data closer to the consumer.
Business continuity
Replication
Replication from West US to East US: 5 seconds
West US, at 10:00:00 AM: Update CreditScore = 750
East US, at 10:00:02 AM: Read CreditScore
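The timeline above can be replayed in a few lines. This is a sketch under assumptions: the Region class and write helper are hypothetical, the 5 seconds is treated as a fixed replication lag, and the earlier CreditScore of 700 is invented so that the stale read has something to return.

```python
from datetime import datetime, timedelta

REPLICATION_LAG = timedelta(seconds=5)   # taken from the timeline above


class Region:
    """A replica that applies each write once its visibility time has passed."""

    def __init__(self, name):
        self.name = name
        self.pending = []   # (visible_at, key, value), in write order
        self.data = {}

    def read(self, key, at):
        still_pending = []
        for visible_at, k, v in self.pending:
            if visible_at <= at:
                self.data[k] = v              # the write has arrived by now
            else:
                still_pending.append((visible_at, k, v))
        self.pending = still_pending
        return self.data.get(key)


def write(primary, replicas, key, value, at):
    """Apply a write locally right away and to the replicas after the lag."""
    primary.pending.append((at, key, value))
    for replica in replicas:
        replica.pending.append((at + REPLICATION_LAG, key, value))


west, east = Region("West US"), Region("East US")
ten_am = datetime(2024, 1, 1, 10, 0, 0)

write(west, [east], "CreditScore", 700, ten_am - timedelta(minutes=1))  # assumed old value
write(west, [east], "CreditScore", 750, ten_am)

local = west.read("CreditScore", ten_am + timedelta(seconds=2))  # sees the update
stale = east.read("CreditScore", ten_am + timedelta(seconds=2))  # still the old value
fresh = east.read("CreditScore", ten_am + timedelta(seconds=5))  # update has arrived
print(local, stale, fresh)
```

The read in East US at 10:00:02 AM returns the old value because the update is still in flight; whether such a read is acceptable is what the choice of consistency level decides.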
CAP Theorem
Session: No stale reads for the writer (within the same session); stale reads are possible for other users
Consistent prefix: Stale reads are possible, but the sequence is maintained; reads never see out-of-order writes
Strong | Bounded Staleness | Session | Consistent Prefix | Eventual
Stronger consistency → Weaker consistency
Higher latency → Lower latency
Lower availability → Higher availability
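The session guarantee (a writer always sees its own writes, other users may not yet) can be sketched with a session token that records the highest sequence number the client has written. The Replica and Session classes and the read/write helpers are hypothetical names for illustration, not an SDK API; replicas here apply writes strictly in order, which is also what keeps reads from seeing out-of-order writes.

```python
class Replica:
    """Holds an ordered log of applied writes; its length acts as a sequence number."""

    def __init__(self):
        self.log = []

    def apply(self, value):
        self.log.append(value)

    @property
    def lsn(self):
        return len(self.log)


class Session:
    """Tracks the highest sequence number this client has written."""

    def __init__(self):
        self.token = 0


def write(primary, session, value):
    primary.apply(value)
    session.token = primary.lsn      # remember how far this session has written


def read(replicas, session):
    """Read from the first replica that has caught up to the session token."""
    for replica in replicas:
        if replica.lsn >= session.token:
            return replica.log[-1] if replica.log else None
    raise RuntimeError("no replica has caught up to this session yet")


primary, lagging = Replica(), Replica()
writer, other_user = Session(), Session()

write(primary, writer, 750)          # not yet replicated to the lagging replica

own_read = read([lagging, primary], writer)        # skips the lagging replica
other_read = read([lagging, primary], other_user)  # may be served stale data
print(own_read, other_read)
```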
Azure CLI
Azure CLI – Create Database example
Azure CLI – Example question