Introduction
Azure Cosmos DB is a globally distributed, multi-model database service designed to provide high availability, low latency, and scalability. It supports multiple data models, including document, key-value, graph, and column-family, and offers APIs for SQL, MongoDB, Cassandra, Gremlin, and Table.
Key Concepts
- Global Distribution
  - Multi-region Replication: Cosmos DB lets you replicate your data across multiple Azure regions, which improves availability and keeps reads close to your users.
  - Automatic Failover: In case of a regional outage, Cosmos DB can automatically fail over to another region.
- Multi-Model Support
  - Document Model: Supports JSON documents.
  - Key-Value Model: Simple key-value pairs.
  - Graph Model: Supports graph databases using Gremlin.
  - Column-Family Model: Supports wide-column data in the style of Cassandra.
- APIs
  - SQL API (now named the API for NoSQL): For querying JSON documents using SQL-like syntax.
  - MongoDB API: For applications using MongoDB.
  - Cassandra API: For applications using Cassandra.
  - Gremlin API: For graph-based applications.
  - Table API: For key-value storage.
- Consistency Levels (from strongest to weakest)
  - Strong: Guarantees linearizability. Reads always return the most recent committed write.
  - Bounded Staleness: Reads may lag behind writes by at most a configured number of versions or time interval.
  - Session: Within a single client session, guarantees read-your-own-writes and monotonic reads.
  - Consistent Prefix: Guarantees that reads never see out-of-order writes.
  - Eventual: Makes no ordering guarantee. Replicas converge once writes stop.
- Performance and Scalability
  - Request Units (RUs): The currency for throughput in Cosmos DB. Every read, write, and query consumes RUs, and you provision RUs per second for a database or a container.
  - Partitioning: Data is distributed across partitions based on the partition key you choose, which lets storage and throughput scale out.
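To make the Request Unit idea concrete, here is a rough sizing sketch. The per-operation costs (1 RU per read, 5 RUs per write), the 100 RU/s increment, and the 400 RU/s minimum are assumptions for illustration. Real charges depend on item size, indexing, and query shape, so measure the request charge that the service reports for your own workload. The function names are hypothetical helpers, not part of any SDK.

```python
import math


def estimate_required_rus(reads_per_sec, writes_per_sec,
                          ru_per_read=1.0, ru_per_write=5.0):
    """Estimate the RU/s a workload consumes.

    The default per-operation costs are illustrative assumptions,
    not values guaranteed by Cosmos DB.
    """
    return math.ceil(reads_per_sec * ru_per_read + writes_per_sec * ru_per_write)


def round_up_to_increment(rus, increment=100, minimum=400):
    """Round an estimate up to a provisionable value.

    The increment and minimum are assumptions for manually
    provisioned throughput.
    """
    return max(minimum, math.ceil(rus / increment) * increment)


# Hypothetical workload: 200 reads/s and 50 writes/s
needed = estimate_required_rus(reads_per_sec=200, writes_per_sec=50)
provisioned = round_up_to_increment(needed)
print(f"Estimated {needed} RU/s, provision {provisioned} RU/s")
```

An estimate like this is only a starting point. Adjust it after observing actual consumption.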
Practical Example
Setting Up Azure Cosmos DB
- Create a Cosmos DB Account
  - Go to the Azure Portal.
  - Click on "Create a resource" and select "Azure Cosmos DB".
  - Choose the API you want to use (e.g., SQL API).
  - Fill in the required details and click "Review + create".
- Create a Database and Container
  - Navigate to your Cosmos DB account.
  - Click on "Data Explorer".
  - Click on "New Database" and provide a name.
  - Click on "New Container" within the database and provide a name and partition key.
Example Code: Inserting and Querying Data
Inserting Data
from azure.cosmos import exceptions, CosmosClient, PartitionKey

# Initialize the Cosmos client
endpoint = "YOUR_COSMOS_DB_ENDPOINT"
key = 'YOUR_COSMOS_DB_KEY'
client = CosmosClient(endpoint, key)

# Create a database
database_name = 'AzureSampleFamilyDatabase'
database = client.create_database_if_not_exists(id=database_name)

# Create a container partitioned on /lastName with 400 RU/s
container_name = 'FamilyContainer'
container = database.create_container_if_not_exists(
    id=container_name,
    partition_key=PartitionKey(path="/lastName"),
    offer_throughput=400
)

# Create an item
family_item = {
    'id': 'AndersenFamily',
    'lastName': 'Andersen',
    'parents': [
        {'firstName': 'Thomas'},
        {'firstName': 'Mary Kay'}
    ],
    'children': [
        {'firstName': 'Henriette Thaulow', 'gender': 'female', 'grade': 5,
         'pets': [{'givenName': 'Fluffy'}]}
    ],
    'address': {'state': 'WA', 'county': 'King', 'city': 'Seattle'},
    'registered': True
}

# Insert the item into the container
container.create_item(body=family_item)
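Running the insert snippet a second time fails with a conflict, because an item with the same id already exists in that logical partition. One way to make the write repeatable is `upsert_item`, which inserts the item or replaces the stored copy. The sketch below wraps it in a hypothetical helper named `save_item`. The helper and its id check are illustrative assumptions, and `container` is assumed to be a container object like the one created above.

```python
def save_item(container, item):
    """Insert the item, or replace the stored item that has the same
    id and partition key value, so the call is safe to repeat.

    `save_item` is a hypothetical helper, not part of the SDK.
    """
    if "id" not in item:
        raise ValueError("item must include an 'id' field")
    # upsert_item creates the item if it is absent and replaces it otherwise.
    return container.upsert_item(body=item)
```

With the container from the example, `save_item(container, family_item)` can be called repeatedly without raising a conflict error.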
Querying Data
import json

# Query the items in the container (reuses `container` from the insert example)
query = "SELECT * FROM c WHERE c.lastName='Andersen'"
items = list(container.query_items(
    query=query,
    enable_cross_partition_query=True
))

# Pretty-print each matching item
for item in items:
    print(json.dumps(item, indent=2))
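Because the container is partitioned on /lastName, a query that filters on lastName can be routed to a single logical partition by passing `partition_key` instead of enabling a cross-partition query. The sketch below also passes the value as a query parameter rather than embedding it in the query string. The function name `find_families_by_last_name` is a hypothetical helper, and `container` is assumed to be the FamilyContainer object from the earlier example.

```python
def find_families_by_last_name(container, last_name):
    """Return all items whose lastName matches, reading one partition only.

    Hypothetical helper: assumes the container's partition key is /lastName.
    """
    query = "SELECT * FROM c WHERE c.lastName = @lastName"
    return list(container.query_items(
        query=query,
        # Parameters keep user-supplied values out of the query text.
        parameters=[{"name": "@lastName", "value": last_name}],
        # Scope the query to the logical partition for this last name.
        partition_key=last_name,
    ))
```

A call such as `find_families_by_last_name(container, "Andersen")` should return the same items as the query above while touching only one partition.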
Exercises
Exercise 1: Create and Query a Cosmos DB Container
- Create a Cosmos DB Account: Follow the steps to create a Cosmos DB account using the SQL API.
- Create a Database and Container: Create a database named SchoolDatabase and a container named StudentContainer with /studentId as the partition key.
- Insert Data: Insert a JSON document representing a student with fields like studentId, firstName, lastName, grade, and subjects.
- Query Data: Write a query to retrieve all students in a specific grade.
Solution
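import json

# Reuses `client` and the `PartitionKey` import from the earlier example.
# Create the database and container the exercise asks for
database = client.create_database_if_not_exists(id='SchoolDatabase')
container = database.create_container_if_not_exists(
    id='StudentContainer',
    partition_key=PartitionKey(path="/studentId")
)

# Insert a student item
student_item = {
    'id': 'Student1',
    'studentId': 'S12345',
    'firstName': 'John',
    'lastName': 'Doe',
    'grade': 10,
    'subjects': ['Math', 'Science', 'History']
}

# Insert the item into the container
container.create_item(body=student_item)

# Query the students in grade 10; grade is not the partition key,
# so the query must span partitions
query = "SELECT * FROM c WHERE c.grade=10"
students = list(container.query_items(
    query=query,
    enable_cross_partition_query=True
))

for student in students:
    print(json.dumps(student, indent=2))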
Common Mistakes and Tips
- Partition Key Selection: Choose a partition key that ensures even distribution of data to avoid hot partitions.
- Throughput Management: Monitor and adjust the provisioned RUs to balance cost and performance.
- Consistency Level: Select the appropriate consistency level based on your application's requirements for latency and consistency.
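As a rough way to compare candidate partition keys before creating a container, the sketch below counts how sample items spread across the values of a field. It is an offline approximation only: it does not model how Cosmos DB hashes keys or how requests are distributed, and the function name and sample data are hypothetical.

```python
from collections import Counter


def partition_key_skew(items, key_field):
    """Summarize how evenly sample items spread across a candidate key.

    Hypothetical helper: a high `hottest_share` hints that one logical
    partition would hold a disproportionate share of the data.
    """
    if not items:
        raise ValueError("need at least one sample item")
    counts = Counter(item[key_field] for item in items)
    hottest_value, hottest_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "hottest_value": hottest_value,
        "hottest_share": hottest_count / len(items),
    }


# Made-up sample data for illustration
sample = [
    {"studentId": "S1", "grade": 10},
    {"studentId": "S2", "grade": 10},
    {"studentId": "S3", "grade": 10},
    {"studentId": "S4", "grade": 11},
]
print(partition_key_skew(sample, "grade"))      # few values, one dominant
print(partition_key_skew(sample, "studentId"))  # many values, evenly spread
```

In this made-up sample, grade concentrates three of four items under one value, while studentId gives every item its own value, which is the more even spread the tip above recommends.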
Conclusion
In this section, you learned about Azure Cosmos DB, its key features, and how to set up and interact with a Cosmos DB account using Python. You also practiced creating and querying a Cosmos DB container. In the next module, we will explore other Azure database services, such as Azure Database for MySQL and PostgreSQL.