The Aggregation Framework in MongoDB processes and transforms the documents in a collection. It supports operations comparable to SQL's GROUP BY and JOIN, such as grouping, summarizing, and joining data. The framework uses a pipeline approach: documents pass through a series of stages that can filter, sort, group, reshape, and modify the data.
Key Concepts
- Pipeline: A sequence of stages through which documents pass. Each stage performs an operation on the documents and passes the results to the next stage.
- Stages: Individual operations that can be performed on the documents. Common stages include $match, $group, $sort, $project, and $lookup.
- Expressions: Used within stages to compute values, perform calculations, and transform data.
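To make these three concepts concrete, the sketch below writes a pipeline as a plain JavaScript array, which is the form mongosh accepts. It borrows the field names (item, quantity, price) of the sales collection used later in this article; the output field name revenue is an assumption made for illustration.

```javascript
// A pipeline is an ordered array of stage documents.
// Each stage document has exactly one key: the stage operator.
const conceptPipeline = [
  // Stage 1: keep only documents with a quantity of at least 10.
  { $match: { quantity: { $gte: 10 } } },
  // Stage 2: the expression { $multiply: [...] } computes a value per document.
  // `revenue` is a hypothetical field name chosen for this sketch.
  { $project: { item: 1, revenue: { $multiply: ["$quantity", "$price"] } } },
  // Stage 3: order the reshaped documents, highest revenue first.
  { $sort: { revenue: -1 } }
];

// List the stage operators in execution order.
const stageNames = conceptPipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> ")); // prints $match -> $project -> $sort

// Inside mongosh the pipeline runs against a collection.
// The guard lets the snippet also load in plain Node.js, where `db` is undefined.
if (typeof db !== "undefined") {
  printjson(db.sales.aggregate(conceptPipeline).toArray());
}
```

The order of the array is the order of execution, so moving a stage changes what the following stages receive.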
Common Aggregation Stages
$match
Filters documents to pass only those that match the specified condition(s).
$group
Groups documents by a specified identifier expression and applies the accumulator expressions to each group.
$sort
Sorts the documents based on the specified field(s).
$project
Reshapes each document in the stream, such as by adding new fields or removing existing ones.
$lookup
Performs a left outer join to another collection in the same database. For each input document, it adds an array field containing the matching documents from the "joined" collection.
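A minimal sketch of a $lookup stage follows. The second collection inventory, its field sku, and the output field inventoryDocs are hypothetical names introduced here; only the four options from, localField, foreignField, and as belong to the $lookup syntax.

```javascript
// Join each input document to the documents in a hypothetical `inventory`
// collection whose `sku` equals the input document's `item`.
const lookupStage = {
  $lookup: {
    from: "inventory",      // collection to join (assumed to exist)
    localField: "item",     // field in the input documents
    foreignField: "sku",    // field in the `inventory` documents (assumed)
    as: "inventoryDocs"     // output array field that holds the matches
  }
};

// Inside mongosh; the guard lets the snippet also load in plain Node.js.
if (typeof db !== "undefined") {
  printjson(db.sales.aggregate([lookupStage]).toArray());
}
```

Because the join is a left outer join, input documents with no match are kept and receive an empty array in the output field.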
Practical Example
Let's consider a collection sales with the following documents:
[
  { "_id": 1, "item": "apple", "quantity": 5, "price": 10 },
  { "_id": 2, "item": "banana", "quantity": 10, "price": 5 },
  { "_id": 3, "item": "apple", "quantity": 15, "price": 10 },
  { "_id": 4, "item": "banana", "quantity": 20, "price": 5 }
]
Aggregation Pipeline Example
We want to calculate the total sales for each item.
db.sales.aggregate([
  { $group: { _id: "$item", totalSales: { $sum: { $multiply: ["$quantity", "$price"] } } } },
  { $sort: { totalSales: -1 } }
])
Explanation:
- $group: Groups the documents by the item field and calculates the total sales for each item.
- $sort: Sorts the results by totalSales in descending order.
Result
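The output can be worked out by hand from the four documents: apple totals 5 × 10 + 15 × 10 = 200 and banana totals 10 × 5 + 20 × 5 = 150, so apple sorts first. The plain JavaScript sketch below reproduces that arithmetic in memory, without a database, as a way to check the pipeline's logic. It is an illustration only and says nothing about how MongoDB executes the stages internally.

```javascript
// The same four documents as the sales collection above.
const salesDocs = [
  { _id: 1, item: "apple", quantity: 5, price: 10 },
  { _id: 2, item: "banana", quantity: 10, price: 5 },
  { _id: 3, item: "apple", quantity: 15, price: 10 },
  { _id: 4, item: "banana", quantity: 20, price: 5 }
];

// Emulate $group with $sum of quantity * price, keyed by item.
const totalsByItem = new Map();
for (const doc of salesDocs) {
  const previous = totalsByItem.get(doc.item) || 0;
  totalsByItem.set(doc.item, previous + doc.quantity * doc.price);
}

// Emulate $sort: { totalSales: -1 } (descending order).
const totalSalesResult = [...totalsByItem]
  .map(([item, totalSales]) => ({ _id: item, totalSales }))
  .sort((a, b) => b.totalSales - a.totalSales);

console.log(JSON.stringify(totalSalesResult));
// prints [{"_id":"apple","totalSales":200},{"_id":"banana","totalSales":150}]
```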
Exercises
Exercise 1: Calculate Average Quantity
Calculate the average quantity sold for each item.
Solution:
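One pipeline that answers this exercise, assuming the same sales collection, groups by item and applies the $avg accumulator. The output field name averageQuantity is a choice made for this sketch.

```javascript
// Group by item and average the quantity within each group.
// `averageQuantity` is a hypothetical output field name.
const averageQuantityPipeline = [
  { $group: { _id: "$item", averageQuantity: { $avg: "$quantity" } } }
];

// Inside mongosh; the guard lets the snippet also load in plain Node.js.
if (typeof db !== "undefined") {
  printjson(db.sales.aggregate(averageQuantityPipeline).toArray());
}
```

For the sample documents, the averages work out to 10 for apple ((5 + 15) / 2) and 15 for banana ((10 + 20) / 2).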
Exercise 2: Filter and Group
Filter the sales to include only items with a price greater than 5 and then group by item to calculate the total quantity sold.
Solution:
db.sales.aggregate([
  { $match: { price: { $gt: 5 } } },
  { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
])
Common Mistakes and Tips
- Incorrect Stage Order: Ensure the stages are in the correct order. For example, $match should come before $group if you want to filter documents before grouping.
- Field References: Use the correct field references within expressions. For example, use "$fieldName" to reference a field.
- Pipeline Performance: Be mindful of the performance impact of each stage. Use indexes where appropriate to optimize $match and $sort stages.
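The ordering and indexing tips can be sketched as follows, assuming the sales collection from the example. Indexing price is an assumption about which field is worth indexing; whether the index actually helps depends on the data and is best checked with explain.

```javascript
// Filter first so that $group only processes the documents that matter.
const filteredTotalsPipeline = [
  { $match: { price: { $gt: 5 } } },  // placed first so the filter runs before grouping
  { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
];

// Inside mongosh; the guard lets the snippet also load in plain Node.js.
if (typeof db !== "undefined") {
  // Single-field ascending index on price (assumed useful for this filter).
  db.sales.createIndex({ price: 1 });
  // Ask the server how it runs the pipeline, including any index usage.
  printjson(db.sales.explain("executionStats").aggregate(filteredTotalsPipeline));
}
```

The field references follow the second tip: "$item" and "$quantity" carry the dollar prefix because they are read inside expressions, while price in the $match filter is a plain field name.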
Conclusion
The Aggregation Framework in MongoDB is a versatile and powerful tool for data processing and transformation. By understanding and utilizing various stages and expressions, you can perform complex queries and gain valuable insights from your data. Practice with different stages and combinations to become proficient in using the Aggregation Framework.