The Aggregation Framework in MongoDB processes and transforms the documents in a collection. It covers operations comparable to SQL's GROUP BY and JOIN. The framework uses a pipeline approach: documents pass through a series of stages that can filter, sort, group, reshape, and modify the data.

Key Concepts

  1. Pipeline: A sequence of stages through which documents pass. Each stage performs an operation on the documents and passes the results to the next stage.
  2. Stages: Individual operations that can be performed on the documents. Common stages include $match, $group, $sort, $project, and $lookup.
  3. Expressions: Used within stages to compute values and transform data, for example a field reference such as "$amount" or an operator such as $multiply.
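The pipeline idea can be pictured without a database. The following is a minimal sketch in plain JavaScript, not MongoDB's API: the helper runPipeline, the two stage functions, and the sample orders array are all hypothetical names and data made up for illustration. Each "stage" takes an array of documents and returns a new array, and the output of one stage becomes the input of the next.

```javascript
// Hypothetical in-memory model of a pipeline; this is NOT MongoDB's API.
// Each stage is a function from an array of documents to a new array.
const matchStatusA = (docs) => docs.filter((d) => d.status === "A");
const sortByAmountDesc = (docs) => [...docs].sort((a, b) => b.amount - a.amount);

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample documents, used only for this sketch.
const orders = [
  { _id: 1, status: "A", amount: 50 },
  { _id: 2, status: "B", amount: 75 },
  { _id: 3, status: "A", amount: 120 },
];

const pipelineResult = runPipeline(orders, [matchStatusA, sortByAmountDesc]);
console.log(JSON.stringify(pipelineResult));
// [{"_id":3,"status":"A","amount":120},{"_id":1,"status":"A","amount":50}]
```

In MongoDB itself the stages are written as documents such as { $match: ... } and passed to db.collection.aggregate(), as the sections below show.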

Common Aggregation Stages

$match

Filters the documents so that only those matching the specified condition(s) pass to the next stage.

{
  $match: { status: "A" }
}

$group

Groups documents by the _id expression and applies accumulator expressions (such as $sum or $avg) to each group, producing one output document per group.

{
  $group: {
    _id: "$status",
    total: { $sum: "$amount" }
  }
}

$sort

Sorts the documents by the specified field(s). Use 1 for ascending order and -1 for descending order.

{
  $sort: { amount: -1 }
}

$project

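Reshapes each document in the stream, for example by keeping selected fields, computing new ones, or removing existing ones. In the example below, only item, the computed total, and _id (included by default) remain in the output.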

{
  $project: {
    item: 1,
    total: { $multiply: ["$price", "$quantity"] }
  }
}

$lookup

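Performs a left outer join with another collection in the same database. Each input document gains an array field (named by as) that holds the documents from the joined collection whose foreignField equals the input document's localField. The array is empty when nothing matches.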

{
  $lookup: {
    from: "orders",
    localField: "item",
    foreignField: "item",
    as: "orderDetails"
  }
}
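To make the "left outer join" behaviour concrete, here is a rough plain-JavaScript imitation of an equality $lookup. It is a sketch, not MongoDB's implementation: lookupSketch is a hypothetical helper, the inventory and ordersColl arrays are made-up sample data, and edge cases such as missing or array-valued fields are ignored.

```javascript
// Made-up sample data; only the collection name "orders" comes from the text.
const inventory = [
  { _id: 1, item: "apple" },
  { _id: 2, item: "cherry" },
];
const ordersColl = [
  { _id: 10, item: "apple", quantity: 5 },
  { _id: 11, item: "apple", quantity: 15 },
  { _id: 12, item: "banana", quantity: 10 },
];

// Hypothetical helper imitating an equality $lookup in memory.
// Every input document is kept (left outer join) and receives an array
// of the documents from `from` whose foreignField equals its localField.
function lookupSketch(input, from, localField, foreignField, as) {
  return input.map((doc) => ({
    ...doc,
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const joined = lookupSketch(inventory, ordersColl, "item", "item", "orderDetails");
console.log(JSON.stringify(joined, null, 2));
```

The apple document ends up with two entries in orderDetails, while the cherry document is still present with an empty orderDetails array.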

Practical Example

Let's consider a collection sales with the following documents:

[
  { "_id": 1, "item": "apple", "quantity": 5, "price": 10 },
  { "_id": 2, "item": "banana", "quantity": 10, "price": 5 },
  { "_id": 3, "item": "apple", "quantity": 15, "price": 10 },
  { "_id": 4, "item": "banana", "quantity": 20, "price": 5 }
]

Aggregation Pipeline Example

We want to calculate the total sales for each item.

db.sales.aggregate([
  {
    $group: {
      _id: "$item",
      totalSales: { $sum: { $multiply: ["$quantity", "$price"] } }
    }
  },
  {
    $sort: { totalSales: -1 }
  }
])

Explanation:

  1. $group: Groups the documents by the item field and calculates the total sales for each item.
  2. $sort: Sorts the results by totalSales in descending order.

Result

[
  { "_id": "apple", "totalSales": 200 },
  { "_id": "banana", "totalSales": 150 }
]
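The totals can be double-checked by hand. The sketch below repeats the computation in plain JavaScript on the four sample documents, without MongoDB; the variable names are illustrative only. Apple comes to 5 * 10 + 15 * 10 = 200 and banana to 10 * 5 + 20 * 5 = 150, so with a descending sort apple is listed first.

```javascript
const sales = [
  { _id: 1, item: "apple", quantity: 5, price: 10 },
  { _id: 2, item: "banana", quantity: 10, price: 5 },
  { _id: 3, item: "apple", quantity: 15, price: 10 },
  { _id: 4, item: "banana", quantity: 20, price: 5 },
];

// Plain-JavaScript equivalent of the $group stage: sum quantity * price per item.
const totals = new Map();
for (const { item, quantity, price } of sales) {
  totals.set(item, (totals.get(item) || 0) + quantity * price);
}

// Equivalent of the $sort stage: highest totalSales first.
const totalSalesByItem = [...totals]
  .map(([item, totalSales]) => ({ _id: item, totalSales }))
  .sort((a, b) => b.totalSales - a.totalSales);

console.log(JSON.stringify(totalSalesByItem));
// [{"_id":"apple","totalSales":200},{"_id":"banana","totalSales":150}]
```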

Exercises

Exercise 1: Calculate Average Quantity

Calculate the average quantity sold for each item.

Solution:

db.sales.aggregate([
  {
    $group: {
      _id: "$item",
      avgQuantity: { $avg: "$quantity" }
    }
  }
])
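As a check on the expected answer, the same averages can be worked out in plain JavaScript from the sample documents. This is only a sketch of what $avg computes per group, not MongoDB code. Note that without a $sort stage MongoDB does not guarantee the order of the grouped output; the order shown here simply follows the order in which items first appear.

```javascript
const sales = [
  { _id: 1, item: "apple", quantity: 5, price: 10 },
  { _id: 2, item: "banana", quantity: 10, price: 5 },
  { _id: 3, item: "apple", quantity: 15, price: 10 },
  { _id: 4, item: "banana", quantity: 20, price: 5 },
];

// Collect the quantities per item (the grouping step).
const quantitiesByItem = {};
for (const { item, quantity } of sales) {
  if (!quantitiesByItem[item]) quantitiesByItem[item] = [];
  quantitiesByItem[item].push(quantity);
}

// Average each group's quantities (what $avg does for each group).
const avgQuantityByItem = Object.entries(quantitiesByItem).map(([item, qs]) => ({
  _id: item,
  avgQuantity: qs.reduce((sum, q) => sum + q, 0) / qs.length,
}));

console.log(JSON.stringify(avgQuantityByItem));
// [{"_id":"apple","avgQuantity":10},{"_id":"banana","avgQuantity":15}]
```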

Exercise 2: Filter and Group

Filter the sales to include only items with a price greater than 5 and then group by item to calculate the total quantity sold.

Solution:

db.sales.aggregate([
  {
    $match: { price: { $gt: 5 } }
  },
  {
    $group: {
      _id: "$item",
      totalQuantity: { $sum: "$quantity" }
    }
  }
])

Common Mistakes and Tips

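  • Incorrect Stage Order: Stages run in the order listed, and each stage sees only the output of the previous one. For example, $match must come before $group if you want to filter on fields of the original documents, because after $group only _id and the accumulator fields remain.
  • Field References: Prefix field names with $ inside expressions, for example "$fieldName". A string without the leading $ is treated as a literal value, not as a field.
  • Pipeline Performance: Be mindful of the cost of each stage. Place $match and $sort as early as possible; they can generally use indexes only when they run at the start of the pipeline, before stages that reshape the documents.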
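The effect of stage order can be seen in a small plain-JavaScript sketch that uses the sales sample documents. The two functions are hypothetical stand-ins for a $match on price and a $group on item; they are not MongoDB's API and only mimic the behaviour relevant here.

```javascript
const sales = [
  { _id: 1, item: "apple", quantity: 5, price: 10 },
  { _id: 2, item: "banana", quantity: 10, price: 5 },
  { _id: 3, item: "apple", quantity: 15, price: 10 },
  { _id: 4, item: "banana", quantity: 20, price: 5 },
];

// Hypothetical stand-in for { $match: { price: { $gt: 5 } } }.
const matchPriceOver5 = (docs) => docs.filter((d) => d.price > 5);

// Hypothetical stand-in for grouping by item and summing quantity.
const groupQuantityByItem = (docs) => {
  const sums = new Map();
  for (const d of docs) sums.set(d.item, (sums.get(d.item) || 0) + d.quantity);
  return [...sums].map(([item, totalQuantity]) => ({ _id: item, totalQuantity }));
};

// Filter first, then group: only the apple documents (price 10) are grouped.
const matchThenGroup = groupQuantityByItem(matchPriceOver5(sales));

// Group first, then filter: grouped documents no longer have a price field,
// so the same filter keeps nothing.
const groupThenMatch = matchPriceOver5(groupQuantityByItem(sales));

console.log(JSON.stringify(matchThenGroup)); // [{"_id":"apple","totalQuantity":20}]
console.log(JSON.stringify(groupThenMatch)); // []
```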

Conclusion

The Aggregation Framework in MongoDB is a versatile tool for data processing and transformation. By combining stages such as $match, $group, $sort, $project, and $lookup with expressions, you can express complex queries as a single pipeline. Practice with different stages and combinations to become proficient with it.
