In this section, we will explore how to aggregate data using SQL in BigQuery. Aggregation is a crucial concept in data analysis, allowing you to summarize and derive insights from large datasets. We will cover the following topics:

  1. Introduction to Aggregation Functions
  2. Using GROUP BY Clause
  3. Common Aggregation Functions
  4. Practical Examples
  5. Exercises

  1. Introduction to Aggregation Functions

Aggregation functions perform a calculation on a set of values and return a single value. These functions are often used with the GROUP BY clause to group rows that have the same values in specified columns into summary rows.

  1. Using GROUP BY Clause

The GROUP BY clause groups rows that have the same values into summary rows, like "find the total number of customers in each country." The GROUP BY clause is often used with aggregation functions like COUNT(), SUM(), AVG(), MAX(), and MIN().

Syntax:

SELECT column1, 
       AGGREGATE_FUNCTION(column2)
FROM table_name
GROUP BY column1;

  1. Common Aggregation Functions

Here are some common aggregation functions you will use in BigQuery:

  • COUNT(): Returns the number of rows.
  • SUM(): Returns the sum of a numeric column.
  • AVG(): Returns the average value of a numeric column.
  • MAX(): Returns the maximum value of a column.
  • MIN(): Returns the minimum value of a column.

  1. Practical Examples

Example 1: Counting Rows

Count the number of orders for each customer.

SELECT customer_id, 
       COUNT(order_id) AS total_orders
FROM orders
GROUP BY customer_id;

Explanation: This query groups the rows by customer_id and counts the number of order_id for each customer.

Example 2: Summing Values

Calculate the total sales for each product.

SELECT product_id, 
       SUM(sales_amount) AS total_sales
FROM sales
GROUP BY product_id;

Explanation: This query groups the rows by product_id and sums the sales_amount for each product.

Example 3: Averaging Values

Find the average salary for each department.

SELECT department, 
       AVG(salary) AS average_salary
FROM employees
GROUP BY department;

Explanation: This query groups the rows by department and calculates the average salary for each department.

Example 4: Maximum and Minimum Values

Get the highest and lowest scores for each student.

SELECT student_id, 
       MAX(score) AS highest_score, 
       MIN(score) AS lowest_score
FROM scores
GROUP BY student_id;

Explanation: This query groups the rows by student_id and finds the maximum and minimum score for each student.

  1. Exercises

Exercise 1: Total Sales by Region

Write a query to find the total sales amount for each region.

-- Your query here

Exercise 2: Average Order Value

Write a query to calculate the average order value for each customer.

-- Your query here

Exercise 3: Maximum and Minimum Order Amount

Write a query to find the maximum and minimum order amount for each product.

-- Your query here

Solutions

Solution 1: Total Sales by Region

SELECT region, 
       SUM(sales_amount) AS total_sales
FROM sales
GROUP BY region;

Solution 2: Average Order Value

SELECT customer_id, 
       AVG(order_amount) AS average_order_value
FROM orders
GROUP BY customer_id;

Solution 3: Maximum and Minimum Order Amount

SELECT product_id, 
       MAX(order_amount) AS max_order_amount, 
       MIN(order_amount) AS min_order_amount
FROM orders
GROUP BY product_id;

Conclusion

In this section, we covered the basics of aggregating data in BigQuery using SQL. We learned about common aggregation functions and how to use the GROUP BY clause to group data. We also provided practical examples and exercises to reinforce the concepts. In the next module, we will delve into more advanced SQL topics, such as joins and unions.

© Copyright 2024. All rights reserved