Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike aggregate functions, which return a single value for a group of rows, window functions return a value for each row in the group.
Key Concepts
-
Window Function Syntax:
<window_function>(<expression>) OVER ( [PARTITION BY <expression>] [ORDER BY <expression> [ASC|DESC]] [ROWS|RANGE BETWEEN <frame_start> AND <frame_end>] )
-
Common Window Functions:
ROW_NUMBER()
: Assigns a unique number to each row within the partition.RANK()
: Assigns a rank to each row within the partition, with gaps for ties.DENSE_RANK()
: Similar toRANK()
, but without gaps.SUM()
,AVG()
,MIN()
,MAX()
: Aggregate functions used as window functions.
-
Partitioning:
- Divides the result set into partitions to which the window function is applied.
- Syntax:
PARTITION BY <expression>
-
Ordering:
- Specifies the order of rows within each partition.
- Syntax:
ORDER BY <expression> [ASC|DESC]
-
Framing:
- Defines a subset of rows within the partition for the window function to operate on.
- Syntax:
ROWS|RANGE BETWEEN <frame_start> AND <frame_end>
Practical Examples
Example 1: Using ROW_NUMBER()
SELECT employee_id, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;
Explanation:
- This query assigns a unique row number to each employee within their department, ordered by salary in descending order.
Example 2: Using RANK()
SELECT employee_id, department, salary, RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank FROM employees;
Explanation:
- This query ranks employees within their department based on their salary. Employees with the same salary receive the same rank, and the next rank is incremented by the number of tied rows.
Example 3: Using SUM()
with Framing
SELECT employee_id, department, salary, SUM(salary) OVER (PARTITION BY department ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary FROM employees;
Explanation:
- This query calculates the cumulative salary for each employee within their department, ordered by salary. The frame includes all rows from the start of the partition to the current row.
Practical Exercises
Exercise 1: Calculate Running Total
Task: Calculate the running total of sales for each salesperson.
Table: sales
| salesperson_id | sale_date | amount |
|----------------|-----------|--------|
| 1 | 2023-01-01| 100 |
| 1 | 2023-01-02| 200 |
| 2 | 2023-01-01| 150 |
| 2 | 2023-01-03| 300 |
Solution:
SELECT salesperson_id, sale_date, amount, SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total FROM sales;
Exercise 2: Rank Products by Sales
Task: Rank products by their total sales amount within each category.
Table: product_sales
| product_id | category | total_sales |
|------------|----------|-------------|
| 1 | A | 500 |
| 2 | A | 300 |
| 3 | B | 700 |
| 4 | B | 400 |
Solution:
SELECT product_id, category, total_sales, RANK() OVER (PARTITION BY category ORDER BY total_sales DESC) AS sales_rank FROM product_sales;
Common Mistakes and Tips
-
Mistake: Forgetting to use
PARTITION BY
when needed.- Tip: Always consider if your calculation needs to be reset for each group of rows.
-
Mistake: Incorrectly specifying the frame for window functions.
- Tip: Understand the default frame (
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) and adjust it as needed.
- Tip: Understand the default frame (
-
Mistake: Using window functions without
ORDER BY
when order matters.- Tip: Always specify
ORDER BY
if the calculation depends on the order of rows.
- Tip: Always specify
Conclusion
Window functions are a versatile tool in SQL that allow for complex calculations across rows related to the current row. By mastering window functions, you can perform advanced data analysis and gain deeper insights from your data. In the next module, we will explore advanced joins, which will further enhance your ability to manipulate and analyze data in BigQuery.
BigQuery Course
Module 1: Introduction to BigQuery
- What is BigQuery?
- Setting Up Your BigQuery Environment
- Understanding BigQuery Architecture
- BigQuery Console Overview
Module 2: Basic SQL in BigQuery
Module 3: Intermediate SQL in BigQuery
Module 4: Advanced SQL in BigQuery
Module 5: BigQuery Data Management
- Loading Data into BigQuery
- Exporting Data from BigQuery
- Data Transformation and Cleaning
- Managing Datasets and Tables
Module 6: BigQuery Performance Optimization
- Query Optimization Techniques
- Understanding Query Execution Plans
- Using Materialized Views
- Optimizing Storage
Module 7: BigQuery Security and Compliance
- Access Control and Permissions
- Data Encryption
- Auditing and Monitoring
- Compliance and Best Practices
Module 8: BigQuery Integration and Automation
- Integrating with Google Cloud Services
- Using BigQuery with Dataflow
- Automating Workflows with Cloud Functions
- Scheduling Queries with Cloud Scheduler
Module 9: BigQuery Machine Learning (BQML)
- Introduction to BigQuery ML
- Creating and Training Models
- Evaluating and Predicting with Models
- Advanced BQML Features