Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike aggregate functions, which return a single value for a group of rows, window functions return a value for each row in the group.

Key Concepts

  1. Window Function Syntax:

    <window_function>(<expression>) OVER (
        [PARTITION BY <expression>]
        [ORDER BY <expression> [ASC|DESC]]
        [ROWS|RANGE BETWEEN <frame_start> AND <frame_end>]
    )
    
  2. Common Window Functions:

    • ROW_NUMBER(): Assigns a unique number to each row within the partition.
    • RANK(): Assigns a rank to each row within the partition, with gaps for ties.
    • DENSE_RANK(): Similar to RANK(), but without gaps.
    • SUM(), AVG(), MIN(), MAX(): Aggregate functions used as window functions.
  3. Partitioning:

    • Divides the result set into partitions to which the window function is applied.
    • Syntax: PARTITION BY <expression>
  4. Ordering:

    • Specifies the order of rows within each partition.
    • Syntax: ORDER BY <expression> [ASC|DESC]
  5. Framing:

    • Defines a subset of rows within the partition for the window function to operate on.
    • Syntax: ROWS|RANGE BETWEEN <frame_start> AND <frame_end>

Practical Examples

Example 1: Using ROW_NUMBER()

SELECT
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM
    employees;

Explanation:

  • This query assigns a unique row number to each employee within their department, ordered by salary in descending order.

Example 2: Using RANK()

SELECT
    employee_id,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM
    employees;

Explanation:

  • This query ranks employees within their department based on their salary. Employees with the same salary receive the same rank, and the next rank is incremented by the number of tied rows.

Example 3: Using SUM() with Framing

SELECT
    employee_id,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM
    employees;

Explanation:

  • This query calculates the cumulative salary for each employee within their department, ordered by salary. The frame includes all rows from the start of the partition to the current row.

Practical Exercises

Exercise 1: Calculate Running Total

Task: Calculate the running total of sales for each salesperson.

Table: sales | salesperson_id | sale_date | amount | |----------------|-----------|--------| | 1 | 2023-01-01| 100 | | 1 | 2023-01-02| 200 | | 2 | 2023-01-01| 150 | | 2 | 2023-01-03| 300 |

Solution:

SELECT
    salesperson_id,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY salesperson_id ORDER BY sale_date) AS running_total
FROM
    sales;

Exercise 2: Rank Products by Sales

Task: Rank products by their total sales amount within each category.

Table: product_sales | product_id | category | total_sales | |------------|----------|-------------| | 1 | A | 500 | | 2 | A | 300 | | 3 | B | 700 | | 4 | B | 400 |

Solution:

SELECT
    product_id,
    category,
    total_sales,
    RANK() OVER (PARTITION BY category ORDER BY total_sales DESC) AS sales_rank
FROM
    product_sales;

Common Mistakes and Tips

  • Mistake: Forgetting to use PARTITION BY when needed.

    • Tip: Always consider if your calculation needs to be reset for each group of rows.
  • Mistake: Incorrectly specifying the frame for window functions.

    • Tip: Understand the default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) and adjust it as needed.
  • Mistake: Using window functions without ORDER BY when order matters.

    • Tip: Always specify ORDER BY if the calculation depends on the order of rows.

Conclusion

Window functions are a versatile tool in SQL that allow for complex calculations across rows related to the current row. By mastering window functions, you can perform advanced data analysis and gain deeper insights from your data. In the next module, we will explore advanced joins, which will further enhance your ability to manipulate and analyze data in BigQuery.

© Copyright 2024. All rights reserved