Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike aggregate functions, which return a single value for a group of rows, window functions return a value for each row in the result set.

Key Concepts

  1. Window Function Syntax:

    function_name (expression) OVER (
        [PARTITION BY partition_expression]
        [ORDER BY sort_expression]
        [frame_clause]
    )
    
  2. Components:

    • function_name: The name of the window function (e.g., ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE).
    • expression: The column or expression to be used in the function.
    • OVER: Specifies the window over which the function operates.
    • PARTITION BY: Divides the result set into partitions to which the window function is applied.
    • ORDER BY: Defines the order of rows within each partition.
    • frame_clause: Defines the subset of rows within the partition to be considered by the window function.

Common Window Functions

  1. ROW_NUMBER()

Assigns a unique sequential integer to rows within a partition of a result set.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM 
    employees;

  1. RANK()

Assigns a rank to each row within a partition of a result set, with gaps in the ranking for ties.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM 
    employees;

  1. DENSE_RANK()

Similar to RANK(), but without gaps in the ranking for ties.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM 
    employees;

  1. NTILE()

Distributes the rows in an ordered partition into a specified number of approximately equal groups.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile
FROM 
    employees;

  1. LAG() and LEAD()

Accesses data from a previous or subsequent row in the same result set without the use of a self-join.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY salary) AS prev_salary,
    LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY salary) AS next_salary
FROM 
    employees;

  1. FIRST_VALUE() and LAST_VALUE()

Returns the first or last value in an ordered partition of a result set.

SELECT 
    employee_id, 
    department_id, 
    salary, 
    FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary) AS min_salary,
    LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary) AS max_salary
FROM 
    employees;

Practical Example

Consider a table sales with the following structure:

sale_id employee_id sale_amount sale_date
1 101 500 2023-01-01
2 102 700 2023-01-02
3 101 300 2023-01-03
4 103 400 2023-01-04
5 102 600 2023-01-05

Example Query

SELECT 
    employee_id, 
    sale_amount, 
    sale_date, 
    SUM(sale_amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS running_total
FROM 
    sales;

Result

employee_id sale_amount sale_date running_total
101 500 2023-01-01 500
101 300 2023-01-03 800
102 700 2023-01-02 700
102 600 2023-01-05 1300
103 400 2023-01-04 400

Exercises

Exercise 1: Ranking Sales

Write a query to rank employees based on their total sales amount.

SELECT 
    employee_id, 
    SUM(sale_amount) AS total_sales, 
    RANK() OVER (ORDER BY SUM(sale_amount) DESC) AS sales_rank
FROM 
    sales
GROUP BY 
    employee_id;

Exercise 2: Calculating Moving Average

Write a query to calculate the moving average of sales for each employee over the last 3 sales.

SELECT 
    employee_id, 
    sale_amount, 
    sale_date, 
    AVG(sale_amount) OVER (PARTITION BY employee_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM 
    sales;

Common Mistakes and Tips

  • Misunderstanding PARTITION BY: Ensure you understand how PARTITION BY divides the result set. It’s similar to GROUP BY but used within the window function context.
  • Incorrect Frame Specification: Be careful with the frame clause. Incorrectly specifying the frame can lead to unexpected results.
  • Performance Considerations: Window functions can be resource-intensive. Optimize your queries and use indexes where appropriate.

Conclusion

Window functions are a versatile and powerful tool in SQL, enabling complex calculations and data analysis within partitions of your data. By mastering window functions, you can perform advanced data manipulations and gain deeper insights from your datasets. In the next module, we will explore real-world use cases and best practices for applying SQL in various scenarios.

SQL Course

Module 1: Introduction to SQL

Module 2: Basic SQL Queries

Module 3: Working with Multiple Tables

Module 4: Advanced Data Filtering

Module 5: Data Manipulation

Module 6: Advanced SQL Functions

Module 7: Subqueries and Nested Queries

Module 8: Indexes and Performance Optimization

Module 9: Transactions and Concurrency

Module 10: Advanced Topics

Module 11: SQL in Practice

Module 12: Final Project

© Copyright 2024. All rights reserved