Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows that are somehow related to the current row. Unlike aggregate functions, which return a single value for a group of rows, window functions return a value for each row in the result set.
Key Concepts
-
Window Function Syntax:
function_name (expression) OVER ( [PARTITION BY partition_expression] [ORDER BY sort_expression] [frame_clause] ) -
Components:
- function_name: The name of the window function (e.g.,
ROW_NUMBER,RANK,DENSE_RANK,NTILE,LAG,LEAD,FIRST_VALUE,LAST_VALUE). - expression: The column or expression to be used in the function.
- OVER: Specifies the window over which the function operates.
- PARTITION BY: Divides the result set into partitions to which the window function is applied.
- ORDER BY: Defines the order of rows within each partition.
- frame_clause: Defines the subset of rows within the partition to be considered by the window function.
- function_name: The name of the window function (e.g.,
Common Window Functions
- ROW_NUMBER()
Assigns a unique sequential integer to rows within a partition of a result set.
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM
employees;
- RANK()
Assigns a rank to each row within a partition of a result set, with gaps in the ranking for ties.
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM
employees;
- DENSE_RANK()
Similar to RANK(), but without gaps in the ranking for ties.
SELECT
employee_id,
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM
employees;
- NTILE()
Distributes the rows in an ordered partition into a specified number of approximately equal groups.
SELECT
employee_id,
department_id,
salary,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile
FROM
employees;
- LAG() and LEAD()
Accesses data from a previous or subsequent row in the same result set without the use of a self-join.
SELECT
employee_id,
department_id,
salary,
LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY salary) AS prev_salary,
LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY salary) AS next_salary
FROM
employees;
- FIRST_VALUE() and LAST_VALUE()
Returns the first or last value in an ordered partition of a result set.
SELECT
employee_id,
department_id,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary) AS min_salary,
LAST_VALUE(salary) OVER (PARTITION BY department_id ORDER BY salary) AS max_salary
FROM
employees;Practical Example
Consider a table sales with the following structure:
| sale_id | employee_id | sale_amount | sale_date |
|---|---|---|---|
| 1 | 101 | 500 | 2023-01-01 |
| 2 | 102 | 700 | 2023-01-02 |
| 3 | 101 | 300 | 2023-01-03 |
| 4 | 103 | 400 | 2023-01-04 |
| 5 | 102 | 600 | 2023-01-05 |
Example Query
SELECT
employee_id,
sale_amount,
sale_date,
SUM(sale_amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS running_total
FROM
sales;Result
| employee_id | sale_amount | sale_date | running_total |
|---|---|---|---|
| 101 | 500 | 2023-01-01 | 500 |
| 101 | 300 | 2023-01-03 | 800 |
| 102 | 700 | 2023-01-02 | 700 |
| 102 | 600 | 2023-01-05 | 1300 |
| 103 | 400 | 2023-01-04 | 400 |
Exercises
Exercise 1: Ranking Sales
Write a query to rank employees based on their total sales amount.
SELECT
employee_id,
SUM(sale_amount) AS total_sales,
RANK() OVER (ORDER BY SUM(sale_amount) DESC) AS sales_rank
FROM
sales
GROUP BY
employee_id;Exercise 2: Calculating Moving Average
Write a query to calculate the moving average of sales for each employee over the last 3 sales.
SELECT
employee_id,
sale_amount,
sale_date,
AVG(sale_amount) OVER (PARTITION BY employee_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM
sales;Common Mistakes and Tips
- Misunderstanding PARTITION BY: Ensure you understand how
PARTITION BYdivides the result set. It’s similar toGROUP BYbut used within the window function context. - Incorrect Frame Specification: Be careful with the frame clause. Incorrectly specifying the frame can lead to unexpected results.
- Performance Considerations: Window functions can be resource-intensive. Optimize your queries and use indexes where appropriate.
Conclusion
Window functions are a versatile and powerful tool in SQL, enabling complex calculations and data analysis within partitions of your data. By mastering window functions, you can perform advanced data manipulations and gain deeper insights from your datasets. In the next module, we will explore real-world use cases and best practices for applying SQL in various scenarios.
SQL Course
Module 1: Introduction to SQL
Module 2: Basic SQL Queries
Module 3: Working with Multiple Tables
Module 4: Advanced Data Filtering
- Using LIKE for Pattern Matching
- IN and BETWEEN Operators
- NULL Values and IS NULL
- Aggregating Data with GROUP BY
- HAVING Clause
Module 5: Data Manipulation
Module 6: Advanced SQL Functions
Module 7: Subqueries and Nested Queries
- Introduction to Subqueries
- Correlated Subqueries
- EXISTS and NOT EXISTS
- Using Subqueries in SELECT, FROM, and WHERE Clauses
Module 8: Indexes and Performance Optimization
- Understanding Indexes
- Creating and Managing Indexes
- Query Optimization Techniques
- Analyzing Query Performance
