Subqueries, also known as inner queries or nested queries, are queries embedded within another SQL query. They are powerful tools that allow you to perform more complex data retrieval operations by using the result of one query as an input for another. In this section, we will explore the concept of subqueries, their types, and how to use them effectively in BigQuery.
What is a Subquery?
A subquery is a query nested inside another query. The outer query is called the main query, and the inner query is called the subquery. Subqueries can be used in various parts of a SQL statement, including the SELECT
, FROM
, WHERE
, and HAVING
clauses.
Types of Subqueries
- Single-Row Subqueries: Return a single row and single column.
- Multiple-Row Subqueries: Return multiple rows and a single column.
- Multiple-Column Subqueries: Return multiple rows and multiple columns.
- Correlated Subqueries: Refer to columns in the outer query and are evaluated once for each row processed by the outer query.
Basic Subquery Syntax
Here is the basic syntax for a subquery:
In this example, the subquery (SELECT column4 FROM table2 WHERE condition)
is executed first, and its result is used by the outer query.
Practical Examples
Example 1: Single-Row Subquery
Find the name of the employee with the highest salary.
Explanation:
- The subquery
SELECT MAX(salary) FROM employees
returns the highest salary. - The outer query selects the name of the employee with that salary.
Example 2: Multiple-Row Subquery
Find the names of employees who work in the same department as 'John Doe'.
SELECT name FROM employees WHERE department_id = (SELECT department_id FROM employees WHERE name = 'John Doe');
Explanation:
- The subquery
SELECT department_id FROM employees WHERE name = 'John Doe'
returns the department ID of 'John Doe'. - The outer query selects the names of all employees in that department.
Example 3: Correlated Subquery
Find employees who earn more than the average salary in their department.
SELECT name, salary FROM employees e1 WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);
Explanation:
- The subquery
SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id
calculates the average salary for each department. - The outer query selects employees whose salary is greater than the average salary of their department.
Exercises
Exercise 1: Single-Row Subquery
Find the name of the department with the highest budget.
Exercise 2: Multiple-Row Subquery
List the names of employees who have the same job title as 'Jane Smith'.
SELECT name FROM employees WHERE job_title = (SELECT job_title FROM employees WHERE name = 'Jane Smith');
Exercise 3: Correlated Subquery
Find the names of employees who have a higher salary than the average salary of their department.
SELECT name FROM employees e1 WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.department_id = e2.department_id);
Common Mistakes and Tips
-
Mistake: Using a subquery that returns multiple rows where a single row is expected.
- Solution: Ensure the subquery returns a single value when used with operators like
=
or<
.
- Solution: Ensure the subquery returns a single value when used with operators like
-
Mistake: Forgetting to correlate the subquery with the outer query in correlated subqueries.
- Solution: Always reference the outer query's columns in the subquery where necessary.
-
Tip: Use subqueries to break down complex queries into simpler, more manageable parts.
Conclusion
Subqueries are a powerful feature in SQL that allow you to perform complex queries by nesting one query inside another. By understanding and using subqueries effectively, you can enhance your data retrieval capabilities in BigQuery. Practice the provided exercises to reinforce your understanding and prepare for more advanced SQL topics.
BigQuery Course
Module 1: Introduction to BigQuery
- What is BigQuery?
- Setting Up Your BigQuery Environment
- Understanding BigQuery Architecture
- BigQuery Console Overview
Module 2: Basic SQL in BigQuery
Module 3: Intermediate SQL in BigQuery
Module 4: Advanced SQL in BigQuery
Module 5: BigQuery Data Management
- Loading Data into BigQuery
- Exporting Data from BigQuery
- Data Transformation and Cleaning
- Managing Datasets and Tables
Module 6: BigQuery Performance Optimization
- Query Optimization Techniques
- Understanding Query Execution Plans
- Using Materialized Views
- Optimizing Storage
Module 7: BigQuery Security and Compliance
- Access Control and Permissions
- Data Encryption
- Auditing and Monitoring
- Compliance and Best Practices
Module 8: BigQuery Integration and Automation
- Integrating with Google Cloud Services
- Using BigQuery with Dataflow
- Automating Workflows with Cloud Functions
- Scheduling Queries with Cloud Scheduler
Module 9: BigQuery Machine Learning (BQML)
- Introduction to BigQuery ML
- Creating and Training Models
- Evaluating and Predicting with Models
- Advanced BQML Features