Data cleaning is a crucial step in the data preparation process. It involves identifying and correcting (or removing) errors and inconsistencies in data to improve its quality. In Tableau, data cleaning can be performed using various built-in tools and functionalities. This section will cover the following key techniques:
- Handling Null Values
- Removing Duplicates
- Correcting Data Types
- Splitting and Merging Fields
- Filtering Out Unwanted Data
- Using Data Interpreter
- Handling Null Values
Null values can represent missing or undefined data. Tableau provides several ways to handle null values:
Replacing Null Values
You can replace null values with a specific value using the ZN
function or the IFNULL
function.
Filtering Out Null Values
You can filter out rows with null values directly in Tableau.
- Drag the field with null values to the Filters shelf.
- In the filter dialog, uncheck the "Null" option.
- Removing Duplicates
Duplicate records can skew your analysis. To remove duplicates:
- Use Tableau Prep to clean your data before importing it into Tableau Desktop.
- In Tableau Prep, use the "Remove Duplicates" option to eliminate duplicate rows based on selected fields.
- Correcting Data Types
Incorrect data types can lead to errors in calculations and visualizations. To correct data types:
- Right-click on the field in the Data pane.
- Select "Change Data Type" and choose the appropriate type (e.g., String, Number, Date).
- Splitting and Merging Fields
Splitting Fields
You can split a field into multiple fields based on a delimiter.
- Right-click on the field you want to split.
- Select "Transform" > "Split" or "Custom Split".
- Choose the delimiter and the number of splits.
Merging Fields
You can merge multiple fields into a single field using calculated fields.
- Filtering Out Unwanted Data
Filtering out irrelevant data can help focus your analysis.
- Drag the field you want to filter to the Filters shelf.
- In the filter dialog, select the values you want to include or exclude.
- Using Data Interpreter
Tableau's Data Interpreter can help clean and structure your data automatically.
- When you connect to an Excel file, Tableau will prompt you to use the Data Interpreter.
- Click "Use Data Interpreter" to let Tableau clean the data for you.
Practical Exercise
Exercise: Clean a Sample Dataset
- Download the Sample Dataset: Sample Data
- Connect to the Data Source: Open Tableau and connect to the downloaded CSV file.
- Handle Null Values: Replace null values in the "Sales" field with 0.
- Remove Duplicates: Ensure there are no duplicate rows based on the "Order ID" field.
- Correct Data Types: Ensure the "Order Date" field is of type Date.
- Split Fields: Split the "Customer Name" field into "First Name" and "Last Name".
- Merge Fields: Create a new field "Full Address" by merging "Address", "City", and "State".
- Filter Data: Filter out orders with a "Quantity" of 0.
- Use Data Interpreter: If applicable, use the Data Interpreter to clean the data.
Solution
-
Connect to the Data Source:
- Open Tableau and connect to the CSV file.
-
Handle Null Values:
- Create a calculated field:
IFNULL([Sales], 0)
- Create a calculated field:
-
Remove Duplicates:
- Use Tableau Prep to remove duplicates based on "Order ID".
-
Correct Data Types:
- Right-click on "Order Date" > "Change Data Type" > "Date".
-
Split Fields:
- Right-click on "Customer Name" > "Transform" > "Split".
-
Merge Fields:
- Create a calculated field:
[Full Address] = [Address] + ", " + [City] + ", " + [State]
- Create a calculated field:
-
Filter Data:
- Drag "Quantity" to the Filters shelf and exclude 0.
-
Use Data Interpreter:
- If prompted, click "Use Data Interpreter".
Conclusion
In this section, we covered essential data cleaning techniques in Tableau, including handling null values, removing duplicates, correcting data types, splitting and merging fields, filtering out unwanted data, and using the Data Interpreter. These techniques are fundamental for ensuring the quality and accuracy of your data, which is crucial for effective data analysis and visualization. In the next module, we will delve into advanced analytics techniques to further enhance your Tableau skills.
Tableau Course
Module 1: Introduction to Tableau
- What is Tableau?
- Installing Tableau
- Tableau Interface Overview
- Connecting to Data Sources
- Basic Data Types and Structures
Module 2: Basic Visualization Techniques
- Creating Your First Visualization
- Using Marks and Cards
- Building Basic Charts
- Filtering Data
- Sorting and Grouping Data
Module 3: Intermediate Visualization Techniques
- Using Calculated Fields
- Creating Dual-Axis Charts
- Using Parameters
- Creating Maps
- Using Table Calculations
Module 4: Advanced Visualization Techniques
- Advanced Chart Types
- Using LOD Expressions
- Creating Dashboards
- Dashboard Actions
- Storytelling with Data
Module 5: Data Preparation and Transformation
Module 6: Advanced Analytics
Module 7: Performance Optimization
- Optimizing Workbook Performance
- Extracts vs Live Connections
- Reducing Load Times
- Performance Recording
- Best Practices for Performance
Module 8: Tableau Server and Online
- Introduction to Tableau Server
- Publishing Workbooks
- Managing Permissions
- Scheduling Extracts
- Collaborating with Tableau Online