Introduction

Data frames are one of the most important data structures in R. They store tabular data: each column can hold a different type of data (numeric, character, factor, etc.), while all values within a single column share the same type. Data frames are similar to tables in a database or sheets in an Excel spreadsheet.

Key Concepts

  1. Definition: A data frame is a list of vectors of equal length.
  2. Structure: Each column in a data frame can be of a different type.
  3. Indexing: Data frames can be indexed by row and column.
  4. Manipulation: Base R provides functions and operators for modifying a data frame, such as adding or removing columns.
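
The definition above can be checked directly. This minimal sketch assumes R 4.0 or later, where data.frame() keeps strings as character by default. It rebuilds part of the example data frame and confirms that it is a list of equal-length column vectors, each with its own type.

```r
# Two columns taken from the example used throughout this section
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28)
)

# Underneath, a data frame is a list: each element is one column vector
is.list(df)         # TRUE
length(df)          # 2, the number of columns
lengths(df)         # every column has length 3

# Each column keeps its own type (assumes R >= 4.0)
sapply(df, class)   # Name: "character", Age: "numeric"
```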

Creating Data Frames

You can create a data frame with the data.frame() function, passing each column as a named vector.

# Creating a simple data frame
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Gender = c("Male", "Female", "Male")
)

# Display the data frame
print(df)

Explanation

  • Name, Age, and Gender are the column names.
  • c("John", "Jane", "Doe") creates a character vector for the Name column.
  • c(23, 25, 28) creates a numeric vector for the Age column.
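  • c("Male", "Female", "Male") creates a character vector for the Gender column. In R 4.0.0 and later it stays a character column; earlier versions of data.frame() converted strings to factors by default (stringsAsFactors = TRUE).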
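
To confirm the column types described above, or to store Gender as a factor rather than a character vector, you could use something like the following sketch (again assuming R 4.0 or later).

```r
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Gender = c("Male", "Female", "Male")
)

# Check the type of each column
sapply(df, class)    # "character" "numeric" "character"

# Convert Gender to a factor when you want categorical levels
df$Gender <- factor(df$Gender)
levels(df$Gender)    # "Female" "Male" (levels are sorted)
```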

Inspecting Data Frames

You can inspect the structure and contents of a data frame with several base R functions.

# Display the first few rows
head(df)

# Display the structure of the data frame
str(df)

# Display the summary of the data frame
summary(df)

Explanation

  • head(df) shows the first rows of the data frame (six by default).
  • str(df) shows the structure of the data frame: the number of rows and columns, plus each column's type and first few values.
  • summary(df) summarizes each column: the minimum, quartiles, mean, and maximum for numeric columns, frequency counts for factors, and length, class, and mode for character columns.
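
Besides head(), str(), and summary(), base R has a few other quick inspection helpers. The sketch below is one way to check the size and column names of the example data frame; the n argument of head() and tail() controls how many rows are shown.

```r
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Gender = c("Male", "Female", "Male")
)

dim(df)            # 3 3 (rows, then columns)
nrow(df)           # 3
ncol(df)           # 3
names(df)          # "Name" "Age" "Gender"
head(df, n = 2)    # only the first two rows
tail(df, n = 1)    # only the last row
```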

Indexing and Subsetting

You can access specific elements, rows, or columns of a data frame using indexing.

# Accessing a specific element
df[1, 2]  # First row, second column

# Accessing a specific row
df[1, ]  # First row

# Accessing a specific column
df[, "Name"]  # Column 'Name'

# Using the $ operator to access a column
df$Age

Explanation

  • df[1, 2] accesses the element in the first row and second column.
  • df[1, ] accesses the entire first row.
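  • df[, "Name"] returns the entire Name column. Because only one column is selected, the result is simplified to a plain vector rather than a data frame.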
  • df$Age is a shorthand to access the Age column.
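
Rows can also be selected with a logical condition instead of a row number. This sketch, using the same example data, shows logical filtering, the drop = FALSE argument that keeps a single column as a data frame, and [[ ]] as an alternative to $.

```r
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Gender = c("Male", "Female", "Male")
)

# Keep rows where the condition is TRUE; the empty column index keeps all columns
older <- df[df$Age > 24, ]
older$Name                            # "Jane" "Doe"

# Filter rows and pick columns in one step
df[df$Gender == "Male", c("Name", "Age")]

# A single column selected with [ , ] is simplified to a vector...
class(df[, "Name"])                   # "character"
# ...unless drop = FALSE keeps it as a one-column data frame
class(df[, "Name", drop = FALSE])     # "data.frame"

# [[ ]] extracts one column as a vector, like $
df[["Age"]]                           # 23 25 28
```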

Adding and Removing Columns

You can add a column by assigning a vector to a new column name, and remove a column by assigning NULL to it.

# Adding a new column
df$Height <- c(170, 165, 180)

# Removing a column
df$Gender <- NULL

# Display the updated data frame
print(df)

Explanation

  • df$Height <- c(170, 165, 180) adds a new column Height to the data frame.
  • df$Gender <- NULL removes the Gender column from the data frame.
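
New columns can also be computed from existing ones, and several columns can be removed at once. In this sketch the data frame starts with Height already added, as in the code above, and the HeightM column name is a hypothetical example chosen for illustration.

```r
df <- data.frame(
  Name = c("John", "Jane", "Doe"),
  Age = c(23, 25, 28),
  Height = c(170, 165, 180)
)

# Add a column computed from an existing one
# (HeightM is a hypothetical name: height in metres)
df$HeightM <- df$Height / 100

# Remove several columns at once by keeping every column not listed
df <- df[, !(names(df) %in% c("Height", "Age"))]
names(df)   # "Name" "HeightM"
```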

Practical Exercises

Exercise 1: Create a Data Frame

Create a data frame named students with the following columns: StudentID, Name, Grade, and Passed. Populate it with at least 3 rows of data.

# Solution
students <- data.frame(
  StudentID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Grade = c(85, 92, 78),
  Passed = c(TRUE, TRUE, FALSE)
)

print(students)

Exercise 2: Inspect the Data Frame

Use the head(), str(), and summary() functions to inspect the students data frame.

# Solution
head(students)
str(students)
summary(students)

Exercise 3: Subset the Data Frame

Extract the Name and Grade columns from the students data frame.

# Solution
students_subset <- students[, c("Name", "Grade")]
print(students_subset)
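
Base R's subset() function offers an alternative to bracket indexing for this exercise, and it can filter rows in the same call. The Grade >= 85 cutoff and the top_students name are hypothetical, chosen only for illustration.

```r
students <- data.frame(
  StudentID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Grade = c(85, 92, 78),
  Passed = c(TRUE, TRUE, FALSE)
)

# Same columns as the solution above, selected with subset()
students_subset <- subset(students, select = c(Name, Grade))

# Filter rows and select columns together (hypothetical cutoff of 85)
top_students <- subset(students, Grade >= 85, select = c(Name, Grade))
top_students$Name   # "Alice" "Bob"
```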

Exercise 4: Add and Remove Columns

Add a new column Age to the students data frame and then remove the Passed column.

# Solution
students$Age <- c(20, 21, 19)
students$Passed <- NULL
print(students)

Common Mistakes and Tips

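  • Mismatched Lengths: Make sure all vectors used to create a data frame have the same length. data.frame() stops with an error when the lengths differ, unless a shorter vector can be recycled evenly (for example, a single value repeated in every row).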
  • Column Names: Use meaningful column names to make your data frame easier to understand.
  • Indexing: Remember that R uses 1-based indexing, not 0-based.
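
The first and last tips can be seen in a short sketch. The Country column and its value are hypothetical, and tryCatch() is used only so the script keeps running after the expected error.

```r
# 3 names but only 2 ages: data.frame() stops with an error
result <- tryCatch(
  data.frame(Name = c("John", "Jane", "Doe"), Age = c(23, 25)),
  error = function(e) conditionMessage(e)
)
print(result)   # a message about differing numbers of rows

# A single value is recycled into every row, so this works
# (Country is a hypothetical column)
df <- data.frame(Name = c("John", "Jane", "Doe"), Country = "UK")

# 1-based indexing: the first row is df[1, ], while df[0, ] selects no rows
nrow(df[1, ])   # 1
nrow(df[0, ])   # 0
```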

Conclusion

In this section, you learned about data frames, one of the most versatile and commonly used data structures in R. You now know how to create, inspect, index, and manipulate data frames. These skills are fundamental for data analysis and will be used extensively in subsequent modules.

© Copyright 2024. All rights reserved