Introduction

In this case study, we will apply the concepts from the Data Visualization module to a real-world dataset. The goal is to create meaningful visualizations that help us understand the data and communicate our findings effectively. We will use both base R graphics and the ggplot2 package to build scatter plots, box plots, histograms, and bar plots.

Dataset

For this case study, we will use the mtcars dataset, which ships with R. Taken from the 1974 Motor Trend US magazine, it contains data on 32 car models (1973-74 models) across 11 variables, including miles per gallon (mpg), number of cylinders (cyl), gross horsepower (hp), and weight (wt, in 1000 lbs).

Dataset Overview

# Load the dataset
data(mtcars)

# Display the first few rows of the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
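A quick structural check complements head(). The sketch below uses only base R functions (dim(), str(), summary()) to confirm how many cars and variables the dataset has and what type each column is; the variable name choices are just for illustration.

```r
# Load the built-in dataset
data(mtcars)

# Dimensions: number of car models (rows) and variables (columns)
print(dim(mtcars))

# One line per variable, showing its type and first few values
str(mtcars)

# Minimum, quartiles, median, mean, and maximum of mpg
print(summary(mtcars$mpg))
```

All columns are stored as numbers, even categorical ones such as cyl, gear, and am, which is why the plots below wrap them in factor() when a discrete axis is needed.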

Step-by-Step Visualization

  1. Scatter Plot: MPG vs Horsepower

Base R Graphics

# Scatter plot using base R
plot(mtcars$hp, mtcars$mpg,
     main = "MPG vs Horsepower",
     xlab = "Horsepower",
     ylab = "Miles Per Gallon",
     pch = 19, col = "blue")

ggplot2

# Load ggplot2 package
library(ggplot2)

# Scatter plot using ggplot2
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue") +
  labs(title = "MPG vs Horsepower",
       x = "Horsepower",
       y = "Miles Per Gallon")
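Both scatter plots suggest that cars with more horsepower tend to get fewer miles per gallon. A correlation coefficient puts a single number on that trend. This is a minimal sketch using base R's cor(), which computes the Pearson correlation by default; the variable name r_hp_mpg is hypothetical.

```r
# Pearson correlation between horsepower and miles per gallon
r_hp_mpg <- cor(mtcars$hp, mtcars$mpg)

# A value near -1 indicates a strong negative linear relationship
print(round(r_hp_mpg, 2))
```

A correlation only summarizes the linear part of the relationship, so it is worth reading alongside the plot rather than instead of it.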

  2. Box Plot: MPG by Number of Cylinders

Base R Graphics

# Box plot using base R
boxplot(mpg ~ cyl, data = mtcars,
        main = "MPG by Number of Cylinders",
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon",
        col = c("red", "green", "blue"))

ggplot2

# Box plot using ggplot2
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(aes(fill = factor(cyl)), show.legend = FALSE) +  # the x axis already identifies each group
  labs(title = "MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Miles Per Gallon") +
  scale_fill_manual(values = c("red", "green", "blue"))
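The line inside each box marks the median mpg for that cylinder group. The sketch below computes those medians directly with tapply() and uses boxplot.stats() to show the five numbers base R draws for a single box. Note that ggplot2's geom_boxplot() computes its hinges with quantile(), so its box edges may differ slightly from base R's for small groups.

```r
# Median mpg for each cylinder count (the centre line of each box)
mpg_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
print(mpg_medians)

# Five statistics base R uses to draw the 4-cylinder box:
# lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats_4cyl <- boxplot.stats(mtcars$mpg[mtcars$cyl == 4])$stats
print(box_stats_4cyl)
```

The medians fall as the number of cylinders rises, which matches the downward step across the boxes in both plots.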

  3. Histogram: Distribution of MPG

Base R Graphics

# Histogram using base R
hist(mtcars$mpg,
     main = "Distribution of MPG",
     xlab = "Miles Per Gallon",
     col = "purple",
     breaks = 10)  # a suggested number of bins; hist() may pick nearby "pretty" breakpoints

ggplot2

# Histogram using ggplot2
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "purple", color = "black") +
  labs(title = "Distribution of MPG",
       x = "Miles Per Gallon",
       y = "Frequency")
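The two histograms bin the data differently: base R treats breaks = 10 as a suggestion, while ggplot2 uses a fixed binwidth of 2. Calling hist() with plot = FALSE returns the bin edges and counts without drawing anything, which makes the binning easy to inspect. The edge sequence below (10 to 34 in steps of 2) is an illustrative choice that covers the range of mpg; its alignment may not match ggplot2's default bin placement exactly.

```r
# Inspect the bins base R actually chose for breaks = 10
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)
print(h$breaks)  # bin edges
print(h$counts)  # number of cars per bin

# Explicit edges give fixed 2-mpg bins, similar in spirit to binwidth = 2
edges <- seq(10, 34, by = 2)
h2 <- hist(mtcars$mpg, breaks = edges, plot = FALSE)
print(h2$counts)
```

Passing a vector of edges is the most direct way to control binning in base R when you want two plots to be comparable.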

  4. Bar Plot: Count of Cars by Gear

Base R Graphics

# Bar plot using base R
barplot(table(mtcars$gear),
        main = "Count of Cars by Gear",
        xlab = "Number of Gears",
        ylab = "Count",
        col = "orange")

ggplot2

# Bar plot using ggplot2
ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar(fill = "orange") +
  labs(title = "Count of Cars by Gear",
       x = "Number of Gears",
       y = "Count")
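Both bar plots show the same counts: barplot() draws the heights stored in table(mtcars$gear), and geom_bar() counts the rows for each gear value itself. The sketch below prints that table and the corresponding proportions with prop.table(); gear_counts is just an illustrative name.

```r
# Number of cars for each number of forward gears (the bar heights)
gear_counts <- table(mtcars$gear)
print(gear_counts)

# Share of cars in each gear group, rounded to two decimals
print(round(prop.table(gear_counts), 2))
```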

Practical Exercises

Exercise 1: Scatter Plot with Regression Line

Create a scatter plot of wt (weight) vs mpg and add a regression line to it using ggplot2.

Solution

# Scatter plot with regression line using ggplot2
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +  # linear fit with a 95% confidence band
  labs(title = "MPG vs Weight with Regression Line",
       x = "Weight (1000 lbs)",
       y = "Miles Per Gallon")
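geom_smooth(method = "lm") fits an ordinary least-squares line of mpg on wt behind the scenes. Fitting the same model with lm() exposes the numbers behind the red line. This is a minimal sketch; the 3000 lb car used for the prediction is a hypothetical example.

```r
# Same model the regression line represents: mpg as a linear function of wt
fit <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope; the slope is the change in mpg per 1000 lbs of weight
print(coef(fit))

# Share of the variation in mpg explained by weight
print(summary(fit)$r.squared)

# Predicted mpg for a hypothetical 3000 lb car (wt is measured in 1000 lbs)
print(predict(fit, newdata = data.frame(wt = 3)))
```

The slope is negative, so each additional 1000 lbs of weight is associated with lower fuel economy.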

Exercise 2: Faceted Plot

Create a faceted plot of mpg vs hp for each number of cylinders using ggplot2.

Solution

# Faceted plot using ggplot2
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl, labeller = label_both) +  # panel titles read "cyl: 4", "cyl: 6", "cyl: 8"
  labs(title = "MPG vs Horsepower by Number of Cylinders",
       x = "Horsepower",
       y = "Miles Per Gallon")
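facet_wrap(~ cyl) splits the data into one panel per cylinder count. The same split can be reproduced with base R's split(), which makes it easy to summarize each panel numerically, for example by fitting a separate mpg ~ hp line per group. This is a sketch; by_cyl and slopes are hypothetical names.

```r
# One data frame per cylinder count, mirroring the facet panels
by_cyl <- split(mtcars, mtcars$cyl)

# Number of cars in each panel
print(sapply(by_cyl, nrow))

# Slope of mpg on hp within each panel (change in mpg per extra horsepower)
slopes <- sapply(by_cyl, function(d) coef(lm(mpg ~ hp, data = d))[["hp"]])
print(round(slopes, 3))
```

With only 7 to 14 cars per panel, these per-group slopes are rough estimates and should be read as descriptive rather than conclusive.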

Conclusion

In this case study, we explored various types of visualizations using both base R graphics and the ggplot2 package. We created scatter plots, box plots, histograms, and bar plots to analyze the mtcars dataset. Additionally, we practiced creating more complex visualizations such as scatter plots with regression lines and faceted plots. These skills are essential for effectively communicating data insights and making data-driven decisions.

© Copyright 2024. All rights reserved