Descriptive analysis is a fundamental aspect of data analysis that focuses on summarizing and interpreting data to provide insights into past and current states. This type of analysis helps business analysts understand what has happened in the business and identify patterns or trends.

Key Concepts of Descriptive Analysis

  1. Data Collection: Gathering relevant data from various sources.
  2. Data Cleaning: Ensuring the data is accurate and free from errors.
  3. Data Summarization: Using statistical measures to summarize the data.
  4. Data Visualization: Representing data in graphical formats to make it easier to understand.

Steps in Descriptive Analysis

  1. Define Objectives: Clearly outline what you aim to achieve with the analysis.
  2. Collect Data: Gather data from relevant sources such as databases, surveys, or logs.
  3. Clean Data: Remove any inconsistencies, duplicates, or errors in the data.
  4. Analyze Data: Use statistical methods to summarize the data.
  5. Visualize Data: Create charts, graphs, and tables to represent the data visually.
  6. Interpret Results: Draw conclusions from the data and summarize the findings.
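
As a rough illustration of the Clean Data step, the sketch below drops missing, non-numeric, and negative entries and keeps only the first record for each label. The (label, value) record layout, the sample data, and the function name `clean_records` are assumptions made for this example; real cleaning rules depend on the dataset.

```python
import math


def clean_records(records):
    """Return (label, value) pairs with invalid and duplicate entries removed."""
    seen_labels = set()
    cleaned = []
    for label, value in records:
        # Skip missing or non-numeric values (None, strings, booleans).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        # Skip NaN and negative values, which cannot be valid counts or amounts.
        if math.isnan(value) or value < 0:
            continue
        # Keep only the first record seen for each label.
        if label in seen_labels:
            continue
        seen_labels.add(label)
        cleaned.append((label, value))
    return cleaned


# Hypothetical raw data with a missing value, a duplicate, and a negative entry.
raw = [("Jan", 10000), ("Feb", None), ("Mar", 9000), ("Mar", 9000), ("Apr", -50)]
print(clean_records(raw))  # [('Jan', 10000), ('Mar', 9000)]
```

Silently dropping records is only one option; in practice it is often better to log or review the rejected entries before discarding them.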

Common Techniques in Descriptive Analysis

Measures of Central Tendency

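  • Mean: The sum of all values divided by the number of values. It is sensitive to outliers.
  • Median: The middle value when the data is ordered, or the average of the two middle values when the count is even.
  • Mode: The most frequently occurring value in the dataset. A dataset can have several modes, and when every value is unique the mode is not informative.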
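
The three measures can be compared on a small dataset using Python's standard `statistics` module. The order counts below are hypothetical and were chosen so that one value repeats and one value is an outlier. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Hypothetical daily order counts: 12 appears twice and 40 is an outlier.
orders = [12, 15, 12, 18, 40]

mean_orders = statistics.mean(orders)       # sum of values / number of values
median_orders = statistics.median(orders)   # middle value of the sorted data
modes_orders = statistics.multimode(orders) # list of all most frequent values

print(f"Mean: {mean_orders}")      # Mean: 19.4
print(f"Median: {median_orders}")  # Median: 15
print(f"Modes: {modes_orders}")    # Modes: [12]
```

The outlier pulls the mean (19.4) well above the median (15), which is why the median is often the better summary for skewed data.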

Measures of Dispersion

  • Range: The difference between the highest and lowest values.
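  • Variance: The average of the squared differences from the mean. The population variance divides by n; the sample variance divides by n - 1.
  • Standard Deviation: The square root of the variance, indicating how spread out the values are, in the same units as the data.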
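
The difference between the population and sample versions of these measures is easiest to see side by side. This is a minimal sketch using the standard `statistics` module; the measurements are hypothetical.

```python
import statistics

# Hypothetical measurements with a mean of exactly 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)

# Population versions divide the sum of squared differences by n.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample versions divide by n - 1, which gives slightly larger results.
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(f"Range: {value_range}")
print(f"Population variance: {population_variance:.2f}, std dev: {population_std:.2f}")
print(f"Sample variance: {sample_variance:.2f}, std dev: {sample_std:.2f}")
```

Use the population functions when the data covers every case of interest, and the sample functions when the data is a subset used to estimate a larger population.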

Data Visualization Techniques

  • Bar Charts: Used to compare different categories.
  • Histograms: Show the distribution of a dataset.
  • Pie Charts: Represent the proportion of different categories.
  • Line Graphs: Display trends over time.
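
To make concrete what a histogram computes, the sketch below groups values into fixed-width bins and prints the counts as a text chart, so it runs without a plotting library. The order values, the bin width of 10, and the helper name `bin_counts` are assumptions for illustration.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical order values in whole dollars.
order_values = [12, 18, 25, 27, 31, 33, 38, 45, 52, 58]
bin_width = 10

# Print one row per bin; the labels assume integer values.
for start, count in bin_counts(order_values, bin_width).items():
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```

A plotting library draws the same counts as bars; the choice of bin width changes how the distribution appears, so it is worth trying more than one.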

Practical Example

Let's consider a dataset of monthly sales figures for a retail store. We will perform a descriptive analysis to understand the sales performance.

Step-by-Step Example

  1. Define Objectives: Understand the monthly sales trends and identify peak sales periods.

  2. Collect Data: Assume we have the following sales data for 12 months:

    January: $10,000
    February: $12,000
    March: $9,000
    April: $15,000
    May: $14,000
    June: $13,000
    July: $16,000
    August: $18,000
    September: $17,000
    October: $20,000
    November: $22,000
    December: $25,000
    
  3. Clean Data: Confirm that all 12 months are present and that no value is missing, negative, or duplicated. The data above passes these checks.

  4. Analyze Data:

    • Mean Sales:

      sales = [10000, 12000, 9000, 15000, 14000, 13000, 16000, 18000, 17000, 20000, 22000, 25000]
      mean_sales = sum(sales) / len(sales)
      print(f"Mean Sales: ${mean_sales:.2f}")
      

      Output:

      Mean Sales: $15916.67
      
    • Median Sales:

      sorted_sales = sorted(sales)
      n = len(sorted_sales)
      median_sales = (sorted_sales[n//2 - 1] + sorted_sales[n//2]) / 2 if n % 2 == 0 else sorted_sales[n//2]
      print(f"Median Sales: ${median_sales}")
      

      Output:

      Median Sales: $15500.0
      
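    • Mode Sales:

      from statistics import mode
      # Every sales value is unique, so there is no meaningful mode.
      # On Python 3.8+ mode() returns the first value it encounters;
      # older versions raise StatisticsError in this case.
      mode_sales = mode(sales)
      print(f"Mode Sales: ${mode_sales}")
      

      Output:

      Mode Sales: $10000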
      
    • Standard Deviation:

      import statistics
      # stdev() computes the sample standard deviation (divides by n - 1)
      std_dev_sales = statistics.stdev(sales)
      print(f"Standard Deviation of Sales: ${std_dev_sales:.2f}")
      

      Output:

      Standard Deviation of Sales: $4795.04
      
  5. Visualize Data:

    • Bar Chart:

      import matplotlib.pyplot as plt
      
      months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
      plt.bar(months, sales)
      plt.xlabel('Month')
      plt.ylabel('Sales ($)')
      plt.title('Monthly Sales')
      plt.show()
      
  6. Interpret Results:

    • The mean sales figure is $15,916.67, the average monthly sales over the year.
    • The median sales figure is $15,500: half of the months fall below this value and half above it.
    • Every monthly figure is unique, so the data has no meaningful mode. The $10,000 returned by mode() is simply the first value in the list.
    • The sample standard deviation is $4,795.04, so monthly sales typically differ from the mean by roughly $4,800.
    • The bar chart shows a generally upward trend, with small dips in March, May, June, and September, and peak sales in December.
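
The trend and peak noted in the interpretation can also be checked numerically rather than read off the chart. The sketch below reuses the monthly figures from step 2; the variable names are illustrative.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
sales = [10000, 12000, 9000, 15000, 14000, 13000,
         16000, 18000, 17000, 20000, 22000, 25000]

# Month-over-month change: positive values are increases, negative are declines.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]

# changes[i] is the change going into months[i + 1].
declines = [months[i + 1] for i, change in enumerate(changes) if change < 0]

# Months with the highest and lowest sales.
peak_month = months[sales.index(max(sales))]
lowest_month = months[sales.index(min(sales))]
sales_range = max(sales) - min(sales)

print(f"Months with a decline: {declines}")
print(f"Peak: {peak_month}, lowest: {lowest_month}")
print(f"Range: ${sales_range}")
```

Seven of the eleven month-over-month changes are increases, which supports describing the year as an upward trend with occasional dips.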

Practical Exercise

Exercise: Perform a descriptive analysis on the following dataset of weekly website visits:

    Week 1: 500
    Week 2: 600
    Week 3: 550
    Week 4: 700
    Week 5: 650
    Week 6: 800
    Week 7: 750
    Week 8: 900
    Week 9: 850
    Week 10: 950
    Week 11: 1000
    Week 12: 1100

  1. Calculate the mean, median, mode, and standard deviation of the weekly visits.
  2. Create a line graph to visualize the weekly visits.

Solution:

  1. Calculations:

    import statistics

    visits = [500, 600, 550, 700, 650, 800, 750, 900, 850, 950, 1000, 1100]
    
    # Mean
    mean_visits = sum(visits) / len(visits)
    
    # Median: average the two middle values when the count is even
    sorted_visits = sorted(visits)
    n = len(sorted_visits)
    median_visits = (sorted_visits[n//2 - 1] + sorted_visits[n//2]) / 2 if n % 2 == 0 else sorted_visits[n//2]
    
    # Mode: every value is unique, so on Python 3.8+ this returns the first value
    # (older versions raise StatisticsError)
    mode_visits = statistics.mode(visits)
    
    # Sample standard deviation (divides by n - 1)
    std_dev_visits = statistics.stdev(visits)
    
    print(f"Mean Visits: {mean_visits:.2f}")
    print(f"Median Visits: {median_visits}")
    print(f"Mode Visits: {mode_visits}")
    print(f"Standard Deviation of Visits: {std_dev_visits:.2f}")
    

    Output:

    Mean Visits: 779.17
    Median Visits: 775.0
    Mode Visits: 500
    Standard Deviation of Visits: 187.64

    Every weekly value is unique, so the reported mode (500) is simply the first value in the list and carries no real information.

  2. Visualization:

    import matplotlib.pyplot as plt
    
    weeks = [f"Week {i}" for i in range(1, 13)]
    plt.plot(weeks, visits, marker='o')
    plt.xlabel('Week')
    plt.ylabel('Visits')
    plt.title('Weekly Website Visits')
    plt.grid(True)
    plt.show()
    

Summary

Descriptive analysis is a crucial step in understanding and interpreting data. By summarizing data through measures of central tendency and dispersion, and visualizing it using various charts, business analysts can gain valuable insights into business performance and trends. This foundational knowledge sets the stage for more advanced analysis techniques, such as predictive and prescriptive analysis.
