Non-parametric methods are statistical techniques that do not assume a specific distribution for the data. These methods are particularly useful when the data does not meet the assumptions required for parametric tests, such as normality. Non-parametric methods are often used when dealing with ordinal data or when the sample size is small.

Key Concepts

  1. Characteristics of Non-Parametric Methods

  • Distribution-Free: Do not assume a specific distribution for the population.
  • Robustness: Less sensitive to outliers and skewed data than parametric tests (see the short sketch after this list).
  • Flexibility: Can be used with ordinal data and with monotonic but non-linear relationships.
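
As a quick illustration of the robustness point, here is a minimal sketch (plain Python standard library, with a small made-up list of values) showing how a single outlier distorts the mean far more than the median, which depends only on the ordering of the data:

from statistics import mean, median

# Hypothetical measurements with one extreme outlier at the end
values = [32, 35, 36, 38, 40, 41, 43, 500]

print(f"Mean:   {mean(values):.1f}")    # pulled strongly towards the outlier (95.6)
print(f"Median: {median(values):.1f}")  # barely affected (39.0)

Rank-based tests inherit this kind of robustness because they work with the ordering of the observations rather than their raw values.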

  2. Common Non-Parametric Tests

  • Sign Test: Used to test the median of a single sample or the difference between paired samples.
  • Wilcoxon Signed-Rank Test: Used for comparing two related samples.
  • Mann-Whitney U Test: Used for comparing two independent samples.
  • Kruskal-Wallis Test: Used for comparing more than two independent samples.
  • Spearman's Rank Correlation: Used for assessing the strength and direction of a monotonic relationship between two variables.
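
Each of these tests can be thought of as a distribution-free counterpart of a familiar parametric procedure: the sign test and the Wilcoxon Signed-Rank Test play the role of the paired t-test, the Mann-Whitney U Test corresponds to the independent two-sample t-test, the Kruskal-Wallis Test to one-way ANOVA, and Spearman's Rank Correlation to Pearson's correlation coefficient.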

Detailed Explanation and Examples

  1. Sign Test

The sign test is a simple non-parametric test used to determine whether the median of the differences between two related samples differs from zero, that is, whether increases and decreases are equally likely.

Example:

Suppose we have the following paired data representing the before and after weights of 10 individuals following a diet program:

Individual   Before (kg)   After (kg)
1            70            68
2            80            78
3            65            66
4            90            85
5            75            74
6            85            83
7            95            92
8            60            59
9            78            77
10           82            80

To perform the sign test:

  1. Calculate the differences between the before and after weights.
  2. Count the number of positive and negative differences (pairs with a zero difference are discarded).
  3. Use the binomial distribution to determine if the number of positive or negative differences is significantly different from what would be expected by chance.
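
A minimal way to carry out these steps in Python is with SciPy's exact binomial test, scipy.stats.binomtest (in older SciPy versions the equivalent function was called binom_test):
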
from scipy.stats import binomtest

# Differences (before - after) for the 10 individuals
differences = [2, 2, -1, 5, 1, 2, 3, 1, 1, 2]

# Count positive and negative differences (zero differences would be excluded)
positive_diffs = sum(d > 0 for d in differences)
negative_diffs = sum(d < 0 for d in differences)

# Two-sided exact binomial test of H0: P(positive difference) = 0.5
result = binomtest(positive_diffs, n=positive_diffs + negative_diffs, p=0.5, alternative='two-sided')
print(f"P-value: {result.pvalue}")

  2. Wilcoxon Signed-Rank Test

The Wilcoxon Signed-Rank Test compares two related samples to assess whether their population mean ranks differ. Unlike the sign test, it uses both the sign and the magnitude of the paired differences.

Example:

Using the same data as above, we can perform the Wilcoxon Signed-Rank Test.

from scipy.stats import wilcoxon

# Before and after weights
before = [70, 80, 65, 90, 75, 85, 95, 60, 78, 82]
after = [68, 78, 66, 85, 74, 83, 92, 59, 77, 80]

# Perform Wilcoxon Signed-Rank Test
stat, p_value = wilcoxon(before, after)
print(f"Statistic: {stat}, P-value: {p_value}")

  3. Mann-Whitney U Test

The Mann-Whitney U Test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
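
Concretely, the test pools the two samples, ranks all observations together, and computes U1 = R1 − n1(n1 + 1)/2, where R1 is the sum of the ranks in the first group and n1 its size (with U2 = n1·n2 − U1); an unusually small or large U indicates that one group tends to have systematically higher values.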

Example:

Suppose we have two independent groups of students' test scores:

Group A   Group B
85        78
90        82
88        80
92        85
87        79

from scipy.stats import mannwhitneyu

# Test scores for the two independent groups
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 85, 79]

# Perform the two-sided Mann-Whitney U Test
stat, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')
print(f"Statistic: {stat}, P-value: {p_value}")

  4. Kruskal-Wallis Test

The Kruskal-Wallis Test is a rank-based extension of the Mann-Whitney U Test, used for comparing three or more independent samples.
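
It works by ranking all observations across the groups together and computing H = 12 / (N(N + 1)) · Σ (R_i² / n_i) − 3(N + 1), where N is the total sample size, n_i is the size of group i and R_i its rank sum; under the null hypothesis that all groups come from the same distribution, H approximately follows a chi-squared distribution with (number of groups − 1) degrees of freedom.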

Example:

Suppose we have three independent groups of students' test scores:

Group A   Group B   Group C
85        78        88
90        82        85
88        80        87
92        85        90
87        79        89

from scipy.stats import kruskal

# Test scores
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 85, 79]
group_c = [88, 85, 87, 90, 89]

# Perform Kruskal-Wallis Test
stat, p_value = kruskal(group_a, group_b, group_c)
print(f"Statistic: {stat}, P-value: {p_value}")

  5. Spearman's Rank Correlation

Spearman's Rank Correlation is used to assess the strength and direction of the association between two ranked variables.
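
When there are no tied ranks, the coefficient can be computed directly as ρ = 1 − 6·Σd² / (n(n² − 1)), where d is the difference between the two ranks for each observation and n is the number of observations; equivalently, it is the Pearson correlation computed on the ranks.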

Example:

Suppose we have the following data representing the ranks of students in two subjects:

Student   Rank in Math   Rank in Science
1         1              2
2         2              1
3         3              4
4         4              3
5         5              5

from scipy.stats import spearmanr

# Ranks in Math and Science
math_ranks = [1, 2, 3, 4, 5]
science_ranks = [2, 1, 4, 3, 5]

# Perform Spearman's Rank Correlation
corr, p_value = spearmanr(math_ranks, science_ranks)
print(f"Correlation: {corr}, P-value: {p_value}")

Practical Exercises

Exercise 1: Sign Test

Given the following paired data, perform a sign test to determine if there is a significant difference between the medians of the two samples.

Individual   Before   After
1            50       48
2            55       53
3            60       59
4            65       64
5            70       68

Solution:

from scipy.stats import binomtest

# Differences (before - after)
differences = [2, 2, 1, 1, 2]

# Count positive and negative differences
positive_diffs = sum(d > 0 for d in differences)
negative_diffs = sum(d < 0 for d in differences)

# Two-sided exact binomial test of H0: P(positive difference) = 0.5
result = binomtest(positive_diffs, n=positive_diffs + negative_diffs, p=0.5, alternative='two-sided')
print(f"P-value: {result.pvalue}")

Exercise 2: Wilcoxon Signed-Rank Test

Given the following paired data, perform a Wilcoxon Signed-Rank Test to determine if there is a significant difference between the two samples.

Individual   Before   After
1            50       48
2            55       53
3            60       59
4            65       64
5            70       68

Solution:

from scipy.stats import wilcoxon

# Paired "before" and "after" measurements
before = [50, 55, 60, 65, 70]
after = [48, 53, 59, 64, 68]

# Perform Wilcoxon Signed-Rank Test
stat, p_value = wilcoxon(before, after)
print(f"Statistic: {stat}, P-value: {p_value}")

Conclusion

Non-parametric methods are essential tools in statistics, especially when dealing with data that does not meet the assumptions required for parametric tests. Understanding and applying these methods can provide robust and reliable results in a wide range of scenarios. In this section, we covered the basic concepts and the most common non-parametric tests, and provided practical examples and exercises to reinforce them. In the next module, we will explore practical applications of statistics in different fields.
