Non-parametric methods are statistical techniques that do not assume a specific distribution for the data. These methods are particularly useful when the data does not meet the assumptions required for parametric tests, such as normality. Non-parametric methods are often used when dealing with ordinal data or when the sample size is small.
Key Concepts
- Characteristics of Non-Parametric Methods
  - Distribution-Free: Do not assume a specific distribution for the population.
  - Robustness: More robust to outliers and skewed data.
  - Flexibility: Can be used with ordinal data and non-linear relationships.
- Common Non-Parametric Tests
  - Sign Test: Used to test the median of a single sample or the difference between paired samples.
  - Wilcoxon Signed-Rank Test: Used for comparing two related samples.
  - Mann-Whitney U Test: Used for comparing two independent samples.
  - Kruskal-Wallis Test: Used for comparing more than two independent samples.
  - Spearman's Rank Correlation: Used for assessing the relationship between two variables.
Detailed Explanation and Examples
- Sign Test
The sign test is a simple non-parametric test used to determine if there is a significant difference between the medians of two related samples.
Example:
Suppose we have the following paired data representing the before and after weights of 10 individuals following a diet program:
Individual | Before (kg) | After (kg) |
---|---|---|
1 | 70 | 68 |
2 | 80 | 78 |
3 | 65 | 66 |
4 | 90 | 85 |
5 | 75 | 74 |
6 | 85 | 83 |
7 | 95 | 92 |
8 | 60 | 59 |
9 | 78 | 77 |
10 | 82 | 80 |
To perform the sign test:
- Calculate the differences between the before and after weights.
- Count the number of positive and negative differences.
- Use the binomial distribution to determine if the number of positive or negative differences is significantly different from what would be expected by chance.
```python
from scipy.stats import binomtest  # binom_test was removed in SciPy 1.12

# Differences (before - after)
differences = [2, 2, -1, 5, 1, 2, 3, 1, 1, 2]

# Count positive and negative differences
positive_diffs = sum(d > 0 for d in differences)
negative_diffs = sum(d < 0 for d in differences)

# Perform the binomial test on the count of positive differences
result = binomtest(positive_diffs, n=len(differences), p=0.5, alternative='two-sided')
print(f"P-value: {result.pvalue}")
```
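To see what the two-sided binomial test computes under the hood, the p-value can be built directly from the binomial probability mass function: sum the probabilities of every outcome no more likely than the one observed. This is a minimal sketch of that calculation (the helper name `binom_pvalue_two_sided` is ours, not part of any library):

```python
from math import comb

def binom_pvalue_two_sided(k, n, p=0.5):
    """Two-sided binomial p-value: sum the probabilities of all
    outcomes whose probability does not exceed that of the observed k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance guards against floating-point noise in the comparison.
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))

# 9 positive differences out of 10, under H0: P(positive) = 0.5
print(binom_pvalue_two_sided(9, 10))  # 22/1024 ≈ 0.0215
```

With p = 0.5 the distribution is symmetric, so this agrees with doubling the one-sided tail probability.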
- Wilcoxon Signed-Rank Test
The Wilcoxon Signed-Rank Test is used to compare two related samples to assess whether their population mean ranks differ.
Example:
Using the same data as above, we can perform the Wilcoxon Signed-Rank Test.
```python
from scipy.stats import wilcoxon

# Before and after weights
before = [70, 80, 65, 90, 75, 85, 95, 60, 78, 82]
after = [68, 78, 66, 85, 74, 83, 92, 59, 77, 80]

# Perform the Wilcoxon Signed-Rank Test
stat, p_value = wilcoxon(before, after)
print(f"Statistic: {stat}, P-value: {p_value}")
```
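The statistic reported above can be reproduced by hand: rank the absolute differences (averaging tied ranks, discarding zeros), sum the ranks of the positive and negative differences separately, and take the smaller sum. A minimal sketch of that calculation (the helper `signed_rank_sums` is illustrative, not a library function):

```python
def signed_rank_sums(before, after):
    # Non-zero differences; zeros are discarded as in the standard procedure
    diffs = [b - a for b, a in zip(before, after) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)

    def avg_rank(v):
        # Average the 1-based positions of v, which handles tied ranks
        positions = [i + 1 for i, x in enumerate(abs_sorted) if x == v]
        return sum(positions) / len(positions)

    w_pos = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    w_neg = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    return w_pos, w_neg

before = [70, 80, 65, 90, 75, 85, 95, 60, 78, 82]
after = [68, 78, 66, 85, 74, 83, 92, 59, 77, 80]
w_pos, w_neg = signed_rank_sums(before, after)
print(min(w_pos, w_neg))  # 2.5, the two-sided wilcoxon statistic
```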
- Mann-Whitney U Test
The Mann-Whitney U Test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
Example:
Suppose we have two independent groups of students' test scores:
Group A | Group B |
---|---|
85 | 78 |
90 | 82 |
88 | 80 |
92 | 85 |
87 | 79 |
```python
from scipy.stats import mannwhitneyu

# Test scores
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 85, 79]

# Perform the Mann-Whitney U Test
stat, p_value = mannwhitneyu(group_a, group_b)
print(f"Statistic: {stat}, P-value: {p_value}")
```
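The U statistic has a direct interpretation: over all pairs (a, b) with a from Group A and b from Group B, it counts how often a exceeds b, with ties contributing one half. A minimal sketch of that pair-counting definition (the helper name is ours):

```python
def mann_whitney_u(x, y):
    """Count pairs where x beats y; ties contribute 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)

group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 85, 79]
print(mann_whitney_u(group_a, group_b))  # 24.5
```

Note that the two directions always sum to the number of pairs: U(A, B) + U(B, A) = 5 × 5 = 25 here.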
- Kruskal-Wallis Test
The Kruskal-Wallis Test extends the Mann-Whitney U Test to three or more independent samples, testing whether they originate from the same distribution.
Example:
Suppose we have three independent groups of students' test scores:
Group A | Group B | Group C |
---|---|---|
85 | 78 | 88 |
90 | 82 | 85 |
88 | 80 | 87 |
92 | 85 | 90 |
87 | 79 | 89 |
```python
from scipy.stats import kruskal

# Test scores
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 85, 79]
group_c = [88, 85, 87, 90, 89]

# Perform the Kruskal-Wallis Test
stat, p_value = kruskal(group_a, group_b, group_c)
print(f"Statistic: {stat}, P-value: {p_value}")
```
- Spearman's Rank Correlation
Spearman's Rank Correlation is used to assess the strength and direction of the association between two ranked variables.
Example:
Suppose we have the following data representing the ranks of students in two subjects:
Student | Rank in Math | Rank in Science |
---|---|---|
1 | 1 | 2 |
2 | 2 | 1 |
3 | 3 | 4 |
4 | 4 | 3 |
5 | 5 | 5 |
```python
from scipy.stats import spearmanr

# Ranks in Math and Science
math_ranks = [1, 2, 3, 4, 5]
science_ranks = [2, 1, 4, 3, 5]

# Perform Spearman's Rank Correlation
corr, p_value = spearmanr(math_ranks, science_ranks)
print(f"Correlation: {corr}, P-value: {p_value}")
```
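Because there are no tied ranks in this data, the same coefficient follows from the classic formula ρ = 1 − 6Σd² / (n(n² − 1)), where d is each student's rank difference between the two subjects. A quick check:

```python
math_ranks = [1, 2, 3, 4, 5]
science_ranks = [2, 1, 4, 3, 5]

n = len(math_ranks)
# Sum of squared rank differences: (-1)^2 + 1^2 + (-1)^2 + 1^2 + 0^2 = 4
d_squared = sum((m - s) ** 2 for m, s in zip(math_ranks, science_ranks))
rho = 1 - 6 * d_squared / (n * (n**2 - 1))
print(rho)  # 0.8, identical to the spearmanr correlation above
```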
Practical Exercises
Exercise 1: Sign Test
Given the following paired data, perform a sign test to determine if there is a significant difference between the medians of the two samples.
Individual | Before | After |
---|---|---|
1 | 50 | 48 |
2 | 55 | 53 |
3 | 60 | 59 |
4 | 65 | 64 |
5 | 70 | 68 |
Solution:
```python
from scipy.stats import binomtest  # binom_test was removed in SciPy 1.12

# Differences (before - after)
differences = [2, 2, 1, 1, 2]

# Count positive and negative differences
positive_diffs = sum(d > 0 for d in differences)
negative_diffs = sum(d < 0 for d in differences)

# Perform the binomial test on the count of positive differences
result = binomtest(positive_diffs, n=len(differences), p=0.5, alternative='two-sided')
print(f"P-value: {result.pvalue}")
```
Exercise 2: Wilcoxon Signed-Rank Test
Given the following paired data, perform a Wilcoxon Signed-Rank Test to determine if there is a significant difference between the two samples.
Individual | Before | After |
---|---|---|
1 | 50 | 48 |
2 | 55 | 53 |
3 | 60 | 59 |
4 | 65 | 64 |
5 | 70 | 68 |
Solution:
```python
from scipy.stats import wilcoxon

# Before and after values
before = [50, 55, 60, 65, 70]
after = [48, 53, 59, 64, 68]

# Perform the Wilcoxon Signed-Rank Test
stat, p_value = wilcoxon(before, after)
print(f"Statistic: {stat}, P-value: {p_value}")
```
Conclusion
Non-parametric methods are essential tools in statistics, especially when dealing with data that does not meet the assumptions required for parametric tests. Understanding and applying these methods can provide robust and reliable results in various scenarios. In this section, we covered the basic concepts, common non-parametric tests, and provided practical examples and exercises to reinforce the learned concepts. In the next module, we will explore practical applications of statistics in different fields.