Hypothesis Testing Series - An End to End Guide to Permutation Tests - Part 2
The permutation test is one of the most popular non-parametric hypothesis tests. In this article, we will go through the theory, Python implementation and practical use cases of the permutation test. If you are new to hypothesis testing, do check out the introductory article on this topic.
Introduction
The permutation test is a non-parametric hypothesis test. Because it is non-parametric, we do not need to assume a particular distribution (such as the normal distribution) for the data. The key requirement is that, under the null hypothesis, the observations are exchangeable between groups, i.e., swapping group labels would not change their joint distribution. This makes the test useful in situations where normality or the other assumptions behind the t-test do not hold.
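In this test, we compare a statistic computed on the observed data with the distribution of that same statistic across permutations of the data, i.e., across reassignments of the observations to groups. Below is a quick example:
import itertools
import numpy as np
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Treat the first 5 values as group A and the last 5 as group B
observed_diff = np.mean(sample[:5]) - np.mean(sample[5:])
# Reordering the whole sample never changes its overall mean (always 5.5), so the
# statistic must depend on the split. Each choice of 5 positions for group A is one
# relabelling of the data (252 in total).
permutation_diffs = []
for idx in itertools.combinations(range(10), 5):
    rest = [sample[i] for i in range(10) if i not in idx]
    permutation_diffs.append(np.mean([sample[i] for i in idx]) - np.mean(rest))
So we can compare the observed difference in means (here 3 - 8 = -5) with the distribution of differences across all 252 relabellings of the data.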
Problem Statement
Let's now take a closer look at the problem at hand. We are testing whether the means of group A and group B differ significantly.
Null Hypothesis: There is no difference between the means of the two groups.
Alternative Hypothesis: The means of the two groups differ.
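For reference, the quantities used in the rest of the article can be written compactly. In the notation below (introduced here for illustration), $\bar{x}_A$ and $\bar{x}_B$ are the group means, $N$ is the number of random permutations, and $T_i$ is the difference of means after the $i$-th shuffle and re-split:

$$
T_{obs} = \bar{x}_A - \bar{x}_B, \qquad
p = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left( |T_i| \ge |T_{obs}| \right)
$$

Because only absolute values enter $p$, this is a two-sided test and the order of subtraction does not matter. Some references use $\left(1 + \sum_{i} \mathbf{1}(|T_i| \ge |T_{obs}|)\right) / (N + 1)$ instead, which counts the observed labelling as one of the permutations and so never returns exactly 0.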
Code & Explanation
import numpy as np
np.random.seed(42)
# Generate sample data
group_1 = np.random.normal(10, 2, 50)  # Mean = 10, Std = 2
group_2 = np.random.normal(12, 2, 50)  # Mean = 12, Std = 2
n_1 = len(group_1)
# Observed test statistic: difference of means (group_2 minus group_1)
obs_stat = np.mean(group_2) - np.mean(group_1)
# Permutation test: pool both groups, shuffle, re-split into groups of the original sizes
combined = np.concatenate([group_1, group_2])
permuted_stats = []
for _ in range(10000):
    np.random.shuffle(combined)  # shuffles the pooled array in place
    # Same orientation as obs_stat: second group minus first group
    perm_stat = np.mean(combined[n_1:]) - np.mean(combined[:n_1])
    permuted_stats.append(perm_stat)
# Two-sided p-value: share of permuted statistics at least as extreme as the observed one
p_value = np.mean(np.abs(permuted_stats) >= np.abs(obs_stat))
print(p_value)
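A Python loop over 10,000 shuffles can be slow. A minimal vectorized sketch, assuming NumPy 1.20 or later for `Generator.permuted` (which shuffles each row independently when `axis=1`), builds all shuffles at once. The helper name `permutation_p_value` is hypothetical, and because it draws permutations from its own random generator the p-value will not match the loop's output digit for digit.

```python
import numpy as np

def permutation_p_value(x, y, n_resamples=10_000, seed=None):
    """Two-sided permutation test on mean(y) - mean(x) (illustrative helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = y.mean() - x.mean()
    pooled = np.concatenate([x, y])
    # One row per permutation; each row is shuffled independently of the others
    shuffled = rng.permuted(np.tile(pooled, (n_resamples, 1)), axis=1)
    n_x = len(x)
    # Re-split every row into groups of the original sizes
    permuted_stats = shuffled[:, n_x:].mean(axis=1) - shuffled[:, :n_x].mean(axis=1)
    p_value = np.mean(np.abs(permuted_stats) >= np.abs(observed))
    return observed, p_value


# Same data-generation step as the article's script
np.random.seed(42)
group_1 = np.random.normal(10, 2, 50)
group_2 = np.random.normal(12, 2, 50)
obs_stat, p_value = permutation_p_value(group_1, group_2, seed=0)
print(f"observed difference = {obs_stat:.3f}, p-value = {p_value:.4f}")
```

The trade-off is memory: the tiled array holds `n_resamples` copies of the pooled data, which is fine for 100 observations but may need batching for much larger samples.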
[Figure: distribution of the permutation test statistic]
Explanation:
Step 1: Start with two groups of observations. To keep the arithmetic easy to follow, let's use a small example with three values per group:
- Group A = [85, 88, 90]
- Group B = [78, 80, 84]
Step 2: Calculate the observed difference between the two group means.
- Mean of Group A = $\frac{85+88+90}{3} = 87.67$
- Mean of Group B = $\frac{78+80+84}{3} = 80.67$
- Observed difference = $87.67 - 80.67 = 7.00$
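Steps 1 and 2 translate directly into NumPy. A minimal sketch using the toy values (the variable names are illustrative):

```python
import numpy as np

# Step 1: the two toy groups
group_a = np.array([85, 88, 90])
group_b = np.array([78, 80, 84])

# Step 2: observed difference of means (Group A minus Group B)
observed_diff = group_a.mean() - group_b.mean()
print(round(observed_diff, 2))  # 7.0
```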
Step 3: Pool the observations from both groups into one combined dataset:
- Combined = [85, 88, 90, 78, 80, 84]
Step 4: Run a large number of permutations. In each permutation, we shuffle the combined data, split it into two groups of the original sizes (3 and 3), and compute the difference of means. One such permutation looks like this:
- Shuffle the combined dataset randomly: [78, 85, 88, 90, 84, 80]
- Split it into two groups: Group 1 = [78, 85, 88], Group 2 = [90, 84, 80]
- Mean of Group 1 = $\frac{78+85+88}{3} = 83.67$
- Mean of Group 2 = $\frac{90+84+80}{3} = 84.67$
- Difference in means = $83.67 - 84.67 = -1.00$
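Steps 3 and 4 can be sketched as follows. This is an illustrative sketch, assuming NumPy's `Generator.permutation` for shuffling; the seed and the 10,000 repetitions are arbitrary choices, so the individual shuffles will not reproduce the specific split shown above.

```python
import numpy as np

group_a = np.array([85, 88, 90])
group_b = np.array([78, 80, 84])
n_a = len(group_a)

# Step 3: pool both groups
combined = np.concatenate([group_a, group_b])

# Step 4: shuffle, split into groups of the original sizes, record the difference
rng = np.random.default_rng(0)
n_permutations = 10_000
perm_diffs = np.empty(n_permutations)
for i in range(n_permutations):
    shuffled = rng.permutation(combined)  # returns a shuffled copy
    perm_diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

print(perm_diffs[:5].round(2))
```

With only six values there are just 20 distinct ways to split them, so many of the 10,000 shuffles repeat the same difference.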
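Step 5: Compute the p-value as the proportion of permutations whose difference in means is at least as large in absolute value as the observed difference of 7.00.
- Suppose, for illustration, that we ran three permutations (in practice we would run thousands) and obtained differences of -0.33, 1.00 and -1.00. Their absolute values are: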
- | -0.33 | = 0.33
- | 1.00 | = 1.00
- | -1.00 | = 1.00
p-value = (number of permutations whose absolute difference is greater than or equal to the absolute observed difference of 7.00) / (total number of permutations) = 0/3 = 0
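With only six observations we do not need random shuffles at all: there are just $\binom{6}{3} = 20$ ways to choose which three values form Group 1, so we can enumerate every split and compute the exact permutation p-value. The sketch below does this with `itertools.combinations`; note that the enumeration includes the observed split itself.

```python
import itertools
import numpy as np

group_a = [85, 88, 90]
group_b = [78, 80, 84]
combined = group_a + group_b
n_a = len(group_a)
observed = np.mean(group_a) - np.mean(group_b)

diffs = []
# Every way of choosing which positions form Group 1 (20 splits for 3 + 3 values)
for idx in itertools.combinations(range(len(combined)), n_a):
    g1 = [combined[i] for i in idx]
    g2 = [combined[i] for i in range(len(combined)) if i not in idx]
    diffs.append(np.mean(g1) - np.mean(g2))

# Small tolerance guards against floating-point rounding in the comparison
extreme = np.abs(diffs) >= abs(observed) - 1e-9
p_exact = extreme.mean()
print(len(diffs), p_exact)  # 20 0.1
```

Only the observed split and its mirror image reach an absolute difference of 7.00, so the exact p-value for this toy example is 2/20 = 0.10, which would not be significant at the 0.05 level. The value of 0 in Step 5 arises because none of the three sampled permutations happened to be one of those two splits.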
What-if analysis:
- Change in Sample Size: As the sample size grows, the permutation distribution of the difference in means becomes narrower. If a real difference exists, the observed difference then lies further in the tail, so the p-value tends to become smaller, indicating stronger evidence against the null hypothesis.
- Significance Level: Lowering the significance level (for example, from 0.05 to 0.01) means we need stronger evidence, i.e., a smaller p-value, to reject the null hypothesis; raising it means we need less.
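To illustrate the sample-size point, the sketch below is a small simulation under assumed settings (not taken from the article): two normal groups whose true means differ by 1, with standard deviation 2, at several sample sizes, each analysed with the same permutation test. The helper `perm_p_value` is hypothetical. Because the data are random, the exact p-values depend on the seed, but they tend to shrink as n grows.

```python
import numpy as np

def perm_p_value(x, y, rng, n_resamples=5_000):
    """Two-sided permutation p-value for mean(y) - mean(x) (illustrative helper)."""
    observed = y.mean() - x.mean()
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        if abs(shuffled[n_x:].mean() - shuffled[:n_x].mean()) >= abs(observed):
            count += 1
    return count / n_resamples


rng = np.random.default_rng(7)
results = {}
for n in (10, 50, 200):
    x = rng.normal(10, 2, n)
    y = rng.normal(11, 2, n)  # assumed true difference of 1
    results[n] = perm_p_value(x, y, rng)
    print(f"n = {n:>3}: p-value = {results[n]:.4f}")
```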
Use Cases for Permutation Tests & Their Advantages
1. Comparing Group Means in Medical Research: Medical data often violate assumptions such as normality or equal variance; a permutation test lets us compare group means without assuming normality.
2. Analyzing A/B Tests in Marketing: A/B tests often involve small sample sizes or non-normal distributions. In such cases, we can use a permutation test.
3. Sports Performance Analysis: Sports data are highly variable and often come in small samples. A permutation test does not rely on a normality assumption or on large-sample approximations, although with very few observations the smallest achievable p-value is limited by the number of possible group splits.
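Small samples do limit what an exact permutation test can show. With two equal-sized groups of $n$ values there are only $\binom{2n}{n}$ distinct splits, and the observed split and its mirror image always count as "at least as extreme", so the exact two-sided p-value for a difference in means can never fall below $2/\binom{2n}{n}$. The hypothetical helper below computes that floor with `math.comb` (Python 3.8+).

```python
from math import comb

def min_two_sided_p(n_per_group):
    """Smallest exact two-sided p-value for a difference in means with two
    equal-sized groups: the observed split and its mirror image always count."""
    n_splits = comb(2 * n_per_group, n_per_group)
    return 2 / n_splits


for n in (3, 4, 5, 10):
    print(n, comb(2 * n, n), round(min_two_sided_p(n), 6))
```

For three values per group the floor is 0.1, so no result from such a tiny experiment can reach the usual 0.05 threshold; with five per group the floor is already below 0.01.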
If you liked the explanation, follow me for more! Feel free to leave a comment if you have any questions or suggestions.
You can also check out my other articles on data science and computing on Medium. If you like my work and want to support my journey, you can always buy me a coffee :)
