Hypothesis Testing Series - An End to End Guide to Permutation Tests - Part 2

An End to End Guide to Permutation Tests | Hypothesis Testing Series #2

The permutation test is one of the most popular non-parametric hypothesis tests. In this article, we will go through the theory, a Python implementation, and practical use cases of the permutation test. If you are new to hypothesis testing, do check out the introductory article on this topic:

End to End Guide to Hypothesis Testing in Statistics: Concepts, Methods, and Examples


Introduction

The permutation test is a non-parametric hypothesis test. Because it is non-parametric, we do not need to assume any particular distribution for the underlying data. This makes the test useful in situations where normality or other t-test assumptions do not hold.

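In this test, we compare the statistic observed in the sample with the distribution of that statistic obtained by permuting (reshuffling) the sample data. Below is a quick example:

import itertools
import numpy as np

# Small sample so that all 6! = 720 orderings can be enumerated
sample = [1, 2, 3, 4, 5, 6]  # first three values: group A, last three: group B
permutations = list(itertools.permutations(sample))

# Calculate the difference in group means for each permutation
permutation_diffs = [np.mean(p[:3]) - np.mean(p[3:]) for p in permutations]

So we can compare the difference in means of the initial sample with the distribution of differences generated by the permutations. Note that the statistic has to depend on how the values are split into groups: the mean of the whole sample is the same for every ordering, so it would tell us nothing.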

Problem Statement

Let's now take a closer look at the problem statement. We are testing whether the means of group A and group B are significantly different.

Null Hypothesis: No difference between means of two groups

Alternative Hypothesis: Significant difference between means of two groups


Code & Explanation

import numpy as np

np.random.seed(42)

# Generate sample data
group_1 = np.random.normal(10, 2, 50)  # Mean = 10, Std = 2
group_2 = np.random.normal(12, 2, 50)  # Mean = 12, Std = 2

# Observed test statistic: difference of means
obs_stat = np.mean(group_2) - np.mean(group_1)

# Permutation test: shuffle the pooled data and re-split it 10,000 times
combined = np.concatenate([group_1, group_2])
n_1 = len(group_1)
permuted_stats = []
for _ in range(10000):
    np.random.shuffle(combined)
    # Same orientation as obs_stat: second part minus first part
    perm_stat = np.mean(combined[n_1:]) - np.mean(combined[:n_1])
    permuted_stats.append(perm_stat)

# Two-sided p-value: share of permuted statistics at least as extreme as the observed one
p_value = np.mean(np.abs(permuted_stats) >= np.abs(obs_stat))
print(p_value)
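The loop above can also be wrapped into a reusable function. The following is only a minimal sketch, not the article's original code: the function name `permutation_test_mean_diff`, its parameters, and the use of NumPy's `default_rng` generator are illustrative choices. It also applies an add-one adjustment to the p-value, a common convention (assumed here, not used in the code above) that counts the observed labelling as one of the permutations so the estimate is never exactly zero.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)

    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # shuffled copy; pooled is left untouched
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one adjustment (assumption): the observed split counts as one permutation
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Usage with data generated the same way as in the article
rng = np.random.default_rng(42)
group_1 = rng.normal(10, 2, 50)
group_2 = rng.normal(12, 2, 50)
obs, p = permutation_test_mean_diff(group_2, group_1)
print(f"observed difference = {obs:.3f}, p-value = {p:.4f}")
```

Passing a seed keeps the result reproducible without touching NumPy's global random state, and `rng.permutation` returns a copy, so the input arrays are never modified.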

Distribution of permutation test statistic:

Explanation:

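Step 1: In the code above, we start by generating two groups of 50 observations each, drawn from normal distributions with means 10 and 12 respectively (standard deviation 2 in both). To walk through the steps by hand, let's use a smaller example:

A = [85, 88, 90]
B = [78, 80, 84]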

Step 2: Calculate the difference between the means of the observed samples in the two groups.

Mean of Group A = (85 + 88 + 90) / 3 = 87.67
Mean of Group B = (78 + 80 + 84) / 3 = 80.67
Observed Difference = 87.67 − 80.67 = 7.00

Step 3: Now combine the observed samples from both groups into a single dataset.

Combined = [85, 88, 90, 78, 80, 84]

Step 4: Run a large number of permutations. In each permutation, we shuffle the combined data and split it into two parts, then compute the difference of means, as in the example below:

Shuffle the combined dataset randomly: [78, 85, 88, 90, 84, 80]
Split it into two groups: Group 1 = [78, 85, 88], Group 2 = [90, 84, 80]
Mean of Group 1 = (78 + 85 + 88) / 3 = 83.67
Mean of Group 2 = (90 + 84 + 80) / 3 = 84.67
Difference in means = 83.67 − 84.67 = −1.00

Step 5: We compute the p-value as the proportion of permutations whose absolute difference in means is greater than or equal to the observed absolute difference.

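Suppose three permutations gave differences of -0.33, 1.00 and -1.00. Calculate the absolute values of the permuted differences:
| -0.33 | = 0.33
| 1.00 | = 1.00
| -1.00 | = 1.00

None of these is greater than or equal to the observed difference of 7.00, so:

p-value = (number of permutations with absolute difference greater than or equal to the observed absolute difference) / (total number of permutations) = 0 / 3 = 0

Three permutations are used here only to keep the arithmetic short; in practice we run thousands, as in the code above.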
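For a dataset this small, we do not need random shuffles at all: there are only 20 ways to split six values into two groups of three, so every split can be enumerated. The sketch below does this for the toy groups A and B from Steps 2 to 5, using only the standard library. The variable names and the small floating-point tolerance are illustrative choices. It shows that the exact two-sided p-value is 0.1 rather than 0, because the observed split and its mirror image are the two most extreme of the 20.

```python
from itertools import combinations

group_a = [85, 88, 90]
group_b = [78, 80, 84]
combined = group_a + group_b
n_a = len(group_a)

# Observed difference in means (Step 2)
observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

# Enumerate every way of choosing which values form the first group
diffs = []
for idx in combinations(range(len(combined)), n_a):
    first = [combined[i] for i in idx]
    second = [combined[i] for i in range(len(combined)) if i not in idx]
    diffs.append(sum(first) / len(first) - sum(second) / len(second))

# Count splits at least as extreme as the observed one (tolerance guards float rounding)
n_extreme = sum(abs(d) >= abs(observed) - 1e-12 for d in diffs)
exact_p = n_extreme / len(diffs)
print(len(diffs), n_extreme, exact_p)  # 20 2 0.1
```

With larger groups the number of splits grows very quickly, which is why the random shuffling of Step 4 is used in practice.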

What if analysis:

Change in Sample Size: If the sample size increases, the permutation distribution becomes narrower, which affects the p-value. For example, if there is a real difference between the groups, a larger sample will typically give a smaller p-value, indicating stronger evidence against the null hypothesis.
Significance Level: Similarly, if we choose a smaller significance level, we need stronger evidence to reject the null hypothesis; with a larger significance level, weaker evidence is enough.
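Both points can be explored with a small simulation. This is a rough sketch under stated assumptions, not a definitive experiment: the helper `perm_p_value`, the true shift of 1 between the groups, the sample sizes and the significance levels are all hypothetical choices made for illustration. Each p-value comes from a single random draw, so individual results will vary from seed to seed.

```python
import numpy as np

def perm_p_value(a, b, rng, n_permutations=2000):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    # One row per permutation; each row is shuffled independently
    shuffled = rng.permuted(np.tile(pooled, (n_permutations, 1)), axis=1)
    diffs = shuffled[:, :a.size].mean(axis=1) - shuffled[:, a.size:].mean(axis=1)
    return float(np.mean(np.abs(diffs) >= observed))

rng = np.random.default_rng(0)
alpha_levels = (0.10, 0.05, 0.01)
results = {}
for n in (10, 50, 200):
    a = rng.normal(10, 2, n)
    b = rng.normal(11, 2, n)  # same true shift of 1 for every sample size
    p = perm_p_value(a, b, rng)
    results[n] = p
    # Whether the null would be rejected at each significance level
    decisions = {alpha: p < alpha for alpha in alpha_levels}
    print(n, round(p, 4), decisions)
```

Rerunning with different seeds gives a feel for how much the p-value fluctuates at small sample sizes, and how the reject / do not reject decision depends on the chosen significance level.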


Use Cases for Permutation Test & Its Advantage

1. Comparing Group Means in Medical Research: Medical data often violate assumptions like normality or equal variance. A permutation test can be used in such cases.

2. Analyzing A/B Testing in Marketing: While performing A/B testing, we often encounter small sample sizes or non-normal distributions. In such cases, we can use permutation tests.

3. Sports Performance Analysis: Since sports data are highly variable and may contain very few samples, the permutation test lets us test hypotheses without relying on large-sample or variance assumptions.

If you liked the explanation, follow me for more! Feel free to leave a comment if you have any questions or suggestions.

You can also check out my other articles on data science and computing on Medium. If you like my work and want to contribute to my journey, you can always buy me a coffee :)

