Three Sigma Rule To Remove Outliers

Ulf Hamster 1 min.
python outlier three sigma rule

Load Packages

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Generate Data with Outliers

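# 10,000 draws from a normal distribution with mean 123 and standard deviation 45
feature = np.random.normal(123, 45, 10000)
# append a single extreme outlier
feature = np.append(feature, [1e9])
# histogram with 30 bins
plt.hist(feature, 30)
plt.show()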


Three Sigma Rule

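Treat $x_i$ as an outlier and remove it if it lies outside the range $\mu_X \pm n\sigma_X$, where $\mu_X$ is the mean and $\sigma_X$ the standard deviation of the feature, and $n = 3$ by default. In other words, keep $x_i$ only if $\mu_X - n\sigma_X < x_i < \mu_X + n\sigma_X$.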
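As a rough worked case, assume the population parameters used to generate the data above (mean 123, standard deviation 45) rather than the sample estimates that the code computes. The symbol $\mu_X$ for the mean is notation introduced here for illustration. With $n = 3$ the rule would keep:

```latex
\mu_X - n\sigma_X < x_i < \mu_X + n\sigma_X
\quad\Longrightarrow\quad
123 - 3 \cdot 45 < x_i < 123 + 3 \cdot 45
\quad\Longrightarrow\quad
-12 < x_i < 258
```

The bounds computed from an actual sample will differ, since the mean and standard deviation are then estimated from the data, outliers included.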

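def remove_threesigma_outliers(x, n=3.0):
    # keep only values strictly inside (mean - n*std, mean + n*std)
    x = np.asarray(x)
    m = np.mean(x)
    s = np.std(x)
    return x[(x > m - n*s) & (x < m + n*s)]

# apply the filter and plot the remaining values
filtered = remove_threesigma_outliers(feature)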
plt.hist(filtered, 30)
plt.show()

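The mean and standard deviation are computed from the data that still contains the outlier, so a single extreme value can widen the range considerably. The sketch below is one way to check this by printing the bounds with and without the appended value. The helper `threesigma_bounds` and the fixed seed are assumptions added for illustration and are not part of the original post.

```python
import numpy as np

def threesigma_bounds(x, n=3.0):
    """Hypothetical helper: return (mean - n*std, mean + n*std)."""
    x = np.asarray(x, dtype=float)
    m, s = np.mean(x), np.std(x)
    return m - n * s, m + n * s

# Fixed seed (an assumption) so the numbers are reproducible.
rng = np.random.default_rng(0)
clean = rng.normal(123, 45, 10000)
with_outlier = np.append(clean, [1e9])

lo_raw, hi_raw = threesigma_bounds(with_outlier)
lo_clean, hi_clean = threesigma_bounds(clean)

print(f"with outlier:    ({lo_raw:.3g}, {hi_raw:.3g})")
print(f"without outlier: ({lo_clean:.3g}, {hi_clean:.3g})")
```

In this setup the value 1e9 still falls outside the inflated range and would be removed, but the range is far too wide to flag any milder outliers. Applying the filter a second time to the already filtered data is one possible way to tighten it.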