Parametric assumptions
Parametric tests share a common set of assumptions, or conditions, that need
to be met in order for the analysis to be considered reliable.
Parametric test assumptions
- Independence
- Population distributions are normal
- Samples have equal variances
It is best to check the assumptions in the order listed above, since some equal-variance tests are sensitive to departures from normality.
Independence
This assumption is checked during the setup of the study. It means that
each observation is independent of the others; if there are 2 or more groups
being compared, it refers to the fact that the groups are mutually
exclusive, i.e. each individual belongs to only 1 group, and that the
data is not repeated over time.
If data is repeated measures, longitudinal/panel or time series, there
are appropriate methods to account for the repeated data points from
individuals. This is typically handled in the math of the statistical
test that is designed to analyze this type of data.
Normality
The normality assumption is applied differently depending on the statistical method being used. For example, it applies to the shape of the sampling distribution for the dependent variable (outcome variable) if it's a univariate test, to the difference scores if it's a mean comparison (ex. independent sample t-test or paired t-test), or to the residuals if it's a regression framework. This assumption can be checked with a formal test or graphically, and there are a few formal tests to choose from. Before covering those, the Central Limit Theorem needs to be discussed. Ott and Longnecker (2010, p. 188) provide this overview of the Central Limit Theorem:
"Central Limit Theorem for $\bar{y}$
Let $\bar{y}$ denote the sample mean computed from a random sample of n measurements from a population having mean $\mu$, and finite standard deviation $\sigma$. Let $\mu_\bar{y}$ and $\sigma_\bar{y}$ denote the mean and standard deviation of the sampling distribution of $\bar{y}$, respectively. Based on repeated samples of size n from the population, we can conclude the following:
- $\mu_\bar{y} = \mu$
- $\sigma_\bar{y} = \sigma / \sqrt{n}$
- When n is large, the sampling distribution of $\bar{y}$ will be approximately normal (with the approximation becoming more precise as n increases)
- When the population distribution is normal, the sampling distribution of $\bar{y}$ is exactly normal for any sample size n
No specific shape was required for these measurements for the Central Limit Theorem to be validated."
Great, what does it mean though? It means that if the sample size is large enough then normality may not be a concern even if the test for normality indicates that normality is not present. Simulation studies have been conducted and the general rule of thumb is n ≥ 30. However, if it's clear that there is heavy skewness or kurtosis, one might consider transforming the data and/or using a non-parametric statistical test. Now let's test the normality assumption.
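To make this concrete, here is a small simulation sketch (the exponential population, sample sizes, and seed below are illustrative choices of mine, not from the text) showing that means of samples of size 30 from a heavily skewed population behave far more like a normal distribution than the raw data does:

```python
# Illustrative CLT simulation: sample means of a skewed population with n = 30
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)

# Exponential population: mean = 2, strongly right-skewed
pop_mean = 2.0

# Draw 5,000 samples of size n = 30 and record each sample mean
sample_means = rng.exponential(scale=pop_mean, size=(5000, 30)).mean(axis=1)

# The sampling distribution of the mean centers on the population mean
print(round(sample_means.mean(), 2))

# The normality test statistic for the sample means is far smaller (closer
# to normal) than for the raw skewed draws themselves
stat_raw, p_raw = stats.normaltest(rng.exponential(scale=pop_mean, size=5000))
stat_means, p_means = stats.normaltest(sample_means)
print(stat_raw, stat_means)
```

Note that with 5,000 simulated means even tiny departures can still reach significance, which is exactly the power issue discussed below for large samples.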
Normality tests
There are many to choose from.
These include the Shapiro-Wilk test, Kolmogorov-Smirnov (K-S) test, Lilliefors
corrected K-S test, Anderson-Darling test,
D'Agostino skewness test, D'Agostino-Pearson omnibus test, and the
Jarque-Bera test. This section will not cover all of these, but will
demonstrate common normality tests. Additionally, normality tests
should be accompanied by a visual check as well, because as the sample size
increases, all statistical tests gain power to detect smaller and smaller departures from normality.
Yap and Sim (2011)
is a nice peer-reviewed article (and it's free) that reviews various
normality tests.
Shapiro-Wilk (SW) test
The null hypothesis is that the data is normal. The Shapiro-Wilk test is an omnibus test (D'Agostino, 1971). It evaluates normality by comparing the data's ordered values to the corresponding expected values of the hypothesized normal distribution (Shapiro & Wilk, 1965). This can be completed in Python by using the shapiro() method from scipy.stats.
```python
# Coming from independent t-test framework
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

sampling_difference = df['bp_after'][df['sex'] == 'Male'].values - \
                      df['bp_after'][df['sex'] == 'Female'].values

stats.shapiro(sampling_difference)
```
```python
# From regression or ANOVA framework
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

model = smf.ols("bp_after ~ C(sex) + C(agegrp)", data=df).fit()

stats.shapiro(model.resid)
```
Unfortunately it's not labelled, but the output is (W test statistic, p-value).
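As a sketch of how to read that output (the synthetic data and seed below are my own illustration, not part of the original example): a large p-value fails to reject the null of normality, while clearly skewed data produces a very small one:

```python
# Hypothetical illustration of interpreting shapiro() output
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=100, scale=15, size=50)   # drawn from a normal
skewed_data = rng.exponential(scale=15, size=50)       # strongly right-skewed

w_n, p_n = stats.shapiro(normal_data)   # p typically large: fail to reject H0
w_s, p_s = stats.shapiro(skewed_data)   # p very small: reject H0

print(f"normal data: W = {w_n:.4f}, p = {p_n:.4f}")
print(f"skewed data: W = {w_s:.4f}, p = {p_s:.4f}")
```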
D'Agostino-Pearson K2 Test
The null hypothesis is that the data is normal. The D'Agostino-Pearson test is an omnibus test and evaluates normality by assessing the skewness and kurtosis (D'Agostino, 1971). This can be completed using the normaltest() method from scipy.stats.
```python
# Coming from independent t-test framework
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

sampling_difference = df['bp_after'][df['sex'] == 'Male'].values - \
                      df['bp_after'][df['sex'] == 'Female'].values

stats.normaltest(sampling_difference)
```
```python
# From regression or ANOVA framework
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

model = smf.ols("bp_after ~ C(sex) + C(agegrp)", data=df).fit()

stats.normaltest(model.resid)
```
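Because the K2 statistic combines the skewness and kurtosis z-scores, the decomposition can be verified with scipy's own building blocks; the data below is synthetic and only for illustration:

```python
# K2 = z_skew^2 + z_kurt^2, chi-square with 2 df under the null of normality
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
x = rng.normal(size=100)

z_skew, _ = stats.skewtest(x)       # z-score for sample skewness
z_kurt, _ = stats.kurtosistest(x)   # z-score for sample kurtosis
k2, p = stats.normaltest(x)         # omnibus statistic and p-value

print(np.isclose(k2, z_skew**2 + z_kurt**2))
```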
Kolmogorov-Smirnov (KS) test
The null hypothesis is that the data is normal (i.e. matches the comparison distribution). The Kolmogorov-Smirnov
test is a distance test (D'Agostino, 1971). It evaluates normality by comparing the data's empirical distribution function to
the expected cumulative distribution function of the comparison distribution (Öztuna, Elhan, & Tüccar, 2006).
If testing the difference between two groups, then this can be completed
using the ks_2samp()
method from scipy.stats.
```python
# Coming from independent t-test framework
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

stats.ks_2samp(df['bp_after'][df['sex'] == 'Male'],
               df['bp_after'][df['sex'] == 'Female'])
```
If coming from a regression or ANOVA approach, then the one-sample version of the test can be run on the residuals using the kstest() method from scipy.stats.
```python
# From regression or ANOVA framework
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

model = smf.ols("bp_after ~ C(sex) + C(agegrp)", data=df).fit()

stats.kstest(model.resid, 'norm')
```
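One caveat worth flagging (this note is mine, not from the text): kstest with 'norm' compares against the standard normal N(0, 1), so residuals on a different scale can be rejected for their spread alone. Standardizing first avoids that, although estimating the mean and standard deviation from the same data makes the classic K-S p-value approximate (the Lilliefors-corrected K-S test mentioned above addresses this). A minimal sketch, using simulated residuals as a stand-in for model.resid:

```python
# Sketch: why unstandardized residuals can fail kstest(..., 'norm')
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)
resid = rng.normal(loc=0, scale=12, size=200)  # stand-in for model.resid

# Raw residuals vs N(0, 1): the distance is driven largely by scale
stat_raw, p_raw = stats.kstest(resid, 'norm')

# Standardized residuals vs N(0, 1)
standardized = (resid - resid.mean()) / resid.std(ddof=1)
stat_std, p_std = stats.kstest(standardized, 'norm')

print(stat_raw, stat_std)
```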
Anderson-Darling
The null hypothesis is that the data is normal (matches compared distribution).
The Anderson-Darling test assesses normality by comparing the data's empirical
distribution function to a specified distribution's cumulative distribution
function (Stephens, 1974). This can be completed using the
anderson()
method from scipy.stats.
A note about this method: it returns 3 parts: the test statistic, an
array of significance levels, and an array of corresponding critical values.
The test statistic needs to be compared to the critical value for the
significance level of interest, which is most commonly set to 0.05.
```python
# Coming from mean comparison framework (independent sample t-test)
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

sampling_difference = df['bp_after'][df['sex'] == 'Male'].values - \
                      df['bp_after'][df['sex'] == 'Female'].values

result = stats.anderson(sampling_difference)

print(f"Test statistic: {result.statistic:.4f}")

for i in range(len(result.critical_values)):
    sig, crit = result.significance_level[i], result.critical_values[i]
    if result.statistic < crit:
        print(f"At {sig}% significance, {result.statistic:.4f} < {crit:.4f}: "
              f"data looks normal (fail to reject H0)")
    else:
        print(f"At {sig}% significance, {result.statistic:.4f} > {crit:.4f}: "
              f"data does not look normal (reject H0)")
```
```python
# From regression or ANOVA framework
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

model = smf.ols("bp_after ~ C(sex) + C(agegrp)", data=df).fit()

result = stats.anderson(model.resid)

print(f"Test statistic: {result.statistic:.4f}")

for i in range(len(result.critical_values)):
    sig, crit = result.significance_level[i], result.critical_values[i]
    if result.statistic < crit:
        print(f"At {sig}% significance, {result.statistic:.4f} < {crit:.4f}: "
              f"data looks normal (fail to reject H0)")
    else:
        print(f"At {sig}% significance, {result.statistic:.4f} > {crit:.4f}: "
              f"data does not look normal (reject H0)")
```
Graphical Methods
There are a few different plots that can be used to check the assumption of normality. These include the probability plot, the P-P plot, the Q-Q plot, and the histogram.
Probability plot
The probability plot plots the ordered data against the theoretical distribution.
The better the ordered data fits the theoretical distribution, the less deviation
there will be from the fit line that is in the middle of the graph.
The theoretical distribution can be any distribution, but there is a special case
if the theoretical distribution in question is the Gaussian (normal) distribution.
It's special in the sense that it has been studied so much that there is a
correlation coefficient that can be used to test normality alongside
the graph. Filliben (1975) developed this test and provided
critical values based on the sample size and $\alpha$ level. The table
can be found online within the NIST/SEMATECH e-Handbook of Statistical Methods.
This can be conducted using probplot() from scipy.stats.
```python
# Coming from mean comparison framework (independent sample t-test)
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

sampling_difference = df['bp_after'][df['sex'] == 'Male'].values - \
                      df['bp_after'][df['sex'] == 'Female'].values

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)

normality_plot, stat = stats.probplot(sampling_difference, plot=plt, rvalue=True)
ax.set_title("Probability plot of sampling difference \n with R value")

plt.show()
```
```python
# From regression or ANOVA framework
import pandas as pd
import scipy.stats as stats
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

model = smf.ols("bp_after ~ C(sex) + C(agegrp)", data=df).fit()

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)

normality_plot, stat = stats.probplot(model.resid, plot=plt, rvalue=True)
ax.set_title("Probability plot of regression residuals \n with R value")

plt.show()
```
The next step would be to reference the table and compare the calculated r-value to the table r-value. If the calculated r-value is less than the table r-value, reject the null hypothesis of normality.
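If a plot is not needed, the r value can also be captured programmatically from probplot() and compared against Filliben's table by hand. The data and seed below are my own illustration; the actual critical value still has to be looked up in the NIST table for the sample size and $\alpha$ in question:

```python
# Extracting the probability plot correlation coefficient without drawing a plot
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)
data = rng.normal(loc=50, scale=5, size=40)

# probplot returns (ordered theoretical quantiles, ordered data) plus the
# least-squares fit: (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(data)

print(round(r, 4))
# Compare r to the table critical value for n = 40 at the chosen alpha;
# reject normality if r falls below the critical value.
```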
What if Normality is Violated?
If the assumption of normality is violated fear not! There are a few routes to consider, these are in no particular order:
- Rely on the Central Limit Theorem if the sample size is large enough ($\ge 30$)
- Use a non-parametric statistical test
- Transform the data
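As a sketch of the non-parametric route (the synthetic skewed data below is my own example, not from the text), the Mann-Whitney U test from scipy.stats is the usual counterpart to the independent sample t-test:

```python
# Non-parametric alternative to the independent sample t-test
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(5)
group_a = rng.exponential(scale=1.0, size=100)  # skewed, lower center
group_b = rng.exponential(scale=3.0, size=100)  # skewed, higher center

# Rank-based test: no normality assumption on the groups
u, p = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print(u, p)
```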
Homogeneity of Variances
As with testing the assumption of normality, there are a few statistical tests
available to test the assumption of equal variances. Some common methods are
Bartlett's test and Levene's test for equality of variances.
Choosing the correct test also depends on
the assumption of normality. For example, Bartlett's test has been found to
be sensitive to departures from normality whereas Levene's test is less
sensitive to this (Conover, 1981).
It doesn't matter if one is coming from a 2 sample approach or a regression/ANOVA
approach, the test steps are the same. One needs to test for equality of
variance for each categorical variable on the outcome.
Equality of Variance Tests
Bartlett's test
Bartlett's test is used to test if the groups, which can be referred to
as k, have equal variances. Bartlett's test can
test for equality between 2 or more groups.
This can be done with the bartlett()
method from scipy.stats.
```python
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

stats.bartlett(df['bp_after'][df['sex'] == 'Male'],
               df['bp_after'][df['sex'] == 'Female'])
```
Levene's test
Levene's test tests if the different groups have equal variances (Levene, 1960).
Levene's test is less sensitive than Bartlett's test to departures from normality
while maintaining good power (Conover, 1981). Levene's test of homogeneity of variances can
test for equality between 2 or more groups. The original suggestion was
to center using the mean; Brown and Forsythe (1974) provided an extension so
the test can also be calculated using the median or the trimmed mean, which
have been found to be robust under non-normality.
This can be done with the levene()
method from scipy.stats.
```python
import pandas as pd
import scipy.stats as stats

df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")

# Using the mean
stats.levene(df['bp_after'][df['sex'] == 'Male'],
             df['bp_after'][df['sex'] == 'Female'])
```

```python
# Using the median
stats.levene(df['bp_after'][df['sex'] == 'Male'],
             df['bp_after'][df['sex'] == 'Female'],
             center="median")
```

```python
# Using the trimmed mean (default is to cut 10% total - 5% from each tail)
stats.levene(df['bp_after'][df['sex'] == 'Male'],
             df['bp_after'][df['sex'] == 'Female'],
             center="trimmed")
```
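The Conover (1981) point about sensitivity can be seen in a quick simulation (the distributions, sample sizes, and seed below are my own illustrative choices): with heavy-tailed but equal-variance data, Bartlett's test falsely rejects far more often than the median-centered Levene's test:

```python
# Bartlett vs median-centered Levene under heavy tails with equal variances
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(11)
reps, bart_rej, lev_rej = 500, 0, 0

for _ in range(reps):
    # Two t(3) samples: heavy tails, identical variances, so any rejection
    # at alpha = 0.05 is a false positive
    a = rng.standard_t(df=3, size=30)
    b = rng.standard_t(df=3, size=30)
    if stats.bartlett(a, b).pvalue < 0.05:
        bart_rej += 1
    if stats.levene(a, b, center="median").pvalue < 0.05:
        lev_rej += 1

print(bart_rej, lev_rej)
```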
What if homogeneity of variances is violated?
If there is not equal variances between groups there are a few routes to consider, these are in no particular order:
- Use a non-parametric statistical test
- Transform the data
Data Transformation
If the assumption(s) of normality or homogeneity of variances are not met, then a route to consider is transforming the data. If this route is taken to try to correct one or both of the assumptions, don't forget to back-transform the data for reporting and interpretation. Different transformations affect normality and variances differently, so a few transformations should be tested. There are, of course, common transformations that are used.
Transformation | Effect | Back Transformation |
---|---|---|
Log (numpy.log()) | Taking the logarithm ($\log(x)$) of a value affects larger values more than smaller values. This is useful if the distribution has a positive skew because it brings larger values closer to the center. However, one cannot take the log of a negative number or of 0. If the data has negative numbers or 0s then a constant needs to be added to all values - don't forget to subtract the constant when back-transforming. | numpy.exp() |
Square-root (numpy.sqrt()) | Taking the square-root ($\sqrt{x}$) of a value affects larger values more than smaller values. This is useful if the distribution has a positive skew because it brings larger values closer to the center. However, one cannot take the square-root of negative numbers. If the data has negative numbers then a constant needs to be added to all values - don't forget to subtract the constant when back-transforming. | numpy.square() |
Reciprocal (numpy.reciprocal()) | Taking the reciprocal ($\frac{1}{x}$) of a value affects larger values more than smaller values and flips the scale in the process by making larger values closer to 0, ex. 1/1 = 1 and 1/10 = 0.1. This transformation is useful if the distribution has a positive skew. | The reciprocal is its own inverse, so applying it again returns the original scale, ex. variable = [x**-1 for x in variable_reciprocal_transformed] for a series or array, or x**-1 for a single value. |
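As a sketch of the log row of the table above (the data values and the constant are made up for illustration), including the constant offset needed when 0s are present:

```python
# Log transform with a constant for zeros, then back-transform
import numpy as np

data = np.array([0.0, 1.5, 3.0, 12.0, 45.0])  # positive skew, contains a 0
c = 1.0                                        # constant added before logging

transformed = np.log(data + c)

# ...analysis would happen on the transformed scale...

# Back-transform: exponentiate, then subtract the constant
back = np.exp(transformed) - c
print(np.allclose(back, data))  # True
```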
References
- Brown, M. B., and Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367
- Conover, W. J., Johnson, M. E., and Johnson, M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23(4), 351-361.
- D'Agostino, R. (1971). An omnibus test of normality for moderate and large size samples. Biometrika, 58(2), 341
- Filliben, J. J. (1975). The probability plot correlation coefficient test for normality. Technometrics, 17(1), 111-117
- Levene, H. (1960). Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. I. Olkin (Ed.). Stanford, CA: Stanford University Press.
- Rosner, B. (2015). Fundamentals of Biostatistics (8th). Boston, MA: Cengage Learning.
- Ott, R. L., and Longnecker, M. (2010). An introduction to statistical methods and data analysis. Belmont, CA: Brooks/Cole.
- Öztuna D., Elhan A., and Tüccar (2006). Investigation of four different normality tests in terms of type 1 error rate and power under different distributions. Turkish Journal of Medical Sciences, 36(3), 171–6.
- Shapiro, S. S., and Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591-611
- Stephens, M. A. (1974). EDF Statistics for goodness of fit and some comparisons. Journal of the American Statistical Association, 69(347), 730-737
- Yap, B. W., and Sim (2011). Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation, 81(12), 2141-2155. doi:10.1080/00949655.2010.520163