Mixed Effect Regression
If you are looking for how to run code jump to the
next section or if you would like some
theory/refresher then start with this section.
What is mixed effects regression? Mixed effects regression is an
extension of the general linear model (GLM) that takes into account
the hierarchical structure of the data. Mixed effect models are also
known as multilevel models, hierarchical models, mixed models
(or specifically linear mixed models (LMM)) and are appropriate
for many types of data such as clustered data, repeated-measures data,
longitudinal data, as well as combinations of those three. Since mixed
effects regression is an extension of the general linear model, let's recall
that the general linear regression equation looks like:
$$
y = \underbrace{X\beta}_\textrm{Fixed effects} + \underbrace{\epsilon}_\textrm{error term}
$$
Where,
- X is a N x p design matrix that the observed values for each individual (N) for each independent variable in the model (p)
- $\beta$ is a p x 1 column vector which contains the regression coefficients for each independent variable in the model (p)
- $\epsilon$ is a N x 1 column vector which contains the errors (residuals) of the model
The mixed effects model is an extension and models the random effects of a clustering variable. Mixed models can model variation around the intercept (random intercept model), around the slope (random slope model), and around the slope (random intercept and slope model). The linear mixed model is: $$ y = \underbrace{X\beta}_\textrm{Fixed effects} + \underbrace{Zu}_\textrm{Random effects} + \underbrace{\epsilon}_\textrm{error term} $$ Where,
- Z is a N x q design matrix that contains the observed values for each individual (N) for each covariate (q) of the random effects
- u is a q x 1 vector that contains the random effects of the q covariates in matrix Z
- $\gamma$ represents the fixed effects plus random effects
- the subscripts indicate a value for ith observation of the jth grouping level of the random effect
Before unpacking the different types of mixed effect models, understanding some terminology will be beneficial.
Terminology and Concepts
Hierarchical Structure
When talking about the structure of the data in mixed effects models, the hierarchical organization of it's components are often called "levels" or "clusters". With each higher level being another grouping/clustering variable, for example:
- Level 1: is the lowest level of the hierarchy which is the unit of analysis, i.e. person, company, medications, etc. In the mixed effects model equation above, this is the "i" in the equation.
- Level 2: is the next lowest level of the data hierarchy where all units of anlaysis from level 1 are clustered into groups. In the mixed effects model equation above, this is the "j" in the equation.
- Level 3: is the next lowest level of the data hierarchy where the clusters of level 2 are clustered more. This high of a level is not modeled in the LMM equation above.
- Level N: is the next lowest level of the data hierarchy where the clusters of level N - 1 are clustered more. This high of a level is not modeled in the mixed effects model equation above.
Level 2 and higher are the random effects that are being modeled.
To help solidify this concept, imagine a study on student outcome measured by a standardized test. No curriculum nor guidlines were provided to the school districts on how to prepare for this test. The data for this study is a random sample collected from different schools which are in different school districts within the state of Michigan; each row in the data set represents a single student. In this example, level 1 would be the student, level 2 would be the school, and level 3 would be the school district. This would be a good model because it will model the variation that is present within each school and within each district. Modeling this variation will allow one to explore the school as it's own population and the district as it's own population.
Let's visualize the hierarchical structure. To keep the visualization simple, only one district will be displayed.
Fixed and Random Factors
West, Welch, and Gatecki (2015, p.9) provide a good definition of fixed-effects and random-effects "Fixed-effect parameters describe the relationships of the covariates to the dependent variable for an entire population, random effects are specific to clusters of subjects within a population." When West, Welch, and Gatecki (2015) talk about "relationships of the covariates to the dependent variable" the covariates are the independent variables in the model.
- Fixed factors are the independent variables that are of interest to the study, e.g. treatment category, sex or gender, categorical variable, etc.
- Random factors are the classification variables that the unit of anlaysis is grouped under, i.e. these are typically the variables that define level 2, level 3, or level n. These factors can also be thought of as a population where the unit measures were randomly sampled from, i.e. (using the school example from above) the school (level 2) had a random sample of students (level 1) scores selected to be used in the study.
Gelman & Hill (2007) make the case that the classification of a variable as a fixed effect or random effect will vary depending on the objective of the study and the analysis. How a variable is modeled changes what is being measured and what the model is assuming about the variable itself. Using the school study example, if one were to model the categorical school variable as a fixed effect that model would assume the group means (one from each school) are independent from each other; if the categorical school variable is modeled as a random effect (random intercept only) the model would assume that the schools measured are a sample of a larger population of schools.
Difference Between Mixed Effect Model Types
Earlier it was mentioned how mixed-effect models could be a random intercept model, random slope model, or a random intercept and random slope model. With the differences being the random intercept model allows different intercepts based on the clustering variable (graph A of the figure below) while the random slope model allows different slopes based on a variable, and while the random intercepts and random slopes model (graph B of the figure below) allows different intercepts based on a clustering variable and slopes based on a variable. These differences between the random intercept model and the random intercept and random slope model are visualized in the figure below; the figure is a slightly modified version from Harrison et. al (2018). For a visualization of all three models, check out this dynamic graphic.
Fixed and Random Factor Interpretations
The interpretation of the fixed factors are different from the interpretation of the random factors. Fixed effects are interpreted as one typiclly would and carry the assumption that the means are independent and they share the residual variance; while the random effects, the clustering classification variable (level 2, 3, n), assumes the variable means are a sample of a larger population that has it's own mean and variance (Harrison et. al, 2018). To expand this a bit more, the coefficient for a fixed effect models the unit change that occurs in the dependent variable with a 1 unit increase in the independent variable; the coefficient for a random intercept models the deviation from the intercept. The deviation can be displayed as variance or standard deviation, it all depends on the software being used. To demonstrate this, take a look at the pseduo-results below.
Parameter | Estimate | Std. Error | t | p-value | 95% Conf. Int. | |
---|---|---|---|---|---|---|
Intercept | 10.02 | 1.15 | 12.67 | 0.0004 | 10.80 | 16.28 |
Sex | ||||||
Male | -4.3 | 1.23 | -4.65 | 0.0498 | -8.80 | -1.31 |
Fixed Effect Table |
The fixed effects interpretation are the same as from a typical regression model. Using the completly fake results table, males scored 4.3 points lower on the standardized test than females.
Parameter | Estimate | Std. Error | 95% Conf. Int. | |||
---|---|---|---|---|---|---|
Intercept (Subject=School) Variance | 3.24 | 0.45 | 2.24 | 4.24 | ||
Intercept (Subject=District) Variance | 25.43 | 0.81 | 20.21 | 30.64 | ||
Random Effects Table |
The fake random effects result table would indicate that the intercept variance is estimated to be 3.24 for school which means the standard deviation is 1.8. Thus, one can say that for any given school they will have their own intercept of ± 1.8 of the constant (10.02) 68% of the time and ± 3.6 95% of the time.
Excellent, the information of the random effects tells the researcher the varability around the intercept based on the different levels of the cluster variable. When predicting results, or conducting post-hoc anlayses, the estimates will use this information by incorporating the variance from the levels specified (different intercept for every sub-level of specified random effect level) to provide better estimates for the unit of analysis, e.g. school A, B, C, ..., Z will have their own intercept used for each student that came from that school.
For a good opensource peer-reviewed article that provides an overview of mixed effect models, see Harrison et al. (2018).
Mixed-effect regression test assumptions
- Independence of errors
- Equal variance of errors
- Normality of errors
Maximum likelihood estimation (ML) and restricted maximum likelihood (REML) are commonly used to estimate the mixed-effect model in conjuction with an optimization algorithm.
Testing Hypothesis for multiple fixed-effects (could use an F-test or the omnibus Wald test)
- $H_0 : L\beta = 0$
- $H_A : L\beta \neq 0$
Testing Hypothesis for single fixed-effects
- $H_0 : \beta = 0$
- $H_A : \beta \neq 0$
Likelihood Ratio Tests for covariance parameters (REML estimation should be used)
Assuming that both the nested and reference models have the same fixed-effects
but different covariance parameters.
Compute the positive difference between the reference and nested models'
-2 REML log-likelihood then look up the p-value based on the appropriate
$\chi^2$ distribution.
- $H_0 : s^2_{\text{random effect}} = 0$
- $H_A : s^2_{\text{random effect}} > 0$
One rejects the null hypothesis if the calculated testing statistic is less than (<) the critical value at the specified significant p-value.
Mixed Effects Model Linear Regression with Python
Don't forget to check the assumptions before interpreting the results! First to load the libraries and data needed. Below, pandas, researchpy, statsmodels, scipy.stats, and the data set will be loaded.
The data used in this example is from Jose Pinheiro and Doug Bates, and is title "Rat Pup" from their book Mixed-Effects Models in S and S-Plus (2000). The objective of the study was to compare the birth weights of rat pups from different litters. In these litters, some female mothers received an experimental treatment with groups of "high dose", "low dose", and "control". The original design had 10 female rats assigned to each treatment condition but unfortunately 3 female rats died in the "high dose" treatment so the study design is unbalanced.
In this study, the litter will be the clustering variable (level-2) and the rat pup will be unit of analysis (level-1).
import pandas as pd
import researchpy as rp
import statsmodels.api as sm
import scipy.stats as stats
df = pd.read_csv("http://www-personal.umich.edu/~bwest/rat_pup.dat", sep = "\t")
df.info()
Let's look at the variables in the data set.
Let's get a better understanding of the variables.
rp.codebook(df)
Now to take a look at the weight of the rat pups based on their sex and the treatment group.
rp.summary_cont(df.groupby(["treatment", "sex"])["weight"])
N | Mean | SD | SE | 95% Conf. | Interval | ||
---|---|---|---|---|---|---|---|
treatment | sex | ||||||
Control | Female | 54 | 6.116111 | 0.685118 | 0.093233 | 5.931659 | 6.300563 |
Male | 77 | 6.471039 | 0.753788 | 0.085902 | 6.301567 | 6.640511 | |
High | Female | 32 | 5.851562 | 0.600189 | 0.106099 | 5.640280 | 6.062845 |
Male | 33 | 5.918485 | 0.690906 | 0.120271 | 5.679098 | 6.157871 | |
Low | Female | 65 | 5.837538 | 0.450496 | 0.055877 | 5.727167 | 5.947910 |
Male | 61 | 6.025082 | 0.380340 | 0.048698 | 5.928843 | 6.121321 |
It appears that the treatment has a negative effect on weight. The mean birth weight of both females and males in both the "High" and "Low" dose groups are lower compared to the control. One thing that may not be represented here is the effect of the litter (if there is one).
Let's visualize the distribution of weight by treatment group and sex. For this, the built-in Pandas boxplot method will be used. Seaborn and Matplotlib are other great graphing libraries for Python. Both Pandas and Seaborn use Matplotlib as their base package so extra features or methods from Matplotlib may be able to be passed to Pandas/Seaborn dependent on the method.
boxplot = df.boxplot(["weight"], by = ["treatment", "sex"],
figsize = (16, 9),
showmeans = True,
notch = True)
boxplot.set_xlabel("Categories")
boxplot.set_ylabel("Weight")
boxplot.figure.savefig("...\\boxplot.png")
In the event that one is not experienced with reading boxplots, a quick note. The line that goes across the box is the median value and the triangle is the mean value. The notched boxes in this case show the confidence interval of the median. Any symbol that is above/below the whisker of the box is being indicated as a potential outlier.
Looking at the boxplot, the distribution and variances of the birth weights look fairly similar between sex within each treatment group, but the variance looks different across the treatments. The boxplot also indicates that there may outliers present.
Linear Mixed-Effects Regression using StatsModels
NOTE
StatsModels formula api uses Patsy to handle passing the formulas. The pseudo code looks like the following:
smf.mixedlm("dependent_variable ~ independent_variable1 + independent_variable2 + independent_variablen",
data = df).fit()
To tell the model that a variable is categorical, it needs to be wrapped in C(independent_variable). The pseudo code with a categorical independent variable looks like:
smf.mixedlm("dependent_variable ~ independent_variable1 + C(independent_variable2)",
data = df).fit()
By default, Patsy chooses the first categorical variable as the reference category; it's possible to change the reference category if desired. In order to do this, one needs to specify the reference category while one is specifying the variable is a categorical variable. Pseduo code is as follows:
smf.mixedlm("dependent_variable ~ independent_variable1 + C(independent_variable2, Treatment(categorical_group))",
data = df).fit()
Where categorical_group is the desired reference group.
First, one needs to import the package; the official documentation for this method of the package can be found here.
import statsmodels.formula.api as smf
Random Intercept Model
Now to fit a random intercept model, recall that this type of model allows for different clusters (a group) to have different intercepts.
model = smf.mixedlm("weight ~ litsize + C(treatment) + C(sex, Treatment('Male')) + C(treatment):C(sex, Treatment('Male'))",
df,
groups= "litter").fit()
model.summary()
Model: | MixedLM | Dependent Variable: | weight |
No. Observations: | 322 | Method: | REML |
No. Groups: | 27 | Scale: | 0.1635 |
Min. group size: | 2 | Likelihood: | -200.5522 |
Max. group size: | 18 | Converged: | Yes |
Mean group size: | 11.9 |
Coef. | Std.Err. | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 8.323 | 0.273 | 30.447 | 0.000 | 7.788 | 8.859 |
C(treatment)[T.High] | -0.906 | 0.192 | -4.727 | 0.000 | -1.282 | -0.530 |
C(treatment)[T.Low] | -0.467 | 0.158 | -2.952 | 0.003 | -0.777 | -0.157 |
C(sex, Treatment('Male'))[T.Female] | -0.412 | 0.073 | -5.625 | 0.000 | -0.555 | -0.268 |
C(treatment)[T.High]:C(sex, Treatment('Male'))[T.Female] | 0.107 | 0.132 | 0.811 | 0.417 | -0.151 | 0.366 |
C(treatment)[T.Low]:C(sex, Treatment('Male'))[T.Female] | 0.084 | 0.106 | 0.794 | 0.427 | -0.123 | 0.291 |
litsize | -0.128 | 0.019 | -6.845 | 0.000 | -0.165 | -0.092 |
litter Var | 0.097 | 0.084 |
The "litter Var" is the random effects of the cluster variable. This models the variation that is present between the litters. One can convert variance to standard deviation by taking the square root, this means that on average the litter weight can vary about 0.311 lbs ($\sqrt{0.097} = 0.311$).
The intercation between treatment and sex is non-significant, therefore this will be removed from the model and re-ran.
# Random Intercept Model w/out Interaction Term
model = smf.mixedlm("weight ~ litsize + C(treatment) + C(sex, Treatment('Male'))", df, groups= "litter").fit()
model.summary()
Model: | MixedLM | Dependent Variable: | weight |
No. Observations: | 322 | Method: | REML |
No. Groups: | 27 | Scale: | 0.1628 |
Min. group size: | 2 | Likelihood: | -198.4997 |
Max. group size: | 18 | Converged: | Yes |
Mean group size: | 11.9 |
Coef. | Std.Err. | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 8.310 | 0.274 | 30.355 | 0.000 | 7.773 | 8.846 |
C(treatment)[T.High] | -0.859 | 0.182 | -4.722 | 0.000 | -1.215 | -0.502 |
C(treatment)[T.Low] | -0.429 | 0.150 | -2.849 | 0.004 | -0.723 | -0.134 |
C(sex, Treatment('Male'))[T.Female] | -0.359 | 0.048 | -7.540 | 0.000 | -0.452 | -0.266 |
litsize | -0.129 | 0.019 | -6.863 | 0.000 | -0.166 | -0.092 |
litter Var | 0.097 | 0.085 |
Another useful statistic to generate is the intraclass correlation coefficient (ICC) which measures the similarity of the responses within a random effect; in the current model with 1 cluster variable, this would be the similarity in weights within the litter. The ICC can range between 0 - 1 where 1 indicates perfect relationship within the clusters. To calculate the ICC one takes the variance of the clustering variable (0.097) and divides it by the unexplained variance of the model (0.1628). The unexplained variance of the model can be found in the top right of the upper table labeled "scale". $$\frac{0.097}{0.1628} = 0.5958$$ This indicates that there is a moderate level of correlation between the weights within a litter.
Random Slope Model
Random Slope Model: No correlation between random intercept and slopes
Using the same data, a random slope model will be fit where the random intercepts and slopes are independent. In this model, a random slope will be allowed for the sex.
# Random Slope Model: Random intercepts and slopes are independent
model2 = smf.mixedlm("weight ~ litsize + C(treatment) + C(sex)", df, groups= "litter",
vc_formula = {"sex" : "0 + C(sex)"}).fit()
model2.summary()
Model: | MixedLM | Dependent Variable: | weight |
No. Observations: | 322 | Method: | REML |
No. Groups: | 27 | Scale: | 0.1565 |
Min. group size: | 2 | Likelihood: | -205.7020 |
Max. group size: | 18 | Converged: | Yes |
Mean group size: | 11.9 |
Coef. | Std.Err. | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 7.874 | 0.229 | 34.380 | 0.000 | 7.425 | 8.323 |
C(treatment)[T.High] | -0.839 | 0.144 | -5.827 | 0.000 | -1.121 | -0.557 |
C(treatment)[T.Low] | -0.411 | 0.116 | -3.526 | 0.000 | -0.639 | -0.182 |
C(sex)[T.Male] | 0.356 | 0.102 | 3.508 | 0.000 | 0.157 | 0.555 |
litsize | -0.124 | 0.016 | -7.915 | 0.000 | -0.155 | -0.093 |
sex Var | 0.104 | 0.076 |
The estimated variance for random sex slopes is 0.104 which is a standard deviation of 0.322. This indicates that from cluster to cluster, i.e. between litters, the sex slopes fluctuate by ± 0.322 - 0.644 (1-2 standard deviations).
Random Slope Model: Correlation between random intercept and slopes
Using the same data, a random slope model will be fit where the random intercepts and slopes are correlated. In this model, a random slope will be allowed for the sex.
# Random Slope Model: Random intercepts and slopes are correlated
model3 = smf.mixedlm("weight ~ litsize + C(treatment) + C(sex)", df, groups= "litter",
re_formula = "1 + C(sex)").fit()
model3.summary()
Model: | MixedLM | Dependent Variable: | weight |
No. Observations: | 322 | Method: | REML |
No. Groups: | 27 | Scale: | 0.1601 |
Min. group size: | 2 | Likelihood: | -196.3946 |
Max. group size: | 18 | Converged: | Yes |
Mean group size: | 11.9 |
Coef. | Std.Err. | z | P>|z| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 7.806 | 0.263 | 29.672 | 0.000 | 7.291 | 8.322 |
C(treatment)[T.High] | -0.778 | 0.172 | -4.523 | 0.000 | -1.116 | -0.441 |
C(treatment)[T.Low] | -0.387 | 0.137 | -2.821 | 0.005 | -0.656 | -0.118 |
C(sex)[T.Male] | 0.359 | 0.053 | 6.783 | 0.000 | 0.255 | 0.463 |
litsize | -0.120 | 0.018 | -6.637 | 0.000 | -0.155 | -0.084 |
litter Var | 0.063 | 0.074 | ||||
litter x C(sex)[T.Male] Cov | 0.028 | 0.044 | ||||
C(sex)[T.Male] Var | 0.013 | 0.062 |
When including random intercepts and random slopes it's useful to calculate the estimated correlation coefficient between the random intercepts and random slopes. This is calculated by taking the co-varince of the random intercept (clustering variable) and random slope (random slope variable) and dividing it by the product of the variance of the random intercept and the variance of the random slope variable. $$\frac{\text{Cov(random intercept variable, random slope variable)}}{\sqrt{\text{random intercept variance * random slope variance}}}$$ $$\frac{0.028}{\sqrt{0.063 * 0.013}} = 0.9784$$ This indicates that the litters with higher weight tend to have males that are of higher weight.
Assumption Check
Linear mixed effect models have the same assumptions as the traditional standard linear regression model. For the model diagnostics, the first model will be used which was a random intercept model with the clustering variable being litter.
Normality
One can check for normality of the residuals graphically, a formal statistical test, or a combination of the two. If the overal sample is large one should use both since as the sample size increases so does the power of the tests, meaning that the larger the sample size the smaller of a difference the formal statistical test will detect and a significant p-value will be provided.
First let's visualize the residuals with a kernal density estimate plot and Q-Q plot.
fig = plt.figure(figsize = (16, 9))
ax = sns.distplot(model.resid, hist = False, kde_kws = {"shade" : True, "lw": 1}, fit = stats.norm)
ax.set_title("KDE Plot of Model Residuals (Blue) and Normal Distribution (Black)")
ax.set_xlabel("Residuals")
## Q-Q PLot
fig = plt.figure(figsize = (16, 9))
ax = fig.add_subplot(111)
sm.qqplot(model.resid, dist = stats.norm, line = 's', ax = ax)
ax.set_title("Q-Q Plot")
There is some deviation from normality, but it doesn't look concerning. There is an outlier, which is evident in both of the plots. How to handle the outlier is up to the researcher.
Now to formally test for normality using Shapir-Wilk test of normality.
labels = ["Statistic", "p-value"]
norm_res = stats.shapiro(model.resid)
for key, val in dict(zip(labels, norm_res)).items():
print(key, val)
The test is significant which indicates that the assumption of normality for the residuals is violated. This would suggest that the model could be adjusted to meet this assumption. Common techniques include transform variables, remove outliers, use a non-parametric approach, or rely on the central limit theorem. Transformations and the backtransformations are covered on parametric assumptions page.
Homoskedasticity of Variance
Just as with checking for normality, one can check for homoskedasticity of variance visually, with a formal test, or both.
First to check for homoskedasticity visually, this can be done with an RVF Plot (residuals versus fitted values) and a boxplot.
fig = plt.figure(figsize = (16, 9))
ax = sns.scatterplot(y = model.resid, x = model.fittedvalues)
ax.set_title("RVF Plot")
ax.set_xlabel("Fitted Values")
ax.set_ylabel("Residuals")
fig = plt.figure(figsize = (16, 9))
ax = sns.boxplot(x = model.model.groups, y = model.resid)
ax.set_title("Distribution of Residuals for Weight by Litter")
ax.set_ylabel("Residuals")
ax.set_xlabel("Litter")
Now to formally test this assumption with the White’s Lagrange Multiplier Test for Heteroscedasticity
from statsmodels.stats.diagnostic import het_white
het_white_res = het_white(model.resid, model.model.exog)
labels = ["LM Statistic", "LM-Test p-value", "F-Statistic", "F-Test p-value"]
for key, val in dict(zip(labels, het_white_res)).items():
print(key, val)
Based on the visual tests, it appears that there isn't much concern about violating the assumption of homoskedasticity of the variance; however the formal testing indicates this assumption is violated.
To address this, one could assess the model for outliers, or transform variables.
References
Gelman, A., Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.Harrison, X. A., Donaldson, L., Correa-Cano, M. E., Evans, J., Fisher, D., Goodwin, E.D. C., Robinson, B. S., Hodgson, D., & Inger, R. (2018). A Brief Introduction to Mixed Effects Modelling and Multi-Model Inference in Ecology. PeerJ, 6(e4794). https://doi.org/10.7717/peerj.4794
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2004). Applied linear statistical models (5th ed.). New York, NY: McGraw-Hill Irwin.
Nezlek, J. B. (2011). Multilevel modeling for social and personality psychology. SAGE Publications Ltd.
West, B. T., Welch, K. B., & Gatecki, A. T. (2015). Linear mixed models: A practicle guide using statistical software (2ed edition). Taylor & Francis Group.