Type 3 Sum of Squares with StatsModels
For a primer on the differences between the types of sums of squares,
see here.
The examples there use R, but the explanation itself applies
regardless of language.
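Unlike Researchpy, StatsModels requires the formula to be entered a bit
differently to get correct Type 3 sum of squares calculations. It's nothing
major, but it has to be known: without this step the results are incorrect.
Specifically, each categorical factor has to use sum-to-zero (effect) coding,
written as C(variable, Sum) in the formula, instead of the default treatment
(dummy) coding.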
Let's get to it.
StatsModels ANOVA with Type 3 Sum of Squares
This demonstration uses a Stata data set called systolic, which can be accessed in a few ways.
One way is to load it directly from Stata's website. However, since this demonstration
is for StatsModels, it will use StatsModels' own method, sm.datasets.webuse, which fetches
Stata example data sets and returns them as a pandas DataFrame.
Now to load the required libraries and the data.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
stata = sm.datasets.webuse('systolic')
stata.head()
   drug  disease  systolic
0     1        1        42
1     1        1        44
2     1        1        36
3     1        1        13
4     1        1        19
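The distinction between the types of sums of squares matters mainly when the design is unbalanced, i.e. the drug x disease cells hold unequal numbers of observations. The ANOVA table further down implies 58 observations (1 + 3 + 2 + 6 + 46 degrees of freedom) spread over 12 cells, so the systolic data cannot be balanced. A quick way to check balance is a cross-tabulation of the two factors. The sketch below runs it on hypothetical toy data with the same column names; with the real data, `stata` would be passed in place of `toy`.

```python
import pandas as pd

# Hypothetical toy data with the same columns as the systolic data set;
# the values are invented purely to illustrate the check.
toy = pd.DataFrame({
    "drug":     [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "disease":  [1, 2, 2, 1, 1, 2, 2, 1, 2],
    "systolic": [30, 41, 38, 22, 25, 27, 31, 35, 29],
})

# Count the observations in every drug x disease cell (empty cells show as 0).
cell_counts = pd.crosstab(toy["drug"], toy["disease"])
print(cell_counts)

# Balanced means every cell holds the same number of observations.
counts = cell_counts.to_numpy()
balanced = counts.min() == counts.max()
print("balanced:", balanced)
```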
Now to run the ANOVA with Type 3 sum of squares using StatsModels.
model = ols('systolic ~ C(drug, Sum) + C(disease, Sum) + C(drug, Sum):C(disease, Sum)', data=stata).fit()
aov_table = sm.stats.anova_lm(model, typ=3)
aov_table
                                    sum_sq    df           F        PR(>F)
Intercept                     20037.613011   1.0  181.413788  1.417921e-17
C(drug, Sum)                   2997.471860   3.0    9.046033  8.086388e-05
C(disease, Sum)                 415.873046   2.0    1.882587  1.637355e-01
C(drug, Sum):C(disease, Sum)    707.266259   6.0    1.067225  3.958458e-01
Residual                       5080.816667  46.0         NaN           NaN
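As a sanity check on the table, each F statistic is the term's mean square divided by the residual mean square, F = (sum_sq / df) / (Residual sum_sq / Residual df), and the p-value is the upper tail of an F(df, Residual df) distribution. The sketch below recomputes those columns from the rounded values printed above using SciPy's `scipy.stats.f.sf`. It is only a check on the arithmetic, not necessarily how StatsModels computes the table internally.

```python
from scipy import stats

# Values copied from the Type 3 ANOVA table above (already rounded).
resid_ss, resid_df = 5080.816667, 46.0
terms = {
    "Intercept": (20037.613011, 1.0),
    "C(drug, Sum)": (2997.471860, 3.0),
    "C(disease, Sum)": (415.873046, 2.0),
    "C(drug, Sum):C(disease, Sum)": (707.266259, 6.0),
}

mse = resid_ss / resid_df  # residual mean square
results = {}
for name, (ss, df) in terms.items():
    f_value = (ss / df) / mse                      # term mean square / residual mean square
    p_value = stats.f.sf(f_value, df, resid_df)    # upper-tail probability of F(df, resid_df)
    results[name] = (f_value, p_value)
    print(f"{name:30s} F = {f_value:10.6f}  p = {p_value:.6e}")
```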
That's all it takes! With sum-to-zero coding in the formula, the Type 3 sums of squares are now calculated correctly.