Different parameterizations for one-factor ANOVA

Steve Simon

2022-07-06

I’m trying to understand some of the basics of contrasts in analysis of variance. In particular, I want to figure out some alternative parameterizations of the coefficients that are estimated as part of the ANOVA model.

library(MASS)
library(matlib)
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)

First, let’s set up an artificial dataset that is nice and easy to work with.

means <- c(52, 54, 62, 72)
mean(means)
## [1] 60
y <- 
  rep(means, each=3) +
  rep(c(0, 1, -1), 4)
g <- rep(1:4, each=3)
df <- data.frame(g, y)

Notice the pattern in the group means.

df %>%
  group_by(g) %>%
  summarize(mean_y=mean(y))
## # A tibble: 4 x 2
##       g mean_y
##   <int>  <dbl>
## 1     1     52
## 2     2     54
## 3     3     62
## 4     4     72

I want to use the lm function to fit these models, because it makes it easy to peek at the underlying coefficients.

The simplest parameterization

The simplest parameterization is to let each beta coefficient represent a different group mean. We have to drop the intercept term for this model to work properly.

x1 <- as.numeric(g==1)
x2 <- as.numeric(g==2)
x3 <- as.numeric(g==3)
x4 <- as.numeric(g==4)

This is what the X matrix looks like.

A <- cbind(x1, x2, x3, x4) 
A
##       x1 x2 x3 x4
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  0  1  0  0
##  [5,]  0  1  0  0
##  [6,]  0  1  0  0
##  [7,]  0  0  1  0
##  [8,]  0  0  1  0
##  [9,]  0  0  1  0
## [10,]  0  0  0  1
## [11,]  0  0  0  1
## [12,]  0  0  0  1

This is what the regression results look like.

beta_A <- coef(lm(y~x1+x2+x3+x4-1))
beta_A
## x1 x2 x3 x4 
## 52 54 62 72

It may help to view the underlying matrix calculations. Recall that

\(\hat{\beta}=(X'X)^{-1}X'Y\)

So the matrix

\((X'X)^{-1}X'\)

will tell you how each parameter in the regression model is a linear combination of the Y’s.

inv(t(A)%*%A)%*%t(A) %>% fractions
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1/3  1/3  1/3    0    0    0    0    0    0    0     0     0  
## [2,]   0    0    0  1/3  1/3  1/3    0    0    0    0     0     0  
## [3,]   0    0    0    0    0    0  1/3  1/3  1/3    0     0     0  
## [4,]   0    0    0    0    0    0    0    0    0  1/3   1/3   1/3

So the first coefficient is the average of the first three y values, the second is the average of the next three y values, and so on.
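
As a quick check, multiplying this matrix into y should reproduce the fitted coefficients exactly:

# (X'X)^{-1} X' y reproduces beta_A: the four group means 52, 54, 62, 72
inv(t(A)%*%A)%*%t(A)%*%y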

Indicator variables

A common way to parameterize the ANOVA model is to use an intercept term and indicator variables for all but one of the groups.

x0 <- rep(1, 12)
x1 <- as.numeric(g==2)
x2 <- as.numeric(g==3)
x3 <- as.numeric(g==4)

This is what the X matrix looks like.

B <- cbind(x0, x1, x2, x3) 
B
##       x0 x1 x2 x3
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  1  1  0  0
##  [5,]  1  1  0  0
##  [6,]  1  1  0  0
##  [7,]  1  0  1  0
##  [8,]  1  0  1  0
##  [9,]  1  0  1  0
## [10,]  1  0  0  1
## [11,]  1  0  0  1
## [12,]  1  0  0  1

This is what the regression results look like.

beta_B <- coef(lm(y~x0+x1+x2+x3-1))
beta_B
## x0 x1 x2 x3 
## 52  2 10 20
inv(t(B)%*%B)%*%t(B) %>% fractions
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,]  1/3  1/3  1/3    0    0    0    0    0    0    0     0     0 
## [2,] -1/3 -1/3 -1/3  1/3  1/3  1/3    0    0    0    0     0     0 
## [3,] -1/3 -1/3 -1/3    0    0    0  1/3  1/3  1/3    0     0     0 
## [4,] -1/3 -1/3 -1/3    0    0    0    0    0    0  1/3   1/3   1/3

So the first coefficient is the average of the first three y values (Y1, Y2, Y3), and the second is the average of the next three y values (Y4, Y5, Y6) minus the average of the first three (Y1, Y2, Y3), etc.

In other words,

\(\hat{\beta}_0=\bar{Y}_1\)

\(\hat{\beta}_1=\bar{Y}_2-\bar{Y}_1\)

\(\hat{\beta}_2=\bar{Y}_3-\bar{Y}_1\)

\(\hat{\beta}_3=\bar{Y}_4-\bar{Y}_1\)
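
Before digging into why, here is a quick numeric check (a small sketch; the helper vector m of group means is new here). Note also that R's default treatment coding for factors produces this same parameterization.

# Group means of y, then the coefficients as described above: 52, 2, 10, 20
m <- tapply(y, g, mean)
c(m[1], m[2]-m[1], m[3]-m[1], m[4]-m[1])
# R's default contr.treatment coding for a factor gives the same fit
coef(lm(y ~ factor(g)))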

This seems counterintuitive. The slope coefficient \(\hat{\beta}_1\) is attached to a column with zeros everywhere except for ones on Y4, Y5, and Y6. How could that coefficient end up involving Y1, Y2, and Y3?

The point to remember is that the slope coefficient in a linear regression with multiple independent variables has a slightly more complex interpretation than the slope coefficient in a linear regression with a single independent variable. In the latter case, the interpretation is the estimated change in Y for a one unit change in X.

In the former case, the interpretation is the estimated change in Y for a one unit change in X1, holding all the other independent variables constant.

So the slope associated with X1 requires you to hold X2 and X3 constant. You can hold them constant at 0 or hold them constant at 1, but the former is simpler to understand. Think of it as moving from one of the first three rows of your X matrix

B[1:3, ]
##      x0 x1 x2 x3
## [1,]  1  0  0  0
## [2,]  1  0  0  0
## [3,]  1  0  0  0

to one of the next three rows of your X matrix.

B[4:6, ]
##      x0 x1 x2 x3
## [1,]  1  1  0  0
## [2,]  1  1  0  0
## [3,]  1  1  0  0

This change is effectively the mean of the first three observations (Y1, Y2, Y3) subtracted from the mean of the next three observations (Y4, Y5, Y6).

Deviation coding

If you try to code the differences in means directly, you get a third result. Deviation coding uses an intercept term and, for all but one of the groups, a column that is +1 for observations in that group, -1 for observations in the omitted group, and 0 elsewhere.

x0 <- rep(1, 12)
x1 <- as.numeric(g==2) - as.numeric(g==1)
x2 <- as.numeric(g==3) - as.numeric(g==1)
x3 <- as.numeric(g==4) - as.numeric(g==1)

This is what the X matrix looks like.

D <- cbind(x0, x1, x2, x3) 
D
##       x0 x1 x2 x3
##  [1,]  1 -1 -1 -1
##  [2,]  1 -1 -1 -1
##  [3,]  1 -1 -1 -1
##  [4,]  1  1  0  0
##  [5,]  1  1  0  0
##  [6,]  1  1  0  0
##  [7,]  1  0  1  0
##  [8,]  1  0  1  0
##  [9,]  1  0  1  0
## [10,]  1  0  0  1
## [11,]  1  0  0  1
## [12,]  1  0  0  1

This is what the regression results look like.

beta_D <- coef(lm(y~x0+x1+x2+x3-1))
beta_D
## x0 x1 x2 x3 
## 60 -6  2 12
inv(t(D)%*%D)%*%t(D) %>% fractions
##      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11] [,12]
## [1,]  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
## [2,] -1/12 -1/12 -1/12   1/4   1/4   1/4 -1/12 -1/12 -1/12 -1/12 -1/12 -1/12
## [3,] -1/12 -1/12 -1/12 -1/12 -1/12 -1/12   1/4   1/4   1/4 -1/12 -1/12 -1/12
## [4,] -1/12 -1/12 -1/12 -1/12 -1/12 -1/12 -1/12 -1/12 -1/12   1/4   1/4   1/4

Okay, so what is this 1/4 doing here? There are only three data values in each group. The trick is to rewrite

\(1/4 = 1/3 - 1/12\)

Then each -1/12 represents subtracting off the overall mean of all 12 observations. So each estimate is a group average minus the overall average. In other words,

\(\hat{\beta}_0=\bar{Y}_.\)

\(\hat{\beta}_1=\bar{Y}_2-\bar{Y}_.\)

\(\hat{\beta}_2=\bar{Y}_3-\bar{Y}_.\)

\(\hat{\beta}_3=\bar{Y}_4-\bar{Y}_.\)

where \(\bar{Y}_.\) is the overall mean.
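
A quick numeric check of these formulas (again using a helper vector m of group means):

# Overall mean, then each remaining group mean minus it: 60, -6, 2, 12
m <- tapply(y, g, mean)
c(mean(y), m[2]-mean(y), m[3]-mean(y), m[4]-mean(y))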

Again, this seems counterintuitive, but remember that the interpretation of a regression coefficient in a model with multiple independent variables is the effect of a one unit change in that variable, holding all the other variables constant.

Recode

Suppose you want to recombine the columns of your X matrix to transform it using some linear combinations. This is the same as post-multiplying by a matrix.

\(B=AV\)

or equivalently

\(A=BV^{-1}\)

So to convert from

A
##       x1 x2 x3 x4
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  0  1  0  0
##  [5,]  0  1  0  0
##  [6,]  0  1  0  0
##  [7,]  0  0  1  0
##  [8,]  0  0  1  0
##  [9,]  0  0  1  0
## [10,]  0  0  0  1
## [11,]  0  0  0  1
## [12,]  0  0  0  1

into

B
##       x0 x1 x2 x3
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  1  1  0  0
##  [5,]  1  1  0  0
##  [6,]  1  1  0  0
##  [7,]  1  0  1  0
##  [8,]  1  0  1  0
##  [9,]  1  0  1  0
## [10,]  1  0  0  1
## [11,]  1  0  0  1
## [12,]  1  0  0  1

you would multiply by this V

V <- rbind(
  c(1, 0, 0, 0),
  c(1, 1, 0, 0),
  c(1, 0, 1, 0),
  c(1, 0, 0, 1))
V
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    1    1    0    0
## [3,]    1    0    1    0
## [4,]    1    0    0    1

to get

A%*%V
##       [,1] [,2] [,3] [,4]
##  [1,]    1    0    0    0
##  [2,]    1    0    0    0
##  [3,]    1    0    0    0
##  [4,]    1    1    0    0
##  [5,]    1    1    0    0
##  [6,]    1    1    0    0
##  [7,]    1    0    1    0
##  [8,]    1    0    1    0
##  [9,]    1    0    1    0
## [10,]    1    0    0    1
## [11,]    1    0    0    1
## [12,]    1    0    0    1

Note that V is not a symmetric matrix. Symmetry is not a requirement here.

You can go in the other direction using

U <- inv(V)
U
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]   -1    1    0    0
## [3,]   -1    0    1    0
## [4,]   -1    0    0    1

which yields

B%*%U
##       [,1] [,2] [,3] [,4]
##  [1,]    1    0    0    0
##  [2,]    1    0    0    0
##  [3,]    1    0    0    0
##  [4,]    0    1    0    0
##  [5,]    0    1    0    0
##  [6,]    0    1    0    0
##  [7,]    0    0    1    0
##  [8,]    0    0    1    0
##  [9,]    0    0    1    0
## [10,]    0    0    0    1
## [11,]    0    0    0    1
## [12,]    0    0    0    1

Now the formula for the estimates for a given parameterization is

\(\hat{\beta}_B=(B'B)^{-1}B'Y\)

Replace B with AV to get

\(\hat{\beta}_B=(V'A'AV)^{-1}V'A'Y\)

which simplifies, because \((V'A'AV)^{-1}=V^{-1}(A'A)^{-1}(V')^{-1}\) and \((V')^{-1}V'\) is the identity, to

\(\hat{\beta}_B=V^{-1}(A'A)^{-1}A'Y\)

or

\(\hat{\beta}_B=V^{-1}\hat{\beta}_A\)

Let’s check this out with the coefficients we calculated earlier.

beta_A
## x1 x2 x3 x4 
## 52 54 62 72
beta_B
## x0 x1 x2 x3 
## 52  2 10 20
inv(V)%*%beta_A
##      [,1]
## [1,]   52
## [2,]    2
## [3,]   10
## [4,]   20

You can also go in the reverse direction.

V%*%beta_B
##      [,1]
## [1,]   52
## [2,]   54
## [3,]   62
## [4,]   72

Check the deviation coding

The same approach works with any coding. The V matrix that gets you from

A
##       x1 x2 x3 x4
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  0  1  0  0
##  [5,]  0  1  0  0
##  [6,]  0  1  0  0
##  [7,]  0  0  1  0
##  [8,]  0  0  1  0
##  [9,]  0  0  1  0
## [10,]  0  0  0  1
## [11,]  0  0  0  1
## [12,]  0  0  0  1

to

D
##       x0 x1 x2 x3
##  [1,]  1 -1 -1 -1
##  [2,]  1 -1 -1 -1
##  [3,]  1 -1 -1 -1
##  [4,]  1  1  0  0
##  [5,]  1  1  0  0
##  [6,]  1  1  0  0
##  [7,]  1  0  1  0
##  [8,]  1  0  1  0
##  [9,]  1  0  1  0
## [10,]  1  0  0  1
## [11,]  1  0  0  1
## [12,]  1  0  0  1

is

V <- rbind(
  c(1, -1, -1, -1),
  c(1,  1,  0,  0),
  c(1,  0,  1,  0),
  c(1,  0,  0,  1))
V
##      [,1] [,2] [,3] [,4]
## [1,]    1   -1   -1   -1
## [2,]    1    1    0    0
## [3,]    1    0    1    0
## [4,]    1    0    0    1

Let’s check this in one direction

A%*%V
##       [,1] [,2] [,3] [,4]
##  [1,]    1   -1   -1   -1
##  [2,]    1   -1   -1   -1
##  [3,]    1   -1   -1   -1
##  [4,]    1    1    0    0
##  [5,]    1    1    0    0
##  [6,]    1    1    0    0
##  [7,]    1    0    1    0
##  [8,]    1    0    1    0
##  [9,]    1    0    1    0
## [10,]    1    0    0    1
## [11,]    1    0    0    1
## [12,]    1    0    0    1

and the opposite direction

U <- inv(V)
U
##       [,1]  [,2]  [,3]  [,4]
## [1,]  0.25  0.25  0.25  0.25
## [2,] -0.25  0.75 -0.25 -0.25
## [3,] -0.25 -0.25  0.75 -0.25
## [4,] -0.25 -0.25 -0.25  0.75
D%*%U
##       [,1] [,2] [,3] [,4]
##  [1,]    1    0    0    0
##  [2,]    1    0    0    0
##  [3,]    1    0    0    0
##  [4,]    0    1    0    0
##  [5,]    0    1    0    0
##  [6,]    0    1    0    0
##  [7,]    0    0    1    0
##  [8,]    0    0    1    0
##  [9,]    0    0    1    0
## [10,]    0    0    0    1
## [11,]    0    0    0    1
## [12,]    0    0    0    1

We can establish that

beta_A
## x1 x2 x3 x4 
## 52 54 62 72

can be converted to

beta_D
## x0 x1 x2 x3 
## 60 -6  2 12

using

inv(V)%*%beta_A
##      [,1]
## [1,]   60
## [2,]   -6
## [3,]    2
## [4,]   12

and the reverse direction works as well.

V%*%beta_D
##      [,1]
## [1,]   52
## [2,]   54
## [3,]   62
## [4,]   72

Make your own contrast

Suppose you wanted to develop a parameterization that is meaningful for your particular setting and none of the ones discussed here or elsewhere seem to fit the bill. With a bit of work, you can calculate your own.

Suppose for example that you wanted

\(\hat{\beta}_0=\bar{Y}_1\)

\(\hat{\beta}_1=\bar{Y}_2-\bar{Y}_1\)

\(\hat{\beta}_2=\bar{Y}_3-\bar{Y}_2\)

\(\hat{\beta}_3=\bar{Y}_4-\bar{Y}_3\)

Instead of comparing each mean to the first mean, you want to compare each mean to the previous mean.
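
This is sometimes called successive differences (or repeated) coding. As an aside, the MASS package loaded earlier provides a prepackaged version, contr.sdif, though its intercept is the grand mean rather than the first group mean, so it is close to, but not identical to, the parameterization built below. A minimal sketch:

# contr.sdif compares each level with the previous one; the intercept
# here is the grand mean (60) and the remaining coefficients are the
# successive differences 2, 8, 10
gf <- factor(g)
contrasts(gf) <- contr.sdif(4)
coef(lm(y ~ gf))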

The U matrix that converts

\(\hat{\beta}_0=\bar{Y}_1\)

\(\hat{\beta}_1=\bar{Y}_2\)

\(\hat{\beta}_2=\bar{Y}_3\)

\(\hat{\beta}_3=\bar{Y}_4\)

to the new parameterization is

U <- rbind(
  c( 1,  0,  0,  0),
  c(-1,  1,  0,  0),
  c( 0, -1,  1,  0),
  c( 0,  0, -1,  1))
U
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]   -1    1    0    0
## [3,]    0   -1    1    0
## [4,]    0    0   -1    1

Now, you want to travel from the parameterization represented by

A
##       x1 x2 x3 x4
##  [1,]  1  0  0  0
##  [2,]  1  0  0  0
##  [3,]  1  0  0  0
##  [4,]  0  1  0  0
##  [5,]  0  1  0  0
##  [6,]  0  1  0  0
##  [7,]  0  0  1  0
##  [8,]  0  0  1  0
##  [9,]  0  0  1  0
## [10,]  0  0  0  1
## [11,]  0  0  0  1
## [12,]  0  0  0  1

to a new parameterization represented by E. This would need V, the inverse of U, which is

V <- inv(U)
V
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    1    1    0    0
## [3,]    1    1    1    0
## [4,]    1    1    1    1

which produces the matrix

E <- A %*% V
E
##       [,1] [,2] [,3] [,4]
##  [1,]    1    0    0    0
##  [2,]    1    0    0    0
##  [3,]    1    0    0    0
##  [4,]    1    1    0    0
##  [5,]    1    1    0    0
##  [6,]    1    1    0    0
##  [7,]    1    1    1    0
##  [8,]    1    1    1    0
##  [9,]    1    1    1    0
## [10,]    1    1    1    1
## [11,]    1    1    1    1
## [12,]    1    1    1    1

which gives the correct results.

x0 <- E[ , 1]
x1 <- E[ , 2]
x2 <- E[ , 3]
x3 <- E[ , 4]
beta_E <- coef(lm(y~x0+x1+x2+x3-1))
beta_E
## x0 x1 x2 x3 
## 52  2  8 10
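
As a final check, these coefficients should be the first group mean followed by the successive differences of the group means:

# First group mean, then successive differences: 52, 2, 8, 10
m <- tapply(y, g, mean)
c(m[1], diff(m))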

Acknowledgements

In writing this webpage, I relied on a very helpful tutorial paper:

Daniel J. Schad, Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. How to capitalize on a priori contrasts in linear (mixed) models: A tutorial. Journal of Memory and Language, 2020, 110. Available in html and pdf formats.

as well as a page on the UCLA Statistics site

R Library: Contrast Coding Systems for Categorical Variables. Available in html format.