The keynote address at the 18th Annual Applied Statistics in Agriculture Conference, sponsored by Kansas State University, was “Random Observations with Mixed Feelings”, given by Oliver Schabenberger of SAS Institute Inc. The original title was “Estimating Gene Expression Profiles Using All Available Information.” Here are my notes from that seminar.
Dr. Schabenberger started with a historical overview. Approximately 25 years ago, the general linear mixed model was an obscure, complicated mathematical construction that required specialized software and was used by only a limited number of statisticians. The times have changed, but there is still more work that needs to be done.
He noted that typical users have unreasonable expectations. They expect that all mixed models make sense, converge, and behave as expected (that is, behave well). “The pace of mixed model applications is outrunning knowledge acquisition.”
The goal of his talk was to take another look at the mixed model through the Mixed Model Equations, dispel some myths about mixed models, and discuss where we are going.
The mixed model is Y = Xb + Zg + e. Henderson’s original solution was to minimize a quadratic form. The secret is to understand the lower right corner of the resulting mixed model equations.
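For reference, here is the standard form of Henderson’s mixed model equations, with G = Var(g) and R = Var(e) (this is textbook notation, not a transcription from the talk):

$$
\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix}
\begin{pmatrix} \hat{b} \\ \hat{g} \end{pmatrix}
=
\begin{pmatrix} X'R^{-1}Y \\ Z'R^{-1}Y \end{pmatrix}
$$

That lower right corner, $Z'R^{-1}Z + G^{-1}$, is the only place where the random-effects assumption enters; drop the $G^{-1}$ term and you are back to an ordinary partitioned least squares problem.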
He drew an analogy to a partitioned model with all fixed effects. There is also an analogy with ridge regression, which shrinks the estimates to control for multicollinearity. The mixed model uses a similar device, except that the G matrix is intended not to control for multicollinearity but to penalize the fit. The random effects cannot be allowed to vary at will; they must vary according to the assumed distribution of the random effects.
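Here is a minimal numerical sketch of that ridge analogy, using made-up data and assumed variance components (none of this code is from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 30, 2, 5
X = rng.normal(size=(n, p))      # fixed-effects design
Z = rng.normal(size=(n, q))      # random-effects design
y = rng.normal(size=n)           # made-up response

sigma2_e, sigma2_g = 1.0, 0.5    # assumed variance components
R_inv = np.eye(n) / sigma2_e
G_inv = np.eye(q) / sigma2_g     # the penalty block: shrinks g toward zero

# Henderson's mixed model equations; note the ridge-like G_inv term
# sitting in the lower right corner of the coefficient matrix.
C = np.block([
    [X.T @ R_inv @ X, X.T @ R_inv @ Z],
    [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + G_inv],
])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
b_hat, g_hat = np.split(np.linalg.solve(C, rhs), [p])
```

Multiply the system through by sigma2_e and the lower right block becomes $Z'Z + (\sigma_e^2/\sigma_g^2)I$: literally a ridge penalty, applied only to the random-effect columns.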
If the objective function in statistical estimation is based on a penalty function, then you can use a mixed model formulation. A good example of this is a smoothing spline.
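That correspondence is worth writing out. The standard result (again, my addition, not a quote from the talk) is that penalizing the spline coefficients is the same as declaring them random:

$$
\min_{b,\,u}\; \lVert Y - Xb - Zu \rVert^2 + \lambda\, u'u
\quad\Longleftrightarrow\quad
u \sim N(0, \sigma_u^2 I), \qquad \lambda = \sigma_e^2 / \sigma_u^2,
$$

so the smoothing parameter can be estimated as a ratio of variance components rather than by cross-validation.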
Just because you can draw a link, however, between these approaches does not mean that you should. There are computational efficiency issues, among other things, that you need to consider.
The concept of a BLUP (Best Linear Unbiased Prediction) has an interpretation as an empirical Bayes estimate. Dr. Schabenberger showed a correspondence between the traditional mixed model equations and the equations generated by a Bayesian interpretation of the mixed model.
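In textbook form (my notation, with V = ZGZ' + R), the correspondence says the BLUP is also a posterior mean: assign g the prior N(0, G), and

$$
\hat{g} = G Z' V^{-1} (Y - X\hat{b}) = E[g \mid Y],
$$

where plugging in the estimate $\hat{b}$ for b is what makes it “empirical” Bayes.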
He offered a warning, “variation does not imply variance,” and cited a dangerous outlook along the lines of “One of the great advantages of mixed modeling is that you can treat effects as fixed or random depending on the kind of analysis that you are interested in.”
This troubling development leads to a “shoot from the hip” approach, such as
- treating all nested effects as random,
- declaring effects random because you WANT to draw inferences regardless of how the effect came about,
- equating ignorance with randomness,
- and treating the ability to estimate a variance component as evidence that the effect is random.
A dangerous approach is to fit a model in which all effects are random and then figure out which ones “stick.” Dr. Schabenberger noted that he hears about this approach mainly because the models that include everything fail to converge.
How do you justify random effects in observational studies? Is the observational study the realization of a stochastic process? If so, it is random. The crux is: what is the effect representative of?
Myth: residuals in mixed models behave like any other residuals. Reality: they don’t, in part because there are two different types of residuals, marginal and conditional.
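In the usual notation (a standard definition, not something stated in the talk), the marginal residuals measure deviation from the population-averaged fit and the conditional residuals measure deviation from the subject-specific fit:

$$
r_m = Y - X\hat{b}, \qquad r_c = Y - X\hat{b} - Z\hat{g},
$$

and the two can give very different diagnostic impressions of the same model.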
Myth: leverages are bounded below by 1/n and above by 1. Reality: leverage in mixed models is not well defined. You can look at the gradient of the fitted values with respect to Y, but this matrix is not symmetric, so it cannot represent an orthogonal projection matrix. You can even get negative leverage values, although an interpretation can still be attached to them.
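A quick way to see this, as a toy computation under an assumed AR(1) covariance (my example, not his):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 12, 3
X = rng.normal(size=(n, p))

# Assumed AR(1) marginal covariance with correlation 0.8
rho = 0.8
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
V_inv = np.linalg.inv(V)

# Gradient of the fitted values X @ b_hat with respect to y
H = X @ np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv)

print(np.allclose(H @ H, H))  # True: idempotent, so it is a projector
print(np.allclose(H, H.T))    # False: oblique, not an orthogonal projection
print(H.diagonal().min())     # the diagonal need not stay inside [1/n, 1]
```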
Myth: if you fit a model with spatial correlation, the residuals are “purged” of spatial autocorrelation. Reality: the covariance structure of the model infects the residuals.
Myth: Least Squares Means are population means that take into account all other model effects, both fixed and random. Reality: Least Squares Means do not involve any function of the random effects. In fact, the formula for the Least Squares Means is identical for a model with random effects and a model without them; the only reason they differ is that the estimated coefficients change. If your random effects have non-zero means, you have a very big problem.
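In symbols (standard notation, my addition): a Least Squares Mean is a linear function of the fixed effects only,

$$
\widehat{\mathrm{LSM}} = L\hat{b},
$$

with the same coefficient matrix L whether or not random effects are present; the random effects drop out of the formula precisely because they are assumed to have mean zero.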
He mentioned the use of various information criteria and cited Douglas Adams’s famous story in which the answer to the ultimate question of Life, the Universe, and Everything is 42. You need to look at more than just a single number; you should also examine the pattern of residuals. The use of an information criterion is especially troublesome with GLMMs, which use linearization and pseudo-likelihood methods. Pseudo-likelihoods are not comparable across models, whether nested or not.
He spent a fair amount of time talking about spatial applications. Low-rank spatial smoothing can often reduce a very complex problem to something manageable that can produce an estimate in a reasonable amount of time.
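A hedged sketch of the low-rank idea (knot placement, kernel, and penalty below are all illustrative choices of mine): instead of one basis function per observation, use a handful of knots, so the random-effects solve is q-by-q rather than n-by-n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, q = 500, 25                          # many sites, few knots
sites = rng.uniform(0, 1, size=(n, 2))
knots = rng.uniform(0, 1, size=(q, 2))  # q << n is what buys the speed

# One radial basis column per knot, so Z is n x q instead of n x n.
dist = np.linalg.norm(sites[:, None, :] - knots[None, :, :], axis=2)
Z = np.exp(-((dist / 0.2) ** 2))        # assumed Gaussian kernel, range 0.2

y = rng.normal(size=n)                  # placeholder response
lam = 1.0                               # assumed penalty (variance ratio)
g_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(q), Z.T @ y)
smooth = Z @ g_hat                      # fitted low-rank spatial surface
```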
He also talked about a joint publication with Gilliland in 2001 that looked at correlated binary variables. I don’t have the full details on this reference, but the title apparently is “Limits on Pairwise Association for Equi-Correlated Binary Variables” and it appeared in the Journal of Applied Statistical Sciences. There are serious constraints on binary variables that prevent many correlation values from occurring.
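One standard illustration of those constraints (this is the classical Fréchet-bound consequence, not necessarily the form used in the paper): for binary X and Y with success probabilities p and q,

$$
\max\!\left(-\sqrt{\tfrac{pq}{(1-p)(1-q)}},\; -\sqrt{\tfrac{(1-p)(1-q)}{pq}}\right)
\;\le\; \rho \;\le\;
\min\!\left(\sqrt{\tfrac{p(1-q)}{q(1-p)}},\; \sqrt{\tfrac{q(1-p)}{p(1-q)}}\right).
$$

For example, with p = 0.1 and q = 0.5 the correlation can never exceed 1/3, no matter how the joint distribution is arranged.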
Dr. Schabenberger ended his talk with a summary of unfinished and unresolved issues:
- the degrees of freedom issue,
- diagnostic tests and graphs,
- non-normal random effects,
- and mixture models.