There are many people in the world who are far smarter than I am, and it is always humbling to recognize how little I really know.
Frank Harrell, chair of the Department of Biostatistics at Vanderbilt University, is one of those people. He has a book,
- Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Frank E. Harrell (2001) New York, NY: Springer.
that should be required reading for any statistician planning to develop a regression model. It covers all the newer methods that I know I should be using but have been too lazy to check out. It’s not for a beginner, but if you have some experience with regression models and want to learn the best state-of-the-art methods, there is no better place to look.
A reminder of how important all of this is appears on Dr. Harrell’s website, where he outlines his philosophy of biostatistics. It is well worth repeating here, and each bullet point deserves further elaboration:
- Biostatistics needs to be fully integrated into biomedical research; experimental design is all important
- Don’t be afraid of using modern methods
- Avoid categorizing continuous variables and predicted values at all costs
- Don’t assume that anything operates linearly
- Account for model uncertainty and avoid it when possible by using subject matter knowledge
- Use the bootstrap routinely
- Make the sample size a random variable when possible
- Consider using Bayesian methods
- Use excellent graphics, liberally
A good elaboration of the third bullet point appears at
which outlines the issue far better than anything I have written on the topic.
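To make the third bullet point a little more concrete, here is a minimal simulation sketch in Python. It is my own illustration, not something taken from Dr. Harrell's material, and everything in it is an assumption chosen for simplicity: a normally distributed predictor, a linear relationship with unit slope, normal noise, and a median split as the categorization. The function names `pearson` and `simulate` are hypothetical helpers. The sketch compares how much of the variation in the outcome is explained by the continuous predictor versus its dichotomized version.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, slope=1.0, seed=1):
    """Return (R^2 using continuous x, R^2 using median-split x).

    Assumed setup (illustrative only): x ~ N(0, 1), y = slope * x + N(0, 1).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]

    # Dichotomize the predictor at its sample median ("high" vs. "low").
    cut = statistics.median(x)
    x_binary = [1.0 if xi > cut else 0.0 for xi in x]

    # With a single predictor, R^2 is the squared correlation.
    r2_continuous = pearson(x, y) ** 2
    r2_dichotomized = pearson(x_binary, y) ** 2
    return r2_continuous, r2_dichotomized


r2_continuous, r2_dichotomized = simulate()
print(f"R^2 with continuous predictor:   {r2_continuous:.3f}")
print(f"R^2 with median-split predictor: {r2_dichotomized:.3f}")
```

Under these assumptions, the median-split version should explain noticeably less of the variation than the continuous version, which is the loss of information the bullet point warns about. A toy simulation like this only shows the effect in one idealized setting; it does not cover the other problems with categorization, such as arbitrary cutpoints.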