Using real examples in teaching statistics (May 30, 2007)

Steve Simon


I am writing a book review and wanted to get some opinions from participants on EDSTAT-L about the tendency of this book (and many others) to use hypothetical rather than real examples. I generally like to see real examples, but one of my favorite books takes a different perspective:

Biostatistics: The Bare Essentials. Geoffrey R. Norman, PhD, David L. Streiner, PhD (1994). St. Louis, Missouri: Mosby-Year Book, Inc.

Most chapters begin with an example to set the stage. Usually the examples were dreamt up in our fertile imaginations and are, we hope, entertaining. Occasionally, we reverted to real-world data, simply because sometimes the real world is at least as bizarre as anything imagination could invent. Although many reviews of statistics books praise the users of real examples and castigate others, we are unapologetic in our decision for several reasons: (1) the book is aimed at all types of health professionals, and we didn’t want to waste your time and ours explaining the intricacies of podiatry for others; (2) the real world is a messy place, and it is difficult, or well nigh impossible, to locate real examples that illustrate the pedagogic points simply; and (3) we happen to believe and can cite good psychologic evidence to back it up, that memorable (read ‘bizarre’) examples are a potent ally in learning and remembering concepts. (page viii)

Along the same lines, one of the EDSTAT-L contributors (CB) pointed out:

Basically, I believe that too much emphasis is placed on using “real world data” in the early stages of teaching statistical concepts. Contrived data has the advantage of clearly demonstrating the concept being taught. For example juxtaposing two distributions with common mean and variance but different kurtosis can help to demonstrate the kurtosis concept to take a minor example. Secondly, medical and other jargon can be avoided that detracts from the issue at hand.

Let me hasten to add that I feel just as strongly that after the concept has been properly seated, it is just as important to followup with “real world” data and/or situations so the concept can be seen in proper context. In short, I believe in a combination of the hypothetical and real. My limited observations tell me that the pendulum has swung too far to the “real data” end of the continuum in many cases. We need to move to the middle IMHO.
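CB's kurtosis example is easy to try. The sketch below is my own illustration, not something CB posted: the two tiny data sets (`flat` and `peaked`) are invented for this purpose, and the helper `moments` is a hypothetical name. Both sets have mean 0 and population variance 1, yet their kurtosis differs by a factor of four. Kurtosis here is the plain fourth standardized moment, so a normal distribution would score 3.

```python
def moments(data):
    """Return (mean, population variance, kurtosis) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n   # divide by n, not n - 1
    m4 = sum((x - mean) ** 4 for x in data) / n    # fourth central moment
    return mean, var, m4 / var ** 2

# Contrived data: every value sits exactly one unit from the mean.
flat = [-1, 1, -1, 1, -1, 1, -1, 1]
# Contrived data: most values sit at the mean, with two far-out values.
peaked = [-2, 0, 0, 0, 0, 0, 0, 2]

for name, data in [("flat", flat), ("peaked", peaked)]:
    mean, var, kurt = moments(data)
    print(f"{name}: mean={mean}, variance={var}, kurtosis={kurt}")
# flat: mean=0.0, variance=1.0, kurtosis=1.0
# peaked: mean=0.0, variance=1.0, kurtosis=4.0
```

With real data you would almost never find two samples that match this cleanly on the first two moments, which is CB's point about contrived data isolating a single concept.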

Similar thoughts were echoed by JW.

The ‘problem’ with real examples is that they are real - if one is looking to demonstrate a specific point, the real example hides it under the cloak of all the other ‘extraneous’ items in it. I would rather have blatantly unreal examples during the development & initial learning phase, and then, say at the end of the chapter, have the real cases. Especially if they can be presented along with pointing out the additional considerations.

One of the EDSTAT-L contributors (R) mentioned the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project. This project offered “Use Real Data” as the second of six major recommendations.

It is important to use real data in teaching statistics, for reasons of authenticity, for considering issues related to how and why the data were produced or collected, and to relate the analysis to the problem context. Using real data sets of interest to students is also a good way to engage them in thinking about the data and relevant statistical concepts. There are many types of real data including archival data, classroom-generated data, and simulated data. Sometimes hypothetical data sets may be used to illustrate a particular point (e.g., the Anscombe data illustrates how four data sets can have the same correlation but strikingly different scatterplots) or to assess a specific concept. It is important to only use created or realistic data for this specific purpose and not for general data analysis and exploration. An important aspect of dealing with real data is helping students learn to formulate good questions and use data to answer them appropriately based on how the data were produced. Source: GAISE website
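The Anscombe data mentioned in the GAISE passage can be checked in a few lines. This is a minimal sketch of my own, using the values from Anscombe's 1973 paper as I have them; the helper name `pearson` is hypothetical. It computes the Pearson correlation for each of the four data sets and does not draw the scatterplots, which is where the differences actually show up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# The first three data sets share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson(x, y) for name, (x, y) in anscombe.items()}
for name, r in correlations.items():
    print(f"Data set {name}: r = {r:.2f}")
```

All four correlations come out at about 0.82, even though one set is roughly linear, one is curved, one is a straight line with a single outlier, and one has all but one point at the same x value.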

Along the same lines, DS argues for real data, especially in a second course in statistics, and cites a Journal of Statistics Education article that he co-authored.

There now seems to be a near universal recognition of the need for real data problems in applied statistics classes. What we wish to add, though, is that real data problems are necessary but not sufficient. It is not enough to have “data examples.” Considerable care and some skill are needed to use the full data problems to communicate the entire process of data analysis and the role of statistics in scientific learning. By “case studies” we mean data sets with accompanying context that are considered thoroughly and that occupy a central position in the course structure. Our approach is to present the case studies first, as an introduction to a data structure, then use them to demonstrate the methods. For each case study we also include a summary of statistical findings to illustrate statistical communication, discuss the scope of inference as it relates to the study’s design, and talk about any additional, broader issues of data analysis that arise in the analysis and interpretation. Source: Daniel W. Schafer and Fred L. Ramsey. Teaching the Craft of Data Analysis. Journal of Statistics Education Volume 11, Number 1 (2003)

There was also commentary about social constructivism and cognitive constructivism.

You can find an earlier version of this page on my old website.