One of my favorite graphs is the box plot. It is a visualization of a five-number summary: the minimum, 25th percentile, median (aka the 50th percentile), 75th percentile, and maximum. The box in a box plot extends from the 25th to the 75th percentile, and a line inside the box marks the median.
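As a minimal sketch of the five numbers themselves, here is how they could be computed in R with `quantile()`. The vector `x` is hypothetical toy data, not taken from this post, chosen so the result is easy to check by hand.

```r
# Hypothetical toy data, chosen so the answers are easy to verify by hand
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Minimum, 25th percentile, median, 75th percentile, maximum
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)
```

The box would span the second through fourth of these values, with the median line at the third.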
The box plot is great for comparing groups. Here’s one showing two groups; notice that the first group, g1, tends to have larger values.
## x g
## 1 0.5373604 g1
## 2 0.6169055 g1
## 3 0.6716831 g1
## 4 0.7156946 g1
## 5 0.7538330 g1
## 6 0.7885528 g1
## 7 0.8214372 g1
## 8 0.8538300 g1
## 9 0.8873732 g1
## 10 0.9253821 g1
## 11 0.3943692 g2
## 12 0.4710208 g2
## 13 0.5267383 g2
## 14 0.5735105 g2
## 15 0.6157184 g2
## 16 0.6557301 g2
## 17 0.6952916 g2
## 18 0.7362077 g2
## 19 0.7812077 g2
## 20 0.8368375 g2
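For readers who want to redraw the comparison, the following is one way it might be done in base R. The values are copied from the printed data above; the names `d` and `bp` are my own, and the original plot may have been made with a different function or package.

```r
# Values copied from the printed data frame above
d <- data.frame(
  x = c(0.5373604, 0.6169055, 0.6716831, 0.7156946, 0.7538330,
        0.7885528, 0.8214372, 0.8538300, 0.8873732, 0.9253821,
        0.3943692, 0.4710208, 0.5267383, 0.5735105, 0.6157184,
        0.6557301, 0.6952916, 0.7362077, 0.7812077, 0.8368375),
  g = rep(c("g1", "g2"), each = 10)
)

# Draw one box per group; boxplot() also returns the statistics it plotted
bp <- boxplot(x ~ g, data = d)

# One column per group; the third row holds the medians
print(bp$stats)
```

Comparing the third row of `bp$stats` across the two columns is a quick numeric check that g1 sits higher than g2.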
What happens if you try to compute the five-number summary for a group that has only two values?
## x g
## 1 0.5373604 g1
## 2 0.6169055 g1
## 3 0.6716831 g1
## 4 0.7156946 g1
## 5 0.7538330 g1
## 6 0.7885528 g1
## 7 0.8214372 g1
## 8 0.8538300 g1
## 9 0.8873732 g1
## 10 0.9253821 g1
## 11 0.5585801 g2
## 12 0.7086762 g2
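A small sketch for exploring that question uses the two g2 values printed above. The name `g2_small` is hypothetical. With its default settings, `quantile()` interpolates linearly between neighboring observations, so it still returns five numbers even though only two were observed.

```r
# The two g2 values from the printed data above
g2_small <- c(0.5585801, 0.7086762)

# Ask for the same five summary points as before
q <- quantile(g2_small, probs = c(0, 0.25, 0.5, 0.75, 1))
print(q)
```

The middle three numbers here are interpolated between the two data points rather than observed, which is worth keeping in mind when reading a box drawn from so little data.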