I’m helping to teach a beginner’s workshop on R. Here are some resources that we will get from the web, but if you can download these files to your computer ahead of time, that would be great.
Here are the files for the latest R version (3.0.2 as of the writing of this blog entry):
Here are the files for R Studio (what is R Studio?)
Here are the files for R Commander (what is R Commander?)
I’ll be using a data set on childhood respiratory diseases (read documentation for this file)
- Text file (for all versions of R)
Here are the R commands and the associated output.
For your convenience, I am also including the R commands file below.
For this example, I am reading data from a text file
located at the OzDASL (Australian Data And Story Library)
website. It is a tab delimited file with headers in the first row.
fnam <- “http://www.statsci.org/data/general/fev.txt" crd <- read.table(file=fnam,header=TRUE)
It is always good to peek at your data, and the
head and tail functions give the first few and
the last few bits of your data.
All statistical analyses in R are functions. The
mean function gives you the average.
Pop quiz #1, what is the average height for this data?
Always be careful about missing values. Most functions
in R have options for handling missing values different
The sd, range, and quantile fuctions are self-explanatory.
The summary function gives you the mean and quantiles combined.
Use the table function to get counts for categorical data.
sd(crdAge) range(crdAge) quantile(crdAge) quantile(crdAge,probs=c(0.1,0.9)) summary(crdAge) table(crdSex)
Pop quiz #2, what is the range for height?
Pop quiz #3, how many smokers are there?
Again, you must always be careful about missing values.
You can specify different methods for handling missing
values in the table function with the useNA argument.
Pop quiz #4, look at the help file for table. What are the
other options for the useNA argument?
Since all statistical analyses in R is done through
functions, you can store the results of that function
and re-use it.
age.mn <- mean(crdAge) age.sd <- sd(crdAge) ti <- paste(“The mean age is”,age.mn,"+/-",age.sd) hist(crdAge,main=ti)
Here’s an example of a more advanced analysis.
The lm function fits a linear model, which in
this example is equivalent to simple linear
regression. Store the results of lm because you
can then use it in several different ways.
lm.mod1 <- lm(FEV~Height,data=crd)
Pop quiz #5, look at the help file for the lm
function. What is one possible option for handling
You can get a quick idea of what is in lm.mod1 by
The summary function gives more details
The anova function gives a different display than summary.
The resid function produces residuals and the predict function
produces the predicted values. Use head to avoid printing out
all 654 values.
There is quite a bit stored in lm.mod1, and you can explore it
by first finding out the names of everything within it.
This is an example of re-using information in lm.mod1 in order
to enhance the information in your plot. First you should store
the coefficients of the regression equation for later re-use.
co.mod1 <- lm.mod1coefficients b0 <- co.mod1 b1 <- co.mod1 r0 <- round(b0,1) r1 <- round(b1,2)
abline draws a reference line with a given intercept and slope.
you can also insert the equation into the title of the graph.
ti <- paste(“FEV =",r0,"+",r1,"* Height”) title(ti)
The lm function illustrates some of the object-oriented features
of the R programming language–specifically the use of classes and
methods. lm produces an object of class “lm”. In contrast, the
nlm function (non-linear model) produces an object of class “nlm”.
The predict function has a method for lm objects and a different
method for nlm objects. The object oriented nature of R takes some
getting used to, but once you get over the initial learning curve,
it actually makes your programming life a lot simpler.
end of file
Here are the two graphs produced by this output.