How I use R Markdown to write research papers

Categories: Blog post Tags: Writing research papers

R Markdown is an implementation of Markdown inside the R programming language. R Markdown also leverages features in LaTeX, MathJax, and Pandoc.

If you use the RStudio integrated development environment to run R, then you already have many of the features of R Markdown built in. Several R packages also use or extend R Markdown.

I use R Markdown to produce early drafts of research publications. It simplifies writing in several different ways.

Figures

R can produce beautiful graphs, and with R Markdown, you can insert these graphs anywhere in your document. You can’t control page breaks easily in R Markdown, however. Here’s an example of a simple graph.
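As a minimal sketch of what the code for such a graph might look like, the following base R code draws a boxplot from the built-in warpbreaks data set. The choice of plot, the axis labels, and the title are illustrative assumptions, not the graph from the original post. In an R Markdown file, this code would sit inside an R code chunk, and the figure would appear at that point in the knitted document.

```r
# Boxplot of the number of warp breaks at each tension level.
# warpbreaks ships with base R, so no packages are needed.
bp <- boxplot(
  breaks ~ tension,
  data = warpbreaks,
  xlab = "Tension",
  ylab = "Number of breaks",
  main = "Warp breaks by tension"  # illustrative title
)
```

boxplot() draws the figure and also returns the summary statistics it plotted, which is why the result is stored in bp here.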

With the ability to plot polygons, line segments, and arrows, you can create informational diagrams like this.

The code to produce this graph is a bit tedious to write, but because it is part of the R Markdown file, you can edit it as you edit the other parts of your paper. If you want to relabel the middle box as an “E”, you do not need to exit, run a different program, and then cut and paste the result back into your document.
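To give a sense of what diagram code can look like, here is a small sketch in base R that draws three labeled boxes joined by arrows. It is not the code behind the diagram in the original post; the helper draw_box(), the labels, and the coordinates are all hypothetical.

```r
# Draw a labeled box centered at (x, y); returns the label invisibly.
# draw_box() is a hypothetical helper, not part of any package.
draw_box <- function(x, y, label, half_width = 0.1, half_height = 0.08) {
  rect(x - half_width, y - half_height, x + half_width, y + half_height)
  text(x, y, label)
  invisible(label)
}

# Illustrative labels and positions; change "B" to "E" to relabel the middle box
box_labels <- c("A", "B", "C")
box_x <- c(0.2, 0.5, 0.8)

# Set up an empty plotting region with coordinates running from 0 to 1
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, 1))

for (i in seq_along(box_labels)) {
  draw_box(box_x[i], 0.5, box_labels[i])
}

# Arrows from the right edge of each box to the left edge of the next
arrows(x0 = box_x[-3] + 0.1, y0 = 0.5, x1 = box_x[-1] - 0.1, y1 = 0.5, length = 0.1)
```

Because the labels live in one vector, relabeling a box is a one-word edit followed by re-knitting the document.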

Formulas

You can insert nicely typeset mathematical formulas into your paper using a simple syntax. For example, the code

V = \frac{4}{3} \pi r^3

becomes

\(V = \frac{4}{3} \pi r^3\)

when you sandwich it between two dollar signs, one before the code and one after it.
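As a short illustration, here is how the formula might be typed in the body of an R Markdown file. Single dollar signs keep the formula inline with the sentence, while doubled dollar signs set it on its own line as a display equation. The surrounding sentence is made up for the example.

```latex
The volume of a sphere is $V = \frac{4}{3} \pi r^3$, where $r$ is the radius.

$$
V = \frac{4}{3} \pi r^3
$$
```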

Tables

I used to spend a lot of time retyping numbers from the statistical software output into a table that was organized to be more readable. That is a tedious and error-prone practice, and if your data set changes, even just a little bit, you have to retype everything all over again. You can produce nice-looking tables in R Markdown. It takes about as much work as just retyping everything, but once the table looks nice, you can reuse the code for other projects. I am still learning how to do tables well, but here is some output that I generated to show the potential value.

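library(dplyr)  # %>% and rename()
library(broom)  # tidy()
library(knitr)  # kable()

# Fit the model, tidy the coefficients, rename one column, and print a table
lm(breaks ~ wool*tension, data = warpbreaks) %>%
  tidy() %>%
  rename(t.test = statistic) %>%
  kable(
    digits = c(1, 1, 1, 2, 4),
    caption = "Analysis of variance table")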

Table 1: Analysis of variance table

term            estimate  std.error  t.test  p.value
(Intercept)         44.6        3.6   12.22   0.0000
woolB              -16.3        5.2   -3.17   0.0027
tensionM           -20.6        5.2   -3.99   0.0002
tensionH           -20.0        5.2   -3.88   0.0003
woolB:tensionM      21.1        7.3    2.89   0.0057
woolB:tensionH      10.6        7.3    1.45   0.1543

I can change how the values in this table are rounded or the names in the header row with just one line of code. Other enhancements may take a bit more work.
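As a sketch of that kind of one-line change, the code below refits the same model and passes different digits and col.names arguments to kable(). The particular rounding and the header names are illustrative choices, and the object names fit and coef_table are hypothetical.

```r
library(broom)  # tidy()
library(knitr)  # kable()

# Same model as in the table above
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

coef_table <- kable(
  tidy(fit),
  # One value per column; the value for the text column "term" is ignored
  digits = c(1, 2, 2, 3, 3),
  # Replacement header row, one name per column of tidy(fit)
  col.names = c("Term", "Estimate", "Std. error", "t", "p-value")
)
coef_table
```

Changing the rounding means editing the digits line, and changing the header means editing the col.names line. The rest of the code stays the same.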

Output formats

You can specify a variety of output formats. The ones I use most often are

The PDF option requires that you have a version of LaTeX installed on your computer. I use MiKTeX, but other distributions work just fine.
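Output formats are set in the YAML header at the top of the R Markdown file. The fragment below is a minimal sketch that lists the PDF and Word formats. The title is a placeholder, and your own header may well carry more options.

```yaml
---
title: "Draft paper"       # placeholder title
output:
  pdf_document: default    # needs a LaTeX installation
  word_document: default
---
```

With a header like this, the Knit button in RStudio should render the first format listed, and the others can be chosen from the Knit menu.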

Bibliographies

If you have your references stored in BibTeX format, you can place them at the end of your document.

The bibliography always appears at the end of the document. I’m not showing a bibliography here for technical reasons, but producing one is pretty easy.
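A minimal sketch of the setup, assuming the references live in a file named references.bib in the same folder as the R Markdown file (both the file name and the title are placeholders):

```yaml
---
title: "Draft paper"          # placeholder title
bibliography: references.bib  # placeholder file name
output: word_document
---
```

In the body of the paper, a citation is written as [@key], where key is the citation key of an entry in the BibTeX file. Pandoc then appends the list of cited references after the last line of the document, so ending the file with a heading such as "References" gives that list a title.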

Final comments

Most of the people I collaborate with do not use R Markdown. I typically start a draft in R Markdown, and once it is ready to share, I send it to my collaborators in Word format and we make any further changes there. If I am the sole author, then I can keep the file in R Markdown.