Interesting links from the useR! 2022 conference

Steve Simon

2022-06-21

I am attending the useR! conference this week. It is in a virtual format, which has its advantages and disadvantages. I want to keep track of a variety of links that I picked up from the speakers at this conference.

Henrik Bengtsson - Futureverse: Parallelization in R

Abstract is available here.

A great short course. While there have been several approaches in R to take advantage of parallelization, these are neither standardized nor easily extensible. There is a suite of packages, known as the futureverse, that provides a simple cross-platform approach to speeding up your program using multiple cores on your computer or using a distributed computing system. The futureverse website provides a good overview of this work.
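A minimal sketch of the basic idea, using the future package from the futureverse. The function slow_square and the choice of two local workers are made up for illustration and are not taken from the short course.

```r
library(future)

# Assumption: a local multisession backend with two workers. Other
# backends are selected the same way, through plan().
plan(multisession, workers = 2)

# Hypothetical stand-in for an expensive computation.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# future() starts the evaluation in a background R session and returns
# immediately; value() waits until the result is ready.
f1 <- future(slow_square(3))
f2 <- future(slow_square(4))
results <- c(value(f1), value(f2))
print(results)

# Return to sequential processing, which also shuts down the workers.
plan(sequential)
```

The same code runs unchanged with a different plan(), which is the main selling point: the choice of backend is kept separate from the code that does the work.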

Paula Moraga - R for Geospatial Data Science and Public Health Surveillance

Abstract is available here.

This was a wide-ranging talk covering a variety of different tools.

Her book, Geospatial Health Data, looks quite interesting.

Session 5, Big Data Management (sponsored by Oracle)

Abstracts are available here.

Will Landau - Data version control for reproducible analysis pipelines

Version control is commonly used to track changes in code. You can also use version control to track the data created by this code. The problem is that the data dependencies on code are not easy to track and the data (because of its large size) often cannot reside in the same repository.

This talk covered two packages, gittargets and targets.
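A rough sketch of how the two packages fit together. The file name data/raw.csv, the target names, and the model are hypothetical and do not come from the talk. The pipeline itself lives in a file called _targets.R.

```r
# _targets.R (hypothetical pipeline; file and target names are made up)
library(targets)

list(
  # Track the raw file itself, so a change to it invalidates later targets.
  tar_target(raw_file, "data/raw.csv", format = "file"),
  tar_target(raw_data, read.csv(raw_file)),
  tar_target(model, lm(y ~ x, data = raw_data))
)

# From the R console, in a project that is already a Git repository:
#   targets::tar_make()               # build or update outdated targets
#   gittargets::tar_git_init()        # create a separate data repository
#   gittargets::tar_git_snapshot()    # store the data for the current code commit
#   gittargets::tar_git_checkout()    # restore the data matching the current code
```

The data snapshots are kept in their own repository, apart from the code, which addresses the size problem mentioned above.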

Ilias Moutsopoulos - bulkAnalyseR: An accessible, interactive pipeline for analysing and sharing bulk multi-modal sequencing data

I have difficulty following talks about genetics and this talk was no exception. This package appears to customize the workflow for RNA-seq data and probably other genetic datasets as well.

The author used the term “bespoke,” which I had not heard before. It is a synonym for custom built.

Oliver Reiter - Providing large trade datasets for research using Apache Arrow

This talk relies on several open source systems, including Apache Arrow, as well as an R package for checking to see when the source files were updated.

Phuong Quan - daiquiri: Data quality reporting for temporal datasets

A nice package that standardizes the quality review of data that has a time component. Of particular concern are time periods where there is no data at all or a time point at which the data “jumps” to a different state.

Session 14, Learning ggplot2

Abstracts are available here.

Jonathan Carroll - ggeasy: Easy access to ggplot2 commands

To modify elements of a plot in ggplot2, such as the axes and legend, you have to use the theme function, which is very powerful but also tedious and difficult to remember. The ggeasy package provides simple and easy-to-remember functions to modify these elements.
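A small comparison of the two styles, using the built-in mtcars data. The particular changes (rotated axis labels, legend at the bottom) were chosen for illustration and are not from the talk.

```r
library(ggplot2)
library(ggeasy)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

# Plain ggplot2: the element names inside theme() have to be remembered.
p_theme <- p +
  theme(axis.text.x = element_text(angle = 90, hjust = 1),
        legend.position = "bottom")

# ggeasy: similar changes through descriptively named helper functions.
p_easy <- p +
  easy_rotate_x_labels(angle = 90) +
  easy_move_legend(to = "bottom")
```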

Information about ggeasy is available on

Dr. Carroll included a minor modification of a cute xkcd cartoon. Here is the original.

Nicola Rennie - Learning ggplot2 with generative art

I learned from this talk what generative art is: art derived from a computer algorithm, often with a random component.

She included links to three GitHub repositories along with a Shiny app.

She talked about two ggplot2 geoms that are not that commonly used, but which lend themselves well to generative art:

She also combined graphs with disparate scales using the patchwork package.
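A minimal sketch of how patchwork combines plots with unrelated scales. The two plots below are placeholders built from mtcars, not the generative art from the talk.

```r
library(ggplot2)
library(patchwork)

# Two plots with completely different axes.
p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# "+" places the plots side by side; "/" would stack them vertically.
combined <- p1 + p2
```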

Dr. Rennie works at Jumping Rivers, an interesting company that does training and consulting in R, Python, Shiny, and machine learning.

This is totally unrelated to the talk, but around the same time I found a link that uses Stata to draw “spirographs” using parametric functions and polar coordinates.

James Otto - ggdensity: Improved bivariate density visualization in R

The default mode for displaying bivariate densities in ggplot2 uses level curves, which are difficult to interpret. This package calculates the smallest regions that contain 50%, 80%, 90%, and 95% of the estimated probability. It also offers alternative approaches for visualizing densities.
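A short sketch using geom_hdr from ggdensity with the built-in faithful data. The probabilities passed to probs match the ones listed above; the dataset is an arbitrary choice for illustration.

```r
library(ggplot2)
library(ggdensity)

# Shade the estimated highest density regions at the requested
# probability levels, with the raw points drawn on top.
p_hdr <- ggplot(faithful, aes(eruptions, waiting)) +
  geom_hdr(probs = c(0.95, 0.9, 0.8, 0.5)) +
  geom_point(size = 0.5)
```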

June Choe - Stepping into ggplot2 internals with ggtrace

The ggplot2 package hides most of the internal calculations from the user, which is mostly a good thing. But if you want to explore and tinker with these internal calculations, you will not find it easy. The ggtrace package allows you to interact with the internal structure and intercept and modify some of the internal arguments. It sounds dangerous and complicated, but it really is not that bad.

Cara Thompson - Level up your labels: Tips and tricks for annotating plots

Cara Thompson illustrates some simple annotation methods that can greatly improve your talk. Her slides are here.

I asked a question:

Cara Thompson, One thing I would like to do with annotations is to tilt the text to match the slope of a trend line. Do you know an easy way to do this?

She suggested the geom_textpath function, which looks like it is exactly what I want. There is a vignette and a github repository.
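A minimal sketch of text that follows a trend line, using geom_textsmooth from the same geomtextpath package. The label text and the linear fit are made up for illustration.

```r
library(ggplot2)
library(geomtextpath)

# geom_textsmooth() fits a smoother (here a straight line) and writes the
# label along the fitted line, so the text tilts to match the slope.
p_label <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_textsmooth(aes(label = "Heavier cars use more fuel"),
                  method = "lm", formula = y ~ x, hjust = 0.8)
```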

Yihui Xie

The slides are available at https://slides.yihui.org/2022-useR-blogdown.html

Julia Silge - Applied machine learning with tidymodels

Julia Silge is co-author of a book, Tidy Modeling with R, with a free version on the web.

She stressed the need for a “data budget” where you allocate data into various parts of the analysis, most notably testing and training.
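A minimal sketch of spending a data budget with the rsample package, which is part of tidymodels. The 75/25 split and the use of mtcars are assumptions for illustration, not taken from the talk.

```r
library(rsample)

set.seed(2022)

# Allocate 75% of the rows to training and hold out the rest for testing.
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

c(train = nrow(train), test = nrow(test))
```

The test set is set aside and used only once, at the very end, to estimate how well the final model performs.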

She also talked about deployment of models, which uses something I have heard a little bit about, a RESTful API. Here is a technical description, but there are probably resources out there that are more accessible to a beginner like me.

Kristen Hunter - Power Under Multiplicity Project (PUMP): Estimating power, minimum detectable effect size, and sample size when adjusting for multiple outcomes

I missed this talk but it is something I am very interested in. The software is available on the author’s github site and you can find the slides for this talk in PDF format.

Achim Zeileis - distributions3: From basic probability to probabilistic regression

The author prepared some wrappers that make it more convenient to use various probability functions in R. Here are a couple of references.

Hayes A, Moller-Trane R, Jordan D, Northrop P, Lang MN, Zeileis A, et al. (2022). “distributions3: Probability Distributions as S3 Objects.” R package version 0.2.0. Available in html format.

Lang MN, Zeileis A, Stauffer R, et al. (2022). “topmodels: Infrastructure for Inference and Forecasting in Probabilistic Models.” R package version 0.2-0. Available on R-Forge.
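A brief sketch of what working with distributions3 looks like. The standard normal distribution and the evaluation points are chosen only for illustration.

```r
library(distributions3)

# A distribution is an object; the same generic functions then work
# for any distribution family.
X <- Normal(mu = 0, sigma = 1)

p <- cdf(X, 1.96)        # P(X <= 1.96)
q <- quantile(X, 0.975)  # the 97.5th percentile
d <- pdf(X, 0)           # density at zero

set.seed(1)
draws <- random(X, 3)    # three random draws
```

Compare this with base R, where you have to remember a separate set of d/p/q/r functions and argument names for each distribution.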

The author used a data example from something called Tidy Tuesday, which is a repository updated weekly with interesting datasets useful to learning more about R and the tidyverse.

Pierre Masselot - The R package cirls: Constrained estimation in generalized linear models

A useful tool for putting constraints on a wide range of models. More information is at the author’s github site.

Hannah Frick - censored: A tidymodels package for survival analysis

The functions to do survival analysis in R have an unusual syntax that differs from most other statistical models. This talk showed a new package, censored, that makes accessing survival models much simpler and more consistent with a general framework, tidymodels. It relies heavily on the parsnip package.
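A minimal sketch of fitting a survival model through the parsnip interface. The Weibull model, the predictors, and the lung dataset from the survival package are assumptions made for illustration, not the example from the talk.

```r
library(survival)
library(parsnip)
library(censored)

# Specify the model and the engine, then fit, as for any other
# tidymodels model.
fit_weibull <- survival_reg(dist = "weibull") |>
  set_engine("survival") |>
  fit(Surv(time, status) ~ age + sex, data = lung)

# Predicted survival times come back as a tibble, one row per new case.
pred <- predict(fit_weibull, new_data = lung[1:3, ], type = "time")
print(pred)
```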

The authors deserve an award for the most interesting hex sticker. Here is the hex sticker for parsnip

Hex sticker for parsnip

and here is the hex sticker for censored.

Hex sticker for censored

This appears to be a parsnip that has been nibbled on in several places.

Mine Dogucu - Teaching accessibly and teaching accessibility

The final keynote talk. The slides are available here. The author developed a book for Bayesian methodology and used it as an example of how to address a variety of accessibility issues.

The slides played a tone each time a slide changed, with the tone increasing in pitch. This makes it easier to follow which slide the speaker is on. This is done with the function xaringanExtra::use_slide_tone().

There was a great quote on one of the slides.

“As you read the book and put Bayesian methodology into practice you will make mistakes. Many mistakes. Making and learning from mistakes is simply part of learning. We hope that you persist through the struggle of learning so that you can contribute your unique insights, perspectives, and experiences to the Bayesian community.”

She cited the Americans with Disabilities Act, of course, but also provided an international perspective with a reference to the United Nations Convention on the Rights of Persons with Disabilities. She mentioned the Okabe-Ito palette for color-blind viewers. She also stressed the importance of using alt text for images and relied on some suggestions in an article by Amy Cesal. She mentioned the SAS site on accessibility as an excellent resource.
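The Okabe-Ito palette ships with base R (version 4.0.0 or later), so no extra package is needed. A small sketch:

```r
# Retrieve the Okabe-Ito colors as a named vector of hex codes.
okabe_ito <- palette.colors(palette = "Okabe-Ito")
print(okabe_ito)

# The vector can be passed to any col argument in base graphics, or,
# after unname(), to the values argument of ggplot2::scale_colour_manual().
```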

Other odds and ends

I don’t know which talks mentioned these, but want to list some more links here so I don’t lose them.

There are a variety of regular expression libraries out there. One of them is tre.

A simple function for coding numbers into symbols, symnum.
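A short sketch of symnum, here coding p-values into the familiar significance stars. The p-values are made up, and the cutpoints are one common choice.

```r
# Hypothetical p-values.
p_values <- c(0.0004, 0.02, 0.07, 0.5)

# Each interval between consecutive cutpoints maps to one symbol.
# corr = FALSE says the numbers are not correlations.
stars <- symnum(p_values, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))
print(stars)
```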

A special article on research into software engineering.

An example of R release notes.

The scoringRules package.
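A tiny sketch of what scoringRules is for: scoring a probabilistic forecast against the value that was actually observed. The observation and the two normal forecasts are made up for illustration.

```r
library(scoringRules)

y <- 1.2  # hypothetical observed value

# Continuous ranked probability score (lower is better) for two
# normal forecasts with the same mean but different spread.
crps_sharp <- crps_norm(y, mean = 1, sd = 0.5)
crps_wide  <- crps_norm(y, mean = 1, sd = 3)

c(sharp = crps_sharp, wide = crps_wide)
```

The overly wide forecast gets the worse (larger) score here, even though both forecasts are centered at the same value.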

An overview of knitr engines.