ROSE: A package for binary imbalanced learning

Steve Simon

2017-05-02

Figure 1. Excerpt from website

Logistic regression and other statistical methods for predicting a binary outcome run into problems when the outcome is very rare, even in data sets large enough to ensure that the rare outcome occurs hundreds or thousands of times. A model optimized across all of the data is fit mostly to the negative cases, so it can easily ignore and misclassify all or almost all of the positive cases, which make up only a small percentage of the data. The ROSE package generates artificial balanced samples to allow for better estimation and better evaluation of the accuracy of the model.
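
To make the problem concrete, here is a minimal Python sketch under invented assumptions: 10,000 cases with a 1% positive rate, and a "model" that always predicts the negative class. It also shows a naive way to draw a class-balanced sample by resampling existing rows with replacement. This is only an illustration of the general idea of balancing. It is not the ROSE package's own method, and the function names (`accuracy`, `recall`, `balanced_resample`) are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of the truly positive cases that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(preds_for_positives) / len(preds_for_positives)


def balanced_resample(rows, labels, n, seed=0):
    """Draw n rows with replacement, choosing each class with probability 1/2.

    Naive illustration only: it reuses existing rows and does not
    generate new artificial cases.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    new_rows, new_labels = [], []
    for _ in range(n):
        label = rng.choice([0, 1])          # pick the class first
        new_rows.append(rng.choice(by_class[label]))  # then a row from it
        new_labels.append(label)
    return new_rows, new_labels


# Invented data: 100 positives among 10,000 cases (1% prevalence).
labels = [1] * 100 + [0] * 9900
rows = list(range(len(labels)))  # stand-in for real predictor rows

# A "model" that ignores the predictors and always says negative.
always_negative = [0] * len(labels)
print(accuracy(labels, always_negative))  # 0.99
print(recall(labels, always_negative))    # 0.0

# After balancing, positives make up roughly half of the sample.
_, balanced_labels = balanced_resample(rows, labels, n=2000)
print(sum(balanced_labels) / len(balanced_labels))
```

The always-negative rule scores 99% accuracy while finding none of the positive cases, which is why overall accuracy on the original data is a misleading yardstick here and why a balanced sample is useful for both fitting and evaluation.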

Nicola Lunardon, Giovanna Menardi, and Nicola Torelli. ROSE: A Package for Binary Imbalanced Learning. The R Journal 6/1, 79-89, June 2014. Available in PDF format.

You can find an earlier version of this page on my blog.