My entry into the Applications of R in Business Competition

Steve Simon


This page is currently being updated from the earlier version of my website. Sorry that it is not yet fully available.

I recently heard about a contest sponsored by Revolution Analytics. Revolution Analytics is offering $20,000 in prizes to the best examples of applying R to business problems. This competition is designed to grow the collection of on-line materials describing how to use R, and to spur adoption of R and Revolution R for business applications. The contest is open to all R users worldwide. I want to submit the R code that I developed for the accrual paper published by Byron Gajewski and me in 2008.

My submission is still an early draft, but you can look at it at –>

The R code was originally published at –>

but I have tweaked it a bit here.

duration.plot <- function(n,T,P,m,tm,B=1000,Tmax=2*T,sample.paths=0) {
# n is the target sample size
# T is the target completion time
# P is the prior certainty (0 <= P <= 1)
# m is the sample observed to date
# tm is the time to date
# B is the number of simulated duration times
# Tmax is the upper limit on the time axis of the graph
# set P to zero for a non-informative prior
# set m and tm to zero if no data has been accumulated yet.
  k <- n*P+m
  V <- T*P+tm
# Draw B samples from the inverse gamma distribution
  theta <- 1/rgamma(B,shape=k,rate=V)
# Create and fill a matrix of simulated trial durations
  simulated.duration <- matrix(NA,nrow=n-m,ncol=B)
  for (i in 1:B) {
    wait <- rexp(n-m,1/theta[i])
    simulated.duration[,i] <- tm+cumsum(wait)
# Each column is a separate simulated trial of waiting
# times for the remaining n-m patients. Calculate quantiles
# across the rows to get confidence limits on the waiting
# times for patients m+1, m+2, ..., n.
  lcl <- apply(simulated.duration,1,quantile,probs=0.025)
  mid <- apply(simulated.duration,1,quantile,probs=0.500)
  ucl <- apply(simulated.duration,1,quantile,probs=0.975)
# Graph the simulated accruals. Create a layout with two graphs
# The top graph is a histogram of the total estimated accrual for
# the full trial. The bottom graph is a display of predicted
# waiting times for patients m+1, m+2, ..., n.
# No margins on bottom, top, and right of histogram.
# create histogram with 40 bars.
  duration.hist <- cut(simulated.duration[n-m,],
    axes=FALSE,xlab=" ",ylab=" ",space=0,
    col="white",names.arg=rep(" ",39))
# No margins on boot, top, and right of graph of waiting
# times for patients m+1, m+2, ..., n.
# Layout graph and axes, but no data yet.
    ylab="Number of patients",type="n")
# Draw a polygon in gray ranging from lower to upper
# quantile limits.
# Draw a white line at the median.
# Draw a black line showing observed accrual through patient m.
# Draw a samll number of simulated accrual pathways
  while (sample.paths>0) {
    sample.paths <- sample.paths-1
# Calculate and return percentiles to estimated total accrual time.
  pctiles <- matrix(NA,nrow=2,ncol=3)
  dimnames(pctiles) <- list(c("Waiting time","Trial duration")
  pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V)
  pctiles[2,1] <- lcl[n-m]
  pctiles[2,2] <- mid[n-m]
  pctiles[2,3] <- ucl[n-m]

The function calls that produced the three figures shown below are

p0 <- duration.plot(n=350,T=3,P=0.5,m= 0,tm= 0,Tmax=10)

Accrual estimates based on prior distribution only

p1 <- duration.plot(n=350,T=3,P=0.5,m=41,tm=239/365,Tmax=10)

Accrual estimate based on informative prior and actual accrual data

p2 <- duration.plot(n=350,T=3,P=0, m=41,tm=239/365,Tmax=10)

Accrual estimate based on actual data and flat prior

The due date for a submission is October 31.

You can find an earlier version of this page on my original website.