When is a control chart not a control chart?

Steve Simon


I found a pair of data sets on the web. Both consist of counts, and in both one goal of the data collection is to see whether any of the individual counts differ from the overall average. They look quite similar and you might be tempted to analyze both of them using a control chart. But the second example is different in subtle

The first example comes from the NIST/SEMATECH e-Handbook of Statistical Methods. This data set comes from a semiconductor manufacturing plant. The plant produces wafers

The average rate of defects is 16. Did any wafers produce defects above the average of 16? You could take this literally and point out that the 3rd

The control limits are 4 and 28 (the center line of 16 plus or minus three times its square root, which is 4). Look for a single point outside the control limits (wafer #24) or eight consecutive points on the same side of the center line (no such examples in this chart). Some people will divide the control chart into three zones and use rules like 2 out of 3 consecutive points in Zone A
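The two rules described here (a single point outside the limits, or eight consecutive points on one side of the center line) can be sketched in R roughly as follows. This is a minimal illustration and not the NIST calculation itself: the function name c_chart and the counts in the example are made up, and it assumes Poisson counts, so the standard deviation is taken as the square root of the mean.

```r
# Sketch of a Shewhart c-chart: 3-sigma limits plus two out-of-control rules.
# ASSUMPTION: counts are Poisson, so the variance equals the mean.
c_chart <- function(counts) {
  center <- mean(counts)
  sigma <- sqrt(center)
  lcl <- max(0, center - 3 * sigma)   # a count cannot fall below zero
  ucl <- center + 3 * sigma

  # Rule 1: any single point outside the control limits
  outside <- which(counts < lcl | counts > ucl)

  # Rule 2: eight or more consecutive points on the same side of the
  # center line (a point exactly on the line breaks the run)
  side <- sign(counts - center)
  runs <- rle(side)
  run_end <- cumsum(runs$lengths)
  long_runs <- run_end[runs$lengths >= 8 & runs$values != 0]

  list(center = center, lcl = lcl, ucl = ucl,
       outside = outside, long_runs = long_runs)
}

# Hypothetical defect counts, for illustration only (not the NIST data)
counts <- c(16, 14, 18, 15, 12, 20, 13, 17, 31, 14)
chart <- c_chart(counts)
chart$outside     # positions of points outside the limits
chart$long_runs   # end positions of any runs of eight or more
```

The zone rules mentioned above could be added in the same style by comparing each point against the center line plus or minus one and two sigma.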

The Fraser Institute produces an Ontario hospital report card on a variety of measures. One measure is the volume of procedures performed at various hospitals. There is some evidence that hospitals that perform a large volume of procedures have better outcomes (an application

The average hospital performs 52.5 procedures. Which hospitals perform more than the overall average? Your first instinct might be to use a control chart again

There’s another difference and this is much more subtle. A control chart represents a continuous monitoring of a work process. Although there are some practical limits on the number of points in a control chart

Because a control chart has no obvious upper limit on its length
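To get a rough feel for why the length of a chart matters, here is a small sketch. It assumes a normal approximation and independent points, under which a single in-control point falls outside 3-sigma limits with probability of about 0.0027. The function name family_rate is made up for this illustration.

```r
# Chance that one in-control point falls outside 3-sigma limits
# ASSUMPTION: normal approximation, independent points
p_point <- 2 * pnorm(-3)

# Chance of at least one false alarm somewhere among k in-control points
family_rate <- function(k, p = p_point) 1 - (1 - p)^k

k <- c(1, 25, 100, 1000)
data.frame(points = k, any_false_alarm = round(family_rate(k), 3))
```

The per-point rate stays fixed, but the chance of seeing at least one false alarm somewhere on the chart keeps growing as points are added.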

With the hospital example


The ANOM model can be used for continuous outcomes

The general formula for balanced data is

where I is the number of groups and N is the total sample size. If there are n observations in each group
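For reference, the standard textbook form of the ANOM decision limits for balanced data can be written with the I and N used here. The remaining symbols are notation assumed for this sketch and may not match the original formula: the overall mean is written as \bar{y}_{..}, s is the square root of the within-group mean square, and h is the ANOM critical value at level \alpha with N - I degrees of freedom.

$$
\bar{y}_{..} \;\pm\; h_{\alpha;\,I,\,N-I}\; s\,\sqrt{\frac{I-1}{N}}
$$

With n observations in each group, N = In, so the last factor becomes \sqrt{(I-1)/(In)}.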

You can also compute the critical value using a multivariate t-distribution. There are a few complications. First, the distribution you are trying to describe represents deviations from an overall mean, so there will be correlations in the data since each group contributes to the overall mean. This correlation

In R

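library(mvtnorm)  # provides qmvt

i <- 25                                       # number of groups
co <- matrix(-1/(i-1), nrow=i, ncol=i)        # correlation between deviations
diag(co) <- rep(1, i)                         # ones on the diagonal
qmvt(p=0.95, tail="both.tails", corr=co, df=5000)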

For count data

[Formula is misplaced. I will try to restore it soon.]

and there are further modifications if the data represents proportions or rates
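As a rough illustration of how such a comparison might be coded for counts, here is a minimal sketch. It is not the misplaced formula. It uses one common normal-approximation form of ANOM limits for Poisson counts, with the mean count plus or minus h times the square root of the mean times the square root of (I - 1)/I. The critical value h is approximated by a Sidak-adjusted normal quantile rather than the exact ANOM value, which is an assumption made to keep the example in base R. The function name anom_counts and the volumes are made up.

```r
# Sketch of ANOM-style decision limits for I Poisson counts.
anom_counts <- function(counts, alpha = 0.05) {
  I <- length(counts)
  cbar <- mean(counts)

  # ASSUMPTION: a Sidak-adjusted normal quantile stands in for the exact
  # ANOM critical value, so the limits are only approximate.
  h <- qnorm(1 - (1 - (1 - alpha)^(1 / I)) / 2)

  half_width <- h * sqrt(cbar) * sqrt((I - 1) / I)
  lower <- max(0, cbar - half_width)
  upper <- cbar + half_width

  flag <- ifelse(counts > upper, "above",
                 ifelse(counts < lower, "below", "within"))
  list(center = cbar, lower = lower, upper = upper, flag = flag)
}

# Hypothetical procedure volumes, for illustration only (not the Fraser
# Institute data); they are chosen so that the mean is 52.5
volumes <- c(10, 48, 55, 52, 120, 50, 47, 38)
anom <- anom_counts(volumes)
data.frame(volume = volumes, flag = anom$flag)
```

Unlike the control chart, the limits here widen as the number of hospitals grows, because the critical value accounts for all I comparisons at once.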

Seven hospitals have volume significantly above the overall average and twelve have volume significantly below it.