Dear Professor Mean, I have developed a method to distinguish among several products that we need to buy so our company can make a good purchasing decision. I created a composite score which is a weighted average of several different indicators of quality. I want to use statistics to determine when two different products have significantly different composite scores.
It sounds like what you want is to select a product on the basis of the highest composite score, but if two composite scores are close, then you break the tie on the basis of price or convenience.
I would argue that statistics are a poor way to judge which scores are close. I’ve seen similar situations in medical research where a statistically significant change was of no practical importance. I helped with a medical study, for example, where we looked at various measures of male reproductive potential. One of the measures was semen pH. One group had a higher average semen pH, and the difference was statistically significant, but it was 6.9 versus 7.2 or something like that. Sperm can function well across a much broader range of pH levels, so I am told. You would need to see a full unit change in pH, or maybe even more, before any doctor would worry.
So I would suggest that you talk to the same people who helped you develop the weights for your composite score and ask them how much of a change in the composite score would be large enough to have a practical impact. This is an example where statistics is a poor substitute for human judgment.
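To make the idea concrete, here is a minimal Python sketch of the decision rule described above: compute each product's composite score as a weighted average, treat any product within a practically important margin of the best score as tied, and break the tie on price. Everything specific in it is an assumption for illustration only: the indicator names, the weights, the prices, the product labels, and especially the margin of 0.5, which in real use should come from the people who chose your weights rather than from me.

```python
def composite_score(indicators, weights):
    """Weighted average of the quality indicators for one product."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * indicators[name] for name in weights)
    return weighted_sum / total_weight


def choose_product(products, weights, practical_margin):
    """Pick the cheapest product among those practically tied with the best.

    practical_margin is the smallest difference in composite score that
    your experts consider important. It is a judgment call, not a statistic.
    """
    scores = {
        name: composite_score(info["indicators"], weights)
        for name, info in products.items()
    }
    best = max(scores.values())
    # Products whose score is within the margin of the best count as tied.
    contenders = [name for name, s in scores.items() if best - s < practical_margin]
    # Break the tie on price (convenience could be used instead).
    chosen = min(contenders, key=lambda name: products[name]["price"])
    return chosen, scores


# Hypothetical weights and products, invented purely for illustration.
weights = {"durability": 0.5, "support": 0.3, "ease_of_use": 0.2}
products = {
    "A": {"indicators": {"durability": 8, "support": 7, "ease_of_use": 9}, "price": 900},
    "B": {"indicators": {"durability": 8, "support": 8, "ease_of_use": 8}, "price": 1000},
    "C": {"indicators": {"durability": 5, "support": 6, "ease_of_use": 6}, "price": 700},
}

chosen, scores = choose_product(products, weights, practical_margin=0.5)
print(chosen, scores)
```

With these made-up numbers, product B has the highest composite score, but product A is within the margin and costs less, so A is chosen. Product C is the cheapest of all, but its score falls well outside the margin, so it never enters the tie-break.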
You can find an earlier version of this page on my original website.