Main site: http://www.sci.csueastbay.edu/~btrumbo/JSM2006/Outliers/
Proceedings Paper (pdf)
Poster (pdf)
R code (R)
Key Words: boxplot, outlier, simulation, R/S-Plus, pedagogy,
teaching undergraduates
Abstract: Computer packages often use boxplots of data to
indicate “outliers”: data values beyond fences located a certain multiple,
often 1.5, of the interquartile range (IQR) on either side of the box bounded
by the lower and upper quartiles. The simulations presented here show that this
definition of outlier yields surprisingly many false outlier indications in
normal data of small or moderate sample size, and that the proportion of such
indications is very sensitive to sample size. Simulation studies using R
investigate the behavior of such outlier indications for several sample sizes,
several multiples of IQR, and several parent populations. One behavior
studied is the proportion of simulated samples with one or more outlier
indications. Concepts and simulation programs are at a level appropriate for
use in undergraduate statistics classes.
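
The kind of simulation described above can be sketched in a few lines of R. This is a minimal illustration, not the authors' posted code: the helper names has_outlier and prop_flagged, the seed, the sample sizes, the number of replications, and the use of R's default quantile() quartiles are all assumptions made for the example. It estimates, for standard normal samples of several sizes, the proportion of samples with one or more values beyond the 1.5 IQR fences.

```r
# Hypothetical helper (not from the paper): TRUE if sample x has at least
# one value beyond the fences at k * IQR outside the quartiles.
has_outlier <- function(x, k = 1.5) {
  # Assumption: quartiles from quantile() with its default method (type 7);
  # other quartile definitions (e.g. hinges) give slightly different fences.
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  any(x < q[1] - k * iqr | x > q[2] + k * iqr)
}

# Hypothetical helper: proportion of simulated standard normal samples of
# size n that show one or more outlier indications.
prop_flagged <- function(n, k = 1.5, reps = 10000) {
  mean(replicate(reps, has_outlier(rnorm(n), k)))
}

set.seed(2006)                 # arbitrary seed, for reproducibility only
sizes <- c(10, 20, 50, 100)    # illustrative sample sizes
props <- sapply(sizes, prop_flagged)
print(data.frame(n = sizes, prop = props))
```

Changing the argument k in prop_flagged, or replacing rnorm with another random generator such as rexp, would extend the same sketch to other multiples of the IQR and other parent populations.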
Department of Statistics;