Abstract
One of the general goals in data analysis is the identification of unusual values. This can be done indirectly (after performing a robust analysis) or directly (via some detection procedure). This paper summarizes the backwards-stepping approach to the detection of unusual values in a data set. This approach has the advantages of simplicity of application, flexibility, and resistance to masking effects. Application to univariate, multivariate, and regression data, as well as other problems, is discussed. Simulations are used to investigate the properties of this strategy for data analysis. It is shown that identification of unusual values using appropriate detection procedures can be considerably more effective than indirect detection using a robust analysis.
I would like to thank Douglas M. Hawkins and Peter J. Rousseeuw for helpful discussion of this material.
AMS(MOS) subject classifications: 62F35
Copyright information
© 1991 Springer-Verlag New York, Inc.
About this paper
Cite this paper
Simonoff, J.S. (1991). General Approaches to Stepwise Identification of Unusual Values in Data Analysis. In: Directions in Robust Statistics and Diagnostics. The IMA Volumes in Mathematics and its Applications, vol 34. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4444-8_13
DOI: https://doi.org/10.1007/978-1-4612-4444-8_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-8772-8
Online ISBN: 978-1-4612-4444-8
eBook Packages: Springer Book Archive