General Approaches to Stepwise Identification of Unusual Values in Data Analysis

Conference paper in Directions in Robust Statistics and Diagnostics

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 34)

Abstract

One of the general goals in data analysis is the identification of unusual values. This can be done indirectly (after performing a robust analysis) or directly (via some detection procedure). This paper summarizes the backwards-stepping approach to the detection of unusual values in a data set. This approach has the advantages of simplicity of application, flexibility, and resistance to masking effects. Application to univariate, multivariate, and regression data, as well as other problems, is discussed. Simulations are used to investigate the properties of this strategy for data analysis. It is shown that identification of unusual values using appropriate detection procedures can be considerably more effective than indirect detection using a robust analysis.
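The backwards-stepping idea summarized above can be illustrated with a minimal univariate sketch. It is not the paper's exact procedure. It assumes approximately normal data, measures outlyingness by the distance from the mean in standard-deviation units, and replaces proper percentage points (which depend on the sample size and the step, as in Rosner's generalized ESD tables) with a single hypothetical cutoff `crit`. The function name `backwards_stepping_outliers` and its parameters are illustrative. A forward pass removes up to `max_outliers` of the most extreme values one at a time. A backward pass then tests the suspects from the least extreme back toward the most extreme and stops at the first significant step, which is what gives the approach its resistance to masking.

```python
from statistics import mean, stdev


def backwards_stepping_outliers(data, max_outliers, crit):
    """Flag univariate outliers with a backwards-stepping rule (illustrative sketch).

    Assumptions: approximately normal data, and a single hypothetical cutoff
    `crit` in place of percentage points that depend on n and the step.
    """
    if not 0 <= max_outliers <= len(data) - 3:
        raise ValueError("need 0 <= max_outliers <= len(data) - 3")

    remaining = list(data)
    suspects = []  # removed values, most extreme first
    stats = []     # standardized distance of each suspect from the subset it was removed from

    # Forward pass: repeatedly remove the value farthest from the current mean.
    for _ in range(max_outliers):
        m, s = mean(remaining), stdev(remaining)
        if s == 0:  # no spread left, so further statistics are undefined
            break
        x = max(remaining, key=lambda v: abs(v - m))
        stats.append(abs(x - m) / s)
        suspects.append(x)
        remaining.remove(x)

    # Backward pass: test the least extreme suspect first and step back toward
    # the most extreme; the first significant step flags that suspect and all
    # more extreme ones, so an earlier non-significant step cannot mask them.
    for k in range(len(stats), 0, -1):
        if stats[k - 1] > crit:
            return suspects[:k]
    return []


sample = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 30, 31]
print(backwards_stepping_outliers(sample, max_outliers=3, crit=2.5))  # [31, 30]
```

In this example the full-sample statistic for 31 is only about 2.18, because the neighbouring value 30 inflates the standard deviation, so a single test on the whole sample would miss both values. After 31 is set aside, the statistic for 30 is about 2.97; stepping back from the third suspect (about 1.73, not significant) reaches this step first, and both 30 and 31 are flagged.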

I would like to thank Douglas M. Hawkins and Peter J. Rousseeuw for helpful discussion of this material.

AMS(MOS) subject classifications: 62F35

Copyright information

© 1991 Springer-Verlag New York, Inc.

About this paper

Cite this paper

Simonoff, J.S. (1991). General Approaches to Stepwise Identification of Unusual Values in Data Analysis. In: Directions in Robust Statistics and Diagnostics. The IMA Volumes in Mathematics and its Applications, vol 34. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4444-8_13

  • DOI: https://doi.org/10.1007/978-1-4612-4444-8_13

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-8772-8

  • Online ISBN: 978-1-4612-4444-8

  • eBook Packages: Springer Book Archive