A procedure for outlier identification in data sets from continuous distributions

Abstract

We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulation, we obtain approximate quantiles for the asymptotic distribution (which does not depend on the specific model or on the dimension in which the data live) and, when parameters are estimated, for the finite-sample distribution of our statistic in different dimensions, under the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the proposed procedure is effective in identifying outliers and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
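
The abstract does not state the exact form of the statistic, so the following is only a minimal sketch of the general idea of summing reciprocals of p-values. It assumes a completely specified univariate standard normal model and two-sided tail p-values; the function names, the choice of p-value, and the absence of any normalization are assumptions made for illustration, not the authors' definitions.

```python
import math
import random


def two_sided_p_value(x: float) -> float:
    """P(|Z| >= |x|) for Z ~ N(0, 1), computed with the complementary error function."""
    return math.erfc(abs(x) / math.sqrt(2.0))


def reciprocal_p_sum(sample):
    """Sum of 1/p_i over the sample (hypothetical, unnormalized form of the statistic)."""
    return sum(1.0 / two_sided_p_value(x) for x in sample)


random.seed(0)

# A clean sample from the assumed model, and the same sample with one gross outlier added.
clean = [random.gauss(0.0, 1.0) for _ in range(50)]
contaminated = clean + [6.0]

stat_clean = reciprocal_p_sum(clean)
stat_contaminated = reciprocal_p_sum(contaminated)

print("clean sample:       ", stat_clean)
print("with one outlier:   ", stat_contaminated)
```

Under a correctly specified continuous model each p-value is uniform on (0, 1), so its reciprocal has the heavy tail P(1/p > t) = 1/t for t ≥ 1 and an infinite mean. A single observation far in the tail therefore dominates the sum, which is what makes a statistic of this kind responsive to outliers.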

References

  • Arnold, B. C. and Balakrishnan, N. (1989). Relations, Bounds and Approximations for Order Statistics, vol. 53 of Lecture Notes in Statistics. Springer-Verlag, New York.

  • Balakrishnan, N. and Cutler, C. D. (1996). Maximum likelihood estimation of the Laplace parameters based on Type-II censored samples. In H. N. Nagaraja, P. K. Sen, and D. F. Morrison, eds., Statistical Theory and Applications: Papers in Honor of Herbert A. David, pp. 145–151. Springer-Verlag, New York.

  • Barnett, V. and Lewis, T. (1993). Outliers in Statistical Data. John Wiley & Sons, New York, 3rd ed.

  • Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked adaptive computationally efficient outlier nominators. Computational Statistics and Data Analysis, 34:279–298.

  • Csörgő, S. (1990). A probabilistic approach to domains of partial attraction. Advances in Applied Mathematics, 11:282–327.

  • Csörgő, S. and Dodunekova, R. (1991). Limit theorems for the Petersburg game. In M. G. Hahn, D. M. Mason, and D. C. Wiener, eds., Sums, Trimmed Sums and Extremes, pp. 285–315. Birkhäuser, Boston.

  • David, H. A. (1981). Order Statistics. John Wiley & Sons, New York, 2nd ed.

  • Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88:782–792.

  • Fang, K. T., Kotz, S., and Ng, K. W. (1990). Elliptically Symmetric Multivariate and Related Distributions, vol. 36 of Monographs on Statistics and Applied Probability. Chapman and Hall, London.

  • Gnanadesikan, R. and Kettenring, J. R. (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124.

  • Huber, P. (1981). Robust Statistics. John Wiley & Sons, New York.

  • Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.

  • Martin-Löf, A. (1985). A limit theorem which clarifies the ‘Petersburg paradox’. Journal of Applied Probability, 22:634–643.

  • Rousseeuw, P. J. and Van Driessen, K. A. (1999). Fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223.

  • Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633–639.

  • Shafer, G. (1988). The St. Petersburg paradox. In S. Kotz, N. L. Johnson, and C. B. Read, eds., Encyclopedia of Statistical Sciences, vol. 8, pp. 865–870. Wiley, New York.

Author information

Corresponding author

Correspondence to N. Balakrishnan.

About this article

Cite this article

Balakrishnan, N., Quiroz, A.J. A procedure for outlier identification in data sets from continuous distributions. Test 13, 247–262 (2004). https://doi.org/10.1007/BF02603008

  • DOI: https://doi.org/10.1007/BF02603008
