Test, Volume 13, Issue 1, pp 247–262

A procedure for outlier identification in data sets from continuous distributions

Abstract

We propose a procedure, based on sums of reciprocals of p-values, for identifying outliers in univariate or multivariate data sets drawn from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulation, we obtain approximate quantiles of this asymptotic distribution, which does not depend on the specific model or on the dimension of the data. Also by simulation, we obtain approximate quantiles of the finite-sample distribution of our statistic in different dimensions when parameters are estimated, for the multivariate Gaussian model and for a multivariate double exponential model with independent coordinates. A Monte Carlo evaluation shows that the proposed procedure is effective at identifying outliers and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
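The idea of a statistic built from reciprocal p-values can be illustrated with a minimal Python sketch. This page does not give the paper's exact definitions, so everything below is an assumption: the p-values are two-sided tail probabilities under a completely specified univariate normal model, the statistic is normalized as S_n/n - log n with S_n the sum of the reciprocal p-values, and outliers are flagged by a hypothetical backward-stepping rule that removes the observation with the smallest p-value while the statistic exceeds a Monte Carlo quantile. The centering reflects the fact that, under a completely specified continuous model, each p-value is Uniform(0, 1), so P(1/p > t) = P(p < 1/t) = 1/t for t ≥ 1; the reciprocals have infinite mean, like the payoffs in the St. Petersburg game. All function names are illustrative.

```python
import math
import random


def two_sided_normal_pvalue(x, mu=0.0, sigma=1.0):
    """P(|Z| >= |z|) with z = (x - mu) / sigma under a fully specified N(mu, sigma^2).

    Assumption: this is one possible choice of p-value, not necessarily the paper's.
    """
    z = abs(x - mu) / sigma
    # Clamp so that extremely distant points (erfc underflows to 0) do not divide by zero.
    return max(math.erfc(z / math.sqrt(2.0)), 1e-300)


def reciprocal_pvalue_statistic(pvalues):
    """Hypothetical normalization S_n / n - log(n), where S_n = sum(1 / p_i).

    Under the null the p-values are iid Uniform(0, 1), so each 1/p_i satisfies
    P(1/p_i > t) = 1/t for t >= 1 and has infinite mean; subtracting log(n)
    is the usual centering for sums of such variables.
    """
    n = len(pvalues)
    s_n = sum(1.0 / p for p in pvalues)
    return s_n / n - math.log(n)


def null_quantile(n, level, reps=2000, seed=0):
    """Monte Carlo quantile of the statistic for a completely specified model."""
    rng = random.Random(seed)
    sims = []
    for _ in range(reps):
        # 1 - random() lies in (0, 1], so the reciprocal is always finite.
        pvals = [1.0 - rng.random() for _ in range(n)]
        sims.append(reciprocal_pvalue_statistic(pvals))
    sims.sort()
    return sims[min(reps - 1, int(level * reps))]


def flag_outliers(data, mu=0.0, sigma=1.0, level=0.95, reps=2000, seed=0):
    """Hypothetical backward-stepping rule (not taken from the paper).

    While the statistic exceeds the simulated null quantile, flag the
    observation with the smallest p-value, remove it, and recompute.
    """
    remaining = list(data)
    flagged = []
    while len(remaining) > 1:
        pvals = [two_sided_normal_pvalue(x, mu, sigma) for x in remaining]
        t = reciprocal_pvalue_statistic(pvals)
        if t <= null_quantile(len(remaining), level, reps, seed):
            break
        worst = min(range(len(remaining)), key=lambda i: pvals[i])
        flagged.append(remaining.pop(worst))
    return flagged
```

For instance, appending 10.0 and -9.0 to 50 standard-normal draws and passing the result to `flag_outliers` should flag 10.0 first and -9.0 second, since they have the smallest p-values. Recomputing the null quantile after each removal ignores the fact that the remaining p-values are no longer an unconditioned uniform sample, so this stopping rule is only a heuristic.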

Key Words

Outlier identification, St. Petersburg paradox, continuous distributions

AMS subject classification

62H99, 62G35

References

  1. Arnold, B. C. and Balakrishnan, N. (1989). Relations, Bounds and Approximations for Order Statistics, vol. 53 of Lecture Notes in Statistics. Springer-Verlag, New York.
  2. Balakrishnan, N. and Cutler, C. D. (1996). Maximum likelihood estimation of the Laplace parameters based on Type-II censored samples. In H. N. Nagaraja, P. K. Sen, and D. F. Morrison, eds., Statistical Theory and Applications: Papers in Honor of Herbert A. David, pp. 145–151. Springer-Verlag, New York.
  3. Barnett, V. and Lewis, T. (1993). Outliers in Statistical Data, 3rd ed. John Wiley & Sons, New York.
  4. Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked adaptive computationally efficient outlier nominators. Computational Statistics and Data Analysis, 34:279–298.
  5. Csörgő, S. (1990). A probabilistic approach to domains of partial attraction. Advances in Applied Mathematics, 11:282–327.
  6. Csörgő, S. and Dodunekova, R. (1991). Limit theorems for the Petersburg game. In M. G. Hahn, D. M. Mason, and D. C. Weiner, eds., Sums, Trimmed Sums and Extremes, pp. 285–315. Birkhäuser, Boston.
  7. David, H. A. (1981). Order Statistics, 2nd ed. John Wiley & Sons, New York.
  8. Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88:782–792.
  9. Fang, K. T., Kotz, S., and Ng, K. W. (1990). Elliptically Symmetric Multivariate and Related Distributions, vol. 36 of Monographs on Statistics and Applied Probability. Chapman and Hall, London.
  10. Gnanadesikan, R. and Kettenring, J. R. (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124.
  11. Huber, P. (1981). Robust Statistics. John Wiley & Sons, New York.
  12. Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.
  13. Martin-Löf, A. (1985). A limit theorem which clarifies the ‘Petersburg paradox’. Journal of Applied Probability, 22:634–643.
  14. Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223.
  15. Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633–639.
  16. Shafer, G. (1988). The St. Petersburg paradox. In S. Kotz, N. L. Johnson, and C. B. Read, eds., Encyclopedia of Statistical Sciences, vol. 8, pp. 865–870. Wiley, New York.

Copyright information

© Sociedad Española de Estadística e Investigación Operativa 2004

Authors and Affiliations

  1. Department of Mathematics and Statistics, McMaster University, Hamilton, Canada
  2. Departamento de Cómputo Científico y Estadística, Universidad Simón Bolívar, Venezuela