A procedure for outlier identification in data sets from continuous distributions

Balakrishnan, N.; Quiroz, A. J.

doi:10.1007/BF02603008

Published: June 2004

Test, volume 13, pages 247–262 (2004)

Abstract

We propose a procedure, based on sums of reciprocals of p-values, for identifying outliers in univariate or multivariate data sets drawn from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulation, we obtain approximate quantiles for the asymptotic distribution (which does not depend on the specific model or on the dimension in which the data live) and, when parameters are estimated, for the finite-sample distribution of our statistic in different dimensions, for the multivariate Gaussian model and for a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the proposed procedure is effective in identifying outliers and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
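
The raw ingredient named in the abstract, a sum of reciprocals of p-values, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the model is a completely specified univariate standard normal, the p-value is the two-sided tail probability of each observation, and the helper names `two_sided_p_value` and `sum_reciprocal_p_values` are hypothetical. The exact statistic, its normalization and its critical values are given in the paper and are not reproduced here.

```python
import math
import random


def two_sided_p_value(x, mu=0.0, sigma=1.0):
    """Two-sided normal tail probability P(|Z| >= |x - mu| / sigma).

    Assumption: the model is a fully specified N(mu, sigma^2); the paper
    may define the p-value of an observation differently.
    """
    z = abs(x - mu) / sigma
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for z >= 0.
    return math.erfc(z / math.sqrt(2.0))


def sum_reciprocal_p_values(sample, mu=0.0, sigma=1.0):
    """Unnormalized sum of 1 / p_i over the sample (illustrative only)."""
    return sum(1.0 / two_sided_p_value(x, mu, sigma) for x in sample)


random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(50)]
contaminated = clean + [6.0]  # one planted extreme observation

s_clean = sum_reciprocal_p_values(clean)
s_contaminated = sum_reciprocal_p_values(contaminated)
print(s_clean, s_contaminated)
```

An extreme observation has a tiny p-value, so its reciprocal dominates the sum; this is the intuition for why such a sum reacts strongly to outliers. Deciding whether a given value of the sum is unusually large requires the quantiles described in the abstract, which this sketch does not attempt to compute.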

References

Arnold, B. C. and Balakrishnan, N. (1989). Relations, Bounds and Approximations for Order Statistics, vol. 53 of Lecture Notes in Statistics. Springer-Verlag, New York.
Balakrishnan, N. and Cutler, C. D. (1996). Maximum likelihood estimation of the Laplace parameters based on Type-II censored samples. In H. N. Nagaraja, P. K. Sen, and D. F. Morrison, eds., Statistical Theory and Applications: Papers in Honor of Herbert A. David, pp. 145–151. Springer-Verlag, New York.
Barnett, V. and Lewis, T. (1993). Outliers in Statistical Data, 3rd ed. John Wiley & Sons, New York.
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked adaptive computationally efficient outlier nominators. Computational Statistics and Data Analysis, 34:279–298.
Csörgő, S. (1990). A probabilistic approach to domains of partial attraction. Advances in Applied Mathematics, 11:282–327.
Csörgő, S. and Dodunekova, R. (1991). Limit theorems for the Petersburg game. In M. G. Hahn, D. M. Mason, and D. C. Wiener, eds., Sums, Trimmed Sums and Extremes, pp. 285–315. Birkhäuser, Boston.
David, H. A. (1981). Order Statistics, 2nd ed. John Wiley & Sons, New York.
Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88:782–792.
Fang, K. T., Kotz, S., and Ng, K. W. (1990). Elliptically Symmetric Multivariate and Related Distributions, vol. 36 of Monographs on Statistics and Applied Probability. Chapman and Hall, London.
Gnanadesikan, R. and Kettenring, J. R. (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124.
Huber, P. (1981). Robust Statistics. John Wiley & Sons, New York.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.
Martin-Löf, A. (1985). A limit theorem which clarifies the ‘Petersburg paradox’. Journal of Applied Probability, 22:634–643.
Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223.
Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633–639.
Shafer, G. (1988). The St. Petersburg paradox. In S. Kotz, N. L. Johnson, and C. B. Read, eds., Encyclopedia of Statistical Sciences, vol. 8, pp. 865–870. Wiley, New York.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, McMaster University, Hamilton, Canada
N. Balakrishnan
Departamento de Cómputo Científico y Estadística, Universidad Simón Bolívar, Venezuela
A. J. Quiroz

Corresponding author

Correspondence to N. Balakrishnan.

About this article

Cite this article

Balakrishnan, N., Quiroz, A.J. A procedure for outlier identification in data sets from continuous distributions. Test 13, 247–262 (2004). https://doi.org/10.1007/BF02603008

Received: 15 August 2002
Accepted: 15 April 2003
Issue Date: June 2004
DOI: https://doi.org/10.1007/BF02603008
