Abstract
We propose a procedure, based on sums of reciprocals of p-values, for the identification of outliers in univariate or multivariate data sets coming from continuous distributions. Using results of Csörgő (1990), we find the limiting distribution of the relevant statistic for completely specified models. By simulation, we obtain approximate quantiles for the asymptotic distribution (which does not depend on the specific model or on the dimension in which the data live) and, in different dimensions, for the finite-sample distribution of our statistic when parameters are estimated, under the multivariate Gaussian model and a multivariate double exponential model with independent coordinates. Monte Carlo evaluation shows that the proposed procedure is effective in identifying outliers and that it is sensitive to sample size, a feature seldom found in outlier identification methods.
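To make the idea of a sum of reciprocals of p-values concrete, the following is a minimal sketch and not the authors' exact statistic. It assumes a completely specified univariate N(mu, sigma^2) model and a two-sided p-value for each observation; the abstract gives neither the precise definition of the p-values nor any centering or scaling, so the helper names `two_sided_p_value` and `reciprocal_p_sum` and the unnormalized sum are illustrative assumptions.

```python
import math
import random


def two_sided_p_value(x, mu=0.0, sigma=1.0):
    """Two-sided p-value of x under a fully specified N(mu, sigma^2) model.

    Assumption: this choice of p-value is illustrative; the abstract does
    not say how the paper defines its p-values.
    """
    z = abs(x - mu) / sigma
    # P(|Z| >= z) for standard normal Z equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def reciprocal_p_sum(data, mu=0.0, sigma=1.0):
    """Unnormalized sum of 1/p_i over the sample (hypothetical helper)."""
    return sum(1.0 / two_sided_p_value(x, mu, sigma) for x in data)


random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(200)]
# Replace the last observation with a gross outlier at 6 standard deviations.
contaminated = clean[:-1] + [6.0]

s_clean = reciprocal_p_sum(clean)
s_contaminated = reciprocal_p_sum(contaminated)
```

Under the model each p-value is uniform on (0, 1), so its reciprocal is heavy-tailed and a single very small p-value dominates the sum. In this sketch `s_contaminated` exceeds `s_clean` by many orders of magnitude; deciding whether a given sum is too large would require the quantiles the paper obtains, which are not reproduced here.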
References
Arnold, B. C. and Balakrishnan, N. (1989). Relations, Bounds and Approximations for Order Statistics, vol. 53 of Lecture Notes in Statistics. Springer-Verlag, New York.
Balakrishnan, N. and Cutler, C. D. (1996). Maximum likelihood estimation of the Laplace parameters based on Type-II censored samples. In H. N. Nagaraja, P. K. Sen, and D. F. Morrison, eds., Statistical Theory and Applications: Papers in Honor of Herbert A. David, pp. 145–151. Springer-Verlag, New York.
Barnett, V. and Lewis, T. (1993). Outliers in Statistical Data, John Wiley & Sons, New York, 3rd ed.
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked adaptive computationally efficient outlier nominators. Computational Statistics and Data Analysis, 34:279–298.
Csörgő, S. (1990). A probabilistic approach to domains of partial attraction. Advances in Applied Mathematics, 11:282–327.
Csörgő, S. and Dodunekova, R. (1991). Limit theorems for the Petersburg game. In M. G. Hahn, D. M. Mason, and D. C. Wiener, eds., Sums, Trimmed Sums and Extremes, pp. 285–315. Birkhäuser, Boston.
David, H. A. (1981). Order Statistics. John Wiley & Sons, New York, 2nd ed.
Davies, L. and Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88:782–792.
Fang, K. T., Kotz, S., and Ng, K. W. (1990). Elliptically Symmetric Multivariate and Related Distributions, vol. 36 of Monographs on Statistics and Applied Probability. Chapman and Hall, London.
Gnanadesikan, R. and Kettenring, J. R. (1972). Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81–124.
Huber, P. (1981). Robust Statistics. John Wiley & Sons, New York.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.
Martin-Löf, A. (1985). A limit theorem which clarifies the ‘Petersburg paradox’. Journal of Applied Probability, 22:634–643.
Rousseeuw, P. J. and Van Driessen, K. A. (1999). Fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212–223.
Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633–639.
Shafer, G. (1988). The St. Petersburg paradox. In S. Kotz, N. L. Johnson, and C. B. Read, eds., Encyclopedia of Statistical Sciences, vol. 8, pp. 865–870. Wiley, New York.
Cite this article
Balakrishnan, N., Quiroz, A.J. A procedure for outlier identification in data sets from continuous distributions. Test 13, 247–262 (2004). https://doi.org/10.1007/BF02603008
DOI: https://doi.org/10.1007/BF02603008