Abstract
A large number of statistics are used in the literature to detect outliers and influential observations in the linear regression model. This paper compares these statistics to determine which perform best. The comparison comprises (i) a detailed simulation study and (ii) analyses of several data sets studied by different authors. Several choices of the design matrix of the regression model are considered. Design A examines how well the statistics detect scale-shift outliers, while designs B and C examine how well they identify influential observations. Cutoff points for each statistic are obtained from its exact distribution together with Bonferroni's inequality. The results show that the studentized residual, which is used to detect mean-shift outliers, is also appropriate for detecting scale-shift outliers, and that Welsch's statistic and Cook's distance are appropriate for detecting influential observations.
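As a rough illustration of two of the statistics named in the abstract, the following sketch computes leverages, externally studentized residuals, and Cook's distances for an ordinary least squares fit. It uses the standard textbook formulas, not necessarily the exact forms or cutoffs used in the paper. The function name `influence_diagnostics` and the toy data are assumptions made here for illustration; they do not reproduce designs A, B, or C.

```python
import numpy as np

def influence_diagnostics(X, y):
    """Return leverages, externally studentized residuals, and Cook's distances.

    Hypothetical helper for illustration; X must include the intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages h_ii are the squared row norms of Q in the thin QR factorization.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    # Ordinary least squares fit and residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)

    # Internally studentized residuals.
    r = e / np.sqrt(s2 * (1.0 - h))
    # Externally studentized residuals (variance estimated without case i).
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))
    # Cook's distance.
    cook = (r**2 / p) * h / (1.0 - h)
    return h, t, cook

# Toy data (assumed): a straight line with small deterministic noise
# and one contaminated response at index 4.
x = np.arange(10, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.5 * np.sin(x)
y[4] += 10.0

h, t, cook = influence_diagnostics(X, y)
print("largest |t| at index", int(np.argmax(np.abs(t))))
```

Under a Bonferroni bound at overall level α, each of the n externally studentized residuals would be compared with the upper α/(2n) quantile of a t distribution with n − p − 1 degrees of freedom. That quantile can be taken from tables or, if SciPy is available, from `scipy.stats.t.ppf`.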
Cite this article
Hossain, A., Naik, D.N. A comparative study on detection of influential observations in linear regression. Statistical Papers 32, 55–69 (1991). https://doi.org/10.1007/BF02925479