Noncentralities Induced in Regression Diagnostics

Journal of Statistical Theory and Practice

Abstract

Anomalies persist in the use of deletion diagnostics in regression. Tests for outliers under subset deletions utilize the R-Fisher F_I statistics, each having a noncentral F-distribution with noncentrality parameter λ as a function of shifts only at deleted rows in the index set I. Numerous studies examine empirical outcomes of these diagnostics in random experiments. In contrast, studies here are probabilistic, examining distributions behind those empirical outcomes and tracking the effects of shifts at nondeleted rows. By allowing shifts at nondeleted rows in a set J, in addition to traditional shifts at deleted rows in I, F_I is shown to have a doubly noncentral F-distribution. By removing the unnecessary restriction that shifts occur only at deleted rows, these findings support constructs akin to power curves in tracking probabilities of masking or swamping as shifts evolve. In addition, “regression effects” among outliers may have unforeseen consequences. A dichotomy of shifts is discovered as projections into the “regressor” and “error” spaces of a model. Hidden shifts at nondeleted rows can obfuscate the meanings ascribed not only to traditional outlier diagnostics, but also to subset influence diagnostics corresponding one-to-one with F_I. In short, despite wide usage abetted by software support, deletion diagnostics in current vogue can no longer be recommended to achieve objectives traditionally cited. Case studies illustrate the debilitating effects of these anomalies in practice, together with conclusions misleading to prospective users.
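As an illustration only, the sketch below simulates a doubly noncentral F variate as the ratio of two independent noncentral chi-square variates, each divided by its degrees of freedom, and estimates how often it exceeds a critical value taken from the central case. It is not the authors' computation: the degrees of freedom, the noncentrality values, and all function names are hypothetical, and the mapping from shifts at rows in I and J to the two noncentrality parameters is left to the paper.

```python
import math
import random


def noncentral_chisquare(rng, df, nonc):
    """One draw from a noncentral chi-square (df >= 1, nonc >= 0).

    Convention assumed here: nonc is the sum of squared means of the
    underlying unit-variance normals.
    """
    # Put the whole mean shift in one coordinate; the rest is central chi-square.
    shifted = (rng.gauss(0.0, 1.0) + math.sqrt(nonc)) ** 2
    rest = rng.gammavariate((df - 1) / 2.0, 2.0) if df > 1 else 0.0
    return shifted + rest


def doubly_noncentral_f(rng, df1, df2, nonc1, nonc2):
    """Ratio of independent noncentral chi-squares, each divided by its df."""
    numerator = noncentral_chisquare(rng, df1, nonc1) / df1
    denominator = noncentral_chisquare(rng, df2, nonc2) / df2
    return numerator / denominator


def simulate(df1, df2, nonc1, nonc2, draws=20000, seed=0):
    """Return a list of simulated doubly noncentral F values."""
    rng = random.Random(seed)
    return [doubly_noncentral_f(rng, df1, df2, nonc1, nonc2) for _ in range(draws)]


def empirical_quantile(sample, prob):
    """Crude empirical quantile: the order statistic at index floor(prob * n)."""
    ordered = sorted(sample)
    return ordered[min(int(prob * len(ordered)), len(ordered) - 1)]


def exceedance_probability(critical_value, df1, df2, nonc1, nonc2, draws=20000, seed=1):
    """Monte Carlo estimate of P(F > critical_value)."""
    sample = simulate(df1, df2, nonc1, nonc2, draws, seed)
    return sum(value > critical_value for value in sample) / draws


if __name__ == "__main__":
    DF1, DF2 = 2, 20  # hypothetical degrees of freedom
    # Approximate 5% critical value from the central case (both noncentralities zero).
    critical = empirical_quantile(simulate(DF1, DF2, 0.0, 0.0), 0.95)
    for nonc2 in (0.0, 5.0, 10.0, 20.0):
        p = exceedance_probability(critical, DF1, DF2, nonc1=6.0, nonc2=nonc2)
        print(f"nonc1=6.0 nonc2={nonc2:5.1f} P(F > critical) ~ {p:.3f}")
```

Holding the numerator noncentrality fixed and raising the denominator noncentrality inflates the denominator, so the estimated exceedance probability falls. Plotting such probabilities against the noncentralities gives one simple reading of the "constructs akin to power curves" mentioned above, under the stated assumptions.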



Author information

Corresponding author

Correspondence to D. R. Jensen.

About this article

Cite this article

Jensen, D.R., Ramirez, D.E. Noncentralities Induced in Regression Diagnostics. J Stat Theory Pract 8, 141–165 (2014). https://doi.org/10.1080/15598608.2014.847758

  • DOI: https://doi.org/10.1080/15598608.2014.847758
