Noncentralities Induced in Regression Diagnostics

Jensen, D. R.; Ramirez, D. E.

doi:10.1080/15598608.2014.847758

Published: 01 June 2014

Volume 8, pages 141–165, (2014)
Journal of Statistical Theory and Practice

Abstract

Anomalies persist in the use of deletion diagnostics in regression. Tests for outliers under subset deletions utilize the R-Fisher F_I statistics, each having a noncentral F-distribution with noncentrality parameter λ as a function of shifts only at deleted rows in the index set I. Numerous studies examine empirical outcomes of these diagnostics in random experiments. In contrast, studies here are probabilistic, examining distributions behind those empirical outcomes and tracking the effects of shifts at nondeleted rows. By allowing shifts at nondeleted rows in a set J, in addition to traditional shifts at deleted rows in I, F_I is shown to have a doubly noncentral F-distribution. By removing the unnecessary restriction that shifts occur only at deleted rows, these findings support constructs akin to power curves in tracking probabilities of masking or swamping as shifts evolve. In addition, “regression effects” among outliers may have unforeseen consequences. A dichotomy of shifts is discovered as projections into the “regressor” and “error” spaces of a model. Hidden shifts at nondeleted rows can obfuscate the meanings ascribed not only to traditional outlier diagnostics but also to the subset influence diagnostics that correspond one-to-one with F_I. In short, despite wide usage abetted by software support, deletion diagnostics in current vogue can no longer be recommended for achieving the objectives traditionally cited. Case studies illustrate the debilitating effects of these anomalies in practice, together with the misleading conclusions they can present to prospective users.
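
As a rough illustration of why a second noncentrality matters, the sketch below samples a doubly noncentral F variate using its standard representation as a ratio of two independent scaled noncentral chi-square variates, then estimates how often it exceeds a central-F cutoff. It is only a Monte Carlo sketch under stated assumptions: the degrees of freedom, the noncentrality values `lam1` and `lam2`, and the function names are hypothetical choices made for illustration, and the mapping from shifts at rows in I and J to the two noncentralities is derived in the full article and is not reproduced here.

```python
import numpy as np


def sample_doubly_noncentral_f(df1, df2, lam1, lam2, size, rng):
    """Draw from a doubly noncentral F as a ratio of scaled noncentral chi-squares.

    Noncentralities follow NumPy's convention (sum of squared means);
    some texts define the parameter as half of that quantity.
    """
    def chi2(df, lam):
        # Use the central chi-square directly when the noncentrality is zero.
        if lam == 0:
            return rng.chisquare(df, size)
        return rng.noncentral_chisquare(df, lam, size)

    return (chi2(df1, lam1) / df1) / (chi2(df2, lam2) / df2)


def central_cutoff(df1, df2, level=0.95, size=200_000, seed=1):
    """Empirical upper quantile of the central F, used as the test cutoff."""
    rng = np.random.default_rng(seed)
    draws = sample_doubly_noncentral_f(df1, df2, 0.0, 0.0, size, rng)
    return float(np.quantile(draws, level))


def exceedance_probability(df1, df2, lam1, lam2, cutoff, size=200_000, seed=0):
    """Monte Carlo estimate of P(F > cutoff) under the given noncentralities."""
    rng = np.random.default_rng(seed)
    draws = sample_doubly_noncentral_f(df1, df2, lam1, lam2, size, rng)
    return float(np.mean(draws > cutoff))


if __name__ == "__main__":
    df1, df2 = 2, 20          # hypothetical degrees of freedom
    cutoff = central_cutoff(df1, df2)
    for lam2 in (0.0, 5.0, 20.0):  # hypothetical denominator noncentralities
        p = exceedance_probability(df1, df2, 6.0, lam2, cutoff)
        print(f"lam1=6.0, lam2={lam2:>4}: estimated P(F > cutoff) = {p:.3f}")
```

With the numerator noncentrality held fixed, a growing denominator noncentrality inflates the denominator and lowers the estimated exceedance probability, which is the kind of masking behavior that curves akin to power curves are meant to track.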

Author information

Authors and Affiliations

Department of Statistics, Virginia Tech, 406-A Hutcheson Hall, Blacksburg, Virginia, 24061-0439, USA
D. R. Jensen
Department of Mathematics, University of Virginia, Charlottesville, Virginia, USA
D. E. Ramirez

Corresponding author

Correspondence to D. R. Jensen.

About this article

Cite this article

Jensen, D.R., Ramirez, D.E. Noncentralities Induced in Regression Diagnostics. J Stat Theory Pract 8, 141–165 (2014). https://doi.org/10.1080/15598608.2014.847758

Received: 10 May 2012
Accepted: 05 April 2013
Issue Date: June 2014
DOI: https://doi.org/10.1080/15598608.2014.847758
