The Guinea Pig of Multiple Regression

  • Yadolah Dodge
Part of the Lecture Notes in Statistics book series (LNS, volume 109)


The role played by a 21×4 data set in the development (progress and regress) of multiple regression is discussed. While this ‘famous’ data set has been (and is being) used as a guinea pig for almost every method of estimation introduced in the regression market, it appears that no one has questioned its origin and correctness in the last 30 years. An attempt is made to clarify some points in this regard. In particular, it is argued here that this data set is in fact a subset of a longer data set, the rest of which is missing.

Key words and phrases

Data estimation meta-analysis multiple regression outliers robustness 

AMS 1991 subject classifications

Primary 62J05; secondary 62F35





Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Yadolah Dodge, University of Neuchâtel, Switzerland
