The Guinea Pig of Multiple Regression

Part of the Lecture Notes in Statistics book series (LNS, volume 109)


The role played by a 21×4 data set in the development (progress and regress) of multiple regression is discussed. While this ‘famous’ data set has been (and is being) used as a guinea pig for almost every method of estimation introduced in the regression market, it appears that no one has questioned its origin or correctness in the last 30 years. An attempt is made to clarify some points in this regard. In particular, it is argued here that this data set is in fact a subset of a larger data set, the rest of which is missing.

Key words and phrases

Data, estimation, meta-analysis, multiple regression, outliers, robustness

AMS 1991 subject classifications

Primary 62J05; secondary 62F35



Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  1. University of Neuchâtel, Switzerland