
Distortion in statistical inference: the distinction between data contamination and model deviation

  • Original Article
  • Metrika

Abstract

The present work develops a basic classification scheme for distortion in the framework of classical statistical inference. In particular, it emphasizes the distinction between data contamination and model deviation, which is still not drawn consistently. It explores when the two types of distortion can have different implications for the performance of statistical inference procedures, and how such differences can be detected. Finally, a critical review of some important approaches in the robustness and diagnostics literature indicates which of them are aimed at data contamination and which at model deviation (regardless of what was originally claimed for them). The paper aims to raise awareness of this problem through constructive discussion; it is not meant to introduce new methodology.
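The abstract does not define the two notions formally. As a rough illustration only, the Python sketch below assumes one common operationalization: data contamination is modelled by the gross-error mixture (1 − ε)·N(μ, 1) + ε·δ₁₀, so that most observations still follow the assumed model and the target remains μ; model deviation is taken to mean that every observation comes from a distribution outside the assumed normal family (here Exp(1)). The function names, parameter values and this choice of models are our own assumptions and are not taken from the paper.

```python
import random
import statistics

def contaminated_sample(n, eps, mu=0.0, outlier=10.0, seed=1):
    """Gross-error (epsilon-contamination) sample: each point is drawn from
    N(mu, 1) with probability 1 - eps, otherwise it is a gross error at `outlier`.
    (Illustrative assumption, not the paper's definition.)"""
    rng = random.Random(seed)
    return [outlier if rng.random() < eps else rng.gauss(mu, 1.0) for _ in range(n)]

def deviating_sample(n, seed=2):
    """Model-deviation sample: every point comes from Exp(1), a distribution
    outside the assumed normal family; no single observation is 'bad'."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n)]

def summarize(xs):
    """Return the sample mean and sample median."""
    return statistics.mean(xs), statistics.median(xs)

n = 20_000
mean_c, med_c = summarize(contaminated_sample(n, eps=0.1))
mean_d, med_d = summarize(deviating_sample(n))

# Contamination: the target is mu = 0 of the core model. The mean is pulled
# towards the gross errors (about 0.9*0 + 0.1*10 = 1); the median moves only
# slightly (to about 0.14).
print(f"contamination:   mean={mean_c:.3f}  median={med_c:.3f}")

# Model deviation: mean (about 1) and median (about ln 2 = 0.693) estimate
# different features of the same true distribution; there is no core-model
# value that one of them is 'closer' to.
print(f"model deviation: mean={mean_d:.3f}  median={med_d:.3f}")
```

In this sketch, it makes sense under contamination to ask which estimator stays closer to μ, whereas under model deviation the same comparison has no obvious reference value, which is one way to see why the two kinds of distortion may call for different assessments.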



Author information


Correspondence to V. Grunert da Fonseca.


About this article

Cite this article

Fonseca, V.G.d., Fieller, N.R.J. Distortion in statistical inference: the distinction between data contamination and model deviation. Metrika 63, 169–190 (2006). https://doi.org/10.1007/s00184-005-0010-2



  • DOI: https://doi.org/10.1007/s00184-005-0010-2
