Abstract
This paper develops a basic classification scheme for distortion within the framework of classical statistical inference. In particular, it emphasizes the distinction between data contamination and model deviation, which has yet to be drawn consistently. It explores when the two types of distortion can have different implications for the performance of statistical inference procedures and how these can be detected. A critical review of some important approaches in the robustness and diagnostics literature then indicates which of them are aimed at data contamination and which at model deviation (regardless of what was originally claimed). The paper raises awareness of this problem through a constructive discussion; it is not meant to introduce new methodology.
Cite this article
Fonseca, V.G.d., Fieller, N.R.J. Distortion in statistical inference: the distinction between data contamination and model deviation. Metrika 63, 169–190 (2006). https://doi.org/10.1007/s00184-005-0010-2
DOI: https://doi.org/10.1007/s00184-005-0010-2