Statistics and Computing, Volume 24, Issue 3, pp 481–491

Outlier detection in contingency tables based on minimal patterns

Abstract

A new technique for the detection of outliers in contingency tables is introduced, where outliers are unusual cell counts with respect to classical loglinear Poisson models. Subsets of cell counts called minimal patterns are defined, corresponding to non-singular design matrices and leading to potentially uncontaminated maximum-likelihood estimates of the model parameters and thereby the expected cell counts. A criterion to easily produce minimal patterns in the two-way case under independence is derived, based on the analysis of the positions of the chosen cells. A simulation study and a couple of real-data examples are presented to illustrate the performance of the newly developed outlier identification algorithm, and to compare it with other existing methods.
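The idea of a minimal pattern can be illustrated with a small sketch for the two-way independence model. This is not the authors' algorithm; it only assumes, following the abstract, that a minimal pattern is a set of as many cells as model parameters whose rows of the design matrix form a non-singular matrix, and that all chosen counts are positive. The function names `independence_design`, `is_minimal_pattern` and `expected_from_pattern`, the corner-point parameterisation, and the example table are hypothetical choices made for illustration.

```python
import numpy as np

def independence_design(n_rows, n_cols):
    """Design matrix of the independence loglinear model, one row per cell
    (row-major order). Columns: intercept, row effects for rows 1..I-1,
    column effects for columns 1..J-1 (corner-point coding, an assumption)."""
    X = np.zeros((n_rows * n_cols, n_rows + n_cols - 1))
    for i in range(n_rows):
        for j in range(n_cols):
            r = i * n_cols + j
            X[r, 0] = 1.0
            if i > 0:
                X[r, i] = 1.0
            if j > 0:
                X[r, n_rows - 1 + j] = 1.0
    return X

def is_minimal_pattern(cells, n_rows, n_cols):
    """True if the chosen cells give a square, non-singular sub-design matrix."""
    X = independence_design(n_rows, n_cols)
    idx = [i * n_cols + j for i, j in cells]
    p = X.shape[1]
    return bool(len(idx) == p and np.linalg.matrix_rank(X[idx]) == p)

def expected_from_pattern(table, cells):
    """Expected counts for the whole table, estimated from the chosen cells only.
    With as many cells as parameters, the Poisson ML fit reproduces the chosen
    counts exactly, so the parameters solve X_sub @ beta = log(counts)."""
    table = np.asarray(table, dtype=float)
    n_rows, n_cols = table.shape
    X = independence_design(n_rows, n_cols)
    idx = [i * n_cols + j for i, j in cells]
    beta = np.linalg.solve(X[idx], np.log(table.ravel()[idx]))
    return np.exp(X @ beta).reshape(n_rows, n_cols)

# Hypothetical 3x3 table: exactly independent except for a contaminated cell (2, 2).
clean = np.outer([1, 2, 3], [10, 20, 30]).astype(float)
table = clean.copy()
table[2, 2] = 300.0

# First row plus first column: 3 + 3 - 1 = 5 cells, none of them contaminated.
pattern = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
# Five cells containing a 2x2 block: the sub-design matrix is singular.
bad_pattern = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]

expected = expected_from_pattern(table, pattern)
# Pearson-type residuals; a large value points at an unusual cell.
residuals = (table - expected) / np.sqrt(expected)
```

In this toy example the pattern avoids the contaminated cell, so the estimated expected counts equal those of the clean table and the residual is non-zero only at cell (2, 2). A practical procedure would still need a rule for choosing or searching over patterns and a threshold for declaring outliers, neither of which is specified here.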

Keywords

Contingency tables, Robustness, Loglinear models, Outliers, Minimal patterns

Notes

Acknowledgements

FR is partially supported by the Italian Ministry for University and Research, programme PRIN2009, grant number 2009H8WPX5.


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Faculty of Statistics, TU Dortmund University, Dortmund, Germany
  2. Department DISIT, Università del Piemonte Orientale, Alessandria, Italy
