Exploring incomplete data using visualization techniques

  • Matthias Templ
  • Andreas Alfons
  • Peter Filzmoser
Regular Article

Abstract

Visualizing incomplete data makes it possible to explore the data and the structure of their missing values simultaneously. This helps in learning about the distribution of the incomplete information in the data and in identifying possible structures of the missing values and their relation to the available information. The main goal of this contribution is to stress the importance of exploring missing values using visualization methods and to present a collection of such visualization techniques for incomplete data, all of which are implemented in the R package VIM. With this functionality available in such a widely used statistical environment, visualization of missing values, imputation and data analysis can all be done from within R without the need for additional software.

Keywords

  • Visualization
  • Missing values
  • Exploring incomplete data
  • R software

Mathematics Subject Classification (2010)

  • 62-XX Statistics
  • 00A66 Mathematics and visual arts, visualization

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Matthias Templ (1, 2)
  • Andreas Alfons (1, 3)
  • Peter Filzmoser (1)

  1. Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
  2. Methods Unit, Statistics Austria, Vienna, Austria
  3. Faculty of Business and Economics, ORSTAT Research Center, K.U. Leuven, Leuven, Belgium
