Computational Statistics, Volume 34, Issue 4, pp 1537–1563

Clicks and cliques: exploring the soul of the community

  • Natalia da Silva
  • Ignacio Alvarez-Castro
Original paper


In this paper we analyze 26 communities across the United States with the aim of understanding what attaches people to their community and how this attachment differs among communities. How different are attached people from unattached people? What attaches people to their community? How different are the communities? What are the key drivers behind emotional attachment? To address these questions, graphical, supervised, and unsupervised learning tools were used, and information from the Census Bureau and the Knight Foundation was combined. Using the same pre-processed variables as Knight (Soul of the community, Technical report, 2010) would most likely drive the results towards the same conclusions as those of the Knight Foundation, so this paper does not use those variables.


Community attachment; Exploratory data analysis; Random forests; Statistical visualization; Mosaic plot; Product plot



Professor Di Cook helped with the analysis and with reviewing the paper. Xiaoyue Cheng’s mergeGui was used to create more amenable initial data. Israel Almodovar, Vianey Leos and Ricardo Martinez helped us proofread the paper for final corrections.


  1. Atkinson AB, Bourguignon F (2000) Handbook of income distribution, vol 1. Elsevier, Amsterdam
  2. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  3. Dahl DB (2016) xtable: export tables to LaTeX or HTML. R package version 1.8-2
  4. de Vries A, Ripley BD (2016) ggdendro: create dendrograms and tree diagrams using ‘ggplot2’. R package version 0.1-20
  5. Fox J, Weisberg S (2011) An R companion to applied regression, 2nd edn. Sage, Thousand Oaks
  6. Friendly M (1994) Mosaic displays for multi-way contingency tables. J Am Stat Assoc 89(425):190–200
  7. Handcock MS (2016) Relative distribution methods. Los Angeles, CA. Version 1.6-6. Project home page at
  8. Hartigan JA, Kleiner B (1981) Mosaics for contingency tables. In: Computer science and statistics: proceedings of the 13th symposium on the interface. Springer, pp 268–273
  9. Hofmann H, Wickham H, Cook D (2019) The 2013 data expo of the American Statistical Association. Comput Stat 25:551–554
  10. Hummel J (1996) Linked bar charts: analysing categorical data graphically. Comput Stat 11(1):23–33
  11. Inselberg A (2009) Parallel coordinates: visual multidimensional geometry and its applications. Springer, Berlin, Heidelberg
  12. Izenman A (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning, 1st edn. Springer. ISBN: 0387781889
  13. Knight F (2010) Soul of the community. Technical report.
  14. Liaw A, Wiener M (2002) Classification and regression by randomforest. R News 2(3):18–22
  15. Lumley T (2018) Survey: analysis of complex survey samples. R package version 3.34
  16. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. Oakland, CA, USA, pp 281–297
  17. R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
  18. Rushton M (2008) A note on the use and misuse of the racial diversity index. Policy Stud J 36(3):445–459
  19. Schloerke B, Crowley J, Cook D, Briatte F, Marbach M, Thoen E, Elberg A, Larmarange J (2018) GGally: extension to ‘ggplot2’. R package version 1.4.0
  20. U.S. Census Bureau (2011) 2011 American community survey 5-year estimates. Tables B01001, B01003, B02001, B05005, B08202, B19001, B19301, B23006, B25026.
  21. Wickham H (2017) tidyverse: easily install and load the ‘Tidyverse’. R package version 1.2.1
  22. Wickham H, Hofmann H (2011) Product plots. IEEE Trans Vis Comput Graph 17(12):2223–2230
  23. Wickham H, Hofmann H (2016) productplots: product plots for R. R package version 0.1.1
  24. Xie Y (2018) knitr: a general-purpose package for dynamic report generation in R. R package version 1.20

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics, Iowa State University, Ames, USA
