Soul of the community: an attempt to assess attachment to a community

  • Anna Quach
  • Jürgen Symanzik
  • Nicole Forsgren
Original paper


In this article, we work with data from the Soul of the Community survey project, which was conducted by the Knight Foundation from 2008 to 2010. Overall, 26 communities across the United States, with a total of more than 47,800 participants, took part in this study. Each year, around 200 different questions were posed to each participant. One key variable is attachment to one’s community. We use various machine learning algorithms to assess which factors may have an effect on attachment.


Keywords: Random forests, Archetypes, Knight Foundation, Community attachment



We would like to thank Dr. Adele Cutler for her input on the methodology of this manuscript and for providing access to her archetype software. In addition, we would like to thank the reviewers for their helpful comments and suggestions. This article was submitted prior to Jürgen Symanzik becoming Editor-in-Chief of Computational Statistics, and was handled by Yuichi Mori, the previous Editor-in-Chief.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, Utah State University, Logan, USA
  2. Department of Management Information Systems, Utah State University, Logan, USA
