Computational and Mathematical Organization Theory, Volume 18, Issue 3, pp 300–327

Data-to-model: a mixed initiative approach for rapid ethnographic assessment

  • Kathleen M. Carley
  • Michael W. Bigrigg
  • Boubacar Diallo
Special issue: Data to Model


Rapid ethnographic assessment is used when there is a need to quickly create a socio-cultural profile of a group or region. While such an assessment can take many forms, we view it as providing insight into who the key actors are; what the key issues, sentiments, resources, activities and locations are; how these have changed in recent times; and what roles the various actors play. We propose a mixed-initiative rapid ethnographic approach that supports socio-cultural assessment through a network analysis lens. We refer to this as the data-to-model (D2M) process. In D2M, semi-automated, computer-based text-mining and machine learning techniques are used to extract networks linking people, groups, issues, sentiments, resources, activities and locations from vast quantities of texts. Human-in-the-loop procedures are then used to tune and correct the extracted data and to refine the computational extraction. Finally, computational post-processing refines the extracted data further and augments it with other information, such as the latitude and longitude of particular cities. We describe this methodology and illustrate its key challenges using three distinct data sets. We find that the data-to-model approach provides a reusable, scalable and fast way to generate a rapid ethnographic assessment, one in which human effort and coding errors are reduced and the resulting coding can be replicated.
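The abstract describes D2M only at a high level. To make the extract, correct and augment cycle concrete, here is a minimal Python sketch under several assumptions that are not taken from the paper: entities are found by looking up a hand-built thesaurus, two concepts are linked when they co-occur in the same sentence, analyst corrections take the form of thesaurus edits plus a delete list, and post-processing geo-tags location nodes from a gazetteer. The thesaurus entries, the place name "Riverton", its placeholder coordinates and every function name are hypothetical; this is an illustration, not the authors' toolchain.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus (an assumption, not the authors' data):
# surface phrase (lower-case) -> (canonical concept, node class).
# Node classes follow the entity types named in the abstract. In a
# human-in-the-loop cycle, analysts would edit this table and re-run extraction.
THESAURUS = {
    "ministry of health": ("Ministry_of_Health", "group"),
    "minister": ("Health_Minister", "person"),
    "cholera": ("cholera", "issue"),
    "clean water": ("clean_water", "resource"),
    "vaccination": ("vaccination", "activity"),
    "riverton": ("Riverton", "location"),
}

# Hypothetical gazetteer for post-processing; the coordinates are placeholders.
GAZETTEER = {"Riverton": (10.0, 20.0)}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_concepts(sentence: str, thesaurus: dict) -> set[tuple[str, str]]:
    """Return the (concept, node_class) pairs whose phrase occurs as whole words."""
    lowered = sentence.lower()
    return {
        entry
        for phrase, entry in thesaurus.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }


def extract_meta_network(texts, thesaurus, delete_list=frozenset()):
    """Link every pair of concepts that co-occur in a sentence; weight = co-occurrence count.

    delete_list models an analyst's correction: listed concepts are dropped.
    """
    nodes: dict[str, dict] = {}
    edges: Counter = Counter()
    for text in texts:
        for sentence in split_sentences(text):
            concepts = sorted(
                c for c in extract_concepts(sentence, thesaurus) if c[0] not in delete_list
            )
            for concept, node_class in concepts:
                nodes.setdefault(concept, {"class": node_class})
            # Sorted input gives each undirected link a stable (a, b) key.
            for (a, _), (b, _) in combinations(concepts, 2):
                edges[(a, b)] += 1
    return nodes, edges


def geotag(nodes, gazetteer):
    """Post-processing: attach latitude/longitude to location nodes found in the gazetteer."""
    for concept, attrs in nodes.items():
        if attrs["class"] == "location" and concept in gazetteer:
            attrs["lat"], attrs["lon"] = gazetteer[concept]
    return nodes


if __name__ == "__main__":
    docs = [
        "The Ministry of Health reported cholera in Riverton. Clean water is scarce.",
        "The minister announced a vaccination drive in Riverton.",
    ]
    nodes, edges = extract_meta_network(docs, THESAURUS)
    geotag(nodes, GAZETTEER)
    for (a, b), weight in sorted(edges.items()):
        print(f"{a} -- {b} (weight {weight})")
    print(nodes["Riverton"])
```

On the two sample documents the script finds six concepts and six sentence-level links, each with weight 1. Re-running with `delete_list={"Riverton"}` drops that node and its links, which is one simple way to model an analyst's correction feeding back into the next extraction pass. Real systems would replace the thesaurus lookup with trained entity extractors and a richer link rule; the sentence window here is only the simplest choice.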


Keywords: Text-mining, Network-analysis, Meta-network, Social-networks, Agent-based simulation, Data analysis, Newspaper data



This work was supported in part by the Office of Naval Research under ONR-N000140811223 (SORASCS), ONR-N000140910667 (CATNET) and ONR-N000140811186 (Ethnographic), and by W15P7T-09-C-8324, awarded by CERDEC-C2D under the THINK ATO. Additional support was provided by the Center for Computational Analysis of Social and Organizational Systems (CASOS). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Office of Naval Research, CERDEC, the Department of Defense or the U.S. government.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Kathleen M. Carley (1)
  • Michael W. Bigrigg (2)
  • Boubacar Diallo (2)

  1. Wean 5130, ISR, SCS, Carnegie Mellon University, Pittsburgh, USA
  2. Carnegie Mellon University, Pittsburgh, USA