Data-to-model: a mixed initiative approach for rapid ethnographic assessment

  • SI: Data to Model
  • Computational and Mathematical Organization Theory

Abstract

Rapid ethnographic assessment is used when there is a need to quickly create a socio-cultural profile of a group or region. While such an assessment can take many forms, we view it as providing insight into who the key actors are; what the key issues, sentiments, resources, activities and locations are; how these have changed in recent times; and what roles the various actors play. We propose a mixed initiative rapid ethnographic approach that supports socio-cultural assessment through a network analysis lens. We refer to this as the data-to-model (D2M) process. In D2M, semi-automated computer-based text-mining and machine learning techniques are used to extract networks linking people, groups, issues, sentiments, resources, activities and locations from vast quantities of texts. Human-in-the-loop procedures are then used to tune and correct the extracted data and refine the computational extraction. Computational post-processing is then used to refine the extracted data and augment it with other information, such as the latitude and longitude of particular cities. This methodology is described and key challenges are illustrated using three distinct data sets. We find that the data-to-model approach provides a reusable, scalable and fast way to generate a rapid ethnographic assessment in which human effort and coding errors are reduced, and the resulting coding can be replicated.
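
The three D2M stages named in the abstract (extraction, human-in-the-loop correction, post-processing) can be pictured with a small sketch. This is not the authors' implementation or the API of any tool they used. It assumes a simple thesaurus lookup with windowed co-occurrence as a stand-in for the text-mining step, and every name and value in it (`THESAURUS`, `GAZETTEER`, `extract_network`, `annotate_nodes`, the sample texts, the coordinates) is hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical analyst-maintained thesaurus: token -> (canonical concept, category).
THESAURUS = {
    "elders": ("elder_council", "group"),
    "water": ("water", "resource"),
    "protests": ("protest", "activity"),
    "riverton": ("riverton", "location"),
}

# Hypothetical gazetteer for post-processing; the coordinates are placeholders.
GAZETTEER = {"riverton": (10.0, 20.0)}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_network(texts, thesaurus, window=4):
    """Stage 1 (assumed technique): link concepts found fewer than `window` tokens apart."""
    edges = Counter()
    for text in texts:
        # Keep only tokens the thesaurus recognizes, with their positions.
        hits = [(i, thesaurus[tok][0])
                for i, tok in enumerate(tokenize(text)) if tok in thesaurus]
        for (i, a), (j, b) in combinations(hits, 2):
            if a != b and j - i < window:
                edges[tuple(sorted((a, b)))] += 1
    return edges


def annotate_nodes(edges, thesaurus, gazetteer):
    """Stage 3: attach a category to each node, plus coordinates for known locations."""
    categories = {concept: cat for concept, cat in thesaurus.values()}
    nodes = {}
    for pair in edges:
        for concept in pair:
            attrs = {"category": categories[concept]}
            if concept in gazetteer:
                attrs["lat"], attrs["lon"] = gazetteer[concept]
            nodes[concept] = attrs
    return nodes


# Invented sample texts.
texts = [
    "The elders met in Riverton to discuss water.",
    "Protests over water spread.",
    "The council controls water access.",
]

# First automated pass: "council" is not in the thesaurus, so the third text yields no link.
edges_first = extract_network(texts, THESAURUS)

# Stage 2: the analyst reviews the output, adds the missing term, and reruns the extraction.
revised = dict(THESAURUS, council=("elder_council", "group"))
edges_revised = extract_network(texts, revised)

nodes = annotate_nodes(edges_revised, revised, GAZETTEER)
```

The point of the sketch is the loop rather than the extraction method: the analyst's correction is stored in the thesaurus rather than applied to the output by hand, so rerunning the same code on the same texts reproduces the same coding.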

Acknowledgements

This work was supported in part by the Office of Naval Research through ONR-N000140811223 (SORASCS), ONR-N000140910667 (CATNET), and ONR-N000140811186 (Ethnographic), and by W15P7T-09-C-8324 awarded by CERDEC-C2D under the THINK ATO. Additional support was provided by the Center for Computational Analysis of Social and Organizational Systems (CASOS). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Office of Naval Research, CERDEC, the Department of Defense or the U.S. government.

Author information

Correspondence to Kathleen M. Carley.

About this article

Cite this article

Carley, K.M., Bigrigg, M.W. & Diallo, B. Data-to-model: a mixed initiative approach for rapid ethnographic assessment. Comput Math Organ Theory 18, 300–327 (2012). https://doi.org/10.1007/s10588-012-9125-y

  • DOI: https://doi.org/10.1007/s10588-012-9125-y
