A Low Effort Approach to Quantitative Content Analysis

  • Maria Saburova
  • Archil Maysuradze
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 518)


We propose a workflow that lets an individual sociologist use quantitative content analysis in small-scale, short-term research projects. The key idea of the approach is to generate a domain-oriented dictionary for researchers with limited resources. The workflow starts like a typical one and then deviates to include content analysis. First, the researcher performs deductive analysis, which results in an interview guide. Second, the researcher conducts a small number of interviews to collect a domain-oriented labelled text corpus. Third, a domain-oriented dictionary is generated for the subsequent content analysis. We propose and compare a number of methods to automatically extract a domain-oriented dictionary from a labelled corpus. Some properties of the proposed workflow are studied empirically on a sociological study of volunteering in Russia.
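
The third step, extracting a dictionary from a labelled corpus, can be illustrated with a minimal sketch. It is not one of the methods compared in the paper: the scoring rule (a smoothed log-ratio of a term's relative frequency inside a category versus outside it), the function names, the category labels and the toy interview fragments are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_dictionary(labelled_corpus, top_k=3, smoothing=1.0):
    """Return {label: [terms]} with the top_k terms most typical of each label.

    labelled_corpus is a list of (label, text) pairs. The score is a smoothed
    log-ratio of in-category to out-of-category relative frequency; this is
    an illustrative choice, not necessarily the authors' method.
    """
    counts = {}
    for label, text in labelled_corpus:
        counts.setdefault(label, Counter()).update(tokenize(text))

    vocab_size = len(set().union(*counts.values()))
    dictionary = {}
    for label, in_counts in counts.items():
        # Pool the term counts of all other categories.
        out_counts = Counter()
        for other, other_counts in counts.items():
            if other != label:
                out_counts.update(other_counts)

        n_in = sum(in_counts.values())
        n_out = sum(out_counts.values())
        scores = {}
        for term in in_counts:
            p_in = (in_counts[term] + smoothing) / (n_in + smoothing * vocab_size)
            p_out = (out_counts[term] + smoothing) / (n_out + smoothing * vocab_size)
            scores[term] = math.log(p_in / p_out)

        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        dictionary[label] = ranked[:top_k]
    return dictionary


# Hypothetical interview fragments labelled by interview-guide topic.
corpus = [
    ("motivation", "I volunteer because helping people gives me meaning"),
    ("motivation", "helping others and meeting people motivates me"),
    ("barriers", "I have no time and no money for volunteering"),
    ("barriers", "lack of time is the main obstacle"),
]
dictionary = extract_dictionary(corpus)
```

In practice a stop-word list and lemmatization would be needed before scoring, since function words such as pronouns can otherwise rank highly on a corpus this small.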


Domain-oriented dictionary; Quantitative content analysis; Term extraction; Low effort sociological workflow





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russia
