Content Analysis between Quality and Quantity

Fulfilling Blended-Reading Requirements for the Social Sciences with a Scalable Text Mining Infrastructure

Abstract

Social science research that relies on Text Mining tools requires a blended reading approach, because the digital humanities still lack a canonical heuristic. Blended reading integrates quantitative and qualitative analyses of complex textual data step by step, and this integration imposes a number of requirements on the implementation of Text Mining infrastructures. This article presents the Leipzig Corpus Miner (LCM), which was developed in the joint research project ePol (Post-Democracy and Neoliberalism) in response to social science research requirements. The functionalities offered by the LCM may serve as a best-practice example of processing data in accordance with blended reading.


Notes

  1. http://www.epol-projekt.de; for the heuristic interest articulated by the Political Science branch of the project, see Lemke and Schaal [16: 3–19].

  2. http://lisd.princeton.edu/projects/diachronic-global-corpus-digcor

  3. http://translantis.wp.hum.uu.nl

  4. Wiedemann et al. [27: 101 ff.].

  5. See http://atlasti.com

  6. We are currently trying to optimize classification results before running the final classifications for the different sub-collections. For now, we achieve F1 = 0.613 and accuracy = 0.867 on our category of neoliberal argumentation (inter-rater reliability during the manual annotation phase: Krippendorff’s alpha = 0.76).
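Note 6 reports an F1 score together with an accuracy value. As a minimal sketch of how both measures are derived from the same confusion counts, and of why accuracy can sit well above F1 when the target category is rare, consider the following Python snippet. The counts are hypothetical and chosen only for illustration; they are not the ePol project's data, and the function name `binary_metrics` is likewise illustrative.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, F1 and accuracy for one binary category.

    tp/fp/fn/tn are the counts of true positives, false positives,
    false negatives and true negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; it ignores true negatives.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Accuracy counts every correct decision, including the many true negatives.
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 50 of 300 text units belong to the category.
metrics = binary_metrics(tp=30, fp=20, fn=20, tn=230)
print(metrics)
```

Because the large number of correctly rejected units enters accuracy but not F1, a classifier for a rare category can show high accuracy while its F1 remains moderate, which is why both values are worth reporting side by side.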

References

  1. Baßler M (1995) Einleitung. In: Baßler M (ed) New Historicism. Literaturgeschichte als Poetik der Kultur. Fischer, Frankfurt a. M., pp 7–28

  2. Drucker J (2011) Humanities approaches to graphical display. Digital Humanities Quarterly 5. http://digitalhumanities.org/dhq/vol/5/1/000091/000091.html. Accessed 17 Apr 2014

  3. Dumm S, Lemke M (2013) Argumentmarker. Definition, Generierung und Anwendung im Rahmen eines semi-automatischen Dokument-Retrieval-Verfahrens, Hamburg/Leipzig (= ePol Discussion Paper 3). http://www.epol-projekt.de/wp-content/uploads/2014/10/Discussion-Paper-epol-3_dumm_lemke_CC.pdf. Accessed 1 Dec 2014

  4. Evangelopoulos N, Zhang X, Prybutok VR (2012) Latent semantic analysis: five methodological recommendations. Eur J Inf Syst 21:70–86

  5. Ferrucci D, Lally A (2004) UIMA. An architectural approach to unstructured information processing in the corporate research environment. Nat Lang Eng 10(3–4):327–348

  6. Früh W (2009) Inhaltsanalyse. Theorie und Praxis. UVK, Konstanz

  7. Gadamer HG (1968) Klassische und philosophische Hermeneutik. In: Grondin J (ed) Gadamer-Lesebuch. Mohr Siebeck, Tübingen, pp 32–57

  8. Heyer G, Quasthoff U, Wittig T (2008) Text Mining: Wissensrohstoff Text. IT lernen. W3L GmbH, Herdecke

  9. Husserl E (1976) Die Krisis der europäischen Wissenschaften und die transzendentale Phänomenologie. Eine Einleitung in die phänomenologische Philosophie. In: Biemel W (ed) Husserliana, vol 6. Nijhoff, The Hague

  10. Husserl E (1980) Phantasie, Bildbewusstsein, Erinnerung. Zur Phänomenologie der anschaulichen Vergegenwärtigungen. In: Marbach E (ed) Husserliana, vol 23. Springer, The Hague

  11. Ihde D (1998) Expanding hermeneutics. Visualism in science. Northwestern University Press, Evanston

  12. Ihde D (2012) Experimental Phenomenology. Multistables. State University of New York, New York

  13. Kath R, Schaal GS, Dumm S (2015, forthcoming) New visual hermeneutics. In: Scharloth J, Bubenhofer N (eds) ZGL-Sonderheft Automatisierte Textanalyse. http://www.degruyter.com/view/j/zfgl

  14. Keim D, Kohlhammer J, Ellis G, Mansmann F (eds) (2010) Mastering the information age. Solving problems with visual analytics. http://www.diglib.eg.org. Accessed 14 May 2014

  15. Lemke M (2014) Frequenzanalyse und Diktionäransatz, Hamburg/Leipzig (eTMV 1/5). http://www.epol-projekt.de/wp-content/uploads/2014/10/eTMV_1.pdf

  16. Lemke M, Schaal GS (2014) Ökonomisierung und Politikfeldanalyse. Eine ideengeschichtliche und theoretische Rekonstruktion des Neoliberalismus in der Postdemokratie. In: Schaal GS, Lemke M, Ritzi C (eds) Die Ökonomisierung der Politik in Deutschland. Eine vergleichende Politikfeldanalyse. Springer VS, Wiesbaden, pp 3–19

  17. Lemke M, Stulpe A (2015, forthcoming) Text und soziale Wirklichkeit. Theoretische Grundlagen und empirische Anwendung von Text-Mining-Verfahren in sozialwissenschaftlicher Perspektive. In: Scharloth J, Bubenhofer N (eds) ZGL-Sonderheft Automatisierte Textanalyse. http://www.degruyter.com/view/j/zfgl

  18. Mayring P (2010) Qualitative Inhaltsanalyse. Grundlagen und Techniken, 11th edn. Beltz, Weinheim

  19. Montrose L (1995) Die Renaissance behaupten. Die Poetik und Politik der Kultur. In: Baßler M (ed) New Historicism. Literaturgeschichte als Poetik der Kultur. Fischer, Frankfurt a. M., pp 60–93

  20. Moretti F (2000) Conjectures on world literature. New Left Rev 1(1):54–68

  21. Moretti F (2007) Graphs, maps, trees. Abstract models for literary history. Verso, London

  22. Niehr T (1999) Halbautomatische Erforschung des öffentlichen Sprachgebrauchs oder Vom Nutzen computerlesbarer Textkorpora. ZGL 27(2):205–214

  23. Niekler A, Wiedemann G, Heyer G (2014) Leipzig Corpus Miner. A Text Mining Infrastructure for Qualitative Data Analysis. http://hal.archives-ouvertes.fr/hal-01005878/. Accessed 30 Sept 2014

  24. Niekler A, Wiedemann G, Dumm S, Heyer G (2014) Creating dictionaries for argument identification by reference data. Poster presented at DHd2014, Passau. http://asv.informatik.uni-leipzig.de/publication/file/254/Poster_A0_dhd2014_final.pdf. Accessed 1 Dec 2014

  25. Stone PJ (1966) The general inquirer: A computer approach to content analysis. MIT Press, Cambridge

  26. Wiedemann G (2013) Opening up to Big Data: Computer-Assisted Analysis of Textual Data in Social Sciences. FQS 14(2). http://www.qualitative-research.net/index.php/fqs/article/view/1949. Accessed 30 Sept 2014

  27. Wiedemann G, Lemke M, Niekler A (2013) Postdemokratie und Neoliberalismus – Zur Nutzung neoliberaler Argumentation in der Bundesrepublik Deutschland 1949–2011. Ein Werkstattbericht. ZPTh 4(1):99–115

  28. Wiedemann G, Niekler A (2014) Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries. http://hal.archives-ouvertes.fr/hal-01005879/. Accessed 30 Sept 2014

Acknowledgements

ePol is a joint research project of the Institute for Political Science, specialization in Political Theory, at Helmut-Schmidt-University Hamburg (Prof. Dr. Gary Schaal) and the Natural Language Processing Group, Department of Computer Science, University of Leipzig (Prof. Dr. Gerhard Heyer). The project is funded by the Federal Ministry of Education and Research (BMBF; FKZ 01UG1231A and B).

Author information

Correspondence to Matthias Lemke.

Cite this article

Lemke, M., Niekler, A., Schaal, G. et al. Content Analysis between Quality and Quantity. Datenbank Spektrum 15, 7–14 (2015). https://doi.org/10.1007/s13222-014-0174-x

Keywords

  • Text Mining
  • Qualitative Analysis
  • Blended Reading
  • Mixed Methods
  • Corpus Linguistics