Text Analysis

  • Taylor Arnold
  • Lauren Tilton
Part of the Quantitative Methods in the Humanities and Social Sciences book series (QMHSS)


This chapter presents several methods for extracting meaning from a collection of parsed textual documents, including information retrieval, topic modeling, and stylometrics. Particular focus is placed on using these methods to construct visualizations of textual corpora and to produce a high-level categorization of narrative trends.
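The following is a minimal sketch of the information-retrieval idea. It is not the chapter's own code. It ranks documents against a query by weighting terms with TF-IDF and scoring each document with cosine similarity. It rests on several assumptions: a naive tokenizer that lowercases the text and keeps only runs of ASCII letters, no stop-word removal, and document frequencies counted over the corpus plus the query. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term by its raw count times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears at least once
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered from most to least similar to the query."""
    # Assumption: the query is treated as one extra document when computing IDF
    vectors = tfidf_vectors(docs + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "the whale swam in the sea",
    "a ship sailed across the sea",
    "the committee approved the budget",
]
print(rank_documents("whale in the sea", docs))  # prints [0, 1, 2]
```

A term such as "the", which appears in every document, gets an IDF of log(1) = 0 and so contributes nothing to the score. This is one reason TF-IDF is often used as a lighter-weight alternative to an explicit stop-word list.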





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Taylor Arnold (1)
  • Lauren Tilton (1)
  1. Yale University, New Haven, USA
