Investigating Unstructured Texts with Latent Semantic Analysis

  • Fridolin Wild
  • Christina Stahl
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Latent semantic analysis (LSA) is an algorithm applied to approximate the meaning of texts, thereby exposing semantic structure to computation. LSA combines the classical vector-space model — well known in computational linguistics — with a singular value decomposition (SVD), a two-mode factor analysis. Thus, bag-of-words representations of texts can be mapped into a modified vector space that is assumed to reflect semantic structure. In this contribution the authors describe the lsa package for the statistical language and environment R and illustrate its proper use through examples from the areas of automated essay scoring and knowledge representation.
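The pipeline outlined in the abstract (bag-of-words vectors, then a truncated SVD, then comparison in the reduced space) can be illustrated with a minimal sketch. It is written in plain Python and is not the API of the lsa package for R. The toy corpus, the function names, and the choice of two latent dimensions are assumptions made for illustration only. The truncated SVD is approximated here by power iteration on the document Gram matrix, a stand-in for a full SVD routine.

```python
import math

# Hypothetical toy corpus: documents 0-2 concern interfaces, 3-4 concern graphs.
DOCS = [
    "human interface",
    "interface computer",
    "computer user",
    "graph trees",
    "trees minors",
]


def term_document_matrix(texts):
    """Bag-of-words counts: one row per term, one column per document."""
    vocab = sorted({word for text in texts for word in text.split()})
    return vocab, [[text.split().count(term) for text in texts] for term in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_singular_pairs(matrix, k, iterations=500):
    """Approximate the k largest singular values and right singular vectors.

    Uses power iteration with deflation on the Gram matrix A^T A, whose
    eigenvalues are the squared singular values of A.
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            length = math.sqrt(sum(x * x for x in w))
            v = [x / length for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        eigenvalue = sum(v[i] * gram[i][j] * v[j]
                         for i in range(n) for j in range(n))
        pairs.append((math.sqrt(max(eigenvalue, 0.0)), v))
        # Deflate so the next round finds the next-largest component.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return pairs


def lsa_document_vectors(matrix, k):
    """Coordinates of each document in the k-dimensional latent space."""
    pairs = top_singular_pairs(matrix, k)
    n = len(matrix[0])
    return [[sigma * v[j] for sigma, v in pairs] for j in range(n)]


vocab, tdm = term_document_matrix(DOCS)
raw = [list(column) for column in zip(*tdm)]   # documents in the original space
latent = lsa_document_vectors(tdm, k=2)        # documents in the reduced space

print("raw    cos(doc0, doc2):", round(cosine(raw[0], raw[2]), 3))
print("latent cos(doc0, doc2):", round(cosine(latent[0], latent[2]), 3))
print("latent cos(doc0, doc3):", round(cosine(latent[0], latent[3]), 3))
```

Documents 0 and 2 share no terms, so their similarity in the original vector space is zero. Both co-occur with terms of document 1, and in the reduced space this indirect association should bring them close together, while documents from the other topic stay apart.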

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fridolin Wild (1)
  • Christina Stahl (1)
  1. Institute for Information Systems and New Media, Vienna University of Economics and Business Administration, Vienna, Austria
