Advances in Data Analysis

Part of the series Studies in Classification, Data Analysis, and Knowledge Organization pp 383-390

Investigating Unstructured Texts with Latent Semantic Analysis

  • Fridolin Wild, Institute for Information Systems and New Media, Vienna University of Economics and Business Administration
  • Christina Stahl, Institute for Information Systems and New Media, Vienna University of Economics and Business Administration

Abstract

Latent semantic analysis (LSA) is an algorithm for approximating the meaning of texts, thereby exposing semantic structure to computation. LSA combines the classical vector-space model, well known in computational linguistics, with a singular value decomposition (SVD), a two-mode factor analysis. In this way, bag-of-words representations of texts can be mapped into a modified vector space that is assumed to reflect semantic structure. In this contribution the authors describe the lsa package for the statistical language and environment R and illustrate its proper use through examples from the areas of automated essay scoring and knowledge representation.
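
The combination of a vector-space model with a truncated SVD can be illustrated with a minimal sketch. It is written in Python with numpy rather than with the lsa package for R, and it does not reproduce that package's interface. The toy corpus, the choice of k = 2 retained dimensions, and all variable and function names are assumptions made purely for illustration.

```python
import numpy as np

# Toy corpus, invented for illustration (not taken from the chapter).
docs = [
    "dog barks at the cat",
    "cat chases the dog",
    "stocks fall as markets crash",
    "markets rally and stocks rise",
]

# Step 1: bag-of-words term-document matrix (rows = terms, columns = documents).
vocab = sorted({word for doc in docs for word in doc.split()})
row_of = {word: i for i, word in enumerate(vocab)}
tdm = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        tdm[row_of[word], j] += 1.0

# Step 2: singular value decomposition, tdm = u @ diag(s) @ vt.
u, s, vt = np.linalg.svd(tdm, full_matrices=False)

# Step 3: keep only the k largest singular values and rebuild the matrix.
# k = 2 is an arbitrary choice for this toy example.
k = 2
reduced = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Compare document similarity before and after the rank-k mapping.
print("raw     doc0 vs doc1:", cosine(tdm[:, 0], tdm[:, 1]))
print("reduced doc0 vs doc1:", cosine(reduced[:, 0], reduced[:, 1]))
print("reduced doc0 vs doc2:", cosine(reduced[:, 0], reduced[:, 2]))
```

In this sketch the two documents about animals share only part of their vocabulary, so their raw cosine similarity is moderate. After the rank-2 reconstruction they are represented along the same latent dimension, while the documents about markets fall on a separate one.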