Chapter

Advances in Information Retrieval

Volume 6611 of the series Lecture Notes in Computer Science, pp. 289-300

Subspace Tracking for Latent Semantic Analysis

  • Radim Řehůřek, affiliated with Carnegie Mellon University; NLP lab, Masaryk University in Brno

Abstract

Modern applications of Latent Semantic Analysis (LSA) must deal with enormous (often practically infinite) data collections, calling for a single-pass matrix decomposition algorithm that operates in constant memory w.r.t. the collection size. This paper introduces a streamed distributed algorithm for incremental SVD updates. Apart from the theoretical derivation, we present experiments measuring numerical accuracy and runtime performance of the algorithm over several data collections, one of which is the whole of the English Wikipedia.
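To make the idea of a single-pass, constant-memory decomposition concrete, the following is a minimal sketch of one generic way to update a truncated SVD block by block. It is not the algorithm derived in the paper: it is a plain serial column-block merge, it assumes NumPy and dense term-document chunks (rows are terms, columns are documents), and the names `update_svd` and `streamed_svd` are hypothetical. Only the left singular vectors and singular values are kept, so memory depends on the number of terms, the rank, and the chunk size, but not on how many documents have been seen.

```python
import numpy as np


def update_svd(U, s, chunk, rank):
    """Fold a new block of document columns into a truncated SVD.

    U     : (m, k) left singular vectors seen so far, or None at the start
    s     : (k,) singular values seen so far, or None at the start
    chunk : (m, c) dense block of new documents (hypothetical input format)
    rank  : number of factors to keep after the update
    """
    if U is None:
        stacked = chunk
    else:
        # [U * s, chunk] has the same left singular vectors and singular
        # values as [U diag(s) V^T, chunk], because V has orthonormal columns.
        stacked = np.hstack([U * s, chunk])
    U_new, s_new, _ = np.linalg.svd(stacked, full_matrices=False)
    # Truncation is the only source of approximation in this sketch.
    return U_new[:, :rank], s_new[:rank]


def streamed_svd(chunks, rank):
    """Consume an iterable of column blocks once, keeping only U and s."""
    U, s = None, None
    for chunk in chunks:
        U, s = update_svd(U, s, chunk, rank)
    return U, s
```

Each chunk is read once and then discarded, so `chunks` may be a generator over an arbitrarily long stream. When the true rank of the collection does not exceed `rank`, the truncation discards nothing and the result agrees with a batch SVD up to floating-point error; otherwise the error accumulates across updates, which is the kind of numerical accuracy question the experiments in the paper measure.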