
Identification of Critical Values in Latent Semantic Indexing

  • April Kontostathis
  • William M. Pottenger
  • Brian D. Davison
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 6)

Abstract

In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance and can therefore be removed (set to zero). Identifying and removing these entries lets us convert the dense term-by-dimension and document-by-dimension matrices into sparse matrices. We show empirically that these entries are unimportant by presenting retrieval and runtime performance results on seven collections: removing up to 70% of the values in the term-by-dimension matrix yields similar or improved retrieval performance compared to LSI. Removing 90% of the values degrades retrieval performance slightly for the smaller collections but improves it by 60% on the large collection we tested. Our approach has the additional computational benefit of reducing memory requirements and query response time.
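The abstract describes the sparsification idea only at a high level. The sketch below illustrates it under stated assumptions that are not taken from the chapter: the term-by-dimension matrix is taken to be the first k left singular vectors scaled by the singular values, entries are removed with a single global magnitude threshold chosen so that a given fraction is zeroed, and documents are scored by the unnormalized product of the query with the (possibly sparsified) factors. The function names `lsi_factors`, `sparsify`, and `lsi_scores` are hypothetical, and the chapter's actual selection rule and similarity measure may differ.

```python
import numpy as np


def lsi_factors(A, k):
    """Rank-k SVD factors of a term-by-document matrix A (terms x docs).

    Assumption: the term-by-dimension matrix is U_k scaled by the singular
    values, so that T @ Vk.T equals the rank-k approximation A_k.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    T = U[:, :k] * s[:k]   # terms x k, each column scaled by its singular value
    Vk = Vt[:k, :].T       # docs x k, document-by-dimension matrix
    return T, Vk


def sparsify(M, fraction):
    """Return a copy of M with the given fraction of its entries,
    those with the smallest absolute value, set to zero."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    out = M.copy()
    n_remove = int(round(fraction * M.size))
    if n_remove == 0:
        return out
    # Flat indices of the n_remove smallest-magnitude entries.
    idx = np.argsort(np.abs(M), axis=None)[:n_remove]
    out.flat[idx] = 0.0
    return out


def lsi_scores(q, T, Vk):
    """Score every document for a term-space query vector q (length = #terms).

    With an unmodified T this equals q @ A_k (no cosine normalization).
    """
    return (q @ T) @ Vk.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((30, 12))        # toy term-by-document matrix, illustrative only
    T, Vk = lsi_factors(A, k=4)
    q = np.zeros(30)
    q[[1, 5, 7]] = 1.0              # toy query containing terms 1, 5 and 7
    for frac in (0.0, 0.7, 0.9):
        Ts = sparsify(T, frac)
        density = np.count_nonzero(Ts) / Ts.size
        ranking = np.argsort(-lsi_scores(q, Ts, Vk))
        print(f"removed {frac:.0%}: density {density:.2f}, top docs {ranking[:3]}")
```

The sketch keeps dense NumPy arrays for simplicity; the memory and query-time savings the abstract reports would come from storing only the surviving nonzeros in a sparse format.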

Keywords

Information Retrieval · Singular Value Decomposition · Average Precision · Retrieval Performance · Latent Semantic Analysis
These keywords were added by machine and not by the authors.

Authors and Affiliations

  • April Kontostathis (1)
  • William M. Pottenger (1)
  • Brian D. Davison (1)

  1. Ursinus College, Department of Mathematics and Computer Science, Pennsylvania