Spotting Topics with the Singular Value Decomposition

  • Charles Nicholas
  • Randall Dahlberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1481)

Abstract

The singular value decomposition, or SVD, has been studied in the past as a tool for detecting and understanding patterns in a collection of documents. We show how the matrices produced by the SVD calculation can be interpreted, allowing us to spot patterns of characters that indicate particular topics in a corpus. A test collection, consisting of two days of AP newswire traffic, is used as a running example.
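As a rough illustration of the idea in the abstract, the following minimal sketch builds a character n-gram by document matrix, computes its SVD with NumPy, and lists the n-grams that carry the largest-magnitude entries in a chosen left singular vector. It is not the authors' implementation: the use of trigrams, raw counts without weighting, the helper names (`char_ngrams`, `ngram_document_matrix`, `top_ngrams`), and the toy documents are all assumptions made for illustration, and the toy documents are not the AP newswire test collection.

```python
import numpy as np
from collections import Counter

def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_document_matrix(docs, n=3):
    """Build a matrix with one row per n-gram and one column per document.

    Entries are raw counts (an assumption; no weighting is applied).
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    A = np.array([[c[g] for c in counts] for g in vocab], dtype=float)
    return vocab, A

def top_ngrams(vocab, U, k, m=5):
    """Return the m n-grams with the largest absolute entries in column k of U."""
    order = np.argsort(-np.abs(U[:, k]))[:m]
    return [(vocab[i], float(U[i, k])) for i in order]

# Hypothetical toy corpus, standing in for a real document collection.
docs = [
    "the stock market fell sharply on monday",
    "stock prices and market trading were mixed",
    "the storm brought heavy rain and flooding",
    "flooding followed the heavy storm overnight",
]

vocab, A = ngram_document_matrix(docs)

# Thin SVD: A = U @ diag(s) @ Vt. Columns of U are indexed by n-gram,
# rows of Vt are indexed by document.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in range(2):
    print(f"singular value {s[k]:.3f}:", top_ngrams(vocab, U, k))
```

The sign of each singular vector is arbitrary, so the sketch ranks entries by absolute value; in practice the positive and negative entries of one vector may point to different groups of documents.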

Keywords

Singular Vector, Term Vector, Document Vector, Test Corpus, Negative Entry

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Charles Nicholas, University of Maryland Baltimore County, Baltimore
  • Randall Dahlberg, U.S. Department of Defense, USA