Large-scale SVD and subspace-based methods for information retrieval

  • Hongyuan Zha
  • Osni Marques
  • Horst D. Simon
Regular Talks
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1457)

Abstract

A theoretical foundation for latent semantic indexing (LSI) is proposed by adapting a model first used in array signal processing to the context of information retrieval, using the concept of subspaces. It is shown that this subspace-based model, coupled with the minimum description length (MDL) principle, leads to a statistical test for determining the dimensions of the latent-concept subspaces in LSI. The effect of weighting on the choice of the optimal dimensions of the latent-concept subspaces is illustrated. It is also shown that the model imposes a so-called low-rank-plus-shift structure that is approximately satisfied by the cross-product of the term-document matrices. This structure can be exploited to give a more accurate updating scheme for LSI and to correct some misconceptions about the retrieval accuracy achievable in LSI updating. Variants of Lanczos algorithms are illustrated with numerical test results on a Cray T3E, using document collections generated from the World Wide Web.
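
As a rough illustration of how an MDL-based test can pick a subspace dimension, the sketch below scores each candidate dimension k from the eigenvalues of a cross-product matrix. It assumes the classical MDL criterion from array signal processing (the Wax-Kailath form), which may differ from the exact test derived in the paper. The function names `mdl_scores` and `estimate_dimension` and the parameter `n_samples` (for example, the number of documents) are hypothetical names chosen for this example. Under a low-rank-plus-shift structure, the trailing eigenvalues are nearly equal, so their geometric and arithmetic means nearly coincide and the likelihood term vanishes once k reaches the true dimension.

```python
import math


def mdl_scores(eigenvalues, n_samples):
    """Return MDL(k) for k = 0 .. p-1, given p positive eigenvalues.

    Assumption: classical Wax-Kailath form,
        MDL(k) = -N (p-k) log(geo_mean / arith_mean of the p-k smallest)
                 + 0.5 * k * (2p - k) * log N.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    if p < 2 or lam[-1] <= 0 or n_samples < 2:
        raise ValueError("need >= 2 positive eigenvalues and n_samples >= 2")

    scores = []
    for k in range(p):
        tail = lam[k:]          # eigenvalues assumed to sit at the shift level
        m = p - k
        mean_log = sum(math.log(x) for x in tail) / m   # log of geometric mean
        log_mean = math.log(sum(tail) / m)              # log of arithmetic mean
        log_likelihood = n_samples * m * (mean_log - log_mean)  # always <= 0
        penalty = 0.5 * k * (2 * p - k) * math.log(n_samples)
        scores.append(-log_likelihood + penalty)
    return scores


def estimate_dimension(eigenvalues, n_samples):
    """Return the k that minimizes MDL(k)."""
    scores = mdl_scores(eigenvalues, n_samples)
    return min(range(len(scores)), key=scores.__getitem__)
```

For a spectrum with three dominant eigenvalues and a flat tail, such as `[10, 8, 5, 1, 1, 1, 1, 1]` with `n_samples=100`, the penalty term alone decides among k ≥ 3, so the smallest such k is selected. In practice the eigenvalues would come from a truncated SVD of the weighted term-document matrix, and the tail would be only approximately flat.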

Keywords

Singular Value Decomposition, Singular Vector, Average Precision, Krylov Subspace, Minimum Description Length
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  1. Lawrence Berkeley National Laboratory/NERSC, Berkeley