Latent Semantic Indexing Via a Semi-Discrete Matrix Decomposition
With the electronic storage of documents comes the possibility of building search engines that can automatically choose documents relevant to a given set of topics. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. There are a number of information retrieval systems based on inexact matches. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). For equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage for the MEDLINE test set.
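The low-rank idea described above can be illustrated with a minimal sketch. It assumes NumPy and uses an invented five-term, four-document matrix, not the MEDLINE test set. The function name `lsi_scores`, the toy terms, and the choice of cosine scoring are illustrative assumptions rather than the authors' implementation. The sketch uses a truncated SVD only; it does not implement the SDD.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# The values are invented for illustration only.
terms = ["heart", "cardiac", "surgery", "gene", "dna"]
A = np.array([
    [1, 0, 1, 0],  # heart
    [0, 1, 1, 0],  # cardiac
    [1, 1, 0, 0],  # surgery
    [0, 0, 0, 1],  # gene
    [0, 0, 0, 1],  # dna
], dtype=float)


def lsi_scores(A, query, k):
    """Cosine score of each document against the query, using the
    rank-k truncated SVD approximation of A in place of A itself."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular triplets.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    numer = query @ A_k
    denom = np.linalg.norm(query) * np.linalg.norm(A_k, axis=0)
    # Guard against division by zero for all-zero approximate columns.
    return numer / np.where(denom == 0, 1.0, denom)


# Query containing only the term "heart".
query = np.array([1, 0, 0, 0, 0], dtype=float)

literal = query @ A                  # literal term matching
scores = lsi_scores(A, query, k=2)   # matching against the rank-2 approximation
```

In this toy matrix the second document contains "cardiac" and "surgery" but not "heart", so literal matching gives it a score of zero, while the rank-2 approximation gives it the same positive score as the two documents that do contain "heart". The fourth document, on an unrelated topic, still scores essentially zero.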
Key words: Information Retrieval, Latent Semantic Indexing, Singular Value Decomposition, Semi-Discrete Decomposition