Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing
Latent Semantic Indexing (LSI) has been shown to capture the semantic structure of document collections effectively and is widely used in content-based text retrieval. However, in many real-world applications involving very large document collections, LSI suffers from high computational complexity, which stems from the Singular Value Decomposition (SVD). As a result, the folding-in method is widely used in practice as an approximation to LSI. The folding-in method is, however, generally implemented "as is", and detailed analysis of its effectiveness and performance is omitted. Consequently, its performance cannot be guaranteed. In this paper, we first illustrate the underlying principle of the folding-in method from a linear algebra point of view and analyze several commonly used existing techniques. Based on this theoretical analysis, we propose a novel algorithm to guide the implementation of the folding-in method. Our method is justified and evaluated through a series of experiments on various classical IR data sets. The results indicate that our method is effective and performs consistently across different document collections.
Keywords: Singular Value Decomposition, Average Precision, Document Collection, Vector Space Model, Semantic Structure
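For readers unfamiliar with the technique, the following is a minimal sketch of the standard folding-in projection as it is commonly described in the LSI literature, not the enhanced algorithm proposed in this paper. It assumes a term-document matrix A with terms as rows and a rank-k truncated SVD A ≈ U_k Σ_k V_kᵀ; a new document vector d is then mapped to d̂ = dᵀ U_k Σ_k⁻¹ without recomputing the SVD. The function names `truncated_svd` and `fold_in` and the toy matrix are illustrative assumptions.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k truncated SVD of a term-document matrix A (terms x documents).

    Returns U_k (terms x k), the k largest singular values, and
    V_k (documents x k), whose rows are the document coordinates.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(d, U_k, s_k):
    """Project new term vector(s) into an existing k-dimensional LSI space.

    d has shape (terms,) for one document or (terms, p) for p documents.
    Computes d^T U_k Sigma_k^{-1}; the SVD itself is left unchanged, which
    is why folding-in is cheap but only approximates a full recomputation.
    """
    return (d.T @ U_k) / s_k


# Hypothetical toy collection: 5 terms x 4 documents.
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

U_k, s_k, V_k = truncated_svd(A, k=2)

# A new document containing terms 0 and 1, folded into the existing space.
new_doc = np.array([1, 1, 0, 0, 0], dtype=float)
new_doc_coords = fold_in(new_doc, U_k, s_k)
```

Folding in a document that was already part of A reproduces its row of V_k, which is a convenient sanity check. Documents that introduce term usage not reflected in U_k are represented less faithfully, and this loss of accuracy is the concern the abstract raises.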