Abstract
Singular value decomposition (SVD) and, more recently, its variants, namely singular value rescaling (SVR), approximate dimension equalization (ADE) and iterative residual rescaling (IRR), have been proposed for latent semantic indexing (LSI). Although all of them rely on SVD, a linear algebraic method for term-document matrix computation, their underlying motivations for LSI differ from one another. In this paper, a series of experiments is conducted to examine the effectiveness of these LSI methods in practical text mining applications, including information retrieval, text categorization and similarity measurement. The experimental results demonstrate that SVD and SVR outperform the other LSI methods in these applications. By contrast, ADE and IRR cannot achieve good performance in text mining applications using LSI, because their approximation matrices differ too much from the original term-document matrix in Frobenius norm.
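The explanation given for ADE and IRR rests on how far an approximation matrix lies from the original term-document matrix in Frobenius norm. The following is a minimal sketch in Python with NumPy, not the authors' experimental code: it builds a rank-k LSI approximation by truncated SVD and compares its Frobenius error with that of a rescaled variant. The rescaling, which raises the k retained singular values to a power `alpha`, is an assumed, simplified stand-in for SVR; the function names, the toy matrix and the values of `k` and `alpha` are hypothetical. ADE and IRR are not modelled here.

```python
import numpy as np


def lsi_approximation(A, k, alpha=1.0):
    """Return U_k diag(s_k ** alpha) V_k^T for a term-document matrix A.

    alpha=1.0 gives the plain rank-k truncated SVD used by LSI; other values
    are an assumed, simplified singular value rescaling (not the exact SVR method).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    return U[:, :k] @ np.diag(s[:k] ** alpha) @ Vt[:k, :]


def frobenius_gap(A, A_hat):
    """Frobenius norm of the approximation error, ||A - A_hat||_F."""
    return np.linalg.norm(A - A_hat, ord="fro")


# Hypothetical 5-term x 4-document count matrix, for illustration only.
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 3, 0, 1],
    [0, 0, 2, 2],
    [1, 0, 0, 3],
], dtype=float)

k = 2
A_svd = lsi_approximation(A, k)             # plain LSI (truncated SVD)
A_svr = lsi_approximation(A, k, alpha=1.5)  # rescaled singular values (assumed SVR form)

s = np.linalg.svd(A, compute_uv=False)
# For truncated SVD the error equals the norm of the discarded singular values.
print("SVD gap:", frobenius_gap(A, A_svd), "expected:", np.sqrt(np.sum(s[k:] ** 2)))
print("rescaled gap:", frobenius_gap(A, A_svr))
```

By the Eckart-Young theorem, truncated SVD minimizes ||A - A_k||_F over all matrices of rank at most k, so any rank-k reweighting of the same singular vectors, including the rescaled one in this sketch, can only match or increase that error.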
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, W., Tang, X., Yoshida, T. (2009). A Comparison of SVD, SVR, ADE and IRR for Latent Semantic Indexing. In: Shi, Y., Wang, S., Peng, Y., Li, J., Zeng, Y. (eds) Cutting-Edge Research Topics on Multiple Criteria Decision Making. MCDM 2009. Communications in Computer and Information Science, vol 35. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02298-2_41
DOI: https://doi.org/10.1007/978-3-642-02298-2_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02297-5
Online ISBN: 978-3-642-02298-2
eBook Packages: Computer Science, Computer Science (R0)