Trading Spaces: On the Lore and Limitations of Latent Semantic Analysis
Two decades after its inception, Latent Semantic Analysis (LSA) has become part and parcel of every modern introduction to IR. For any tool that matures this quickly, it is important to examine its lore and limitations, or else stagnation will set in. We focus here on three widely accepted claims about LSA, whose gist can be summarized as follows: (1) LSA recovers latent semantic factors underlying the document space, (2) it does so through lossy compression of the document space that eliminates lexical noise, and (3) this compression is best achieved by Singular Value Decomposition (SVD).
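Claims (2) and (3) can be made concrete with a minimal sketch, assuming NumPy and a hypothetical toy term-by-document count matrix (the data and the function name `lsa` are illustrative, not taken from the paper). The sketch keeps only the k largest singular values of the SVD. By the Eckart-Young theorem this truncation is the best rank-k approximation in the Frobenius (an ℓ2-type) norm, so the reconstruction error equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def lsa(term_doc: np.ndarray, k: int):
    """Hypothetical helper: rank-k LSA via truncated SVD of a term-by-document matrix."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values (the "latent semantic factors")
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Lossy rank-k reconstruction of the document space
    approx = U_k @ np.diag(s_k) @ Vt_k
    # Coordinates of each document in the k-dimensional latent space
    doc_vectors = (np.diag(s_k) @ Vt_k).T
    return approx, doc_vectors, s

# Hypothetical toy data: 5 terms (rows) x 4 documents (columns), raw counts
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

approx, docs, s = lsa(X, k=2)
```

Documents are then usually compared in the reduced space, for example by the cosine similarity of rows of `docs`; the number of retained factors k is a free parameter in this sketch.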
For each claim we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA hold far less generally than commonly believed. Even a simple example can show that LSA does not recover the optimal semantic factors intended in the pedagogical example used in many LSA publications. Moreover, contrary to LSA lore, LSA does not scale up well: the larger the document space, the less likely it is that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives such as pLSA, LDA, and kernel methods) by trading its ℓ2 space for an ℓ1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.
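The abstract does not spell out the proposed ℓ1 algorithms, so the following is not the authors' method. It is a minimal sketch, assuming NumPy, of the generic ℓ2-versus-ℓ1 contrast behind compressive sensing (one of the keywords). On an underdetermined system, the minimum ℓ2-norm solution spreads weight over every coordinate, whereas ℓ1-regularized least squares (the Lasso, solved here by ISTA, iterative shrinkage-thresholding) tends to recover a sparse solution. The sparse signal, the dimensions, `lam`, and the helper names `soft_threshold` and `ista` are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||v||_1: shrink each entry toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=5000):
    """Generic ISTA sketch (not the paper's algorithm):
    minimise 0.5 * ||A x - b||_2^2 + lam * ||x||_1."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                       # gradient of the l2 data term
        x = soft_threshold(x - step * grad, step * lam)  # l1 proximal step
    return x

# Hypothetical toy setup: 50 unknowns, only 20 linear measurements
rng = np.random.default_rng(0)
n, m = 50, 20
x_true = np.zeros(n)
x_true[[3, 17, 41]] = [1.5, -2.0, 1.0]  # 3-sparse signal
A = rng.standard_normal((m, n)) / np.sqrt(m)
b = A @ x_true

x_l2 = np.linalg.pinv(A) @ b   # minimum l2-norm solution: fits b exactly but is dense
x_l1 = ista(A, b, lam=0.01)    # l1-regularised solution: sparse
```

The soft-thresholding step sets small coordinates exactly to zero, which is where the sparsity comes from; the minimum ℓ2-norm solution has no such mechanism and spreads the signal across all 50 coordinates.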
Keywords: Singular Value Decomposition · Compressive Sensing · Latent Dirichlet Allocation · Latent Semantic Analysis · Semantic Space