Abstract
The aim of this paper is to study the potential of latent semantic analysis for automatically extracting word-pair collocations from domain texts. The basic idea is to search for collocations among pairs of words with strong (stable) relations, since collocations are simply stable combinations of words. Experiments on a corpus of texts from a Russian online newspaper demonstrate that applying latent semantic analysis to collocation extraction significantly decreases information noise and strengthens word associations. The proposed method will be used to automatically build a domain thesaurus.
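The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-document counts, the function names lsa_word_similarities and candidate_pairs, the rank k = 2, and the 0.9 threshold are all assumptions chosen for illustration. Words are projected into a low-rank space by a truncated singular value decomposition, and word pairs whose vectors have a high cosine similarity are kept as collocation candidates.

```python
import numpy as np


def lsa_word_similarities(term_doc, k):
    """Cosine similarities between word vectors in a rank-k LSA space."""
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k strongest latent dimensions; one row per word.
    word_vecs = u[:, :k] * s[:k]
    norms = np.linalg.norm(word_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero word vectors
    unit = word_vecs / norms
    return unit @ unit.T


def candidate_pairs(vocab, sims, threshold):
    """Word pairs whose LSA similarity reaches the (illustrative) threshold."""
    pairs = []
    for i in range(len(vocab)):
        for j in range(i + 1, len(vocab)):
            if sims[i, j] >= threshold:
                pairs.append((vocab[i], vocab[j], float(sims[i, j])))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy data: rows are words, columns are documents (raw counts).
vocab = ["stock", "market", "football", "match"]
term_doc = np.array([
    [2, 1, 0, 0],  # stock
    [1, 2, 0, 0],  # market
    [0, 0, 4, 2],  # football
    [0, 0, 2, 4],  # match
], dtype=float)

sims = lsa_word_similarities(term_doc, k=2)
pairs = candidate_pairs(vocab, sims, threshold=0.9)
for first, second, score in pairs:
    print(first, second, round(score, 3))
```

In this toy matrix the two topics never share a document, so the rank-2 space places words from the same topic in the same direction and words from different topics at right angles. A real corpus would need tokenisation, weighting (for example TF-IDF), and a tuned rank and threshold, none of which are specified in the abstract.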
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nugumanova, A., Bessmertny, I. (2013). Applying the Latent Semantic Analysis to the Issue of Automatic Extraction of Collocations from the Domain Texts. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and the Semantic Web. KESW 2013. Communications in Computer and Information Science, vol 394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41360-5_8
DOI: https://doi.org/10.1007/978-3-642-41360-5_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41359-9
Online ISBN: 978-3-642-41360-5
eBook Packages: Computer Science, Computer Science (R0)