Abstract
We propose a novel method for measuring semantic similarity of two sentences. The originality of the method is the way that it explores the similarity of concepts referred to in the sentences using Wikipedia. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-word similarity, word-order similarity, and concept similarity. We build datasets consisting of 45 Vietnamese sentence pairs and then evaluate the method on these datasets. The results show that in the best cases, concept similarity help improving the performance of our method more than 15% point. The proposed method is language-independent and quite easy to employ. Therefore, one can readily adopt our method to measure semantic similarity for sentences written in other languages.
Chapter PDF
Similar content being viewed by others
References
Mihalcea, R., Corley, C., Strapparava, C.: Corpus-based and knowledge-based measures of text semantic similarity. AAAI 6, 775–780 (2006)
Islam, A., Inkpen, D.: Semantic text similarity using corpus-based word similarity and string similarity. In: ACM Transactions on Knowledge Discovery from Data (TKDD) 2(2), Article 10 (2008)
Sahami, M., Heilman, T.D.: A web-based kernel function for measuring the similarity of short text snippets. In: Proceedings of the 15th International Conference on World Wide Web, pp. 377–386 (2006)
Li, Y., McLean, D., Bandar, Z.A., O’shea, J.D., Crockett, K.: Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering 18(8), 1138–1150 (2006)
Oliva, J., Serrano, J.I., del Castillo, M.D., Iglesias, Á.: SyMSS: A syntax-based measure for short-text semantic similarity. Data & Knowledge Engineering 70(4), 390–405 (2011)
Bach, N.X., Minh, N.L., Shimazu, A.: Exploiting discourse information to identify paraphrases. Expert Systems with Applications 41(6), 2832–2841 (2014)
Madnani, N., Tetreault, J., Chodorow, M.: Re-examining Machine Translation Metrics for Paraphrase Identification. In: Proceedings of 2012 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2012), pp. 182–190 (2012)
Socher, R., Huang, E.H., Pennington, J., Ng, A.Y., Manning, C.D.: Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 24, 801–809 (2011)
Fernando, S., Stevenson, M.: A semantic similarity approach to paraphrase detection. In: Proceedings of the 11th Annual Research Colloquium of the UK Special Interest Group for Computational Linguistics, pp. 45–52 (2008)
Das, D., Smith, N.: Paraphrase identification as probabilistic quasi-synchronous recognition. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 468–476 (2009)
Qiu, L., Kan, M.Y., Chua, T.S.: Paraphrase recognition via dissimilarity significance classification. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), pp. 18–26 (2006)
Rus, V., McCarthy, P.M., Lintean, M.C., McNamara, D.S., Graesser, A.C.: Paraphrase identification with lexico-syntactic graph subsumption. In: FLAIRS 2008, pp. 201–206 (2008)
Lee, M.C.: A novel sentence similarity measure for semantic-based expert systems. Expert Systems with Applications 38(5), 6392–6399 (2011)
Wenyin, L., Quan, X., Feng, M., Qiu, B.: A short text modeling method combining semantic and statistical information. Information Sciences 180(20), 4031–4041 (2010)
Blacoe, W., Lapata, M.: A comparison of vector-based representations for semantic composition. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 546–556 (2012)
Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: Proceedings of the 5th Annual International Conference on Systems Documentation, pp. 24–26 (1986)
Tsatsaronis, G., Varlamis, I., Vazirgiannis, M.: Text relatedness based on a word thesaurus. Journal of Artificial Intelligence Research 37(1), 1–40 (2010)
Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Communications of the ACM 8(10), 627–633 (1965)
Huynh, H.M., Nguyen, T.T., Cao, T.H.: Using coreference and surrounding contexts for entity linking. In: 2013 IEEE RIVF International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF 2013), pp. 1–5 (2013)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 IFIP International Federation for Information Processing
About this paper
Cite this paper
Nguyen, H.T., Duong, P.H., Vo, V.T. (2014). Vietnamese Sentence Similarity Based on Concepts. In: Saeed, K., Snášel, V. (eds) Computer Information Systems and Industrial Management. CISIM 2015. Lecture Notes in Computer Science, vol 8838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45237-0_24
Download citation
DOI: https://doi.org/10.1007/978-3-662-45237-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45236-3
Online ISBN: 978-3-662-45237-0
eBook Packages: Computer ScienceComputer Science (R0)