Vietnamese Sentence Similarity Based on Concepts

Nguyen, Hien T.; Duong, Phuc H.; Vo, Vinh T.

doi:10.1007/978-3-662-45237-0_24


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8838)

Included in the following conference series:

IFIP International Conference on Computer Information Systems and Industrial Management


Abstract

We propose a novel method for measuring the semantic similarity of two sentences. The originality of the method lies in how it explores the similarity of the concepts referred to in the sentences using Wikipedia. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-word similarity, word-order similarity, and concept similarity. We build datasets consisting of 45 Vietnamese sentence pairs and evaluate the method on them. The results show that, in the best cases, concept similarity improves the performance of our method by more than 15 percentage points. The proposed method is language-independent and easy to employ, so it can readily be adopted to measure semantic similarity for sentences written in other languages.
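The linear combination mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weights, the range of each component score, or how each component is computed, so everything below is an assumption made for illustration: the names `Weights` and `overall_similarity` are hypothetical, the three component scores are assumed to lie in [0, 1], and the weights are assumed to sum to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical mixing weights; the abstract does not state the actual values."""
    word: float
    order: float
    concept: float

    def __post_init__(self) -> None:
        # Assumption: the weights form a convex combination (sum to 1).
        total = self.word + self.order + self.concept
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"weights must sum to 1, got {total}")


def overall_similarity(word_sim: float, order_sim: float,
                       concept_sim: float, w: Weights) -> float:
    """Linearly combine the three component similarities.

    Each component is assumed to be precomputed and to lie in [0, 1].
    """
    components = {"word_sim": word_sim, "order_sim": order_sim,
                  "concept_sim": concept_sim}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return w.word * word_sim + w.order * order_sim + w.concept * concept_sim


# Illustrative values only; they are not results from the paper.
example_weights = Weights(word=0.5, order=0.2, concept=0.3)
example_score = overall_similarity(0.8, 0.6, 0.5, example_weights)
```

With these illustrative inputs the score is 0.5 × 0.8 + 0.2 × 0.6 + 0.3 × 0.5 = 0.67. Setting the concept weight to 0 and renormalising the other two would give a baseline without concept similarity, which is one way to picture the comparison the abstract reports.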



Author information

Authors and Affiliations

Faculty of Information Technology, Ton Duc Thang University, Vietnam
Hien T. Nguyen, Phuc H. Duong & Vinh T. Vo


Editor information

Editors and Affiliations

Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Mickiewicza 30, 30059, Krakow, Poland
Khalid Saeed
Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, 17. listopadu 15, 70833, Ostrava-Poruba, Czech Republic
Václav Snášel


About this paper

Cite this paper

Nguyen, H.T., Duong, P.H., Vo, V.T. (2014). Vietnamese Sentence Similarity Based on Concepts. In: Saeed, K., Snášel, V. (eds) Computer Information Systems and Industrial Management. CISIM 2014. Lecture Notes in Computer Science, vol 8838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45237-0_24


DOI: https://doi.org/10.1007/978-3-662-45237-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45236-3
Online ISBN: 978-3-662-45237-0
eBook Packages: Computer Science, Computer Science (R0)
