Measuring Similarity for Short Texts on Social Media

Duong, Phuc H.; Nguyen, Hien T.; Huynh, Ngoc-Tu

doi:10.1007/978-3-319-42345-6_22


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9795)

Included in the following conference series:

International Conference on Computational Social Networks


Abstract

In this paper, we present a method for measuring semantic similarity between short texts by combining two kinds of features: (1) distributed representations of words and (2) knowledge-based and corpus-based metrics. We then evaluate the method on two popular datasets: the Microsoft Research Paraphrase Corpus and SemEval-2015. The experimental results show that our method achieves state-of-the-art performance.
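
The idea of combining an embedding-based feature with a second, lexical feature can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors in `TOY_VECTORS` are invented, Jaccard token overlap stands in for the knowledge-based and corpus-based metrics (which the abstract does not specify), and all function names are hypothetical. The abstract does not say how the features are combined, so the sketch simply returns them as a feature list that a downstream model could consume.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sleeps": [0.1, 0.9, 0.0],
    "naps": [0.2, 0.8, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "falls": [0.1, 0.3, 0.7],
}


def mean_vector(tokens):
    """Average the vectors of tokens that have one; None if no token does."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(tokens_a, tokens_b):
    """Token-set overlap; a simple stand-in for a lexical similarity metric."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similarity_features(text_a, text_b):
    """Return [embedding cosine, token overlap] for two short texts."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    vec_a, vec_b = mean_vector(tokens_a), mean_vector(tokens_b)
    if vec_a is None or vec_b is None:
        embedding_score = 0.0  # no known words in at least one text
    else:
        embedding_score = cosine(vec_a, vec_b)
    return [embedding_score, jaccard(tokens_a, tokens_b)]


if __name__ == "__main__":
    print(similarity_features("cat sleeps", "kitten naps"))
    print(similarity_features("cat sleeps", "stock falls"))
```

With these toy vectors, the pair "cat sleeps" / "kitten naps" shares no tokens, so the overlap feature is zero, yet its embedding feature is higher than for "cat sleeps" / "stock falls". That is the kind of complementary signal that motivates using both feature types.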

Notes

1. http://wordnet.princeton.edu.
2. https://en.wikipedia.org/.
3. http://usat.ly/1pla4oI.
4. http://nyti.ms/1QS47Ga.
5. http://en.wiktionary.org/.
6. This information is generated from the 03 March 2016 dump.
7. https://en.wikipedia.org/wiki/Brown_Corpus.
8. http://alt.qcri.org/semeval2015/task2/.

Author information

Authors and Affiliations

Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Phuc H. Duong, Hien T. Nguyen & Ngoc-Tu Huynh

Corresponding author

Correspondence to Hien T. Nguyen.

Editor information

Editors and Affiliations

Ton Duc Thang University, Ho Chi Minh City, Vietnam
Hien T. Nguyen
VSB-Technical University of Ostrava, Ostrava, Czech Republic
Vaclav Snasel

Cite this paper

Duong, P.H., Nguyen, H.T., Huynh, N.-T. (2016). Measuring Similarity for Short Texts on Social Media. In: Nguyen, H., Snasel, V. (eds) Computational Social Networks. CSoNet 2016. Lecture Notes in Computer Science, vol 9795. Springer, Cham. https://doi.org/10.1007/978-3-319-42345-6_22

DOI: https://doi.org/10.1007/978-3-319-42345-6_22
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42344-9
Online ISBN: 978-3-319-42345-6
eBook Packages: Computer Science, Computer Science (R0)