Abstract
In this paper, we present a method for measuring semantic similarity between short texts that combines two kinds of features: (1) distributed representations of words and (2) knowledge-based and corpus-based metrics. We evaluate the method on two popular datasets, the Microsoft Research Paraphrase Corpus and SemEval-2015. The experimental results show that our method achieves state-of-the-art performance.
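The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors in `TOY_VECTORS`, the averaging of word vectors, and the use of Jaccard token overlap as a stand-in for the knowledge-based and corpus-based metrics are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical toy embeddings; a real system would load pretrained word vectors.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "sits": [0.1, 0.9, 0.0],
    "rests": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def mean_vector(tokens):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(tokens_a, tokens_b):
    """Token-set overlap; an assumed placeholder for the second feature family."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_features(text_a, text_b):
    """Return [embedding_similarity, overlap_similarity] for a pair of short texts."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    vec_a, vec_b = mean_vector(tokens_a), mean_vector(tokens_b)
    if vec_a is None or vec_b is None:
        embedding_sim = 0.0
    else:
        embedding_sim = cosine(vec_a, vec_b)
    return [embedding_sim, jaccard(tokens_a, tokens_b)]
```

A pair such as "cat sits" and "kitten rests" shares no tokens, so the overlap feature is zero while the embedding feature is high. This illustrates why the two kinds of features can complement each other when fed to a downstream classifier or regressor.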
Notes
- 6. This information is generated from the 03 March 2016 dump.
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Duong, P.H., Nguyen, H.T., Huynh, NT. (2016). Measuring Similarity for Short Texts on Social Media. In: Nguyen, H., Snasel, V. (eds) Computational Social Networks. CSoNet 2016. Lecture Notes in Computer Science, vol 9795. Springer, Cham. https://doi.org/10.1007/978-3-319-42345-6_22
DOI: https://doi.org/10.1007/978-3-319-42345-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42344-9
Online ISBN: 978-3-319-42345-6
eBook Packages: Computer Science (R0)