Measuring Similarity for Short Texts on Social Media

  • Conference paper
Computational Social Networks (CSoNet 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9795)

Abstract

In this paper, we present a method for measuring semantic similarity between short texts by combining two kinds of features: (1) distributed representations of words, and (2) knowledge-based and corpus-based metrics. We then evaluate the method on two popular datasets: the Microsoft Research Paraphrase Corpus and SemEval-2015. The experimental results show that our method achieves state-of-the-art performance.

Notes

  1. http://wordnet.princeton.edu.

  2. https://en.wikipedia.org/.

  3. http://usat.ly/1pla4oI.

  4. http://nyti.ms/1QS47Ga.

  5. http://en.wiktionary.org/.

  6. This information is generated from the 03 March 2016 dump.

  7. https://en.wikipedia.org/wiki/Brown_Corpus.

  8. http://alt.qcri.org/semeval2015/task2/.

Author information

Corresponding author

Correspondence to Hien T. Nguyen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Duong, P.H., Nguyen, H.T., Huynh, NT. (2016). Measuring Similarity for Short Texts on Social Media. In: Nguyen, H., Snasel, V. (eds) Computational Social Networks. CSoNet 2016. Lecture Notes in Computer Science, vol 9795. Springer, Cham. https://doi.org/10.1007/978-3-319-42345-6_22

  • DOI: https://doi.org/10.1007/978-3-319-42345-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42344-9

  • Online ISBN: 978-3-319-42345-6

  • eBook Packages: Computer Science, Computer Science (R0)
