Evaluation of Similarity Measures in a Benchmark for Spanish Paraphrasing Detection

19th Mexican International Conference on Artificial Intelligence

  • Conference paper
Advances in Computational Intelligence (MICAI 2020)

Abstract

In this paper, we present a similarity-based approach to paraphrase detection in Spanish. We evaluate several models for computing semantic similarity on a gold-standard paraphrase corpus. The corpus contains one original document, paraphrased versions of it at two levels (low and high), and reference documents that share the same topic or the same vocabulary. It makes it possible to assess the similarity between a pair of texts or between individual sentences. We found that some similarity metrics show a larger difference than others when comparing paraphrased sentences. Finally, we obtained a threshold for each similarity metric to serve as a classification boundary for deciding whether two sentences are paraphrases of each other.
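The thresholding idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the metric (cosine similarity over bag-of-words counts), the selection criterion (accuracy on labeled pairs), the helper names `cosine_similarity` and `best_threshold`, and the four invented Spanish sentence pairs, which are not taken from the corpus.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the candidate threshold with the highest
    accuracy, predicting 'paraphrase' whenever score >= threshold.
    Candidates are the observed scores; ties keep the lowest threshold."""
    best = (0.0, -1.0)
    for t in sorted(set(scores)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best[1]:
            best = (t, acc)
    return best


# Invented toy pairs (hypothetical, for illustration only): (sentence, sentence, is_paraphrase)
pairs = [
    ("el gato duerme en el sofá", "el gato duerme sobre el sofá", True),
    ("la niña lee un libro", "la niña lee un libro nuevo", True),
    ("el gato duerme en el sofá", "la bolsa cayó ayer", False),
    ("la niña lee un libro", "el tren llega tarde", False),
]

scores = [cosine_similarity(a, b) for a, b, _ in pairs]
labels = [y for _, _, y in pairs]
threshold, accuracy = best_threshold(scores, labels)
print(f"threshold={threshold:.3f} accuracy={accuracy:.2f}")
```

With one threshold per metric, the same procedure would be repeated for each similarity measure under evaluation; on real data the threshold should be chosen on a held-out split rather than on the pairs used to report accuracy.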

Acknowledgments

This work has been partially supported by PAPIIT projects IA401219, TA100520, AG400119 and CONACYT project A1-S-27780.

Author information

Corresponding author

Correspondence to Helena Gómez-Adorno.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Gómez-Adorno, H., Bel-Enguix, G., Sierra, G., Torres-Moreno, JM., Martinez, R., Serrano, P. (2020). Evaluation of Similarity Measures in a Benchmark for Spanish Paraphrasing Detection. In: Martínez-Villaseñor, L., Herrera-Alcántara, O., Ponce, H., Castro-Espinoza, F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_19

  • DOI: https://doi.org/10.1007/978-3-030-60887-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60886-6

  • Online ISBN: 978-3-030-60887-3

  • eBook Packages: Computer Science, Computer Science (R0)
