Language Resources and Evaluation, Volume 49, Issue 1, pp. 77–105

Corpus annotation with paraphrase types: new annotation scheme and inter-annotator agreement measures

  • Marta Vila
  • Manuel Bertran
  • M. Antònia Martí
  • Horacio Rodríguez
Original Paper


Paraphrase corpora annotated with the types of paraphrases they contain are an essential resource for understanding the phenomenon of paraphrasing and for improving paraphrase-related systems in natural language processing. In this article, we present a new annotation scheme for paraphrase-type annotation, together with newly created measures for computing inter-annotator agreement. Three corpora of different natures, in two languages, have been annotated using this infrastructure. The annotation results and the inter-annotator agreement scores obtained for these corpora attest to the adequacy and robustness of our proposal.
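The abstract does not spell out how the new agreement measures are defined. For orientation only, the sketch below computes the conventional chance-corrected coefficient of Cohen (1960), κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. It is not the authors' measure: it assumes two annotators who each assign exactly one nominal label per item, whereas paraphrase-type annotation may attach several types to one pair. The function name `cohen_kappa` and the labels "lexical" and "syntactic" are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's (1960) kappa for two annotators, one nominal label per item.

    Hypothetical helper for illustration; not the measure proposed in the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label proportions,
    # summed over categories (labels used by only one annotator contribute 0).
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    # Degenerate case: both annotators used one and the same label everywhere.
    # Conventions differ here; this sketch reports perfect agreement.
    if p_e == 1.0:
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical paraphrase-type labels for four sentence pairs.
    annotator_1 = ["lexical", "lexical", "syntactic", "syntactic"]
    annotator_2 = ["lexical", "syntactic", "syntactic", "syntactic"]
    print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

With these labels, observed agreement is 3/4 = 0.75 and chance agreement is (2·1 + 2·3)/16 = 0.5, so κ = 0.25 / 0.5 = 0.5. Raw agreement alone (0.75) would overstate reliability, which is why chance-corrected coefficients are the usual reference point for agreement measures.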


Keywords: Paraphrasing · Paraphrase typology · Corpus annotation · Inter-annotator agreement



We are grateful to the people who participated in the annotation of the corpora: Rita Zaragoza, Montse Nofre, Patricia Fernández, and Oriol Borrega. We would also like to thank Alberto Barrón-Cedeño for his help in shaping the formulae of the inter-annotator agreement measures. This work is supported by the Spanish government through the projects DIANA (TIN2012-38603-C02-02) and SKATER (TIN2012-38584-C06-01) from the Ministerio de Ciencia e Innovación, as well as an FPU grant (AP2008-02185) from the Ministerio de Educación, Cultura y Deporte.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Marta Vila (1)
  • Manuel Bertran (1)
  • M. Antònia Martí (1)
  • Horacio Rodríguez (2)

  1. CLiC, Universitat de Barcelona, Barcelona, Spain
  2. TALP, Universitat Politècnica de Catalunya, Barcelona, Spain
