Harmonizing and Merging Italian Treebanks: Towards a Merged Italian Dependency Treebank and Beyond

  • Maria Simi
  • Simonetta Montemagni
  • Cristina Bosco
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 589)

Abstract

In this paper we address the challenge of combining existing CoNLL-compliant dependency-annotated corpora, with the ultimate aim of constructing a larger treebank for Italian. To this end, we defined a methodology for mapping different annotation schemes, based on: (i) an analysis of the similarities and differences between the source and target dependency annotation schemes; (ii) an analysis of the performance of state-of-the-art dependency parsers trained on the source and target treebanks; (iii) the mapping of the source annotation scheme(s) onto a set of target (possibly underspecified) data categories. This methodology was applied in two case studies. The first aimed at constructing a “Merged Italian Dependency Treebank” (MIDT) from existing Italian dependency treebanks, namely TUT and ISST–TANL. The second, still ongoing, consists of converting the MIDT resource into the de facto standard Stanford Dependencies representation, with the final aim of developing an “Italian Stanford Dependency Treebank” (ISDT).
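Step (iii) of the methodology, mapping source labels onto a shared and possibly underspecified set of target categories, can be pictured with a minimal Python sketch. It assumes CoNLL-X input (ten tab-separated columns per token, with HEAD and DEPREL in columns 7 and 8). The label tables `MAP_A` and `MAP_B`, the fallback category `DEP` and the helper names `parse_sentence`, `harmonize` and `merge` are hypothetical illustrations: they are not the actual TUT, ISST–TANL or MIDT tag sets, nor the authors' conversion procedure.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    """The CoNLL-X columns this sketch needs: ID, FORM, HEAD and DEPREL."""
    idx: int
    form: str
    head: int
    deprel: str


def parse_sentence(block: str) -> list[Token]:
    """Parse one CoNLL-X sentence: one token per line, 10 tab-separated columns."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


# Hypothetical label tables, NOT the real TUT or ISST-TANL inventories.
# Distinct source labels may collapse onto one coarser target category.
MAP_A = {"ROOT": "ROOT", "subj": "SUBJ", "dobj": "OBJ", "comp": "COMP", "mod": "MOD"}
MAP_B = {"TOP": "ROOT", "VERB-SUBJ": "SUBJ", "VERB-OBJ": "OBJ", "PREP-ARG": "COMP", "RMOD": "MOD"}
FALLBACK = "DEP"  # hypothetical, maximally underspecified target category


def harmonize(sentence: list[Token], label_map: dict[str, str], unmapped: Counter) -> list[Token]:
    """Relabel DEPREL via label_map; unknown labels get FALLBACK and are counted."""
    out = []
    for tok in sentence:
        target = label_map.get(tok.deprel)
        if target is None:
            unmapped[tok.deprel] += 1
            target = FALLBACK
        out.append(replace(tok, deprel=target))  # HEAD is left untouched
    return out


def merge(treebank_a, treebank_b):
    """Harmonize both treebanks onto the shared label set and concatenate them."""
    unmapped = Counter()
    merged = [harmonize(s, MAP_A, unmapped) for s in treebank_a]
    merged += [harmonize(s, MAP_B, unmapped) for s in treebank_b]
    return merged, unmapped


sent_a = parse_sentence("1\tGianni\t_\tS\tSP\t_\t2\tsubj\t_\t_\n2\tdorme\t_\tV\tV\t_\t0\tROOT\t_\t_")
sent_b = parse_sentence(
    "1\tMaria\t_\tNOUN\tNOUN\t_\t2\tVERB-SUBJ\t_\t_\n"
    "2\tride\t_\tVERB\tVERB\t_\t0\tTOP\t_\t_\n"
    "3\t.\t_\tPUNCT\tPUNCT\t_\t2\tEND\t_\t_"
)
merged, unmapped = merge([sent_a], [sent_b])
print([[t.deprel for t in s] for s in merged])  # [['SUBJ', 'ROOT'], ['SUBJ', 'ROOT', 'DEP']]
print(unmapped)  # Counter({'END': 1})
```

Counting unmapped labels, rather than silently discarding them, makes it easier to spot source distinctions the target scheme does not yet cover. Plain relabelling of this kind only addresses label differences; structural differences between schemes (for example, different choices of syntactic head) would likely need tree transformations on top of it.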

Keywords

Treebank; Italian; Harmonization and merging of resources

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Maria Simi (1)
  • Simonetta Montemagni (2)
  • Cristina Bosco (3)
  1. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
  2. Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC–CNR), Pisa, Italy
  3. Dipartimento di Informatica, Università di Torino, Torino, Italy