Part of the book series: Studies in Computational Intelligence ((SCI,volume 589))

Abstract

This paper describes results on how syntactic representations and parsing methodologies affect the performance of systems for parsing Italian. Italian has a rich morphology, especially with respect to Verbal suffixes, which can provide a parser with useful information for making the correct choices. With respect to syntactic representation, the experiments are based on a treebank for Italian that has been released in both a dependency and a constituency formalism, each annotated at different degrees of specificity. The two paradigms are compared, and the different degrees of specificity in marking some syntactic phenomena are pointed out. Statistical parsers have been evaluated on this treebank. The results show that both the representation format and the parsing approach strongly affect performance, which in some cases is very close to, and in others drastically different from, the state of the art for English.

Notes

  1. The CODICECIVILE and COSTITA corpora include legal texts; the EUDIR corpus includes declarations of the European Community from the Italian section of the JRC-Acquis Multilingual Parallel Corpus (see http://langtech.jrc.it/JRC-Acquis.html). The NEWS corpus instead includes texts from Italian newspapers, WIKIPEDIA texts from the Italian section of Wikipedia, and VED a miscellany of academic, journal, and novel texts.

  2. The term token refers to all the objects annotated in the treebank, namely words, punctuation marks and null elements.

  3. English translations of the Italian examples are literal and so may appear awkward in English.

  4. According to Word Grammar, many words that traditional grammar would classify as Adverbs or subordinating conjunctions qualify as Prepositions or Determiners.

  5. For instance, in Machine Translation, if the source language allows argument deletion and the target language does not, the dropped argument must be explicitly marked in the source language for the system to handle the translation. A similar situation arises when translating from Italian (a typical pro-drop language, where subject deletion is very common with tensed Verbs) into English (where the subject is always lexically realized in tensed clauses).

  6. The term equi refers to the missing Subject of the subordinate infinitive Verb, e.g. the Subject of the Verb “dormire” (sleep) in “Vuole dormire” ([He] wants [to] sleep).

  7. The projectivity constraint is maintained for TUT also in the CoNLL format.

  8. See http://www.cis.upenn.edu/chinese/.

  9. See http://www.ircs.upenn.edu/arabic/.

  10. Apart from a few English morphological features that either do not exist in Italian (e.g. possessive ending) or do not correspond to Italian forms (e.g. comparative Adjective and Adverb).

  11. The inclusion of person, gender and number values in morphological tags was tested without yielding any improvement in parser performance. Investigating the effect of including these features for Italian, or for other MRLs, may be of interest for future work.

  12. English translation: The agreement is broken for three main motivations.

  13. In Italian, proper nouns are not marked for number.

  14. In fact, in a dependency tree, the subject relation marks an edge linking the verbal head to a dependent that can be distinguished from other verbal dependents only by the type of the relation.

  15. English translation: A right allowance is due to the owner.

  16. E.g. the tag PUT, which represents the locative complement of the Verb “put”, or the tag DTV (dative), which marks indirect objects when they are realized as prepositional phrases, i.e. not affected by dative shift.

  17. The evaluation was performed using the MaltEval tools [31].

  18. This shows, however, that the test set, even though it has the same balance as TUT, is not fully representative of the treebank in terms of relations and constructions.

  19. This is only partially explained by the sentence length, which is below 40 words only in the test set, and by the smaller size of the training set for the 10-fold cross-validation.

  20. The ten most frequent relations in the whole 1-Comp treebank (with respect to 72,149 annotated tokens) are ARG (30.3%), RMOD (19.2%), OBJ (4.5%), SUBJ (3.9%), END (3.3%), TOP (3.2%), COORD2ND+BASE (3.1%), COORD+BASE (3.1%), SEPARATOR (2.7%), INDCOMPL (1.9%).

  21. Regarding the parsing of legal text in particular, see also the Proceedings of the LREC 2012 Workshop on Semantic Processing of Legal Texts (SPLeT-2012), available at http://www.lrec-conf.org/proceedings/lrec2012/workshops/27.LREC%202012%20Workshop%20-Proceedings%20SPLeT.pdf.

  22. The tool is freely available from http://www.cis.upenn.edu/dbikel/software.html#comparator.
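
The projectivity constraint mentioned in note 7 can be illustrated with a small check. The sketch below is not part of the TUT or MaltParser tooling; it assumes a sentence is given as a list of head indices in the style of the CoNLL HEAD column (token positions start at 1, and 0 stands for the artificial root). The function names `dominates` and `is_projective` are hypothetical and chosen only for this illustration. An arc is taken to be projective when every token lying between the head and the dependent is a descendant of that head.

```python
from typing import List


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """Return True if `ancestor` is `node` itself or lies on its path to the root.

    `heads[i - 1]` is the head of token i; 0 denotes the artificial root.
    The loop is bounded so that a malformed (cyclic) input cannot hang.
    """
    for _ in range(len(heads) + 1):
        if node == ancestor:
            return True
        if node == 0:
            return False
        node = heads[node - 1]
    return False


def is_projective(heads: List[int]) -> bool:
    """Check that every arc head -> dependent covers only descendants of head."""
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = min(head, dep), max(head, dep)
        # Every token strictly between the two ends of the arc
        # must be dominated by the head of the arc.
        for between in range(lo + 1, hi):
            if not dominates(heads, head, between):
                return False
    return True


if __name__ == "__main__":
    # Token 2 is the root; tokens 1 and 3 both attach to it: no arc crosses.
    print(is_projective([2, 0, 2]))  # True
    # Token 1 attaches to token 3 across the root token 2: not projective.
    print(is_projective([3, 0, 2]))  # False
```

A treebank that maintains the constraint would be expected to pass this check for every sentence; the quadratic scan is kept for readability rather than speed.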

References

  1. Alicante, A., Bosco, C., Corazza, A., Lavelli, A.: A treebank-based study on the influence of Italian word order on parsing performance. In: LREC, pp. 1985–1992 (2012)

  2. Bosco, C.: A richer annotation schema for an Italian treebank. In: Proceedings of European Summer School on Logic Language and Information, Birmingham, UK (2000), http://www.di.unito.it/~bosco/publicat/esslli00.zip

  3. Bosco, C.: Grammatical relation’s system in treebank annotation. In: Proceedings of Student Research Workshop of Joint ACL/EACL Meeting, Toulouse, France (2001), http://www.di.unito.it/~bosco/publicat/acl-stud-ses-01.zip

  4. Bosco, C.: A grammatical relation system for treebank annotation, Ph.D. thesis, University of Torino (2004)

  5. Bosco, C.: Multiple-step treebank conversion: from dependency to Penn format. In: Proceedings of Linguistic Annotation Workshop at the ACL’07 (2007)

  6. Bosco, C.: Linguistic knowledge extraction from corpus parallel annotations. In: Proceedings of XL Congresso della Società di Linguistica Italiana, Vercelli (2009), http://www.di.unito.it/~bosco/publicat/sli06.zip

  7. Bos, J., Bosco, C., Mazzei, A.: Converting a dependency treebank to a categorial grammar treebank for Italian. In: Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories, pp. 27–38. Milan (2009)

  8. Bosco, C., Lavelli, A.: Annotation schema oriented evaluation for parsing validation. In: Proceedings of the 9th Workshop on Treebanks and Linguistic Theories (TLT-9), pp. 19–30. Tartu, Estonia (2010)

  9. Bosco, C., Mazzei, A., Lavelli, A.: Looking back to the Evalita constituency parsing task: 2007–2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 46–57 (2012)

  10. Bosco, C., Lombardo, V.: A relation-schema for treebank annotation. In: A. Cappelli, F.T. (ed.) Advances in Artificial Intelligence, LNCS, vol. 2829. Springer, Berlin (2003), http://www.di.unito.it/~bosco/publicat/aiia-03.zip

  11. Bosco, C., Lombardo, V.: Comparing linguistic information in treebank annotations. In: Proceedings of the 5th International Language Resources and Evaluation Conference (2006), http://www.di.unito.it/~bosco/publicat/lrec06.zip

  12. Bosco, C., Lombardo, V., Lesmo, L., Vassallo, D.: Building a treebank for Italian: a data-driven annotation schema. In: Proceedings of 2nd International Conference on Language Resources and Evaluation, Athens, Greece (2000), http://www.di.unito.it/~bosco/publicat/lrec00.zip

  13. Bosco, C., Mazzei, A., Lombardo, V.: Evalita parsing task: an analysis of the first parsing system contest for Italian. Intell. Artif. 2(IV), 30–33 (2007)

  14. Bosco, C., Mazzei, A., Lombardo, V.: Evalita’09 parsing task: constituency parsers and the Penn format for Italian. In: Proceedings of Evalita’09 (2009)

  15. Bosco, C., Montemagni, S., Mazzei, A., Lombardo, V., Dell’Orletta, F., Lenci, A.: Evalita’09 parsing task: comparing dependency parsers and treebanks. In: Proceedings of Evalita’09, Reggio Emilia (2009)

  16. Bosco, C., Montemagni, S., Mazzei, A., Lombardo, V., Dell’Orletta, F., Lenci, A., Lesmo, L., Attardi, G., Simi, M., Lavelli, A., Hall, J., Nilsson, J., Nivre, J.: Comparing the influence of different treebank annotations on dependency parsing. In: Proceedings of Language Resources and Evaluation Conference, pp. 1794–1801. Malta (2010)

  17. Cheung, J.C., Penn, G.: Topological field parsing of German. In: Proceedings of ACL-IJCNLP’09, pp. 64–72. Singapore (2009)

  18. Collins, M., Hajic, J., Ramshaw, L., Tillmann, C.: A statistical parser of Czech. In: Proceedings of the ACL’99 (1999)

  19. Corazza, A., Lavelli, A., Satta, G.: An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Trans. Speech Lang. Process. 9(4), 7:1–7:31 (2013). http://doi.acm.org/10.1145/2407736.2407737

  20. Dell’Orletta, F., Marchi, S., Montemagni, S., Venturi, G.: Domain adaptation for dependency parsing at Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 58–69 (2012)

  21. Green, S., Manning, C.D.: Better Arabic parsing: Baselines, evaluations, and analysis. In: Proceedings of COLING 2010 (2010)

  22. Hajič, J., Böhmová, A., Hajičová, E., Vidová-Hladká, B.: The Prague dependency treebank: a three-level annotation scenario. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 103–127. Kluwer, Amsterdam (2000)

  23. Hudson, R.: Word Grammar. Basil Blackwell, Oxford (1984)

  24. Jones, B.E.M.: Exploring the role of punctuation in parsing natural text. In: Proceedings of COLING’94, pp. 421–425. Kyoto (1994)

  25. Kübler, S., Rehbein, I., van Genabith, J.: TePaCoC a corpus for testing parser performance on complex German grammatical constructions. In: Proceedings of TLT-7, pp. 15–28. Groningen, The Netherlands (2009)

  26. Lavelli, A., Hall, J., Nilsson, J., Nivre, J.: MaltParser at the Evalita 2009 dependency parsing task. In: Proceedings of Evalita’09, Reggio Emilia (2009)

  27. Lesmo, L.: Use of semantic information in a syntactic dependency parser. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 13–20 (2012)

  28. Lesmo, L.: The rule-based parser of the NLP group of the University of Torino. Intell. Artif. 2, 46–47 (2007)

  29. Lesmo, L.: The Turin University parser at Evalita 2009. In: Proceedings of Evalita’09, Reggio Emilia (2009)

  30. Lesmo, L., Lombardo, V., Bosco, C.: Treebank development: the TUT approach. In: Proceedings of ICON02, Mumbai, India (2002), http://www.di.unito.it/~bosco/publicat/icon02lesmo-et-al.zip

  31. Nilsson, J., Nivre, J.: MaltEval: An evaluation and visualization tool for dependency parsing. In: Proceedings of LREC’08, pp. 161–166. Marrakech (2008)

  32. Nivre, J., Hall, J., Nilsson, J.: MaltParser: A data-driven parser-generator for dependency parsing. In: Proceedings of LREC’06, pp. 2216–2219. Genova (2006)

  33. Petrov, S., Klein, D.: Improved inference for unlexicalized parsing. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp. 404–411. Rochester, New York (April 2007). http://www.aclweb.org/anthology/N/N07/N07-1051

  34. Rimell, L., Clark, S., Steedman, M.: Unbounded dependency recovery for parser evaluation. In: Proceedings of Empirical Methods in Natural Language Processing ’09, pp. 813–821. Singapore (2009)

Author information

Corresponding author

Correspondence to Anita Alicante.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Alicante, A., Bosco, C., Corazza, A., Lavelli, A. (2015). Evaluating Italian Parsing Across Syntactic Formalisms and Annotation Schemes. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_7

  • DOI: https://doi.org/10.1007/978-3-319-14206-7_7

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14205-0

  • Online ISBN: 978-3-319-14206-7

  • eBook Packages: Engineering (R0)
