Evaluating Italian Parsing Across Syntactic Formalisms and Annotation Schemes

Alicante, Anita; Bosco, Cristina; Corazza, Anna; Lavelli, Alberto

doi:10.1007/978-3-319-14206-7_7

Part of the book series: Studies in Computational Intelligence (SCI, volume 589)

Abstract

This paper describes results on how syntactic representations and parsing methodologies affect the performance of systems for parsing Italian. Italian has a rich morphology, especially with respect to verbal suffixes, which can provide a parser with useful information for making the correct choices. With respect to syntactic representation, the experiments are based on a treebank for Italian that has been released in both a dependency and a constituency formalism, each annotated at different degrees of specificity. The two paradigms are compared, and the different degrees of specificity in marking some syntactic phenomena are pointed out. Statistical parsers have been evaluated on this treebank. The results show that both the representation format and the parsing approach strongly affect performance, which in some cases is very close to, and in others drastically different from, the state of the art for English.

Notes

1.
The CODICECIVILE and COSTITA corpora include legal texts; EUDIR includes declarations of the European Community from the Italian section of the JRC-Acquis Multilingual Parallel Corpus (see http://langtech.jrc.it/JRC-Acquis.html). The NEWS corpus instead includes texts from Italian newspapers, WIKIPEDIA texts from the Italian section of Wikipedia, and VED a miscellany of academic, journal and novel texts.
2.
The term token refers to all the objects annotated in the treebank, namely words, punctuation marks and null elements.
3.
English translations of the Italian examples are literal and so may appear awkward in English.
4.
According to Word Grammar, many words that traditional grammar would have classified as Adverbs or subordinating conjunctions qualify as Prepositions or Determiners.
5.
For instance, in Machine Translation, if the source language allows argument deletion and the target language does not, it is crucial that the dropped argument be explicitly marked in the source language so that the system can handle the translation. A similar situation can arise when translating from Italian (a typical pro-drop language, where subject deletion is very common with tensed Verbs) into English (where the subject is always lexically realized in tensed clauses).
6.
The term equi refers to the missing Subject of the subordinate infinitive Verb, e.g. the Subject of the Verb “dormire” (sleep) in “Vuole dormire” ([He] wants [to] sleep).
7.
The projectivity constraint is maintained for TUT also in the CoNLL format.
8.
See http://www.cis.upenn.edu/chinese/.
9.
See http://www.ircs.upenn.edu/arabic/.
10.
Apart from a few cases of English morphological features which do not exist in Italian (e.g. possessive ending) or do not correspond to Italian forms (e.g. comparative Adjective and Adverb).
11.
The inclusion of person, gender and number values in morphological tags was tested without yielding any improvement in parser performance. Investigating the effect of including these features for Italian, or for other MRLs, may be of interest for future work.
12.
English translation: The agreement is broken for three main motivations.
13.
Proper nouns are not marked in Italian in terms of number.
14.
In fact, in a dependency tree the relation subject marks an edge linking the verbal head with a dependent which can be distinguished from other verbal dependents only by the type of the relation.
15.
English translation: A right allowance is due to the owner.
16.
E.g. the tag PUT, which represents the locative complement of the Verb “put”, or the tag DTV (dative), which is annotated on indirect objects when they are realized as prepositional phrases, i.e. not affected by the dative shift.
17.
The evaluation has been performed by using the MaltEval tools [31].
18.
This shows, however, that the test set, even though it has the same balance as TUT, does not optimally represent the treebank in terms of relations and constructions.
19.
This is only partially explained by the sentence length, which is lower than 40 words only in the test set, and by the smaller size of the training set for the 10-fold cross validation.
20.
The ten most frequent relations in the whole 1-Comp treebank (with respect to 72,149 annotated tokens) are ARG (30.3 %), RMOD (19.2 %), OBJ (4.5 %), SUBJ (3.9 %), END (3.3 %), TOP (3.2 %), COORD2ND+BASE (3.1 %), COORD+BASE (3.1 %), SEPARATOR (2.7 %), INDCOMPL (1.9 %).
21.
Regarding the parsing of legal text in particular, see also the Proceedings of the LREC 2012 Workshop on Semantic Processing of Legal Texts (SPLeT-2012), available at http://www.lrec-conf.org/proceedings/lrec2012/workshops/27.LREC%202012%20Workshop%20-Proceedings%20SPLeT.pdf.
22.
The tool is freely available from http://www.cis.upenn.edu/dbikel/software.html#comparator.
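
As an illustration of the projectivity constraint mentioned in note 7, the following is a minimal sketch of how one might check whether a dependency tree is projective. It assumes a CoNLL-style encoding in which each token carries the 1-based index of its head and 0 stands for an artificial root; the function name `is_projective` and the toy head lists are hypothetical and are not taken from TUT or from the chapter.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the 1-based head index of token i+1;
    0 denotes an artificial root placed before the first token (an assumption).
    """
    # Represent every arc by its left and right endpoints in the sentence.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # Arcs cross when the second starts strictly inside the first
            # and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True


# Toy examples (not treebank data):
print(is_projective([2, 0, 2]))     # token 2 is the root, tokens 1 and 3 attach to it
print(is_projective([3, 4, 0, 3]))  # arcs 1-3 and 2-4 cross
```

The pairwise comparison is quadratic in sentence length, which is usually acceptable for a one-off sanity check on treebank sentences; a faster check would be needed only for very large corpora.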

References

Alicante, A., Bosco, C., Corazza, A., Lavelli, A.: A treebank-based study on the influence of Italian word order on parsing performance. In: LREC, pp. 1985–1992 (2012)
Bosco, C.: A richer annotation schema for an Italian treebank. In: Proceedings of European Summer School on Logic Language and Information, Birmingham, UK (2000), http://www.di.unito.it/~bosco/publicat/esslli00.zip
Bosco, C.: Grammatical relation’s system in treebank annotation. In: Proceedings of Student Research Workshop of Joint ACL/EACL Meeting, Toulouse, France (2001), http://www.di.unito.it/~bosco/publicat/acl-stud-ses-01.zip
Bosco, C.: A grammatical relation system for treebank annotation, Ph.D. thesis, University of Torino (2004)
Bosco, C.: Multiple-step treebank conversion: from dependency to Penn format. In: Proceedings of Linguistic Annotation Workshop at the ACL’07 (2007)
Bosco, C.: Linguistic knowledge extraction from corpus parallel annotations. In: Proceedings of XL Congresso della Società di Linguistica Italiana, Vercelli (2009), http://www.di.unito.it/~bosco/publicat/sli06.zip
Bos, J., Bosco, C., Mazzei, A.: Converting a dependency treebank to a categorial grammar treebank for Italian. In: Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories, pp. 27–38. Milan (2009)
Bosco, C., Lavelli, A.: Annotation schema oriented evaluation for parsing validation. In: Proceedings of the 9th Workshop on Treebanks and Linguistic Theories (TLT-9), pp. 19–30. Tartu, Estonia (2010)
Bosco, C., Mazzei, A., Lavelli, A.: Looking back to the Evalita constituency parsing task: 2007–2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 46–57 (2012)
Bosco, C., Lombardo, V.: A relation-schema for treebank annotation. In: Cappelli, A., Turini, F. (eds.) Advances in Artificial Intelligence, LNCS, vol. 2829. Springer, Berlin (2003), http://www.di.unito.it/~bosco/publicat/aiia-03.zip
Bosco, C., Lombardo, V.: Comparing linguistic information in treebank annotations. In: Proceedings of the 5th International Language Resources and Evaluation Conference (2006), http://www.di.unito.it/~bosco/publicat/lrec06.zip
Bosco, C., Lombardo, V., Lesmo, L., Vassallo, D.: Building a treebank for Italian: a data-driven annotation schema. In: Proceedings of 2nd International Conference on Language Resources and Evaluation, Athens, Greece (2000), http://www.di.unito.it/~bosco/publicat/lrec00.zip
Bosco, C., Mazzei, A., Lombardo, V.: Evalita parsing task: an analysis of the first parsing system contest for Italian. Intell. Artif. 2(IV), 30–33 (2007)
Bosco, C., Mazzei, A., Lombardo, V.: Evalita’09 parsing task: constituency parsers and the Penn format for Italian. In: Proceedings of Evalita’09 (2009)
Bosco, C., Montemagni, S., Mazzei, A., Lombardo, V., Dell’Orletta, F., Lenci, A.: Evalita’09 parsing task: comparing dependency parsers and treebanks. In: Proceedings of Evalita’09, Reggio Emilia (2009)
Bosco, C., Montemagni, S., Mazzei, A., Lombardo, V., Dell’Orletta, F., Lenci, A., Lesmo, L., Attardi, G., Simi, M., Lavelli, A., Hall, J., Nilsson, J., Nivre, J.: Comparing the influence of different treebank annotations on dependency parsing. In: Proceedings of Language Resources and Evaluation Conference, pp. 1794–1801. Malta (2010)
Cheung, J.C., Penn, G.: Topological field parsing of German. In: Proceedings of ACL-IJCNLP’09, pp. 64–72. Singapore (2009)
Collins, M., Hajic, J., Ramshaw, L., Tillmann, C.: A statistical parser of Czech. In: Proceedings of the ACL’99 (1999)
Corazza, A., Lavelli, A., Satta, G.: An information-theoretic measure to evaluate parsing difficulty across treebanks. ACM Trans. Speech Lang. Process. 9(4), 7:1–7:31 (2013). http://doi.acm.org/10.1145/2407736.2407737
Dell’Orletta, F., Marchi, S., Montemagni, S., Venturi, G.: Domain adaptation for dependency parsing at Evalita 2011. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 58–69 (2012)
Green, S., Manning, C.D.: Better Arabic parsing: Baselines, evaluations, and analysis. In: Proceedings of COLING 2010 (2010)
Hajič, J., Böhmová, A., Hajičová, E., Vidová-Hladká, B.: The Prague dependency treebank: a three-level annotation scenario. In: Abeillé, A. (ed.) Treebanks: Building and Using Parsed Corpora, pp. 103–127. Kluwer, Amsterdam (2000)
Hudson, R.: Word Grammar. Basil Blackwell, Oxford (1984)
Jones, B.E.M.: Exploring the role of punctuation in parsing natural text. In: Proceedings of COLING’94, pp. 421–425. Kyoto (1994)
Kübler, S., Rehbein, I., van Genabith, J.: TePaCoC a corpus for testing parser performance on complex German grammatical constructions. In: Proceedings of TLT-7, pp. 15–28. Groningen, The Netherlands (2009)
Lavelli, A., Hall, J., Nilsson, J., Nivre, J.: MaltParser at the Evalita 2009 dependency parsing task. In: Proceedings of Evalita’09, Reggio Emilia (2009)
Lesmo, L.: Use of semantic information in a syntactic dependency parser. In: Magnini, B., Cutugno, F., Falcone, M., Pianta, E. (eds.) Evaluation of Natural Language and Speech Tools for Italian—Proceedings of EVALITA 2011, pp. 13–20 (2012)
Lesmo, L.: The rule-based parser of the NLP group of the University of Torino. Intell. Artif. 2, 46–47 (2007)
Lesmo, L.: The Turin University parser at Evalita 2009. In: Proceedings of Evalita’09, Reggio Emilia (2009)
Lesmo, L., Lombardo, V., Bosco, C.: Treebank development: the TUT approach. In: Proceedings of ICON02, Mumbai, India (2002), http://www.di.unito.it/~bosco/publicat/icon02lesmo-et-al.zip
Nilsson, J., Nivre, J.: MaltEval: An evaluation and visualization tool for dependency parsing. In: Proceedings of LREC’08, pp. 161–166. Marrakech (2008)
Nivre, J., Hall, J., Nilsson, J.: MaltParser: A data-driven parser-generator for dependency parsing. In: Proceedings of LREC’06, pp. 2216–2219. Genova (2006)
Petrov, S., Klein, D.: Improved inference for unlexicalized parsing. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp. 404–411. Rochester, New York (April 2007). http://www.aclweb.org/anthology/N/N07/N07-1051
Rimell, L., Clark, S., Steedman, M.: Unbounded dependency recovery for parser evaluation. In: Proceedings of Empirical Methods in Natural Language Processing ’09, pp. 813–821. Singapore (2009)

Author information

Authors and Affiliations

Dipartimento di Ingegneria Elettrica e Tecnologie dell’Informazione, Università di Napoli Federico II, Naples, Italy
Anita Alicante & Anna Corazza
Dipartimento di Informatica, Università di Torino, C.so Svizzera 185, 10149, Turin, TO, Italy
Cristina Bosco
HLT Research Unit, Fondazione Bruno Kessler, Povo, TN, Italy
Alberto Lavelli

Corresponding author

Correspondence to Anita Alicante.

Editor information

Editors and Affiliations

Department of Computer Science, Systems and Production, University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Department of Computer Science, University of Turin, Turin, Italy
Cristina Bosco
Department of Language and Cultural Studies, Department of Computer Science, Ca’ Foscari University of Venice, Venezia, Italy
Rodolfo Delmonte
Department of Computer Science and Information Engineering, University of Trento, Trento, Italy
Alessandro Moschitti
Department of Computer Science, University of Pisa, Pisa, Italy
Maria Simi

About this chapter

Cite this chapter

Alicante, A., Bosco, C., Corazza, A., Lavelli, A. (2015). Evaluating Italian Parsing Across Syntactic Formalisms and Annotation Schemes. In: Basili, R., Bosco, C., Delmonte, R., Moschitti, A., Simi, M. (eds) Harmonization and Development of Resources and Tools for Italian Natural Language Processing within the PARLI Project. Studies in Computational Intelligence, vol 589. Springer, Cham. https://doi.org/10.1007/978-3-319-14206-7_7

DOI: https://doi.org/10.1007/978-3-319-14206-7_7
Published: 15 January 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14205-0
Online ISBN: 978-3-319-14206-7
eBook Packages: Engineering, Engineering (R0)