Language Resources and Evaluation, Volume 49, Issue 1, pp 1–18

<tiger2/>: serialising the ISO SynAF syntactic object model

Original Paper

Abstract

This paper introduces <tiger2/>, an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework SynAF. Based on widespread best practices, we adapt a popular XML format for syntactic annotation, TigerXML, adding features to support a variety of syntactic phenomena, including constituent and dependency structures, binding, and different node types such as compounds or empty elements. We also define interfaces to other formats and standards, including the Morpho-syntactic Annotation Framework MAF and the ISOcat Data Category Registry. Finally, a case study of the German Treebank TueBa-D/Z is presented, showcasing the handling of constituent structures, topological fields and coreference annotation in tandem.

Keywords

syntactic annotation; XML format; corpus; corpora; Treebank; Tiger XML

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Inria, Paris, France
  2. Institut für Deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin, Berlin, Germany
  3. Department of Linguistics, Georgetown University, Washington, USA