A Discourse Information Radio News Database for Linguistic Analysis

Abstract

In this paper we present DIRNDL, an annotated corpus resource that combines syntactic annotations with information status labels and prosodic information. We introduce each annotation layer and then focus on how the layers are linked in a standoff approach. The corpus is based on radio news broadcasts and therefore has two sets of primary data: the spoken radio news recordings and a written text version that sometimes deviates from what was actually spoken. We use a generic relational database management system to bridge the gap both between the deviating primary data and between the differing properties of the annotation levels. We show how the resource supports data extraction at the interface between information status, syntax and prosody.

Keywords

Linguistic Analysis, News Broadcast, Pitch Accent, Prosodic Boundary
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Kerstin Eckart (1)
  • Arndt Riester
  • Katrin Schweitzer
  1. Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Stuttgart, Germany