Where Do Parsing Errors Come From

The Case of Spoken Estonian
  • Kaili Müürisep
  • Helen Nigol
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5246)

Abstract

This paper discusses issues in developing a parser for spoken Estonian that is based on an existing parser for the written language and uses the Constraint Grammar framework.

When we used a corpus of face-to-face everyday conversations as the training and testing material, the parser achieved a recall of 97.6% and a precision of 91.8%. Parsing institutional phone calls turned out to be a more complicated task, with recall dropping by 3%. In this paper, we focus on parsing nonfluent speech with a rule-based parser and give an overview of parsing errors and ways to overcome them.
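As a reading aid for the figures above, the following minimal sketch shows one common way recall and precision are computed for a parser that may leave more than one syntactic tag on a word. It assumes the usual Constraint Grammar convention (recall: words whose correct tag survives, divided by all words; precision: surviving correct tags divided by all tags left in the output). The abstract does not spell out its definitions, so this convention, the function name `recall_precision`, and the example tags are illustrative assumptions rather than the authors' evaluation code.

```python
def recall_precision(gold, predicted):
    """Score ambiguity-preserving tagger output.

    gold      -- list with the single correct tag for each word
    predicted -- list with the set of tags the parser left on each word
    Assumed convention (not taken from the paper):
      recall    = words whose correct tag is in the output / all words
      precision = correct tags in the output / all tags in the output
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same words")
    # A word counts as correct if its gold tag survived disambiguation.
    correct = sum(1 for g, p in zip(gold, predicted) if g in p)
    # Every tag left in the output counts against precision.
    assigned = sum(len(p) for p in predicted)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / assigned if assigned else 0.0
    return recall, precision


# Hypothetical four-word utterance; the tag names are illustrative only.
gold = ["@SUBJ", "@FMV", "@OBJ", "@ADVL"]
predicted = [{"@SUBJ"}, {"@FMV"}, {"@OBJ", "@SUBJ"}, {"@NN>"}]
recall, precision = recall_precision(gold, predicted)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Under this convention, leaving a word ambiguous keeps recall high but lowers precision, which is one reason a rule-based parser can report recall well above precision.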

Keywords

Parsing, Estonian language, spoken language

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Kaili Müürisep (1)
  • Helen Nigol (2)
  1. Institute of Computer Science, University of Tartu, Tartu, Estonia
  2. Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
