Shallow Parsing of Transcribed Speech of Estonian and Disfluency Detection

  • Kaili Müürisep
  • Helen Nigol
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5603)

Abstract

This paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech, with special attention to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. Modifications to the grammar, together with additional methods, improved recall from 97.5% to 97.6% and precision from 91.6% to 91.8%. The paper also gives a detailed analysis of the types of errors the parser made when analyzing the corpus of disfluencies.

Keywords

Parsing, Estonian language, spoken language

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kaili Müürisep (1)
  • Helen Nigol (2)
  1. Institute of Computer Science, University of Tartu, Tartu, Estonia
  2. Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
