
Shallow Parsing of Transcribed Speech of Estonian and Disfluency Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5603)

Abstract

This paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech. Special attention is paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. Modifications to the grammar and additional methods improved recall from 97.5% to 97.6% and precision from 91.6% to 91.8%. The paper also gives a detailed analysis of the types of errors made by the parser when analyzing the corpus of disfluencies.

This study was supported by grant SF0180078s08 from the Estonian Ministry of Education and Research.
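
The abstract reports recall and precision without stating how they are computed. As a rough illustration only, the sketch below uses the convention customary in Constraint Grammar evaluation, where the parser may leave more than one syntactic tag on a word: recall is the share of words whose correct tag survives, and precision is the share of emitted tags that are correct. That this matches the paper's exact definition is an assumption, and the function name and example tags are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall(
    gold: Sequence[str], predicted: Sequence[Set[str]]
) -> Tuple[float, float]:
    """Return (precision, recall) for ambiguity-preserving tagging.

    gold      -- one correct syntactic tag per word
    predicted -- the set of tags the parser left on each word
    Assumed convention, not taken from the paper itself.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # A word counts as correct if its gold tag is among the retained tags.
    correct = sum(1 for g, p in zip(gold, predicted) if g in p)
    # Every retained tag counts against precision.
    emitted = sum(len(p) for p in predicted)

    recall = correct / len(gold) if gold else 0.0
    precision = correct / emitted if emitted else 0.0
    return precision, recall


# Illustrative four-word utterance; the tags are made up for the example.
gold_tags = ["@SUBJ", "@FMV", "@OBJ", "@ADVL"]
parser_tags = [{"@SUBJ"}, {"@FMV"}, {"@OBJ", "@SUBJ"}, {"@NN>"}]
precision, recall = precision_recall(gold_tags, parser_tags)
```

Under this convention a parser that leaves words ambiguous keeps recall high at the cost of precision, which is consistent with the gap between the two figures reported in the abstract.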




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müürisep, K., Nigol, H. (2009). Shallow Parsing of Transcribed Speech of Estonian and Disfluency Detection. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_15


  • DOI: https://doi.org/10.1007/978-3-642-04235-5_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04234-8

  • Online ISBN: 978-3-642-04235-5

  • eBook Packages: Computer Science, Computer Science (R0)
