Abstract
This paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. Modifying the grammar and adding further methods improved recall from 97.5% to 97.6% and precision from 91.6% to 91.8%. The paper also gives a detailed analysis of the types of errors the parser makes when analyzing the corpus of disfluencies.
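A Constraint Grammar parser does not always resolve every ambiguity, so a word may keep more than one syntactic tag. This is why parsing quality is reported as both recall and precision. The sketch below assumes the usual Constraint Grammar evaluation convention. Recall is the share of words whose correct tag survives disambiguation. Precision is the share of all surviving tags that are correct. The function name `cg_precision_recall`, the tag labels and the toy data are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def cg_precision_recall(gold: Sequence[str],
                        predicted: Sequence[Set[str]]) -> Tuple[float, float]:
    """Precision and recall for an ambiguity-preserving (CG-style) tagger.

    gold[i] is the correct tag of token i; predicted[i] is the set of tags
    the parser left on token i after applying its constraints.
    (Hypothetical helper illustrating a common CG evaluation convention.)
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot evaluate an empty sample")
    # A token counts as correct if its gold tag survived disambiguation.
    correct = sum(1 for g, p in zip(gold, predicted) if g in p)
    # Every tag the parser left behind counts towards precision's denominator.
    total_left = sum(len(p) for p in predicted)
    recall = correct / len(gold)
    precision = correct / total_left if total_left else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: token 3 is still ambiguous, token 4 lost its correct tag.
    gold = ["@SUBJ", "@FMV", "@OBJ", "@ADVL"]
    predicted = [{"@SUBJ"}, {"@FMV"}, {"@OBJ", "@SUBJ"}, {"@OBJ"}]
    print(cg_precision_recall(gold, predicted))  # (0.6, 0.75)
```

In the toy data, three of four gold tags survive (recall 0.75), but the parser leaves five tags in total (precision 3/5 = 0.6). Leaving ambiguity unresolved keeps recall high at the cost of precision, which is the trade-off the figures in the abstract reflect.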
This study has been supported by grant SF0180078s08 from the Estonian Ministry of Education and Research.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müürisep, K., Nigol, H. (2009). Shallow Parsing of Transcribed Speech of Estonian and Disfluency Detection. In: Vetulani, Z., Uszkoreit, H. (eds) Human Language Technology. Challenges of the Information Society. LTC 2007. Lecture Notes in Computer Science, vol 5603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04235-5_15
DOI: https://doi.org/10.1007/978-3-642-04235-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04234-8
Online ISBN: 978-3-642-04235-5
eBook Packages: Computer Science, Computer Science (R0)