Abstract
Spontaneous speech differs from other types of speech in many ways, with speech disfluencies being its most prominent feature. These phenomena play an important role in communication but also cause problems for automatic speech processing. In this study we present an acoustic analysis of the most frequent disfluencies, voiced hesitations (filled pauses and lengthenings), across different speaking styles in spontaneous Russian speech, together with experiments on detecting them with an SVM classifier on a joint Russian and English spontaneous speech corpus. The acoustic analysis showed significant differences across speaking styles in Russian in the fundamental frequency and energy distribution ratios of hesitations and their contexts: compared with dialogues, speakers in monologues exhibit more prosodic cues for hesitations and their adjacent context. Detection of voiced hesitations with the SVM on the mixed-language, mixed-style corpus achieved an F1-score of 0.48 (0.55 on the Russian data alone).
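For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall for the positive class. The following is a minimal sketch of that calculation, not the authors' evaluation code. The abstract does not state the granularity at which detections were scored, so the per-segment binary labels, the function name `precision_recall_f1`, and the toy data are all assumptions made for illustration.

```python
def precision_recall_f1(reference, predicted):
    """Return (precision, recall, F1) for the positive class (label 1).

    Both arguments are equal-length sequences of 0/1 labels. Treating each
    item as one scored segment is an assumption made for this sketch.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # hesitations found
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # hesitations missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels invented for illustration (1 = voiced hesitation, 0 = other).
reference = [0, 1, 1, 0, 0, 1, 0, 1]
predicted = [0, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(reference, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 ignores true negatives, it is not inflated by the large amount of non-hesitation speech in a corpus, which is one reason it is commonly used for detection tasks of this kind.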
Acknowledgments
This research is supported by a grant from the Russian Foundation for Basic Research (project No. 15-06-04465) and by the Council for Grants of the President of the Russian Federation (project No. MK-1000.2017.8).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Verkhodanova, V., Shapranov, V., Kipyatkova, I. (2017). Hesitations in Spontaneous Speech: Acoustic Analysis and Detection. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_39
DOI: https://doi.org/10.1007/978-3-319-66429-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)