Hesitations in Spontaneous Speech: Acoustic Analysis and Detection

Verkhodanova, Vasilisa; Shapranov, Vladimir; Kipyatkova, Irina

doi:10.1007/978-3-319-66429-3_39

Conference paper
First Online: 13 August 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Spontaneous speech differs from other types of speech in many ways, with speech disfluencies being its most prominent feature. These phenomena play an important role in communication but also cause problems for automatic speech processing. In this study we present the results of an acoustic analysis of the most frequent disfluencies, voiced hesitations (filled pauses and lengthenings), across different speaking styles in spontaneous Russian speech, as well as the results of experiments on their detection using an SVM classifier on a joint Russian and English spontaneous speech corpus. The acoustic analysis showed significant differences in the fundamental frequency and energy distribution ratios of hesitations and their contexts across speaking styles in Russian: compared with dialogues, speakers in monologues exhibit more prosodic cues for hesitations and their adjacent context. Experiments on detecting voiced hesitations with an SVM on the mixed-language, mixed-style corpus yielded an F1-score of 0.48 (0.55 on the Russian data alone).
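The F1-score reported above is the harmonic mean of precision and recall for the positive (hesitation) class. The following is a minimal sketch of that computation, assuming binary per-segment labels where 1 marks a voiced hesitation. The function name, the segment-level granularity, and the toy labels are illustrative assumptions; this preview does not describe the evaluation protocol the authors actually used.

```python
def f1_score(reference, predicted):
    """F1 for the positive class (1 = hesitation) over paired binary labels."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly detected
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed hesitations
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration; not data from the study.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 0, 1, 1, 0, 0, 0]
score = f1_score(reference, predicted)
print(f"F1 = {score:.2f}")
```

Because true negatives do not enter the formula, the score is not inflated by the large share of non-hesitation speech, which is why F1 is a common choice for detecting comparatively rare events.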

Acknowledgments

This research is supported by a grant from the Russian Foundation for Basic Research (project No. 15-06-04465) and by the Council for Grants of the President of the Russian Federation (project No. MK-1000.2017.8).

Author information

Authors and Affiliations

SPIIRAS, St. Petersburg, Russia
Vasilisa Verkhodanova, Vladimir Shapranov & Irina Kipyatkova

Corresponding author

Correspondence to Vasilisa Verkhodanova.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas

About this paper

Cite this paper

Verkhodanova, V., Shapranov, V., Kipyatkova, I. (2017). Hesitations in Spontaneous Speech: Acoustic Analysis and Detection. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_39

DOI: https://doi.org/10.1007/978-3-319-66429-3_39
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
