Hesitations in Spontaneous Speech: Acoustic Analysis and Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

Spontaneous speech differs from other types of speech in many ways, with speech disfluencies being its most prominent feature. These phenomena play an important role in communication, but they also cause problems for automatic speech processing. In this study we present the results of an acoustic analysis of the most frequent disfluencies, voiced hesitations (filled pauses and lengthenings), across different speaking styles in spontaneous Russian speech, as well as the results of experiments on their detection using an SVM classifier on a joint Russian and English spontaneous speech corpus. The acoustic analysis showed significant differences in the fundamental frequency and energy distribution ratios of hesitations and their contexts across speaking styles in Russian: compared with dialogues, speakers in monologues exhibit more prosodic cues for hesitations and their adjacent context. Experiments on the detection of voiced hesitations with SVM on a corpus of mixed languages and styles achieved an F1-score of 0.48 (0.55 for the Russian data alone).
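For readers unfamiliar with the reported metric, the F1-score is the harmonic mean of precision and recall for the target class. The sketch below computes it for a binary labelling in which 1 marks a hesitation and 0 marks everything else. The function name, the frame-level labelling, and the toy label sequences are illustrative assumptions; they are not the evaluation unit or the data used in the paper.

```python
def f1_score(reference, predicted):
    """F1 for the positive class (1 = hesitation, 0 = other).

    Both arguments are equal-length sequences of 0/1 labels.
    The labelling unit (here: frames) is an assumption for illustration.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly detected
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed hesitations
    if tp == 0:
        return 0.0  # no correct detections: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not taken from the paper's corpus.
reference = [0, 1, 1, 0, 1, 0, 0, 1]
predicted = [0, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(reference, predicted)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is a natural choice when the target class is rare, as hesitations are relative to the rest of the speech signal.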

Acknowledgments

This research is supported by a grant from the Russian Foundation for Basic Research (project No. 15-06-04465) and by the Council for Grants of the President of the Russian Federation (project No. MK-1000.2017.8).

Author information

Corresponding author

Correspondence to Vasilisa Verkhodanova.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Verkhodanova, V., Shapranov, V., Kipyatkova, I. (2017). Hesitations in Spontaneous Speech: Acoustic Analysis and Detection. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_39

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science (R0)
