Lexical Emphasis Detection in Spoken French Using F-BANKs and Neural Networks

  • Conference paper
Statistical Language and Speech Processing (SLSP 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Abstract

Expressiveness and non-verbal information in speech are active research topics in speech processing. In this work, we detect emphasis at the word level as a means of identifying the focus words in a given utterance. We compare several machine learning techniques (Linear Discriminant Analysis, Support Vector Machines, Neural Networks) on this task using SIWIS, a French speech synthesis database. Our approach first aligns the spoken words to the speech signal and then feeds a classifier with filter bank coefficients to make a binary decision at the word level: neutral or emphasized. Evaluation results show that a three-layer neural network performed best, with an accuracy of \(93\%\).
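
The two-step pipeline in the abstract (word alignment, then word-level classification from filter bank coefficients) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: mean pooling of frames per word, 40 filters, 16 hidden units, ReLU activations, reading "three-layer" as two hidden layers plus an output layer, and the made-up alignment. The weights are random and untrained, so the predicted labels carry no meaning; the code only shows how data would flow.

```python
import math
import random

def pool_word_features(fbank, alignment):
    """Average frame-level filter bank vectors over each aligned word span.

    fbank: list of frames, each a list of n_filters floats.
    alignment: list of (word, start_frame, end_frame), end exclusive.
    Mean pooling is an illustrative assumption, not the paper's stated method.
    """
    pooled = []
    for _, start, end in alignment:
        frames = fbank[start:end]
        pooled.append([sum(col) / len(frames) for col in zip(*frames)])
    return pooled

def init_mlp(n_in, n_hidden, rng):
    """Small random weights for two ReLU hidden layers and one sigmoid output unit."""
    sizes = [n_in, n_hidden, n_hidden, 1]
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_prev)],
         [0.0] * n_out)
        for n_prev, n_out in zip(sizes[:-1], sizes[1:])
    ]

def dense(x, weights, bias):
    """Affine map: x (length n_prev) times weights (n_prev x n_out) plus bias."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]

def predict_emphasis(params, x):
    """Forward pass returning P(emphasized) for one pooled word vector."""
    h = x
    for weights, bias in params[:-1]:
        h = [max(v, 0.0) for v in dense(h, weights, bias)]  # ReLU hidden layer
    logit = dense(h, *params[-1])[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid output

rng = random.Random(0)
# Placeholder features: 120 frames of 40 filter bank coefficients (random, not real speech).
fbank = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(120)]
# Hypothetical word alignment in frame indices.
alignment = [("je", 0, 30), ("veux", 30, 70), ("ça", 70, 120)]

word_vectors = pool_word_features(fbank, alignment)
params = init_mlp(40, 16, rng)
probs = [predict_emphasis(params, v) for v in word_vectors]
labels = ["emphasized" if p >= 0.5 else "neutral" for p in probs]
```

In practice the filter bank frames and the alignment would come from a speech toolkit and a forced aligner, and the network would be trained on labeled words before its decisions meant anything.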

Author information

Corresponding author

Correspondence to Abdelwahab Heba.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Heba, A., Pellegrini, T., Jorquera, T., André-Obrecht, R., Lorré, JP. (2017). Lexical Emphasis Detection in Spoken French Using F-BANKs and Neural Networks. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_20

  • DOI: https://doi.org/10.1007/978-3-319-68456-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68455-0

  • Online ISBN: 978-3-319-68456-7

  • eBook Packages: Computer Science, Computer Science (R0)
