Lexical Emphasis Detection in Spoken French Using F-BANKs and Neural Networks

Heba, Abdelwahab; Pellegrini, Thomas; Jorquera, Tom; André-Obrecht, Régine; Lorré, Jean-Pierre

doi:10.1007/978-3-319-68456-7_20

Abdelwahab Heba¹⁶,¹⁷,
Thomas Pellegrini¹⁷,
Tom Jorquera¹⁶,
Régine André-Obrecht¹⁷ &
…
Jean-Pierre Lorré¹⁶

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

Expressiveness and non-verbal information in speech are active research topics in speech processing. In this work, we detect emphasis at the word level as a means of identifying the focus words in a given utterance. We compare several machine learning techniques (Linear Discriminant Analysis, Support Vector Machines, Neural Networks) for this task, carried out on SIWIS, a French speech synthesis database. Our approach consists first in aligning the spoken words to the speech signal and second in feeding a classifier with filter bank coefficients in order to make a binary decision at the word level: neutral or emphasized. Evaluation results show that a three-layer neural network performed best, with 93% accuracy.
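
The two-step approach in the abstract (align words to the signal, then classify each word from its filter bank coefficients) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `WordSegment` type, mean pooling of frames over each word, the toy logistic unit that stands in for the trained classifier, and the made-up frames, alignments, weights and bias.

```python
from dataclasses import dataclass
from math import exp
from typing import List, Sequence


@dataclass
class WordSegment:
    """A word with its aligned frame span (end_frame is exclusive)."""
    word: str
    start_frame: int
    end_frame: int


def pool_word_features(frames: Sequence[Sequence[float]], seg: WordSegment) -> List[float]:
    """Average the filter bank frames covered by one aligned word.

    Mean pooling is an assumption; the abstract does not say how frames
    are turned into a word-level input.
    """
    span = frames[seg.start_frame:seg.end_frame]
    if not span:
        raise ValueError(f"empty alignment for word {seg.word!r}")
    n_bands = len(span[0])
    return [sum(frame[b] for frame in span) / len(span) for b in range(n_bands)]


def classify_word(features: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Toy logistic unit standing in for a trained classifier (LDA, SVM or NN)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    prob_emphasized = 1.0 / (1.0 + exp(-score))
    return "emphasized" if prob_emphasized >= 0.5 else "neutral"


# Hypothetical input: 6 frames of 3 filter bank coefficients, two aligned words.
frames = [
    [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0],
    [4.0, 5.0, 6.0], [6.0, 5.0, 4.0], [5.0, 5.0, 5.0],
]
segments = [WordSegment("vraiment", 0, 3), WordSegment("bien", 3, 6)]
weights, bias = [0.5, 0.5, 0.5], -4.0  # made-up parameters, not trained values

labels = [classify_word(pool_word_features(frames, s), weights, bias) for s in segments]
for seg, label in zip(segments, labels):
    print(seg.word, label)
```

In a real system the frame spans would come from a forced aligner and the weights from training on labeled data; the sketch only shows how word boundaries turn frame-level coefficients into one binary decision per word.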

Author information

Authors and Affiliations

Linagora, Toulouse, France
Abdelwahab Heba, Tom Jorquera & Jean-Pierre Lorré
IRIT, Université de Toulouse, Toulouse, France
Abdelwahab Heba, Thomas Pellegrini & Régine André-Obrecht

Corresponding author

Correspondence to Abdelwahab Heba.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Heba, A., Pellegrini, T., Jorquera, T., André-Obrecht, R., Lorré, JP. (2017). Lexical Emphasis Detection in Spoken French Using F-BANKs and Neural Networks. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_20

DOI: https://doi.org/10.1007/978-3-319-68456-7_20
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)
