Abstract
This article analyzes the automatic detection of sentence modality in French using both prosodic and linguistic information. The longer-term goal is to use such an approach to support communication with deaf people. Two sentence modalities are evaluated: questions and statements. As linguistic features, we considered the presence of discriminative interrogative patterns and two log-likelihood ratios of the sentence being a question rather than a statement: one based on words and the other based on part-of-speech tags. The prosodic features are based on duration, energy and pitch features estimated over the last prosodic group of the sentence. The evaluations use linguistic features derived either from manual transcriptions or from the output of an automatic speech transcription system. The behavior of various sets of features is analyzed and compared. The combination of linguistic and prosodic features gives a slight improvement on automatic transcriptions, where the correct classification performance reaches 72%.
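As an illustration of the word-based log-likelihood ratio mentioned in the abstract, the following minimal sketch scores a sentence under two language models, one trained on questions and one on statements, and takes the difference of the log-probabilities. The unigram models, the add-one smoothing, the toy training sentences, and all function names are assumptions made for this example; they are not taken from the paper, which does not specify these details in its abstract.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word occurrences over a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


def log_prob(tokens, counts, vocab_size):
    """Log-probability of a token sequence under an add-one-smoothed unigram model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in tokens
    )


def llr_question_vs_statement(tokens, q_counts, s_counts, vocab_size):
    """Log-likelihood ratio: positive values favor 'question', negative favor 'statement'."""
    return log_prob(tokens, q_counts, vocab_size) - log_prob(tokens, s_counts, vocab_size)


# Hypothetical toy training data (not from the paper's corpora).
questions = [["est-ce", "que", "tu", "viens"], ["où", "vas", "tu"]]
statements = [["je", "viens", "demain"], ["il", "va", "à", "paris"]]

q_counts = train_unigram(questions)
s_counts = train_unigram(statements)
vocab_size = len(set(q_counts) | set(s_counts))

llr_q = llr_question_vs_statement(["où", "vas", "tu"], q_counts, s_counts, vocab_size)
llr_s = llr_question_vs_statement(["je", "viens", "demain"], q_counts, s_counts, vocab_size)
print(f"question-like sentence:  {llr_q:+.3f}")
print(f"statement-like sentence: {llr_s:+.3f}")
```

The same ratio could be computed over part-of-speech tag sequences instead of words by replacing each token with its tag before training and scoring. Either ratio would then serve as one numeric feature for a classifier, alongside the prosodic features.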
Acknowledgements
The work presented in this article is part of the RAPSODIE project, and has received support from the “Conseil Régional de Lorraine” and from the “Région Lorraine” (FEDER) (http://erocca.com/rapsodie).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Orosanu, L., Jouvet, D. (2015). Combining Lexical and Prosodic Features for Automatic Detection of Sentence Modality in French. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_20
DOI: https://doi.org/10.1007/978-3-319-25789-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)