Abstract
Voice and natural language processing are now at the forefront of human-machine interaction. This chapter highlights the progress in machine learning, statistical data mining, and pattern recognition that can help make speech interfaces more versatile and pervasive. It also examines the impediments that the growing demands on speech interfaces place in the way of acoustically robust natural interfaces. Finally, the chapter outlines the technical advances and research efforts needed for high-performance, real-time speech recognition that would change the way humans interact with their computing devices.
Copyright information
© 2016 The Author(s)
About this chapter
Cite this chapter
Johar, S. (2016). Where Speech Recognition Is Going: Conclusion and Future Scope. In: Emotion, Affect and Personality in Speech. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-28047-9_6
DOI: https://doi.org/10.1007/978-3-319-28047-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28045-5
Online ISBN: 978-3-319-28047-9
eBook Packages: Engineering, Engineering (R0)