Where Speech Recognition Is Going: Conclusion and Future Scope

Johar, Swati

doi:10.1007/978-3-319-28047-9_6

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (SpringerBriefs in Speech Technology)

Abstract

Voice and natural language processing are now at the forefront of human-machine interaction. This chapter highlights the considerable progress in machine learning, statistical data mining and pattern recognition that can make speech interfaces more versatile and pervasive. It also examines the obstacles that stand in the way of acoustically robust natural interfaces as the demands placed on speech interfaces grow. Finally, it outlines the technical advances and research still needed to achieve high-performance, real-time speech recognition, which will fundamentally change the way humans interact with their computing devices.

Author information

Authors and Affiliations

Defence Institute of Psychological Research, DRDO, Ministry of Defence, New Delhi, India
Swati Johar

Corresponding author

Correspondence to Swati Johar.

About this chapter

Cite this chapter

Johar, S. (2016). Where Speech Recognition Is Going: Conclusion and Future Scope. In: Emotion, Affect and Personality in Speech. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-28047-9_6

DOI: https://doi.org/10.1007/978-3-319-28047-9_6
Published: 23 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28045-5
Online ISBN: 978-3-319-28047-9
eBook Packages: Engineering, Engineering (R0)