Where Speech Recognition Is Going: Conclusion and Future Scope

Chapter in: Emotion, Affect and Personality in Speech

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Today, voice and natural language processing are at the forefront of human-machine interaction. The chapter emphasizes the tremendous progress in machine learning, statistical data mining and pattern recognition approaches that can help make speech interfaces more versatile and pervasive. It also warns that the growing demands placed on speech interfaces may create impediments to the successful implementation of acoustically robust natural interfaces. Finally, the chapter highlights the technical advances and research efforts still needed for high-performance, real-time speech recognition that will completely change the way humans interact with their computing devices.

Author information

Correspondence to Swati Johar.

Copyright information

© 2016 The Author(s)

About this chapter

Cite this chapter

Johar, S. (2016). Where Speech Recognition Is Going: Conclusion and Future Scope. In: Emotion, Affect and Personality in Speech. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-28047-9_6

  • DOI: https://doi.org/10.1007/978-3-319-28047-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28045-5

  • Online ISBN: 978-3-319-28047-9

  • eBook Packages: Engineering, Engineering (R0)