The Business of Speech Technologies

Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks (SHB)

Abstract

With the rapid development of communications networks and devices, immediate and easy access to information and services is now the expected norm. Several critical technologies have entered the marketplace as key enablers of this access. In particular, speech technologies, such as speech recognition and natural language understanding, have permanently changed how businesses provide services to consumers. In 30 short years, speech has progressed from an idea in research laboratories across the world to a multibillion-dollar industry of software, hardware, service hosting, and professional services. Speech is now almost ubiquitous in cell phones. Yet the industry is still very much in its infancy: its focus remains on simple, low-hanging-fruit applications where the current state of the technology actually fits a specific market need, such as voice-enabling call center services or voice dialing on a cell phone.

With broadband access to networks (and therefore data) anywhere, anytime, and from any device almost a reality, speech technologies will continue to be essential for unlocking the potential that such access provides. Doing so, however, requires advances in basic speech technologies beyond the current state of the art. In this chapter, we review the business of speech technologies and its development since the 1980s. How did it start? What were the key inventions that got us where we are, and the service innovations that supported the industry over the past few decades? What are the future trends in how speech technologies will be used? And what key technical challenges must researchers address and resolve for the industry to move forward and meet this vision of the future? This chapter is by no means meant to be exhaustive, but it gives the reader an understanding of speech technologies, the speech business, and the areas where continued technical invention and innovation will be needed before speech technologies can see ubiquitous use in the marketplace.

Abbreviations

ARPA: Advanced Research Projects Agency
ART: advanced recognition technology
ASR: automatic speech recognition
ATIS: airline travel information system
BBN: Bolt, Beranek and Newman
CE: categorical estimation
CMU: Carnegie Mellon University
DARPA: Defense Advanced Research Projects Agency
DM: dialog management
DP: dynamic programming
DSP: digital signal processing
DT: discriminative training
DTW: dynamic time warping
FFT: fast Fourier transform
GMM: Gaussian mixture model
HMIHY: How May I Help You
HMM: hidden Markov model
IP: internet protocol
IVR: interactive voice response
LDA: linear discriminant analysis
LPC: linear predictive coding
MCE: minimum classification error
MLLR: maximum-likelihood linear regression
MMI: maximum mutual information
NLU: natural language understanding
PDA: pitch determination algorithm
SDC: shifted delta cepstral
SLM: statistical language model
SMS: speaker model synthesis
SVM: support vector machine
TI: transinformation index
UE: user experience
VRCP: voice recognition call processing
VTLN: vocal-tract-length normalization
VoIP: voice over IP
XML: extensible markup language

Author information

Corresponding authors

Correspondence to Jay Wilpon, Mazin E. Gilbert, or Jordan Cohen, Ph.D.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wilpon, J., Gilbert, M.E., Cohen, J. (2008). The Business of Speech Technologies. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_34

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
