Abstract
With the fast pace of development in communications networks and devices, immediate and easy access to information and services is now the expected norm. Several critical technologies have entered the marketplace as key enablers to help make this a reality. In particular, speech technologies, such as speech recognition and natural language understanding, have permanently changed how businesses provide services to consumers. In 30 short years, speech has progressed from an idea in research laboratories across the world to a multibillion-dollar industry of software, hardware, service hosting, and professional services. Speech is now almost ubiquitous in cell phones. Yet the industry is still very much in its infancy, with its focus on simple, low-hanging-fruit applications where the current state of the technology actually fits a specific market need, such as voice enabling of call center services or voice dialing over a cell phone.
With broadband access to networks (and therefore data) anywhere, anytime, and from any device almost a reality, speech technologies will continue to be essential for unlocking the potential that such access provides. However, to unlock this potential, advances in basic speech technologies beyond the current state of the art are essential. In this chapter, we review the business of speech technologies and its development since the 1980s. How did it start? What were the key inventions that got us where we are, and the service innovations that supported the industry over the past few decades? What are the future trends in how speech technologies will be used? And what are the key technical challenges researchers must address and resolve for the industry to move forward and meet this vision of the future? This chapter is by no means meant to be exhaustive, but it gives the reader an understanding of speech technologies, the speech business, and the areas where continued technical invention and innovation will be needed before the ubiquitous use of speech technologies can be seen in the marketplace.
Abbreviations
- ARPA: Advanced Research Projects Agency
- ART: advanced recognition technology
- ASR: automatic speech recognition
- ATIS: airline travel information system
- BBN: Bolt, Beranek and Newman
- CE: categorical estimation
- CMU: Carnegie Mellon University
- DARPA: Defense Advanced Research Projects Agency
- DM: dialog management
- DP: dynamic programming
- DSP: digital signal processing
- DT: discriminative training
- DTW: dynamic time warping
- FFT: fast Fourier transform
- GMM: Gaussian mixture model
- HMIHY: How May I Help You
- HMM: hidden Markov models
- IP: internet protocol
- IVR: interactive voice response
- LDA: linear discriminant analysis
- LPC: linear predictive coding
- MCE: minimum classification error
- MLLR: maximum-likelihood linear regression
- MMI: maximum mutual information
- NLU: natural language understanding
- PDA: pitch determination algorithms
- SDC: shifted delta cepstral
- SLM: statistical language model
- SMS: speaker model synthesis
- SVM: support vector machines
- TI: transinformation index
- UE: user experience
- VRCP: voice recognition call processing
- VTLN: vocal-tract-length normalization
- VoIP: voice over IP
- XML: extensible mark-up languages
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wilpon, J., Gilbert, M.E., Cohen, J. (2008). The Business of Speech Technologies. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_34
DOI: https://doi.org/10.1007/978-3-540-49127-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)