Speech input implemented in a voice user interface (voice UI) plays an important role in enhancing the usability of small portable devices such as mobile phones. In these devices, more traditional modes of interaction (e.g. keyboard and display) are limited by small size, battery life and cost. Speech is considered a natural mode of interaction for man-machine interfaces. After decades of research and development, voice UIs are becoming widely deployed and accepted in commercial applications, and the global proliferation of embedded devices is expected to further strengthen this trend in the coming years. A core technology enabler of voice UIs is automatic speech recognition (ASR). Example applications in mobile phones relying on embedded ASR include name dialling, phone book search, command-and-control and, more recently, large vocabulary dictation. In the mobile context, several technological challenges have to be overcome: ambient noise in the environment, the constraints and cost limitations of available hardware platforms, and the need for wide language coverage. In addition, mobile ASR systems need to achieve a virtually perfect performance level to gain user acceptance. This chapter reviews the application of embedded ASR in mobile phones and describes specific issues related to language development, noise robustness, and embedded implementation and platforms. Several practical solutions are presented throughout the chapter with supporting experimental results.
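The noise-robustness issues mentioned above are commonly addressed in embedded ASR front-ends by normalizing the cepstral features before recognition. As a minimal illustrative sketch (not the chapter's specific method), recursive cepstral mean normalization subtracts a running estimate of the channel/cepstral mean from each feature frame; the function name and the forgetting factor value below are assumptions for illustration only:

```python
import numpy as np

def recursive_cmn(frames, alpha=0.995):
    """Recursive cepstral mean normalization (illustrative sketch).

    frames: (T, D) array of cepstral feature vectors, one row per frame.
    alpha:  forgetting factor of the running mean (hypothetical value).

    A running mean of the features is updated frame by frame and
    subtracted from each frame, attenuating slowly varying
    convolutive channel effects while preserving speech dynamics.
    """
    mean = np.zeros(frames.shape[1])
    out = np.empty_like(frames, dtype=float)
    for t, x in enumerate(frames):
        mean = alpha * mean + (1.0 - alpha) * x  # exponential running mean
        out[t] = x - mean                        # normalized frame
    return out
```

Because the mean estimate is recursive, the normalization runs in constant memory per frame, which suits the memory and latency constraints of embedded platforms; for a stationary input the normalized output decays toward zero as the running mean converges.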
© 2008 Springer-Verlag London Limited
Cite this chapter
Varga, I., Kiss, I. (2008). Speech Recognition in Mobile Phones. In: Automatic Speech Recognition on Mobile Devices and over Communication Networks. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84800-143-5_14
DOI: https://doi.org/10.1007/978-1-84800-143-5_14
Publisher Name: Springer, London
Print ISBN: 978-1-84800-142-8
Online ISBN: 978-1-84800-143-5
eBook Packages: Computer Science, Computer Science (R0)