Speech Generation in Mobile Phones

  • Géza Németh
  • Géza Kiss
  • Csaba Zainkó
  • Gábor Olaszy
  • Bálint Tóth

Abstract

Mobile phones have become indispensable companions for many people. They are used in all areas of life, including the car. The safety risk of this situation has motivated strict regulation of use on the one hand and increased attention to built-in speech recognition on the other. Far less attention has been paid, however, to the possible advantages of automatic speech generation by phones, including text-to-speech (TTS). This chapter addresses this domain. It examines the general concepts and application areas of speaking mobile phones. In addition to the well-known advantages for visually impaired, blind, or speech-impaired people, such functionality may help in other hands-busy or eyes-busy situations (e.g., cooking in the kitchen). Progress in this area is due to the appearance of mobile phone operating systems (Symbian, Palm OS, MS Smartphone and Linux Mobile) that can run applications created by developers independent of the phone manufacturers. A case study of a speaking aid mobile phone application and of the first automatic SMS-reading mobile phone application, introduced in Hungary in October 2003, is also presented. It is shown that the proper combination of careful user interface design and high-quality TTS should be supplemented by automatic language identification and other modules. An analysis of these supplementary modules is also presented.

Keywords

speech I/O, speech generation, text-to-speech, language identification, diacritic regeneration, SMS-reading, speaking aid

Copyright information

© Springer Science + Business Media, LLC 2008

Authors and Affiliations

  • Géza Németh (1)
  • Géza Kiss (1)
  • Csaba Zainkó (1)
  • Gábor Olaszy (1)
  • Bálint Tóth (1)

  1. Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics (BME TMIT), Budapest, Hungary
