Multilingualization of Speech Processing

Kato, Hiroaki; Harada, Shoji; Kitade, Tasuku; Shiga, Yoshinori

doi:10.1007/978-981-15-0595-9_1

Chapter
First Online: 23 November 2019

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Speech-to-speech translation is a technology that connects people who speak different languages, and its multilingualization dramatically expands the circle of people connected. “Population” in Table 1.1a shows the potential number of people who can join the circle when the corresponding language benefits from the technology. However, the same table also shows that the languages of the world are incredibly diverse, so multilingualization is not an easy task. Nevertheless, methods of processing speech sounds have been devised and developed uniformly regardless of language differences. What made this possible is the wide commonality across languages that stems from the nature of language: it is a spontaneous tool created for the single purpose of mutual communication between humans, who basically share the same biological hardware. This chapter describes the multilingualization of automatic speech recognition (ASR) and text-to-speech synthesis (TTS), the two speech-related components of the three that constitute speech-to-speech translation technology.

S. Harada and T. Kitade belonged to NICT at the time of writing.

Notes

1. Kanji is assumed to be the only logographic system that holds the status of standard grapheme in modern languages. It has been widely used in East Asia: besides modern Chinese and Japanese, it was formerly the standard grapheme in Korean (where it is still used occasionally) and Vietnamese.
2. Phonograms fall into three distinct major groups, all of which are said to have direct or indirect roots in Mesopotamia. First, alphabetic systems, which spread westward, developed mainly in Europe, and then spread across the world, assign separate characters (not diacritic marks) to both vowels and consonants; the Cyrillic and Latin scripts are examples. Next, the Brahmic family, which first prevailed in India and then spread to other parts of South Asia and Southeast Asia, basically has only consonant characters, with vowels and tones (if any) designated by diacritic marks; it includes the Khmer, Thai, and Myanmar scripts. The last group, which remained in the Middle East, is represented by the Arabic script; it is basically composed of consonant characters only, and the designation of vowels is optional.
3. The term “phone” is occasionally used instead of “phoneme” when any variant of a linguistic phoneme, e.g., an allophone, is implied.
4. http://www.speech.sri.com/projects/srilm/.
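
The three groups of phonograms described in note 2 can be loosely illustrated with Unicode character properties. The sketch below is not taken from the chapter: it assumes that the Unicode general category (letter vs. combining mark) is a reasonable proxy for "separate character" vs. "diacritic mark", and the helper name `letters_and_marks` and the sample syllables are hypothetical choices made for illustration. The proxy is rough, since some Brahmic vowels (for example, Thai leading vowels) are encoded as letters rather than marks.

```python
import unicodedata


def letters_and_marks(text):
    """Count base letters (category L*) and combining marks (category M*)."""
    letters = sum(1 for ch in text if unicodedata.category(ch).startswith("L"))
    marks = sum(1 for ch in text if unicodedata.category(ch).startswith("M"))
    return letters, marks


# Illustrative consonant + vowel sequences (assumed samples, one per script).
SAMPLES = {
    "Latin": "ki",                          # k + i, both letters
    "Cyrillic": "\u043a\u0438",             # KA + I, both letters
    "Khmer": "\u1782\u17b8",                # letter KO + vowel sign II
    "Thai": "\u0e01\u0e34",                 # KO KAI + SARA I
    "Myanmar": "\u1000\u102d",              # letter KA + vowel sign I
    "Arabic (vowelled)": "\u0643\u0650",    # KAF + KASRA
    "Arabic (unvowelled)": "\u0643",        # KAF alone; the vowel is optional
}

if __name__ == "__main__":
    for script, sample in SAMPLES.items():
        letters, marks = letters_and_marks(sample)
        print(f"{script}: letters={letters}, marks={marks}")
```

Under this proxy, the alphabetic samples consist of letters only, the Brahmic samples attach the vowel as a mark on the consonant, and the Arabic sample remains a valid spelling with or without the vowel mark.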

Author information

Authors and Affiliations

Advanced Speech Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology, Kyoto, Japan
Hiroaki Kato & Yoshinori Shiga
Fujitsu Laboratories Ltd., Kanagawa, Japan
Shoji Harada
Biometrics Research Laboratories, NEC Corporation, Kanagawa, Japan
Tasuku Kitade

Corresponding author

Correspondence to Hiroaki Kato.

Editor information

Editors and Affiliations

Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology, Kyoto, Japan
Yutaka Kidawara
Advanced Translation Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology, Kyoto, Japan
Eiichiro Sumita
Advanced Speech Technology Laboratory, Advanced Speech Translation Research and Development Promotion Center, National Institute of Information and Communications Technology, Kyoto, Japan
Hisashi Kawai

About this chapter

Cite this chapter

Kato, H., Harada, S., Kitade, T., Shiga, Y. (2020). Multilingualization of Speech Processing. In: Kidawara, Y., Sumita, E., Kawai, H. (eds) Speech-to-Speech Translation. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-15-0595-9_1

DOI: https://doi.org/10.1007/978-981-15-0595-9_1
Published: 23 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0594-2
Online ISBN: 978-981-15-0595-9
eBook Packages: Computer Science, Computer Science (R0)
