Abstract
Speech-to-speech translation is a technology that connects people who speak different languages, and its multilingualization dramatically expands the circle of people connected. “Population” in Table 1.1a shows the potential number of people who can join that circle once the corresponding language benefits from the technology. However, the same table also shows that the languages of the world are incredibly diverse, so multilingualization is not an easy task. Nevertheless, methods for processing speech sounds have been devised and developed uniformly, regardless of language differences. What made this possible is the wide commonality across languages that stems from the nature of language: it is a spontaneous tool created for the single purpose of mutual communication between humans, who basically share the same biological hardware. This chapter describes the multilingualization of automatic speech recognition (ASR) and text-to-speech synthesis (TTS), the two speech-related components of the three that constitute speech-to-speech translation technology.
S. Harada and T. Kitade belonged to NICT at the time of writing.
Notes
- 1.
Kanji is assumed to be the only logographic system that serves as the standard grapheme in modern languages. It has been widely used in East Asia: in addition to modern Chinese and Japanese, it was formerly the standard grapheme in Korean (where it is still used occasionally) and Vietnamese.
- 2.
Phonograms fall into three distinct major groups, all of which are said to have direct or indirect roots in Mesopotamia. First, alphabetical systems, which spread westward, developed mainly in Europe and then reached the rest of the world, assign separate characters (not diacritic marks) to both vowels and consonants; the Cyrillic and Latin scripts are examples. Next, the Brahmic family, which first prevailed in India and then spread to other parts of South Asia and Southeast Asia, basically has only consonant characters, with vowels and tones (if any) designated by diacritic marks; it includes the Khmer, Thai and Myanmar scripts. The last group, which remained in the Middle Eastern region, is represented by the Arabic script; it is basically composed of only consonant characters, and the designation of vowels is optional.
- 3.
The term “phone” is occasionally used instead of “phoneme” when any variant of linguistic phonemes, e.g., allophone, is implied.
- 4.
Reference
Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge University Press (1999)
Copyright information
© 2020 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Kato, H., Harada, S., Kitade, T., Shiga, Y. (2020). Multilingualization of Speech Processing. In: Kidawara, Y., Sumita, E., Kawai, H. (eds) Speech-to-Speech Translation. SpringerBriefs in Computer Science. Springer, Singapore. https://doi.org/10.1007/978-981-15-0595-9_1
DOI: https://doi.org/10.1007/978-981-15-0595-9_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0594-2
Online ISBN: 978-981-15-0595-9
eBook Packages: Computer Science, Computer Science (R0)