Examination of Balance Adjustment Method Between Voice and BGM in TV Viewing

Kono, Takanori; Hirakawa, Rin; Kawano, Hideki; Nakatoh, Yoshihisa

doi:10.1007/978-3-030-85540-6_120

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 319)

Included in the following conference series:

International Conference on Human Interaction and Emerging Technologies

Abstract

Hearing deteriorates with age, making it difficult for the elderly to hear the sounds of daily life. TV sound in particular is hard to hear for one in two elderly people, because the voice is drowned out by the background music (BGM). This research focuses on the volume balance between voice and BGM in TV sound, proposes a method that adjusts this balance using sound source separation technology, and evaluates the effectiveness of the method through subjective evaluation.

To make TV speech easier to hear, the voice needs to be emphasized. The proposed method therefore separates the TV sound into voice and BGM using sound source separation, suppresses the gain of the BGM, and then mixes the two again. This study uses Spleeter, a sound source separation software based on supervised deep learning. It is mainly used to split songs into parts (for example, vocals and accompaniment).

In the experiment, a mixture of voice and BGM was used as simulated TV sound. Two types of voice were prepared: "natural voice" and "whispering voice". The simulated data was separated by Spleeter, and the gain of the separated BGM was suppressed before the two signals were mixed again. Eight male subjects in their twenties listened to the sound before and after processing and evaluated whether the voice had become easier to hear. The results show that increasing the ratio of voice improves the ease of hearing the voice. However, the distortion generated during sound source separation also affects the sound quality, so the accuracy of sound source separation needs to be improved to further enhance the effect.

In this study, we proposed a method to adjust the volume balance between voice and BGM to an appropriate level using sound source separation technology. In future work, we would like to consider ways to further improve ease of hearing by improving the accuracy of sound source separation.
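
The remixing step described in the abstract (suppress the gain of the separated BGM, then mix it with the voice again) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the stems are synthetic sine waves standing in for the voice and BGM that the study obtains from Spleeter, and the -6 dB gain is an arbitrary example value that does not come from the paper.

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(voice, bgm, bgm_gain_db):
    """Scale the BGM stem by bgm_gain_db and add it back to the voice stem."""
    if len(voice) != len(bgm):
        raise ValueError("stems must have the same number of samples")
    g = db_to_linear(bgm_gain_db)
    return [v + g * b for v, b in zip(voice, bgm)]


def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voice_to_bgm_ratio_db(voice, bgm, bgm_gain_db=0.0):
    """Level ratio (in dB) of the voice to the gain-scaled BGM."""
    g = db_to_linear(bgm_gain_db)
    return 20.0 * math.log10(rms(voice) / (g * rms(bgm)))


# Synthetic stand-ins for separated stems (assumption: one second at 16 kHz).
sr = 16000
voice = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
bgm = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

# Illustrative setting only: attenuate the BGM by 6 dB before mixing again.
processed = remix(voice, bgm, bgm_gain_db=-6.0)
ratio_before = voice_to_bgm_ratio_db(voice, bgm)
ratio_after = voice_to_bgm_ratio_db(voice, bgm, bgm_gain_db=-6.0)
```

Because both stand-in stems have the same level, attenuating the BGM by a given number of decibels raises the voice-to-BGM ratio by the same amount. With real separated stems the result would also carry the separation distortion that the abstract notes, which this sketch does not model.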

Author information

Authors and Affiliations

Kyushu Institute of Technology, 1-1, Sensui-cho, Tobata-ku, Kitakyusyu-shi, Fukuoka, 804-0015, Japan
Takanori Kono, Rin Hirakawa, Hideki Kawano & Yoshihisa Nakatoh

Corresponding authors

Correspondence to Takanori Kono or Yoshihisa Nakatoh.

Editor information

Editors and Affiliations

Institute for Advanced Systems Engineering, University of Central Florida, Orlando, FL, USA
Tareq Ahram
Campus du Moulin de la Housse, Université de Reims Champagne Ardenne GRESPI, Reims Cedex, France
Redha Taiar

About this paper

Cite this paper

Kono, T., Hirakawa, R., Kawano, H., Nakatoh, Y. (2022). Examination of Balance Adjustment Method Between Voice and BGM in TV Viewing. In: Ahram, T., Taiar, R. (eds) Human Interaction, Emerging Technologies and Future Systems V. IHIET 2021. Lecture Notes in Networks and Systems, vol 319. Springer, Cham. https://doi.org/10.1007/978-3-030-85540-6_120

DOI: https://doi.org/10.1007/978-3-030-85540-6_120
Published: 10 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85539-0
Online ISBN: 978-3-030-85540-6
eBook Packages: Engineering, Engineering (R0)