Abstract
There has been much research on estimating the direction of noise and speech sources, but few studies have addressed direction estimation for instrumental sound sources. In this study, we consider direction estimation for a single instrumental sound. Direction estimation by the multiple signal classification (MUSIC) method alone often produces large estimation errors. We therefore propose a technique that estimates the direction of instrumental sound sources by applying regression analysis with a convolutional neural network (CNN). Focusing on the overtone structure of instrumental sounds, we compute the MUSIC spectrum from the fundamental and harmonic components, which have relatively large amplitudes, and estimate the source direction with a CNN that takes these components as input. Simulations in a monaural environment demonstrate the effectiveness of the proposed method.
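As a rough illustration of the first stage of the pipeline the abstract describes, the sketch below computes a narrowband MUSIC pseudo-spectrum at a single harmonic frequency for a simulated linear microphone array. This is a minimal numpy sketch under assumed conditions: the array geometry, frequency, snapshot count, and all parameter values are illustrative, and the CNN regression stage that the paper applies to such spectra is not reproduced here.

```python
import numpy as np

def music_spectrum(X, n_sources, mic_positions, freq, angles, c=343.0):
    """Narrowband MUSIC pseudo-spectrum for a linear microphone array.

    X: (n_mics, n_snapshots) complex snapshots at frequency `freq` [Hz].
    """
    R = X @ X.conj().T / X.shape[1]        # spatial covariance matrix
    _, vecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    En = vecs[:, :-n_sources]              # noise-subspace eigenvectors
    P = np.empty(len(angles))
    for i, th in enumerate(np.deg2rad(angles)):
        delays = mic_positions * np.sin(th) / c
        a = np.exp(-2j * np.pi * freq * delays)   # steering vector
        P[i] = 1.0 / np.abs(a.conj() @ En @ En.conj().T @ a)
    return P

# Illustrative demo: one source at 20 degrees, observed at an assumed
# harmonic component of 2 kHz with an 8-mic, 4 cm-spaced array.
rng = np.random.default_rng(0)
mics = np.arange(8) * 0.04
freq = 2000.0
true_deg = 20.0
a0 = np.exp(-2j * np.pi * freq * mics * np.sin(np.deg2rad(true_deg)) / 343.0)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = np.outer(a0, s) + 0.01 * (rng.standard_normal((8, 200))
                              + 1j * rng.standard_normal((8, 200)))
angles = np.linspace(-90.0, 90.0, 181)
spectrum = music_spectrum(X, n_sources=1, mic_positions=mics,
                          freq=freq, angles=angles)
est_angle = angles[int(np.argmax(spectrum))]   # peak of the MUSIC spectrum
```

In the proposed method, pseudo-spectra of this kind would be evaluated at the fundamental and its harmonics and passed as input features to a CNN trained by regression, rather than taking the spectrum peak directly as above.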
Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yamamoto, K., Ogihara, A. & Murata, H. Direction Estimation of Instrumental Sound Sources Using Regression Analysis by Convolutional Neural Network. Circuits Syst Signal Process 42, 7004–7021 (2023). https://doi.org/10.1007/s00034-023-02433-z