
Direction Estimation of Instrumental Sound Sources Using Regression Analysis by Convolutional Neural Network

  • Short Paper
  • Published:
Circuits, Systems, and Signal Processing

Abstract

Much research has addressed estimating the direction of noise and speech sources, but few studies have examined direction estimation for instrumental sound sources. In this study, we consider source direction estimation for a single instrumental sound. Direction estimation by the multiple signal classification (MUSIC) method often produces large estimation errors. We therefore propose a technique for estimating the direction of instrumental sound sources by applying regression analysis with a convolutional neural network (CNN). Focusing on the overtone structure of the instrumental sound source, we compute the MUSIC spectrum from the fundamental and harmonic components, which have relatively large amplitudes, and estimate the source direction with a CNN that takes these components as input. This study demonstrates the effectiveness of the method through simulations in a monaural environment.
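The MUSIC stage underlying the abstract can be illustrated with a minimal sketch. This is a classical narrowband MUSIC pseudo-spectrum for a uniform linear array, not the authors' exact configuration (the paper's experiments are monaural and restrict the spectrum to fundamental and harmonic components); the array geometry, mic spacing, angle grid, and noise level below are all illustrative assumptions.

```python
import numpy as np

def music_spectrum(snapshots, n_sources, n_mics, d=0.5,
                   angles=np.linspace(-90, 90, 181)):
    """MUSIC pseudo-spectrum for a uniform linear array.

    snapshots: (n_mics, n_snapshots) complex narrowband samples.
    d: microphone spacing in wavelengths (assumed half-wavelength here).
    """
    # Spatial covariance matrix estimated from the snapshots.
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]
    # eigh returns eigenvalues in ascending order, so the first
    # (n_mics - n_sources) eigenvectors span the noise subspace.
    eigvals, eigvecs = np.linalg.eigh(R)
    En = eigvecs[:, : n_mics - n_sources]
    spectrum = []
    for theta in angles:
        # Steering vector toward direction theta (degrees).
        a = np.exp(-2j * np.pi * d * np.arange(n_mics)
                   * np.sin(np.deg2rad(theta)))
        # Pseudo-spectrum peaks where a(theta) is orthogonal
        # to the noise subspace.
        spectrum.append(1.0 / (np.linalg.norm(En.conj().T @ a) ** 2))
    return angles, np.asarray(spectrum)

# Simulate one narrowband source arriving from 30 degrees at an 8-mic array.
rng = np.random.default_rng(0)
n_mics, true_deg = 8, 30.0
steer = np.exp(-2j * np.pi * 0.5 * np.arange(n_mics)
               * np.sin(np.deg2rad(true_deg)))
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.05 * (rng.standard_normal((n_mics, 200))
                + 1j * rng.standard_normal((n_mics, 200)))
x = np.outer(steer, s) + noise
angles, p = music_spectrum(x, n_sources=1, n_mics=n_mics)
est = angles[np.argmax(p)]
```

In the proposed method, spectrum values such as `p` (restricted to fundamental and harmonic frequency bins) would serve as the input features for the CNN regressor, which outputs the direction estimate directly rather than taking the argmax of the pseudo-spectrum.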


Data Availability

The datasets generated during the current study are available from the corresponding author on reasonable request.


Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kaho Yamamoto.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yamamoto, K., Ogihara, A. & Murata, H. Direction Estimation of Instrumental Sound Sources Using Regression Analysis by Convolutional Neural Network. Circuits Syst Signal Process 42, 7004–7021 (2023). https://doi.org/10.1007/s00034-023-02433-z
