Abstract
Recently developed approaches to distant speech processing still fall short of close-talking speech processing in speech recognition, speaker identification, and diarization quality. Sound source localization remains an important component of multi-channel distant speech processing applications. This paper considers an approach to improving speaker localization quality on large-aperture microphone arrays. To mitigate the shortcomings of signal acquisition with large-aperture arrays and to reduce the impact of noise and interference, a time-frequency masking approach is proposed that applies Complex Angular Central Gaussian Mixture Models for directional clustering of sound sources and inter-component phase analysis for restoration of polyharmonic speech components. The approach is tested on real-life multi-speaker recordings and is shown to increase speaker localization accuracy for non-overlapped and partially overlapped speech.
Acknowledgments
This research was financially supported by the Foundation NTI (Contract 20/18gr, ID 0000000007418QR20002) and by the Government of the Russian Federation (Grant 08-08).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Astapov, S., Popov, D., Kabarov, V. (2020). Directional Clustering with Polyharmonic Phase Estimation for Enhanced Speaker Localization. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_5
DOI: https://doi.org/10.1007/978-3-030-60276-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)