Blind Suppression of Nonstationary Diffuse Acoustic Noise Based on Spatial Covariance Matrix Decomposition

Ito, Nobutaka; Vincent, Emmanuel; Nakatani, Tomohiro; Ono, Nobutaka; Araki, Shoko; Sagayama, Shigeki

doi:10.1007/s11265-014-0922-z

Published: 03 August 2014

Volume 79, pages 145–157, (2015)
Journal of Signal Processing Systems

Abstract

We propose methods for blind suppression of nonstationary diffuse noise based on decomposition of the observed spatial covariance matrix into signal and noise parts. In modeling noise to regularize the ill-posed decomposition problem, we exploit spatial invariance (isotropy) instead of temporal invariance (stationarity). The isotropy assumption is that the spatial cross-spectrum of noise is dependent on the distance between microphones and independent of the direction between them. We propose methods for spatial covariance matrix decomposition based on least squares and maximum likelihood estimation. The methods are validated on real-world data.

References

1. Boll, S.F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions ASSP, 27(2), 113–120.
2. Martin, R. (1994). Spectral subtraction based on minimum statistics. In Proceedings of EUSIPCO (pp. 1182–1185).
3. Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions ASSP, 32(6), 1109–1121.
4. Dudgeon, D.E., & Johnson, D.H. (1993). Array signal processing: concepts and techniques. Englewood Cliffs: Prentice Hall.
5. Brandstein, M., & Ward, D. (2001). Microphone arrays: signal processing techniques and applications. Berlin, Heidelberg: Springer.
6. Itakura, F., & Saito, S. (1968). Analysis synthesis telephony based on the maximum likelihood method. In Report of 6th International Congress on Acoustics (pp. 17–20).
7. Duong, N.Q.K., Vincent, E., Gribonval, R. (2010). Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Transactions ASLP, 18(7), 1830–1840.
8. Nakatani, T., Yoshioka, T., Kinoshita, K., Miyoshi, M., Juang, B.-H. (2010). Speech dereverberation based on variance-normalized delayed linear prediction. IEEE Transactions ASLP, 18(7), 1717–1731.
9. Vincent, E., Bertin, N., Gribonval, R., Bimbot, F. (2014). From blind to guided audio source separation. IEEE Signal Processing Magazine, 31(3).
10. Sawada, H., Kameoka, H., Araki, S., Ueda, N. (2013). Multichannel extensions of non-negative matrix factorization with complex-valued data. IEEE Transactions ASLP, 21(5), 971–982.
11. Simmer, K.U., Bitzer, J., Marro, C. (2001). Post-filtering techniques. In M. Brandstein & D. Ward (Eds.), Microphone Arrays (pp. 39–60). Berlin, Heidelberg: Springer.
12. Doclo, S., & Moonen, M. (2002). GSVD-based optimal filtering for single and multimicrophone speech enhancement. IEEE Transactions SP, 50(9), 2230–2244.
13. Ito, N. (2012). Robust microphone array signal processing against diffuse noise. Ph.D. thesis, The University of Tokyo.
14. Ito, N., Vincent, E., Ono, N., Sagayama, S. (2013). General algorithms for estimating spectrogram and transfer functions of target signal for blind suppression of diffuse noise. In Proceedings of the IEEE international workshop on machine learning for signal processing (MLSP).
15. Ito, N., Shimizu, H., Ono, N., Sagayama, S. (2011). Diffuse noise suppression using crystal-shaped microphone arrays. IEEE Transactions ASLP, 19(7), 2101–2110.
16. Zelinski, R. (1988). A microphone array with adaptive post-filtering for noise reduction in reverberant rooms. In Proceedings of ICASSP (pp. 2578–2581).
17. McCowan, I.A., & Bourlard, H. (2003). Microphone array post-filter based on noise field coherence. IEEE Transactions SAP, 11(6), 709–716.
18. Ito, N., Ono, N., Sagayama, S. (2010). Designing the Wiener post-filter for diffuse noise suppression using imaginary parts of inter-channel cross-spectra. In Proceedings of ICASSP (pp. 2818–2821).
19. Ito, N., Vincent, E., Ono, N., Gribonval, R., Sagayama, S. (2010). Crystal-MUSIC: Accurate localization of multiple sources in diffuse noise environments using crystal-shaped microphone arrays. In Proceedings of LVA/ICA, lecture notes in computer science (pp. 81–88).
20. Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. In Proceedings of the international conference on machine learning (ICML) (pp. 720–727). AAAI Press.
21. Toh, K., & Yun, S. (2010). An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific Journal of Optimization, 6(3), 615–640.
22. Pham, D.-T., & Cardoso, J.-F. (2001). Blind separation of instantaneous mixtures of non stationary sources. IEEE Transactions SP, 49(9), 1837–1848.
23. Ozerov, A., & Févotte, C. (2010). Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions ASLP, 18(3), 550–563.
24. Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–38.
25. Kurematsu, A., Takeda, K., Sagisaka, Y., Katagiri, S., Kuwabara, H., Shikano, K. (1990). ATR Japanese speech database as a tool of speech recognition and synthesis. Speech Communication, 9(4), 357–363.
26. Ono, N. (2011). Stable and fast update rules for independent vector analysis based on auxiliary function technique. In Proceedings of IEEE workshop applications of signal processing audio acoustics (WASPAA) (pp. 189–192).

Author information

Authors and Affiliations

NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
Nobutaka Ito, Tomohiro Nakatani & Shoko Araki
Inria, Nancy, France
Emmanuel Vincent
Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan
Nobutaka Ono
School of Multidisciplinary Sciences, The Graduate University for Advanced Studies (SOKENDAI), Tokyo, Japan
Nobutaka Ono
School of Interdisciplinary Mathematical Sciences, Meiji University, Tokyo, Japan
Shigeki Sagayama

Corresponding author

Correspondence to Nobutaka Ito.

Appendix: Derivation of the Update Rules in the M-Step of Maximum Likelihood Estimation

By setting the partial derivative of the Q-function (38) w.r.t. ${\phi ^{x}_{t}}$ to zero, we have

$$ -M\frac{1}{{\phi^{x}_{t}}}+\text{tr}\biggl[({B}^{{x}})^{-1}\bigl\langle{x}_{t}{x}_{t}^{\textsf{H}}\bigr\rangle\biggr]\frac{1}{({\phi^{x}_{t}})^{2}}=0. $$

(43)

By solving this w.r.t. ${\phi ^{x}_{t}}$, we get [7]

$$ {\phi^{x}_{t}}=\frac{1}{M}\text{tr}\bigl[({B}^{{x}})^{-1}\hat{{\Phi}}^{{x}}_{t}\bigr]. $$

(44)

Here, we defined

$$ \hat{\Phi}^{x}_{t} \triangleq\bigl\langle{x}_{t}{x}_{t}^{\textsf{H}}\bigr\rangle_{p({x}_{t}|{y}_{t};{\Theta}^{\prime})}$$

(45)

$$ =\bigl({\Phi}^{{x}|{y}}_{t}\bigr)^{\prime}+\bigl({\mu}_{t}^{{x}|{y}}\bigr)^{\prime}\bigl({\mu}_{t}^{{x}|{y}}\bigr)^{\prime\textsf{H}}. $$

(46)
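The updates (44) and (46) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of NumPy, and the toy values are assumptions introduced here, and the posterior mean and covariance are taken as given (they would come from the E-step).

```python
import numpy as np

def posterior_second_moment(mu_post, Phi_post):
    """Eq. (46): posterior covariance plus outer product of the posterior mean."""
    mu_post = np.asarray(mu_post).reshape(-1, 1)
    return np.asarray(Phi_post) + mu_post @ mu_post.conj().T

def update_phi_x(B_x, Phi_hat_x):
    """Eq. (44): phi = tr[B^{-1} Phi_hat] / M.

    A linear solve replaces the explicit inverse; the trace is real
    when both matrices are Hermitian positive definite.
    """
    M = B_x.shape[0]
    return np.trace(np.linalg.solve(B_x, Phi_hat_x)).real / M

# Toy example with M = 2 channels (all values are made up for illustration).
mu = np.array([1.0 + 1.0j, 0.5 - 0.5j])
Phi_post = 0.1 * np.eye(2)
B_x = np.array([[2.0, 0.5j], [-0.5j, 1.0]])  # Hermitian positive definite

Phi_hat = posterior_second_moment(mu, Phi_post)
phi_x = update_phi_x(B_x, Phi_hat)
```

Note that ${\phi^{x}_{t}}$ and ${B}^{x}$ enter the model only through their product, so this sketch makes no attempt to fix the scale shared between them.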

Next, partial differentiation w.r.t. ${B}^{{x}}$ gives

$$ -T({B}^{{x}})^{-1}+({B}^{{x}})^{-1}\Biggl(\sum\limits_{t=1}^{T}\frac{1}{{\phi^{x}_{t}}}\bigl\langle{x}_{t}{x}_{t}^{\textsf{H}}\bigr\rangle \Biggr)({B}^{{x}})^{-1}=0. $$

(47)

Solving this w.r.t. ${B}^{{x}}$, we have [7]

$$ {B}^{{x}}=\frac{1}{T}\sum\limits_{t=1}^{T}\frac{1}{{\phi^{x}_{t}}}\hat{{\Phi}}^{{x}}_{t}. $$

(48)
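A short sketch of (48), again under assumptions: the name `update_B_x`, the array layout (frames stacked along the first axis), and the synthetic data are illustrative choices, not taken from the article.

```python
import numpy as np

def update_B_x(phis, Phi_hats):
    """Eq. (48): B = (1/T) * sum_t Phi_hat_t / phi_t.

    phis:     shape (T,), positive scalars phi_t
    Phi_hats: shape (T, M, M), posterior second moments from Eq. (46)
    """
    phis = np.asarray(phis, dtype=float)
    Phi_hats = np.asarray(Phi_hats)
    # Divide each frame's matrix by its own phi_t, then average over frames.
    return np.mean(Phi_hats / phis[:, None, None], axis=0)

# Synthetic check: if Phi_hat_t = phi_t * B0 holds exactly, (48) returns B0.
B0 = np.array([[1.0, 0.3 + 0.2j], [0.3 - 0.2j, 2.0]])
phis = np.array([0.5, 1.0, 4.0])
Phi_hats = phis[:, None, None] * B0
B_est = update_B_x(phis, Phi_hats)
```

Dividing by ${\phi^{x}_{t}}$ before averaging means that loud and quiet frames contribute equally to the estimate of the spatial structure.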

The update rule for ${\Phi }^{{v}}_{t}$ depends on the explicit form of the matrix subspace 𝓥. In the following, we first show that for the class of 𝓥 satisfying

$$ {\Phi}^{{v}}_{t}\in\mathcal{V}\text{: positive definite} \Rightarrow ({\Phi}^{{v}}_{t})^{-1}\in\mathcal{V}, $$

(49)

we can derive a unified update rule. Clearly, the subspaces 𝓥_uncor, 𝓥_BND, and 𝓥_real defined in Section 3 belong to this class. We then derive the update rule for 𝓥_coh, which does not belong to the class.

When 𝓥 satisfies (49), the terms of (38) depending on ${\Phi }^{{v}}_{t}$ can be rewritten as

$$ -\log\det{\Phi}^{{v}}_{t}-\text{tr}\biggl\{({\Phi}^{{v}}_{t})^{-1}\mathcal{P}\biggl[\bigl\langle({y}_{t}-{x}_{t})({y}_{t}-{x}_{t})^{\textsf{H}}\bigr\rangle\biggr]\biggr\}. $$

(50)

Here, 𝓟[⋅] denotes the orthogonal projection onto 𝓥 defined using the standard inner product $\langle {A},{B}\rangle \triangleq \text {tr}[{AB}]$ of ℋ:

$$ \mathcal{P}\bigl[{A}\bigr]=\sum\limits_{d=1}^{D}\text{tr}\bigl[{A}{Q}_{d}\bigr]{Q}_{d}. $$

(51)

Here, $\{{Q}_{d}\}_{1\le d\le D}$ is an orthonormal basis of 𝓥, and D denotes the dimension of 𝓥. The explicit form of ${Q}_{d}$ depends on the choice of 𝓥; readers are referred to [13, 14] for details. The expectation $\bigl\langle({y}_{t}-{x}_{t})({y}_{t}-{x}_{t})^{\textsf{H}}\bigr\rangle$ generally has components both parallel and orthogonal to 𝓥. However, the orthogonal component does not contribute to the trace because $({\Phi }^{{v}}_{t})^{-1}\in \mathcal {V}$, so the expectation can be replaced by its projection, which gives (50). To derive the ${\Phi }^{{v}}_{t}\in \mathcal {V}$ that maximizes (50), we ignore the constraint ${\Phi }^{{v}}_{t}\in \mathcal {V}$ for the moment, differentiate (50) w.r.t. ${\Phi }^{{v}}_{t}$, and set the derivative to zero. We have

$$ {\Phi}^{{v}}_{t}=\mathcal{P}\bigl[\hat{{\Phi}}^{{v}}_{t}\bigr], $$

(52)

where

$$ \hat{{\Phi}}^{{v}}_{t}\triangleq\bigl\langle({y}_{t}-{x}_{t})({y}_{t}-{x}_{t})^{\textsf{H}}\bigr\rangle_{p({x}_{t}|{y}_{t};{\Theta}^{\prime})} $$

(53)

$$ =\bigl({\Phi}^{{x}|{y}}_{t}\bigr)^{\prime}+\bigl\{{y}_{t}-\bigl({\mu}_{t}^{{x}|{y}}\bigr)^{\prime}\bigr\}\bigl\{{y}_{t}-\bigl({\mu}_{t}^{{x}|{y}}\bigr)^{\prime}\bigr\}^{\textsf{H}}. $$

(54)

As is clear from the definition of 𝓟[⋅], (52) certainly satisfies ${\Phi }^{{v}}_{t}\in \mathcal {V}$.
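The projection (51) and the update (52) with (54) can be sketched as below. The subspace used here, real symmetric matrices, is an assumption chosen only because its orthonormal basis is easy to write down and it is closed under inversion as required by (49); the article's actual bases are given in [13, 14]. All function names and the random test data are hypothetical.

```python
import numpy as np

def real_symmetric_basis(M):
    """Orthonormal basis, w.r.t. <A, B> = tr[AB], of real symmetric M x M matrices."""
    basis = []
    for i in range(M):
        Q = np.zeros((M, M))
        Q[i, i] = 1.0
        basis.append(Q)
        for j in range(i + 1, M):
            Q = np.zeros((M, M))
            Q[i, j] = Q[j, i] = 1.0 / np.sqrt(2.0)
            basis.append(Q)
    return basis

def project(A, basis):
    """Eq. (51): P[A] = sum_d tr[A Q_d] Q_d.

    For Hermitian A and Hermitian Q_d the coefficient tr[A Q_d] is real,
    so only rounding-level imaginary parts are discarded by .real.
    """
    return sum(np.trace(A @ Q).real * Q for Q in basis)

def update_Phi_v(y, mu_post, Phi_post, basis):
    """Eqs. (52) and (54): project the posterior second moment of y - x onto V."""
    r = (y - mu_post).reshape(-1, 1)
    Phi_hat_v = Phi_post + r @ r.conj().T
    return project(Phi_hat_v, basis)

# Made-up data for M = 3 channels.
rng = np.random.default_rng(0)
M = 3
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
mu = rng.standard_normal(M) + 1j * rng.standard_normal(M)
Phi_post = 0.2 * np.eye(M)
basis = real_symmetric_basis(M)
Phi_v = update_Phi_v(y, mu, Phi_post, basis)
```

For this particular subspace the projection reduces to taking the element-wise real part of the Hermitian input, which gives a quick way to cross-check the generic basis expansion.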

Although we have derived (52) through partial differentiation, we can also derive it more intuitively as follows. Inverting the sign and ignoring a constant independent of ${\Phi }^{{v}}_{t}$, (50) becomes the following matrix Itakura-Saito divergence:

$$\begin{array}{@{}rcl@{}} D_{\text{IS}}\bigl(\mathcal{P}\bigl[\hat{{\Phi}}^{{v}}_{t}\bigr];{\Phi}^{{v}}_{t}\bigr)&\triangleq& \text{tr}\bigl\{\mathcal{P}\bigl[\hat{{\Phi}}^{{v}}_{t}\bigr]({\Phi}^{{v}}_{t})^{-1}\bigr\}\notag\\ &&-\log\det\bigl\{\mathcal{P}\bigl[\hat{{\Phi}}^{{v}}_{t}\bigr]({\Phi}^{{v}}_{t})^{-1}\bigr\}-M. \end{array} $$

(55)

Therefore, the maximization of (50) is equivalent to the minimization of (55). $D_{\text{IS}}(\cdot,\cdot)$ is nonnegative, and equal to zero if and only if the two arguments are equal. Since $\mathcal {P}\bigl [\hat {{\Phi }}^{{v}}_{t}\bigr ]$ belongs to the feasible set 𝓥 of ${\Phi }^{{v}}_{t}$, (55) is minimized when ${\Phi }^{{v}}_{t}=\mathcal {P}\bigl [\hat {{\Phi }}^{{v}}_{t}\bigr ]$.
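As a small numerical illustration of (55), the sketch below evaluates the matrix Itakura-Saito divergence for Hermitian positive definite arguments. The function name and the example matrices are assumptions made here for illustration.

```python
import numpy as np

def matrix_is_divergence(A, B):
    """Eq. (55): D_IS(A; B) = tr[A B^{-1}] - log det[A B^{-1}] - M.

    A and B are assumed Hermitian positive definite.
    """
    M = A.shape[0]
    # B^{-1} A has the same trace and determinant as A B^{-1}.
    C = np.linalg.solve(B, A)
    _, logdet = np.linalg.slogdet(C)
    return np.trace(C).real - logdet - M

# Made-up example matrices.
A = np.array([[2.0, 0.5 + 0.1j], [0.5 - 0.1j, 1.0]])
B = np.eye(2)

d_same = matrix_is_divergence(A, A)   # zero up to rounding
d_diff = matrix_is_divergence(A, B)   # strictly positive
```

The divergence is not symmetric in its arguments, so the order used in (55), with the projected estimate first and the model parameter second, matters.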

Next we consider the case 𝓥 = 𝓥_coh. Substituting

$$ {\Phi}^{{v}}_{t}={\phi^{v}_{t}}{B}^{{v}} $$

(56)

into the Q-function (38), and differentiating it w.r.t. ${\phi ^{v}_{t}}$, we have, as in the derivation of (44),

$$ {\phi^{v}_{t}}=\frac{1}{M}\text{tr}\bigl[({B}^{{v}})^{-1}\hat{{\Phi}}^{{v}}_{t}\bigr]. $$

(57)
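The closed form (57) can be checked numerically. The sketch below assumes that, after substituting (56), the terms of the Q-function depending on ${\phi^{v}_{t}}$ are $-M\log{\phi^{v}_{t}}-\text{tr}[({B}^{{v}})^{-1}\hat{{\Phi}}^{{v}}_{t}]/{\phi^{v}_{t}}$, which is the form whose derivative matches (43) with the noise quantities in place of the signal ones. The matrices and function names are made up for illustration.

```python
import numpy as np

def phi_objective(phi, B, Phi_hat):
    """Assumed phi-dependent terms: -M log(phi) - tr[B^{-1} Phi_hat] / phi."""
    M = B.shape[0]
    c = np.trace(np.linalg.solve(B, Phi_hat)).real
    return -M * np.log(phi) - c / phi

def update_phi_v(B_v, Phi_hat_v):
    """Eq. (57): phi = tr[B^{-1} Phi_hat] / M."""
    return np.trace(np.linalg.solve(B_v, Phi_hat_v)).real / B_v.shape[0]

# Hypothetical fixed spatial structure and posterior second moment.
B_v = np.array([[1.0, 0.6], [0.6, 1.0]])
Phi_hat_v = np.array([[0.9, 0.2 + 0.1j], [0.2 - 0.1j, 0.7]])

phi_star = update_phi_v(B_v, Phi_hat_v)

# Compare the closed form with a brute-force grid search around it.
grid = phi_star * np.linspace(0.2, 5.0, 481)
values = phi_objective(grid, B_v, Phi_hat_v)
best = grid[np.argmax(values)]
```

Under this assumed objective the second derivative at the stationary point equals $-M/({\phi^{v}_{t}})^{2}<0$, so the stationary point is a maximum, and the grid search agrees with the closed form up to the grid spacing.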

About this article

Cite this article

Ito, N., Vincent, E., Nakatani, T. et al. Blind Suppression of Nonstationary Diffuse Acoustic Noise Based on Spatial Covariance Matrix Decomposition. J Sign Process Syst 79, 145–157 (2015). https://doi.org/10.1007/s11265-014-0922-z

Received: 02 February 2014
Revised: 28 May 2014
Accepted: 30 June 2014
Published: 03 August 2014
Issue Date: May 2015
DOI: https://doi.org/10.1007/s11265-014-0922-z
