Statistically Optimal Joint Multimicrophone MAP Estimators Under Super-Gaussian Assumption

Circuits, Systems, and Signal Processing

Abstract

This paper presents two super-Gaussian multimicrophone maximum a posteriori (MAP) estimators that exploit both the amplitude and the phase of the speech signal in noisy observations. Super-Gaussian distributions are well known to model the statistical properties of speech more accurately than Gaussian distributions. The two joint multimicrophone estimators are derived by modeling the speech signal with a super-Gaussian distribution and the noise signals as independent Gaussian, an assumption that is usually valid in wireless acoustic sensor networks: because the microphones are distributed randomly and may belong to different devices, independence of the noise signals is more reasonable in these networks. The proposed estimators are compared with four baselines. The first is the multimicrophone minimum mean square error (MMSE) estimator, in which both amplitude and phase are derived under a Gaussian assumption for the speech signal. The second is the multimicrophone MAP-based amplitude estimator, which uses super-Gaussian statistics to estimate only the speech amplitude and leaves the phase unchanged. The third is a minimum variance distortionless response filter followed by a super-Gaussian MMSE estimator. The fourth is the centralized multichannel Wiener filter. Simulation experiments show that the proposed estimators improve speech quality and intelligibility when the clean speech is degraded by a mixture of point-source interference and additive noise in reverberant environments.

Data Availability

The simulated datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The input full-band SNRs are computed at the first microphone, which is taken as the reference microphone.

  2. The simulation codes are available at https://pws.yazd.ac.ir/sprl/Ranjbaryan-CSSP/Codes.rar.

References

  1. A. Abramson, I. Cohen, Simultaneous detection and estimation approach for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 15(8), 2348–2359 (2007). https://doi.org/10.1109/TASL.2007.904231

  2. H.R. Abutalebi, M. Rashidinejad, Speech enhancement based on beta-order MMSE estimation of short time spectral amplitude and Laplacian speech modeling. Speech Commun. 67, 92–101 (2015). https://doi.org/10.1016/j.specom.2014.12.002

  3. J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65, 943–950 (1979). https://doi.org/10.1121/1.382599

  4. I. Andrianakis, P.R. White, MMSE speech spectral amplitude estimators with Chi and Gamma speech priors. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 1068–1071, (2006). https://doi.org/10.1109/ICASSP.2006.1660842

  5. R. Balan, J. Rosca, Microphone array speech enhancement by Bayesian estimation of spectral amplitude and phase. In: proc. Sensor Array and Multichannel Signal Processing Workshop Proceedings (SAM), pp 209–213, (2002). https://doi.org/10.1109/SAM.2002.1191030

  6. A. Bertrand, M. Moonen, Distributed adaptive node-specific signal estimation in fully connected sensor networks—part I: sequential node updating. IEEE Trans. Signal Process. 58(10), 5277–5291 (2010). https://doi.org/10.1109/TSP.2010.2052612

  7. S.R. Chiluveru, M. Tripathy, Low SNR speech enhancement with DNN based phase estimation. Int. J. Speech Technol. 22(1), 283–292 (2019). https://doi.org/10.1007/s10772-019-09603-y

  8. T.H. Dat, K. Takeda, F. Itakura, Generalized Gamma modeling of speech and its online estimation for speech enhancement. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 181–184, (2005). https://doi.org/10.1109/ICASSP.2005.1415975

  9. S. Doclo, M. Moonen, T. Van den Bogaert et al., Reduced-bandwidth and distributed MWF-based noise reduction algorithms for binaural hearing aids. IEEE Trans. Audio Speech Lang. Process. 17(1), 38–51 (2009). https://doi.org/10.1109/TASL.2008.2004291

  10. Y. Ephraim, D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984). https://doi.org/10.1109/TASSP.1984.1164453

  11. J.S. Erkelens, R.C. Hendriks, R. Heusdens et al., Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Trans. Audio Speech Lang. Process. 15(6), 1741–1752 (2007). https://doi.org/10.1109/TASL.2007.899233

  12. J.S. Garofolo, Getting started with the DARPA TIMIT CD-ROM: an acoustic phonetic continuous speech database. Tech. rep., National Institute of Standards and Technology (NIST), Gaithersburg, MD, (prototype as of) (1988)

  13. T. Gerkmann, M. Krawczyk-Becker, J.L. Roux, Phase processing for single channel speech enhancement. IEEE Signal Process. Mag. (2015)

  14. T. Gerkmann, Bayesian estimation of clean speech spectral coefficients given a priori knowledge of the phase. IEEE Trans. Signal Process. 62(16), 4199–4208 (2014). https://doi.org/10.1109/TSP.2014.2336615

  15. T. Gerkmann, MMSE-optimal enhancement of complex speech coefficients with uncertain prior knowledge of the clean speech phase. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4478–4482, (2014). https://doi.org/10.1109/ICASSP.2014.6854449

  16. T. Gerkmann, R.C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012). https://doi.org/10.1109/TASL.2011.2180896

  17. T. Gerkmann, C. Breithaupt, R. Martin, Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Trans. Audio Speech Lang. Process. 16(5), 910–919 (2008). https://doi.org/10.1109/TASL.2008.921764

  18. R.C. Hendriks, R. Heusdens, J. Jensen, On robustness of multi-channel minimum mean-squared error estimators under super-Gaussian priors. In: proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp 157–160, (2009a). https://doi.org/10.1109/ASPAA.2009.5346488

  19. R.C. Hendriks, R. Heusdens, U. Kjems et al., On optimal multichannel mean-squared error estimators for speech enhancement. IEEE Signal Process. Lett. 16(10), 885–888 (2009). https://doi.org/10.1109/LSP.2009.2026205

  20. Y.A. Huang, J. Benesty, A multi-frame approach to the frequency-domain single-channel noise reduction problem. IEEE Trans. Audio Speech Lang. Process. 20(4), 1256–1269 (2012). https://doi.org/10.1109/TASL.2011.2174226

  21. M. Kazama, S. Gotoh, M. Tohyama et al., On the significance of phase in the short term Fourier spectrum for speech intelligibility. J. Acoust. Soc. Am. 127(3), 1432–1439 (2010)

  22. H. Lang, J. Yang, Speech enhancement based on fusion of both magnitude/phase-aware features and targets. Electronics 9(7), 1125–1144 (2020). https://doi.org/10.3390/electronics9071125

  23. P. Loizou, Speech Enhancement: Theory and Practice, 1st edn. (CRC Press, Boca Raton, 2007)

  24. T. Lotter, Single- and Multi-Microphone Spectral Amplitude Estimation Using a Super-Gaussian Speech Model (Springer, Berlin, 2005)

  25. T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Adv. Signal Process. 7, 1110–1126 (2005). https://doi.org/10.1155/ASP.2005.1110

  26. T. Lotter, C. Benien, P. Vary, Multi channel direction independent speech enhancement using spectral amplitude estimation. EURASIP J. Appl. Signal Process. 2003, 1147–1156 (2003)

  27. S. Markovich-Golan, S. Gannot, I. Cohen, Distributed multiple constraints generalized sidelobe canceler for fully connected wireless acoustic sensor networks. IEEE Trans. Audio Speech Lang. Process. 21(2), 343–356 (2013). https://doi.org/10.1109/TASL.2012.2224454

  28. S. Markovich-Golan, A. Bertrand, M. Moonen et al., Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks. Signal Process. 107, 4–20 (2015). https://doi.org/10.1016/j.sigpro.2014.07.014

  29. R. Martin, Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 253–256, (2002). https://doi.org/10.1109/ICASSP.2002.5743702

  30. R. Martin, Speech enhancement based on minimum mean-square error estimation and super-Gaussian priors. IEEE Trans. Speech Audio Process. 13(5), 845–856 (2005). https://doi.org/10.1109/TSA.2005.851927

  31. R. Martin, C. Breithaupt, Speech enhancement in the DFT domain using Laplacian speech priors. In: proc. International Workshop on Acoustic Echo and Noise Control (IWAENC), pp 87–90 (2003)

  32. N. Oo, W.S. Gan, On harmonic addition theorem. Int. J. Comput. Commun. Eng. 1(3), 200–202 (2012)

  33. K. Paliwal, K. Wójcicki, B. Shannon, The importance of phase in speech enhancement. Speech Commun. 53(4), 465–494 (2011). https://doi.org/10.1016/j.specom.2010.12.003

  34. A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn. (McGraw Hill, Boston, 2002)

  35. P.G. Patil, T.H. Jaware, S.P. Patil et al., Marathi speech intelligibility enhancement using I-AMS based neuro-fuzzy classifier approach for hearing aid users. IEEE Access 10, 123028–123042 (2022). https://doi.org/10.1109/ACCESS.2022.3223365

  36. P.S. Rani, S. Andhavarapu, S.R. Murty Kodukula, Significance of phase in DNN based speech enhancement algorithms. In: proc. National Conference on Communications (NCC), pp 1–5, (2020). https://doi.org/10.1109/NCC48643.2020.9056089

  37. R. Ranjbaryan, H.R. Abutalebi, Distributed speech presence probability estimator in fully connected wireless acoustic sensor networks. Circuits Syst. Signal Process. 39, 6121–6141 (2020). https://doi.org/10.1007/s00034-020-01452-4

  38. R. Ranjbaryan, H.R. Abutalebi, Multiframe maximum a posteriori estimators for single-microphone speech enhancement. IET Signal Proc. 15(7), 467–481 (2021). https://doi.org/10.1049/sil2.12045

  39. R. Ranjbaryan, S. Doclo, H.R. Abutalebi, Distributed MAP estimators for noise reduction in fully connected wireless acoustic sensor networks. In: Proc. Speech Communication; 13th ITG-Symposium, pp 1–5 (2018)

  40. S. Samui, I. Chakrabarti, S.K. Ghosh, Improved single channel phase-aware speech enhancement technique for low signal-to-noise ratio signal. IET Signal Proc. 10(6), 641–650 (2016). https://doi.org/10.1049/iet-spr.2015.0182

  41. M. Souden, J. Chen, J. Benesty et al., An integrated solution for online multichannel noise tracking and reduction. IEEE Trans. Audio Speech Lang. Process. 19(7), 2159–2169 (2011). https://doi.org/10.1109/TASL.2011.2118205

  42. C.H. Taal, R.C. Hendriks, R. Heusdens, et al., A short-time objective intelligibility measure for time-frequency weighted noisy speech. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4214–4217, (2010). https://doi.org/10.1109/ICASSP.2010.5495701

  43. M. Trawicki, M.T. Johnson, Improvements of the Beta-order minimum mean-square error (MMSE) spectral amplitude estimator using Chi priors. In: proc. Thirteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), pp 939–942 (2012a)

  44. M.B. Trawicki, M.T. Johnson, Distributed multichannel speech enhancement with minimum mean-square short time spectral amplitude, log-spectral amplitude and spectral phase estimation. Signal Process., pp 345–356 (2012b)

  45. M.B. Trawicki, M.T. Johnson, Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors. Speech Commun. 57, 101–113 (2014). https://doi.org/10.1016/j.specom.2013.09.009

  46. Y. Wakabayashi, T. Fukumori, M. Nakayama et al., Single-channel speech enhancement with phase reconstruction based on phase distortion averaging. IEEE/ACM Trans. Audio Speech Lang. Process. 26(9), 1559–1569 (2018). https://doi.org/10.1109/TASLP.2018.2831632

  47. D. Wang, J. Lim, The unimportance of phase in speech enhancement. IEEE Trans. Acoust. Speech Signal Process. 30(4), 679–681 (1982). https://doi.org/10.1109/TASSP.1982.1163920

  48. P.J. Wolfe, S.J. Godsill, Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement. EURASIP J. Adv. Signal Process. 10, 1043–1051 (2003)

  49. Z. Zhang, D.S. Williamson, Y. Shen, Impact of phase distortion and phase-insensitive speech enhancement on speech quality perceived by hearing-impaired listeners. J. Acoust. Soc. Am. 148(4), 2650–2650 (2020). https://doi.org/10.1121/1.5147369

  50. N. Zheng, X.L. Zhang, Phase-aware speech enhancement based on deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 27(1), 63–76 (2019). https://doi.org/10.1109/TASLP.2018.2870742

Acknowledgements

We are grateful to the Department of Medical Physics and Acoustics, University of Oldenburg, for allowing access to their recorded data.

Author information

Corresponding author

Correspondence to Raziyeh Ranjbaryan.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

A generalized form of (9) is given in Example 1 of Chapter 5 of [34], which expresses the probability distribution of a random variable Y that is a function of the random variable X as follows:

$$\begin{aligned} Y = aX + b, \end{aligned}$$
(A1)

where a and b are deterministic constants. With \( b = 0 \), this equation reduces to our case. Although the ratio Y/X of two random variables X and Y is in general itself a random variable, in special cases such as the present one, \( a = \dfrac{Y}{X} \) is a deterministic value. In the problem at hand,

$$\begin{aligned} {\left\{ \begin{array}{ll} Y \longleftarrow A_{m} \\ X \longleftarrow A_{1} \end{array}\right. } \end{aligned}$$
(A2)

where the random variables are Rayleigh distributed, and

$$\begin{aligned} {\left\{ \begin{array}{ll} a \longleftarrow C_{m} \\ b \longleftarrow 0 \end{array}\right. } \end{aligned}$$
(A3)

so \( C_{m} \) is a deterministic value (the ratio of two standard deviations), as explained in the manuscript.

Based on [34], the distribution function \( F_y(y) \) of Y is computed as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} F_y(y) = P\{X\le \dfrac{y-b}{a}\}= F_x(\dfrac{y-b}{a}), \qquad &{} a > 0, \\ F_y(y)= P\{X \ge \dfrac{y-b}{a}\}= 1- F_x(\dfrac{y-b}{a}), \qquad &{} a < 0, \end{array}\right. } \end{aligned}$$
(A4)

and the PDF is computed as

$$\begin{aligned} f_y(y) =\frac{1}{\mid a \mid } f_x(\dfrac{y-b}{a}). \end{aligned}$$
(A5)
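
The rule in (A5) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the helper names `affine_pdf`, `exponential_pdf`, and `trapezoid` are hypothetical, and the unit-rate exponential density is chosen only as a convenient test case with a known answer.

```python
import math
from typing import Callable


def affine_pdf(f_x: Callable[[float], float], a: float, b: float = 0.0) -> Callable[[float], float]:
    """Return the PDF of Y = a*X + b given the PDF f_x of X, following (A5)."""
    if a == 0:
        raise ValueError("a must be nonzero for Y = a*X + b to have a density")

    def f_y(y: float) -> float:
        # (A5): f_y(y) = f_x((y - b) / a) / |a|
        return f_x((y - b) / a) / abs(a)

    return f_y


def exponential_pdf(x: float) -> float:
    """Unit-rate exponential density (illustrative assumption, not from the paper)."""
    return math.exp(-x) if x >= 0 else 0.0


def trapezoid(f: Callable[[float], float], lo: float, hi: float, n: int = 20000) -> float:
    """Composite trapezoidal rule on [lo, hi] with n equal sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# Y = -2*X + 1 with X >= 0, so Y is supported on (-inf, 1].
f_y = affine_pdf(exponential_pdf, a=-2.0, b=1.0)
area = trapezoid(f_y, -60.0, 1.0)  # the mass below -60 is negligible
print(f"area under f_y: {area:.6f}")
```

A negative a is used on purpose: it exercises the second branch of (A4), and the factor \( 1/\mid a \mid \) is what keeps the transformed density normalized.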

In our problem, the amplitude \( A_1 \) has the super-Gaussian distribution

$$\begin{aligned} p(A_1) = {\left\{ \begin{array}{ll} \dfrac{\mu ^{\nu +1} A_1^\nu }{\Gamma (\nu +1) \sigma ^{\nu +1}_x(1)} \exp \left( \dfrac{-\mu A_1}{\sigma _x(1)} \right) , \qquad &{} A_1 > 0, \\ 0, \qquad &{} \text {else}, \end{array}\right. } \end{aligned}$$
(A6)

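hence, applying (A5) with \( a = C_m > 0 \) and \( b = 0 \), and evaluating the density (A6) of \( A_1 \) at \( A_m / C_m \), the PDF of \( A_{m} = C_{m}A_1 \) is given by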

$$\begin{aligned} p(A_{m}) =\frac{1}{C_{m}} \, p(\dfrac{A_{m}}{C_{m}}), \end{aligned}$$
(A7)

consequently:

$$\begin{aligned} p(A_{m}) = {\left\{ \begin{array}{ll} \dfrac{\mu ^{\nu +1}A_m^\nu }{ \Gamma (\nu +1)(C_m\sigma _x(1))^{\nu +1}} \exp \left( \dfrac{-\mu A_m}{C_m\sigma _x(1)} \right) , \qquad &{} A_m > 0, \\ 0, \qquad &{} \text {else}, \end{array}\right. } \end{aligned}$$
(A8)

which is again a super-Gaussian distribution, with variance \( \sigma ^2_x(m) = C_m^2\sigma ^2_x(1) \), as mentioned in the manuscript.
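
As a sanity check on the step from (A7) to (A8), the sketch below evaluates both expressions on a grid and compares them. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the values of \( \mu \), \( \nu \), \( C_m \) and \( \sigma_x(1) \) are arbitrary illustrative choices. The density in (A6) has the form of a Gamma density with shape \( \nu + 1 \) and scale \( \sigma_x(1)/\mu \), which is why it is evaluated in the log domain with `math.lgamma`.

```python
import math


def super_gaussian_amplitude_pdf(amp: float, sigma: float, mu: float, nu: float) -> float:
    """Amplitude prior of the form (A6), evaluated in the log domain for stability."""
    if amp <= 0:
        return 0.0
    log_p = (
        (nu + 1) * math.log(mu / sigma)  # mu^(nu+1) / sigma^(nu+1)
        + nu * math.log(amp)             # amp^nu
        - math.lgamma(nu + 1)            # 1 / Gamma(nu+1)
        - mu * amp / sigma               # exp(-mu * amp / sigma)
    )
    return math.exp(log_p)


# Arbitrary illustrative parameters (assumptions, not taken from the paper).
MU, NU = 1.74, 0.126
C_M, SIGMA_1 = 0.7, 1.3


def via_a7(amp_m: float) -> float:
    """(A7): density of A_1 evaluated at A_m / C_m, divided by C_m."""
    return super_gaussian_amplitude_pdf(amp_m / C_M, SIGMA_1, MU, NU) / C_M


def via_a8(amp_m: float) -> float:
    """(A8): the same functional form with sigma_x(1) replaced by C_m * sigma_x(1)."""
    return super_gaussian_amplitude_pdf(amp_m, C_M * SIGMA_1, MU, NU)


grid = [0.01 * k for k in range(1, 1001)]  # amplitudes in (0, 10]
max_rel_err = max(abs(via_a7(x) - via_a8(x)) / via_a8(x) for x in grid)
print(f"max relative difference between (A7) and (A8): {max_rel_err:.2e}")
```

The two expressions should agree up to floating-point rounding, which reflects that scaling the amplitude by \( C_m \) only rescales the parameter \( \sigma_x(1) \) and leaves \( \mu \) and \( \nu \) unchanged.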

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ranjbaryan, R., Abutalebi, H.R. Statistically Optimal Joint Multimicrophone MAP Estimators Under Super-Gaussian Assumption. Circuits Syst Signal Process 43, 1492–1517 (2024). https://doi.org/10.1007/s00034-023-02515-y

  • DOI: https://doi.org/10.1007/s00034-023-02515-y
