Abstract
This paper presents two super-Gaussian-based multimicrophone maximum a posteriori (MAP) estimators which exploit both amplitude and phase of speech signal from noisy observations. It is well known that super-Gaussian distributions model the statistical properties of speech signal more accurately. Under the independent Gaussian statistical assumption for noise signals, which is usually valid in wireless acoustic sensor networks, two joint multimicrophone estimators are derived while the speech signal is modeled by super-Gaussian distribution. Since the microphones are distributed randomly and may also belong to different devices, the independency assumption of noise signals is more reasonable in these networks. The performance of the proposed estimators is compared to that of four baseline estimators; the first is the multimicrophone minimum mean square error (MMSE) estimation, where both amplitude and phase are derived assuming Gaussian properties for speech signal. The second baseline is the multimicrophone MAP-based amplitude estimator, that utilizes the super-Gaussian statistics to just obtain the amplitude of speech and keeps the phase unchanged. As the third one, we have considered a minimum variance distortion-less response filter followed by a super-Gaussian MMSE estimator. We have also compared the performance of the proposed estimators with the centralized multichannel Wiener filter. The simulation experiments demonstrate remarkable ability of the proposed estimators to enhance speech quality and intelligibility when the clean speech is degraded by a mixture of both point source interference and additive noise in reverberant environments.
Similar content being viewed by others
Data Availability
The simulated datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
The input full-band SNRs are computed for the first microphone since we considered it as the reference one.
The simulation codes are available at https://pws.yazd.ac.ir/sprl/Ranjbaryan-CSSP/Codes.rar.
References
A. Abramson, I. Cohen, Simultaneous detection and estimation approach for speech enhancement. IEEE Trans. Audio Speech Lang. Process. 15(8), 2348–2359 (2007). https://doi.org/10.1109/TASL.2007.904231
H.R. Abutalebi, M. Rashidinejad, Speech enhancement based on beta-order MMSE estimation of short time spectral amplitude and Laplacian speech modeling. Speech Commun. 67, 92–101 (2015). https://doi.org/10.1016/j.specom.2014.12.002
J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics. Acoust. Soc. Am. J. 65, 943–950 (1979). https://doi.org/10.1121/1.382599
I. Andrianakis, P.R. White, MMSE speech spectral amplitude estimators with Chi and Gamma speech priors. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 1068–1071, (2006). https://doi.org/10.1109/ICASSP.2006.1660842
R. Balan, J. Rosca, Microphone array speech enhancement by Bayesian estimation of spectral amplitude and phase. In: proc. Sensor Array and Multichannel Signal Processing Workshop Proceedings (SAM), pp 209–213, (2002) https://doi.org/10.1109/SAM.2002.1191030
A. Bertrand, M. Moonen, Distributed adaptive node-specific signal estimation in fully connected sensor networks—part I: sequential node updating. IEEE Trans. Signal Process. 58(10), 5277–5291 (2010). https://doi.org/10.1109/TSP.2010.2052612
S.R. Chiluveru, M. Tripathy, Low SNR speech enhancement with DNN based phase estimation. Int. J. Speech Technol. 22(1), 283–292 (2019). https://doi.org/10.1007/s10772-019-09603-y
T.H. Dat, K. Takeda, F. Itakura, Generalized Gamma modeling of speech and its online estimation for speech enhancement. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 181–184, (2005). https://doi.org/10.1109/ICASSP.2005.1415975
S. Doclo, M. Moonen, T. Van den Bogaert et al., Reduced-bandwidth and distributed MWF-based noise reduction algorithms for binaural hearing aids. IEEE Trans. Audio Speech Lang. Process. 17(1), 38–51 (2009). https://doi.org/10.1109/TASL.2008.2004291
Y. Ephraim, D. Malah, Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984). https://doi.org/10.1109/TASSP.1984.1164453
J.S. Erkelens, R.C. Hendriks, R. Heusdens et al., Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors. IEEE Trans. Audio Speech Lang. Process. 15(6), 1741–1752 (2007). https://doi.org/10.1109/TASL.2007.899233
J.S. Garofolo, Getting started with the DARPA TIMIT CD-ROM: an acoustic phonetic continuous speech database. Tech. rep., National Institute of Standards and Technology (NIST), Gaithersburgh, MD, (prototype as of) (1988)
T. Gerkmann, M. Krawczyk-Becker, J.L. Roux, Phase processing for single channel speech enhancement. IEEE Signal Process. Mag. (2015)
T. Gerkmann, Bayesian estimation of clean speech spectral coefficients given a priori knowledge of the phase. IEEE Trans. Signal Process. 62(16), 4199–4208 (2014). https://doi.org/10.1109/TSP.2014.2336615
T. Gerkmann, MMSE-optimal enhancement of complex speech coefficients with uncertain prior knowledge of the clean speech phase. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4478–4482, (2014) https://doi.org/10.1109/ICASSP.2014.6854449
T. Gerkmann, R.C. Hendriks, Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Trans. Audio Speech Lang. Process. 20(4), 1383–1393 (2012). https://doi.org/10.1109/TASL.2011.2180896
T. Gerkmann, C. Breithaupt, R. Martin, Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Trans. Audio, Speech Lang. Process. 16(5), 910–919 (2008). https://doi.org/10.1109/TASL.2008.921764
R.C. Hendriks, R. Heusdens, J. Jensen, On robustness of multi-channel minimum mean-squared error estimators under super-Gaussian priors. In: proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp 157–160, (2009a). https://doi.org/10.1109/ASPAA.2009.5346488
R.C. Hendriks, R. Heusdens, U. Kjems et al., On optimal multichannel mean-squared error estimators for speech enhancement. IEEE Signal Process. Lett. 16(10), 885–888 (2009). https://doi.org/10.1109/LSP.2009.2026205
Y.A. Huang, J. Benesty, A multi-frame approach to the frequency-domain single-channel noise reduction problem. IEEE Trans. Audio Speech Lang. Process. 20(4), 1256–1269 (2012). https://doi.org/10.1109/TASL.2011.2174226
M. Kazama, S. Gotoh, M. Tohyama et al., On the significance of phase in the short term Fourier spectrum for speech intelligibility. Acoust. Soc. Am. 127(3), 1432–1439 (2010)
H. Lang, J. Yang, Speech enhancement based on fusion of both magnitude/phase-aware features and targets. Electronics 9(7), 1125–1144 (2020). https://doi.org/10.3390/electronics9071125
P. Loizou, Speech Enhancement: Theory and Practice, 1st edn. (CRC Press, Boca Raton, 2007)
T. Lotter, Single- and Multi-Microphone Spectral Amplitude Estimation Using a Super-Gaussian Speech Model (Springer, Berlin, 2005)
T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Adv. Signal Process. 7, 1110–1126 (2005). https://doi.org/10.1155/ASP.2005.1110
T. Lotter, C. Benien, P. Vary, Multi channel direction independent speech enhancement using spectral amplitude estimation. EURASIP J. Appl. Signal Process. 2003, 1147–1156 (2003)
S. Markovich-Golan, S. Gannot, I. Cohen, Distributed multiple constraints generalized sidelobe canceler for fully connected wireless acoustic sensor networks. IEEE Trans. Audio Speech Lang. Process. 21(2), 343–356 (2013). https://doi.org/10.1109/TASL.2012.2224454
S. Markovich-Golan, A. Bertrand, M. Moonen et al., Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor networks. Signal Process. 107, 4–20 (2015). https://doi.org/10.1016/j.sigpro.2014.07.014
R. Martin, Speech enhancement using MMSE short time spectral estimation with Gamma distributed speech priors. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 253–256, (2002). https://doi.org/10.1109/ICASSP.2002.5743702
R. Martin, Speech enhancement based on minimum mean-square error estimation and super-Gaussian priors. IEEE Trans. Speech Audio Process. 13(5), 845–856 (2005). https://doi.org/10.1109/TSA.2005.851927
R. Martin, C. Breithaupt, Speech enhancement in the DFT domain using Laplacian speech priors. In: proc. International Workshop on Acoustic Echo and Noise Control (IWAENC), pp 87–90 (2003)
N. Oo, W.S. Gan, On harmonic addition theorem. Int. J. Comput. Commun. Eng. 1(3), 200–202 (2012)
K. Paliwal, K. Wójcicki, B. Shannon, The importance of phase in speech enhancement. Speech Commun. 53(4), 465–494 (2011). https://doi.org/10.1016/j.specom.2010.12.003
A. Papoulis, S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn. (McGraw Hill, Boston, 2002)
P.G. Patil, T.H. Jaware, S.P. Patil et al., Marathi speech intelligibility enhancement using I-AMS based neuro-fuzzy classifier approach for hearing aid users. IEEE Access 10, 123028–123042 (2022). https://doi.org/10.1109/ACCESS.2022.3223365
P.S. Rani, S. Andhavarapu, S.R. Murty Kodukula, Significance of phase in DNN based speech enhancement algorithms. In: proc. National Conference on Communications (NCC), pp 1–5, (2020), https://doi.org/10.1109/NCC48643.2020.9056089
R. Ranjbaryan, H.R. Abutalebi, Distributed speech presence probability estimator in fully connected wireless acoustic sensor networks. Circuits Syst. Signal Process. 39, 6121–6141 (2020). https://doi.org/10.1007/s00034-020-01452-4
R. Ranjbaryan, H.R. Abutalebi, Multiframe maximum a posteriori estimators for single-microphone speech enhancement. IET Signal Proc. 15(7), 467–481 (2021). https://doi.org/10.1049/sil2.12045
R. Ranjbaryan, S. Doclo, H.R. Abutalebi, Distributed MAP estimators for noise reduction in fully connected wireless acoustic sensor networks. In: Proc. Speech Communication; 13th ITG-Symposium, pp 1–5 (2018)
S. Samui, I. Chakrabarti, S.K. Ghosh, Improved single channel phase-aware speech enhancement technique for low signal-to-noise ratio signal. IET Signal Proc. 10(6), 641–650 (2016). https://doi.org/10.1049/iet-spr.2015.0182
M. Souden, J. Chen, J. Benesty et al., An integrated solution for online multichannel noise tracking and reduction. IEEE Trans. Audio Speech Lang. Process. 19(7), 2159–2169 (2011). https://doi.org/10.1109/TASL.2011.2118205
C.H. Taal, R.C. Hendriks, R. Heusdens, et al., A short-time objective intelligibility measure for time-frequency weighted noisy speech. In: proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp 4214–4217, (2010). https://doi.org/10.1109/ICASSP.2010.5495701
M. Trawicki, M.T. Johnson, Improvements of the Beta-order minimum mean-square error (MMSE) spectral amplitude estimator using Chi priors. In: proc. Thirteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), pp 939–942 (2012a)
M.B. Trawicki, M.T. Johnson. Distributed multichannel speech enhancement with minimum mean-square short time spectral amplitude, log-spectral amplitude and spectral phase estimation. Signal Processing pp 345–356 (2012b)
M.B. Trawicki, M.T. Johnson, Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors. Speech Commun. 57, 101–113 (2014). https://doi.org/10.1016/j.specom.2013.09.009
Y. Wakabayashi, T. Fukumori, M. Nakayama et al., Single-channel speech enhancement with phase reconstruction based on phase distortion averaging. IEEE/ACM Trans. Audio Speech Lang. Process. 26(9), 1559–1569 (2018). https://doi.org/10.1109/TASLP.2018.2831632
D. Wang, J. Lim, The unimportance of phase in speech enhancement. IEEE Trans. Acoust. Speech Signal Process. 30(4), 679–681 (1982). https://doi.org/10.1109/TASSP.1982.1163920
P.J. Wolfe, S.J. Godsill, Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement. EURASIP J. Adv. Signal Process. 10, 1043–1051 (2003)
Z. Zhang, D.S. Williamson, Y. Shen, Impact of phase distortion and phase-insensitive speech enhancement on speech quality perceived by hearing-impaired listeners. J. Acoust. Soc. Am. 148(4), 2650–2650 (2020). https://doi.org/10.1121/1.5147369
N. Zheng, X.L. Zhang, Phase-aware speech enhancement based on deep neural networks. IEEE/ACM Trans. Audio Speech Lang. Process. 27(1), 63–76 (2019). https://doi.org/10.1109/TASLP.2018.2870742
Acknowledgements
We are grateful to the Department of Medical Physics and Acoustics, University of Oldenburg, for allowing access to their recorded data.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
Appendix A
A generalized form of (9) has been presented in Example 1 of Chapter 5 of [34], that expresses the probability distribution of random variable Y, which is a function of variable X as follows:
where a and b represent deterministic variables. In the case of \( b = 0 \), this equation is simplified to our case. Although in general, the division of two random variables X and Y, i.e., Y/X yields a random variable, however, in special case like the current situation, (\( a = \dfrac{Y}{X} \)) represents a deterministic value. In the problem at hand
where random variables are with Rayleigh distribution, and
so, \( C_{m} \) represents a deterministic value (the ratio of two standard deviations) as explained in the manuscript.
Based on [34], the distribution function of \( F_y(y) \) is computed as follows:
and the PDF is computed as
In our problem the amplitude \( A_1 \) has the super-Gaussian distribution
hence, the PDF of \( A_{m} = C_{m}A_1 \) is given by
consequently:
which again represents super-Gaussian distribution with variance \( \sigma ^2_x(m) = C_m^2\sigma ^2_x(1) \) as mentioned in the manuscript.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ranjbaryan, R., Abutalebi, H.R. Statistically Optimal Joint Multimicrophone MAP Estimators Under Super-Gaussian Assumption. Circuits Syst Signal Process 43, 1492–1517 (2024). https://doi.org/10.1007/s00034-023-02515-y
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00034-023-02515-y