Speech Dereverberation pp 311-385

Part of the Signals and Communication Technology book series (SCT)

TRINICON for Dereverberation of Speech and Audio Signals

  • Herbert Buchner
  • Walter Kellermann

Abstract

In this chapter, we develop an analytical top-down approach to the problem of blind dereverberation of speech and audio signals based on TRINICON (TRIple-N Independent component analysis for CONvolutive mixtures), a general framework for broadband adaptive Multi-Input Multi-Output (MIMO) signal processing. Two fundamentally different approaches to the dereverberation problem for realistic scenarios can be distinguished. The “identification-and-inversion approach” is a two-step procedure: blind identification of the acoustic MIMO mixing system, followed by inversion of the identified system. The alternative “direct-inverse approach” blindly estimates the inverse of the acoustic mixing system directly, without an intermediate identification step. As shown in this chapter, TRINICON yields the information-theoretically optimum estimation procedures for both cases in a unified way, which allows for a direct comparison between the approaches, paves the way to synergies, and yields various useful insights for practical realizations. This chapter also relates other known algorithms to the generic concept and presents novel improved algorithms as special cases of it.

Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • Herbert Buchner (1)
  • Walter Kellermann (2)

  1. Deutsche Telekom Laboratories, Berlin University of Technology, Berlin, Germany
  2. University of Erlangen-Nuremberg, Erlangen, Germany