Neural Computing and Applications, Volume 22, Issue 7–8, pp 1321–1327

Robust speech recognition based on independent vector analysis using harmonic frequency dependency

  • Soram Jun
  • Minook Kim
  • Myungwoo Oh
  • Hyung-Min Park


This paper describes an algorithm that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency for robust speech recognition. Whereas conventional IVA assumes uniform full-band dependencies within each source signal, a harmonic clique model is introduced to improve enhancement performance by modeling the strong dependencies among multiples of the fundamental frequency. An IVA-based learning algorithm incorporating the non-holonomic constraint and the minimal distortion principle is derived to reduce the distortion inherent in IVA, and the minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech and the observed noisy speech to identify time–frequency segments corrupted by noise, and restores them using the cluster-based missing-feature reconstruction technique. Experimental results demonstrate that the proposed method improves recognition performance significantly in noisy environments, especially in the presence of competing interference.
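The mask-estimation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the log-spectrogram shapes, the fixed decision threshold, and the function name are all assumptions made for the example.

```python
import numpy as np

def estimate_reliability_mask(noisy_logspec, enhanced_logspec, threshold_db=3.0):
    """Flag time-frequency cells as unreliable (noise-dominated) where the
    enhanced log-spectrum falls well below the noisy observation.

    Both inputs are (frequency x frames) log-spectral feature arrays.
    Returns a boolean mask: True = reliable, False = corrupted by noise.
    """
    # How much energy the enhancement removed in each cell (dB-like units).
    diff = noisy_logspec - enhanced_logspec
    # Cells where removal exceeds the threshold are treated as missing,
    # to be restored later by cluster-based missing-feature reconstruction.
    return diff < threshold_db

# Toy example: 2 frequency bins x 3 frames (values are illustrative).
noisy = np.array([[10.0, 12.0, 20.0],
                  [ 5.0,  6.0,  7.0]])
enhanced = np.array([[ 9.5,  5.0, 19.0],
                     [ 4.8,  5.9,  1.0]])
mask = estimate_reliability_mask(noisy, enhanced)
```

Cells marked False would then be handed to the cluster-based reconstruction stage rather than used directly by the recognizer.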


Keywords: Robust speech recognition · Independent vector analysis · Missing feature technique · Blind source separation



Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Soram Jun (1)
  • Minook Kim (1)
  • Myungwoo Oh (1)
  • Hyung-Min Park (1)

  1. Department of Electronic Engineering, Sogang University, Mapo-gu, Seoul, Republic of Korea
