Applied Intelligence, Volume 37, Issue 3, pp 377–389

A novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models

  • Branislav Popović
  • Marko Janev
  • Darko Pekar
  • Nikša Jakovljević
  • Milan Gnjatović
  • Milan Sečujski
  • Vlado Delić

Abstract

The paper presents a novel split-and-merge algorithm for hierarchical clustering of Gaussian mixture models, designed to improve on the locally optimal solution determined by the initial constellation. The algorithm is initialized with the locally optimal parameters produced by a baseline approach similar to k-means, and then moves closer to the global optimum of the target clustering function by iteratively splitting and merging the clusters of Gaussian components obtained as the output of the baseline algorithm. It is further improved by introducing model selection in order to obtain the best possible trade-off between recognition accuracy and computational load in a Gaussian selection task applied within an actual recognition system. The proposed method is tested both on artificial data and in the framework of Gaussian selection performed within a real continuous speech recognition system, and in both cases an improvement over the baseline method is observed.
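
The abstract only outlines the approach, but the general scheme it describes, grouping the Gaussian components of a large mixture into a smaller set of cluster Gaussians with a k-means-like baseline and then perturbing that grouping by splitting and merging clusters, can be sketched compactly. The Python sketch below is an illustration under our own assumptions (a closed-form KL divergence as the distance between Gaussians, moment-matched cluster Gaussians, a symmetrised-KL merge criterion, and a largest-covariance-trace split heuristic); it is not the authors' published algorithm, and it omits the model-selection step.

```python
# Illustrative sketch only: KL-driven k-means-style clustering of GMM components
# plus one merge/split refinement pass. All criteria below are assumptions made
# for this example, not the algorithm published in the paper.
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL( N(mu0, S0) || N(mu1, S1) ) between two Gaussians."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def moment_match(w, mu, S, idx):
    """Fit one Gaussian to a weighted subset of components by moment matching."""
    wi = w[idx] / w[idx].sum()
    m = wi @ mu[idx]
    C = sum(wk * (S[i] + np.outer(mu[i] - m, mu[i] - m)) for wk, i in zip(wi, idx))
    return m, C

def baseline_clustering(w, mu, S, M, iters=20, seed=0):
    """K-means-like grouping of the mixture components into M cluster Gaussians."""
    rng = np.random.default_rng(seed)
    centers = [(mu[i], S[i]) for i in rng.choice(len(w), M, replace=False)]
    for _ in range(iters):
        # assignment step: each component goes to the closest cluster Gaussian
        labels = np.array([int(np.argmin([kl_gauss(mu[i], S[i], cm, cS)
                                          for cm, cS in centers]))
                           for i in range(len(w))])
        # update step: refit each non-empty cluster Gaussian
        for c in range(M):
            idx = np.where(labels == c)[0]
            if idx.size:
                centers[c] = moment_match(w, mu, S, idx)
    return labels, centers

def split_and_merge_step(w, mu, S, labels, centers):
    """One refinement pass: merge the two closest clusters, split the broadest one."""
    M = len(centers)
    # merge the pair of cluster Gaussians with the smallest symmetrised KL distance
    scores = [(kl_gauss(*centers[a], *centers[b]) + kl_gauss(*centers[b], *centers[a]), a, b)
              for a in range(M) for b in range(a + 1, M)]
    _, a, b = min(scores)
    labels[labels == b] = a                # cluster b is absorbed into cluster a
    # split the cluster with the largest covariance trace, reusing the freed label b
    c = int(np.argmax([np.trace(cS) for _, cS in centers]))
    idx = np.where(labels == c)[0]
    labels[idx[: idx.size // 2]] = b
    # refit every non-empty cluster Gaussian after the reassignment
    for k in range(M):
        idx = np.where(labels == k)[0]
        if idx.size:
            centers[k] = moment_match(w, mu, S, idx)
    return labels, centers

# Example on a random 2-D mixture of 12 components grouped into 4 clusters.
rng = np.random.default_rng(1)
w = rng.random(12); w /= w.sum()
mu = rng.normal(size=(12, 2))
S = np.stack([np.eye(2) * (0.5 + rng.random()) for _ in range(12)])
labels, centers = baseline_clustering(w, mu, S, M=4)
labels, centers = split_and_merge_step(w, mu, S, labels, centers)
print(labels)
```

In the spirit of the abstract, candidate splits and merges would be iterated and evaluated against the target clustering function, accepting only moves that improve it, rather than applied in a single heuristic pass as in this sketch.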

Keywords

Gaussian mixtures · Split-and-merge operation · Hierarchical clustering · Continuous speech recognition

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Branislav Popović (1, corresponding author)
  • Marko Janev (2)
  • Darko Pekar (3)
  • Nikša Jakovljević (1)
  • Milan Gnjatović (1)
  • Milan Sečujski (1)
  • Vlado Delić (1)

  1. Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  2. Mathematical Institute, Serbian Academy of Sciences and Arts, Belgrade, Serbia
  3. Alfanum Speech Technologies, Novi Sad, Serbia