International Journal of Speech Technology

Volume 22, Issue 4, pp 893–909

Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news

  • Dabbabi Karim
  • Hajji Salah
  • Cherif Adnen


In this paper, we address the problem of optimal non-hierarchical clustering in the speaker clustering phase of speaker diarization for broadcast news. A new hybridization combining the differential evolution (DE) algorithm and the K-means algorithm is proposed and tested on the TV news database (TVND). To optimize the classification of speakers, two criteria, the trace within criterion (TRW) and the variance ratio criterion (VRC), are used as clustering validity indices to evaluate each candidate grouping of speakers' segments. In the DE algorithm, each candidate clustering is encoded by its cluster centers. Because the same set of centers can appear in a different order in different members of the population, the evolutionary operators may not search efficiently; an efficient heuristic is therefore also proposed for rearranging the centers. Non-hybrid DE variants were applied with and without the rearrangement of cluster centers and compared with the corresponding hybrid K-means variants. The experimental results show that hybrid K-means variants with the rearrangement of cluster centers are more efficient than both the hybrid variants without rearrangement and the non-hybrid DE variants. The results obtained with hybrid and non-hybrid DE variants with the rearrangement of cluster centers were quite similar under both the TRW and VRC criteria. The best efficiency was achieved by hybrid DE variants under these two criteria, with the hybrid b6e6rl variant reaching a diarization error ratio (DER) of 13.05%.
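The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their rearrangement heuristic or the b6e6rl variant in detail, so the code below substitutes a plain DE/rand/1/bin scheme, a greedy nearest-center reordering (`rearrange`), and a single K-means update per trial vector. All function names and the parameters `pop_size`, `gens`, `F`, `CR`, and `seed` are assumptions made for illustration. TRW is taken as the trace of the within-cluster scatter matrix, and VRC as the usual Calinski-Harabasz ratio.

```python
import numpy as np

def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def trw(X, labels, k):
    """Trace-within criterion: squared distances to cluster centroids (lower is better)."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def vrc(X, labels, k):
    """Variance ratio criterion for k >= 2 and non-zero within scatter (higher is better)."""
    tr_w = trw(X, labels, k)
    tr_b = ((X - X.mean(axis=0)) ** 2).sum() - tr_w  # between = total - within
    return (tr_b / (k - 1)) / (tr_w / (len(X) - k))

def kmeans_step(X, centers):
    """One K-means update: move each center to the mean of its assigned points."""
    labels = assign(X, centers)
    new = centers.astype(float)  # astype returns a copy
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):  # an empty cluster keeps its previous center
            new[j] = members.mean(axis=0)
    return new

def rearrange(centers, reference):
    """Assumed heuristic: greedily reorder centers to follow the order of `reference`."""
    remaining = list(range(len(centers)))
    order = []
    for ref in reference:
        j = min(remaining, key=lambda i: ((centers[i] - ref) ** 2).sum())
        order.append(j)
        remaining.remove(j)
    return centers[order]

def hybrid_de(X, k, pop_size=10, gens=30, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over cluster centers, minimising TRW, with a K-means step per trial."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Each individual encodes k centers, initialised from distinct random data points.
    pop = np.stack([X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)])
    cost = np.array([trw(X, assign(X, c), k) for c in pop])
    for _ in range(gens):
        best = pop[cost.argmin()].copy()  # common reference ordering for this generation
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            picks = rng.choice(others, 3, replace=False)
            a, b, c = (rearrange(pop[j], best) for j in picks)
            mutant = a + F * (b - c)
            mask = rng.random(mutant.shape) < CR
            mask.flat[rng.integers(mask.size)] = True  # take at least one mutant gene
            trial = np.where(mask, mutant, rearrange(pop[i], best))
            trial = kmeans_step(X, trial)  # the hybrid part: local K-means refinement
            t_cost = trw(X, assign(X, trial), k)
            if t_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, t_cost
    return pop[cost.argmin()], cost.min()
```

Calling `hybrid_de(X, k=3)` on an `(n, d)` array of segment features returns the best set of centers found and its TRW value. To optimise VRC instead, the cost could be replaced by the negated `vrc` value, since VRC is maximised while TRW is minimised.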


Hybrid and non-hybrid DE variants; DE algorithm; K-means algorithm; VRC and TRW criteria; The rearrangement of cluster centers; Diarization error ratio (DER)




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Research Unit of Processing and Analysis of Electrical and Energetic Systems, Faculty of Sciences of Tunis, University Tunis El-Manar, Tunis, Tunisia
  2. National School of Engineers of Tunis, Tunis, Tunisia
