Knowledge Discovery of Complex Data Using Gaussian Mixture Models

  • Linfei Zhou
  • Wei Ye
  • Claudia Plant
  • Christian Böhm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10440)


With the explosive growth in the quantity and variety of data, representing and analyzing complex data has become increasingly challenging in many modern applications. As a general class of probability distribution functions, Gaussian Mixture Models can approximate arbitrary distributions concisely, which makes them well suited to representing complex data. To support efficient queries and subsequent analysis, we generalize Euclidean distance to Gaussian Mixture Models and derive a closed-form expression called Infinite Euclidean Distance. This metric enables efficient and accurate similarity calculations. To analyze complex data, we model two real-world data sets, NBA player statistics and weather data from airports, as Gaussian Mixture Models, and we compare Infinite Euclidean Distance with previous similarity measures on both classification and clustering tasks. Experimental evaluations demonstrate the efficiency and effectiveness of Infinite Euclidean Distance and Gaussian Mixture Models for the analysis of complex data.
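To illustrate why a Euclidean-style distance between Gaussian Mixture Models can be evaluated in closed form, the following minimal sketch computes the L2 distance between two mixture densities, that is, the square root of the integral of the squared difference of the densities. It relies on the standard identity that the integral of a product of two Gaussian densities equals a Gaussian density with summed covariances, evaluated at one mean. The function names `gaussian_overlap` and `gmm_l2_distance` are hypothetical, and the exact definition and normalization of Infinite Euclidean Distance in the paper may differ from this sketch.

```python
import numpy as np


def gaussian_overlap(mu1, cov1, mu2, cov2):
    """Integral over R^d of N(x; mu1, cov1) * N(x; mu2, cov2).

    By the Gaussian product identity this equals N(mu1; mu2, cov1 + cov2).
    """
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov = np.atleast_2d(np.asarray(cov1, dtype=float)) + np.atleast_2d(
        np.asarray(cov2, dtype=float)
    )
    d = mu1.shape[0]
    diff = mu1 - mu2
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of the summed covariance
    quad = diff @ np.linalg.solve(cov, diff)    # squared Mahalanobis term
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)))


def gmm_l2_distance(w1, mus1, covs1, w2, mus2, covs2):
    """L2 distance between two GMM densities (illustrative, not the paper's code).

    Each mixture is given as weights, a list of means, and a list of covariances.
    """

    def cross(wa, ma, ca, wb, mb, cb):
        # Weighted sum of pairwise component overlaps between two mixtures.
        return sum(
            wi * wj * gaussian_overlap(mi, ci, mj, cj)
            for wi, mi, ci in zip(wa, ma, ca)
            for wj, mj, cj in zip(wb, mb, cb)
        )

    squared = (
        cross(w1, mus1, covs1, w1, mus1, covs1)
        + cross(w2, mus2, covs2, w2, mus2, covs2)
        - 2.0 * cross(w1, mus1, covs1, w2, mus2, covs2)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(squared, 0.0)))


# Two single-component, one-dimensional mixtures with unit variance.
a = ([1.0], [[0.0]], [[[1.0]]])
b = ([1.0], [[2.0]], [[[1.0]]])
print(gmm_l2_distance(*a, *b))
```

Because every term is a sum over pairs of components, the cost grows with the product of the component counts of the two mixtures, and no sampling or numerical integration is required.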



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Linfei Zhou (1)
  • Wei Ye (1)
  • Claudia Plant (2)
  • Christian Böhm (1, corresponding author)
  1. Ludwig-Maximilians-Universität München, Munich, Germany
  2. University of Vienna, Vienna, Austria
