On the Number of Modes of a Gaussian Mixture

  • Miguel Á. Carreira-Perpiñán
  • Christopher K. I. Williams
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2695)


We consider a problem intimately related to the creation of maxima under Gaussian blurring: the number of modes of a Gaussian mixture in D dimensions. To our knowledge, a general answer to this question is not known. We conjecture that if the components of the mixture have the same covariance matrix (or the same covariance matrix up to a scaling factor), then the number of modes cannot exceed the number of components. We demonstrate that the number of modes can exceed the number of components when the components are allowed to have arbitrary and different covariance matrices.
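The 1D equal-variance case of this conjecture can be probed numerically. The sketch below (an illustration, not the paper's own experiment) draws random one-dimensional mixtures whose components share a single variance, evaluates each density on a fine grid, and counts strict local maxima; the mode count should never exceed the number of components:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_density(x, weights, means, sigma):
    """1D Gaussian mixture density with a shared variance sigma^2."""
    z = (x[:, None] - means[None, :]) / sigma
    comps = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return comps @ weights

def count_modes(density):
    """Count strict interior local maxima of a finely sampled function."""
    d = density
    return int(np.sum((d[1:-1] > d[:-2]) & (d[1:-1] > d[2:])))

# Grid wide enough that every mode is interior to it.
x = np.linspace(-10.0, 10.0, 20001)
results = []  # pairs (number of components K, number of modes found)
for K in (2, 3, 5):
    for _ in range(20):
        means = rng.uniform(-5.0, 5.0, size=K)
        weights = rng.dirichlet(np.ones(K))
        p = mixture_density(x, weights, means, sigma=0.5)
        results.append((K, count_modes(p)))
```

A grid-based count is of course only a sanity check; the paper proves the 1D case analytically, and the interesting open question is the behaviour in higher dimensions.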

We review related results from scale-space theory, statistics and machine learning, including a proof of the conjecture in 1D. We present a convergent, EM-like algorithm for mode finding, and compare the results of searching for all modes starting from the centers of the mixture components against a brute-force search. We also discuss applications to data reconstruction and clustering.
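For components sharing an isotropic covariance, the fixed-point update behind such an EM-like mode finder reduces to iterating the posterior-weighted mean of the component centers (a Gaussian mean-shift step). A minimal 1D sketch under that assumption, with a hypothetical `gm_modes` helper that starts one search from each component mean and merges coincident fixed points, might look like:

```python
import numpy as np

def gm_modes(weights, means, sigma, tol=1e-8, max_iter=500):
    """Find modes of a 1D Gaussian mixture with shared variance sigma^2
    by iterating x <- sum_m p(m|x) mu_m (a mean-shift / EM-like update),
    started from every component mean."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    found = []
    for x in means:
        for _ in range(max_iter):
            # Posterior responsibilities p(m|x) for each component.
            resp = weights * np.exp(-0.5 * ((x - means) / sigma) ** 2)
            resp /= resp.sum()
            x_new = resp @ means  # posterior-weighted mean of the centers
            if abs(x_new - x) < tol:
                break
            x = x_new
        # Merge converged points that coincide up to tolerance.
        if not any(abs(x - m) < 100 * tol for m in found):
            found.append(x)
    return sorted(found)

modes = gm_modes([0.5, 0.5], [0.0, 10.0], sigma=1.0)
```

Well-separated components (means at 0 and 10, unit variance) yield two modes near the centers, while heavily overlapping components merge into a single mode between the means. The general algorithm in the paper handles arbitrary covariances; this sketch covers only the shared-isotropic case.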


Keywords (machine-generated): Convex Hull, Covariance Matrices, Gaussian Kernel, Multivalued Mapping, Convex Linear Combination





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Miguel Á. Carreira-Perpiñán (1)
  • Christopher K. I. Williams (2)
  1. Dept. of Computer Science, University of Toronto, Toronto
  2. School of Informatics, University of Edinburgh, Edinburgh
