
Additive and multiplicative mixed normal distributions and finding cluster centers

Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Mixed normal distributions are considered in additive and multiplicative forms. While the weighted arithmetic mean of the probability density functions typically exhibits several peaks corresponding to the parent sub-distributions, their weighted geometric mean always reduces to a single unimodal multivariate normal distribution. Estimation of the cluster center parameters from such a synthesized distribution is considered. The problem is solved by non-linear least squares optimization, yielding the cluster centers and sizes. The relationship to factor analysis by unweighted least squares and generalized least squares is noted, and numerical results are discussed. The described approach uses only the sample variance–covariance matrix, not the individual observations, so it can be applied to difficult clustering tasks on huge data sets from databases and to data mining problems such as approximating the cluster centers and sizes. The suggested techniques can enrich both theoretical considerations and practical applications of clustering.



Acknowledgments

I thank the three reviewers for their help in improving the paper.

Author information


Correspondence to Stan Lipovetsky.

Appendix: Geometric mean of multinormal distributions

To find the explicit form of the geometric mean of the sub-distributions, substitute (1) into (3), which yields:

$$ \begin{aligned} G(x) & = \prod\limits_{q = 1}^{K} {\left[ {\left( {(2\pi )^{n} |S_{q} |} \right)^{{ - \gamma_{q} /2}} \exp \left( { - \frac{1}{2}(x - m_{q} )^{\prime } (\gamma_{q} S_{q}^{ - 1} )(x - m_{q} )} \right)} \right]} \\ & = \exp \left( { - \frac{1}{2}Q(x)} \right)\prod\limits_{q = 1}^{K} {\left( {(2\pi )^{n} |S_{q} |} \right)^{{ - \gamma_{q} /2}} } \\ \end{aligned} $$
(38)

where the sum of the quadratic forms can be represented as follows:

$$ \begin{aligned} Q(x) & = \sum\limits_{q = 1}^{K} {(x - m_{q} )^{\prime } (\gamma_{q} S_{q}^{ - 1} )(x - m_{q} )} \\ & = x^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)x - 2x^{\prime}\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} + } \sum\limits_{q = 1}^{K} {m_{q}^{\prime } (\gamma_{q} S_{q}^{ - 1} )m_{q} } \\ & = x^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)\left\{ {x - 2\left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right)} \right\} + \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} m_{q} } \\ \end{aligned} $$
(39)

Completing the square in the first term, we get:

$$ \begin{aligned} Q(x) & = \left\{ {x - \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right)} \right\}^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right) \\ & \quad \times \left\{ {x - \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right)} \right\} + \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} m_{q} } \\ & \quad - \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right)^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right) \\ \end{aligned} $$
(40)

Let us denote the weighted mean of the inverted covariance matrices in (40) as:

$$ S_{tot}^{ - 1} \equiv \sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } . $$
(41)

So the total covariance matrix is defined via the covariance matrices of the sub-distributions:

$$ S_{tot} = \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} . $$
(42)
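For concreteness, a minimal numerical sketch of (41) and (42) is given below; the weights and the two sub-distribution covariance matrices are made-up values used only for illustration.

```python
# Minimal sketch of Eqs. (41)-(42): the total covariance matrix is the inverse
# of the gamma-weighted sum of the inverted sub-distribution covariances.
# The weights and covariances below are made-up illustrative values.
import numpy as np

gamma = np.array([0.3, 0.7])                        # cluster weights, sum to 1
S = [np.array([[1.0, 0.3], [0.3, 2.0]]),            # S_1
     np.array([[1.5, -0.2], [-0.2, 0.5]])]          # S_2

S_tot_inv = sum(g * np.linalg.inv(Sq) for g, Sq in zip(gamma, S))   # Eq. (41)
S_tot = np.linalg.inv(S_tot_inv)                                    # Eq. (42)
print(S_tot)
```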

We denote the weighted aggregate of the Fisher discriminator vectors in (40) (equivalently, a combination of the coefficients from regressing the binary indicators of class membership on the predictors) as:

$$ F_{tot} \equiv \sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } . $$
(43)

Then the result in (40) can be presented as

$$ Q(x) = \left( {x - x^{ * } } \right)^{\prime } S_{tot}^{ - 1} \left( {x - x^{ * } } \right) + C, $$
(44)

where the vector \( x^{ * } \) is defined as:

$$ x^{ * } = \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right), $$
(45)

and the constant C is explained below. The vector (45) can also be presented as:

$$ x^{ * } = S_{tot} F_{tot} = \sum\limits_{q = 1}^{K} {(\gamma_{q} S_{tot} S_{q}^{ - 1} )m_{q} } , $$
(46)

so it is a matrix-weighted mean of the vectors \( m_{q} \), because the weight matrices in (46) sum to the identity matrix:

$$ \sum\limits_{q = 1}^{K} {\gamma_{q} S_{tot} S_{q}^{ - 1} = } S_{tot} \sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} = } S_{tot} S_{tot}^{ - 1} = I. $$
(47)
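Continuing the same kind of toy setup (made-up means and covariances), the sketch below illustrates (43) through (47): the vector x* is obtained as S_tot F_tot, and the weight matrices are verified to sum to the identity.

```python
# Sketch of Eqs. (43)-(47): x* = S_tot F_tot is a matrix-weighted mean of the
# sub-distribution means m_q, and the weights gamma_q S_tot S_q^{-1} sum to I.
import numpy as np

gamma = np.array([0.3, 0.7])
S = [np.array([[1.0, 0.3], [0.3, 2.0]]),
     np.array([[1.5, -0.2], [-0.2, 0.5]])]
m = [np.array([0.0, 1.0]), np.array([3.0, -1.0])]    # made-up cluster means

S_inv = [np.linalg.inv(Sq) for Sq in S]
S_tot = np.linalg.inv(sum(g * Si for g, Si in zip(gamma, S_inv)))   # Eq. (42)
F_tot = sum(g * Si @ mq for g, Si, mq in zip(gamma, S_inv, m))      # Eq. (43)
x_star = S_tot @ F_tot                                              # Eq. (46)

W = sum(g * S_tot @ Si for g, Si in zip(gamma, S_inv))
print(x_star)
print(np.allclose(W, np.eye(2)))    # Eq. (47): the weight matrices sum to I
```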

When \( x = x^{ * } \), the quadratic form (44) attains its minimum, equal to the constant:

$$ \begin{aligned} C & = \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} m_{q} } - \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right)^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)^{ - 1} \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } } \right) \\ & = \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} m_{q} } - F^{\prime}_{tot} S_{tot} F_{tot} = \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} m_{q} } - F_{tot}^{\prime } x^{ * } \\ & = \sum\limits_{q = 1}^{K} {m_{q}^{\prime } \gamma_{q} S_{q}^{ - 1} (m_{q} } - x^{ * } ) = \sum\limits_{q = 1}^{K} {(m_{q} - x^{ * } )^{\prime } (\gamma_{q} S_{q}^{ - 1} )(m_{q} } - x^{ * } ) + A \\ \end{aligned} $$
(48)

where the additional constant A is actually zero, because

$$ \begin{aligned} A & = \sum\limits_{q = 1}^{K} {(x^{ * } )^{\prime } (\gamma_{q} S_{q}^{ - 1} )(m_{q} } - x^{ * } ) = (x^{ * } )^{\prime } \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} m_{q} } - \left( {\sum\limits_{q = 1}^{K} {\gamma_{q} S_{q}^{ - 1} } } \right)x^{ * } } \right) \\ & = (x^{ * } )^{\prime } \left( {F_{tot} - S_{tot}^{ - 1} S_{tot} F_{tot} } \right) = 0 \\ \end{aligned} $$
(49)

Thus, (44) is reduced explicitly to:

$$ Q(x) = (x - x^{ * } )^{\prime } S_{tot}^{ - 1} (x - x^{ * } ) + \sum\limits_{q = 1}^{K} {(m_{q} - x^{ * } )^{\prime } (\gamma_{q} S_{q}^{ - 1} )(m_{q} - x^{ * } )} . $$
(50)
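The completed-square identity (50) can be spot-checked numerically; the sketch below (same kind of made-up two-cluster setup, with an arbitrary test point x) compares the direct sum of quadratic forms (39) with the form (50).

```python
# Spot-check of Eq. (50): the sum of quadratic forms in Eq. (39) equals the
# square completed around x* plus the constant term, for any point x.
import numpy as np

gamma = np.array([0.3, 0.7])
S = [np.array([[1.0, 0.3], [0.3, 2.0]]),
     np.array([[1.5, -0.2], [-0.2, 0.5]])]
m = [np.array([0.0, 1.0]), np.array([3.0, -1.0])]

S_inv = [np.linalg.inv(Sq) for Sq in S]
S_tot_inv = sum(g * Si for g, Si in zip(gamma, S_inv))
x_star = np.linalg.solve(S_tot_inv,
                         sum(g * Si @ mq for g, Si, mq in zip(gamma, S_inv, m)))

x = np.array([1.2, -0.7])                                           # arbitrary point
Q_direct = sum(g * (x - mq) @ Si @ (x - mq)
               for g, Si, mq in zip(gamma, S_inv, m))                # Eq. (39)
Q_completed = ((x - x_star) @ S_tot_inv @ (x - x_star)
               + sum(g * (mq - x_star) @ Si @ (mq - x_star)
                     for g, Si, mq in zip(gamma, S_inv, m)))         # Eq. (50)
print(np.isclose(Q_direct, Q_completed))                             # True
```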

Using (50) in (38) yields:

$$ \begin{aligned} G(x) & = \exp \left( { - \frac{1}{2}(x - x^{ * } )^{\prime } S_{tot}^{ - 1} (x - x^{ * } )} \right) \\ & \quad \times \prod\limits_{q = 1}^{K} {\left[ {\left( {(2\pi )^{n} |S_{q} |} \right)^{ - 1/2} \exp \left( { - \frac{1}{2}(x^{ * } - m_{q} )^{\prime } (S_{q}^{ - 1} )(x^{ * } - m_{q} )} \right)} \right]}^{{\gamma_{q} }} \\ \end{aligned} $$
(51)

Then the expression (51) can be reduced to the following:

$$ \begin{aligned} G(x) & = \exp \left( { - \frac{1}{2}(x - x^{ * } )^{\prime } S_{tot}^{ - 1} (x - x^{ * } )} \right)\prod\limits_{q = 1}^{K} {\left[ {f_{q} \left( {x^{ * } ,m_{q} ,S_{q} } \right)} \right]}^{{\gamma_{q} }} \\ & = G(x^{ * } )\exp \left( { - \frac{1}{2}(x - x^{ * } )^{\prime } S_{tot}^{ - 1} (x - x^{ * } )} \right) \\ \end{aligned} $$
(52)

which is the geometric mean (3) evaluated at the point \( x^{ * } \) (45) multiplied by a single Gaussian exponent, so the entire dependence on x is contained in one exponent; that is, G(x) is proportional to a single multivariate normal density with mean \( x^{ * } \) and covariance matrix \( S_{tot} \).
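As a numerical check of the final result (52), the sketch below (made-up parameters, with scipy.stats.multivariate_normal used for the densities) verifies that the gamma-weighted geometric mean of the sub-distribution densities is proportional to a single normal density with mean x* and covariance S_tot: the ratio of the two functions does not depend on x.

```python
# Check of Eq. (52): the weighted geometric mean of the normal densities is
# proportional to a single N(x*, S_tot) density, so their ratio is constant in x.
import numpy as np
from scipy.stats import multivariate_normal

gamma = np.array([0.3, 0.7])
S = [np.array([[1.0, 0.3], [0.3, 2.0]]),
     np.array([[1.5, -0.2], [-0.2, 0.5]])]
m = [np.array([0.0, 1.0]), np.array([3.0, -1.0])]

S_inv = [np.linalg.inv(Sq) for Sq in S]
S_tot = np.linalg.inv(sum(g * Si for g, Si in zip(gamma, S_inv)))
x_star = S_tot @ sum(g * Si @ mq for g, Si, mq in zip(gamma, S_inv, m))

def geometric_mean(x):
    """Weighted geometric mean of the sub-distribution densities at x, Eq. (3)."""
    return np.prod([multivariate_normal.pdf(x, mean=mq, cov=Sq) ** g
                    for g, mq, Sq in zip(gamma, m, S)])

test_points = [np.array([0.0, 0.0]), np.array([2.0, -1.5]), np.array([-1.0, 3.0])]
ratios = [geometric_mean(x) / multivariate_normal.pdf(x, mean=x_star, cov=S_tot)
          for x in test_points]
print(np.allclose(ratios, ratios[0]))   # True: constant ratio, i.e. proportionality
```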


Cite this article

Lipovetsky, S. Additive and multiplicative mixed normal distributions and finding cluster centers. Int. J. Mach. Learn. & Cyber. 4, 1–11 (2013). https://doi.org/10.1007/s13042-012-0070-3
