Hartigan’s Method for \(k\)-MLE: Mixture Modeling with Wishart Distributions and Its Application to Motion Retrieval

Chapter in: Geometric Theory of Information

Part of the book series: Signals and Communication Technology (SCT)


We describe a novel algorithm called \(k\)-Maximum Likelihood Estimator (\(k\)-MLE) for learning finite statistical mixtures of exponential families that relies on Hartigan’s \(k\)-means swap clustering method. To illustrate this versatile Hartigan \(k\)-MLE technique, we consider the exponential family of Wishart distributions and show how to learn their mixtures. First, given a set of symmetric positive definite observation matrices, we provide an iterative algorithm to estimate the parameters of the underlying Wishart distribution, which is guaranteed to converge to the MLE. Second, two initialization methods for \(k\)-MLE are proposed and compared. Finally, we propose to use the Cauchy-Schwarz statistical divergence as a dissimilarity measure between two Wishart mixture models and sketch a general methodology for building a motion retrieval system.
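Hartigan's swap rule, on which \(k\)-MLE builds, can be illustrated in its original setting of \(k\)-means with squared Euclidean loss [15, 16]. The sketch below is an illustration under that assumption, not the chapter's \(k\)-MLE, which scores a swap with the log-likelihood of each exponential-family (here Wishart) component instead of squared distances. A point leaves cluster \(i\) only when the exact decrease of that cluster's cost, \(\frac{n_i}{n_i-1}\|x-c_i\|^2\), exceeds the smallest exact increase \(\frac{n_j}{n_j+1}\|x-c_j\|^2\) over the other clusters. The function name `hartigan_kmeans` is hypothetical and NumPy is assumed.

```python
import numpy as np

# Illustrative sketch of Hartigan's swap rule for plain k-means (not k-MLE);
# the function name is hypothetical.
def hartigan_kmeans(X, labels, max_sweeps=100):
    """X: (N, D) data; labels: initial assignment in which every cluster is non-empty."""
    X = np.asarray(X, dtype=float)
    labels = np.array(labels, copy=True)
    k = labels.max() + 1
    counts = np.bincount(labels, minlength=k).astype(float)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(max_sweeps):
        moved = False
        for idx, x in enumerate(X):
            i = labels[idx]
            if counts[i] <= 1:
                continue  # never empty a cluster
            # Exact decrease of cluster i's sum of squares if x leaves it
            remove_gain = counts[i] / (counts[i] - 1) * np.sum((x - centers[i]) ** 2)
            # Exact increase of each cluster's sum of squares if x joins it
            add_cost = counts / (counts + 1) * np.sum((x - centers) ** 2, axis=1)
            add_cost[i] = np.inf
            j = int(np.argmin(add_cost))
            if add_cost[j] < remove_gain:
                # Incremental centroid updates for the swap i -> j
                centers[i] = (counts[i] * centers[i] - x) / (counts[i] - 1)
                centers[j] = (counts[j] * centers[j] + x) / (counts[j] + 1)
                counts[i] -= 1
                counts[j] += 1
                labels[idx] = j
                moved = True
        if not moved:
            break
    return labels, centers
```

On two well-separated groups of three points started from an alternating assignment, a single sweep already moves the two misplaced points, and the second sweep confirms that no further swap lowers the cost.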

  1. Otherwise, convergence to a pointwise estimate of the parameters would be replaced by convergence in distribution of a Markov chain.

  2. The product \(\hat{\theta }_n^{(t)}\hat{\theta }_S^{(t)}\) is constant through iterations.

  3. For translation invariance, the \(\mathbb {X}_{i}\) are column-centered beforehand.

  4. Since \(|2S|=2^d|S|\), the factor \(2^{\frac{nd}{2}} |S|^{\frac{n}{2}}\) equals \(|2S|^{\frac{n}{2}}\).


  1. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley Series in Probability and Statistics. Wiley-Interscience, New York (2008)

  2. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)

  3. Nielsen, F.: \(k\)-MLE: a fast algorithm for learning statistical mixture models. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 869–872 (2012). Long version as arXiv:1203.5181

  4. Jain, A.K.: Data clustering: 50 years beyond \(K\)-means. Pattern Recogn. Lett. 31, 651–666 (2010)

  5. Wishart, J.: The generalised product moment distribution in samples from a Normal multivariate population. Biometrika 20(1/2), 32–52 (1928)

  6. Tsai, M.-T.: Maximum likelihood estimation of Wishart mean matrices under Löwner order restrictions. J. Multivar. Anal. 98(5), 932–944 (2007)

  7. Formont, P., Pascal, T., Vasile, G., Ovarlez, J.-P., Ferro-Famil, L.: Statistical classification for heterogeneous polarimetric SAR images. IEEE J. Sel. Top. Sign. Proces. 5(3), 567–576 (2011)

  8. Jian, B., Vemuri, B.: Multi-fiber reconstruction from diffusion MRI using mixture of Wisharts and sparse deconvolution. In: Information Processing in Medical Imaging, pp. 384–395, Springer, Berlin (2007)

  9. Cherian, A., Morellas, V., Papanikolopoulos, N., Bedros, S.: Dirichlet process mixture models on symmetric positive definite matrices for appearance clustering in video surveillance applications. In: Computer Vision and Pattern Recognition (CVPR), pp. 3417–3424 (2011)

  10. Nielsen, F., Garcia, V.: Statistical exponential families: a digest with flash cards. Accessed Nov 2009

  11. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1997)

  12. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  13. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B (Methodological) 39(1), 1–38 (1977)

  14. Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Comput. Stat. Data Anal. 14(3), 315–332 (1992)

  15. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A \(k\)-means clustering algorithm. J. Roy. Stat. Soc. C (Applied Statistics) 28(1), 100–108 (1979)

  16. Telgarsky, M., Vattani, A.: Hartigan’s method: \(k\)-means clustering without Voronoi. In: Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 820–827 (2010)

  17. Nielsen, F., Boissonnat, J.D., Nock, R.: On Bregman Voronoi diagrams. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 746–755 (2007)

  18. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Networks 16(3), 645–678 (2005)

  19. Kulis, B., Jordan, M.I.: Revisiting \(k\)-means: new algorithms via Bayesian nonparametrics. In: International Conference on Machine Learning (ICML) (2012)

  20. Ackermann, M.R.: Algorithms for the Bregman \(K\)-median problem. PhD thesis. Paderborn University (2009)

  21. Arthur, D., Vassilvitskii, S.: \(k\)-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)

  22. Ji, S., Krishnapuram, B., Carin, L.: Variational Bayes for continuous hidden Markov models and its application to active learning. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 522–532 (2006)

  23. Hidot, S., Saint-Jean, C.: An Expectation-Maximization algorithm for the Wishart mixture model: application to movement clustering. Pattern Recogn. Lett. 31(14), 2318–2324 (2010)

  24. Brent, R.P.: Algorithms for Minimization Without Derivatives. Courier Dover Publications, Mineola (1973)

  25. Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., Windham, M.P.: Local convergence analysis of a grouped variable version of coordinate descent. J. Optim. Theory Appl. 54(3), 471–477 (1987)

  26. Bogdan, K., Bogdan, M.: On existence of maximum likelihood estimators in exponential families. Statistics 34(2), 137–149 (2000)

  27. Ciuperca, G., Ridolfi, A., Idier, J.: Penalized maximum likelihood estimator for normal mixtures. Scand. J. Stat. 30(1), 45–59 (2003)

  28. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)

  29. Nielsen, F.: Closed-form information-theoretic divergences for statistical mixtures. In: International Conference on Pattern Recognition (ICPR), pp. 1723–1726 (2012)

  30. Haff, L.R., Kim, P.T., Koo, J.-Y., Richards, D.: Minimax estimation for mixtures of Wishart distributions. Ann. Stat. 39(6), 3417–3440 (2011)

  31. Jebara, T., Kondor, R., Howard, A.: Probability product kernels. J. Mach. Learn. Res. 5, 819–844 (2004)

  32. Moreno, P.J., Ho, P., Vasconcelos, N.: A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In: Advances in Neural Information Processing Systems (2003)

  33. Petersen, K.B., Pedersen, M.S.: The matrix cookbook. Accessed Nov 2012

Author information

Correspondence to Christophe Saint-Jean.

Appendix A

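This appendix details the calculations for the distributions \(\mathcal {W}_{d}\), \(\mathcal {W}_{d,\underline{n}}\) and \(\mathcal {W}_{d,\underline{S}}\), that is, the full Wishart family and its sub-families with fixed degrees of freedom \(\underline{n}\) and fixed scale matrix \(\underline{S}\), respectively.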

11.1.1 Wishart Distribution \(\mathcal {W}_{d}\)

$$\begin{aligned} \mathcal {W}_{d}(X;n,S)&= \frac{|X|^{\frac{n-d-1}{2}}\exp \{-\frac{1}{2} {\mathrm {tr}}(S^{-1} X)\}}{2^{\frac{nd}{2}} |S|^{\frac{n}{2}} \varGamma _{d}(\frac{n}{2})}\\&= \exp \left\{ \frac{n-d-1}{2} \log |X|-\frac{1}{2} {\mathrm {tr}}(S^{-1} X)- \frac{nd}{2}\log (2) - \frac{n}{2} \log |S| - \log \varGamma _{d}\left( \frac{n}{2}\right) \right\} \end{aligned}$$
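The density above translates directly into code. The following sketch assumes NumPy and SciPy (`scipy.special.multigammaln` returns \(\log \varGamma_d\)); the helper name `wishart_logpdf` is hypothetical. Log-determinants are computed with `slogdet` and \({\mathrm{tr}}(S^{-1}X)\) with a linear solve, which avoids forming \(S^{-1}\) explicitly.

```python
import numpy as np
from scipy.special import multigammaln

# Illustrative sketch; the helper name is hypothetical, not from the chapter.
def wishart_logpdf(X, n, S):
    """log W_d(X; n, S) for SPD X, SPD scale S and degrees of freedom n > d - 1."""
    d = X.shape[0]
    logdet_X = np.linalg.slogdet(X)[1]
    logdet_S = np.linalg.slogdet(S)[1]
    trace_term = np.trace(np.linalg.solve(S, X))  # tr(S^{-1} X)
    return (0.5 * (n - d - 1) * logdet_X
            - 0.5 * trace_term
            - 0.5 * n * d * np.log(2.0)
            - 0.5 * n * logdet_S
            - multigammaln(0.5 * n, d))
```

For a \(2\times 2\) example the result should agree with `scipy.stats.wishart(df=n, scale=S).logpdf(X)`, which uses the same parameterization.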

Letting \((\theta _{n},\theta _{S})=(\frac{n-d-1}{2},S^{-1}) \longleftrightarrow (n,S) = (2\theta _{n}+d+1,\theta _{S}^{-1})\), we obtain

$$\begin{aligned} \mathcal {W}_{d}(X;\theta _{n},\theta _{S}) =&\exp \left\{ \frac{2\theta _{n} + d + 1-d-1}{2} \log |X|-\frac{1}{2} {\mathrm {tr}}(\theta _{S} X)- \frac{(2\theta _{n} + d + 1)d}{2}\log (2) \right. \\&- \left. \frac{(2\theta _{n} + d + 1)}{2} \log |\theta _{S}^{-1}| - \log \varGamma _{d}\left( \frac{2\theta _{n} + d + 1}{2}\right) \right\} \\ =&\exp \left\{ \theta _{n}\log |X|-\frac{1}{2} {\mathrm {tr}}(\theta _{S} X)- \left( \theta _{n} + \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta _{S}|\right) \right. \\&\left. - \log \varGamma _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) \right\} \\ =&\exp \left\{ <\theta _{n},\log |X|>_{\mathbb {R}} + <\theta _{S},- \frac{1}{2}X>_{HS} - F(\varTheta )\right\} \\&\text{ with } F(\varTheta ) = \left( \theta _{n} + \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta _{S}|\right) + \log \varGamma _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) \\ =&\exp \left\{ <\varTheta ,t(X)> - F(\varTheta ) + k(X)\right\} \\&\text{ with } t(X) =(\log |X|,- \frac{1}{2}X) \text{ and } k(X)=0 \end{aligned}$$
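As a numerical check of this rewriting, the sketch below maps \((n,S)\) to \((\theta_n,\theta_S)\), evaluates \(\langle \varTheta, t(X)\rangle - F(\varTheta)\) with \(k(X)=0\), and should reproduce the direct log-density. The Hilbert-Schmidt product \(\langle A,B\rangle_{HS}={\mathrm{tr}}(A^{\top}B)\) is computed as an elementwise sum. The helper names are hypothetical, and NumPy/SciPy are assumed.

```python
import numpy as np
from scipy.special import multigammaln

# Illustrative sketch; helper names are hypothetical, not from the chapter.
def to_natural(n, S):
    d = S.shape[0]
    return 0.5 * (n - d - 1), np.linalg.inv(S)

def to_source(theta_n, theta_S):
    d = theta_S.shape[0]
    return 2.0 * theta_n + d + 1, np.linalg.inv(theta_S)

def log_normalizer(theta_n, theta_S):
    """F(Theta) = (theta_n + (d+1)/2)(d log 2 - log|theta_S|) + log Gamma_d(theta_n + (d+1)/2)."""
    d = theta_S.shape[0]
    a = theta_n + 0.5 * (d + 1)
    return a * (d * np.log(2.0) - np.linalg.slogdet(theta_S)[1]) + multigammaln(a, d)

def logpdf_natural(X, theta_n, theta_S):
    t_n, t_S = np.linalg.slogdet(X)[1], -0.5 * X  # t(X) = (log|X|, -X/2)
    inner = theta_n * t_n + np.sum(theta_S * t_S)  # <Theta, t(X)>, with k(X) = 0
    return inner - log_normalizer(theta_n, theta_S)
```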
$$F(\varTheta ) = \left( \theta _{n} + \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta _{S}|\right) + \log \varGamma _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) $$
$$\begin{aligned} \frac{\partial F}{\partial \theta _{n}}(\theta _{n},\theta _{S}) = d\log (2) - \log |\theta _{S}|+ \varPsi _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) \end{aligned}$$

where \(\varPsi _{d}\) is the multivariate Digamma function (or multivariate polygamma of order 0).

$$\begin{aligned} \frac{\partial F}{\partial \theta _{S}}(\theta _{n},\theta _{S}) = - \left( \theta _{n} + \frac{(d + 1)}{2}\right) \theta _{S}^{-1} \end{aligned}$$
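The multivariate digamma function is the derivative of \(\log \varGamma_d\). Since \(\log \varGamma_d(a) = \frac{d(d-1)}{4}\log \pi + \sum_{i=1}^{d}\log \varGamma(a+\frac{1-i}{2})\), it follows that \(\varPsi_d(a)=\sum_{i=1}^{d}\psi(a+\frac{1-i}{2})\). The sketch below evaluates both partial derivatives of \(F\); helper names are hypothetical and NumPy/SciPy are assumed. By the usual exponential-family identity \(\nabla F(\varTheta)=E[t(X)]\), these partial derivatives give \(E[\log|X|]\) and \(E[-\frac{1}{2}X] = -\frac{n}{2}S\).

```python
import numpy as np
from scipy.special import digamma

# Illustrative sketch; helper names are hypothetical, not from the chapter.
def multi_digamma(a, d):
    """Psi_d(a) = sum_{i=1}^{d} psi(a + (1 - i)/2), the derivative of log Gamma_d."""
    return sum(digamma(a + 0.5 * (1 - i)) for i in range(1, d + 1))

def grad_log_normalizer(theta_n, theta_S):
    """Return (dF/dtheta_n, dF/dtheta_S) for the full Wishart family."""
    d = theta_S.shape[0]
    a = theta_n + 0.5 * (d + 1)
    dF_dn = d * np.log(2.0) - np.linalg.slogdet(theta_S)[1] + multi_digamma(a, d)
    dF_dS = -a * np.linalg.inv(theta_S)
    return dF_dn, dF_dS
```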

The dissimilarity \(\varDelta (\theta ,\theta ')\) between natural parameters \(\theta =(\theta _{n},\theta _{S})\) and \(\theta '=(\theta '_{n},\theta '_{S})\) is

$$\begin{aligned} \varDelta (\theta ,\theta ') =&F(\theta +\theta ') - (F(\theta ) + F(\theta ')) = \left( \theta _{n} + \theta _{n}'+ \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta _{S}+\theta '_{S}|\right) \nonumber \\&-\left( \theta _{n} + \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta _{S}|\right) - \left( \theta _{n}'+ \frac{(d + 1)}{2}\right) \left( d\log (2) - \log |\theta '_{S}|\right) \nonumber \\&+ \log \varGamma _{d}\left( \theta _{n} + \theta '_{n} + \frac{(d + 1)}{2}\right) - \log \varGamma _{d}\left( \theta _{n}+ \frac{(d + 1)}{2}\right) - \log \varGamma _{d}\left( \theta '_{n} + \frac{(d + 1)}{2}\right) \nonumber \\ =&- \frac{(d + 1)}{2}d\log (2) + \left( \theta _{n}+ \frac{(d + 1)}{2}\right) \log |\theta _{S}| + \left( \theta _{n}'+ \frac{(d + 1)}{2}\right) \log |\theta _{S}'|\nonumber \\&- \left( \theta _{n} + \theta _{n}'+ \frac{(d + 1)}{2}\right) \log |\theta _{S}+\theta _{S}'| + \log \left( \frac{\varGamma _{d}\left( \theta _{n} + \theta '_{n} + \frac{(d + 1)}{2}\right) }{\varGamma _{d}\left( \theta _{n} + \frac{(d \,+\, 1)}{2}\right) \varGamma _{d}\left( \theta '_{n} + \frac{(d\, +\, 1)}{2}\right) }\right) \end{aligned}$$

Note that \(\varDelta (\theta ,\theta ) \ne 0\) in general. The same quantity expressed with the source parameters \(\lambda =(n,S)\) and \(\lambda '=(n',S')\) is

$$\begin{aligned} \varDelta (\lambda ,\lambda ') =\,&- \frac{(d + 1)}{2}d\log (2) -\frac{n}{2} \log |S| - \frac{n'}{2} \log |S'| \\&- \frac{n+n'-d-1}{2} \log |S^{-1} + S'^{-1}| + \log \left( \frac{\varGamma _{d}\left( \frac{n+n'-d-1}{2}\right) }{\varGamma _{d}\left( \frac{n}{2}\right) \varGamma _{d}\left( \frac{n'}{2}\right) }\right) \end{aligned}$$
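Because \(k(X)=0\), \(\varDelta(\theta,\theta')=\log\int \mathcal{W}_d(X;\theta)\,\mathcal{W}_d(X;\theta')\,dX\) whenever \(\theta+\theta'\) is a valid natural parameter, i.e. \(n+n'>2d\) (and \(n,n'>d\) for the self terms). This makes \(\varDelta\) the building block of the Cauchy-Schwarz divergence \(D_{CS}(p,q)=-\log\frac{\int pq}{\sqrt{\int p^2\int q^2}}=-\varDelta(\theta,\theta')+\frac{1}{2}\left(\varDelta(\theta,\theta)+\varDelta(\theta',\theta')\right)\). The sketch below covers single Wishart components only, not the mixtures the chapter compares; helper names are hypothetical and NumPy/SciPy are assumed.

```python
import numpy as np
from scipy.special import multigammaln

# Illustrative sketch for single components; helper names are hypothetical.
def delta_source(n1, S1, n2, S2):
    """Delta(lambda, lambda') = log int W(X; n1, S1) W(X; n2, S2) dX (source-parameter form)."""
    d = S1.shape[0]
    m = 0.5 * (n1 + n2 - d - 1)  # must exceed (d - 1)/2 for the integral to be finite
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    ld12 = np.linalg.slogdet(np.linalg.inv(S1) + np.linalg.inv(S2))[1]
    return (-0.5 * (d + 1) * d * np.log(2.0)
            - 0.5 * n1 * ld1 - 0.5 * n2 * ld2 - m * ld12
            + multigammaln(m, d) - multigammaln(0.5 * n1, d) - multigammaln(0.5 * n2, d))

def cauchy_schwarz(n1, S1, n2, S2):
    """D_CS between two Wishart densities; zero iff they coincide."""
    return (-delta_source(n1, S1, n2, S2)
            + 0.5 * (delta_source(n1, S1, n1, S1) + delta_source(n2, S2, n2, S2)))
```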

11.1.2 Distribution \(\mathcal {W}_{d,\underline{n}}\)

$$\begin{aligned} \mathcal {W}_{d}(X;\underline{n},S)&= \frac{|X|^{\frac{\underline{n}-d-1}{2}}\exp \{-\frac{1}{2} {\mathrm {tr}}(S^{-1} X)\}}{2^{\frac{\underline{n}d}{2}} |S|^{\frac{\underline{n}}{2}} \varGamma _{d}(\frac{\underline{n}}{2})}\\&= \exp \left\{ \frac{\underline{n}-d-1}{2} \log |X|-\frac{1}{2} {\mathrm {tr}}(S^{-1} X)- \frac{\underline{n}d}{2}\log (2) - \frac{\underline{n}}{2} \log |S| - \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \right\} \end{aligned}$$

Letting \(\theta _{S}=S^{-1}\),

$$\begin{aligned} \mathcal {W}_{d}(X;\underline{n},\theta _{S})&= \exp \left\{ -\frac{1}{2} {\mathrm {tr}}(\theta _{S} X) + \frac{\underline{n}-d-1}{2} \log |X| - \frac{\underline{n}d}{2}\log (2) - \frac{\underline{n}}{2} \log |\theta _{S}^{-1}| - \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \right\} \\&= \exp \left\{ <\theta _{S},- \frac{1}{2}X>_{HS} + k_{\underline{n}}(X) - F_{\underline{n}}(\theta _{S})\right\} \\&\quad \quad \text{ with } F_{\underline{n}}(\theta _{S}) = \frac{\underline{n}d}{2}\log (2) - \frac{\underline{n}}{2} \log |\theta _{S}| + \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \\&\quad \quad \text{ with } k_{\underline{n}}(X) = \frac{\underline{n}-d-1}{2}\log |X| \end{aligned}$$

Using the rule \(\frac{\partial \log |X|}{\partial X} = {}^{t}(X^{-1})\) [33] and the symmetry of \(\theta _{S}\), we get

$$\nabla _{\theta _{S}} F_{\underline{n}} (\theta _{S})= - \frac{\underline{n}}{2}\theta _{S}^{-1}$$
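The matrix derivative rule can be checked numerically. The sketch below differentiates \(\log|X|\) entrywise by central finite differences, treating the \(d^2\) entries as independent, and compares the result with \({}^{t}(X^{-1})\). For a symmetric argument such as \(\theta_S\) the transpose is immaterial, which gives \(\nabla_{\theta_S}F_{\underline{n}} = -\frac{\underline{n}}{2}\theta_S^{-1}\) (the \(\log 2\) and \(\log\varGamma_d\) terms do not depend on \(\theta_S\)). NumPy is assumed and the helper names are hypothetical.

```python
import numpy as np

# Illustrative numerical check; helper names are hypothetical.
def numerical_grad_logdet(X, h=1e-6):
    """Central finite differences of log|X| with respect to each entry X[i, j]."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (np.linalg.slogdet(X + E)[1] - np.linalg.slogdet(X - E)[1]) / (2.0 * h)
    return G

def grad_F_n(theta_S, n):
    """Closed-form gradient of F_n(theta_S) for symmetric theta_S: -n/2 theta_S^{-1}."""
    return -0.5 * n * np.linalg.inv(theta_S)
```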

The correspondence between natural parameter \(\theta _{S}\) and expectation parameter \(\eta _{S}\) is

$$\eta _{S} = \nabla _{\theta _{S}} F_{\underline{n}} (\theta _{S}) = - \frac{\underline{n}}{2}\theta _{S}^{-1} \longleftrightarrow \theta _{S} = \nabla _{\eta _{S}} F_{\underline{n}}^{*}(\eta _{S}) = (\nabla _{\theta _{S}} F_{\underline{n}})^{-1}(\eta _{S}) = - \frac{\underline{n}}{2}\eta _{S}^{-1}$$

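Since the MLE of the expectation parameter in an exponential family is the average sufficient statistic, \(\hat{\eta }_{S} = \frac{1}{N}\sum _{i=1}^{N} t(X_{i})\) with \(t(X) = -\frac{1}{2}X\), we obtain the MLE for \(\theta _{S}\) in this subfamily: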

$$\hat{\theta }_{S} = - \frac{\underline{n}}{2} \left( \frac{1}{N}\sum _{i=1}^{N} - \frac{1}{2}X_{i}\right) ^{-1} = \underline{n} N \left( \sum _{i=1}^{N} X_{i}\right) ^{-1}$$

The same estimate in terms of the source parameter \(S\):

$$\hat{S} = \hat{\theta }_{S}^{-1} = \left( \underline{n} N\left( \sum _{i=1}^{N} X_{i}\right) ^{-1}\right) ^{-1} = \frac{\sum _{i=1}^{N} X_{i}}{\underline{n}N}$$
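Both closed-form estimates above take one line each. The sketch below assumes the observations are stacked in an array of shape \((N,d,d)\); NumPy is assumed and the helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical, not from the chapter.
def mle_S_fixed_n(Xs, n):
    """S_hat = sum_i X_i / (n N) when the degrees of freedom n are known."""
    Xs = np.asarray(Xs, dtype=float)
    return Xs.sum(axis=0) / (n * Xs.shape[0])

def mle_theta_S_fixed_n(Xs, n):
    """Same estimate in natural coordinates: theta_S_hat = n N (sum_i X_i)^{-1}."""
    Xs = np.asarray(Xs, dtype=float)
    return n * Xs.shape[0] * np.linalg.inv(Xs.sum(axis=0))
```

With \(X_1=\mathrm{diag}(2,4)\), \(X_2=\mathrm{diag}(4,2)\) and \(\underline{n}=3\), both estimates equal the identity matrix, since \(\sum_i X_i = 6I\) and \(\underline{n}N = 6\).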

The dual log-normalizer \(F_{\underline{n}}^{*}\) for \(\mathcal {W}_{d,\underline{n}}\) is

$$\begin{aligned} F_{\underline{n}}^{*} (\eta _{S})&= \langle (\nabla F_{\underline{n}})^{-1} (\eta _{S}), \eta _{S}\rangle - F_{\underline{n}}((\nabla F_{\underline{n}})^{-1} (\eta _{S}))\\&= \langle - \frac{\underline{n}}{2}\eta _{S}^{-1}, \eta _{S}\rangle - F_{\underline{n}}(- \frac{\underline{n}}{2}\eta _{S}^{-1})\\&= - \frac{\underline{n}}{2} {\mathrm {tr}}(\eta _{S}^{-1} \eta _{S}) - \frac{\underline{n}d}{2}\log (2) + \frac{\underline{n}}{2} \log \left[ (\frac{\underline{n}}{2})^{d} |-\eta _{S}^{-1} | \right] - \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \\&= - \frac{\underline{n}d}{2} (1+ \log (2) - \log \underline{n} + \log 2) + \frac{\underline{n}}{2} \log |-\eta _{S}^{-1} | - \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \\&= \frac{\underline{n}d}{2} \log \left( \frac{\underline{n}}{4e}\right) + \frac{\underline{n}}{2} \log |-\eta _{S}^{-1}| - \log \varGamma _{d}\left( \frac{\underline{n}}{2}\right) \end{aligned}$$
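A quick consistency check of \(F_{\underline{n}}^{*}\) is the Legendre identity \(F_{\underline{n}}(\theta_S)+F_{\underline{n}}^{*}(\eta_S)=\langle\theta_S,\eta_S\rangle_{HS}\) for \(\eta_S=\nabla F_{\underline{n}}(\theta_S)\); both sides equal \(-\frac{\underline{n}d}{2}\). A sketch, with hypothetical helper names and NumPy/SciPy assumed:

```python
import numpy as np
from scipy.special import multigammaln

# Illustrative sketch; helper names are hypothetical, not from the chapter.
def F_n(theta_S, n):
    """F_n(theta_S) = nd/2 log 2 - n/2 log|theta_S| + log Gamma_d(n/2)."""
    d = theta_S.shape[0]
    return (0.5 * n * d * np.log(2.0) - 0.5 * n * np.linalg.slogdet(theta_S)[1]
            + multigammaln(0.5 * n, d))

def grad_F_n(theta_S, n):
    """eta_S = -n/2 theta_S^{-1}."""
    return -0.5 * n * np.linalg.inv(theta_S)

def F_n_dual(eta_S, n):
    """F_n^*(eta_S) = nd/2 log(n / 4e) + n/2 log|-eta_S^{-1}| - log Gamma_d(n/2)."""
    d = eta_S.shape[0]
    # eta_S is negative definite, so -eta_S^{-1} is positive definite
    logdet = np.linalg.slogdet(-np.linalg.inv(eta_S))[1]
    return 0.5 * n * d * np.log(n / (4.0 * np.e)) + 0.5 * n * logdet - multigammaln(0.5 * n, d)
```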
$$\begin{aligned} {\mathrm {KL}}(\mathcal {W}_{d,\underline{n}}^{1} || \mathcal {W}_{d,\underline{n}}^{2})&= B_{F_{\underline{n}}}(\theta _{S_{2}}:\theta _{S_{1}})\\&= F_{\underline{n}}(\theta _{S_{2}}) - F_{\underline{n}}(\theta _{S_{1}}) - {<}\theta _{S_{2}}-\theta _{S_{1}}, \nabla _{\theta _{S}} F_{\underline{n}}(\theta _{S_{1}}){>}\\&= \frac{\underline{n}}{2} \left( \log |\theta _{S_{1}}| - \log |\theta _{S_{2}}| \right) + \frac{\underline{n}}{2} {\mathrm {tr}}((\theta _{S_{2}}-\theta _{S_{1}})\theta _{S_{1}}^{-1})\\&= \frac{\underline{n}}{2} \left( \log \frac{|\theta _{S_{1}}|}{|\theta _{S_{2}}|} + {\mathrm {tr}}(\theta _{S_{2}}\theta _{S_{1}}^{-1}) - d\right) \\&= \frac{\underline{n}}{2} \left( - \log \frac{|\theta _{S_{2}}|}{|\theta _{S_{1}}|} + {\mathrm {tr}}(\theta _{S_{2}}\theta _{S_{1}}^{-1}) - d\right) \end{aligned}$$

Equivalently, with the source parameter \(S\):

$${\mathrm {KL}}(\mathcal {W}_{d,\underline{n}}^{1} || \mathcal {W}_{d,\underline{n}}^{2}) = \frac{\underline{n}}{2} \left( - \log \frac{|S_{1}|}{|S_{2}|} + {\mathrm {tr}}(S_{2}^{-1}S_{1}) - d\right) $$

Note that the KL divergence now depends on \(\underline{n}\): it scales linearly with it.

$$\begin{aligned} B_{F_{\underline{n}}^*} (\eta _{S_{1}} : \eta _{S_{2}})&= {F_{\underline{n}}^*}(\eta _{S_{1}}) - {F_{\underline{n}}^*}(\eta _{S_{2}}) - {<}\eta _{S_{1}} - \eta _{S_{2}}, \nabla F_{\underline{n}}^{*}(\eta _{S_{2}}){>}_{HS}\\&= \frac{\underline{n}}{2} \left( \log |-\eta _{S_{1}}^{-1}| - \log |-\eta _{S_{2}}^{-1}|\right) - {<}\eta _{S_{1}} - \eta _{S_{2}}, -\frac{\underline{n}}{2} \eta _{S_{2}}^{-1} {>}_{HS}\\&= \frac{\underline{n}}{2} \left( \log \frac{|-\eta _{S_{1}}^{-1}|}{|-\eta _{S_{2}}^{-1}|} + {\mathrm {tr}}(\eta _{S_{1}}\eta _{S_{2}}^{-1}) - d\right) \end{aligned}$$
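The source-parameter KL formula and the dual Bregman divergence should coincide once \(\eta_{S_i}=-\frac{\underline{n}}{2}S_i\), since \(B_{F_{\underline{n}}^{*}}(\eta_{S_1}:\eta_{S_2}) = B_{F_{\underline{n}}}(\theta_{S_2}:\theta_{S_1})\). The sketch below computes both; helper names are hypothetical and NumPy is assumed.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical, not from the chapter.
def kl_fixed_n(S1, S2, n):
    """KL(W(n, S1) || W(n, S2)) = n/2 (-log(|S1|/|S2|) + tr(S2^{-1} S1) - d)."""
    d = S1.shape[0]
    log_ratio = np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S2)[1]
    return 0.5 * n * (-log_ratio + np.trace(np.linalg.solve(S2, S1)) - d)

def bregman_dual_fixed_n(eta1, eta2, n):
    """B_{F*}(eta1 : eta2) = n/2 (log(|-eta1^{-1}| / |-eta2^{-1}|) + tr(eta1 eta2^{-1}) - d)."""
    d = eta1.shape[0]
    ld1 = np.linalg.slogdet(-np.linalg.inv(eta1))[1]
    ld2 = np.linalg.slogdet(-np.linalg.inv(eta2))[1]
    return 0.5 * n * (ld1 - ld2 + np.trace(eta1 @ np.linalg.inv(eta2)) - d)
```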

11.1.3 Distribution \(\mathcal {W}_{d,\underline{S}}\)

For fixed \(\underline{S}\), the p.d.f. of \(\mathcal {W}_{d,\underline{S}}\) can be rewritten (see Footnote 4) as

$$\begin{aligned} \mathcal {W}_{d}(X;n,\underline{S})&= \frac{|X|^{\frac{n-d-1}{2}}\exp \{-\frac{1}{2} {\mathrm {tr}}(\underline{S}^{-1} X)\}}{|2\underline{S}|^{\frac{n}{2}} \varGamma _{d}(\frac{n}{2})}\\&= \exp \left\{ \frac{n-d-1}{2} \log |X|-\frac{1}{2} {\mathrm {tr}}(\underline{S}^{-1} X)- \frac{n}{2}\log |2\underline{S}| - \log \varGamma _{d}\left( \frac{n}{2}\right) \right\} \end{aligned}$$

Letting \(\theta _{n}=\frac{n-d-1}{2}\) (\(n=2\theta _{n}+d+1\))

$$\begin{aligned} \mathcal {W}_{d}(X;\theta _{n},\underline{S})&= \exp \left\{ \theta _{n} \log |X| -\frac{1}{2} {\mathrm {tr}}(\underline{S}^{-1} X) - \left( \theta _{n} + \frac{d+1}{2}\right) \log \left| 2\underline{S}\right| - \log \varGamma _{d}\left( \theta _{n} + \frac{d+1}{2}\right) \right\} \\&= \exp \left\{ <\theta _{n},\log |X|> + k_{\underline{S}}(X) - F_{\underline{S}}(\theta _{n})\right\} \\&\quad \quad \text{ with } F_{\underline{S}}(\theta _{n}) = \left( \theta _{n} + \frac{d+1}{2}\right) \log \left| 2\underline{S}\right| + \log \varGamma _{d}\left( \theta _{n} + \frac{d+1}{2}\right) \\&\quad \quad \text{ with } k_{\underline{S}}(X) = -\frac{1}{2} {\mathrm {tr}}(\underline{S}^{-1} X) \end{aligned}$$

The correspondence between natural parameter \(\theta _{n}\) and expectation parameter \(\eta _{n}\) is

$$\eta _{n} = \nabla _{\theta _{n}} F_{\underline{S}} (\theta _{n}) = \log \left| 2\underline{S}\right| + \varPsi _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) $$
$$\begin{aligned} \Leftrightarrow&\qquad \qquad \varPsi _{d}\left( \theta _{n} + \frac{(d + 1)}{2}\right) = \eta _{n} - \log \left| 2\underline{S}\right| \\ \Leftrightarrow&\qquad \qquad \theta _{n} + \frac{(d + 1)}{2} = \varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) \\ \Leftrightarrow&\theta _{n} = \varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2} = (\nabla F_{\underline{S}})^{-1} (\eta _{n}) = \nabla F_{\underline{S}}^* (\eta _{n})\\ \end{aligned}$$
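\(\varPsi_d^{-1}\) has no closed form, but \(\varPsi_d\) is strictly increasing on \((\frac{d-1}{2},\infty)\) and maps it onto \(\mathbb{R}\), so \(\varPsi_d(a)=y\) has a unique root that a bracketing solver finds reliably. The sketch below uses SciPy's Brent root finder `scipy.optimize.brentq`, in the spirit of [24]; this solver choice is an assumption, not necessarily the chapter's, and the helper names are hypothetical.

```python
from scipy.optimize import brentq
from scipy.special import digamma

# Illustrative sketch; helper names and solver choice are assumptions.
def multi_digamma(a, d):
    """Multivariate digamma Psi_d(a) = sum_{i=1}^{d} psi(a + (1 - i)/2)."""
    return sum(digamma(a + 0.5 * (1 - i)) for i in range(1, d + 1))

def inv_multi_digamma(y, d, xtol=1e-12):
    """Return the unique a > (d - 1)/2 with Psi_d(a) = y."""
    edge = 0.5 * (d - 1)  # Psi_d -> -inf as a approaches this value from above
    lo, hi = edge + 1.0, edge + 2.0
    while multi_digamma(lo, d) > y:  # shrink lo toward the edge until Psi_d(lo) <= y
        lo = edge + 0.5 * (lo - edge)
    while multi_digamma(hi, d) < y:  # grow hi until Psi_d(hi) >= y
        hi = edge + 2.0 * (hi - edge)
    return brentq(lambda a: multi_digamma(a, d) - y, lo, hi, xtol=xtol)

def theta_n_from_eta(eta_n, logdet_2S, d):
    """theta_n = Psi_d^{-1}(eta_n - log|2S|) - (d + 1)/2."""
    return inv_multi_digamma(eta_n - logdet_2S, d) - 0.5 * (d + 1)
```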

Since \(\hat{\eta }_{n} = \frac{1}{N}\sum _{i=1}^{N} \log |X_{i}|\), we obtain the MLE for \(\theta _{n}\) in this subfamily:

$$\hat{\theta }_{n} = \varPsi _{d}^{-1}\left( \left[ \frac{1}{N} \sum _{i=1}^{N} \log \left| X_{i}\right| \right] - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2}$$

The same estimate in terms of the source parameter \(n\):

$$\begin{aligned} \frac{\hat{n} - d -1}{2}&= \varPsi _{d}^{-1}\left( \left[ \frac{1}{N} \sum _{i=1}^{N} \log \left| X_{i}\right| \right] - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2}\\ \hat{n}&= 2 \varPsi _{d}^{-1}\left( \left[ \frac{1}{N} \sum _{i=1}^{N} \log \left| X_{i}\right| \right] - \log \left| 2\underline{S}\right| \right) \end{aligned}$$
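The two sub-family estimators can be chained into an iterative estimator for the full \(\mathcal{W}_d\): update \(\hat{S}=\sum_i X_i/(\hat{n}N)\) with \(n\) fixed, then \(\hat{n}\) with \(S\) fixed, and repeat. Each step maximizes the likelihood over one block of parameters, so the likelihood cannot decrease (block coordinate ascent, cf. [25]). The sketch below is a hypothetical rendering of that idea; the initialization, stopping rule and helper names are assumptions, NumPy/SciPy are assumed, and \(\varPsi_d^{-1}\) is obtained by bracketing plus Brent's method.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

# Hypothetical coordinate-ascent sketch; helper names and stopping rule are assumptions.
def multi_digamma(a, d):
    return sum(digamma(a + 0.5 * (1 - i)) for i in range(1, d + 1))

def inv_multi_digamma(y, d):
    edge = 0.5 * (d - 1)
    lo, hi = edge + 1.0, edge + 2.0
    while multi_digamma(lo, d) > y:
        lo = edge + 0.5 * (lo - edge)
    while multi_digamma(hi, d) < y:
        hi = edge + 2.0 * (hi - edge)
    return brentq(lambda a: multi_digamma(a, d) - y, lo, hi, xtol=1e-12)

def mle_n_fixed_S(mean_logdet, S):
    """n_hat = 2 Psi_d^{-1}(mean_i log|X_i| - log|2S|) for a known scale matrix S."""
    return 2.0 * inv_multi_digamma(mean_logdet - np.linalg.slogdet(2.0 * S)[1], S.shape[0])

def mle_wishart_alternating(Xs, n0, max_iter=1000, tol=1e-8):
    Xs = np.asarray(Xs, dtype=float)              # shape (N, d, d)
    X_mean = Xs.mean(axis=0)
    mean_logdet = np.linalg.slogdet(Xs)[1].mean()  # slogdet broadcasts over the stack
    n = float(n0)
    for _ in range(max_iter):
        S = X_mean / n                          # MLE of S given n (sub-family W_{d,n})
        n_new = mle_n_fixed_S(mean_logdet, S)   # MLE of n given S (sub-family W_{d,S})
        converged = abs(n_new - n) < tol
        n = n_new
        if converged:
            break
    return n, X_mean / n
```

On synthetic data drawn with `scipy.stats.wishart(df=7, scale=S).rvs(...)`, the estimate of \(n\) should land close to 7 for a few thousand samples. The alternation converges linearly, so `max_iter` is set generously.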

The dual log-normalizer \(F_{\underline{S}}^{*}\) for \(\mathcal {W}_{d,\underline{S}}\) is

$$\begin{aligned} F_{\underline{S}}^{*} (\eta _{n}) = \,&\langle (\nabla F_{\underline{S}})^{-1} (\eta _{n}), \eta _{n}\rangle - F_{\underline{S}}((\nabla F_{\underline{S}})^{-1} (\eta _{n}))\\ = \,&\langle \varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2}, \eta _{n}\rangle \\&-\varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) \log \left| 2\underline{S}\right| - \log \varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) \right) \\ =\,&\varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) \left( \eta _{n} - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2} \eta _{n} - \log \varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n} - \log \left| 2\underline{S}\right| \right) \right) \end{aligned}$$
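As for \(\mathcal{W}_{d,\underline{n}}\), the dual log-normalizer can be checked through \(F_{\underline{S}}(\theta_n)+F_{\underline{S}}^{*}(\eta_n)=\theta_n\eta_n\) with \(\eta_n=\nabla F_{\underline{S}}(\theta_n)\). Evaluating \(F_{\underline{S}}^{*}\) requires \(\varPsi_d^{-1}\), which the sketch below obtains by bracketing and Brent's method (an assumption, as are the helper names; NumPy/SciPy assumed). Only \(\log|2\underline{S}|\) enters these functions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, multigammaln

# Illustrative sketch; helper names and root finder are assumptions.
def multi_digamma(a, d):
    return sum(digamma(a + 0.5 * (1 - i)) for i in range(1, d + 1))

def inv_multi_digamma(y, d):
    edge = 0.5 * (d - 1)
    lo, hi = edge + 1.0, edge + 2.0
    while multi_digamma(lo, d) > y:
        lo = edge + 0.5 * (lo - edge)
    while multi_digamma(hi, d) < y:
        hi = edge + 2.0 * (hi - edge)
    return brentq(lambda a: multi_digamma(a, d) - y, lo, hi, xtol=1e-12)

def F_S(theta_n, logdet_2S, d):
    """F_S(theta_n) = (theta_n + (d+1)/2) log|2S| + log Gamma_d(theta_n + (d+1)/2)."""
    a = theta_n + 0.5 * (d + 1)
    return a * logdet_2S + multigammaln(a, d)

def grad_F_S(theta_n, logdet_2S, d):
    """eta_n = log|2S| + Psi_d(theta_n + (d+1)/2)."""
    return logdet_2S + multi_digamma(theta_n + 0.5 * (d + 1), d)

def F_S_dual(eta_n, logdet_2S, d):
    """F_S^*(eta_n), with a = Psi_d^{-1}(eta_n - log|2S|)."""
    a = inv_multi_digamma(eta_n - logdet_2S, d)
    return a * (eta_n - logdet_2S) - 0.5 * (d + 1) * eta_n - multigammaln(a, d)
```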
$$\begin{aligned} {\mathrm {KL}}(\mathcal {W}_{d,{\underline{S}}}^{1} || \mathcal {W}_{d,{\underline{S}}}^{2}) =\,&B_{F_{\underline{S}}} (\theta _{n_{2}} : \theta _{n_{1}}) = F_{\underline{S}}(\theta _{n_{2}}) - F_{\underline{S}}(\theta _{n_{1}}) - \langle \theta _{n_{2}} - \theta _{n_{1}} , \nabla F_{\underline{S}} (\theta _{n_{1}})\rangle \\ = \,&\left( \theta _{n_{2}} + \frac{d+1}{2}\right) \log \left| 2\underline{S}\right| + \log \varGamma _{d}\left( \theta _{n_{2}} + \frac{d+1}{2}\right) \\&-\left( \theta _{n_{1}} + \frac{d+1}{2}\right) \log \left| 2\underline{S}\right| - \log \varGamma _{d}\left( \theta _{n_{1}} + \frac{d+1}{2}\right) \\&-\langle \theta _{n_{2}} - \theta _{n_{1}}, \log \left| 2\underline{S}\right| + \varPsi _{d}\left( \theta _{n_{1}} + \frac{(d + 1)}{2}\right) \rangle \\ {\mathrm {KL}}(\mathcal {W}_{d,{\underline{S}}}^{1}|| \mathcal {W}_{d,{\underline{S}}}^{2}) =\,&\log \frac{\varGamma _{d}\left( \theta _{n_{2}} + \frac{d+1}{2}\right) }{\varGamma _{d}\left( \theta _{n_{1}} + \frac{d+1}{2}\right) } - (\theta _{n_{2}} - \theta _{n_{1}}) \varPsi _{d}\left( \theta _{n_{1}} + \frac{(d + 1)}{2}\right) \\ =\,&- \log \frac{\varGamma _{d}\left( \theta _{n_{1}} + \frac{d+1}{2}\right) }{\varGamma _{d}\left( \theta _{n_{2}} + \frac{d+1}{2}\right) } + (\theta _{n_{1}} - \theta _{n_{2}}) \varPsi _{d}\left( \theta _{n_{1}} + \frac{(d + 1)}{2}\right) \end{aligned}$$

Equivalently, with the source parameter \(n\):

$${\mathrm {KL}}(\mathcal {W}_{d,{\underline{S}}}^{1}|| \mathcal {W}_{d,{\underline{S}}}^{2}) = - \log \left( \frac{\varGamma _{d}\left( \frac{n_{1}}{2}\right) }{\varGamma _{d}\left( \frac{n_{2}}{2}\right) }\right) + \left( \frac{n_{1} - n_{2}}{2}\right) \varPsi _{d}\left( \frac{n_{1}}{2}\right) $$

Note that this quantity does not depend on \(\underline{S}\).

$$\begin{aligned} B_{F_{\underline{S}}^*} (\eta _{n_{1}} : \eta _{n_{2}}) =\,&{F_{\underline{S}}^*}(\eta _{n_{1}}) - {F_{\underline{S}}^*}(\eta _{n_{2}}) - {<}\eta _{n_{1}} - \eta _{n_{2}}, \nabla F_{\underline{S}}^{*}(\eta _{n_{2}}){>}_{HS}\\ =\,&\varPsi _{d}^{-1}\left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) \left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2} \eta _{n_{1}} \\&\quad - \log \varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) \right) \\&- \varPsi _{d}^{-1}\left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) \left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) + \frac{(d + 1)}{2} \eta _{n_{2}} \\&\quad + \log \varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) \right) \\&- \langle \eta _{n_{1}} - \eta _{n_{2}}, \varPsi _{d}^{-1}\left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) - \frac{(d + 1)}{2}\rangle _{HS}\\ B_{F_{\underline{S}}^*} (\eta _{n_{1}} : \eta _{n_{2}}) = \,&\log \frac{\varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) \right) }{\varGamma _{d}\left( \varPsi _{d}^{-1}\left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) \right) }\\&-\left[ \varPsi _{d}^{-1}\left( \eta _{n_{2}} - \log \left| 2\underline{S}\right| \right) - \varPsi _{d}^{-1}\left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) \right] \left( \eta _{n_{1}} - \log \left| 2\underline{S}\right| \right) \end{aligned}$$
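The closed-form KL divergence in source parameters and the dual Bregman divergence \(B_{F_{\underline{S}}^{*}}\) should agree when \(\eta_{n_i}=\log|2\underline{S}|+\varPsi_d(\frac{n_i}{2})\), and the result should not change with \(\underline{S}\). The sketch below checks both properties; helper names are hypothetical, NumPy/SciPy are assumed, and \(\varPsi_d^{-1}\) is computed numerically by bracketing and Brent's method.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, multigammaln

# Illustrative sketch; helper names and root finder are assumptions.
def multi_digamma(a, d):
    return sum(digamma(a + 0.5 * (1 - i)) for i in range(1, d + 1))

def inv_multi_digamma(y, d):
    edge = 0.5 * (d - 1)
    lo, hi = edge + 1.0, edge + 2.0
    while multi_digamma(lo, d) > y:
        lo = edge + 0.5 * (lo - edge)
    while multi_digamma(hi, d) < y:
        hi = edge + 2.0 * (hi - edge)
    return brentq(lambda a: multi_digamma(a, d) - y, lo, hi, xtol=1e-12)

def kl_fixed_S(n1, n2, d):
    """KL(W(n1, S) || W(n2, S)) = -log(Gamma_d(n1/2)/Gamma_d(n2/2)) + (n1 - n2)/2 Psi_d(n1/2)."""
    return (-(multigammaln(0.5 * n1, d) - multigammaln(0.5 * n2, d))
            + 0.5 * (n1 - n2) * multi_digamma(0.5 * n1, d))

def bregman_dual_fixed_S(eta1, eta2, logdet_2S, d):
    """B_{F_S^*}(eta1 : eta2) = log(Gamma_d(a2)/Gamma_d(a1)) - (a2 - a1)(eta1 - log|2S|)."""
    a1 = inv_multi_digamma(eta1 - logdet_2S, d)
    a2 = inv_multi_digamma(eta2 - logdet_2S, d)
    return multigammaln(a2, d) - multigammaln(a1, d) - (a2 - a1) * (eta1 - logdet_2S)
```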

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Saint-Jean, C., Nielsen, F. (2014). Hartigan’s Method for \(k\)-MLE: Mixture Modeling with Wishart Distributions and Its Application to Motion Retrieval. In: Nielsen, F. (ed.) Geometric Theory of Information. Signals and Communication Technology. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05316-5

  • Online ISBN: 978-3-319-05317-2

  • eBook Packages: Engineering, Engineering (R0)
