Toward an Efficient Computation of Log-Likelihood Functions in Statistical Inference: Overdispersed Count Data Clustering

  • Masoud Daghyani
  • Nuha Zamzami
  • Nizar Bouguila
Part of the Unsupervised and Semi-Supervised Learning book series (UNSESUL)


This work presents an unsupervised learning algorithm that uses the mesh method to compute the log-likelihood function. The multinomial Dirichlet distribution (MDD) is one of the most widely used models for multicategorical count data with overdispersion. It has recently been shown that traditional numerical computation of the MDD log-likelihood function is either unstable or so slow that it becomes infeasible for large datasets. We therefore propose to use the mesh algorithm, which approximates the MDD log-likelihood function using Bernoulli polynomials. Moreover, we extend the mesh algorithm to compute the log-likelihood function of a more flexible distribution, the multinomial generalized Dirichlet (MGD). We demonstrate the efficiency of this method in statistical inference, specifically maximum likelihood estimation, for fitting finite mixture models based on MDD and MGD as efficient distributions for count data. In a set of experiments, the proposed approach shows its merits on two real-world clustering problems: natural scene categorization and facial expression recognition.
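For orientation, the quantity under discussion can be sketched in a few lines of Python. The block below is only the direct, log-gamma-based evaluation of the MDD (Dirichlet-multinomial) log-probability for a single count vector, that is, the kind of conventional computation the abstract contrasts against. It is not the mesh algorithm and not the authors' implementation. The function name `mdd_log_likelihood` and the argument names `counts` and `alpha` are assumptions introduced for illustration.

```python
import math


def mdd_log_likelihood(counts, alpha):
    """Log-probability of one count vector under the multinomial Dirichlet
    (Dirichlet-multinomial) distribution, evaluated directly with lgamma.

    Illustrative baseline only: `counts` are non-negative integers and
    `alpha` are positive Dirichlet parameters (both names are hypothetical).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(x < 0 for x in counts) or any(a <= 0 for a in alpha):
        raise ValueError("counts must be >= 0 and alpha must be > 0")

    n = sum(counts)        # total number of observations in the vector
    a_sum = sum(alpha)     # sum of the Dirichlet parameters

    # log of the multinomial coefficient n! / prod(x_k!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # log of Gamma(a_sum) / Gamma(n + a_sum)
    log_norm = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    # log of prod(Gamma(x_k + alpha_k) / Gamma(alpha_k))
    log_terms = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms


if __name__ == "__main__":
    # With alpha = (1, 1) and two observations, the three possible outcomes
    # (0, 2), (1, 1), (2, 0) are equally likely, so this is close to 1/3.
    print(math.exp(mdd_log_likelihood([1, 1], [1.0, 1.0])))
```

Each term is a difference of log-gamma values. When the parameters are much larger than the counts, such differences subtract nearly equal numbers, which is one plausible reading of the instability the abstract refers to.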


Unsupervised learning; Mixture models; Maximum likelihood; Count data; Image clustering



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Masoud Daghyani (1)
  • Nuha Zamzami (2, 3)
  • Nizar Bouguila (2)

  1. Department of Electrical and Computer Engineering (ECE), Concordia University, Montreal, Canada
  2. Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada
  3. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
