
Finite mixtures of unimodal beta and gamma densities and the \(k\)-bumps algorithm


This paper addresses the problem of estimating a density with either a compact support or a support bounded at only one end, exploiting a general and natural form of a finite mixture of distributions. Given the importance of multimodality in the mixture framework, unimodal beta and gamma densities are used as mixture components, leading to a flexible modeling approach. Accordingly, a mode-based parameterization of the components is provided. A partitional clustering method, named \(k\)-bumps, is also proposed; it is used as an ad hoc initialization strategy in the EM algorithm to obtain the maximum likelihood estimates of the mixture parameters. The performance of the \(k\)-bumps algorithm as an initialization tool is compared with that of other common initialization strategies through simulation experiments. Finally, two real applications are presented.


Figs. 1–12 (images not reproduced)


Notes

  1. Downloadable from http://www.humanfertility.org/cgi-bin/main.php.

References

  1. Altman E, Resti A, Sironi A (2005) Loss given default: a review of the literature. In: Altman E, Resti A, Sironi A (eds) The next challenge in credit risk management. Riskbooks, London

  2. Banca d’Italia (2001) Principali Risultati della Rilevazione sull’Attività di Recupero dei Crediti. Bollettino di Vigilanza 12

  3. Basel Committee on Banking Supervision (2004) International capital measurement and capital standards: a revised framework. Bank for International Settlements, Basel

  4. Behboodian J (1970) On the modes of a mixture of two normal distributions. Technometrics 12(1):131–139

  5. Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3):561–575

  6. Brazier S, Sparks RSJ, Carey SN, Sigurdsson H, Westgate JA (1983) Bimodal grain size distribution and secondary thickening in air-fall ash layers. Nature 301:115–119

  7. Bruche M, González-Aguado C (2010) Recovery rates, default probabilities, and the credit cycle. J Banking Financ 34(4):713–723

  8. Calabrese R, Zenga M (2008) Measuring loan recovery rate: methodology and empirical evidence. Stat Appl VI(2):193–214

  9. Calabrese R, Zenga M (2010) Bank loan recovery rates: measuring and nonparametric density estimation. J Banking Financ 34(5):903–911

  10. Celeux G, Govaert G (1992) A classification EM algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332

  11. Chen S (1999) Beta kernel estimators for density functions. Comput Stat Data Anal 31(2):131–145

  12. Chen S (2000) Probability density function estimation using gamma kernels. Ann Inst Stat Math 52(3):471–480

  13. Coale A (1971) Age patterns of marriage. Pop Stud 25(2):193–214

  14. Congdon P (1993) Statistical graduation in local demographic analysis and projection. J R Stat Soc Ser A Stat Soc 156(2):237–270

  15. Cox D (1966) Notes on the analysis of mixed frequency distributions. Br J Math Stat Psychol 19(1):39–47

  16. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc Ser B Methodol 39(1):1–38

  17. Diebolt J, Ip E (1996) Stochastic EM: method and application. In: Gilks W, Richardson S, Spiegelhalter D (eds) Markov chain Monte Carlo in practice, chap 15. Chapman and Hall, London, pp 259–273

  18. Dye JL (2008) Fertility of American women, 2006. Current Population Reports, US Census Bureau 20(558)

  19. Eisenberger I (1964) Genesis of bimodal distributions. Technometrics 6(4):357–363

  20. Elderton WP, Johnson NL (1969) Systems of frequency curves. Cambridge University Press, Cambridge

  21. Everitt B, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London

  22. Ghosal S (2001) Convergence rates for density estimation with Bernstein polynomials. Ann Stat 29(5):1264–1280

  23. Gupton G, Stein R (2002) LossCalc: Moody’s model for predicting loss given default (LGD). Moody’s Investors Service, New York

  24. Gupton G, Finger C, Bhatia M (1997) CreditMetrics—technical document. J. P. Morgan and Co, New York

  25. Huang Z (1998) Extensions to the \(k\)-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283–304

  26. Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York

  27. Ji Y, Wu C, Liu P, Wang J, Coombes K (2005) Applications of beta-mixture models in bioinformatics. Bioinformatics 21(9):2118–2122

  28. Johnson NL, Kotz S (1970a) Continuous univariate distributions, vol 1. Wiley, New York

  29. Johnson NL, Kotz S (1970b) Continuous univariate distributions, vol 2. Wiley, New York

  30. Jordan MI, Xu L (1995) Convergence results for the EM approach to mixtures of experts architectures. Neural Netw 8(9):1409–1431

  31. Kaufman L, Rousseeuw P (1990) Finding groups in data: an introduction to cluster analysis, vol 39. Wiley, New York

  32. Kendall MG, Stuart A (1958) The advanced theory of statistics, vol 1. Charles Griffin and Company Limited, London

  33. Lee S, Lin XS (2010) Modeling and evaluating insurance losses via mixtures of Erlang distributions. N Am Actuar J 14(1):107–130

  34. Leisch F (2004) FlexMix: a general framework for finite mixture models and latent class regression in R. J Stat Softw 11(8):1–18

  35. Lindsay B (1995) Mixture models: theory, geometry and applications. In: NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward

  36. Martin JA, Hamilton BE, Sutton PD, Ventura SJ, Menacker F, Kirmeyer S, Mathews T (2009) Births: final data for 2006. Natl Vital Stat Rep 57(7):1–104

  37. Maulik U, Bandyopadhyay S, Mukhopadhyay A (2011) Multiobjective genetic algorithm-based fuzzy clustering: applications in data mining and bioinformatics. Springer, Berlin

  38. Mayrose I, Friedman N, Pupko T (2005) A gamma mixture model better accounts for among site rate heterogeneity. Bioinformatics 21(2):151–158

  39. Mazza A, Punzo A (2011) Discrete beta kernel graduation of age-specific demographic indicators. In: Ingrassia S, Rocci R, Vichi M (eds) New perspectives in statistical modeling and data analysis (Studies in classification, data analysis and knowledge organization), vol 42. Springer, Berlin, pp 127–134

  40. Mazza A, Punzo A (2013a) Graduation by adaptive discrete beta kernels. In: Giusti A, Ritter G, Vichi M (eds) Classification and data mining (Studies in classification, data analysis and knowledge organization), vol 44. Springer, Berlin, pp 77–84

  41. Mazza A, Punzo A (2013b) Using the variation coefficient for adaptive discrete beta kernel graduation. In: Giudici P, Ingrassia S, Vichi M (eds) Studies in classification, data analysis and knowledge organization. Springer, Berlin (in press)

  42. McLachlan G, Krishnan T (2007) The EM algorithm and extensions. Wiley, New York

  43. McLachlan GJ, Basford KE (1988) Mixture models—inference and applications to clustering. Marcel Dekker, New York

  44. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  45. Meilă M, Heckerman D (2001) An experimental comparison of model-based clustering methods. Mach Learn 42(1):9–29

  46. Murphy EA (1964) One cause? Many causes? The argument from the bimodal distribution. J Chronic Dis 17(4):301–324

  47. Pearson K (1902a) On the systematic fitting of curves to observations and measurements. Biometrika 1(3):265–303

  48. Pearson K (1902b) On the systematic fitting of curves to observations and measurements: part II. Biometrika 2(1):1–23

  49. Petrone S (1999a) Bayesian density estimation using Bernstein polynomials. Can J Stat 27(1):105–126

  50. Petrone S (1999b) Random Bernstein polynomials. Scand J Stat 26(3):373–393

  51. Punzo A (2010) Discrete beta-type models. In: Locarek-Junge H, Weihs C (eds) Classification as a tool for research (Studies in classification, data analysis and knowledge organization), vol 40. Springer, Berlin, pp 253–261

  52. Punzo A, Zini A (2012) Discrete approximations of continuous and mixed measures on a compact interval. Stat Pap 53(3):563–575

  53. Ray S, Lindsay B (2005) The topography of multivariate normal mixtures. Ann Stat 33(5):2042–2065

  54. R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/, ISBN 3-900051-07-0

  55. Redner RA, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239

  56. Robertson C, Fryer J (1969) Some descriptive properties of normal mixtures. Skand Aktuarietidskr 52:137–146

  57. Rogers A (1986) Parameterized multistate population dynamics and projections. J Am Stat Assoc 81(393):48–61

  58. Scharl T, Grün B, Leisch F (2010) Mixtures of regression models for time course gene expression data: evaluation of initialization and random effects. Bioinformatics 26(3):370–377

  59. Schilling M, Watkins A, Watkins W (2002) Is human height bimodal? Am Stat 56(3):223–229

  60. Silverman B (1981) Using kernel density estimates to investigate multimodality. J R Stat Soc Ser B Methodol 43:97–99

  61. Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York

  62. Wessels J (1964) Multimodality in a family of probability densities, with application to a linear mixture of two normal densities. Statistica Neerlandica 18(3):267–282

  63. Wiper M, Insua DR, Ruggeri F (2001) Mixtures of gamma distributions with applications. J Comput Graph Stat 10(3):440–454


Author information

Correspondence to Antonio Punzo.

Electronic supplementary material


Parameterization genesis

If a density function \(f\) belongs to the Pearson system, then it is a solution of the differential equation

$$\begin{aligned} \frac{df\left(x\right)}{dx}=-\frac{\left(x-m\right)f\left(x\right)}{c_0+c_1x+c_2x^2}. \end{aligned}\tag{18}$$

It is clear that some of the solutions to (18) have a single mode (\(df/dx=0\) at \(x=m\)) and smooth contact with the horizontal axis (\(df/dx=0\) when \(f\left(x\right)=0\)).

The shape of \(f\) depends on the set of parameters \(\left(m,c_0,c_1,c_2\right)\), and the form of the solution of (18) evidently depends on the nature of the roots of the equation

$$\begin{aligned} c_0+c_1x+c_2x^2=0. \end{aligned}\tag{19}$$

The classes of unimodal gamma and beta densities, illustrated in Sect. 2, arise from a convenient choice of these roots.
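The case analysis that follows is driven entirely by the coefficients of (19). As a minimal sketch, the hypothetical helper `pearson_case` below (its name and return labels are ours, not the authors') recognizes only the two configurations developed in this appendix: \(c_2=0\) with \(c_1>0\), which gives a support bounded below at \(-c_0/c_1\), and \(c_2<0\) with two distinct real roots \(a<b\), which gives the compact support \([a,b]\). Every other configuration of the Pearson system is reported as `"other"` rather than classified.

```python
import math


def pearson_case(c0: float, c1: float, c2: float) -> tuple:
    """Classify c0 + c1*x + c2*x**2 for the two cases treated in this appendix.

    Returns ("gamma", a) if c2 == 0 and c1 > 0, where a = -c0/c1;
    ("beta", a, b) if c2 < 0 and the quadratic has two distinct real roots a < b;
    ("other",) otherwise (the rest of the Pearson system is not covered).
    """
    if c2 == 0:
        return ("gamma", -c0 / c1) if c1 > 0 else ("other",)
    if c2 > 0:
        return ("other",)
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc <= 0:
        return ("other",)  # repeated or complex roots
    r1 = (-c1 - math.sqrt(disc)) / (2.0 * c2)
    r2 = (-c1 + math.sqrt(disc)) / (2.0 * c2)
    return ("beta", min(r1, r2), max(r1, r2))


print(pearson_case(-0.75, 1.0, -0.25))  # ('beta', 1.0, 3.0)
print(pearson_case(-1.0, 0.5, 0.0))     # ('gamma', 2.0)
```

The first call uses \(c_0+c_1x+c_2x^2=v(x-a)(b-x)\) with \(v=0.25\), \(a=1\), \(b=3\), that is \(c_2=-v\), \(c_1=v(a+b)\) and \(c_0=-vab\).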

Unimodal gamma densities

Consider \(c_2=0\) and \(c_1>0\). Thus, Eq. (18) becomes

$$\begin{aligned} \frac{df\left(x\right)}{dx}=-\frac{\left(x-m\right)f\left(x\right)}{c_0+c_1x}=\left(-\frac{1}{c_1}+\frac{m+c_0/c_1}{c_0+c_1x}\right)f\left(x\right), \end{aligned}\tag{20}$$

whose solution is

$$\begin{aligned} f\left(x\right)=C\left(x+\frac{c_0}{c_1}\right)^{\frac{1}{c_1}\left(m+\frac{c_0}{c_1}\right)}e^{-\frac{x}{c_1}},\qquad -\frac{c_0}{c_1}\le x<\infty . \end{aligned}\tag{21}$$
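A numerical check of this step compares a central finite difference of the unnormalized solution (21) with the right-hand side of (18) for \(c_2=0\). The sketch below is illustrative only; the helper names and the values \(c_0=-1\), \(c_1=0.5\), \(m=3\) (so that the support starts at \(-c_0/c_1=2\)) are our own choices. The residual is measured relative to \(f(x)\), so that points in the tail are not trivially small.

```python
import math


def gamma_kernel(x: float, m: float, c0: float, c1: float) -> float:
    """Unnormalized solution (21): (x + c0/c1)**((m + c0/c1)/c1) * exp(-x/c1)."""
    return (x + c0 / c1) ** ((m + c0 / c1) / c1) * math.exp(-x / c1)


def pearson_rhs(x: float, f: float, m: float, c0: float, c1: float) -> float:
    """Right-hand side of (18) with c2 = 0: -(x - m) * f / (c0 + c1 * x)."""
    return -(x - m) * f / (c0 + c1 * x)


def max_ode_residual(m, c0, c1, xs, h=1e-6):
    """Largest |df/dx - rhs| / f over xs, with df/dx from a central difference."""
    worst = 0.0
    for x in xs:
        f = gamma_kernel(x, m, c0, c1)
        dfdx = (gamma_kernel(x + h, m, c0, c1) - gamma_kernel(x - h, m, c0, c1)) / (2.0 * h)
        worst = max(worst, abs(dfdx - pearson_rhs(x, f, m, c0, c1)) / f)
    return worst


# Support starts at -c0/c1 = 2; all evaluation points lie inside it.
print(max_ode_residual(3.0, -1.0, 0.5, [2.5, 3.0, 4.0, 6.0]) < 1e-6)  # True
```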

For (21) to be a density function, the normalizing constant must be

$$\begin{aligned} C=\left\{ c_1^{\frac{1}{c_1}\left(m+\frac{c_0}{c_1}\right)+1}e^{\frac{c_0}{c_1^2}}\mathrm \Gamma \left[\frac{1}{c_1}\left(m+\frac{c_0}{c_1}\right)+1\right]\right\} ^{-1}, \end{aligned}$$

so that the result is a gamma distribution (see Johnson and Kotz 1970a, Chapter 17). Equation (2) is obtained by setting \(-c_0/c_1=a\) and \(c_1=v\).
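Substituting \(a=-c_0/c_1\) and \(v=c_1\) into (21) and its normalizing constant gives a gamma law with location \(a\), shape \((m-a)/v+1\) and scale \(v\), whose mode \(a+\left[(m-a)/v\right]v\) is exactly \(m\); by the statement above, this is the parameterization (2). The following sketch, with hypothetical function names and plain trapezoidal integration rather than any library routine, evaluates the density on the log scale and checks numerically that it integrates to one and peaks at \(m\).

```python
import math


def unimodal_gamma_pdf(x: float, m: float, v: float, a: float = 0.0) -> float:
    """Gamma density on [a, inf) with mode m >= a and scale v > 0.

    Location a, shape (m - a)/v + 1, scale v; computed on the log scale.
    """
    if m < a or v <= 0:
        raise ValueError("need m >= a and v > 0")
    k = (m - a) / v  # exponent of (x - a); the shape parameter is k + 1
    if x <= a:
        # Only x == a with k == 0 (exponential case) has positive density.
        return 1.0 / v if (x == a and k == 0) else 0.0
    log_f = (k * math.log(x - a) - (x - a) / v
             - (k + 1) * math.log(v) - math.lgamma(k + 1))
    return math.exp(log_f)


def trapezoid(g, lo: float, hi: float, n: int = 20000) -> float:
    """Composite trapezoidal rule; accurate enough for a sanity check."""
    h = (hi - lo) / n
    return h * (0.5 * g(lo) + sum(g(lo + i * h) for i in range(1, n)) + 0.5 * g(hi))


m, v, a = 3.0, 0.5, 2.0
mass = trapezoid(lambda x: unimodal_gamma_pdf(x, m, v, a), a, a + 60 * v)
print(abs(mass - 1.0) < 1e-6)  # True
print(unimodal_gamma_pdf(m, m, v, a) > max(unimodal_gamma_pdf(m - 0.01, m, v, a),
                                          unimodal_gamma_pdf(m + 0.01, m, v, a)))  # True
```

The values \(m=3\), \(v=0.5\), \(a=2\) correspond to \(c_0=-1\) and \(c_1=0.5\); the integration range stops at \(a+60v\), beyond which the remaining mass is negligible.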

Unimodal beta densities

Suppose that both roots of (19) are real. Denoting these roots by \(a\) and \(b\), with \(a<b\), it follows that

$$\begin{aligned} c_0+c_1x+c_2x^2=-c_2\left(x-a\right)\left(b-x\right); \end{aligned}$$

consequently, Eq. (18) becomes

$$\begin{aligned} \frac{df\left(x\right)}{dx}=\frac{\left(x-m\right)f\left(x\right)}{c_2\left(x-a\right)\left(b-x\right)}=\frac{1}{c_2\left(b-a\right)}\left(\frac{a-m}{x-a}+\frac{b-m}{b-x}\right)f\left(x\right), \end{aligned}$$

whose solution is

$$\begin{aligned} f\left(x\right)=C\left(x-a\right)^{\frac{a-m}{c_2\left(b-a\right)}}\left(b-x\right)^{\frac{m-b}{c_2\left(b-a\right)}},\qquad a\le x\le b. \end{aligned}\tag{23}$$

For (23) to be a density function, the normalizing constant must be

$$\begin{aligned} C=\left\{ \left(b-a\right)^{\frac{c_2-1}{c_2}}\mathrm B \left[\frac{a-m}{c_2\left(b-a\right)}+1,\frac{m-b}{c_2\left(b-a\right)}+1\right]\right\} ^{-1}, \end{aligned}$$

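so that the result is a beta distribution (see Johnson and Kotz 1970b, Chapter 24). For this beta density to be unimodal, \(c_2\) must be negative: \(c_2\) appears in the denominators of both exponents in (23), and for \(a\le m\le b\) the exponents are nonnegative only when \(c_2<0\). Equation (6) is obtained by setting \(-c_2=v\) in (23).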
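Setting \(c_2=-v\) in (23) and in its normalizing constant yields a density on \([a,b]\) with exponents \((m-a)/[v(b-a)]\) and \((b-m)/[v(b-a)]\), normalized by \((b-a)^{(v+1)/v}\) times the beta function of the two exponents each increased by one; by the statement above, this is (6). The sketch below, with hypothetical helper names and our own test values \(a=1\), \(b=3\), \(m=1.5\), \(v=0.25\), evaluates it on the log scale and checks that it integrates to one and that its mode sits at \(m\).

```python
import math


def log_beta_fn(p: float, q: float) -> float:
    """log B(p, q) computed through log-gamma functions."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def unimodal_beta_pdf(x: float, m: float, v: float, a: float = 0.0, b: float = 1.0) -> float:
    """Beta density on [a, b] with mode m in [a, b] and v = -c2 > 0.

    The endpoints are assigned density 0 for simplicity (a set of measure zero).
    """
    if not (a <= m <= b) or v <= 0:
        raise ValueError("need a <= m <= b and v > 0")
    if not (a < x < b):
        return 0.0
    p = (m - a) / (v * (b - a))  # exponent of (x - a)
    q = (b - m) / (v * (b - a))  # exponent of (b - x)
    log_f = (p * math.log(x - a) + q * math.log(b - x)
             - (v + 1) / v * math.log(b - a) - log_beta_fn(p + 1, q + 1))
    return math.exp(log_f)


def trapezoid(g, lo: float, hi: float, n: int = 20000) -> float:
    """Composite trapezoidal rule; accurate enough for a sanity check."""
    h = (hi - lo) / n
    return h * (0.5 * g(lo) + sum(g(lo + i * h) for i in range(1, n)) + 0.5 * g(hi))


m, v, a, b = 1.5, 0.25, 1.0, 3.0
print(abs(trapezoid(lambda x: unimodal_beta_pdf(x, m, v, a, b), a, b) - 1.0) < 1e-6)  # True
f_mode = unimodal_beta_pdf(m, m, v, a, b)
print(f_mode > max(unimodal_beta_pdf(m - 0.01, m, v, a, b),
                   unimodal_beta_pdf(m + 0.01, m, v, a, b)))  # True
```

With these values the exponents are \(1\) and \(3\), so the density reduces to \((x-1)(3-x)^3/1.6\).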

Details on the EM algorithm

Here we make explicit the derivatives in (14) for both the gamma and the beta densities, parameterized according to (2) and (6), respectively. Recall that the resulting ML estimates have no closed-form expression and must be computed numerically by an iterative algorithm; suitable numerical routines are available in most computing environments, such as Mathematica and R.

In detail, for the gamma density in (2) we have

$$\begin{aligned} \displaystyle \frac{\partial \ln f \left(x_i;m_j,v_j\right)}{\partial m_j} = \displaystyle \frac{1}{v_j}\left[\ln \left(x_i-a\right)-\ln v_j-\psi \left(\frac{m_j-a}{v_j}+1\right)\right] \end{aligned}$$


$$\begin{aligned} \frac{\partial \ln f \left(x_i;m_j,v_j\right)}{\partial v_j}&= \frac{1}{v_j^2}\left\{ \left(m_j-a\right)\left[\ln v_j+\psi \left(\frac{m_j-a}{v_j}+1\right)-\ln \left(x_i-a\right)\right]\right.\\&\quad \left. -\left(m_j+v_j\right)+x_i\right\} , \end{aligned}$$

where \(\psi \left(\cdot \right)\) is the digamma function. In the same way, for the beta density in (6) we have

$$\begin{aligned} \frac{\partial \ln f \left(x_i;m_j,v_j\right)}{\partial m_j}&= \frac{1}{v_j\left(b-a\right)}\left\{ \psi \left(\frac{b-m_j}{v_j\left(b-a\right)}+1\right)-\psi \left(\frac{m_j-a}{v_j\left(b-a\right)}+1\right)\right.\\&\quad \left. +\ln \left(x_i-a\right)-\ln \left(b-x_i\right)\right\} , \end{aligned}$$


$$\begin{aligned} \frac{\partial \ln f\left(x_i;m_j,v_j\right)}{\partial v_j}&= \frac{1}{v_j^2\left(b-a\right)}\left\{ \left(b-a\right)\left[\ln \left(b-a\right)-\psi \left(\frac{2v_j+1}{v_j}\right)\right]\right.\\&\quad +\left(m_j-a\right)\psi \left(\frac{m_j-a}{v_j\left(b-a\right)}+1\right)+\left(b-m_j\right)\psi \left(\frac{b-m_j}{v_j\left(b-a\right)}+1\right)\\&\quad \left. -\left(m_j-a\right)\ln \left(x_i-a\right)-\left(b-m_j\right)\ln \left(b-x_i\right)\right\} . \end{aligned}$$
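The four score expressions above are easy to mistype, so a finite-difference check is a cheap safeguard before they are used in the M-step. The sketch below writes the two log-densities explicitly: for the gamma case \(k\ln(x-a)-(x-a)/v-(k+1)\ln v-\ln\Gamma(k+1)\) with \(k=(m-a)/v\), and for the beta case \(p\ln(x-a)+q\ln(b-x)-\frac{v+1}{v}\ln(b-a)-\ln\mathrm B(p+1,q+1)\) with \(p=(m-a)/[v(b-a)]\) and \(q=(b-m)/[v(b-a)]\). Both follow from (21) and (23) with the substitutions stated earlier; since \(p+q+2=(2v+1)/v\), this is where the term \(\psi\left((2v_j+1)/v_j\right)\) comes from. Python's standard library has no digamma function, so `digamma` here is a hypothetical helper based on the recurrence \(\psi(x)=\psi(x+1)-1/x\) and the standard asymptotic series; `scipy.special.digamma` could replace it. All function names and test values are our own.

```python
import math


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x to at least 6 by recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def gamma_logpdf(x, m, v, a):
    """log f for the gamma component on [a, inf) with mode m and scale v."""
    k = (m - a) / v
    return k * math.log(x - a) - (x - a) / v - (k + 1) * math.log(v) - math.lgamma(k + 1)


def gamma_scores(x, m, v, a):
    """Analytic d(log f)/dm and d(log f)/dv for the gamma component."""
    psi = digamma((m - a) / v + 1)
    d_m = (math.log(x - a) - math.log(v) - psi) / v
    d_v = ((m - a) * (math.log(v) + psi - math.log(x - a)) - (m + v) + x) / v**2
    return d_m, d_v


def beta_logpdf(x, m, v, a, b):
    """log f for the beta component on [a, b] with mode m and v = -c2."""
    p = (m - a) / (v * (b - a))
    q = (b - m) / (v * (b - a))
    log_beta = math.lgamma(p + 1) + math.lgamma(q + 1) - math.lgamma(p + q + 2)
    return p * math.log(x - a) + q * math.log(b - x) - (v + 1) / v * math.log(b - a) - log_beta


def beta_scores(x, m, v, a, b):
    """Analytic d(log f)/dm and d(log f)/dv for the beta component."""
    w = b - a
    psi_p = digamma((m - a) / (v * w) + 1)
    psi_q = digamma((b - m) / (v * w) + 1)
    d_m = (psi_q - psi_p + math.log(x - a) - math.log(b - x)) / (v * w)
    d_v = (w * (math.log(w) - digamma((2 * v + 1) / v))
           + (m - a) * psi_p + (b - m) * psi_q
           - (m - a) * math.log(x - a) - (b - m) * math.log(b - x)) / (v**2 * w)
    return d_m, d_v


def max_score_error(logpdf, scores, x, m, v, *support, h=1e-6):
    """Largest gap between the analytic scores and central differences in (m, v)."""
    d_m, d_v = scores(x, m, v, *support)
    fd_m = (logpdf(x, m + h, v, *support) - logpdf(x, m - h, v, *support)) / (2 * h)
    fd_v = (logpdf(x, m, v + h, *support) - logpdf(x, m, v - h, *support)) / (2 * h)
    return max(abs(d_m - fd_m), abs(d_v - fd_v))


print(max_score_error(gamma_logpdf, gamma_scores, 3.7, 3.0, 0.5, 2.0) < 1e-5)      # True
print(max_score_error(beta_logpdf, beta_scores, 2.2, 1.5, 0.25, 1.0, 3.0) < 1e-5)  # True
```

In an EM implementation these per-observation scores would be weighted by the current posterior membership probabilities and summed over \(i\), assuming (14) takes the usual form of the M-step score equations for a mixture; the resulting equations would then be solved numerically, as noted above.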


About this article

Cite this article

Bagnato, L., Punzo, A. Finite mixtures of unimodal beta and gamma densities and the \(k\)-bumps algorithm. Comput Stat 28, 1571–1597 (2013). https://doi.org/10.1007/s00180-012-0367-4



Keywords

  • Finite mixtures of densities
  • Pearson system
  • EM algorithm
  • Bump hunting
  • Partitional clustering methods