Statistics and Computing, Volume 21, Issue 4, pp 601–612

Generalized EM estimation for semi-parametric mixture distributions with discretized non-parametric component

  • Jun Ma
  • Sigurbjorg Gudlaugsdottir
  • Graham Wood


We consider independent sampling from a two-component mixture distribution, where one component (called the parametric component) is from a known distributional family and the other component (called the non-parametric component) is unknown. This is a semi-parametric mixture distribution. We discretize the non-parametric component and estimate the parameters of this mixture model, namely the mixing proportion, the unknown parameters of the parametric component and the discretized non-parametric component. We define the maximum penalized likelihood (MPL) estimates of the mixture model parameters and then develop a generalized EM (GEM) iterative scheme to compute the MPL estimates. A simulation study and an example from biology are presented.
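The model described above can be illustrated with a minimal sketch. It is not the authors' algorithm, and every specific choice in it is an assumption made for illustration only: the parametric component is taken to be normal with unknown mean and unit standard deviation, the non-parametric component is discretized as equal-width bin heights, and the roughness penalty is a sum of squared second differences of those heights. All function names are hypothetical. The update shown is a plain EM step for the unpenalized case (smoothing parameter zero), where the M-step has a closed form. With a positive penalty the bin-height update generally has no closed form, which is the sort of situation where a generalized EM step (one that only increases the objective) is used; that step is not reproduced here.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    """Density of the assumed parametric component, N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bin_index(x, lo, width, n_bins):
    """Index of the equal-width bin containing x, clamped to a valid range."""
    j = int((x - lo) / width)
    return min(max(j, 0), n_bins - 1)


def penalized_loglik(xs, pi, mu, heights, lo, width, lam=0.0):
    """Log-likelihood of the two-component mixture minus lam * roughness.

    Roughness is an assumed penalty: squared second differences of heights.
    """
    n_bins = len(heights)
    ll = 0.0
    for x in xs:
        g = heights[bin_index(x, lo, width, n_bins)]
        ll += math.log(pi * normal_pdf(x, mu) + (1.0 - pi) * g)
    rough = sum(
        (heights[j - 1] - 2.0 * heights[j] + heights[j + 1]) ** 2
        for j in range(1, n_bins - 1)
    )
    return ll - lam * rough


def em_step(xs, pi, mu, heights, lo, width):
    """One plain EM step for the unpenalized model (lam = 0)."""
    n_bins = len(heights)
    # E-step: posterior probability that each point is from the normal part.
    w = []
    for x in xs:
        a = pi * normal_pdf(x, mu)
        b = (1.0 - pi) * heights[bin_index(x, lo, width, n_bins)]
        w.append(a / (a + b))
    # M-step: closed-form updates of proportion, mean, and bin heights.
    sw = sum(w)
    new_pi = sw / len(xs)
    new_mu = sum(wi * x for wi, x in zip(w, xs)) / sw
    mass = [0.0] * n_bins
    for x, wi in zip(xs, w):
        mass[bin_index(x, lo, width, n_bins)] += 1.0 - wi
    total = sum(mass)
    new_heights = [m / (total * width) for m in mass]  # integrates to 1
    return new_pi, new_mu, new_heights


# Synthetic data (illustrative only): normal part plus a uniform "unknown" part.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
xs += [random.uniform(2.0, 6.0) for _ in range(200)]

n_bins = 10
lo, hi = min(xs), max(xs)
width = (hi - lo) / n_bins
pi, mu = 0.5, 0.5
heights = [1.0 / (hi - lo)] * n_bins  # flat starting density

history = [penalized_loglik(xs, pi, mu, heights, lo, width)]
for _ in range(30):
    pi, mu, heights = em_step(xs, pi, mu, heights, lo, width)
    history.append(penalized_loglik(xs, pi, mu, heights, lo, width))
```

In this sketch `history` records the unpenalized log-likelihood after each step, which a plain EM iteration should never decrease. Passing a positive `lam` to `penalized_loglik` evaluates the penalized objective, but `em_step` does not account for the penalty.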


Keywords: Semi-parametric mixture model; Maximum penalized likelihood; Roughness penalty; Generalized EM





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Jun Ma (1), corresponding author
  • Sigurbjorg Gudlaugsdottir (1)
  • Graham Wood (1)
  1. Department of Statistics, Macquarie University, Sydney, Australia
