A Hybrid Parameter Estimation Algorithm for Beta Mixtures and Applications to Methylation State Classification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9838)


Mixtures of beta distributions have previously been shown to be a flexible tool for modeling data with values on the unit interval, such as methylation levels. However, maximum likelihood parameter estimation with beta distributions suffers from problems because of singularities in the log-likelihood function if some observations take the values 0 or 1. While ad-hoc corrections have been proposed to mitigate this problem, we propose a different approach to parameter estimation for beta mixtures where such problems do not arise in the first place. Our algorithm has computational advantages over the maximum-likelihood-based EM algorithm. As an application, we demonstrate that methylation state classification is more accurate when using adaptive thresholds from beta mixtures than non-adaptive thresholds on observed methylation levels.
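To illustrate the singularity problem described above, the following minimal sketch contrasts the beta log-likelihood term at an observation of exactly 0 with a moment-based estimate computed from the same data. It is not the hybrid algorithm of the paper: it only shows the standard method of moments for a single beta distribution, and the function names `beta_log_kernel` and `beta_moments_fit` are hypothetical names chosen for this example.

```python
import math

def safe_log(x):
    # log(0) is treated as -infinity so that the singularity is visible
    return math.log(x) if x > 0 else -math.inf

def beta_log_kernel(x, a, b):
    # Unnormalized log-density of Beta(a, b) at x:
    # (a - 1) * log(x) + (b - 1) * log(1 - x)
    return (a - 1) * safe_log(x) + (b - 1) * safe_log(1 - x)

def beta_moments_fit(xs):
    # Standard method-of-moments estimate for a single beta distribution
    # (illustrative assumption, not the paper's mixture algorithm).
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    if not 0 < v < m * (1 - m):
        raise ValueError("sample moments are not compatible with a beta distribution")
    c = m * (1 - m) / v - 1
    return m * c, (1 - m) * c

# Hypothetical methylation levels, including an observation of exactly 0
levels = [0.0, 0.1, 0.2, 0.3, 0.4]

# The log-likelihood term at x = 0 is infinite for a != 1 ...
print(beta_log_kernel(0.0, 2.0, 3.0))

# ... whereas the moment-based estimates depend only on the sample mean
# and variance and therefore remain finite.
alpha, beta = beta_moments_fit(levels)
print(alpha, beta)
```

Because the moment estimates use only the sample mean and variance, observations at 0 or 1 need no correction in this sketch; a likelihood-based fit would first have to shift or clip such values.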


Mixture model; Beta distribution; Maximum likelihood; Method of moments; EM algorithm; Differential methylation; Classification



C.S. acknowledges funding from the Federal Ministry of Education and Research (BMBF) under the Project Number 01KU1216 (Deutsches Epigenom Programm, DEEP). S.R. acknowledges funding from the Mercator Research Center Ruhr (MERCUR), project Pe-2013-0012 (UA Ruhr professorship) and from the German Research Foundation (DFG), Collaborative Research Center SFB 876, project C1.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
