Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data

  • John Wiedenhoeft
  • Alexander Schliep
Part of the Methods in Molecular Biology book series (MIMB, volume 1833)


Copy number variant (CNV) detection requires a high-quality segmentation of genomic data. In many whole-genome sequencing (WGS) experiments, sample and control are sequenced together in a multiplexed fashion using DNA barcoding for economic reasons. Using the differential read depth of these two conditions cancels out systematic additive errors. Due to this detrending, the resulting data is appropriate for inference using a hidden Markov model (HMM), arguably one of the principal models for labeled segmentation. Although the usual frequentist approaches such as Baum-Welch are problematic for several reasons, they are often preferred to Bayesian HMM inference, which normally requires prohibitively long running times and exceeds a typical user’s computational resources on genome-scale data. HaMMLET solves this problem using a dynamic wavelet compression scheme, which makes Bayesian segmentation of WGS data feasible on standard consumer hardware.
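The detrending argument can be illustrated with a minimal sketch. It assumes that sample and control share the same additive per-bin bias. The bin values, the bias vector, and the helper name `differential_depth` are hypothetical illustrations and are not part of HaMMLET.

```python
def differential_depth(sample, control):
    """Return the per-bin difference between sample and control read depth."""
    if len(sample) != len(control):
        raise ValueError("sample and control must cover the same bins")
    return [s - c for s, c in zip(sample, control)]


# Hypothetical toy data: 8 genomic bins with an additive bias per bin that
# affects both libraries equally, and a copy-number gain in bins 3 to 5.
bias = [5.0, -3.0, 8.0, 1.0, -6.0, 4.0, 0.0, 2.0]
true_sample = [30.0, 30.0, 30.0, 45.0, 45.0, 45.0, 30.0, 30.0]
true_control = [30.0] * 8

# Observed depths are the true depths plus the shared bias.
sample = [t + b for t, b in zip(true_sample, bias)]
control = [t + b for t, b in zip(true_control, bias)]

# Subtracting the control removes the shared bias and leaves the gain.
diff = differential_depth(sample, control)
print(diff)
```

In this toy setting the difference is zero outside the gained region and constant inside it, which is the piecewise-constant signal that a segmentation model expects. Real data also carries noise and non-additive effects that this sketch ignores.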

Key words

HaMMLET, Hidden Markov Model, Bayesian inference, CNV, Whole genome sequencing, Segmentation



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Chalmers University of Technology, Gothenburg, Sweden
  2. Rutgers University, New Brunswick, USA
