
Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data

  • John Wiedenhoeft
  • Alexander Schliep
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1833)

Abstract

CNV detection requires a high-quality segmentation of genomic data. For economic reasons, sample and control are sequenced together in many WGS experiments, multiplexed using DNA barcoding. Using the differential read depth of these two conditions cancels out systematic additive errors. Due to this detrending, the resulting data is appropriate for inference using a hidden Markov model (HMM), arguably one of the principal models for labeled segmentation. The usual frequentist approaches such as Baum-Welch are problematic for several reasons, yet they are often preferred to Bayesian HMM inference, which normally requires prohibitively long running times and exceeds a typical user’s computational resources on genome-scale data. HaMMLET solves this problem using a dynamic wavelet compression scheme, which makes Bayesian segmentation of WGS data feasible on standard consumer hardware.
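The detrending idea can be illustrated with a minimal sketch. It assumes read depth has already been counted in fixed bins and that the systematic error is purely additive and identical in both libraries. All names and numbers below are hypothetical toy values, and this is not HaMMLET's own code.

```python
def differential_depth(sample, control):
    """Return the per-bin difference between sample and control read depth."""
    if len(sample) != len(control):
        raise ValueError("sample and control must cover the same bins")
    return [s - c for s, c in zip(sample, control)]


# Hypothetical toy data (assumption): one additive bias value per bin,
# shared by both conditions because they were sequenced together.
bias = [5, -3, 8, 0, -6, 4]
baseline = 30

# Assumed copy-number effect: a gain in bins 2-3, present in the sample only.
copy_signal = [0, 0, 10, 10, 0, 0]

control = [baseline + b for b in bias]
sample = [baseline + b + g for b, g in zip(bias, copy_signal)]

# Subtracting the control removes both the baseline and the shared bias,
# leaving only the piecewise-constant signal to be segmented.
diff = differential_depth(sample, control)
print(diff)
```

In this idealized setting the difference equals the copy-number effect exactly. Real data would additionally carry noise in each bin, which is what the HMM emission distributions are meant to model.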

Key words

HaMMLET; Hidden Markov Model; Bayesian inference; CNV; Whole genome sequencing; Segmentation

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Chalmers University of Technology, Gothenburg, Sweden
  2. Rutgers University, New Brunswick, USA
