Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data
Copy number variant (CNV) detection requires a high-quality segmentation of genomic data. In many whole-genome sequencing (WGS) experiments, sample and control are sequenced together in a multiplexed fashion using DNA barcoding for economic reasons. Using the differential read depth of these two conditions cancels out systematic additive errors. Because of this detrending, the resulting data are suitable for inference with a hidden Markov model (HMM), arguably one of the principal models for labeled segmentation. The usual frequentist approaches, such as Baum-Welch, are problematic for several reasons, yet they are often preferred to Bayesian HMM inference, which normally requires prohibitively long running times and exceeds a typical user's computational resources on genome-scale data. HaMMLET solves this problem using a dynamic wavelet compression scheme, which makes Bayesian segmentation of WGS data feasible on standard consumer hardware.
Key words: HaMMLET, Hidden Markov model, Bayesian inference, CNV, Whole genome sequencing, Segmentation
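The detrending step described in the abstract can be illustrated with a minimal sketch. It assumes that sample and control share the same additive per-bin bias, which is the idealized case in which the difference cancels it exactly. The function name `differential_depth`, the bin layout, and all numbers are hypothetical toy values chosen for illustration; they are not part of HaMMLET or of the chapter's data.

```python
def differential_depth(sample, control):
    """Return the per-bin difference between sample and control read depth."""
    if len(sample) != len(control):
        raise ValueError("sample and control must cover the same bins")
    return [s - c for s, c in zip(sample, control)]


# Hypothetical systematic additive error per genomic bin, shared by both
# conditions because they were sequenced together (assumption of this sketch).
bias = [5.0, 7.5, 6.0, 9.0, 4.0, 8.0, 6.5, 7.0]

# Hypothetical error-free depths: the sample carries a gain in bins 3 to 5.
true_sample = [30.0, 30.0, 30.0, 45.0, 45.0, 45.0, 30.0, 30.0]
true_control = [30.0] * 8

# Observed depths are the error-free depths plus the shared bias.
sample = [t + b for t, b in zip(true_sample, bias)]
control = [t + b for t, b in zip(true_control, bias)]

# The shared bias cancels, leaving a piecewise-constant signal.
diff = differential_depth(sample, control)
print(diff)
```

In this toy case the difference is flat outside the gained region and shifted by a constant inside it, which is the kind of piecewise-constant signal an HMM segmentation is meant to recover. Real data would additionally contain noise that does not cancel.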