CNVeM: Copy Number Variation Detection Using Uncertainty of Read Mapping

Wang, Zhanyong; Hormozdiari, Farhad; Yang, Wen-Yun; Halperin, Eran; Eskin, Eleazar

doi:10.1007/978-3-642-29627-7_34

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Copy number variations (CNVs) are widely known to be important mediators of diseases and traits. The development of high-throughput sequencing (HTS) technologies has provided great opportunities to identify CNV regions in mammalian genomes. In a typical experiment, millions of short reads obtained from a genome of interest are mapped to a reference genome, and the mapping information is used to identify CNV regions. One important challenge in analyzing the mapping information is the large fraction of reads that can be mapped to multiple positions. Most existing methods either consider only reads that can be uniquely mapped to the reference genome, or randomly assign a read to one of its mapping positions. These methods therefore have low power to detect CNVs located within repeated sequences. In this study, we propose a probabilistic model, CNVeM, that utilizes the inherent uncertainty of read mapping. We use maximum likelihood to estimate the locations and copy numbers of copied regions, and implement an expectation-maximization (EM) algorithm to compute the estimates. One important contribution of our model is that we can distinguish between regions in the reference genome that differ from each other by as little as 0.1%. As our model aims to predict the copy number of each nucleotide, we can predict the CNV boundaries with high resolution. We apply our method to simulated datasets and achieve higher accuracy than CNVnator. Moreover, we apply our method to real data, in which we detect known CNVs. To our knowledge, this is the first attempt to predict CNVs at nucleotide resolution and to utilize the uncertainty of read mapping.
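
To make the idea of using mapping uncertainty concrete, the following is a minimal sketch of an EM loop over ambiguously mapped reads. It is not the CNVeM model: it works on whole regions rather than single nucleotides, it estimates only each region's relative share of reads, and every name in it (`em_region_shares`, `read_likelihoods`, the regions "A" and "B", the read counts) is a hypothetical choice made for illustration. The per-read likelihoods are assumed to be given, for example derived from mismatches at the few bases where two near-identical regions differ.

```python
from typing import Dict, List


def em_region_shares(
    read_likelihoods: List[Dict[str, float]],
    regions: List[str],
    n_iter: int = 200,
) -> Dict[str, float]:
    """Toy EM: estimate each region's share of reads from ambiguous mappings.

    read_likelihoods[i][r] is the (assumed known) likelihood of read i
    given that it originated from region r; regions a read cannot map to
    are simply absent from its dictionary.
    """
    # Start from equal shares for all regions.
    theta = {r: 1.0 / len(regions) for r in regions}
    for _ in range(n_iter):
        expected = {r: 0.0 for r in regions}
        # E-step: split each read fractionally across its candidate regions,
        # weighting by the current share times the read likelihood.
        for lik in read_likelihoods:
            weights = {r: theta[r] * p for r, p in lik.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read is incompatible with every region
            for r, w in weights.items():
                expected[r] += w / total
        # M-step: new shares are the normalized expected read counts.
        n_assigned = sum(expected.values())
        if n_assigned == 0.0:
            break  # no usable reads; keep the previous estimate
        theta = {r: expected[r] / n_assigned for r in regions}
    return theta


# Hypothetical data: two equal-length, near-identical regions A and B.
# 30 reads cover a base that matches only A, 10 cover a base that matches
# only B, and 60 reads fit both regions equally well.
reads = (
    [{"A": 1.0}] * 30
    + [{"B": 1.0}] * 10
    + [{"A": 0.5, "B": 0.5}] * 60
)
theta = em_region_shares(reads, ["A", "B"])
ratio = theta["A"] / theta["B"]  # relative copy abundance of A versus B
```

In this toy setting the 60 ambiguous reads are divided according to the evidence from the 40 distinguishing reads, so the shares settle near 0.75 for A and 0.25 for B. Discarding the ambiguous reads or placing them at random would throw that information away or dilute it. Turning such shares into absolute copy numbers would additionally need a coverage baseline, which this sketch does not model.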

References

Abyzov, A., Urban, A.E., Snyder, M., Gerstein, M.: CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research 21(6), 974–984 (2011)
Alkan, C., Kidd, J.M., Marques-Bonet, T., Aksay, G., Antonacci, F., Hormozdiari, F., Kitzman, J.O., Baker, C., Malig, M., Mutlu, O., Cenk Sahinalp, S., Gibbs, R.A., Eichler, E.E.: Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics 41(10), 1061–1067 (2009)
Cappuzzo, F., Hirsch, F.R., Rossi, E., Bartolini, S., Ceresoli, G.L., Bemis, L., Haney, J., Witta, S., Danenberg, K., Domenichini, I., Ludovini, V., Magrini, E., Gregorc, V., Doglioni, C., Sidoni, A., Tonato, M., Franklin, W.A., Crino, L., Bunn Jr., P.A., Varella-Garcia, M.: Epidermal growth factor receptor gene and protein and gefitinib sensitivity in non-small-cell lung cancer. Journal of the National Cancer Institute 97(9), 643–655 (2005)
Carter, N.P.: Methods and strategies for analyzing copy number variation using DNA microarrays. Nature Genetics 39(suppl. 7), 16–21 (2007)
Chen, P.-A., Liu, H.-F., Chao, K.-M.: CNVDetector: locating copy number variations using array CGH data. Bioinformatics 24(23), 2773–2775 (2008)
Chiang, D.Y., Getz, G., Jaffe, D.B., O’Kelly, M.J.T., Zhao, X., Carter, S.L., Russ, C., Nusbaum, C., Meyerson, M., Lander, E.S.: High-resolution mapping of copy-number alterations with massively parallel sequencing. Nature Methods 6(1), 99–103 (2009)
Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 603–619 (2002)
1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature 467(7319), 1061–1073 (2010)
Hach, F., Hormozdiari, F., Alkan, C., Hormozdiari, F., Birol, I., Eichler, E.E., Cenk Sahinalp, S.: mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature Methods 7(8), 576–577 (2010)
Halperin, E., Hazan, E.: HAPLOFREQ-estimating haplotype frequencies efficiently. Journal of Computational Biology 13(2), 481–500 (2006)
He, D., Furlotte, N., Eskin, E.: Detection and reconstruction of tandemly organized de novo copy number variations. BMC Bioinformatics 11(suppl. 11), S12 (2010)
He, D., Hormozdiari, F., Furlotte, N., Eskin, E.: Efficient algorithms for tandem copy number variation reconstruction in repeat-rich regions. Bioinformatics 27(11), 1513–1520 (2011)
Hormozdiari, F., Alkan, C., Eichler, E.E., Cenk Sahinalp, S.: Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Research 19(7), 1270–1278 (2009)
John Iafrate, A., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., Lee, C.: Detection of large-scale variation in the human genome. Nature Genetics 36(9), 949–951 (2004)
Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L.: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10(3), R25 (2009)
Medvedev, P., Fiume, M., Dzamba, M., Smith, T., Brudno, M.: Detecting copy number variation with mated short reads. Genome Research 20(11), 1613–1622 (2010)
Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., Yamrom, B., Yoon, S., Krasnitz, A., Kendall, J., Leotta, A., Pai, D., Zhang, R., Lee, Y.-H., Hicks, J., Spence, S.J., Lee, A.T., Puura, K., Lehtimäki, T., Ledbetter, D., Gregersen, P.K., Bregman, J., Sutcliffe, J.S., Jobanputra, V., Chung, W., Warburton, D., King, M.-C., Skuse, D., Geschwind, D.H., Conrad Gilliam, T., Ye, K., Wigler, M.: Strong association of de novo copy number mutations with autism. Science 316(5823), 445–449 (2007)
Simpson, J.T., McIntyre, R.E., Adams, D.J., Durbin, R.: Copy number variant detection in inbred strains from short read sequence data. Bioinformatics 26(4), 565–567 (2010)
Sudbery, I., Stalker, J., Simpson, J.T., Keane, T., Rust, A.G., Hurles, M.E., Walter, K., Lynch, D., Teboul, L., Brown, S.D., Li, H., Ning, Z., Nadeau, J.H., Croniger, C.M., Durbin, R., Adams, D.J.: Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels. Genome Biology (October 2009)
Sudmant, P.H., Kitzman, J.O., Antonacci, F., Alkan, C., Malig, M., Tsalenko, A., Sampas, N., Bruhn, L., Shendure, J., 1000 Genomes Project, Eichler, E.E.: Diversity of human copy number variation and multicopy genes. Science 330(6004), 641–646 (2010)
Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Anne Morrison, V., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., Olson, M.V., Eichler, E.E.: Fine-scale structural variation of the human genome. Nature Genetics 37(7), 727–732 (2005)
Yoon, S., Xuan, Z., Makarov, V., Ye, K., Sebat, J.: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Research 19(9), 1586–1592 (2009)

Author information

Authors and Affiliations

Computer Science Department, University of California, Los Angeles, USA
Zhanyong Wang, Farhad Hormozdiari, Wen-Yun Yang & Eleazar Eskin
Blavatnik School of Computer Science, The Department of Molecular Microbiology and Biotechnology, Tel-Aviv University, Israel
Eran Halperin
International Computer Science Institute, Berkeley, USA
Eran Halperin

Editor information

Editors and Affiliations

Computer Science Department, Tel-Aviv University, 69978, Tel-Aviv, Israel
Benny Chor

About this paper

Cite this paper

Wang, Z., Hormozdiari, F., Yang, WY., Halperin, E., Eskin, E. (2012). CNVeM: Copy Number Variation Detection Using Uncertainty of Read Mapping. In: Chor, B. (ed.) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_34

DOI: https://doi.org/10.1007/978-3-642-29627-7_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29626-0
Online ISBN: 978-3-642-29627-7
eBook Packages: Computer Science, Computer Science (R0)
