Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models

Wang, Wenyi; Carvalho, Benilton; Miller, Nate; Pevsner, Jonathan; Chakravarti, Aravinda; Irizarry, Rafael A.

doi:10.1007/978-3-540-71681-5_10

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Included in the following conference series:

Annual International Conference on Research in Computational Molecular Biology

Abstract

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. In a copy number alteration, a section of a chromosome in a particular individual has, instead of the expected two copies, zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because they are believed to be an underlying cause of cancer.

More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only a 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects; microarray manufacturers have therefore developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, their resolution is not nearly high enough to provide single-point copy number estimates.

Various groups have proposed statistical procedures that pool data from neighboring locations to successfully improve precision. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces the resolution. Recently, regression-type models that account for the probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed from public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
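The posterior-probability step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes, purely for illustration, that the intensity at one locus is Gaussian given the copy number state, and every name and parameter value below (`posterior_copy_number`, `call_copy_number`, `MEANS`, `SDS`, `PRIORS`) is hypothetical. Bayes' rule combines the conditional densities with prior state probabilities; the most probable state is the call and its posterior probability is the confidence.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_copy_number(x, means, sds, priors):
    """Return P(state = k | x) for every state k via Bayes' rule.

    Assumption (not from the paper): the intensity x is Gaussian given
    the state, with state-specific mean and standard deviation.
    """
    weighted = {k: priors[k] * normal_pdf(x, means[k], sds[k]) for k in priors}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


def call_copy_number(x, means, sds, priors):
    """Return (most probable state, its posterior probability)."""
    post = posterior_copy_number(x, means, sds, priors)
    best = max(post, key=post.get)
    return best, post[best]


# Hypothetical parameters on a log-intensity scale; in practice these
# would be estimated from training data rather than chosen by hand.
MEANS = {0: 7.0, 1: 9.0, 2: 10.0, 3: 10.6}
SDS = {0: 0.5, 1: 0.3, 2: 0.3, 3: 0.3}
PRIORS = {0: 0.01, 1: 0.04, 2: 0.90, 3: 0.05}

call, confidence = call_copy_number(10.0, MEANS, SDS, PRIORS)
print(call, round(confidence, 3))
```

Because the states are just dictionary keys, the same functions would work if each state were an allele-specific pair such as `(1, 1)` or `(2, 0)` instead of a total copy number, provided conditional densities for those pairs were supplied.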

References

1. Bignell, G.R., Huang, J., Greshock, J., Watt, S., Butler, A., West, S., Grigorova, M., Jones, K.W., Wei, W., Stratton, M.R., Futreal, P.A., Weber, B., Shapero, M.H., Wooster, R.: High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 14(2), 287–295 (2004)
2. Carvalho, B., Speed, T.P., Irizarry, R.A.: Exploration, normalization, and genotype calls of high density oligonucleotide SNP array data. Johns Hopkins University, Dept. of Biostatistics Working Papers (111) (2006)
3. Collins, F.S., Brooks, L.D., Chakravarti, A.: A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8(12), 1229–1231 (1998)
4. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E., Pritchard, J.K.: A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38(1), 75–81 (2006)
5. Feuk, L., Carson, A.R., Scherer, S.W.: Structural variation in the human genome. Nat. Rev. Genet. 7(2), 85–97 (2006)
6. Gribble, S.M., Kalaitzopoulos, D., Burford, D.C., Prigmore, E., Selzer, R.R., Ng, B.L., Matthews, N.S.W., Porter, K.M., Curley, R., Lindsay, S.J., Baptista, J., Richmond, T.A., Carter, N.P.: Ultra-high resolution array painting facilitates breakpoint sequencing. J. Med. Genet. (Sept. 2006)
7. Hinds, D.A., Kloek, A.P., Jen, M., Chen, X., Frazer, K.A.: Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38(1), 82–85 (2006)
8. Huang, J., Wei, W., Chen, J., Zhang, J., Liu, G., Di, X., Mei, R., Ishikawa, S., Aburatani, H., Jones, K.W., Shapero, M.H.: CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics 7, 83 (2006)
9. Huang, J., Wei, W., Zhang, J., Liu, G., Bignell, G.R., Stratton, M.R., Futreal, P.A., Wooster, R., Jones, K.W., Shapero, M.H.: Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum. Genomics 1(4), 287–299 (2004)
10. Huber, W., von Heydebreck, A., Sueltmann, H., Poustka, A., Vingron, M.: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 1 (2002)
11. Iafrate, A., Feuk, L., Rivera, M., Listewnik, M., Donahoe, P., Qi, Y., Scherer, S., Lee, C.: Detection of large-scale variation in the human genome. Nat. Genet. 36(9), 949–951 (2004)
12. Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y., Antonellis, K., Scherf, U., Speed, T.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003)
13. Ishikawa, S., Komura, D., Tsuji, S., Nishimura, K., Yamamoto, S., Panda, B., Huang, J., Fukayama, M., Jones, K.W., Aburatani, H.: Allelic dosage analysis with genotyping microarrays. Biochem. Biophys. Res. Commun. 333(4), 1309–1314 (2005)
14. Kennedy, G.C., Matsuzaki, H., Dong, S., Min Liu, W., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., Liu, W., Yang, G., Di, X., Ryder, T., He, Z., Surti, U., Phillips, M.S., Boyce-Jacino, M.T., Fodor, S.P., Jones, K.W.: Large-scale genotyping of complex DNA. Nat. Biotechnol. 21, 1233–1237 (2003)
15. Komura, D., Nishimura, K., Ishikawa, S., Panda, B., Huang, J., Nakamura, H., Ihara, S., Hirose, M., Jones, K.W., Aburatani, H.: Noise reduction from genotyping microarrays using probe level information. In Silico Biol. 6(1-2), 0009 (Feb. 2006)
16. Laframboise, T., Harrington, D., Weir, B.A.: PLASQ: A generalized linear model-based procedure to determine allelic dosage in cancer cells from SNP array data. Biostatistics (June 2006)
17. Li, C., Wong, W.: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences USA 98, 31–36 (2001)
18. McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J., Altshuler, D.M., Consortium, I.H.: Common deletion polymorphisms in the human genome. Nat. Genet. 38(1), 86–92 (2006)
19. Nannya, Y., Sanada, M., Nakazaki, K., Hosoya, N., Wang, L., Hangaishi, A., Kurokawa, M., Chiba, S., Bailey, D.K., Kennedy, G.C., Ogawa, S.: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 65(14), 6071–6079 (2005)
20. Peiffer, D.A., Le, J.M., Steemers, F.J., Chang, W., Jenniges, T., Garcia, F., Haden, K., Li, J., Shaw, C.A., Belmont, J., Cheung, S.W., Shen, R.M., Barker, D.L., Gunderson, K.L.: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 16(9), 1136–1148 (2006)
21. Rabbee, N., Speed, T.P.: A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 22(1), 7–12 (2006)
22. Rocke, D.M., Durbin, B.: A model for measurement error for gene expression arrays. J. Comput. Biol. 8(6), 557–569 (2001)
23. Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., Navin, N., Lucito, R., Healy, J., Hicks, J., Ye, K., Reiner, A., Gilliam, T., Trask, B., Patterson, N., Zetterberg, A., Wigler, M.: Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004)
24. Sharp, A.J., Hansen, S., Selzer, R.R., Cheng, Z., Regan, R., Hurst, J.A., Stewart, H., Price, S.M., Blair, E., Hennekam, R.C., Fitzpatrick, C.A., Segraves, R., Richmond, T.A., Guiver, C., Albertson, D.G., Pinkel, D., Eis, P.S., Schwartz, S., Knight, S.J.L., Eichler, E.E.: Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38(9), 1038–1042 (2006)
25. Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., Oseroff, V.V., Albertson, D.G., Pinkel, D., Eichler, E.E.: Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77(1), 78–88 (2005)
26. Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., Olson, M.V., Eichler, E.E.: Fine-scale structural variation of the human genome. Nat. Genet. 37(7), 727–732 (2005)
27. Wu, Z., Irizarry, R., Gentleman, R., Martinez-Murillo, F., Spencer, F.: A model based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association (2004)
28. Zhao, X., Li, C., Paez, J.G., Chin, K., Jeanne, P.A., Chen, T.-H., Girard, L., Minna, J., Christiani, D., Leo, C., Gray, J.W., Sellers, W.R., Meyerson, M.: An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 64(9), 3060–3071 (2004)

Author information

Authors and Affiliations

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
Wenyi Wang, Benilton Carvalho & Rafael A. Irizarry
Department of Neurology, Kennedy Krieger Institute, 707 North Broadway, Baltimore, MD 21205, USA
Nate Miller & Jonathan Pevsner
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Broadway Research Building, Suite 579, 733 N. Broadway, Baltimore, MD 21205, USA
Aravinda Chakravarti

Editor information

Terry Speed, Haiyan Huang

Cite this paper

Wang, W., Carvalho, B., Miller, N., Pevsner, J., Chakravarti, A., Irizarry, R.A. (2007). Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_10

DOI: https://doi.org/10.1007/978-3-540-71681-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71680-8
Online ISBN: 978-3-540-71681-5
eBook Packages: Computer Science, Computer Science (R0)
