Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. In a copy number alteration, a section of a chromosome in a particular individual has, instead of the expected two copies, zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identifying such abnormalities in smaller regions has been of great interest because they are believed to be an underlying cause of cancer.

More than a decade ago, comparative genomic hybridization (CGH) technology was developed to detect copy number changes in a high-throughput fashion. However, this technology provides only 10 Mb resolution, which limits the ability to detect copy number alterations spanning small regions. It is widely believed that a copy number alteration as small as one base can have significant downstream effects, so microarray manufacturers have developed technologies that provide much higher resolution. Unfortunately, strong probe effects and variation introduced by sample preparation procedures have made single-point copy number estimates too imprecise to be useful. CGH arrays use a two-color hybridization, usually comparing a sample of interest to a reference sample, which to some degree removes the probe effect. However, their resolution is not nearly high enough to provide single-point copy number estimates.

Various groups have proposed statistical procedures that pool data from neighboring locations to improve precision. However, these procedures need to average across relatively large regions to work effectively, which greatly reduces the resolution. Recently, regression-type models that account for the probe effect have been proposed and appear to improve accuracy as well as precision. In this paper, we propose a mixture model solution specifically designed for single-point estimation that provides various advantages over the existing methodology. We use a 314-sample database, constructed from public datasets, to motivate and fit models for the conditional distribution of the observed intensities given allele-specific copy numbers. With the estimated models in place, we can compute posterior probabilities that provide a useful prediction rule as well as a confidence measure for each call. Software to implement this procedure will be available in the Bioconductor oligo package (http://www.bioconductor.org).
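
The posterior-probability calling described above can be illustrated with a small Python sketch. It is not the authors' fitted model: the Gaussian form, the means in MEAN_BY_COPIES, the common standard deviation SD, the uniform prior, the restriction to a total copy number of at most three, and all function names are hypothetical choices made only for illustration. Given log-scale intensities for the A and B alleles at a single SNP, the sketch applies Bayes' rule over candidate allele-specific copy number states, calls the most probable state, and reports its posterior probability as a confidence measure.

```python
import math

# Hypothetical parameters for illustration only; not estimated from any data.
# Assumed mean log2 intensity of an allele as a function of its copy count.
MEAN_BY_COPIES = {0: 7.0, 1: 9.0, 2: 10.0, 3: 10.6}
SD = 0.3  # assumed common standard deviation of the log2 intensities

# Candidate states (copies of allele A, copies of allele B), total copies <= 3.
STATES = [(a, b) for a in range(4) for b in range(4) if a + b <= 3]


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior(y_a, y_b, priors=None):
    """Posterior probability of each (copies A, copies B) state via Bayes' rule."""
    if priors is None:
        # Assumed uniform prior over the candidate states.
        priors = {s: 1.0 / len(STATES) for s in STATES}
    weighted = {
        (a, b): p
        * normal_pdf(y_a, MEAN_BY_COPIES[a], SD)
        * normal_pdf(y_b, MEAN_BY_COPIES[b], SD)
        for (a, b), p in priors.items()
    }
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}


def call_state(post):
    """Return the most probable state and its posterior probability."""
    best = max(post, key=post.get)
    return best, post[best]


# Both allele intensities sit at the assumed one-copy mean.
post = posterior(y_a=9.0, y_b=9.0)
state, confidence = call_state(post)
print(state, sum(state), round(confidence, 3))
```

With these assumed parameters, an observation at the one-copy mean for both alleles is called as the heterozygous two-copy state (1, 1), which receives nearly all of the posterior mass. Observations lying between two component means would yield a lower posterior probability, which is how the posterior doubles as a per-call confidence measure.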

References

  1. Bignell, G.R., Huang, J., Greshock, J., Watt, S., Butler, A., West, S., Grigorova, M., Jones, K.W., Wei, W., Stratton, M.R., Futreal, P.A., Weber, B., Shapero, M.H., Wooster, R.: High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 14(2), 287–295 (2004)

  2. Carvalho, B., Speed, T.P., Irizarry, R.A.: Exploration, normalization, and genotype calls of high density oligonucleotide SNP array data. Johns Hopkins University, Dept. of Biostatistics Working Papers (111) (2006)

  3. Collins, F.S., Brooks, L.D., Chakravarti, A.: A DNA polymorphism discovery resource for research on human genetic variation. Genome Res. 8(12), 1229–1231 (1998)

  4. Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E., Pritchard, J.K.: A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38(1), 75–81 (2006)

  5. Feuk, L., Carson, A.R., Scherer, S.W.: Structural variation in the human genome. Nat. Rev. Genet. 7(2), 85–97 (2006)

  6. Gribble, S.M., Kalaitzopoulos, D., Burford, D.C., Prigmore, E., Selzer, R.R., Ng, B.L., Matthews, N.S.W., Porter, K.M., Curley, R., Lindsay, S.J., Baptista, J., Richmond, T.A., Carter, N.P.: Ultra-high resolution array painting facilitates breakpoint sequencing. J. Med. Genet. (Sept. 2006)

  7. Hinds, D.A., Kloek, A.P., Jen, M., Chen, X., Frazer, K.A.: Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38(1), 82–85 (2006)

  8. Huang, J., Wei, W., Chen, J., Zhang, J., Liu, G., Di, X., Mei, R., Ishikawa, S., Aburatani, H., Jones, K.W., Shapero, M.H.: CARAT: a novel method for allelic detection of DNA copy number changes using high density oligonucleotide arrays. BMC Bioinformatics 7, 83 (2006)

  9. Huang, J., Wei, W., Zhang, J., Liu, G., Bignell, G.R., Stratton, M.R., Futreal, P.A., Wooster, R., Jones, K.W., Shapero, M.H.: Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum. Genomics. 1(4), 287–299 (2004)

  10. Huber, W., von Heydebreck, A., Sueltmann, H., Poustka, A., Vingron, M.: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 1 (2002)

  11. Iafrate, A., Feuk, L., Rivera, M., Listewnik, M., Donahoe, P., Qi, Y., Scherer, S., Lee, C.: Detection of large-scale variation in the human genome. Nature Genetics 36(9), 949–951 (2004)

  12. Irizarry, R., Hobbs, B., Collin, F., Beazer-Barclay, Y., Antonellis, K., Scherf, U., Speed, T.: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264 (2003)

  13. Ishikawa, S., Komura, D., Tsuji, S., Nishimura, K., Yamamoto, S., Panda, B., Huang, J., Fukayama, M., Jones, K.W., Aburatani, H.: Allelic dosage analysis with genotyping microarrays. Biochem. Biophys. Res. Commun. 333(4), 1309–1314 (2005)

  14. Kennedy, G.C., Matsuzaki, H., Dong, S., Min Liu, W., Huang, J., Liu, G., Su, X., Cao, M., Chen, W., Zhang, J., Liu, W., Yang, G., Di, X., Ryder, T., He, Z., Surti, U., Phillips, M.S., Boyce-Jacino, M.T., Fodor, S.P., Jones, K.W.: Large-scale genotyping of complex DNA. Nature Biotechnology 21, 1233–1237 (2003)

  15. Komura, D., Nishimura, K., Ishikawa, S., Panda, B., Huang, J., Nakamura, H., Ihara, S., Hirose, M., Jones, K.W., Aburatani, H.: Noise reduction from genotyping microarrays using probe level information. In Silico Biol. 6(1-2):0009 (Feb. 2006)

  16. Laframboise, T., Harrington, D., Weir, B.A.: PLASQ: A Generalized Linear Model-Based Procedure to Determine Allelic Dosage in Cancer Cells from SNP Array Data. Biostatistics (June 2006)

  17. Li, C., Wong, W.: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences USA 98, 31–36 (2001)

  18. McCarroll, S.A., Hadnott, T.N., Perry, G.H., Sabeti, P.C., Zody, M.C., Barrett, J.C., Dallaire, S., Gabriel, S.B., Lee, C., Daly, M.J., Altshuler, D.M., International HapMap Consortium: Common deletion polymorphisms in the human genome. Nat. Genet. 38(1), 86–92 (2006)

  19. Nannya, Y., Sanada, M., Nakazaki, K., Hosoya, N., Wang, L., Hangaishi, A., Kurokawa, M., Chiba, S., Bailey, D.K., Kennedy, G.C., Ogawa, S.: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 65(14), 6071–6079 (2005)

  20. Peiffer, D.A., Le, J.M., Steemers, F.J., Chang, W., Jenniges, T., Garcia, F., Haden, K., Li, J., Shaw, C.A., Belmont, J., Cheung, S.W., Shen, R.M., Barker, D.L., Gunderson, K.L.: High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res. 16(9), 1136–1148 (2006)

  21. Rabbee, N., Speed, T.P.: A genotype calling algorithm for Affymetrix SNP arrays. Bioinformatics 22(1), 7–12 (2006)

  22. Rocke, D.M., Durbin, B.: A model for measurement error for gene expression arrays. J. Comput. Biol. 8(6), 557–569 (2001)

  23. Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Maner, S., Massa, H., Walker, M., Chi, M., Navin, N., Lucito, R., Healy, J., Hicks, J., Ye, K., Reiner, A., Gilliam, T., Trask, B., Patterson, N., Zetterberg, A., Wigler, M.: Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004)

  24. Sharp, A.J., Hansen, S., Selzer, R.R., Cheng, Z., Regan, R., Hurst, J.A., Stewart, H., Price, S.M., Blair, E., Hennekam, R.C., Fitzpatrick, C.A., Segraves, R., Richmond, T.A., Guiver, C., Albertson, D.G., Pinkel, D., Eis, P.S., Schwartz, S., Knight, S.J.L., Eichler, E.E.: Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38(9), 1038–1042 (2006)

  25. Sharp, A.J., Locke, D.P., McGrath, S.D., Cheng, Z., Bailey, J.A., Vallente, R.U., Pertz, L.M., Clark, R.A., Schwartz, S., Segraves, R., Oseroff, V.V., Albertson, D.G., Pinkel, D., Eichler, E.E.: Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77(1), 78–88 (2005)

  26. Tuzun, E., Sharp, A.J., Bailey, J.A., Kaul, R., Morrison, V.A., Pertz, L.M., Haugen, E., Hayden, H., Albertson, D., Pinkel, D., Olson, M.V., Eichler, E.E.: Fine-scale structural variation of the human genome. Nat. Genet. 37(7), 727–732 (2005)

  27. Wu, Z., Irizarry, R., Gentleman, R., Martinez-Murillo, F., Spencer, F.: A model based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association (2004)

  28. Zhao, X., Li, C., Paez, J.G., Chin, K., Jeanne, P.A., Chen, T.-H., Girard, L., Minna, J., Christiani, D., Leo, C., Gray, J.W., Sellers, W.R., Meyerson, M.: An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 64(9), 3060–3071 (2004)

Editor information

Terry Speed, Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Wang, W., Carvalho, B., Miller, N., Pevsner, J., Chakravarti, A., Irizarry, R.A. (2007). Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_10

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science, Computer Science (R0)
