Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

Handbook of Statistical Bioinformatics

Abstract

In this chapter, we focus on statistical questions raised by the identification of copy number alterations in tumor samples using genotyping microarrays, also known as Single Nucleotide Polymorphism (SNP) arrays. We define the copy number states formally, and show how they are assessed by SNP arrays. We identify and discuss general and cancer-specific challenges for SNP array data preprocessing, and how they are addressed by existing methods. We review existing statistical methods for the detection of copy number changes along the genome. We describe the influence of two biological parameters – the proportion of normal cells in the sample, and the ploidy of the tumor – on observed data. Finally, we discuss existing approaches for the detection and calling of copy number aberrations in the particular context of cancer studies, and identify statistical challenges that remain to be addressed.

Acknowledgements

We gratefully acknowledge the Lawrence Berkeley National Laboratory (LBNL) and The Cancer Genome Atlas (TCGA) for making data and results available. This work was supported by NCI grant U24 CA126551.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Neuvial, P., Bengtsson, H., Speed, T.P. (2011). Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_11
