Statistics in Biosciences, Volume 2, Issue 2, pp 120–136

Assessing Population Level Genetic Instability via Moving Average

  • Samuel McDaniel
  • Jessica Minnier
  • Rebecca A. Betensky
  • Gayatry Mohapatra
  • Yiping Shen
  • James F. Gusella
  • David N. Louis
  • Tianxi Cai


Tumor tissues generally exhibit aberrations in DNA copy number that are associated with the development and progression of cancer. Genotyping methods such as array-based comparative genomic hybridization (aCGH) provide a means to identify copy number variation across the entire genome. Existing methods of DNA copy number data analysis have several shortfalls: strong model assumptions, failure to account for the sampling variability of estimators, and the assumption that clones are independent. To address these, we propose a simple graphical approach, based on a moving average, to assess population-level genetic alterations over the entire genome. Furthermore, existing methods primarily focus on segmentation and do not examine the association of covariates with genetic instability. In our method, covariates are incorporated through a possibly mis-specified working model, and the sampling variability of the estimators is approximated using a resampling method based on perturbing the observed processes. Our proposal, which is applicable to partial, entire, or multiple chromosomes, is illustrated through application to aCGH studies of two brain tumor types, meningioma and glioma.
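As a rough illustration of the two ingredients named in the abstract (a moving average over clones and perturbation-based resampling), the following is a minimal sketch only, not the authors' procedure. It assumes a hypothetical array `log_ratios` of shape (subjects, clones) with clones ordered by genomic position, an unweighted window, no covariates, and independent Exponential(1) perturbation weights per subject; the function names and all of these choices are assumptions made for illustration.

```python
import numpy as np


def moving_average(values, window):
    """Unweighted moving average along the last axis (clones in genomic order)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), -1, values
    )


def perturbed_mean_curves(log_ratios, window, n_perturb=500, seed=0):
    """Population-level moving-average curve plus perturbed replicates.

    log_ratios: (n_subjects, n_clones) array of aCGH log ratios (hypothetical input).
    Returns (estimate, replicates) with shapes (n_windows,) and (n_perturb, n_windows).
    """
    rng = np.random.default_rng(seed)
    n_subjects = log_ratios.shape[0]

    # Smooth each subject's process, then average across subjects.
    smoothed = moving_average(log_ratios, window)
    estimate = smoothed.mean(axis=0)

    # Assumed perturbation scheme: one positive random weight per subject,
    # applied to that subject's whole process so that correlation between
    # neighbouring clones is preserved in every replicate.
    weights = rng.exponential(1.0, size=(n_perturb, n_subjects))
    replicates = (weights @ smoothed) / weights.sum(axis=1, keepdims=True)
    return estimate, replicates
```

The pointwise standard deviation of `replicates` (for example `replicates.std(axis=0)`) can then serve as an approximate standard error for plotting a band around `estimate`. Weighting whole subjects rather than individual clones is the design choice that avoids treating clones as independent.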


aCGH data; Moving average; Perturbation method; Gaussian process; Genomic data





Copyright information

© International Chinese Statistical Association 2010

Authors and Affiliations

  • Samuel McDaniel
    • 1
  • Jessica Minnier
    • 2
  • Rebecca A. Betensky
    • 2
  • Gayatry Mohapatra
    • 3
  • Yiping Shen
    • 4
  • James F. Gusella
    • 4
  • David N. Louis
    • 3
  • Tianxi Cai
    • 2
  1. Department of Mathematics, The University of the West Indies, Mona, Jamaica
  2. Department of Biostatistics, Harvard University, Boston, USA
  3. Molecular Neuro-Oncology and Pathology Laboratories, Massachusetts General Hospital, Charlestown, USA
  4. Center for Human Genetic Research, Massachusetts General Hospital, Boston, USA
