Analysis of Array Data and Clinical Validation of Array-Based Assays

  • Benjamin Haibe-Kains
  • John Quackenbush


High-throughput array-based assays are widely used for diagnostics and biomarker discovery. Developing these assays, however, requires analysis of high-dimensional genomic data and clinical validation of the resulting models, both of which remain challenging. In this chapter, we describe the steps of array-based data analysis, from data quality control and normalization to higher-level analyses such as clustering, dimensionality reduction, and predictive modeling, with special emphasis on the pitfalls of such analyses. We then address the problems of clinically validating array-based biomarkers and predictive assays, including the reproducibility and portability of the initial discovery and its translation into clinical practice. As array-based data, and data generated by next-generation sequencing technologies, become less expensive to produce and more widely available, the growing number of patients for whom genomic data exist will open new opportunities to develop robust and reliable genomic biomarkers, provided we apply the lessons learned from the last decade of array-based studies.


Keywords: Root Mean Square Error; Feature Selection; Receiver Operating Characteristic Curve; Performance Criterion; Independent Dataset


  1. Affymetrix (2004) GeneChip expression analysis: data analysis fundamentals, vol 2447, pp 1–42. doi: 10.1002/jnr.10268
  2. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769):503–511PubMedCrossRefGoogle Scholar
  3. Allison PD, Inc. SI (eds) (1995) Survival analysis using SAS: a practical guide. SAS Institute Inc., Cary, NCGoogle Scholar
  4. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750PubMedCrossRefGoogle Scholar
  5. Alter O, Brown PO, Botstein D (2000) Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97(18):10101–10106. doi: 97/18/10101 [pii]PubMedCrossRefGoogle Scholar
  6. Ambroise C, McLachlan GJ (2002) Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci USA 99(10):6562–6566. doi: 10.1073/pnas.102102699 PubMedCrossRefGoogle Scholar
  7. Bach FR, Jordan MI (2003) Kernel independent component analysis. J Mach Learn Res 3:1–48Google Scholar
  8. Bair E, Tibshirani R (2004) Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2(4):511–522CrossRefGoogle Scholar
  9. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R (2005) NCBI GEO: mining millions of expression profiles—database and tool. Nucleic Acids Res 33:D562PubMedCrossRefGoogle Scholar
  10. Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni MD, Orringer MB, Hanash S (2002) Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8(8):816–824PubMedGoogle Scholar
  11. Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. Proc Pac Symp Biocomput 7:6–17Google Scholar
  12. Benito M, Parker J, Du Q, Wu J, Xiang D, Perou CM, Marron JS (2004) Adjustment of systematic microarray data biases. Bioinformatics 20(1):105–114PubMedCrossRefGoogle Scholar
  13. Berrer DP, Dubitzky W, Granzow M (2002) A practical approach to microarray data analysis, 1st edn. Springer, New YorkGoogle Scholar
  14. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Golub TR, Sugarbaker DJ, Meyerson M (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98(24):13790–13795PubMedCrossRefGoogle Scholar
  15. Bishop CM, Jordan M, Kleinberg J, Scholkopf B (eds) (2006) Pattern recognition and machine learning information science and statistics. Springer, New YorkGoogle Scholar
  16. Bloom G, Yang IV, Boulware D, Kwong KY, Coppola D, Eschrich S, Quackenbush J, Yeatman TJ (2004) Multi-platform, multi-site, microarray-based human tumor classification. Am J Pathol 164(1):9–16PubMedCrossRefGoogle Scholar
  17. Bolstad BM (2004) Low-level analysis of high-density oligonucleotide array data: background normalization and summarization. University of California, BerkeleyGoogle Scholar
  18. Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2):185–193PubMedCrossRefGoogle Scholar
  19. Boulesteix AL, Porzelius C, Daumer M (2008) Microarray-based classification and clinical predictors: on combined classifiers and additional predictive value. Bioinformatics 24(15):1698–1706. doi:btn262 [pii] 10.1093/bioinformatics/btn262 PubMedCrossRefGoogle Scholar
  20. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman and Hall, New YorkGoogle Scholar
  21. Bylesjo M, Eriksson D, Sjodin A, Jansson S, Moritz T, Trygg J (2007) Orthogonal projections to latent structures as a strategy for microarray data normalization. BMC Bioinformatics 8:207. doi: 1471-2105-8-207 [pii]10.1186/1471-2105-8-207PubMedCrossRefGoogle Scholar
  22. Cardoso F, Piccart-Gebhart M, Van’t Veer L, Rutgers E (2007) The MINDACT trial: the first prospective clinical validation of a genomic tool. Mol Oncol 1(3):246–251. doi:S1574-7891(07)00077-4 [pii] 10.1016/j.molonc.2007.10.004 PubMedCrossRefGoogle Scholar
  23. Caruana R, Niculescu-Mizil A (2004) Data mining in metric space: an empirical analysis of supervised learning performance criteria. Paper presented at the ACM SIGKDD international conference on Knowledge discovery and data mining, New YorkGoogle Scholar
  24. Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, Liu C (2011) Removing batch effects in analysis of expression microarray data: an evaluation of six batch adjustment methods. PLoS One 6(2):e17238. doi: 10.1371/journal.pone.0017238 PubMedCrossRefGoogle Scholar
  25. Cheng Y, Church GM (2000) Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 8:93–103PubMedGoogle Scholar
  26. Cobleigh MA, Tabesh B, Bitterman P, Baker J, Cronin M, Liu ML, Borchik R, Mosquera JM, Walker MG, Shak S (2005) Tumor gene expression and prognosis in breast cancer patients with 10 or more positive lymph nodes. Clin Cancer Res 11(24 Pt 1):8623–8631. doi:11/24/8623 [pii] 10.1158/1078-0432.CCR-05-0735 PubMedCrossRefGoogle Scholar
  27. Collobert R, Bengio S (2001) SVMTorch: support vector machines for large-scale regression problems. J Mach Learn Res 1:143–160Google Scholar
  28. Contopoulos-Ioannidis DG, Alexiou GA, Gouvias TC, Ioannidis JP (2008) Medicine. Life cycle of translational research for medical interventions. Science 321(5894):1298–1299. doi:321/5894/1298 [pii] 10.1126/science.1160622 PubMedCrossRefGoogle Scholar
  29. Cox DR (1972) Regression models and life tables. J R Stat Soc Ser B 34:187–220Google Scholar
  30. Cristianini N, Press CCU, Shawe-Taylor J (eds) (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, CambridgeGoogle Scholar
  31. Dasarathy BV (ed) (1990) Nearest neighbor: pattern classification techniques. IEEE Computer Society Press, New YorkGoogle Scholar
  32. Davis CA, Gerick F, Hintermair V, Friedel CC, Fundel K, Kuffner R, Zimmer R (2006) Reliable gene signatures for microarray classification: assessment of stability and performance. Bioinformatics 22(19):2356–2363. doi: 10.1093/bioinformatics/btl400 PubMedCrossRefGoogle Scholar
  33. De Smet F, Mathys J, Marchal K, Thijs G, De Moor B, Moreau Y (2002) Adaptive quality-based clustering of gene expression profiles. Bioinformatics 18(5):735–746PubMedCrossRefGoogle Scholar
  34. de Souto M, Costa I, de Araujo D, Ludermir T, Schliep A (2008) Clustering cancer gene expression data: a comparative study. BMC Bioinformatics 9(1):497. doi: 10.1186/1471-2105-9-497 PubMedCrossRefGoogle Scholar
  35. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, Trent JM (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14(4):457–460PubMedCrossRefGoogle Scholar
  36. Desmedt C, Piette F, Loi SM, Wang Y, Lallemand F, Haibe-Kains B, Viale G, Delorenzi M, Zhang Y, d’Assignies MS, Bergh J, Lidereau R, Ellis P, Harris AL, Klijn JGM, Foekens JA, Cardoso F, Piccart MJ, Buyse M, Sotiriou C, Consortium T (2007) Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res 13(11):3207–3214PubMedCrossRefGoogle Scholar
  37. Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G, Delorenzi M, Piccart M, Sotiriou C (2008) Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res 14(16):5158–5165. doi: 10.1158/1078-0432.CCR-07-4756 PubMedCrossRefGoogle Scholar
  38. Duda RO, Hart PR, Stork DG (2001) Pattern classification. Wiley, New YorkGoogle Scholar
  39. Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87CrossRefGoogle Scholar
  40. Dupuy A, Simon RM (2007) Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst 99(2):147–157. doi: 10.1093/jnci/djk018 PubMedCrossRefGoogle Scholar
  41. Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499CrossRefGoogle Scholar
  42. Eisen M, Spellman P, Brown P, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. PNAS 95:14863–14868PubMedCrossRefGoogle Scholar
  43. Eng-Wong J, Zujewski JA (2008) Current NCI-sponsored cooperative group trials of endocrine therapies in breast cancer. Cancer 112(3 Suppl):723–729. doi: 10.1002/cncr.23188 PubMedCrossRefGoogle Scholar
  44. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, Chen H, Omeroglu G, Meterissian S, Omeroglu A, Hallett M, Park M (2008) Stromal gene expression predicts clinical outcome in breast cancer. Nat Med 14(5):518–527. doi: 10.1038/nm1764 PubMedCrossRefGoogle Scholar
  45. Fisher RA (2011) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188CrossRefGoogle Scholar
  46. Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631. doi: 10.1198/016214502760047131 CrossRefGoogle Scholar
  47. Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10):906–914PubMedCrossRefGoogle Scholar
  48. Gamberger D, Lavrac N (2004) Avoiding data overfitting in scientific discovery: experiments in functional genomics. Paper presented at the ECAI, 22–27 Aug 2004, Valencia, SpainGoogle Scholar
  49. Gentleman R (2005) Reproducible research: a bioinformatics case study. Stat Appl Genet Mol Biol 4(1)Google Scholar
  50. Gentleman R, Huber W, Carey VJ, Irizarry RA, Dudoit S (2005) Bioinformatics and computational biology solutions using R and bioconductor. Springer, New YorkCrossRefGoogle Scholar
  51. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537PubMedCrossRefGoogle Scholar
  52. Gui J, Li H (2005) Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics 21(13):3001–3008. doi: 10.1093/bioinformatics/bti422 PubMedCrossRefGoogle Scholar
  53. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182Google Scholar
  54. Habel LA, Shak S, Jacobs MK, Capra A, Alexander C, Pho M, Baker J, Walker M, Watson D, Hackett J, Blick NT, Greenberg D, Fehrenbacher L, Langholz B, Quesenberry CP (2006) A population-based study of tumor gene expression and risk of breast cancer death among lymph node-negative patients. Breast Cancer Res 8(3):R25. doi:bcr1412 [pii] 10.1186/bcr1412 PubMedCrossRefGoogle Scholar
  55. Haibe-Kains B, Desmedt C, Sotiriou C, Bontempi G (2008) A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all? Bioinformatics 24(19):2200–2208. doi: 10.1093/bioinformatics/btn374 PubMedCrossRefGoogle Scholar
  56. Haibe-Kains B, Desmedt C, Loi SM, Culhane AC, Bontempi G, Quackenbush J, Sotiriou C (2012) A three-gene model to robustly identify breast cancer molecular subtypes. J Natl Cancer Inst 104(4):311–325. doi: 10.1093/jnci/djr545 PubMedCrossRefGoogle Scholar
  57. Harr B, Schlotterer C (2006) Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons. Nucleic Acids Res 34(2):8CrossRefGoogle Scholar
  58. Harrell FJ, Lee K, Mark D (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15(4):361–387. doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4PubMedCrossRefGoogle Scholar
  59. Hastie T, Tibshirani R (1990) Generalized additive models. Chapman and Hall, LondonGoogle Scholar
  60. Hastie T, Bickel P, Tibshirani R, Diggle P, Friedman J, Fienberg S, Gather U, Otkin I, Zeger S (eds) (2001) The elements of statistical learning statistics. Springer, New YorkGoogle Scholar
  61. Heagerty PJ, Lumley T, Pepe MS (2000) Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56:337–344PubMedCrossRefGoogle Scholar
  62. Hu H, Li J-Y, Wang H, Daggard G, Wang L-Z (2008) Robustness analysis of diversified ensemble decision tree algorithms for Microarray data classification. Paper presented at the 2008 International Conference on Machine Learning and Cybernetics (ICMLC), Kunming, 12–15 Jul 2008Google Scholar
  63. Huber W, von Heydebreck A, Sultman H, Poustka A, Vingron M (2002) Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18(1):S96–S104PubMedCrossRefGoogle Scholar
  64. Irizarry RA, Boldstad BM, Collin F, Cope LM, Hobbs B, Speed TR (2003a) Summaries of affymetrix GeneChip probe level data. Nucleic Acids Res 31(4)Google Scholar
  65. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP (2003b) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics (Oxford, England) 4(2):249–264. doi: 10.1093/biostatistics/4.2.249 Google Scholar
  66. Jin R, Si L, Chan C (2008) A Bayesian framework for knowledge driven regression model in micro-array data analysis. Int J Data Min Bioinform 2(3):250–267PubMedCrossRefGoogle Scholar
  67. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118–127. doi: 10.1093/biostatistics/kxj037 PubMedCrossRefGoogle Scholar
  68. Jolliffe IT, Jolliffe IT (eds) (2002) Principal component analysis. Springer series in statistics. Springer, New YorkGoogle Scholar
  69. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:451–457Google Scholar
  70. Kelemen A, Zhou H, Lawhead P, Liang Y (2003) Naive Bayesian classifier for microarray data. In: 2003 International joint conference on neural networks, vol 3, pp 1769–1773. Paper presented at the 2003 international joint conference on neural networks, IEEE. doi: 10.1109/IJCNN.2003.1223675
  71. Kelley RK, Wang G, Venook AP (2011) Biomarker use in colorectal cancer therapy. J Natl Compr Canc Netw 9(11):1293–1302. doi: 9/11/1293 [pii]PubMedGoogle Scholar
  72. Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, Smith PD, Jiang Y, Gooden GC, Trent JM, Meltzer PS (1998) Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58(22):5009–5013PubMedGoogle Scholar
  73. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324CrossRefGoogle Scholar
  74. Kohli-Laven N, Bourret P, Keating P, Cambrosio A (2011) Cancer clinical trials in the era of genomic signatures: biomedical innovation, clinical utility, and regulatory-scientific hybrids. Soc Stud Sci 41(4):487–513PubMedCrossRefGoogle Scholar
  75. Lee JW, Lee JB, Park M, Song SH (2005) An extensive comparison of recent classification tools applied to microarray data. Computational Statistics & Data Analysis 48(4):869–885. doi:  10.1016/j.csda.2004.03.017 Google Scholar
  76. Lehmann EL, Caselia G (1998) Theory of point estimation, 2nd edn. Springer, New YorkGoogle Scholar
  77. Leisch F (2002) Sweave. Dynamic generation of statistical reports using literate data analysis. In: Computational statistics, vol 69, pp 575–580. Presented at the computational statistics, SFB adaptive information systems and modelling in economics and management science, WU Vienna University of Economics and Business.
  78. Li C, Wong WH (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2(8):1–11Google Scholar
  79. Lipshutz RJ, Morris D, Chee M, Hubbell E, Kozal MJ, Shah N, Shen N, Yang R, Fodor SP (1995) Using oligonucleotide probe arrays to access genetic diversity. Biotechniques 19(3):442–447PubMedGoogle Scholar
  80. Loi SM, Haibe-Kains B, Desmedt C, Wirapati P, Lallemand F, Tutt AM, Gillet C et al (2008) Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics 9:239. doi: 10.1186/1471-2164-9-239 PubMedCrossRefGoogle Scholar
  81. Loi SM, Haibe-Kains B, Desmedt C, Wirapati P, Lallemand F, Tutt AM, Gillet C, Ellis P, Ryder K, Reid JF, Daidone MG, Pierotti MA, Berns EM, Jansen MP, Foekens JA, Delorenzi M, Bontempi G, Piccart MJ, Sotiriou C (2008) Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen. BMC Genomics 9:239. doi: 10.1186/1471-2164-9-239 PubMedCrossRefGoogle Scholar
  82. Loi SM, Haibe-Kains B, Majjaj S, Lallemand F, Durbecq V, Larsimont D, Gonzalez-Angulo AM, Pusztai L, Symmans WF, Bardelli A, Ellis P, Tutt ANJ, Gillett CE, Hennessy BT, Mills GB, Phillips WA, Piccart MJ, Speed TP, McArthur GA, Sotiriou C (2010) PIK3CA mutations associated with gene signature of low mTORC1 signaling and better outcomes in estrogen receptor-positive breast cancer. Proc Natl Acad Sci USA 107(22):10208–10213. doi: 10.1073/pnas.0907011107 PubMedCrossRefGoogle Scholar
  83. Luo J, Schumacher M, Scherer A, Sanoudou D, Megherbi D, Davison T, Shi T, Tong W, Shi L, Hong H, Zhao C, Elloumi F, Shi W, Thomas R, Lin S, Tillinghast G, Liu G, Zhou Y, Herman D, Li Y, Deng Y, Fang H, Bushel P, Woods M, Zhang J (2010) A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data. Pharmacogenomics J 10(4):278–291. doi:tpj201057 [pii] 10.1038/tpj.2010.57 PubMedCrossRefGoogle Scholar
  84. Mamounas E, Budd GT, Miller K (2008) Incorporating the oncotype DX breast cancer assay into community practice: an expert Q and A and case study sampling. Clin Adv Hematol Oncol 6(2):s1–s8PubMedGoogle Scholar
  85. Manilich EA, Ozsoyoglu ZM, Trubachev V, Radivoyevitch T (2011) Classification of large microarray datasets using fast random forest construction. J Bioinform Comput Biol 9(2):251–267. doi: [pii] S021972001100546X PubMedCrossRefGoogle Scholar
  86. Marchionni L, Wilson RF, Marinopoulos SS, Wolff AC, Parmigiani G, Bass EB, Goodman SN (2007) Impact of gene expression profiling tests on breast cancer outcomes. Evid Rep Technol Assess (Full Rep) 160:1–105Google Scholar
  87. Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: statistical significance and interpretation. Q J R Meteorol Soc 128(584):2145–2166. doi: 10.1256/003590002320603584 CrossRefGoogle Scholar
  88. Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405(2):442–451PubMedGoogle Scholar
  89. McCall MN, Bolstad BM, Irizarry RA (2010) Frozen robust multiarray analysis (fRMA). Biostatistics (Oxford, England) 11(2):242–253. doi: 10.1093/biostatistics/kxp059 Google Scholar
  90. McCall MN, Murakami PN, Lukk M, Huber W, Irizarry RA (2011a) Assessing affymetrix GeneChip microarray quality. BMC Bioinformatics 12:137. doi:1471-2105-12-137 [pii] 10.1186/1471-2105-12-137 PubMedCrossRefGoogle Scholar
  91. McCall MN, Uppal K, Jaffee HA, Zilliox MJ, Irizarry RA (2011b) The gene expression barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes. Nucleic Acids Res 39(Database issue):D1011–D1015. doi: gkq1259 [pii]PubMedCrossRefGoogle Scholar
  92. Mesirov JP (2010) Computer science accessible reproducible research. Science 327(5964):415–416. doi:327/5964/415 [pii] 10.1126/science.1179653 PubMedCrossRefGoogle Scholar
  93. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365:488–492PubMedCrossRefGoogle Scholar
  94. Moch H, Schraml P, Bubendorf L, Mirlacher M, Kononen J, Gasser T, Mihatsch MJ, Kallioniemi OP, Sauter G (1999) Identification of prognostic parameters for renal cell carcinoma by cDNA arrays and cell chips. Verh Dtsch Ges Pathol 83:225–232PubMedGoogle Scholar
  95. Mook S, van’t Veer LJ, Rutgers EJ, Piccart-Gebhart MJ, Cardoso F (2007) Individualization of therapy using mammaprint: from development to the MINDACT Trial. Cancer Genomics Proteomics 4(3):147–155PubMedGoogle Scholar
  96. Natsoulis G, El Ghaoui L, Lanckriet GRG, Tolley AM, Leroy F, Dunlea S, Eynon BP, Pearson CI, Tugendreich S, Jarnagin K (2005) Classification of a large microarray data set: algorithm comparison and analysis of drug signatures. Genome Res 15(5):724–736. doi: 10.1101/gr.2807605 PubMedCrossRefGoogle Scholar
  97. Nepomuceno-Chamorro I, Azuaje F, Devaux Y, Nazarov PV, Muller A, Aguilar-Ruiz JS, Wagner DR (2011) Prognostic transcriptional association networks: a new supervised approach based on regression trees. Bioinformatics 27(2):252–258. doi:btq645 [pii] 10.1093/bioinformatics/btq645 PubMedCrossRefGoogle Scholar
  98. Onitilo AA, Engel JM, Greenlee RT, Mukesh BN (2009) Breast cancer subtypes based on ER/PR and Her2 expression: comparison of clinicopathologic features and survival. Clin Med Res 7(1–2):4–13. doi: 10.3121/cmr.2009.825 PubMedCrossRefGoogle Scholar
  99. Osorio YFJ, Prina E, Lang T, Milon G, Davory C, Coppee JY, Regnault B (2008) AffyGCQC: a web-based interface to detect outlying genechips with extreme studentized deviate tests. J Bioinform Comput Biol 6(2):317–334. doi: S0219720008003400 [pii]CrossRefGoogle Scholar
  100. Paik S, Shak S, Tang G, Kim C, Bakker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, Hiller W, Fisher ER, Wickerham DL, Bryant J, Wolmark N (2004) A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351(27):2817–2826PubMedCrossRefGoogle Scholar
  101. Pang S, Havukkala I, Hu Y, Kasabov N (2007) Classification consistency analysis for bootstrapping gene selection. Neural Comput Appl 18(6):527–539Google Scholar
  102. Park MY, Hastie T (2007) L1 regularization path algorithm for generalized linear models. J R Stat Soc 69:659–677CrossRefGoogle Scholar
  103. Parkinson H, Sarkans U, Shojatalab M, Contrino S, Coulson R, Farne A, Lara GG, Holloway E, Kapushesky M, Lilja P, Mukherjee G, Oezcimen A, Rayner T, Rocca-Serra P, Sharma A, Sansone S, Brazma A (2005) ArrayExpress: a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 33:D553–D555PubMedCrossRefGoogle Scholar
  104. Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L, Oberthuer A, Fischer M, Tong W, Wang MD (2010) k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction. Pharmacogenomics J 10(4):292–309. doi: 10.1038/tpj.2010.56 PubMedCrossRefGoogle Scholar
  105. Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC, Lashkari D, Shalon D, Brown PO, Botstein D (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA 96(16):9212–9217PubMedCrossRefGoogle Scholar
  106. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale A-L, Brown PO, Botstein D (2000) Molecular portraits of human breast tumours. Nature 406(6797):747–752. doi: 10.1038/35021093 PubMedCrossRefGoogle Scholar
  107. Phuong TM, Lee D, Lee KH (2004) Regression trees for regulatory element identification. Bioinformatics 20(5):750–757. doi: 10.1093/bioinformatics/btg480 btg480 [pii]PubMedCrossRefGoogle Scholar
  108. Ploner A, Miller LD, Hall P, Bergh J, Pawitan Y (2005) Correlation test to assess low-level processing of high-density oligonucletide microarray data. BMC Bioinformatics 6(80):1–20Google Scholar
  109. Pomeroy SL, Tamayo P, Gaasenbeek M, Sturla LM, Angelo M, McLaughlin ME, Kim JY, Goumnerova LC, Black PM, Lau C, Allen JC, Zagzag D, Olson JM, Curran T, Wetmore C, Biegel JA, Poggio T, Mukherjee S, Rifkin R, Califano A, Stolovitzky G, Louis DN, Mesirov JP, Lander ES, Golub TR (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870):436–442PubMedCrossRefGoogle Scholar
  110. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang CH, Angelo M, Ladd C, Reich M, Latulippe E, Mesirov JP, Poggio T, Gerald W, Loda M, Lander ES, Golub TR (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA 98(26):15149–15154PubMedCrossRefGoogle Scholar
  111. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP (2006) GenePattern 2.0. Nat Genet 38(5):500–501. doi: 10.1038/ng0506-500 PubMedCrossRefGoogle Scholar
  112. Rifkin R, Klautau A (2004) In defense of One-Vs-All classification. J Mach Learn Res 5(1):101–141Google Scholar
  113. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24(3):227–235PubMedCrossRefGoogle Scholar
  114. Ross JS, Hatzis C, Symmans WF, Pusztai L, Hortobagyi GN (2008) Commercialized multigene predictors of clinical outcome for breast cancer. Oncologist 13(5):477–493. doi:13/5/477 [pii] 10.1634/theoncologist.2007-0248 PubMedCrossRefGoogle Scholar
  115. Royston P, Sauerbrei W (2004) A new measure of prognostic separation in survival data. Stat Med 23(5):723–748. doi: 10.1002/sim.1621 PubMedCrossRefGoogle Scholar
  116. Sarder P, Schierding W, Cobb JP, Nehorai A (2010) Estimating sparse gene regulatory networks using a bayesian linear regression. IEEE Trans Nanobioscience 9(2):121–131. doi: 10.1109/TNB.2010.2043444 PubMedCrossRefGoogle Scholar
  117. Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470PubMedCrossRefGoogle Scholar
  118. Schumacher M, Binder H, Gerds TA (2007) Assessment of survival prediction models based on microarray data. Bioinformatics 23(14):1768–1774PubMedCrossRefGoogle Scholar
  119. Sheng Q, Moreau Y, De Moor B (2003) Biclustering microarray data by Gibbs sampling. Bioinformatics 19(Suppl 2):ii196–ii205. doi: 10.1093/bioinformatics/btg1078 PubMedCrossRefGoogle Scholar
  120. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Schrf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu TM, Chudin E, Corson J, Corton JC, Croner LJ, Davies C, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan XH, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li QZ, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y, Slikker W Jr (2006) The MicroArray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9):1151–1161PubMedCrossRefGoogle Scholar
  121. Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L, Shaughnessy JD Jr, Oberthuer A, Thomas RS, Paules RS, Fielden M, Barlogie B, Chen W, Du P, Fischer M, Furlanello C, Gallas BD, Ge X, Megherbi DB, Symmans WF, Wang MD, Zhang J, Bitter H, Brors B, Bushel PR, Bylesjo M, Chen M, Cheng J, Chou J, Davison TS, Delorenzi M, Deng Y, Devanarayan V, Dix DJ, Dopazo J, Dorff KC, Elloumi F, Fan J, Fan S, Fan X, Fang H, Gonzaludo N, Hess KR, Hong H, Huan J, Irizarry RA, Judson R, Juraeva D, Lababidi S, Lambert CG, Li L, Li Y, Li Z, Lin SM, Liu G, Lobenhofer EK, Luo J, Luo W, McCall MN, Nikolsky Y, Pennello GA, Perkins RG, Philip R, Popovici V, Price ND, Qian F, Scherer A, Shi T, Shi W, Sung J, Thierry-Mieg D, Thierry-Mieg J, Thodima V, Trygg J, Vishnuvajjala L, Wang SJ, Wu J, Wu Y, Xie Q, Yousef WA, Zhang L, Zhang X, Zhong S, Zhou Y, Zhu S, Arasappan D, Bao W, Lucas AB, Berthold F, Brennan RJ, Buness A, Catalano JG, Chang C, Chen R, Cheng Y, Cui J, Czika W, Demichelis F, Deng X, Dosymbekov D, Eils R, Feng Y, Fostel J, Fulmer-Smentek S, Fuscoe JC, Gatto L, Ge W, Goldstein DR, Guo L, Halbert DN, Han J, Harris SC, Hatzis C, Herman D, Huang J, Jensen RV, Jiang R, Johnson CD, Jurman G, Kahlert Y, Khuder SA, Kohl M, Li J, Li M, Li QZ, Li S, Liu J, Liu Y, Liu Z, Meng L, Madera M, Martinez-Murillo F, Medina I, Meehan J, Miclaus K, Moffitt RA, Montaner D, Mukherjee P, Mulligan GJ, Neville P, Nikolskaya T, Ning B, Page GP, Parker J, Parry RM, Peng X, Peterson RL, Phan JH, Quanz B, Ren Y, Riccadonna S, Roter AH, Samuelson FW, Schumacher MM, Shambaugh JD, Shi Q, Shippy R, Si S, Smalter A, Sotiriou C, Soukup M, Staedtler F, Steiner G, Stokes TH, Sun Q, Tan PY, Tang R, Tezak Z, Thorn B, Tsyganova M, Turpaz Y, Vega SC, Visintainer R, von Frese J, Wang C, Wang E, Wang J, Wang W, Westermann F, Willey JC, Woods M, Wu S, Xiao N, Xu J, Xu L, Yang L, Zeng X, Zhang M, Zhao C, Puri RK, Scherf U, Tong W, Wolfinger RD, Consortium M (2010) The MicroArray quality control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28(8):827–838. doi: 10.1038/nbt.1665
  122. Simon R (2003) Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br J Cancer 89:1599–1604
  123. Slodkowska EA, Ross JS (2009) MammaPrint 70-gene signature: another milestone in personalized medical care for breast cancer patients. Expert Rev Mol Diagn 9(5):417–422. doi: 10.1586/erm.09.32
  124. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Eystein Lonning P, Borresen-Dale AL (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA 98(19):10869–10874
  125. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou C, Lonning PE, Brown PO, Borresen-Dale A-L, Botstein D (2003) Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA 100(14):8418–8423
  126. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, Desmedt C, Larsimont D, Cardoso F, Peterse H, Nuyten D, Buyse M, Van de Vijver MJ, Bergh J, Piccart M, Delorenzi M (2006) Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98(4):262–272
  127. Steel RGD, Torrie JH (1980) Principles and procedures of statistics. McGraw Hill, New York
  128. Straver ME, Glas AM, Hannemann J, Wesseling J, van de Vijver MJ, Rutgers EJ, Vrancken Peeters MJ, van Tinteren H, van’t Veer LJ, Rodenhuis S (2010) The 70-gene signature as a response predictor for neoadjuvant chemotherapy in breast cancer. Breast Cancer Res Treat 119(3):551–558. doi: 10.1007/s10549-009-0333-1
  129. Sugar C (1998) Techniques for clustering and classification with applications to medical problems. Doctoral Thesis, Stanford University
  130. Suzuki K (ed) (2011) Artificial neural networks—methodological advances and biomedical applications. Artificial Neural Network. InTech, Croatia
  131. Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240(4857):1285–1293
  132. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912
  133. Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, New York
  134. The Cancer Letter (2011) Duke accepts Potti resignation; retraction process initiated with Nature Medicine.
  135. Therneau TM, Gail M, Grambsch PM, Krickeberg K, Samet JM, Tsiatis A, Wong W (eds) (2000) Modeling survival data: extending the Cox model. Statistics for biology and health. Springer, New York. doi: 10.1002/sim.956
  136. Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16(4):385–395. doi: 10.1002/(SICI)1097-0258(19970228)16:4<385::AID-SIM380>3.0.CO;2-3
  137. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Statist Soc B 58(1):267–288
  138. Tibshirani R, Walther G (2005) Cluster validation by prediction strength. J Comput Graph Stat 14(3):511–528
  139. Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411–423. doi: 10.1111/1467-9868.00293
  140. Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA 99(10):6567–6572. doi: 10.1073/pnas.082099299
  141. Tseng GC, Wong WH (2005) Tight clustering: a resampling-based approach for identifying stable and tight patterns in data. Biometrics 61(1):10–16. doi: 10.1111/j.0006-341X.2005.031032.x
  142. Ishwaran H, Kogalur U, Blackstone E, Lauer M (2008) Random survival forests. Ann Appl Stat 2(3):841–860
  143. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30(10):1105–1117. doi: 10.1002/sim.4154
  144. Van Belle V, Pelckmans K, Van Huffel S, Suykens JA (2011a) Improved performance on high-dimensional survival data by application of survival-SVM. Bioinformatics 27(1):87–94. doi: 10.1093/bioinformatics/btq617
  145. Van Belle V, Pelckmans K, Van Huffel S, Suykens JA (2011b) Support vector methods for survival analysis: a comparison between ranking and regression approaches. Artif Intell Med 53(2):107–118. doi: 10.1016/j.artmed.2011.06.006
  146. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347(25):1999–2009
  147. van der Laan MJ, Pollard KS, Bryan J (2003) A new partitioning around medoids algorithm. J Stat Comput Simulat 73(8):575–584
  148. van Houwelingen H, Bruinsma T, Hart AA, van’t Veer LJ, Wessels LFA (2006) Cross-validated Cox regression on microarray gene expression data. Stat Med 25:3201–3216
  149. van Rijsbergen C (1979) Information retrieval, 2nd edn. Butterworths, London
  150. van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530–536
  151. Verweij PJM, van Houwelingen JC (1993) Cross-validation in survival analysis. Stat Med 12:2305–2314
  152. Wang Y, Klijn JGM, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, Jatkoe T, Berns EMJJ, Atkins D, Foekens JA (2005) Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365(9460):671–679. doi: 10.1016/S0140-6736(05)17947-1
  153. Webb A (2003) Statistical pattern recognition, 2nd edn. Wiley, New York
  154. Wei JS, Greer BT, Westermann F, Steinberg SM, Son CG, Chen QR, Whiteford CC, Bilke S, Krasnoselsky AL, Cenacchi N, Catchpoole D, Berthold F, Schwab M, Khan J (2004) Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma. Cancer Res 64(19):6883–6891. doi: 10.1158/0008-5472.CAN-04-0695
  155. Weiss SM, Kulikowski CA (1991) Computer systems that learn. Morgan Kaufmann, San Mateo
  156. Welford SM, Gregg J, Chen E, Garrison D, Sorensen PH, Denny CT, Nelson SF (1998) Detection of differentially expressed genes in primary tumor tissues using representational differences analysis coupled to microarray hybridization. Nucleic Acids Res 26(12):3059–3065
  157. Wilson CL, Miller CJ (2005) Simpleaffy: a BioConductor package for Affymetrix quality control and data analysis. Bioinformatics 21(18):3683–3685
  158. Wu Z, Irizarry RA (2004) Preprocessing of oligonucleotide array data. Nat Biotechnol 22:656–658
  159. Yeung KY, Bumgarner RE (2003) Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 4(12):R83. doi: 10.1186/gb-2003-4-12-r83
  160. Youden WJ (1950) Index for rating diagnostic tests. Cancer 3(1):32–35
  161. Zhu J, Hastie T (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics 5(3):427–443. doi: 10.1093/biostatistics/5.3.427
  162. Zilliox MJ, Irizarry RA (2007) A gene expression bar code for microarray data. Nat Methods 4(11):911–913. doi: 10.1038/nmeth1102
  163. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol 67(2):301–320. doi: 10.1111/j.1467-9868.2005.00503.x

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, USA
  2. Biostatistics, Harvard School of Public Health, Boston, USA