Expression and Genetic Variation Databases for Cancer Research

  • Johan Rung
  • Alvis Brazma


The amount of data generated in cancer research is growing rapidly. High-density array-based technologies, such as genome-wide single nucleotide polymorphism (SNP) genotyping and gene expression microarrays, are producing data that is not only larger in volume but also more complex in study design and associated metadata. This chapter discusses how the flood of genomic and transcriptomic data is managed in databases, often by large collaborative consortia that develop new informatics approaches to maximize the availability and utility of the data. Genetic variation databases are most often designed for a particular level of detail: single disease-causing variants associated with specific phenotypes; genome-wide variation, covering both SNPs and structural variants; or complete genome-wide association studies, held in large repositories. Gene expression microarray data is also stored in large repositories, and new services have been developed that take advantage of the increasing number and diversity of stored experiments. By associating the data with biological information and applying integrative analysis, high-dimensional data can be reduced to a summary level that is directly usable by bench biologists.



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
