Handling and Interpreting Gene Groups

  • Nils Blüthgen
  • Szymon M. Kielbasa
  • Dieter Beule


Systems biologists often have to deal with large gene groups obtained from high-throughput experiments, genome-wide predictions, and literature searches. Handling and functional interpretation of these gene groups is rather challenging. Problems arise from redundancies in databases, where a gene is given several names or identifiers, and from falsely assigned genes in the list. Moreover, genes in gene groups obtained by different methods are often represented by different types of identifiers, or are even genes from other model organisms. Thus, research in systems biology requires software tools that help to handle and interpret gene groups.

This chapter will review tools to store and compare gene groups represented by various identifiers. We introduce software that uses Gene Ontology (GO) annotations to infer biological processes associated with the gene groups. Additionally, we review approaches to further analyze gene groups regarding their transcriptional regulation by retrieving and analyzing their putative promoter regions.
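As a rough illustration of how GO annotations can point to biological processes associated with a gene group, the following sketch tests each term for overrepresentation with a one-sided hypergeometric test. It is not taken from any of the tools reviewed in the chapter: the function names, the toy gene and term labels, and the use of a Bonferroni correction are all assumptions made for this example.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_terms(group, annotations, alpha=0.05):
    """Return (term, hits_in_group, hits_in_background, adjusted_p) tuples.

    annotations maps every background gene to its set of term labels.
    Bonferroni adjustment is used here only because it is simple (assumption).
    """
    background = set(annotations)
    group = set(group) & background  # ignore genes without annotation
    N, n = len(background), len(group)
    all_terms = set().union(*annotations.values()) if annotations else set()

    tested = []
    for term in all_terms:
        K = sum(1 for g in background if term in annotations[g])
        k = sum(1 for g in group if term in annotations[g])
        if k == 0:
            continue  # a term absent from the group cannot be overrepresented
        tested.append((term, k, K, hypergeom_upper_tail(k, n, K, N)))

    m = len(tested)  # number of tests actually performed
    significant = [(t, k, K, min(1.0, p * m)) for t, k, K, p in tested if p * m <= alpha]
    return sorted(significant, key=lambda row: row[3])


# Hypothetical toy data: ten genes, one specific term and one term shared by all.
toy_annotations = {f"g{i}": {"TERM_B"} for i in range(1, 11)}
for gene in ("g1", "g2", "g3"):
    toy_annotations[gene].add("TERM_A")

hits = enriched_terms(["g1", "g2", "g3"], toy_annotations)
```

In this toy case only the specific term survives correction, because the term shared by every gene is no more frequent in the group than in the background. Real analyses also have to account for the hierarchical structure of GO, which this sketch ignores.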

Key Words

Gene groups; homology; promoter analysis; GO; redundancy; functional interpretation





Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Nils Blüthgen (1)
  • Szymon M. Kielbasa (2)
  • Dieter Beule (3)

  1. Institute of Theoretical Biology, Humboldt University, Berlin, Germany
  2. Max Planck Institute for Molecular Genetics, Computational Molecular Biology, Berlin, Germany
  3. MicroDiscovery GmbH, Berlin, Germany
