Abstract
High-throughput experiments in biology often produce sets of genes of potential interest, and some of these gene sets are of considerable size. Computer-assisted analysis is therefore necessary for interpreting the gene sets biologically and for creating working hypotheses that can be tested experimentally. One obvious way to analyze gene set data is to associate the genes with a particular biological feature, for example, a given pathway. Statistical analysis can then be used to evaluate whether a gene set is truly associated with that feature. Over the past few years, many tools that perform such analysis have been created. Using WebGestalt as an example, this chapter explains in detail how to associate gene sets with functional annotations, pathways, publication records, and protein domains.
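As an illustration of the kind of statistical evaluation mentioned above, the following is a minimal sketch of an over-representation test based on the hypergeometric distribution, one common choice for gene set enrichment. The abstract does not say which test the chapter uses, so the choice of test, the function name `hypergeom_enrichment_p`, and all parameter names are assumptions made for this example only. Given a reference set of genes, the number of reference genes annotated to a feature, the size of the gene set of interest, and the number of genes in that set carrying the annotation, it returns the probability of seeing at least that much overlap by chance.

```python
from math import comb


def hypergeom_enrichment_p(n_reference, n_category, n_set, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    Hypothetical helper for illustration only.
    n_reference: number of genes in the reference (background) set
    n_category:  reference genes annotated to the feature (e.g. a pathway)
    n_set:       number of genes in the gene set of interest
    n_overlap:   genes in the gene set that carry the annotation
    """
    if not (0 <= n_category <= n_reference and 0 <= n_set <= n_reference):
        raise ValueError("category and gene set must fit in the reference set")

    max_overlap = min(n_category, n_set)
    if n_overlap > max_overlap:
        return 0.0

    # All ways of drawing n_set genes from the reference set.
    total = comb(n_reference, n_set)

    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_category, k) * comb(n_reference - n_category, n_set - k)
        for k in range(max(n_overlap, 0), max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, not taken from the chapter: 10 reference genes, 4 in the
    # pathway, a gene set of 3 genes, 2 of which are in the pathway.
    print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small p-value suggests the feature is over-represented in the gene set relative to the reference set. In practice, many features are tested at once, so the resulting p-values would normally need a multiple-testing correction before interpretation.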
Copyright information
© 2007 Humana Press Inc.
About this protocol
Cite this protocol
Kirov, S.A., Zhang, B., Snoddy, J.R. (2007). Association Analysis for Large-Scale Gene Set Data. In: Ochs, M.F. (ed.) Gene Function Analysis. Methods in Molecular Biology™, vol 408. Humana Press. https://doi.org/10.1007/978-1-59745-547-3_2
DOI: https://doi.org/10.1007/978-1-59745-547-3_2
Publisher Name: Humana Press
Print ISBN: 978-1-58829-734-1
Online ISBN: 978-1-59745-547-3
eBook Packages: Springer Protocols