Association Analysis for Large-Scale Gene Set Data

  • Protocol
Gene Function Analysis

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 408)

Abstract

High-throughput experiments in biology often produce sets of genes of potential interest, and some of these gene sets are of considerable size. Computer-assisted analysis is therefore necessary to interpret the gene sets biologically and to create working hypotheses that can be tested experimentally. One obvious way to analyze gene set data is to associate the genes with a particular biological feature, for example, a given pathway. Statistical analysis can then be used to evaluate whether a gene set is truly associated with that feature. Over the past few years, many tools that perform such analysis have been created. Using WebGestalt as an example, this chapter explains in detail how to associate gene sets with functional annotations, pathways, publication records, and protein domains.
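
One common way to make this kind of statistical evaluation concrete is an over-representation test based on the hypergeometric distribution. The abstract does not name a specific test, so the Python sketch below is an illustrative assumption rather than the chapter's stated method; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population: number of genes in the reference set
    annotated:  reference genes carrying the feature (e.g. a pathway)
    sample:     number of genes in the gene set of interest
    overlap:    genes in the gene set that carry the feature
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie in [0, population]")

    max_overlap = min(annotated, sample)
    if overlap > max_overlap:
        return 0.0  # an overlap this large is impossible

    start = max(overlap, 0)
    # Exact integer arithmetic avoids rounding error in the tail sum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(start, max_overlap + 1)
    )
    return tail / comb(population, sample)


# Hypothetical counts: 10 reference genes, 4 in the pathway,
# a gene set of 3 genes, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, overlap=2)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the overlap is larger than would be expected by chance. When many features are tested at once, a multiple-testing correction would normally be applied on top of this.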

Copyright information

© 2007 Humana Press Inc.

About this protocol

Cite this protocol

Kirov, S.A., Zhang, B., Snoddy, J.R. (2007). Association Analysis for Large-Scale Gene Set Data. In: Ochs, M.F. (ed.) Gene Function Analysis. Methods in Molecular Biology™, vol 408. Humana Press. https://doi.org/10.1007/978-1-59745-547-3_2

  • DOI: https://doi.org/10.1007/978-1-59745-547-3_2

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-734-1

  • Online ISBN: 978-1-59745-547-3

  • eBook Packages: Springer Protocols
