Manipulating Large-Scale Arabidopsis Microarray Expression Data: Identifying Dominant Expression Patterns and Biological Process Enrichment

  • David A. Orlando
  • Siobhan M. Brady
  • Jeremy D. Koch
  • José R. Dinneny
  • Philip N. Benfey
Part of the Methods in Molecular Biology™ book series (MIMB, volume 553)


A series of large-scale Arabidopsis thaliana microarray expression experiments profiling genome-wide expression across different developmental stages, cell types, and environmental conditions has produced tremendous amounts of gene expression data. This gene expression is the output of complex transcriptional regulatory networks and provides a starting point for identifying the dominant transcriptional regulatory modules acting within the plant. Highly co-expressed groups of genes are likely to be regulated by similar transcription factors. Finding these co-expressed groups can therefore reduce the dimensionality of complex expression data to a set of dominant transcriptional regulatory modules. Determining the biological significance of these patterns is an informatics challenge and has required the development of new methods. Using these new methods, we can begin to understand the biological information contained within large-scale expression data sets.
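
This summary does not state which statistical test is used to assess the biological significance of a co-expressed group. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for Gene Ontology term enrichment, though not confirmed here as the chapter's method), the enrichment of an annotation within a cluster might be computed as follows. The function name, its parameters, and the example counts are all hypothetical.

```python
from math import comb


def enrichment_p_value(population_size, annotated_in_population,
                       cluster_size, annotated_in_cluster):
    """Upper-tail hypergeometric probability P(X >= annotated_in_cluster).

    Assumption: genes in the cluster are treated as a random draw, without
    replacement, from all genes on the array.
    """
    # The number of annotated genes in the cluster cannot exceed either
    # the cluster size or the number of annotated genes overall.
    max_hits = min(cluster_size, annotated_in_population)
    # Number of ways to draw any cluster of this size from the population.
    total = comb(population_size, cluster_size)
    # Sum the ways to draw k annotated and (cluster_size - k) unannotated
    # genes, for every k at least as large as the observed count.
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, cluster_size - k)
        for k in range(annotated_in_cluster, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 20 genes measured, 5 carry the annotation,
    # and a cluster of 4 genes contains 3 of the annotated genes.
    print(enrichment_p_value(20, 5, 4, 3))
```

In practice many annotation terms are tested per cluster, so the resulting p-values would also need a multiple-testing correction before interpretation.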

Key words

Clustering, microarray, gene expression, enrichment, gene ontology



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • David A. Orlando (1, 2)
  • Siobhan M. Brady (1, 2)
  • Jeremy D. Koch (3)
  • José R. Dinneny (1, 2)
  • Philip N. Benfey (1, 2)

  1. Department of Biology, Duke University, Durham, USA
  2. IGSP Center for Systems Biology, Duke University, Durham, USA
  3. Davis, USA
