Molecular Genetics and Genomics, Volume 276, Issue 6, pp 517–531

Global comparative transcriptome analysis identifies gene network regulating secondary xylem development in Arabidopsis thaliana

Authors

  • Jae-Heung Ko
    • Department of Forestry, Michigan State University
  • Eric P. Beers
    • Department of Horticulture, Virginia Polytechnic Institute and State University
  • Kyung-Hwan Han
    • Department of Forestry, Michigan State University
Original Paper

DOI: 10.1007/s00438-006-0157-1

Cite this article as:
Ko, J., Beers, E.P. & Han, K. Mol Genet Genomics (2006) 276: 517. doi:10.1007/s00438-006-0157-1

Abstract

Our knowledge of the genetic control of wood formation (i.e., secondary growth) is limited. Here, we present a novel approach to unraveling the gene network regulating secondary xylem development in Arabidopsis, which incorporates complementary platforms of comparative-transcriptome analyses such as “digital northern” and “digital in situ” analysis. This approach effectively eliminated any genes that are expressed in either non-stem tissues/organs (“digital northern”) or phloem and non-vascular regions (“digital in situ”), thereby identifying 52 genes that are upregulated only in the xylem cells of secondary growth tissues as a “core xylem gene set”. The proteins encoded by this gene set participate in signal transduction, transcriptional regulation, and cell wall metabolism, or have unknown functions. Five of the seven signal transduction-related genes represented in the core xylem gene set encode the essential components of the ROP (Rho-related GTPase from plants) signaling cascade. Furthermore, the analysis of promoter sequences of the core xylem gene set identified a novel cis-regulatory element, ACAAAGAA. The functional significance of this gene set was verified by several independent experimental and bioinformatics methods.

Keywords

Arabidopsis, Comparative transcriptome analysis, Genechip, Secondary cell wall, Secondary xylem, Wood formation

Introduction

Secondary growth is a highly ordered developmental process, which involves patterned division of vascular cambium cells and a subsequent regulated differentiation of cambial derivatives into secondary xylem and phloem tissues. This patterned growth requires various molecular signals that are differentially transduced by cell-to-cell contacts, relative to cell positions, and are turned on and off in response to both external and internal stimuli. The resulting secondary tissues provide necessary mechanical support and a conduit for the long-distance transport of water and nutrients, allowing trees to grow tall and eventually out-compete herbaceous vegetation for light and nutrient uptake. Currently, our understanding of the shoot apical meristem (SAM), which generates all aboveground primary organs including stems, leaves, and flowers, far surpasses that of cambium initiation and maintenance.

Using an experimental system that produced synchronized (i.e., same-age) inflorescence stems having different degrees of secondary growth, we previously demonstrated that the weight carried by the stem is a primary signal for the induction of cambium differentiation and that the plant hormone auxin is a downstream carrier of the signal for this process (Ko et al. 2004). In addition, Arabidopsis whole-transcriptome (ATH1 GeneChip, Affymetrix) analysis provided an unprecedented view of the flux that occurs in the Arabidopsis transcriptome during secondary growth (Ko et al. 2004; Ko and Han 2004). The analysis identified 1,433 genes that are upregulated threefold or more in the wood-forming stems when compared to immature stems (with no secondary growth) and 10-day-old seedlings (with no inflorescence stem). Although the profiling revealed transcription phenotypes that are characteristic of different stem developmental stages and confirmed various existing insights regarding the genetic regulation of secondary growth, the number of differentially expressed genes is too large to be informative in revealing the networks of genes underlying wood formation.

A large number of Arabidopsis GeneChip array data are publicly available through various searchable websites such as the Nottingham Arabidopsis Stock Centre Transcriptomics Service (NASCArrays; Craigon et al. 2004), ArrayExpress at the European Bioinformatics Institute (EBI; Brazma et al. 2003), and Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI; Edgar et al. 2002). In addition, Zimmermann et al. (2004) introduced an Arabidopsis microarray database and analysis toolbox (called GENEVESTIGATOR) for high-throughput gene expression analysis. Considering that these transcriptome data were obtained from an identical platform (i.e., Affymetrix GeneChip arrays), carefully designed comparative analyses of such data may provide meaningful information on the molecular mechanisms underlying many biological processes.

Here, we describe a comparative transcriptome analysis approach that exploits a large number of publicly available Affymetrix GeneChip datasets to discover the gene network regulating wood formation. This approach identified 52 genes as potential regulators of secondary xylem development. The validity of this approach was supported by several independent analyses. The candidate genes discovered in this study will be a focal point of further efforts to unravel the genetic regulation of secondary growth in plants.

Materials and methods

Plant materials

Arabidopsis thaliana (ecotype Columbia) was grown in a growth chamber at 23°C under a 16 h light/8 h dark photoperiod. The poplar trees (Populus tremula × P. alba) for the cDNA–AFLP analysis were grown under a 16 h light/8 h dark photoperiod at 25°C. When the trees reached one month of age, the following samples were harvested from the stem: a 1 cm segment directly below the shoot apical meristem (PG, primary growth), a segment between LPI (Leaf Plastochron Index) 4 and 6 (TS, primary-to-secondary growth transition stage), and a segment between LPI 8 and 11 (SG, secondary growth). Bark samples were prepared by peeling the bark from the stem section 10 cm above ground level. Leaves in the region of LPI 3 to 8 were used as the leaf sample after Class I and II veins were removed. Five trees were used in this analysis.

Computational data analysis

Data sets of AtGenExpress (AFGN, http://www.web.uni-frankfurt.de/fb15/botanik/mcb/AFGN/atgenex.htm) were downloaded from TAIR (ftp://www.ftp.arabidopsis.org/home/tair/Microarrays/Datasets/AtGenExpress). Among them, transcriptomes from 45 different organs/developmental stages of Arabidopsis were selected and analyzed with an Excel macro (Microsoft) to obtain mean values and perform statistical analysis. Hierarchical clustering analysis was performed using the GeneSpring program (version 6.1, Silicon Genetics). A gene tree (Y-axis) was generated to classify the genes based on their expression patterns, with similarity measured by Pearson’s correlation. A conditional tree (X-axis) was used to identify the similarities among the entire transcriptomes of each organ/developmental stage, with similarity measured by standard correlation. For comparative analysis of the final gene set (core xylem gene set) with fern (Ceratopteris richardii) and moss (Physcomitrella patens), 5,133 ESTs of C. richardii and 14,703 ESTs of P. patens deposited in NCBI (http://www.ncbi.nlm.nih.gov/entrez/) were downloaded and subjected to clustering analysis using stackPACK (v2.2, Electric Genetics) to identify unigenes. In addition, 9,113 ESTs of Zinnia elegans were downloaded from the PRIDE website (http://www.mrg.psc.riken.go.jp/PRIDE/expression.html) and converted to FASTA format using a custom Perl script. These gene sets were used to find putative Arabidopsis orthologs using the Blastall program against the Arabidopsis protein database (http://www.gaea.bch.msu.edu/), with a score greater than 80 and an E value less than 1.0 × 10⁻⁵ as cutoffs. The “Find Potential Regulatory Sequences” module of GeneSpring was used to find cis-regulatory elements of the final gene set. The search was performed from 10 to 500 bases upstream of each gene with a probability cutoff of 0.06. The two control sequences used in the statistical analysis are the 10–500 base upstream sequences of all annotated genes and the whole genome sequence.
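
The ortholog cutoff described above (score greater than 80 and E value less than 1.0 × 10⁻⁵) can be illustrated with a minimal Python sketch. It assumes that the hits were exported in the common 12-column tabular BLAST layout and that "score" refers to the bit score in the last column; both points are assumptions, and the function name `filter_hits` and the example rows are hypothetical rather than taken from the original analysis.

```python
import csv
import io

SCORE_CUTOFF = 80.0     # "score greater than 80" (assumed here to be the bit score)
EVALUE_CUTOFF = 1.0e-5  # "E value less than 1.0 x 10^-5"

def filter_hits(tabular_text):
    """Return the best passing hit per query as {query: (subject, evalue, bitscore)}."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if bitscore > SCORE_CUTOFF and evalue < EVALUE_CUTOFF:
            # keep only the highest-scoring passing hit for each query
            if query not in best or bitscore > best[query][2]:
                best[query] = (subject, evalue, bitscore)
    return best

# Hypothetical hits, for illustration only (12 tab-separated columns per row).
EXAMPLE = "\n".join([
    "est_001\tAt5g44030\t78.5\t200\t43\t0\t1\t200\t10\t209\t3e-40\t160",
    "est_001\tAt4g18780\t70.1\t190\t57\t0\t1\t190\t12\t201\t1e-30\t130",
    "est_002\tAt1g63910\t45.0\t60\t33\t0\t1\t60\t5\t64\t2e-04\t42.0",
    "est_003\tAt1g32770\t60.0\t120\t48\t0\t1\t120\t3\t122\t5e-12\t75.0",
])

orthologs = filter_hits(EXAMPLE)
print(orthologs)
```

In this toy input only `est_001` passes both cutoffs; `est_003` has a low E value but fails the score threshold, which shows why the two criteria are applied together.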

Poplar cDNA–AFLP analysis

Total RNA was extracted from poplar tissues according to the method described by Wang et al. (2000). cDNA–AFLP analysis was performed by the method described previously (Ko et al. 2003; Prassinos et al. 2005). In order to select the restriction enzyme pair on which the analysis would be based, we screened 57 complete coding sequences of aspen that are available in the GenBank database. Eighty-six percent of the sequences had target sites for the restriction enzyme pair ApoI/MseI, which can produce restriction fragment sizes (ranging from 50 to 1,000 bp) suitable for separation within a single run of polyacrylamide gel electrophoresis. Based on this estimation, we assumed that TDFs (transcript-derived fragments) generated by using the ApoI/MseI primer pair could cover at least 80% of the transcriptome.
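
The enzyme-pair screen described above can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: it uses the standard recognition sequences of ApoI (RAATTY) and MseI (TTAA), approximates cut positions by the start of each recognition site, and considers only fragments bounded by one site of each enzyme. The function names and the example sequences are hypothetical.

```python
import re

# Lookahead patterns so that overlapping recognition sites are all reported.
APOI = re.compile(r"(?=[AG]AATT[CT])")  # ApoI recognition site RAATTY
MSEI = re.compile(r"(?=TTAA)")          # MseI recognition site TTAA

def has_usable_fragment(seq, min_len=50, max_len=1000):
    """True if adjacent ApoI and MseI sites bound a fragment of min_len..max_len bp."""
    seq = seq.upper()
    sites = [(m.start(), "ApoI") for m in APOI.finditer(seq)]
    sites += [(m.start(), "MseI") for m in MSEI.finditer(seq)]
    sites.sort()
    for (pos_a, enz_a), (pos_b, enz_b) in zip(sites, sites[1:]):
        if enz_a != enz_b and min_len <= pos_b - pos_a <= max_len:
            return True
    return False

def coverage(sequences):
    """Fraction of sequences that would yield at least one usable fragment."""
    return sum(has_usable_fragment(s) for s in sequences) / len(sequences)

# Hypothetical coding sequences, for illustration only.
seq_ok = "GAATTC" + "C" * 100 + "TTAA"     # sites 106 bp apart
seq_short = "GAATTC" + "G" * 10 + "TTAA"   # sites only 16 bp apart
seq_none = "ACGT" * 50                     # no recognition site at all

print(coverage([seq_ok, seq_short, seq_none]))
```

Applied to a set of real coding sequences, the same fraction would correspond to the percentage of transcripts expected to be represented by at least one TDF.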

Promoter::GUS analysis of ANAC012 and ANAC073 genes

To study the expression of the ANAC012 and ANAC073 genes, promoter::GUS transgenic Arabidopsis plants were generated. XbaI/BamHI genomic fragments of 1.8 and 1.5 kb flanking the 5′ end of the ANAC012 and ANAC073 coding sequences, respectively, were subcloned into pCB308 (Xiang et al. 1999). Primers used for promoter amplifications are the following: ANAC012 (forward, 5′- GGGTCTAGAACGAGACATAGCCCCCAAAA -3′ and reverse, 5′- TTTGGATCCTGCTCCAGCAAGATCATCAA -3′), and ANAC073 (forward, 5′- GGGTCTAGAGGTTTCGATCCACCAATGAA -3′ and reverse, 5′- TTTGGATCCTCAGTGCTCACAACTTCATCAGA -3′). Plasmids were mobilized into Agrobacterium tumefaciens strain C58 and introduced into Arabidopsis (ecotype Columbia) by the floral dip method (Clough and Bent 1998). Histochemical assays for GUS activity in transgenic Arabidopsis plants were performed as described previously (Jefferson et al. 1987).

Results

Identification of a core xylem gene set

Recent work from several investigators has produced profiles of Arabidopsis genes associated with specific aspects of xylem metabolism (e.g., cellulose synthesis, Persson et al. 2005; Brown et al. 2005) and with xylem from different organs (Ko et al. 2004; Zhao et al. 2005). We sought to identify a set of core genes that are involved in secondary xylem development. We first identified 1,433 genes that were upregulated (threefold or higher; gene list in Supplemental Table S1) in Arabidopsis stems undergoing secondary growth (“mature” stem), relative to only two other transcriptomes from “immature” and “intermediate” stems as described in Ko et al. (2004) and Ko and Han (2004). Then, we reduced the number of candidate genes by identifying stem-specific or stem-preferentially expressed genes using hierarchical clustering analyses of the transcriptome profiles derived from 45 different organs/developmental stages (Supplemental Fig. S1).

When we first attempted to identify gene clusters unique to the stem tissue by hierarchical clustering analyses, we were not able to find a “stem tissue-specific” cluster (Supplemental Fig. S1). Therefore, “digital northern analysis” was performed using the transcriptomes of 45 different Arabidopsis organs/developmental stages as templates and six genes (AtCesA04, AtCesA07, AtCesA08, COBL4, ANAC012, and ANAC073) as probes. The probe genes were selected based on their secondary cell wall/wood-forming tissue-preferential expression patterns in previous studies (Turner and Somerville 1997; Taylor et al. 2003; Ko et al. 2004; Ko and Han 2004; Persson et al. 2005; Brown et al. 2005; Andersson-Gunneras et al. 2006). AtCesA04, AtCesA07 and AtCesA08 encode cellulose synthases, which are known as key enzymes in the biosynthesis of the secondary cell wall (Turner and Somerville 1997; Taylor et al. 2003; Persson et al. 2005). COBL4 encodes a member of the COBRA family (Schindelman et al. 2001; Roudier et al. 2002) and has been suggested to be involved in cellulose deposition in the secondary cell wall (Brown et al. 2005). ANAC012 and ANAC073 are NAC family transcription factors that were highly expressed in the secondary xylem tissue in our preliminary study (Ko et al. 2006, in preparation). The “Find Similar Genes” function of the GeneSpring program identified a total of 283 genes whose expression patterns were similar to those of the probe genes, with a correlation of at least 0.6 (Fig. 1; gene list in Supplemental Table S1). A third analysis, the “digital in situ” analysis, was performed to select genes with xylem-preferential expression at tissue-type resolution by comparing the transcription profiles of secondary xylem, phloem–cambium, and non-vascular (cork cambium and cork) tissues (Zhao et al. 2005) and epidermis cells (Suh et al. 2005). This step identified 247 genes that are upregulated (threefold or higher) both in the xylem tissue compared to non-vascular tissue and in stem tissue compared to epidermal tissue (Fig. 1; gene lists in Supplemental Table S1). By including both stem and root-hypocotyl secondary tissues, this comparative analysis eliminated any xylem genes that may be peculiar to stem xylem or root-hypocotyl xylem, thereby permitting the assembly of a 52-member core xylem-specific set (Fig. 1; Table 1). The resulting expression profiles of the 52-member xylem gene set are depicted in Fig. 2. All of the 52 genes were preferentially expressed in the stems (Fig. 2a), xylem tissues (Fig. 2b) and wood-forming stems (Fig. 2c).
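
The "digital northern" idea, selecting genes whose expression profile across organs and developmental stages correlates with that of a probe gene at 0.6 or higher, can be sketched in plain Python. This is only an illustration under stated assumptions: the internals of GeneSpring's "Find Similar Genes" function are not described here, so the sketch uses Pearson's correlation and keeps any gene that matches at least one probe. The function names and the five-condition expression profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / (sd_x * sd_y)

def find_similar_genes(profiles, probes, min_corr=0.6):
    """Genes (probes included) correlating >= min_corr with at least one probe."""
    hits = set()
    for gene, profile in profiles.items():
        if any(pearson(profile, profiles[p]) >= min_corr for p in probes):
            hits.add(gene)
    return hits

# Hypothetical signal intensities: leaf, root, flower, young stem, mature stem.
profiles = {
    "probe_A": [10, 12, 11, 80, 200],   # stem-preferential probe
    "gene_1": [5, 6, 5, 40, 110],       # similar, stem-preferential
    "gene_2": [150, 20, 30, 15, 10],    # leaf-preferential
    "gene_3": [50, 50, 50, 50, 50],     # constitutive
}

similar = find_similar_genes(profiles, ["probe_A"])
print(sorted(similar))
```

Only the probe itself and `gene_1` are retained; the leaf-preferential and constitutive profiles fall below the 0.6 threshold.
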
Fig. 1

Venn diagram showing the concept and design of the global comparative transcriptome analysis. Secondary growth: 1,433 genes upregulated in the wood-forming stems; digital northern: 283 genes identified from the transcriptomes of 45 different organs/tissues (AtGenExpress) by using six genes with wood-forming tissue-preferential expression patterns as digital probes; digital in situ: 247 genes preferentially expressed in xylem, identified by comparing (xylem/phloem/non-vascular cells) and (stem/epidermis of stem) transcriptomes. Threefold upregulation was used as the threshold
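
The logic of the Venn diagram, a threefold-upregulation filter followed by the intersection of three gene lists, can be sketched as below. The sketch compares the mature stem against a single reference transcriptome for brevity, whereas the study compared it against more than one; the gene identifiers, signal values, and function name are hypothetical.

```python
def threefold_up(target, reference, fold=3.0):
    """Gene IDs whose signal in `target` is at least `fold` times that in `reference`."""
    return {
        gene
        for gene, signal in target.items()
        if gene in reference and reference[gene] > 0
        and signal / reference[gene] >= fold
    }

# Hypothetical signal intensities, for illustration only.
mature_stem = {"g1": 900, "g2": 300, "g3": 1200, "g4": 50}
immature_stem = {"g1": 100, "g2": 250, "g3": 300, "g4": 10}

secondary_growth = threefold_up(mature_stem, immature_stem)  # first filter
digital_northern = {"g1", "g3", "g5"}  # hypothetical result of the correlation filter
digital_in_situ = {"g1", "g2", "g4"}   # hypothetical result of the tissue-type filter

# The core set is what survives all three filters.
core_xylem_set = secondary_growth & digital_northern & digital_in_situ
print(sorted(core_xylem_set))
```

Because each filter removes a different class of false candidates, the intersection is much smaller than any single list, which mirrors the reduction from 1,433, 283, and 247 genes to the 52-member core set.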

Table 1

Core xylem-specific gene set identified from current study and comparative analysis with publicly available data

| Affy I.D. | AGI (a) | Class (b) | GO (c) | Gene description | Poplar Cam_EST (d) | Poplar Other_EST (e) | Cotton EST (f) | Cross-Ref (g) | Cross-Ref (h) | Cross-Ref (i) |
| 248928_at | At5g45970 | ST | Pm | AtROP7/ARAC2, Rho GTPase | 2 | | 15 | | | Set 8 |
| 261809_at | At1g08340 | ST | Pm | RopGAP5, putative rac GTPase activating protein | 3 | | 2 | Yes | Yes | Set 3 |
| 261399_at | At1g79620 | ST | Pm | Leucine-rich repeat receptor protein kinase (RLK) | 1 | | | | Yes | Set 11 |
| 264549_at | At1g09440 | ST | Om | Putative receptor-like ser/thr protein kinase | | | | | | Set 3 |
| 263028_at | At1g24030 | ST | M | Putative protein kinase | 1 | V | | | | Set 3 |
| 257233_at | At3g15050 | ST | C | Calmodulin-binding family protein | 2 | G | | | Yes | |
| 264495_at | At1g27380 | ST | Pm | RIC2, ROP-interactive CRIB motif-containing protein2 | | | | Yes | Yes | Set 3 |
| 260326_at | At1g63910 | RT | N | AtMYB103, myb family transcription factor | | X | 3 | | | Set 8 |
| 259822_at | At1g66230 | RT | N | AtMYB20, myb family transcription factor | | P | 3 | | | |
| 255903_at | At1g17950 | RT | N | AtMYB52, myb family transcription factor | 1 | | 1 | | | Set 3 |
| 253327_at | At4g33450 | RT | N | AtMYB69, myb family transcription factor | | X | | | | |
| 254277_at | At4g22680 | RT | N | AtMYB85, myb family transcription factor | 1 | G | 1 | | | Set 3 |
| 261703_at | At1g32770 | RT | N | ANAC012, NAC family transcription factor | 5 | G, F | | | | |
| 253798_at | At4g28500 | RT | N | ANAC073, NAC family transcription factor | 5 | G, R | | | Yes | Set 22 |
| 251303_at | At3g61910 | RT | N | ANAC066, NAC family transcription factor | | N | | | | |
| 266714_at | At2g46770 | RT | N | ANAC043, NAC family transcription factor | | | | | | |
| 249070_at | At5g44030 | CW | Pm | AtCesA04, cellulose synthase catalytic subunit (IRX5) | 2 | G | 8 | Yes | Yes | Set 3 |
| 246425_at | At5g17420 | CW | Pm | AtCesA07, cellulose synthase catalytic subunit (IRX3) | 2 | G | 9 | Yes | Yes | Set 8 |
| 254618_at | At4g18780 | CW | Pm | AtCesA08, cellulose synthase catalytic subunit (IRX1) | 3 | G | 12 | Yes | Yes | Set 3 |
| 250933_at | At5g03170 | CW | Om | FLA11, fasciclin-like arabinogalactan-protein | 4 | G | | Yes | Yes | Set 3 |
| 253619_at | At4g30460 | CW | Om | Glycine-rich cell wall structural protein | 1 | UB, P, T | | | | |
| 248121_at | At5g54690 | CW | Om | GT family 8 | 2 | G, X | | Yes | Yes | Set 3 |
| 260666_at | At1g19300 | CW | Om | GT family 8, PARVUS | 5 | UB, G, I | 8 | | | Set 3 |
| 257757_at | At3g18660 | CW | C | GT family 8, glycogenin glucosyltransferase-related | 3 | G | | Yes | Yes | Set 3 |
| 253380_at | At4g33330 | CW | Om | GT family 8, glycogenin glucosyltransferase-related | 3 | G | | Yes | | |
| 265463_at | At2g37090 | CW | Om | GT family 43 protein, beta-glucuronyltransferase | 2 | M | | Yes | Yes | Set 3 |
| 262060_at | At1g80170 | CW | Om | Similar to polygalacturonase | | | | | | |
| 264493_at | At1g27440 | CW | Om | Exostosin family protein | 4 | G | 7 | Yes | Yes | Set 3 |
| 246512_at | At5g15630 | CW | Om | COBL4, COBRA-like cell expansion protein | 4 | G, X | 1 | Yes | Yes | Set 3 |
| 266783_at | At2g29130 | CW | Om | Putative laccase (diphenol oxidase) | 4 | | | Yes | Yes | |
| 250770_at | At5g05390 | CW | Om | Putative laccase (diphenol oxidase) | 1 | C | | | | |
| 259606_at | At1g27920 | CC | Un | AtMAP65-8, microtubule associated protein family | 2 | G | | | | Set 3 |
| 251297_at | At3g62020 | CO | Om | Germin-like protein (GLP10) | 2 | G, X | | | Yes | Set 3 |
| 249439_at | At5g40020 | DF | Om | Thaumatin-like protein | 1 | F | | | | Set 3 |
| 266708_at | At2g03200 | PF | Om | Aspartyl protease family protein | 2 | | | | | Set 8 |
| 266488_at | At2g47670 | PF | Om | Invertase/pectin methylesterase inhibitor family protein | | | | | | Set 21 |
| 261224_at | At1g20160 | PF | Om | Subtilase family protein, subtilisin-like serine protease | 1 | UA, X | | | | |
| 245410_at | At4g17220 | UN | Un | Unknown protein | | X | | | | Set 3 |
| 252550_at | At3g45870 | UN | Om | Integral membrane family protein | | N | 4 | | | |
| 256054_at | At1g07120 | UN | Un | Unknown protein, leucine zipper, homeobox-associated | | X | | | | Set 22 |
| 253821_at | At4g28380 | UN | Om | Leucine-rich repeat family protein | | | | | | |
| 247030_at | At5g67210 | UN | Om | DUF579-1, unknown protein | 2 | | | | | Set 3 |
| 252211_at | At3g50220 | UN | Om | DUF579-2, unknown protein | | K, V | | | Yes | Set 3 |
| 261999_at | At1g33800 | UN | Om | DUF579-3, unknown protein | 4 | UB, G | | Yes | | Set 8 |
| 264559_at | At1g09610 | UN | Om | DUF579-4, unknown protein | 5 | G | | Yes | Yes | Set 3 |
| 245105_at | At2g41610 | UN | Om | Hypothetical protein | 1 | G | | | Yes | Set 3 |
| 261928_at | At1g22480 | UN | Om | Similar to uclacyanin II | 8 | | | | | |
| 246635_at | At1g31720 | UN | Om | Unknown protein | | | | | | Set 3 |
| 263468_at | At2g31930 | UN | Un | Unknown protein | | X | | | | |
| 253877_at | At4g27435 | UN | Om | Unknown protein | | | 3 | Yes | Yes | Set 3 |
| 251093_at | At5g01360 | UN | Om | Unknown protein | 2 | G | | | Yes | Set 3 |
| 247590_at | At5g60720 | UN | C | Unknown protein | 2 | | | Yes | Yes | Set 3 |
| 247522_at | At5g61340 | UN | Om | Unknown protein | | | | | | Set 3 |

(a) AGI, Arabidopsis gene index

(b) Functional classification (Ko and Han 2004): ST signal transduction, RT regulation of transcription, CW cell wall metabolism, CC cell cycle, DF defense, PF protein fate, SM secondary metabolism, UN unknown function

(c) GO (http://www.arabidopsis.org/tools/bulk/go/index.jsp) cellular component: Un unknown, N nucleus, Om other membrane, Pm plasma membrane, C chloroplast, M mitochondria

(d) Found in cambium EST library of poplar, number of ESTs (Schrader et al. 2004)

(e) Found in other EST library (Sterky et al. 2004): C young leaves, F floral buds, G tension wood, I senescing leaves, K apical shoot, M female catkins, N bark, P petioles, R roots, T shoot meristem, UA dormant cambium, UB active cambium, V mature catkins, X wood cell-death

(f) Found in EST libraries of cotton (Gossypium arboreum and G. hirsutum) fiber development (PlantGDB; http://www.plantgdb.org/)

(g) Found in the gene list of secondary cell wall formation (Brown et al. 2005)

(h) Found in the gene list of cellulose synthesis (Persson et al. 2005)

(i) Gene sets identified from the xylem vessel differentiation (Kubo et al. 2005)

Fig. 2

Gene expression profiles of the 52-member core xylem gene set identified in this study. a Wood-forming tissue-preferential expression in different tissues and developmental stages (1–7 weeks) (“digital northern”). Each dot indicates the signal intensity of each gene. The six probe genes used in this analysis are indicated. The Y-axis indicates the gene expression level as ‘Signal Intensity’ obtained from ATH1 GeneChip analysis. b Preferential expression in xylem cells (“digital in situ”). c Upregulation during secondary growth

Thirty-four (>65%) of the 52 genes are predicted to be localized to membranes (Table 1), implying that they may be involved in signal transduction and/or cell wall metabolism. Interestingly, most of the essential components of the ROP signaling cascade were discovered, including RopGAP5 (a Rop-specific GTPase-activating protein), RIC2 (a ROP-interactive CRIB motif-containing protein) and a receptor-like kinase (RLK, a CLAVATA1 homolog). However, none of the 11 ROP GTPases in the Arabidopsis genome (Vernoud et al. 2003) was represented in the gene list (Table 1). Further examination revealed that a ROP GTPase (AtROP7/AtRAC2; At5g45970) met the criteria of two platforms (digital northern and digital in situ) but was upregulated only 2.2-fold in the wood-forming stems (i.e., it failed the threefold threshold).

The core xylem set has many cell wall metabolism genes, including three cellulose synthases (AtCesA04, AtCesA07, and AtCesA08) known as key enzymes in secondary cell wall biosynthesis (Turner and Somerville 1997; Taylor et al. 2003), four glycosyltransferase family 8 genes (At5g54690/irx8, At1g19300, At3g18660, At4g33330) responsible for pectin synthesis, one hemicellulose synthesis gene (At1g27440/irx10), three genes (At5g03170, At4g30460, At4g28380) encoding cell wall structural proteins, and one cell wall reassembly gene (At1g80170). The final gene set includes only two transcription factor families (4 NACs and 5 MYBs), which are regarded as plant-specific regulators (Martin and Paz-Ares 1997; Stracke et al. 2001; Olsen et al. 2005). NAC family transcription factors are involved in maintaining organ or tissue boundaries and in regulating the transition from growth by cell division to growth by cell expansion (Souer et al. 1996; Sablowski and Meyerowitz 1998). While several MYB family genes have been implicated in the regulation of lignification and flavonoid biosynthesis (Tamagnone et al. 1998; Patzlaff et al. 2003a), it has been suggested that xylem-abundant MYB proteins might be involved in the transcriptional regulation of secondary xylem formation (Oh et al. 2003; Ko and Han 2004; Newman et al. 2004). The final gene set also includes 15 genes of unknown function, four of which have a highly conserved, uncharacterized plant-specific domain, DUF579 (Domain of Unknown Function 579, InterPro; http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF04669).

These DUF579 genes (named DUF579-1 through DUF579-4) showed very similar expression patterns with regard to secondary xylem development, implying that they may have a novel and conserved functional role (Supplemental Fig. S2). In addition to these four genes, there are six other genes carrying the DUF579 domain in the Arabidopsis genome. The expression patterns of five of these six genes (the sixth is not represented on the GeneChip) differ from those of the four genes described here. This suggests that the domain itself may not have any functional role in secondary xylem development.

Since the identified genes are highly co-regulated spatially and developmentally with regard to secondary xylem differentiation, we attempted to identify any cis-regulatory element(s) common to the promoter regions of the genes by using the “Find Potential Regulatory Sequences” function of the GeneSpring software. A novel motif, ACAAAGAA, was found in 13 of the 52 genes, with 14 occurrences in total because the AtMYB103 promoter carries two copies (Table 2). This frequency (14/52 = 0.269) is significantly higher than that of the sequence upstream of all genes in the Arabidopsis genome (0.044). The functional role of this element in the regulation of gene expression during xylem development remains to be investigated. Nine of the 13 genes having the novel cis-motif also had a DNA motif (ACC[A/T]A[A/C]C) called the “AC element” (Table 2). AC elements are known to drive xylem gene expression (Hatton et al. 1995; Patzlaff et al. 2003b). However, their functional role in secondary xylem formation is not known.
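
A promoter scan of this kind can be sketched with regular expressions. The sketch below is a simplified illustration and not GeneSpring's algorithm: it searches only the forward strand, takes the window from 10 to 500 bases upstream of each gene, and computes the frequency as occurrences per gene (as in 14/52). The degenerate AC element ACC[A/T]A[A/C]C is written as the pattern `ACC[AT]A[AC]C`. All promoter sequences, gene names, and function names are hypothetical.

```python
import re

# Lookaheads report overlapping occurrences as well.
NOVEL_MOTIF = re.compile(r"(?=ACAAAGAA)")
AC_ELEMENT = re.compile(r"(?=ACC[AT]A[AC]C)")

def upstream_window(promoter, start=10, end=500):
    """Bases from `end` to `start` upstream; the promoter string ends at position -1."""
    n = len(promoter)
    return promoter[max(0, n - end): max(0, n - start + 1)]

def count_sites(pattern, promoter):
    """Number of pattern occurrences inside the upstream window."""
    return sum(1 for _ in pattern.finditer(upstream_window(promoter.upper())))

def motif_frequency(pattern, promoters):
    """Occurrences per gene across a promoter set."""
    return sum(count_sites(pattern, p) for p in promoters.values()) / len(promoters)

# Hypothetical promoters (5'->3', ending just before the gene), for illustration only.
promoters = {
    "gene_a": "T" * 100 + "ACAAAGAA" + "T" * 50,
    "gene_b": "G" * 30 + "ACAAAGAA" + "G" * 20 + "ACAAAGAA" + "G" * 40,
    "gene_c": "C" * 120,
    "gene_d": "C" * 60 + "ACCAACC" + "C" * 30,
}

print(motif_frequency(NOVEL_MOTIF, promoters))
print({g: count_sites(AC_ELEMENT, p) for g, p in promoters.items()})
```

Counting occurrences rather than genes explains how a gene with two copies of the motif raises the numerator above the number of motif-containing genes.
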
Table 2

Putative novel cis-element motif identified from the promoter regions of the final gene set

Cis-element (a): ACAAAGAA

Frequency: 14/52 (26.9%)

Validation (b):
Relative to upstream of other genes: P value, 0.064; random rate, 2.23%; observed rate, 4.44%; single P, 2.52E−07
Relative to whole genome: P value, 0.0148; random rate, 2.23%; observed rate, 3.90%; single P, 5.69E−08

Genes:

| Location | AGI | Putative functions | AC elements (c) |
| −430 | At1g24030 | Putative protein kinase | 4 |
| −292 | At1g27920 | Microtubule associated protein | 1 |
| −437 | At1g33800 | DUF579-3, unknown protein | No |
| −430, −167 | At1g63910 | AtMYB103, myb transcription factor | No |
| −134 | At3g15050 | Putative calmodulin-binding protein | No |
| −331 | At3g18660 | GT family 8, putative glycogenin glucosyltransferase | 2 |
| −31 | At4g17220 | Unknown protein | 2 |
| −287 | At4g18780 | AtCesA08, cellulose synthase subunit (IRX1) | 1 |
| −391 | At4g28380 | Leucine-rich repeat family protein | 1 |
| −132 | At4g33330 | GT family 8, putative glycogenin glucosyltransferase | No |
| −302 | At4g33450 | AtMYB69, myb transcription factor | 3 |
| −27 | At5g54690 | GT family 8 protein, unknown protein | 3 |
| −289 | At5g61340 | Unknown protein | 2 |

Promoter region (upstream) means from 10 to 500 bases upstream of the gene

(a, b) Motif search and validation were done using GeneSpring software. The frequency (26.9%) was compared to the frequency (4.44%) of that sequence upstream of other ORFs in the Arabidopsis thaliana genome. If the distribution of bases were random, the sequence would be expected upstream of 0.0223 of the genes. The probability that this particular sequence is that common due to chance is 2.52e-07 (Single P). However, since 262,144 tests were done, the false positive probability is really 0.064

(c) AC elements (ACC[A/T]A[A/C]C) were found by using “Pattern Matching” (http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl) in the promoter region of the genes (upstream 1,000 bases)
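
The relation between the "Single P" values and the corrected P values reported in Table 2 appears consistent with treating the 262,144 tests as independent, so that the chance of at least one false positive is 1 − (1 − p)^n. This is an inference from the reported numbers rather than a documented description of the software's procedure, and it assumes that the same number of tests applies to both comparisons. A minimal check in Python (the function name is hypothetical):

```python
import math

def familywise_probability(single_p, n_tests):
    """P(at least one of n independent tests reaches single_p by chance).

    Computed as 1 - (1 - p)^n using log1p/expm1 for numerical stability.
    """
    return -math.expm1(n_tests * math.log1p(-single_p))

N_TESTS = 262_144  # number of tests stated in the Table 2 footnote

vs_upstream = familywise_probability(2.52e-07, N_TESTS)  # upstream of other genes
vs_genome = familywise_probability(5.69e-08, N_TESTS)    # whole genome

print(round(vs_upstream, 3), round(vs_genome, 4))  # 0.064 0.0148
```

Both rounded values match the P values listed in Table 2, which supports this reading of how the correction was applied.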

Promoter testing and functional/expressional analysis

NAC proteins are known as one of the largest families of plant-specific transcription factors (for a recent review, see Olsen et al. 2005) and their biological functions in xylem cell differentiation have been suggested (Kubo et al. 2005; Zhao et al. 2005; Mitsuda et al. 2005). We used ANAC012 and ANAC073 promoter::GUS analyses to experimentally test the spatial and temporal expression pattern predicted to be common to the 52 core xylem genes. Both genes are members of the NAC transcription factor family. Promoter regions of the ANAC012 and ANAC073 genes were transcriptionally fused with the GUS gene and introduced into Arabidopsis plants. GUS activity driven by the ANAC073 promoter was observed throughout the vascular system in young seedlings (Fig. 3d). Within the vascular tissue in adult plants, GUS staining was localized to the xylem cells of the stem and root (Fig. 3a–c). GUS staining resulting from ANAC012 promoter activity was strong in the vascular bundles and the apical meristem of young seedlings (Fig. 3h). However, in adult plants, GUS expression was detected specifically in the (pro)cambium and xylem parenchyma cells of stem and root (Fig. 3e–g). The GUS activities directed by both the ANAC012 and ANAC073 promoters were consistent with results from the “digital northern” (e.g., stem-preferential expression) and “digital in situ” (e.g., xylem tissue-preferential expression) analyses.
Fig. 3

Expression of ANAC012 and ANAC073 genes in planta. Histochemical localization of GUS expression in transgenic Arabidopsis plants carrying the ANAC012 and ANAC073 promoter::GUS fusion genes. a–d ANAC073 promoter-GUS expression: a Rosette-level stem cross-section. b Stem cross-section at 10 cm above rosette-level. c Cross-section of root. d Eight-day-old seedling. e–h ANAC012 promoter-GUS expression: e Rosette-level stem cross-section. f Stem cross-section below rosette-level. g Cross-section of root. h Eight-day-old seedling. Scale bars represent 0.5 mm

To establish their functional roles, we produced several transgenic Arabidopsis plants over-expressing candidate genes (i.e., uncharacterized genes from the core xylem gene set). Among them, our preliminary observations indicate that over-expression of ANAC012 induces dramatic changes in xylem tissue composition compared to wild-type plants, which is consistent with its expression pattern (Ko et al. 2006, in preparation). Recently, another pair of NAC transcription factor genes in our list (Table 1), ANAC043 and ANAC066, were suggested to be transcriptional regulators of cell wall thickening (NST1 and NST2; Mitsuda et al. 2005).

We carried out a series of cDNA–AFLP (Amplified Fragment Length Polymorphism)-based transcriptome analyses on the vertical stem segments of a hybrid aspen (Populus tremula L. × P. alba L.), which represent different developmental stages with regard to secondary growth: exclusively primary growth at the top, the primary-to-secondary growth transition stage, and secondary growth at the bottom. Additional transcription profiles were obtained from bark, xylem, and mature leaf without midvein. Profiles from leaves without midveins were used as indicators of genes that are not associated with secondary growth. Of the more than 76,800 transcript-derived fragments (TDFs) analyzed, 271 TDFs were selected and sequenced based on their differential expression patterns correlated with secondary growth (Prassinos et al. 2005). Populus homologs (E value less than 1.0 × 10⁻¹⁰ in Blast search) to 16 (>30%) of the 52 Arabidopsis xylem genes were noted among these poplar TDFs. All of the 16 genes were upregulated in the wood-forming stems and specifically accumulated in the xylem cells (Fig. 4).
Fig. 4

cDNA–AFLP expression patterns of Populus homologs to the selected gene set. PG primary growth, TS transition stage, SG secondary growth. Arabidopsis genes homologous to the Populus TDFs are indicated on the right side. At4g18780, AtCesA08, cellulose synthase catalytic subunit (IRX1); At5g03170, FLA11, fasciclin-like arabinogalactan-protein; At5g17420, AtCesA07, cellulose synthase catalytic subunit (IRX3); At5g44030, AtCesA04, cellulose synthase catalytic subunit (IRX5); At5g40020, Thaumatin-like protein; At1g20160, subtilisin-like serine protease; At1g19300, glycosyltransferase family 8, PARVUS; At3g18660, glycosyltransferase family 8, glycogenin glucosyltransferase-related; At5g54690, glycosyltransferase family 8 protein; At1g17950, AtMYB52, myb family transcription factor; At2g46770, ANAC043, NAC family transcription factor; At5g05390, Putative laccase (diphenol oxidase); At1g79620, Leucine-rich repeat receptor protein kinase; At2g41610, Hypothetical protein; At5g01360, unknown protein; At5g60720, unknown protein. ACP, acyl-carrier protein (GenBank accession No. Y10994), was used as an internal control of ubiquitous expression in all tissues

Comparative transcriptomics with publicly available data

Various bioinformatics approaches were used to test the specificity of the gene discovery strategy described here. We examined cDNA microarray data reporting the expression of 2,995 poplar genes across different developmental stages of xylogenesis (Hertzberg et al. 2001) and of 9,113 Zinnia genes during the transdifferentiation of mesophyll cells into xylem cells (Demura et al. 2002). Twenty and nineteen of the 52 Arabidopsis genes have homologous sequences on the poplar and Zinnia arrays, respectively. Their expression increased dramatically in the secondary wall-forming xylem tissues and tapered off in the tissues undergoing late xylem maturation and programmed cell death (Fig. 5). Eight of the 16 genes identified by cDNA–AFLP analysis (Fig. 4) were also represented in this analysis (Fig. 5a). In addition, we analyzed a total of 102,019 poplar ESTs from 19 different tissues (Schrader et al. 2004; Sterky et al. 2004) through PopulusDB (http://www.populus.db.umu.se/). Populus homologs to 40 of the 52 candidate genes were specifically expressed in the cambium and wood forming tissues (Table 1). We also searched the EST database from cotton fibers, which produce almost pure cellulosic cell walls, using PlantGDB (Plant Genome Database, http://www.plantgdb.org/). Fourteen genes had matching ESTs, including four MYB transcription factors and cellulose synthases (Table 1), implying roles in secondary cell wall biosynthesis. As a “negative filtering” step regarding secondary xylem formation, we compared our final gene set with the 5,133 fern ESTs and 14,703 moss ESTs obtained from the NCBI database. Since fern (Ceratopteris richardii) does not undergo secondary growth and moss (Physcomitrella patens) has a non-vascularized stem, any xylem-specific genes are expected to be underrepresented or absent in these plants. We could not find any homologs of the 52 Arabidopsis genes except cellulose synthases and protein kinases, which are highly conserved in the plant kingdom (data not shown).
Fig. 5

Coordinated transcriptional regulation of the core xylem gene set in the differentiation of secondary wall-forming xylem. a Wood forming tissues of poplar. Differential expression of 20 genes from the final gene set, which have matching genes (homologs) in the array data of Hertzberg et al. (2001), is depicted as expression ratios (log2 scale over control sample). Phl phloem, A cambial zone including initials and dividing phloem and xylem mother cells, B early expanding xylem, C late expanding xylem, beginning of secondary wall formation in vessels, D secondary wall forming xylem, beginning lignification, E late xylem maturation, contains lignifying cells and cells undergoing programmed cell death. The control sample was an equal mixture of samples A–E (Hertzberg et al. 2001). Error bars indicate standard deviation. b Transdifferentiation of mesophyll cells into xylem cells of Zinnia elegans. Differential expression of 19 genes of the final gene set, homologs to genes in the Zinnia array data (Demura et al. 2002), is depicted. Stage 1, corresponding to the functional dedifferentiation process during which mesophyll cells lose their photosynthetic capacity and acquire a new multidifferentiation potency; stage 2, corresponding to the process of differentiation from procambial initials into the precursors of tracheary element (TE); and stage 3, corresponding to the process of morphogenesis that characterizes TE formation and includes secondary-wall formation and programmed cell death (PCD) (Demura et al. 2002). Asterisks indicate the genes shared between a and b
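As a rough illustration of the quantity plotted in panel a, the sketch below computes log2 expression ratios of a sample over a control and summarizes them by mean and standard deviation. The intensity values are invented, and treating the error bars as the sample standard deviation across replicate log2 ratios is an assumption; the caption does not state how the deviation was calculated.

```python
import math
import statistics


def log2_ratios(sample_values, control_values):
    """Log2 ratio of each sample intensity over its paired control intensity."""
    return [math.log2(s / c) for s, c in zip(sample_values, control_values)]


def summarize(ratios):
    """Mean and sample standard deviation of replicate log2 ratios."""
    return statistics.mean(ratios), statistics.stdev(ratios)


# Hypothetical replicate intensities for one gene in one tissue section.
sample = [800.0, 1600.0, 400.0]
control = [200.0, 200.0, 200.0]

ratios = log2_ratios(sample, control)
mean_ratio, sd_ratio = summarize(ratios)
print(ratios)                 # [2.0, 3.0, 1.0]
print(mean_ratio, sd_ratio)   # 2.0 1.0
```

On this scale a value of 2.0 corresponds to a fourfold increase over the control, and 0 means no change.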

Xylem vessel differentiation-associated expression of the candidate genes in Arabidopsis was tested using the published data of Kubo et al. (2005). They established an in vitro xylem vessel element inducible system from Arabidopsis suspension cells and performed microarray analysis with the Arabidopsis full-genome GeneChip array ATH1 (Affymetrix) over the time course. We found that 35 genes overlapped with their gene sets (sets 3, 8, and 22; Kubo et al. 2005), which showed up-regulated expression when the xylem vessel elements were actively forming (6 days after induction) (Fig. 6).
Fig. 6

Coordinated transcriptional regulation of the core xylem gene set in the in vitro xylem vessel element inducible system of Arabidopsis suspension cells. Gene expression profiles were downloaded from the NASCArrays experiment (http://www.affymetrix.arabidopsis.info/narrays/experimentpage.pl?experimentid=361) and analyzed. In this system, ~50% of subcultured cells of Arabidopsis (ecotype Col-0) differentiate into xylem vessel elements within 7 days in the presence of 1 μM brassinolide and 10 mM boric acid (Kubo et al. 2005). The expression patterns of 35 candidate genes are shown

We also used the “Meta-Analyzer” function of GENEVESTIGATOR to demonstrate that the 52 candidate genes are strongly co-regulated developmentally (“Gene Chronologer”), spatially (“Gene Atlas”), and environmentally (“Response Viewer”) (Supplemental Fig. S3). The result of this analysis also indicates that the expression of our core xylem gene set is not affected by various single environmental signals or exogenous growth regulator treatments, suggesting that they may be tightly regulated by developmental cues.

Discussion

It may be possible to extract novel, meaningful biological information from large-scale comparative analyses of expression data from an organism using an identical platform such as Affymetrix GeneChip arrays (Zimmermann et al. 2004). Here, we have successfully used thousands of publicly available Affymetrix GeneChip array data sets to carry out a global comparative transcriptome analysis for the discovery of the gene network regulating secondary xylem development.

The data analysis has been performed by filtering a large number of genes that are upregulated in the stem undergoing secondary growth through two partially overlapping gene sets. The first filter effectively collected genes having secondary cell wall/wood forming tissue-preferential expression patterns (“digital northern” analysis). The second filter provided a means to identify xylem-specific genes at tissue-type resolution (“digital in situ” analysis). This second filter eliminated secondary phloem genes, epidermis and other nonvascular genes, and any xylem genes that were peculiar to stem or root-hypocotyl xylem (i.e., not core xylem genes). The end result is a short list of genes that are upregulated in secondary xylem development.
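The two-filter logic described above can be sketched as a set intersection. This is only an illustration under stated assumptions: the gene names are placeholders, the function name is hypothetical, and the real analysis applied expression-based criteria to GeneChip data rather than operating on ready-made gene lists.

```python
def core_xylem_genes(stem_upregulated, digital_northern, digital_in_situ):
    """Keep genes upregulated in secondary-growth stems that pass both filters.

    digital_northern: genes with wood forming tissue-preferential expression.
    digital_in_situ:  genes judged xylem-specific at tissue-type resolution.
    """
    return set(stem_upregulated) & set(digital_northern) & set(digital_in_situ)


# Placeholder gene names, invented for illustration only.
stem_upregulated = {"gene_a", "gene_b", "gene_c", "gene_d", "gene_e"}
digital_northern = {"gene_a", "gene_b", "gene_c", "gene_x"}
digital_in_situ = {"gene_b", "gene_c", "gene_e", "gene_y"}

core = core_xylem_genes(stem_upregulated, digital_northern, digital_in_situ)
print(sorted(core))  # ['gene_b', 'gene_c']
```

Because the two filter sets only partially overlap, a gene that passes one filter but not the other (gene_a or gene_e here) is dropped from the final list.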

The validity of this approach to gene discovery was supported by various independent experimental and bioinformatics methods. Promoter activities of selected genes (ANAC012 and ANAC073) from the final gene network were examined in transgenic plants to experimentally demonstrate the predicted expression pattern (Fig. 3). The phenotypes of transgenic plants for candidate genes were relevant to xylem development and secondary cell wall biosynthesis (data not shown). Populus homologs of 16 out of the 52 genes were abundantly expressed in the xylem cells of the stems undergoing secondary growth (Fig. 4). Furthermore, comparative analysis of poplar ESTs derived from different tissues (Schrader et al. 2004; Sterky et al. 2004) showed that Populus homologs of the candidate genes were mainly represented in the cDNA libraries derived from cambium and other vascular tissues (Table 1). Transcriptional co-regulation of the candidate genes across different developmental stages, organs, and environmental stresses was clearly demonstrated by the GENEVESTIGATOR analysis (Supplemental Fig. S3). Examination of publicly available experimental data revealed that the expression of most of the candidate genes strongly coincided with the differentiation of secondary wall-forming xylem in poplar, Z. elegans and Arabidopsis (Table 1; Figs. 5, 6). The validity of the approach described here was further supported by recent reports by Persson et al. (2005) and Brown et al. (2005). Using a similar comparative analysis of publicly available Arabidopsis GeneChip data, they identified secondary cell wall biosynthesis-associated genes, most of which were also represented in our candidate gene list (Table 1).

A consideration of functions predicted for the wood formation genes revealed a surprisingly unique gene network associated with secondary xylem development. The candidate gene list consists of 7 genes involved in signal transduction, 9 in transcriptional regulation, 15 in downstream effectors for cell wall metabolism, and 16 of unknown function. Notably, most of the signal transduction-associated genes encode components of the small GTPase ROP signaling pathway. ROPs are molecular switches involved in multiple developmental processes such as gene expression, H2O2 production, endocytosis, exocytosis, cytokinesis, cell cycle progression, cell differentiation, and cell wall synthesis in various eukaryotic organisms (Settleman 2001; Yang 2002; Vernoud et al. 2003). The common mechanism of action for these ROP GTPases is believed to be in controlling cell growth and morphology by regulating the dynamic assembly of the cytoskeleton. Signal transduction from the cell surface receptors occurs through the interaction of ROP GTPase with its regulatory proteins and downstream effector proteins (Hall 1998; Fu et al. 2002; Chen et al. 2003). A plant ROP GTPase was co-immunoprecipitated with the plant RLK, CLAVATA1 (Trotochaud et al. 1999), suggesting that ROP GTPases may be activated through direct association with plant RLKs (Vernoud et al. 2003). Arabidopsis has six RopGAPs, all of which have an N-terminal Cdc42/Rac-interactive binding (CRIB) motif that is critical for the Rop-specific regulation of GAP activity (Wu et al. 2000). As for downstream effector proteins, Arabidopsis has a class of novel plant proteins, known as ROP-interacting CRIB-motif-containing (RIC) proteins (Wu et al. 2001). The final gene list contained all of the essential components in ROP signaling, such as AtROP7/AtRAC2 (At5g45970), RopGAP5 (At1g08340), RIC2 (At1g27380), and an RLK (At1g79620) that belongs to the same family as CLAVATA1 (PlantsP database, http://www.plantsp.sdsc.edu/).
This suggests that ROP GTPase-mediated signaling may act as a molecular switch for the initiation of secondary growth in Arabidopsis. Previously, two cotton Rop GTPases (GhRac9 and GhRac13) were reported to be highly expressed just before secondary cell wall synthesis (Delmer et al. 1995). Phylogenetic analysis revealed that AtROP7, GhRac9, and GhRac13 were located in the same clade (Nakanomyo et al. 2002). Potikha et al. (1999) reported that an increase in H2O2 production accompanies the transition from primary to secondary wall formation, and that GhRac13 is a regulator of H2O2 production, which in turn stimulates secondary cell wall formation and/or cellulose synthesis. In addition, ZeRAC2, a ROP GTPase of Zinnia elegans, accumulated preferentially in xylem cells and transiently at the time when visible tracheary elements appear (Nakanomyo et al. 2002). We also note the presence of the components of ROP GTPase signaling in a recent report about the genes expressed preferentially in xylem during the wood formation of Eucalyptus (Paux et al. 2004). Very recently, Brembu et al. (2005) demonstrated that AtROP7/AtRAC2 is specifically expressed during late stages of xylem differentiation in Arabidopsis. However, the functional significance of AtROP7 is still elusive. More sophisticated and comprehensive characterization of AtROP7, together with RopGAP5, RIC2, and a CLAVATA1-like RLK, may lead to novel findings regarding the regulation of xylem development.

Thirteen of the 52 candidate genes were “downstream effectors for cell wall metabolism” such as cellulose synthases for secondary cell wall biosynthesis, glycosyltransferases for pectin biosynthesis, and cell wall structural proteins. This group also includes three putative glycosylphosphatidylinositol (GPI)-anchored proteins, FLA11 (fasciclin-like arabinogalactan protein 11), uclacyanin II-like protein and COBL4 (COBRA-like protein 4). GPI-anchored proteins are thought to be targeted to the plant cell surface and involved in extracellular matrix-remodeling, cell adhesion and signaling (Elkins et al. 1990; Sherrier et al. 1999; Borner et al. 2002; Kim et al. 2002). For example, the plasma membrane-associated COBRA is thought to play a regulatory role in cell wall maintenance and/or biosynthesis, probably by influencing either the crystallization or deposition of cellulose microfibrils in the cell wall of expanding root cells (Schindelman et al. 2001; Roudier et al. 2002, 2005). Recently, Li et al. (2003) reported that BC1 (brittle culm1), which has the highest similarity to COBL4, plays an important role in the biosynthesis of the cell walls of mechanical tissues. Also, the secondary cell wall-specific expression of AtFLA11 was confirmed at both the transcript and protein levels (Ito et al. 2005).

NAC proteins are known as one of the largest families of plant-specific transcription factors (for a recent review, see Olsen et al. 2005). Their biological functions regarding xylem cell differentiation are newly emerging. For example, VASCULAR-RELATED NAC-DOMAIN6 (VND6) and VND7 could induce transdifferentiation of various cells into metaxylem- and protoxylem-like vessel elements, respectively, in Arabidopsis and poplar (Kubo et al. 2005). We have identified four NAC genes (Table 1). Vascular tissue-specific expression of two genes (ANAC012 and ANAC073) was confirmed by promoter-GUS analysis (Fig. 3). Our phenotypic observations of 35S::ANAC012 transgenic plants suggest a functional role for ANAC012 in xylem fiber development (Ko and Han, unpublished data). Furthermore, ANAC043 and ANAC066 were described as regulators of secondary wall thickening (Mitsuda et al. 2005).

The complexity of multicellular organisms requires proper context-dependent expression of genes, which is achieved by highly interconnected transcriptional networks (Zimmermann et al. 2004). Careful documentation of the flux in the transcriptome across various tissue/cell types may reveal significant patterns of gene expression that can lead to the discovery of a gene network regulating certain growth and developmental processes (Segal et al. 2003). In this report, we explored thousands of publicly available Arabidopsis GeneChip array data sets to identify candidate genes involved in the gene network underlying secondary growth, and more specifically xylem differentiation. The expression pattern of the resulting candidate genes was confirmed by various independent methods. While understanding the molecular mechanisms of secondary growth at the systems level may require the exploitation of different types of data such as gene expression, protein abundance, protein interaction, and metabolite abundance (Troyanskaya et al. 2003), the results described in this report provide novel insight that can lead to many testable hypotheses for unraveling this understudied aspect of plant development, the regulation of xylem differentiation and function.

Acknowledgments

We thank John Ohlrogge at Michigan State University for the stem epidermis GeneChip data, and NASCArrays for various Affymetrix GeneChip data. This work was supported by the USDA CSREES (grant nos. 01-34158-11222 and 2002-34158-11914 to K.H.H.), the National Science Foundation (grant no. IBN-0131386 to E.P.B.) and the Department of Energy (grant no. DE-FG02-04ER15627 to E.P.B.).

Copyright information

© Springer-Verlag 2006