Background

Eucalyptus tree species are an extremely important source of hardwood for forest industries worldwide, and Eucalyptus is the most widely planted hardwood genus in the temperate, sub-tropical and tropical zones. In South Africa, about 1.26 million ha are under Eucalyptus plantations, accounting for 37.7% of the total forest plantation area [1]. Eucalyptus grandis is the most commonly planted species in these plantations. The most serious problem affecting wood quality and product yield of South African Eucalyptus trees is the high level of growth stress that develops as the trees grow, which manifests itself in severe splitting when the trees are felled and cut into logs [2]. Molecular markers linked to wood splitting in E. grandis were developed by Barros et al. [3] and were successfully used to select non-splitting clones as part of a marker-assisted breeding programme. However, no genes underlying the differential response of E. grandis trees to wood end splitting and growth stress were identified. Growth stress arises from the deposition of lignin within the secondary walls during the maturation of fibrous cells, a process that also includes the biosynthesis of polysaccharides and cell wall proteins [4, 5]. The genes and genetic mechanisms that underlie growth stress are of particular interest in E. grandis because of the potential for identifying trees with desirable wood properties.

A variety of molecular techniques are available to identify differentially expressed genes. cDNA-AFLP has been used successfully to identify a wide range of candidate genes, including genes with possible roles in plant defense responses [6], fruit ripening [7] and cell wall biosynthesis in Eucalyptus [8]. More recently, expressed sequence tag (EST) sequencing has proved to be an efficient approach for identifying gene types and novel genes involved in wood formation. Using this technique, Allona et al. [9] found a significant representation of cellulose, lignin and other cell wall biosynthesis genes and a comparable percentage of ESTs. Pavy et al. [10] identified a total of 260 differentially expressed gene sequences, as well as the gene encoding the Smad4 interacting factor, by statistical analysis of ESTs belonging to the TIGR Pinus Gene Index. In a similar experiment, Rasmussen-Poblete et al. [11] described transcription factor families such as the AUX/IAA (auxin/indole-3-acetic acid) family, MYB 9 and HD-containing domains (zinc finger proteins and homeodomain-leucine zipper) that regulate genes participating in xylem development and secondary cell wall formation (lignin and cellulose biosynthesis) [12, 13].

The accumulation of large EST libraries has allowed the construction of high-throughput cDNA microarray chips [5], which have been used to study gene expression in tree species such as Eucalyptus [14–16], pine [17, 18], poplar [19] and aspen [20]. Demura et al. [21] used cDNA microarrays to identify clusters of Zinnia elegans genes that punctuate the major morphological and biochemical events of the transdifferentiation of tracheary elements [8]. Further evidence of the identity of major genes involved in wood formation was gained in a separate study by Foucart et al. [22], who established a portfolio of Eucalyptus xylem genes. Microarrays can also be used to assay DNA sequence variation among phenotypes, reducing the genotyping effort and producing quantitative raw data that can then be converted into discrete genotypes; this approach has been used in several studies [23–26].

A strategy that combines high-throughput microarray expression profiling with genotyping offers the opportunity to explore gene expression in a tree that has not been completely characterized at the molecular level. The aim of this study was to develop a prototype microarray chip from differentiating xylem tissue of E. grandis trees differing in their splitting characteristics and lignin contents. cDNA-AFLP and cDNA microarray analyses were used to identify individual E. grandis trees exhibiting preferred wood qualities and to identify differentially expressed genes underlying different aspects of wood development that could help elucidate wood splitting. This study also evaluates the potential of combining expression analysis with fingerprinting analysis for the early detection of E. grandis trees that are prone to severe splitting. Trees identified as being prone to splitting could be excluded from breeding populations, thereby adding value to plantation forestry.

Results

RNA and cDNA quality

The RNA extracted from wood-forming tissue of the seven E. grandis trees was of high quality, and the absence of contaminating genomic DNA was confirmed for all cDNA samples (results not shown). Amplification of a region of the CAD2 gene from cDNA yielded the expected 410 bp mRNA-derived amplicon, which was clearly distinguishable from the 700 bp intron-containing fragment derived from genomic DNA (results not shown).

Assembly of clones from cDNA library and cDNA-AFLP

cDNA libraries were constructed from RNA extracted from the seven E. grandis trees, and a total of 810 cDNA clones were arrayed onto a microarray slide for transcript profiling of the trees. cDNA-AFLP clones were also generated from the seven trees, and selective amplification using the single +3 MseI/+2 PstI primer combination showed highly variable expression levels among the trees. Amplified fragments ranged in size from 100 bp to over 700 bp when visualized on polyacrylamide gels. These fragments were cloned and spotted onto the same microarray slide. In total, 768 cDNA-AFLP clones were spotted for use in the identification, fingerprinting and expression profiling of E. grandis trees.

Analysis of the combined array

The combined microarray, containing a total of 1578 clones, was hybridized with cDNA from the seven E. grandis trees. Hybridization profiles showed that 193 (12.6%) cDNA-AFLP clones and 206 (13.4%) cDNA library clones were differentially expressed. Both approaches therefore yielded a similar number of differentially expressed clones, suggesting that the two techniques are equally useful for expression profiling.

General expression patterns of the combined array

A 1578-probe prototype cDNA microarray was constructed by arraying selective amplifications (Mse3/Pst4) of 768 cDNA-AFLP fragments and 810 cDNA library clones from seven individual Eucalyptus trees onto silanized glass slides. The cDNA profiles were clustered according to their expression patterns using Pearson's correlation in the Cluster program of Eisen et al. [27] and are shown in Fig. 1A. Based on the clustering, ten groups of co-expressed genes could be distinguished (Fig. 1B). Clusters 3 and 4 contain genes that were up-regulated in the high lignin tree and the two high splitting trees. Most transcripts represented in these clusters are involved in cell wall biogenesis and include glucuronic acid decarboxylase 3 (UXS3), xyloglucan endotransglycosylase (XET) and caffeoyl-CoA 3-O-methyltransferase (CCoAOMT). Cluster 10 likewise contains genes that were up-regulated in the high lignin tree and the two high splitting trees; most transcripts in this cluster are associated with stress/defense. One transcript belonging to cluster 2, a putative zinc finger protein, was up-regulated in one low lignin tree and in the two low splitting trees. This gene also belongs to the stress group. In general, stress-related genes were mostly up-regulated in the high lignin and high splitting trees. A similar pattern was observed for transcripts associated with lignin biosynthesis, which were up-regulated in both the high splitting trees and the high lignin tree.

Figure 1

Hierarchical clustering of expression patterns. A. Hierarchical clustering of the 1578 expression profiles from the combined cDNA array. Relative expression levels (median-centered and standardized values) are represented by a continuum, with green signifying relatively low expression of the ESTs, black indicating moderate expression, and red indicating high expression (relative up-regulation) in the respective trees. Rows correspond to the quantified ESTs and columns to the respective E. grandis trees. LS: low splitting, HS: high splitting, H: high lignin, L: low lignin; all samples were collected from the lower half of the stem. The transcripts have been divided into 10 expression pattern clusters, indicated by the numbers 1–10. B. Graphs of the expression pattern clusters. Vertical axes in B represent the standard deviation from the median expression level of each gene. The location of each cluster is indicated in A.

The combined microarray was further analyzed by determining the statistical significance of changes in transcript abundance using the method described by Wolfinger et al. [28]. Eighty clones were found to be distinctly differentially expressed, with a fold change of at least 1.4 or -1.4 and a p value below 0.01. These clones, representing different gene clusters, were isolated and functionally classified according to the MIPS standard [29]. Of these expressed sequence tags (ESTs), 10% were involved in cell wall biogenesis, 4% in cell growth, 4% in protein metabolism, 2% in transcription, 1% in energy, 5% in metabolism, 4% in signal transduction and 2% in stress (Additional file 1). A high proportion of the ESTs (67%) were classified as either having unknown function (21%) or producing no BLAST hits (46%). At least ten cDNA clones from each distinct group were sequenced to obtain more precise information on their potential functions. The sequenced fragments fitted into the broad classification of similarity-inferred EST identities based on BLASTX results. The putative functions of the sequenced fragments are listed in Table 1.
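The selection criterion above combines a fold-change cut-off with a significance cut-off. The sketch below illustrates only that final filtering step, assuming that each clone already has a linear expression ratio (tree versus reference) and a p value from the Wolfinger et al. mixed-model analysis, which is not reimplemented here. It reads a fold change of -1.4 as a ratio of 1/1.4 and treats the p value threshold as strict (p < 0.01); both readings are assumptions. The function names and example clone values are hypothetical.

```python
import math


def signed_fold_change(ratio: float) -> float:
    """Express a linear ratio in the signed convention (e.g. 0.5 -> -2.0)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


def is_differential(ratio: float, p_value: float,
                    min_fold: float = 1.4, alpha: float = 0.01) -> bool:
    """True if a clone passes both the fold-change and p-value cut-offs.

    Up- and down-regulation are treated symmetrically on the log2 scale,
    so a ratio of 1/1.4 counts as a fold change of -1.4.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    passes_fold = abs(math.log2(ratio)) >= math.log2(min_fold)
    return passes_fold and p_value < alpha


# Hypothetical clones: (linear ratio versus reference, p value).
clones = {
    "clone_A": (2.1, 0.003),
    "clone_B": (1.2, 0.001),
    "clone_C": (0.5, 0.008),
    "clone_D": (1.8, 0.04),
}
selected = [name for name, (r, p) in clones.items() if is_differential(r, p)]
```

With these made-up values, clone_A (up-regulated) and clone_C (down-regulated two-fold) pass, while clone_B fails the fold-change cut-off and clone_D fails the significance cut-off. Working on the log2 scale keeps the treatment of up- and down-regulated clones symmetric.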

Table 1 Differentially expressed transcripts of seven E. grandis trees and their putative functions

Most ESTs that were up-regulated in the low splitting trees appear to be involved in metabolism and transcription. A D-isomer specific 2-hydroxyacid dehydrogenase with oxidoreductase activity, a 26S ribosomal RNA, a putative glycyl tRNA synthetase and a splicing factor Prp8 were identified in these groups. Among transcripts with an assigned function, the largest category was cell wall biogenesis, which includes glucuronic acid decarboxylase 3 (UXS3), xyloglucan endotransglycosylase (XET) and caffeoyl-CoA 3-O-methyltransferase (CCoAOMT). All three transcripts are known to play a fundamental role in regulating cell wall architecture and mechanical strength [30]. A second group of candidate transcripts that were up-regulated in the high lignin and high splitting trees were associated with stress/defense-related functions. This group contains transcripts such as a leucine-rich repeat protein, a lipase family protein and an antigen. Only one transcript in this group, a putative zinc finger protein, was up-regulated in one low lignin tree and in the two low splitting trees. All the transcripts associated with stress/defense have been reported to be strongly expressed in response to stress during secondary cell wall synthesis [31].

Quantitative expression analysis of cDNA-AFLP clones

Expression data for the cDNA-AFLP clones were obtained by scoring the presence or absence of microarray markers from the hybridization patterns of the 768 E. grandis clones among the trees. Fragments on the array ranged in size from 100 bp to over 700 bp. Of the 768 clones spotted onto the slide, 133 (17.3%) were polymorphic among the trees. The analysis was limited to spots for which a clear threshold (a difference of 0.5 in relative intensity between two intensity classes) could be assigned; such spots could easily be converted into binary scores (see Additional file 2). A unique microarray pattern was obtained for each Eucalyptus tree. Hybridization profiles from individual trees were 96% identical to those obtained in replicate reverse-labelling reactions.
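The conversion of spot intensities into binary scores can be sketched as follows. This is an illustration under assumptions, not the Lezar et al. procedure: it assumes each spot has one normalized relative intensity per tree, splits the trees at the largest gap between sorted intensities, and scores the spot only if that gap is at least 0.5 (the threshold stated above). Spots without such a gap are treated as monomorphic or ambiguous and left unscored. A helper then computes the percentage agreement between an original and a replicate profile, the quantity reported as 96%. All function names and example values are hypothetical.

```python
from typing import List, Optional, Sequence


def score_spot(intensities: Sequence[float],
               min_gap: float = 0.5) -> Optional[List[int]]:
    """Convert one spot's intensities across trees into 0/1 calls.

    Returns None when no gap of at least `min_gap` separates two
    intensity classes (monomorphic or ambiguous spots are not scored).
    """
    if len(intensities) < 2:
        return None
    ordered = sorted(intensities)
    # (gap size, index of the lower value) for each pair of neighbours.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    largest_gap, idx = max(gaps)
    if largest_gap < min_gap:
        return None
    threshold = (ordered[idx] + ordered[idx + 1]) / 2
    return [1 if x > threshold else 0 for x in intensities]


def percent_identical(calls_a: Sequence[int], calls_b: Sequence[int]) -> float:
    """Percentage of binary calls shared by two replicate profiles."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return 100.0 * matches / len(calls_a)
```

For a hypothetical spot with intensities `[0.1, 0.2, 0.9, 1.0, 0.15, 0.95, 0.05]` across the seven trees, the largest gap (0.2 to 0.9) exceeds 0.5, so the spot is scored `[0, 0, 1, 1, 0, 1, 0]`. A spot such as `[0.4, 0.5, 0.6, 0.7]` has no clear threshold and returns None.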

qRT-PCR verification

The combined cDNA array analysis identified differentially expressed transcripts across the seven E. grandis trees. To verify the accuracy of the cDNA microarray quantification, real-time PCR was performed on five ESTs representing different expression clusters: UDP-glucuronic acid decarboxylase 3 (UXS3), histidine-containing phosphotransfer protein 2 (hpt2), D-isomer specific 2-hydroxyacid dehydrogenase, caffeoyl-CoA 3-O-methyltransferase (CCoAOMT) and protein-L-isoaspartate O-methyltransferase. Because microarray analysis suggested that ribosomal RNA was expressed constitutively, it was used as a control for normalizing the real-time PCR data. The trends and patterns of expression were consistent between the two methods (Table 2). The higher fold changes detected by qRT-PCR were expected, because qRT-PCR has a wider dynamic range than microarray hybridization, which tends to compress expression ratios.
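"Consistent trends" between the two platforms can be made concrete by checking whether each change points in the same direction, regardless of magnitude. The sketch below assumes that both platforms report log2 ratios relative to the same reference for the same comparisons; the function name and example values are hypothetical and are not taken from Table 2.

```python
from typing import Sequence


def trend_agreement(array_log2: Sequence[float],
                    qpcr_log2: Sequence[float]) -> float:
    """Fraction of comparisons whose direction of change agrees.

    A comparison agrees when both platforms call it up-regulated
    (log2 ratio > 0) or both call it not up-regulated.
    """
    if len(array_log2) != len(qpcr_log2) or not array_log2:
        raise ValueError("need two non-empty series of equal length")
    agree = sum((a > 0) == (q > 0) for a, q in zip(array_log2, qpcr_log2))
    return agree / len(array_log2)


# Hypothetical log2 ratios for one gene in three trees.
microarray = [0.8, -0.6, 1.1]
qrt_pcr = [2.0, -1.5, 2.7]
agreement = trend_agreement(microarray, qrt_pcr)
```

Here the directions agree in all three comparisons even though the qRT-PCR magnitudes are larger, which is the pattern expected if the array compresses ratios.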

Table 2 Verification of array results

Discussion

Identification of superior E. grandis trees that are not prone to growth stress is essential for maximising the effectiveness of plantations and adding value to the forestry industry. The genes and genetic mechanisms that underlie growth stress are of particular interest in E. grandis because they could enable the identification of trees prone to severe splitting, which could then be excluded from breeding populations. In this study, a 1578-probe cDNA microarray was developed for both genotyping and expression profiling using seven different E. grandis trees. The combined microarray makes it possible to discriminate between individual trees and to analyze transcript abundance and variability.

For transcript profiling and genotyping, a 1578-probe prototype cDNA microarray was constructed by arraying 768 cDNA-AFLP fragments and 810 cDNA library clones from seven Eucalyptus trees onto silanized glass slides. This provided an overview of transcript abundance and variability, and of the usefulness of the chip for fingerprinting. Analysis of the cDNA clones suggested that a significant proportion of the genes expressed in the wood-forming tissues of Eucalyptus are strongly up- or down-regulated. The high variability in gene expression patterns indicates that the sampling strategy successfully captured differentiating xylem tissue from the seven E. grandis trees, and it reflects the extent to which the tissues of different tree phenotypes differ in function, biochemistry and morphology.

Clustering of expression profiles allowed the identification of distinct groups of co-expressed genes, which may contain genes involved in the main metabolic or developmental processes occurring during tissue differentiation. A total of 80 differentially expressed transcripts representing different gene clusters were isolated and characterized. Ten percent of the differentially abundant transcripts were identified as having roles in cell wall biogenesis. Two of them, glucuronic acid decarboxylase 3 (UXS3) and xyloglucan endotransglycosylase (XET), were up-regulated in the high lignin tree and in the two high splitting trees. The increased expression of UXS3 in the high lignin tree was expected, because in Arabidopsis this gene encodes an enzyme that supplies UDP-xylose, a precursor for xylan synthesis [32, 33]. Xylan is a component of hemicellulose, and an increase in xylan is expected to be accompanied by increased lignin. The up-regulation of UXS3 in the two high splitting trees could reflect the involvement of this gene in cell wall biosynthesis, in particular in the organization of cellulose, hemicellulose and lignin in cell walls, which determines the mechanical strength of the cell wall. The XET gene, which is similar to the Arabidopsis XET protein XTH9 [34] and to the poplar XET gene PttXET16A [35], has been shown to play a fundamental role in the construction and modification of cell wall architecture. Nishikubo et al. [36] observed that XET is involved in the repair of xyloglucan cross-links, creating and reinforcing the connections between the primary and secondary cell wall layers. Since XET was up-regulated in the two high splitting trees, it may play a role in wood splitting and growth stress.
Growth stress originates in the cambial region of the stem during the maturation of cells, where the contraction of cellulose molecules during lignin deposition is a contributing factor [37]. High splitting trees are thought to have elevated levels of growth stress, which may explain the higher expression of XET in these trees and would be consistent with greater activity of this gene in expanding the cell wall during secondary cell wall thickening. In the standing tree the growth stresses are in equilibrium, but as soon as the tree is felled and this balance is disturbed, log deformations and splits occur. Equally interesting is the expression pattern of genes encoding important enzymes of lignin biosynthesis, such as caffeoyl-CoA 3-O-methyltransferase (CCoAOMT). This gene is directly associated with lignin biosynthesis [38] and has been characterized in tobacco [39] and poplar [40]. Paux et al. [41] reported that CCoAOMT was up-regulated in Eucalyptus gunnii xylem and was involved in cell wall formation. The up-regulation of CCoAOMT in the high lignin tree was expected, given its role in lignin biosynthesis. Its up-regulation in the high splitting trees suggests that this gene may respond to signaling mechanisms that trigger a stress-related compensatory deposition of lignin.

The second largest group of candidate transcripts was associated with stress/defense-related functions. Most of the transcripts in this group were up-regulated in the high lignin tree and in the two high splitting trees, suggesting that cells in the xylem layer could play a role in protecting the cambium under stress conditions. Only one stress-associated transcript, a putative zinc finger protein, was up-regulated in the two low splitting trees and in a low lignin tree. Zinc finger proteins have been speculated to interact with cellulose [42] and are strongly expressed in response to gravitational stress during secondary cell wall synthesis [15, 16, 31]. The up-regulation of the gene coding for this protein in the low lignin and low splitting trees cannot be explained at this stage.

Several studies in forest trees have reported high proportions of sequences lacking similarity to any known proteins [9, 43–46], and similar results were obtained in this study: many transcripts showed no significant homology to publicly available sequences. A high proportion of the cDNA clones (67%) were classified either as encoding proteins of unknown function (21%) or as producing no BLAST hits (46%). These genes of unknown function may represent transcripts that are highly and specifically expressed in wood-forming tissues. They are a source of novel genes whose functions should be characterized in future studies to determine their roles in secondary xylem formation, and they represent an important source of candidate genes for improving wood quality in E. grandis.

Another important aim of this study was the development of a combined microarray for the characterization and genotyping of E. grandis trees for future breeding programmes. The high variability in gene expression patterns observed among the seven individual trees representing the four phenotypes provided a starting point for the clustering of the 768 cDNA-AFLP clones. In this context, direct comparison of signal intensity profiles suggests that the cDNA chip developed will allow genome-wide fingerprinting of the seven E. grandis genomes, since a unique microarray pattern was obtained for each individual tree. Some of the genes preferentially and/or specifically expressed in Eucalyptus cambium exhibited a distinctive expression pattern, which could be related to the bimodal distribution of the expression patterns.

Conclusion

A new microarray prototype was constructed that combines expression profiling and genotyping of E. grandis trees. This provides a tool for the identification and characterization of trees with superior qualities in breeding programmes. Furthermore, expression-level analysis gave a perspective on the types of genes active in wood-forming tissues, while genotyping allowed the identification of individual trees. The genetic markers identified in this study, in the form of genes that are either up- or down-regulated in the four different phenotypes, could be used to develop gene-specific markers. The long-term objective of this study is to use the combined microarray to identify individual trees prone to splitting and to identify novel genes targeted to specific pathways. Novel genes for which no function has yet been assigned may hold the key to a better understanding of the developmental processes and biochemical pathways that underlie wood formation, and could be a source of candidate genes for improving wood quality in E. grandis.

Methods

Plant materials and tissue harvesting

Differentiating xylem tissue samples were collected from each of seven 4-year-old coppice re-growth E. grandis trees belonging to two unrelated, open-pollinated trials, called the 'Florida' and the 'Frankfort' trials. The 'Florida' trial was established from seed imported from Florida, USA, and the 'Frankfort' trial was established from South African plantation trees. The E. grandis trees were originally planted in 1979 and felled in 1999. All trees were characterized for their splitting qualities and lignin content as described by Turner [47]. The seven trees that best corresponded to the two selected traits were used in this study and are shown in Additional file 3 along with the trait for which they were selected. Two low lignin trees and one high lignin tree were selected from the 'Florida' trial, and two high splitting and two low splitting trees were selected from the 'Frankfort' trial. For total RNA extraction, a section of the coppice stem was progressively debarked, and the exposed xylogenic tissue was scraped, immediately frozen and stored at -80°C.

Total RNA extraction, quality control and cDNA synthesis

Total RNA was isolated from xylem tissue of seven E. grandis trees (741-H (high lignin), 108-L (low lignin), 243-L (low lignin), 1/23/4-HS (high splitting), 1/71/6-HS (high splitting), 1/91/7-LS (low splitting) and 1/92/7-LS (low splitting)) as described by Chang et al. [48]. The total RNA was treated with DNase (Roche Diagnostics GmbH), and mRNA was purified using an Oligotex® mRNA Mini Kit (QIAGEN, Valencia, CA). RNA concentration was estimated using an ND-1000 Spectrophotometer (NanoDrop USA, Wilmington, DE), and integrity was evaluated on an ethidium bromide-stained agarose gel. Double-stranded cDNA was synthesized from purified RNA using the cDNA Synthesis System (Roche Diagnostics, Mannheim, Germany) according to the manufacturer's protocol and subsequently column-purified using the QIAquick PCR Purification Kit (QIAGEN). The purified cDNA was assayed for genomic DNA contamination by PCR using four separate intron-exon boundary spanning primer pairs: CCR.34-F1 (ACGTTGTGGTGGACGAGTC) and CCR.34-R1 (ACGTATGCCTGGACCGAGT), specific for the E. globulus cinnamoyl CoA reductase (CCR) gene; CCR1.23-F1 (CTTGTTGGAGCGACCTCGAA) and CCR1.23-R1 (ACGTACGCCTGGACCGAGTT), specific for the E. gunnii CCR1 gene; CAD.34-F1 (CTTGCAATTCGGACCAGGA) and CAD.34-R1 (GCTCCAATGCCTCCGTTCT), specific for the E. saligna cinnamyl alcohol dehydrogenase gene; and CAD.45-F1 (TCGCGATGCTTACCTAGTGAG) and CAD.45-R1 (CACGACGAACCTGTACCTGAC), specific for the E. gunnii cinnamyl alcohol dehydrogenase (CAD2) gene. These genes are known to be expressed in wood-forming tissues (Kirst et al. 2001). PCR amplification was performed using Taq DNA polymerase (Roche Diagnostics) at 55°C. Aliquots (5 μl) were removed after 20, 25 and 35 PCR cycles and assayed by agarose gel electrophoresis. The synthesized cDNA was then used for cDNA-AFLP analysis and cDNA library construction.

cDNA-AFLP analysis and library construction

cDNA-AFLP analysis was performed on the seven individual trees as described by Vos et al. [49], with minor modifications. One hundred nanograms of double-stranded cDNA was used as the initial template for restriction digestion with PstI and MseI (KeyGene). For pre-amplification, an MseI primer and a PstI primer without selective nucleotides were combined. The resulting amplification mixtures were diluted 20-fold, and 5 μl was used for the selective amplifications. Twelve MseI primers with two or three selective nucleotides at the 3' end were combined with six PstI primers with two or three selective nucleotides at the 3' end for the cDNA-AFLP analysis. One primer combination (Mse3 and Pst4) was selected for further study because it produced the most polymorphisms and a large number of scorable bands. The adaptors and primers used for cDNA-AFLP analysis are listed in Additional file 4. The cDNA-AFLP fragments obtained by selective amplification were cloned using the pGEM-T Easy Vector System II (Promega, Madison, Wisconsin) following the manufacturer's instructions. Cloned cDNA-AFLP fragments were then amplified with the T7 and SP6 primers (Promega, Madison, Wisconsin) for arraying onto the microarray slide.

cDNA library construction

The cDNA library was constructed from the seven individual E. grandis trees using the pGEM-T Easy Vector System II (Promega, Madison, Wisconsin) following the manufacturer's instructions. cDNA fragments were prepared by restriction-enzyme digestion of cDNA, followed by ligation and transformation into Escherichia coli DH10α host cells. Individual colonies were plated on a grid, followed by vector-specific PCR with the T7 and SP6 primers to verify that only single fragments had been ligated. The cDNA library was stored at -80°C in 96-well microtiter plates in 75 μl of Luria broth and 75 μl of a 50% glycerol solution. Before arraying, the individual clones were amplified using the T7 and SP6 primers (Promega) following the manufacturer's instructions.

Construction of the combined cDNA array

The 1578 cDNA clones used for microarray construction came from two separate libraries: 810 cDNA library clones and 768 cloned cDNA-AFLP fragments. Amplified cDNA and cDNA-AFLP clones were purified using Multiscreen® PCR Purification Plates (Millipore, Molsheim, France) and visualized on a 1% agarose Electro-Fast® Stretch gel (ABgene, Epsom, UK). Purified clones were robotically printed onto silanised glass slides (Amersham Biosciences, Little Chalfont, UK) using an Array Spotter Generation III (Molecular Dynamics, Sunnyvale, CA, USA). Fragments were arrayed in duplicate on each slide at 250 μM. The GUS and bar genes and a fungal rDNA internal transcribed spacer (ITS) fragment were also printed to serve as controls for global normalization. The ITS fragment and the bar gene were printed at concentrations of 50 ng/μl, 100 ng/μl, 150 ng/μl and 200 ng/μl, and water was printed as an additional control.

Hybridization of array slides

Seven E. grandis trees were used for microarray hybridizations. Probe cDNA from individual trees was prepared by restriction-enzyme digestion of cDNA (200 ng per tree), followed by ligation of the restriction fragments to adapters and subsequent amplification following the protocol described above. Amplification products were column-purified using the QIAGEN PCR Purification Kit (QIAGEN, Valencia, CA) according to the manufacturer's instructions. Probe cDNA labelling and hybridization were carried out as described by Lezar et al. [50]. Reactions were spiked with cyanine-labeled controls for the GUS, ITS and bar genes. Slides were scanned with a GenePix™ 4000B scanner (Axon Instruments, Foster City, CA, USA). The mean pixel intensity of each array from the individual hybridizations was quantified with ArrayVision 6.0 software (Imaging Research Inc., Molecular Dynamics, USA). For each hybridization experiment, one technical replicate (an independent labelling reaction) was performed as a reverse-labelling experiment. In addition, the whole experiment was repeated with one biological replicate labeled with Cy5 dye (i.e. three microarray slides were used in total for each sample).

Image acquisition, data processing and statistical analysis

For each spot on the array, local background signal intensities were subtracted. Abnormal spots (e.g. high background, dust, irregularities) were manually flagged for removal, and spots were also removed if the signal intensity of an array feature varied by more than 10% from its duplicate spot. Signal intensities of the remaining duplicate spots were then averaged. A clone was considered to have hybridized to the array if its fluorescence was more than two standard deviations above local background, and spots with a signal-to-noise ratio below two were rejected. Intensity values were normalized across slides by global regression against the spot intensity data for tree 1/23/4-HS, which served as the reference for normalization of all spot intensity data (reference design). The control genes GUS, ITS (200 ng/μl, 100 ng/μl and 50 ng/μl) and bar, printed in duplicate on the array, served as a separate check that data across slides were normalized correctly. The statistical significance of changes in transcript abundance was estimated using the method described by Wolfinger et al. [51]. Only genes with an average fold change of at least 1.4 across biological replicates and a p value below 0.01 were considered to be differentially expressed.
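The spot-level filters and the slide normalization described above can be sketched as follows. This is a simplified illustration, not the ArrayVision workflow actually used: it assumes two duplicate spots per clone, reads the 10% rule as "each duplicate within 10% of the duplicate mean" (the wording used in the Data quality section), and defines the signal-to-noise ratio as net signal divided by the local background standard deviation, so the two-standard-deviation and SNR criteria coincide. Global regression is assumed to be an ordinary least-squares fit on log2 intensities of spots shared with the reference slide (tree 1/23/4-HS). All function names and example numbers are hypothetical.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def process_spot(foreground: Sequence[float], background: Sequence[float],
                 background_sd: float, max_dev: float = 0.10,
                 min_snr: float = 2.0) -> Optional[float]:
    """Background-correct duplicate spots and apply the quality filters.

    Returns the averaged net intensity, or None if the spot is rejected.
    """
    net = [f - b for f, b in zip(foreground, background)]
    avg = mean(net)
    if avg <= 0:
        return None  # no signal above local background
    # Reject if either duplicate deviates from the duplicate mean by > 10%.
    if any(abs(n - avg) > max_dev * avg for n in net):
        return None
    # Net signal in units of background SD (the "two SD above background" rule).
    if avg / background_sd < min_snr:
        return None
    return avg


def fit_global_regression(sample: Sequence[float],
                          reference: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log2(reference) = a + b * log2(sample)."""
    xs = [math.log2(v) for v in sample]
    ys = [math.log2(v) for v in reference]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def normalize(values: Sequence[float], intercept: float,
              slope: float) -> List[float]:
    """Map one slide's intensities onto the reference slide's scale."""
    return [2 ** (intercept + slope * math.log2(v)) for v in values]
```

For example, duplicates with foreground intensities of 1100 and 1080 over a background of 100 (SD 50) pass and average to 990, whereas duplicates of 1500 and 1000 are rejected because they differ by more than 10% of their mean. Fitting on log2 values makes the normalization a multiplicative correction on the original intensity scale.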

For the cDNA-AFLP fragments, normalized signal intensity values were used to identify polymorphic fragments based on the bimodal distribution of their intensity values across slides, as described by Lezar et al. [50]. The polymorphic markers identified were then scored for the absence (0) or presence (1) of the fragment in each of the E. grandis trees. These presence/absence scores were used for cluster analysis of the pairwise genetic distances between the hybridization profiles of individual E. grandis trees, using Spearman correlation and hierarchical clustering (CLUSTER, available at http://rana.lbl.gov). The clustering results were visualized with TreeView [27]. Gene expression patterns were identified by converting normalized data into log2 intensity values. Cluster analysis was performed on the normalized and mean-centered signal intensities using Pearson's correlation in the Cluster program and visualized in TreeView [27] to identify groups with similar expression patterns across the different E. grandis trees.
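The two cluster analyses above share one idea: profiles are compared with a correlation coefficient (Spearman for the binary marker profiles, Pearson for the log2 expression profiles), converted to a distance of 1 - r, and merged hierarchically. The sketch below is a minimal stand-in for the Cluster program, not its implementation. The linkage method is not stated in the text, so average linkage is assumed here; Spearman correlation is computed as the Pearson correlation of tie-averaged ranks, which matters for 0/1 data with many ties. Function names and example profiles are hypothetical.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    return pearson(ranks(x), ranks(y))


Correlation = Callable[[Sequence[float], Sequence[float]], float]


def average_linkage(profiles: Dict[str, Sequence[float]],
                    corr: Correlation) -> List[Tuple[str, str, float]]:
    """Agglomerative clustering with distance = 1 - correlation.

    Returns the merge order as (cluster_a, cluster_b, distance) tuples.
    """
    def dist(a: str, b: str) -> float:
        return 1.0 - corr(profiles[a], profiles[b])

    clusters: Dict[str, List[str]] = {name: [name] for name in profiles}
    merges: List[Tuple[str, str, float]] = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i, ki in enumerate(keys):
            for kj in keys[i + 1:]:
                # Average linkage: mean distance over all member pairs.
                d = mean(dist(a, b) for a in clusters[ki] for b in clusters[kj])
                if best is None or d < best[0]:
                    best = (d, ki, kj)
        d, ki, kj = best
        clusters[f"({ki},{kj})"] = clusters.pop(ki) + clusters.pop(kj)
        merges.append((ki, kj, d))
    return merges
```

Because Pearson correlation is unaffected by subtracting a constant from a profile, mean-centering does not change the Pearson-based tree; it mainly affects how the heat map is displayed. With three hypothetical trees whose binary profiles are `A = B = [1, 0, 1, 1, 0]` and `C = [0, 1, 0, 0, 1]`, `average_linkage(profiles, spearman)` merges A and B first at distance 0, then joins C at distance 2.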

The data discussed above have been deposited in the NCBI Gene Expression Omnibus (GEO) [52] and are accessible through GEO series accession number GSE14707.

Data quality

To assess the reproducibility of the methods used in this study, biological and technical replicates from pools of xylem RNA samples were hybridized onto slides, each carrying the clones in duplicate. Approximately 16 of the 1578 background-corrected spots (about 1.0% of the cDNAs on the slide) had signal intensities that deviated by more than 10% from the mean of the two replicates and were manually removed from subsequent analysis. These inaccuracies in signal intensity can be ascribed to variability in the experimental process, which introduces errors in labelling, array hybridization, signal detection and quantification. Removing these spots yielded reliable and repeatable scores and reduced the number of spots that varied enough to be erroneously classified.

Sequencing and sequence analysis

Following microarray analysis, fragments of interest were re-amplified from the libraries using the SP6 and T7 primers. Amplification products were column-purified using the QIAGEN PCR Purification Kit (QIAGEN, CA) according to the manufacturer's instructions. Sequencing reactions were carried out at Inqaba (South Africa). Single-pass partial sequences were obtained with the universal T7 primer. After manual removal of ambiguous sequences, putative identities were assigned using translated BLAST searches (BLASTX) [53] against the non-redundant protein database of the National Center for Biotechnology Information [54]. E-values below 10⁻³ were considered significant.
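The annotation rule (best BLASTX hit retained only if its E-value is below 10⁻³) can be sketched for tabular BLAST+ output. The sketch assumes results were saved in the default 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the original study does not state which output format was used. Queries whose best hit fails the cut-off, or which have no hit at all, are returned as None, corresponding to the "no hits" class. Function names, query IDs and subject IDs are hypothetical.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

EVALUE_COLUMN = 10  # 0-based index of evalue in default -outfmt 6


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Keep the lowest-E-value subject for each query sequence."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[EVALUE_COLUMN])
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def classify(queries: Sequence[str], best: Dict[str, Tuple[str, float]],
             cutoff: float = 1e-3) -> Dict[str, Optional[str]]:
    """Map each query to its significant best hit, or None if there is none."""
    return {q: (best[q][0] if q in best and best[q][1] < cutoff else None)
            for q in queries}


# Hypothetical BLASTX tabular output for two sequenced ESTs.
blast_lines = [
    "est1\tsubjA\t85.0\t120\t18\t0\t1\t360\t10\t130\t2e-40\t150",
    "est1\tsubjB\t60.0\t100\t40\t1\t1\t300\t5\t105\t1e-10\t60",
    "est2\tsubjC\t40.0\t50\t30\t2\t1\t150\t1\t50\t0.05\t30",
]
annotations = classify(["est1", "est2", "est3"], best_hits(blast_lines))
```

In this made-up example, est1 is annotated with subjA, est2 has a hit that is not significant, and est3 produced no hits; both of the latter map to None.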

Confirmation of expression profiles by qRT-PCR

A subset of five genes was used to verify the microarray results. The five fragments chosen represent varying expression profiles across the E. grandis trees. Primer pairs were designed for UDP-glucuronic acid decarboxylase 3, the hpt2 gene, D-isomer specific 2-hydroxyacid dehydrogenase, a partial cDNA sequence of caffeoyl-CoA 3-O-methyltransferase (CCoAOMT) and protein-L-isoaspartate O-methyltransferase. Because microarray analysis suggested that ribosomal RNA was expressed constitutively, it was used as the reference. Relative transcript abundance was measured with a LightCycler (Roche Diagnostics, Basel, Switzerland) and the LightCycler FastStart DNA MasterPLUS SYBR Green I kit (Roche Diagnostics). PCR reactions were performed in a total volume of 20 μl containing 5 ng of single-stranded cDNA, 1 × LightCycler FastStart DNA MasterPLUS SYBR Green I Master Mix and 1 μM of each primer. A negative control without cDNA template was run with every assay to assess overall amplification specificity. Relative quantification was performed using LightCycler software version 3.5.3 (Roche).
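The text does not say which relative quantification model LightCycler software 3.5.3 applied. As an illustration only, the sketch below uses the widely used comparative Ct (2^-ΔΔCt) method, with ribosomal RNA as the reference gene and one tree as the calibrator. It assumes an amplification efficiency of 2.0 (a doubling per cycle) unless another value is supplied. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float,
                        efficiency: float = 2.0) -> float:
    """Comparative Ct (2^-ddCt) fold change of a target gene.

    The target Ct is first normalized to the reference gene (here rRNA) in
    each tree, then the sample tree is compared with the calibrator tree.
    `efficiency` = 2.0 assumes perfect doubling per PCR cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: target reaches threshold 2 cycles earlier
# (relative to rRNA) in the sample tree than in the calibrator tree.
fold = relative_expression(22.0, 15.0, 24.0, 15.0)
```

In this example ΔΔCt is -2, giving a four-fold higher abundance in the sample tree. The method's assumption of near-equal, near-perfect efficiencies for target and reference amplicons should be checked, for example with standard curves, before fold changes are compared with microarray ratios.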