Generation of a large-scale genomic resource for functional and comparative genomics in Liriodendron tulipifera L.
- Cite this article as:
- Liang, H., Ayyampalayam, S., Wickett, N. et al. Tree Genetics & Genomes (2011) 7: 941. doi:10.1007/s11295-011-0386-2
Liriodendron tulipifera L., a member of Magnoliaceae in the order Magnoliales, has been used extensively as a reference species in studies on plant evolution. However, genomic resources for this tree species are limited. We constructed cDNA libraries from ten different types of tissues: premeiotic flower buds, postmeiotic flower buds, open flowers, developing fruit, terminal buds, leaves, cambium, xylem, roots, and seedlings. EST sequences were generated by either 454 GS FLX or Sanger methods. Assembly of almost 2.4 million sequencing reads from all libraries resulted in 137,923 unigenes (132,905 contigs and 4,599 singletons). About 50% of the unigenes had significant matches to publicly available plant protein sequences, representing a wide variety of putative functions. Approximately 30,000 simple sequence repeats were identified. More than 97% of the cell wall formation genes in the Cell Wall Navigator and the MAIZEWALL databases are represented. The cinnamyl alcohol dehydrogenase (CAD) homologs identified in the L. tulipifera EST dataset showed different expression levels in the ten tissue types included in this study. In particular, LtuCAD1 was found to partially recover the stiffness of the floral stems in Arabidopsis thaliana CAD4 and CAD5 double mutant plants, suggesting a role for LtuCAD1 in lignin biosynthesis. L. tulipifera genes have greater sequence similarity to homologs from other woody angiosperm species than to non-woody model plants. This large-scale genomic resource will be instrumental for gene discovery, cDNA microarray production, and marker-assisted breeding in L. tulipifera, and will strengthen this species' role in comparative studies.
Liriodendron tulipifera L., commonly known as yellow-poplar, tulip tree, or tulip-poplar, is one of only two arborescent species in the genus Liriodendron. Yellow-poplar gained its name due to the uncanny similarity of its wood structure and density to true poplars (Populus species). However, these two species are from distinct evolutionary lineages: yellow-poplar is a member of Magnoliaceae in the order Magnoliales, whereas Populus species are in the core eudicot order Malpighiales. Magnoliaceae flowers usually possess stamens and pistils in a spiral pattern, which is distinct from most other angiosperm species with whorled floral organs and thought to be an ancestral trait for flowering plants (Soltis et al. 2004). Magnoliales and three other orders (Laurales, Piperales, and Canellales) comprise the magnoliids, which, along with Amborellales, Nymphaeales, and Illiciales, form a grade of “basal angiosperm” lineages that contain a wide diversity of floral and growth forms (Qiu et al. 2005; Soltis et al. 2005; Jansen et al. 2007). Among basal angiosperms, Magnoliales are the immediate sister to the species-rich clade including monocots and eudicots with ca. 97% of all angiosperm species (Qiu et al. 2005; Soltis et al. 2005; Jansen et al. 2007; Moore et al. 2007). Its special position in the plant phylogeny and “primitive” floral structure make Liriodendron, along with representatives of other basal angiosperm lineages, an ideal candidate for comparative studies of the evolution of form and process throughout flowering plant history (Wei and Wu 1993; Hunt 1998; Ronse de Craene et al. 2003; Zahn et al. 2005).
In addition to its important phylogenetic position, L. tulipifera has great economic and ecological value. This species is cultivated in many temperate parts of the world for wood production (Hunt 1998) and is one of the recommended species for waste landfill remediation (Kim and Lee 2005). As one of the largest and ornamentally coveted trees in North America, L. tulipifera can attain a height of 61 m with a trunk diameter of up to 152 cm. On good sites (site index = 23 m) in the southern Appalachian mountains, L. tulipifera will grow faster than any associated species (Beck 1990). Compared with other commercially important species, L. tulipifera is remarkably free from damage by insects and diseases, does not require intensive stand management to grow well in dense stands, and is resistant to the damaging effect of metals (such as aluminum) (Klugh and Cumming 2003). The wood of L. tulipifera is commercially valuable and is a raw material source for lumber, furniture, musical instruments, wooden wares, pulp, and many other industries (Moody et al. 1993; Hernandez et al. 1997; Williams and Feist 2004). L. tulipifera is also valued as a nectar source for honey production, as a source of wildlife food (mast), and as a large shade tree in urban settings. In addition, chemical extracts from L. tulipifera wood or leaves have proven useful for a variety of purposes, including anti-tumor effects and antifeeding activity for herbivores (sesquiterpenes) (Moon et al. 2007) and antimicrobial alkaloids (Bae and Byun 1987). Recently, there has been increased interest in conversion of biomass from L. tulipifera to biofuels, as evidenced by studies on ethanol production from this species (Xiang et al. 2004; Berlin et al. 2005; Çelen et al. 2008; Hwang et al. 2008; Koo et al. 2008, 2009).
Little genomic research has been conducted on this species, despite the use of L. tulipifera as a reference species in studies on plant evolution and its significant economic and ecological value. To date, only one L. tulipifera gene, encoding a laccase, has been functionally characterized (LaFayette et al. 1999). Laccases (EC 1.10.3.2) are copper-containing glycoproteins. Several studies have suggested the involvement of laccases in lignin biosynthesis (Ranocha et al. 2002 and references therein). The organization of two L. tulipifera chromosome regions (harboring a GIGANTEA and a LEAFY floral gene, respectively) was recently revealed (Liang et al. 2010, 2011). At present, there exists only one EST database (6,520 unigenes) developed from floral tissues by capillary sequencing (Albert et al. 2005; Liang et al. 2008) and one ca. 5X BAC library with 73,728 large-insert clones (Liang et al. 2007) available for L. tulipifera. This lack of genomic research has hindered the efforts to identify genes involved in traits of economic and ecological importance and limited Liriodendron's role in comparative genomic studies. L. tulipifera is one of the species in the Magnoliaceae family with the lowest chromosome number (2n = 2x = 38). However, with a haploid genome size of 1,802 Mbp (Liang et al. 2007), sequencing and assembly of the L. tulipifera genome would be expensive, given currently available sequencing technologies. Thus, as with most forest tree species, large-scale sequencing and analysis of L. tulipifera ESTs remain a fundamental part of genomics research to enable gene discovery and functional investigations.
Here we report the generation and analysis of a deep transcriptome sequence resource for L. tulipifera. To maximize our ability to identify genes expressed in different tissues, extensive ESTs from ten different tissue types (premeiotic flower buds, postmeiotic flower buds, open flowers, developing fruit, terminal buds, leaves, cambium, xylem, roots, and seedlings) were isolated and sequenced. The unigenes from the newly built database were compared to publicly available plant protein sequence databases, and Gene Ontology (GO) terms were determined. Genes involved in wood formation were identified based on similarity to genes in available sequence databases. In particular, a Liriodendron cinnamyl alcohol dehydrogenase homolog (LtuCAD1) was characterized by overexpression in an Arabidopsis CAD4/CAD5 double mutant. This dataset has also been mined for simple sequence repeats (SSRs) and microRNAs (miRNAs). The unigenes generated in this study will facilitate gene discovery and functional studies, support development of cDNA microarrays and assembly of short-read sequences, and thus allow expression profiling experiments to be integrated into investigations of xylem differentiation, reproductive development, insect and disease resistance, etc. in Liriodendron. The 29,289 gene-based SSRs identified in the unigene assemblies will enable marker-assisted breeding in the genus Liriodendron. The availability of this deep genomic resource will also strengthen the utility of Liriodendron in comparative studies of angiosperm evolution. Lastly, it is noteworthy that genomic resources are very limited for other species in the Magnoliaceae family, ranging from only two sequences in the genus Dugandiodendron to 1,767 sequences in Magnolia deposited in GenBank (as of February 2011). Moreover, the majority of these publicly available sequences are from plastid genomes. Thus, the information developed in this study for L. tulipifera can serve as a reference in the Magnoliaceae family.
Materials and methods
Postmeiotic flower buds, open flowers, developing fruits, terminal buds, leaves, and cambium and xylem tissues were collected from mature ramets of clone 108 in the University of Tennessee's Tree Improvement Program L. tulipifera breeding orchard in Knoxville, TN and quick frozen with liquid nitrogen in the field. Clone 108 was selected from a pure L. tulipifera stand in eastern Tennessee in 1965. The ortet was 32 years of age with a height of 94 ft and a diameter (at 4.5 ft height) of 11.1 in. The bole (trunk) straightness of the ortet was rated as excellent and the pruning ability was good. Xylem and cambium tissues were obtained by removing a section of the bark at the height of 1.4 m from actively growing clone 108 ramets in April–June and scraping both exposed surfaces with RNA-free scalpels (Rnase-Zap, Ambion, Austin, TX). Open-pollinated L. tulipifera seeds (from ramets of clones 108, 7A, and 84A in the same orchard) were stratified by storing them for 4 months in the dark at 4°C, mixed with peat moss in 1 gal plastic bags. The seeds were then germinated by scattering them on top of Miracle-Gro® Potting Mix in covered flats (25 × 52 cm flats, approximately 400 seeds/flat) with a thin layer of potting mix sprinkled on top. Flats were kept under benches for shade at 25°C and ambient seasonal lighting (May and June) and watered as needed. Plastic covers were used to keep the seeds moist. Young seedlings (emerging from seed coats) through late stage seedlings (with first true leaves emerging) were harvested (entire seedling) by removing the seed coat (if needed), quickly rinsing in ddH2O, blotting dry on toweling, and quick freezing in liquid nitrogen. For roots, young plants were grown in 6 cm square pots in the Penn State University Buckhout greenhouse (ambient light, 25°C) in Sun Gro Metro-Mix® 360 Growing Media or grown in the same growing media in mesh-bottom pots over a water reservoir for soil-free root collection. Fine, hairy roots and root tips were harvested and frozen as above.
RNA isolation, cDNA synthesis, and sequencing
Total RNA was extracted from younger tissues (seedlings, terminal buds, and postmeiotic flower buds) using the RNAqueous®-Midi kit (Ambion, catalog #1911) according to the manufacturer's protocol (http://www.ambion.com/techlib/prot/fm_1911.pdf) with modifications as described in Carlson et al. (2006). Total RNA was extracted from woody or mature tissues (cambium, xylem, roots, open flower, and fruit) using a modified version of the cetyl trimethyl ammonium bromide (CTAB) protocol developed by Chang et al. (1993), except that 2 to 3 g of frozen tissue was ground in an RNase-free, chilled mortar and pestle under liquid nitrogen and suspended in warm (65°C) CTAB buffer (made fresh the same day using RNase-free stock solutions). Total RNA samples were DNase treated with amplification grade DNase I (Invitrogen, catalog #18068-015) and recombinant ribonuclease inhibitor, RNase Out (Invitrogen, catalog #10777-019), according to the manufacturer's recommendations. Purified RNA was recovered using the RNeasy Plant Mini kit (Qiagen, catalog #74104) RNA Cleanup protocol (sample concentrations adjusted to <100 μg in 100 μl RNase-free water) and checked on an Agilent 2100 Bioanalyzer (Agilent Technologies). Messenger RNA was then extracted from total RNA using the Poly(A) Purist™ mRNA Purification Kit (Ambion, catalog #1916) according to the manufacturer's protocol (http://www.ambion.com/techlib/prot/fm_1916.pdf), as described in Liang et al. (2008). mRNA from premeiotic flower buds was from a previous preparation for the floral cDNA library (Ltu01) (Liang et al. 2008) with an additional DNase treatment. The quality of the mRNA was determined on an Agilent 2100 Bioanalyzer (Agilent Technologies) using the RNA 6000 nano chip and the mRNA Plant assay to ensure that the mRNA samples had no detectable DNA contamination and had less than 15% tRNA contamination.
cDNA was generated from mRNA samples by following the Joint Genome Institute (JGI) cDNA library creation protocol (version 1.0) (http://my.jgi.doe.gov/general/index.html) with modifications. An additional chloroform cleanup step was added after the phenol/chloroform/isoamyl alcohol purification and the protocol stopped after the precipitation step where multiple samples were combined to increase yield. cDNA was resuspended in DNA-RNase free water and quality control was performed on the Agilent 2100 Bioanalyzer (Agilent Technologies) using the DNA 7500 chip. cDNA samples were then taken through the Roche GS FLX Shotgun DNA Library Preparation procedure (Dec 2007 manual, catalog #04852265001). Libraries (454) were constructed and pyrosequenced as described previously (Poinar et al. 2006) at Penn State University. All 454 libraries sent for sequencing had mean fragment sizes between 300 and 800 bp and >10 ng of product. Additional sequencing was performed at Washington University in St. Louis, Missouri, for the premeiotic flower bud sample using the Sanger method.
Data processing, assembly, and annotation
Sequences from individual 454 libraries were extracted from SFF files and renamed to reflect the source material. The names of Sanger sequences also indicated the source library. After renaming, all sequences were combined into a single FASTA file. All sequences in the combined FASTA file were screened for contaminants and trimmed using SeqClean (http://compbio.dfci.harvard.edu/tgi/software/) with the Roche library adaptors, and the Piper cenocladum (C. DC.) chloroplast genome (NCBI accession NC_008326), mitochondrial gene sequences from magnoliids Calycanthus floridus (L.), L. tulipifera, Laurus nobilis (L.), Piper betle (L.), and Asarum spp. Qiu 96018, and the Univec database (http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html). After screening and trimming, the 454 and Sanger sequences were assembled using MIRA version 3.0.5 (http://sourceforge.net/apps/mediawiki/mira-assembler, Chevreux et al. 2004) with default settings for EST sequences.
The resulting unpadded consensus sequences (i.e., unigenes) were assigned putative gene annotations from the PlantTribes 2.0 scaffold (Wall et al. 2008; http://fgp.huckpsu.edu/tribe.php). The PlantTribes 2.0 scaffold uses tribeMCL and orthoMCL to objectively classify the coding sequences of ten sequenced plant genomes (A. thaliana V7.0, Chlamydomonas reinhardtii V3.0, Physcomitrella patens V1.0, Selaginella moellendorffii V1.0, Oryza sativa V5.0, Sorghum bicolor V1.0, Vitis vinifera V1.0, Populus trichocarpa V1.0, Medicago truncatula V1.0, Carica papaya V1.0) into Tribes and ortho groups (Orthos). Using custom Perl scripts to parse the results of a BLASTx search (Altschul et al. 1990) against the inferred protein sequences of these ten genomes, unigenes were sorted into Tribes, which approximate gene families, and Orthos, which approximate putative orthologous gene sets. Each Tribe and Ortho in the PlantTribes database is annotated with a gene ontology (GO slim) term (Ashburner et al. 2000), conserved domain information (Marchler-Bauer et al. 2002), information from manually curated gene families, and common descriptive terms from the member sequences (Wall et al. 2008); accordingly, unigenes sorted into Tribes and Orthos are assigned the respective annotation. Unigenes with no significant (E value > 1e-5) hit to any of the ten sequenced genomes were searched against the GenBank non-redundant protein database.
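The sorting step can be illustrated with a minimal Python sketch. It is not the custom scripts used in this study: it assumes tab-delimited BLAST output in the standard 12-column layout (E value in the 11th column), and the function names and the subject-to-Tribe lookup table are hypothetical. For each unigene it keeps the hit with the lowest E value at or below the 1e-5 cutoff and then looks up the Tribe of that hit.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Map each query (unigene) to its lowest-E-value hit at or below max_evalue.

    Assumes 12-column tabular BLAST output: query ID in column 1,
    subject ID in column 2, E value in column 11.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # not significant under the stated cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def assign_tribes(best, subject_to_tribe):
    """Give each unigene the Tribe of its best hit (None if the hit is unclassified).

    subject_to_tribe is a hypothetical dict from protein ID to Tribe ID.
    """
    return {query: subject_to_tribe.get(subject)
            for query, (subject, _evalue) in best.items()}
```

Unigenes absent from the returned dictionary are the ones that would go on to the search against the non-redundant protein database.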
GO enrichment analysis of the unigenes expressed in wood formation tissues was conducted using the DAVID Bioinformatics Resources 2008 with a False Discovery Rate (FDR) cutoff of 0.01 (Dennis et al. 2003; Huang et al. 2009). Simple sequence repeats (SSRs) were mined using scripts developed in-house at the Clemson University Genomics Institute (CUGI). The minimum number of repeats was five for di-nucleotide repeats, four for tri-nucleotide repeats, three for tetra- and penta-nucleotide repeats, and two for hexa-nucleotide repeats. Primer3 was used to select candidate primers (Rozen and Skaletsky 2000). The single-copy gene coverage was calculated as the percent coverage of a V. vinifera reference gene (since Vitis represents the highest proportion of best hits from the annotation) by using the longest unigene in each tribe and ortho. The relative expression level was calculated as the percentage of the reads from each library in the overall reads from all libraries that were subjected to 454 sequencing. For comparative purposes (i.e., determining the most highly expressed unigenes), the expression level of each unigene was determined as the sum of the lengths of all reads assembled into the unigene divided by the length of that unigene.
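The repeat thresholds given above can be sketched in Python as follows. The in-house CUGI scripts are not reproduced here; this is a minimal regular-expression approach under the stated minimum repeat numbers, and the function name and the rule for discarding motifs that are themselves repeats of a shorter unit (for example "ctct" reported as a tetramer) are assumptions.

```python
import re

# Minimum number of repeats per motif length, as stated in the text:
# di = 5, tri = 4, tetra = 3, penta = 3, hexa = 2.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 2}


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for SSRs in seq."""
    seq = seq.lower()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of length `unit`, followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([acgt]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Assumption: skip motifs built from a shorter repeating unit,
            # so "ctctct" is counted once as a dimer repeat, not as a hexamer.
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    example = "gg" + "ct" * 6 + "a" + "aag" * 4 + "tt"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Start positions are zero-based. Flanking sequence around each reported repeat would then be passed to a primer design tool such as Primer3.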
Identification of conserved miRNA and prediction of their targets
Known miRNAs from miRBase (release 14) were used to screen the L. tulipifera cDNA contig sequences using the program Patscan (Dsouza et al. 1997) with default parameters and two mismatches allowed. Sequences with candidate miRNAs were first searched with BLAST against the Arabidopsis proteome, and sequences with hits to protein-encoding genes were removed. Filtered sequences were then checked for miRNA features using MIRcheck (Jones-Rhoades and Bartel 2004). The targets of the identified miRNAs were searched for in the Liriodendron cDNA dataset using the approach previously described (Allen et al. 2005).
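The idea of screening contigs for known miRNAs while tolerating two mismatches can be illustrated with a short Python sketch. It is a stand-in for the concept only, not for Patscan itself: it slides the mature miRNA along one strand of the contig and counts substitutions, and the function name and example sequences are hypothetical.

```python
def scan_with_mismatches(contig, mirna, max_mismatches=2):
    """Return (position, mismatches) for each window matching the miRNA.

    The miRNA is given as RNA (U) and converted to DNA (T) so it can be
    compared with cDNA sequence. Only substitutions are counted; gaps and
    the reverse strand are not considered in this sketch.
    """
    pattern = mirna.upper().replace("U", "T")
    contig = contig.upper()
    hits = []
    for i in range(len(contig) - len(pattern) + 1):
        window = contig[i:i + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


if __name__ == "__main__":
    # Illustrative 20-nt query and a made-up contig containing it.
    query = "UGACAGAAGAGAGUGAGCAC"
    contig = "AAAA" + "TGACAGAAGAGAGTGAGCAC" + "CC"
    print(scan_with_mismatches(contig, query))
```

Candidate contigs found this way would still need the downstream filters described above (removal of protein-coding hits and a check of precursor features).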
Expression of LtuCAD1 in the Arabidopsis CAD4/5 double mutant
The LtuCAD1 gene was first cloned with BamHI between the 35S promoter of the cauliflower mosaic virus (CaMV) and the nopaline synthase (NOS) gene terminator in a pBIN102-based binary vector. The LtuCAD1 gene along with the 35S promoter and the NOS terminator were then cloned into the pCAMBIA1301 vector using the Gateway Cloning System (Carlsbad, California, US). The Agrobacterium tumefaciens strain GV31001 carrying the CAMBIA/LtuCAD1 was used to transform Arabidopsis CAD-C/D double mutants (obtained from Dr. Armand Séguin in Canadian Forest Service, Canada) (Sibout et al. 2005) by the floral-dip method (Desfeux et al. 2000). Arabidopsis seeds transformed with the LtuCAD1 were selected in Peter's plant food medium containing 25 μg/mL hygromycin.
Results and discussion
Sequencing of Liriodendron cDNA libraries from ten different tissue types and assembly
Table: Statistics for each L. tulipifera cDNA library. Columns: number of reads; number of bases (Mb); average length (bp); unigenes in each library; library-specific unigenes. Rows: assembly for 12 libraries; premeiotic flower bud (three libraries); postmeiotic flower buds; and the remaining libraries sequenced by 454 GS FLX.
The average GC content for the 137,923 unigenes is 43.2% with a standard deviation of 6.5%, indicating that L. tulipifera genes tend to be slightly more AT-rich than annotated genes in currently sequenced genomes. The percentage GC composition in the L. tulipifera transcriptome is more similar to A. thaliana (42.7%) than to O. sativa (51.1%) (Kuhl et al. 2004). The codon usage in the translated sequences, generated by General Codon Usage Analysis (http://bioinf.may.ie/gcua/index.html; McInerney 1998), is represented in Online Resource 1. The pattern of codon preferences observed in the combined assembly was similar to A. thaliana (the Codon Use Database at http://www.kazusa.or.jp/codon/, GenBank Release 160.0, June 15, 2007), with only four different preferred codons. Only one amino acid (Leu) exhibits G or C at the degenerate third base of its preferred codon. This is consistent with the fact that dicots do not favor G and C in that position (Murray et al. 1989). Dinucleotides CG and TA are under-represented, which mirrors that of the L. tulipifera BAC and shotgun end sequence dataset (Liang et al. 2008) (Online Resource 2), as is common in eukaryotic sequences (Karlin et al. 1998).
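Summary statistics of this kind can be computed along the following lines. This Python sketch is illustrative only: the function names are hypothetical, and the observed/expected odds ratio shown for dinucleotides is one common way to express under-representation, not necessarily the statistic used in this study.

```python
from collections import Counter
from statistics import mean, pstdev


def gc_percent(seq):
    """Percentage of G and C bases in one sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_summary(seqs):
    """Mean and (population) standard deviation of per-sequence GC content."""
    values = [gc_percent(s) for s in seqs]
    return mean(values), pstdev(values)


def dinucleotide_odds(seqs):
    """Observed/expected frequency of each dinucleotide, pooled over seqs.

    Expected frequency is the product of the two single-base frequencies;
    a ratio well below 1 indicates under-representation.
    """
    mono, di = Counter(), Counter()
    for s in seqs:
        s = s.upper()
        mono.update(s)
        di.update(s[i:i + 2] for i in range(len(s) - 1))
    n_mono, n_di = sum(mono.values()), sum(di.values())
    return {d: (count / n_di) / ((mono[d[0]] / n_mono) * (mono[d[1]] / n_mono))
            for d, count in di.items()}
```

Applied to a unigene set, `dinucleotide_odds` would be inspected at the keys "CG" and "TA" to check the under-representation described above.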
Functional annotation and classification of the Liriodendron transcriptome
A BLASTX search of the 137,923 unigenes from the combined assembly, against ten sequenced plant genomes, revealed 68,464 matches (49.6% of the unigenes) with BLASTX (E value ≤ 1e-5). Furthermore, a BLASTX search against the GenBank non-redundant protein database generated an additional 1,152 hits. Of all matches, 66.4% are either unknown, unnamed, hypothetical, or predicted proteins. The majority of the unigenes without similarity (76.0%) are less than 400 bp in length. When compared to model species with sequenced genomes, the L. tulipifera unigene set was most similar to P. trichocarpa (Torr. & Gray), with 46.9% of the unigenes having significant homology with Populus genes (BLASTX, E value ≤ 1e-5). In contrast, only 43.0% and 42.3% of the L. tulipifera unigenes showed similarity to Arabidopsis and Oryza genes. Among the best BLASTX matches, woody angiosperm species have more hits (V. vinifera L. 39.8%, P. trichocarpa 20.9%, and C. papaya L. 14.0%) than the non-woody species (M. truncatula L. 9.2%, A. thaliana 5.8%, O. sativa 4.8%, and S. bicolor L. (Moench) 4.5%). BLASTX results with the Arabidopsis proteome can also be viewed through http://ancangio.uga.edu/ng-genediscovery/liriodendron.jnlp and the assembly can be searched using the Ancestral Angiosperm Genome Project blast interface at http://ancangio.uga.edu/blast/blast.html.
Comparative genomics presents opportunities to study the dynamics of molecular evolutionary processes. However, the phylogenetic distribution of currently available genomic resources is not balanced, and this imbalance is even more acute in some clades, such as magnoliids (Jackson et al. 2006). This can bias evolutionary comparisons. Since the first L. tulipifera EST dataset became available, Liriodendron has been used as a comparator to better understand the origin and evolution of the flower (Zahn et al. 2005, 2006; Soltis et al. 2007; Chanderbali et al. 2010), as well as ancestral polyploidy in seed plants and angiosperms (CW dePamphilis, personal communication). Built from ten different tissue types, the new EST dataset is by far the most comprehensive genomic resource for Liriodendron. This resource will strengthen Liriodendron's role in comparative studies of angiosperm evolution and facilitate molecular genetic and genomic investigations in Liriodendron and other species in the Magnoliaceae family.
In silico mining of simple sequence repeat markers
Simple sequence repeat (SSR) mining generated 29,289 repeats (dimers to pentamers), with 686 unique motifs. A total of 22,417 unigenes (16.3%) contain at least one SSR, with 53.1% of them having more than one SSR present. The number of SSRs identified in a unigene ranges from 1 to 15. This is consistent with the frequency of SSR-containing ESTs found in eudicotyledonous species, which ranges from 2.7% to 16.8% (Kumpatla and Mukhopadhyay 2005). Dimer repeats were the most commonly observed and constitute 41% of all the SSRs detected. The most common dimer, trimer, tetramer, and pentamer repeats are “ct,” “aag,” “tttc,” and “aaaag,” respectively. The SSR locations, forward and reverse primer sequences and their melting temperature (Tm) values, and expected amplicon sizes are listed in the Online Resources 3, 4, 5, and 6. After being validated, these SSRs can be applied in molecular breeding and investigations of candidate genes for traits of economic and ecological importance. These molecular markers may also be used to generate genetic maps for trait/gene association and refinement of candidate gene identification.
The genus Liriodendron contains only one other species, Liriodendron chinense (Hemsl.) Sarg., which is native to China and Vietnam. This species is now considered endangered due to its limited seed production and small isolated populations (Xu et al. 2006). L. tulipifera and L. chinense are quite similar morphologically, except that the latter is smaller in stature. These two species are thought to have separated 10–16 million years ago (Parks and Wendel 1990), but hybridize readily (cf. Merkle et al. 1993). Preliminary data from Xu et al. (2006) indicated that 12 out of 15 single-locus SSR markers from the floral EST dataset of L. tulipifera (Albert et al. 2005; Liang et al. 2008) were codominant and polymorphic in L. chinense, suggesting a high level of cross-species transferability. Thus, the SSRs developed from L. tulipifera can be applied in conservation of L. chinense. In a recent study (Xu et al. 2010) using 132 SSR markers from the same source, 47.7% of the markers could be amplified in Michelia maudiae Dunn, 37.9% in Manglietia maguanica Chang et B.L. Chen, and 33.3% in Magnolia amoena Cheng. Michelia, Manglietia, and Magnolia are in the same family (Magnoliaceae) as Liriodendron. This suggests that the L. tulipifera SSRs can also be useful in related species of the same family, for which genomic resources are not available or very limited.
Conserved microRNA identification
MicroRNAs (miRNAs) play an important role in plant development since they negatively control gene expression by cleaving the mRNA of target genes or inhibiting its translation. Analysis of the Liriodendron transcript unigenes resulted in identification of 22 miRNA families from 53 unique miRNA precursor sequences (Online Resource 7). The number of sequence variants in each family ranges from 1 to 9. The number of miRNA families identified represents half of the number of conserved miRNA families identified in plants. In a miRNA microarray study by Axtell and Bartel (2005), 13 out of the 23 families of Arabidopsis were found to be expressed in L. tulipifera leaves. We identified 8 of these 13 families of Arabidopsis miRNA in the L. tulipifera EST dataset.
The putative targets of these miRNAs are listed in Online Resource 8. The miRNA target unigenes are involved in various molecular functions, cellular components, and biological processes. Molecular functions include DNA, RNA, nucleotide, or protein binding, hydrolase activity, kinase activity, structural molecule activity, transcription factor activity, transferase activity, and transporter activity. Endoplasmic reticulum (ER), Golgi apparatus, nucleus, ribosome, plastid, mitochondria, and chloroplast are among the cellular components. The biological processes include cell organization and biogenesis, developmental processes, response to abiotic or biotic stimulus, signal transduction, transcription, and transport. Among the 260 miRNA target unigenes identified, 10% have hits in the Cell Wall Navigator database (Girke et al. 2004) and/or the MAIZEWALL dataset (Guillaumie et al. 2007), including one cellulose synthase gene and three monolignol biosynthesis HCT (hydroxycinnamoyl CoA:shikimate/quinate hydroxycinnamoyltransferase) genes. This resource provides an opportunity for functional and evolutionary studies of miRNAs in basal angiosperms.
Unigenes expressed in xylem and cambium tissues
Table: The most enriched GO terms in the L. tulipifera xylem and cambium libraries. Recoverable entries: GO:0009628∼response to abiotic stimulus; GO:0043232∼intracellular non-membrane-bounded organelle.
Table: Comparison of L. tulipifera transcriptome with publicly available xylogenesis and cell wall formation EST datasets. Columns: no. of total unigenes in reference dataset; no. of hit unigenes in reference dataset; no. of hit unigenes in Liriodendron dataset. Reference datasets: EUCAWOOD DB (Eucalyptus) (Rengel et al. 2009); Radiata pine Xylem DB (Li et al. 2009); Loblolly pine Xylem DB; White Spruce Xylem DB; Cell Wall Navigator DB (Girke et al. 2004); Maize Wall DB (Guillaumie et al. 2007).
We report the sequencing, assembly, and annotation of 137,923 unigenes (132,905 contigs and 4,599 singletons, size ranging from 40 to 5,807 bp) derived from non-normalized cDNA libraries, which represented ten L. tulipifera tissue types: premeiotic flower buds, postmeiotic flower buds, open flowers, developing fruit, terminal buds, leaves, cambium, xylem, roots, and seedlings. About 50% of the unigenes were significantly similar to publicly available plant protein sequences, representing a wide variety of putative functions. Putative BLAST-based homologs of most of the genes involved in cell wall construction are represented, including seven full-length cinnamyl alcohol dehydrogenase-encoding genes (LtuCAD1 to LtuCAD7). Approximately 50% of the unigenes did not match any sequence in the public databases, including the complete genomes of Arabidopsis, Oryza, and Populus. Some of these novel genes might be unique to basal angiosperm species and may be informative for understanding the origins of diverged gene families when characterized. In addition, about 30,000 simple sequence repeats (SSRs) have been identified. This new Liriodendron dataset currently provides the most comprehensive list of unigenes for any Magnoliaceae species. This large-scale genomic resource will facilitate gene discovery and cDNA microarray production in L. tulipifera and related species. The unigene sequences will become valuable in comparative and functional genomics of genes involved in the development of flowers, fruits, roots, buds, and wood formation, as well as in unraveling the molecular regulation of these important developmental stages in Liriodendron. This deep EST dataset will also further strengthen L. tulipifera's role in comparative study as a basal angiosperm species.
Sanger sequences generated in this study are accessible in NCBI dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) and 454 sequences are available in the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?). Assemblies and BLASTX results against the Arabidopsis proteome can be viewed through http://ancangio.uga.edu/ng-genediscovery/liriodendron.jnlp, and the assembly can be searched using the Ancestral Angiosperm Genome Project blast interface at http://jlmwiki.plantbio.uga.edu/blast/blast.html.
We thank Stephan Schuster and Lynn Tomsho for their assistance in 454 sequencing, Yi Hu for RNA isolations, Denis S. Diloreto for seedlings, Stephen Ficklin for the mining of SSRs, and Xinguo Li for providing the pure xylem unigenes for Populus, loblolly pine, and white spruce. This study was mainly supported by the National Science Foundation grant, Ancestral Angiosperm Genome project (Award # DBI-0638595, PI: dePamphilis). A National Institute of Food and Agriculture, USDA grant to HL (project number SC-1700324, technical contribution No. 5832 of the Clemson University Experiment Station) supported the sequencing of one half of a 454 plate.