Conifer DBMagic: a database housing multiple de novo transcriptome assemblies for 12 diverse conifer species

  • Short Communication
Tree Genetics & Genomes

Abstract

Conifers comprise an ancient and widespread plant lineage of enormous commercial and ecological value. However, compared to model woody angiosperms, such as Populus and Eucalyptus, our understanding of conifers remains quite limited at a genomic level. Large genome sizes (10,000–40,000 Mbp) and large amounts of repetitive DNA have limited efforts to produce a conifer reference genome, and genomic resource development has focused primarily on characterization of expressed sequences. Here, we report the completion of a conifer transcriptome sequencing project undertaken in collaboration with the U.S. DOE Joint Genome Institute that resulted in production of almost 12 million sequence reads. Five loblolly pine (Pinus taeda) cDNA libraries representing multiple tissues, treatments, and genotypes produced over four million sequence reads that, along with available Sanger expressed sequence tags, were used to create contig assemblies using three different assembly algorithms: Newbler, MiraEST, and NGen. In addition, libraries from 11 other conifer species, as well as one member of the Gnetales (Gnetum gnemon), produced 0.4 to 1.2 million sequence reads each. Among the selected conifer species were representatives of each of the seven phylogenetic families in the Coniferales: Araucariaceae, Cephalotaxaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae. Transcriptome builds for each species were generated using each of the three assemblers. All contigs for every species generated using each assembler can be obtained from Conifer DBMagic, a public database for searching, viewing, and downloading contig sequences, the associated sequence reads, and their annotations.



Acknowledgments

The authors wish to acknowledge Yuan-Sheng Yu and Ujwal Bagal for their technical assistance. We also wish to thank Dr. Trevor Fenning for supplying the P. abies samples and Dr. Richard Cronn for early access to P. taeda chloroplast genome sequence information. These sequence data were produced by the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov/) in collaboration with the user community. Particular thanks go to Christa Pennacchio, Kerrie Barry, Erika Lindquist, and their associates at JGI for directing cDNA library construction, 454 pyrosequencing, and some of the data filtering reported for this project. Partial funding for this work was provided by USDA/NRI CSREES CAP Award #2007-55300-18603.

Author information

Corresponding author

Correspondence to Jeffrey F. D. Dean.

Additional information

Communicated by R. Sederoff

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary File 1

Library and sequence metrics. This Excel file contains detailed information on the species and tissues used for each sequenced library, including tissue type, age of individual, library ID, cDNA priming, sequencing platform, number of plates sequenced, total reads, mean read length, SRA accession number, and genotype. Details of sample treatment and genotypes are provided on separate worksheets for the P. taeda, P. abies, P. lambertiana, and P. menziesii samples. (XLSX 17 kb)

Supplementary File 2

Assembly metrics. This Excel file contains detailed assembly metrics for all species assemblies and includes the number of input reads, total contigs, mean contig length, largest contigs, contigs ≥2 kb, contigs ≥1 kb, contigs ≥0.5 kb, and contigs having ≥5 ESTs. Metrics by Assembler are also shown on a second worksheet and include the percentage of bases assembled. (XLSX 16 kb)
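
The contig-level metrics listed for Supplementary File 2 can be illustrated with a minimal Python sketch. It assumes the contigs of one assembly are available as FASTA-formatted text; the function names `read_fasta_lengths` and `contig_metrics` and the dictionary keys are hypothetical, and the sketch is not the procedure the authors used to build the spreadsheet.

```python
from typing import Dict, Iterable, Iterator, List


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # stays None until the first ">" header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # emit the previous record
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length  # emit the final record


def contig_metrics(lengths: List[int]) -> Dict[str, float]:
    """Summarize contig lengths (in bases) with the metrics named above."""
    if not lengths:
        raise ValueError("no contigs supplied")
    return {
        "total_contigs": len(lengths),
        "mean_contig_length": sum(lengths) / len(lengths),
        "largest_contig": max(lengths),
        "contigs_ge_2kb": sum(1 for n in lengths if n >= 2000),
        "contigs_ge_1kb": sum(1 for n in lengths if n >= 1000),
        "contigs_ge_0.5kb": sum(1 for n in lengths if n >= 500),
    }


# Toy input standing in for a downloaded contig file (assumed layout).
toy_fasta = [">contig1", "A" * 2500, ">contig2", "ACGT" * 250, ">contig3", "A" * 499]
toy_lengths = list(read_fasta_lengths(toy_fasta))
toy_metrics = contig_metrics(toy_lengths)
```

In practice the list of lines would come from an open file handle, for example `read_fasta_lengths(open(path))` for a contig file downloaded from the database. Metrics that depend on read membership, such as contigs having ≥5 ESTs, need the assembler's read-to-contig mapping and are not covered here.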

Supplementary File 3

BLAST metrics. This Excel file contains BLASTX metrics for all species and all assemblies. The percentages of BLASTX hits obtained by querying both the NCBI non-redundant and TAIR9 databases are shown for E-value cutoffs of ≤1 × 10⁻⁵ and ≤1 × 10⁻¹⁵. Comparisons of NCBI and TAIR9 returns by assembler are provided on a separate worksheet. (XLSX 17 kb)
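
As a rough illustration of how a hit percentage at a given E-value cutoff could be computed, the Python sketch below assumes BLASTX results in the standard 12-column tabular layout (query ID in the first column, E-value in the eleventh) and counts a contig as a hit if at least one of its alignments meets the cutoff. Both the counting rule and the function name `percent_with_hits` are assumptions made for illustration; the source does not describe how the reported percentages were derived.

```python
from typing import Iterable, Set


def percent_with_hits(
    tabular_lines: Iterable[str], total_queries: int, evalue_cutoff: float
) -> float:
    """Percentage of query contigs with at least one hit at E <= cutoff.

    Assumes tab-separated rows with the query ID in column 1 and the
    E-value in column 11 (standard 12-column tabular BLAST output).
    """
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    hit_queries: Set[str] = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= evalue_cutoff:
            hit_queries.add(query_id)
    return 100.0 * len(hit_queries) / total_queries


# Toy rows (hypothetical IDs and values): four contigs were queried,
# and contig4 returned no alignments at all.
toy_rows = [
    "contig1\tsubjA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-20\t200",
    "contig1\tsubjB\t40.0\t50\t30\t0\t1\t150\t1\t50\t1e-3\t40",
    "contig2\tsubjC\t70.0\t80\t24\t0\t1\t240\t1\t80\t1e-7\t90",
    "contig3\tsubjD\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25",
]
pct_1e5 = percent_with_hits(toy_rows, total_queries=4, evalue_cutoff=1e-5)
pct_1e15 = percent_with_hits(toy_rows, total_queries=4, evalue_cutoff=1e-15)
```

The total number of queried contigs is passed in separately because contigs without any alignment do not appear in tabular output.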

Supplementary File 4

Tutorial on using the Conifer DBMagic database. This PowerPoint file guides users through the Conifer DBMagic database and its use to identify sequence assemblies of interest. (PPTX 2,052 kb)

About this article

Cite this article

Lorenz, W.W., Ayyampalayam, S., Bordeaux, J.M. et al. Conifer DBMagic: a database housing multiple de novo transcriptome assemblies for 12 diverse conifer species. Tree Genetics & Genomes 8, 1477–1485 (2012). https://doi.org/10.1007/s11295-012-0547-y

  • DOI: https://doi.org/10.1007/s11295-012-0547-y
