Abstract
Key message
We provide novel genomic resources for Taxus baccata in the form of a reference transcriptome, SSR and SNP markers, and orthologous single-copy genes, useful for phylogenomic and population genomic applications.
Abstract
English yew (T. baccata) is the only European representative of the Taxaceae family, a conifer group that originated in the Jurassic period. The wide environmental heterogeneity within the species’ range, together with its long presence in Europe, makes English yew an ideal species in which to investigate adaptive evolution in conifers. To enlarge the genomic resources available for this species, we used Illumina short-read sequencing followed by de novo assembly to build the transcriptome of English yew. In addition to a fully annotated transcriptome and large sets of new potential SSR and SNP markers for T. baccata, we provide a data set of orthologous single-copy genes across three Taxus species, using Picea sitchensis as outgroup, and discuss the uses and limitations of these orthologs for phylogenomic and population genomic applications.
Acknowledgements
Funding was provided by the Spanish Ministry of Economy and Competitiveness (MINECO) under the AdapCon grant (CGL2011-30182-C02-01/02), as well as by the project INIA-MAPAMA EG17-048, co-financed by FEADER (75%) according to EU Regulation 1305/2013. GGV was supported by a grant from the Italian Ministry of Education and Scientific Research (‘Biodiversitalia’, RBAP10A2T4). SO received funding from the Spanish Ministry of Economy and Competitiveness (MINECO) under contract PTA2015-10836-I. We acknowledge the Supercomputing Centre of Galicia (CESGA), as well as CSC (IT Center for Science, Finland) and the Finnish Grid Infrastructure (FGI), for the allocation of computational resources.
Author information
Contributions
SCGM, DG, MM and GGV conceived the study. SCGM, SP and GGV designed and produced the sequence data sets. SO and SP analyzed the data with input from FA and DG. DG and SO drafted the manuscript. All the authors contributed to editing and revising the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Online Resource 1
Annotations of 16,810 putative Taxus baccata genes. Predicted ORFs for the 15,289 genes that have an ortholog in the searched databases are presented in the table ESM_1.xlsx (FullLengtherNext_ortholog sheet). An additional 1,521 putative new genes are listed in the table ESM_1.xlsx (FullLengtherNext_new sheet). The annotations of all 16,810 putative genes are presented in the table ESM_1.xlsx (16810_annotation sheet). (XLSX 8084 KB)
Online Resource 2
The 16,810 predicted Taxus baccata coding sequences (CDS). (ZIP 4402 KB)
Online Resource 3
Distribution of all Gene ontology (GO) terms in the three GO categories assigned to putative genes of Taxus baccata. (TXT 4196 KB)
Online Resource 4
Results of SSR screening in the Taxus baccata transcriptome. Only SSR motifs composed of at least four contiguous repeated units are reported; the contig name and the start and end positions of each repeat are given in the table ESM_4.xlsx (microsatellites sheet). The 2,330 perfect SSR motifs identified in the putative coding Taxus baccata sequences for which primers were successfully designed are given in the table ESM_4.xlsx (primers sheet). (XLSX 485 KB)
Online Resource 5
The 59,513 SNPs in coding regions and UTRs of Taxus baccata retained after filtering. (ZIP 2245 KB)
Online Resource 6
1,320 orthologous groups for the four species: Picea sitchensis, Taxus baccata (Tbac), Taxus cuspidata (Tcus), and Taxus wallichiana (Twal). (ZIP 2557 KB)
Online Resource 7
1,418 orthologous groups present in Taxus baccata, Taxus cuspidata, and Taxus wallichiana. (ZIP 1208 KB)
Online Resource 8
Concatenated data matrix of 914 orthologous groups filtered for possible misalignments and paralogues present in Taxus baccata, Taxus cuspidata, and Taxus wallichiana. Frame shifts in alignments are indicated by exclamation marks. (NEXUS 3700 KB)
Online Resource 9
Concatenated data matrix used for phylogenetic analyses based on 981 orthologous groups filtered for possible misalignments and paralogues present in the four species: Picea sitchensis, Taxus baccata, Taxus cuspidata, and Taxus wallichiana. Exclamation marks generated by frame shifts in alignments were replaced with question marks to be compatible with phylogenetic programs. (NEXUS 3332 KB)
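Two of the conventions described in the resources above can be illustrated with a short Python sketch: reporting perfect SSRs only when a motif is repeated at least four times in a row (Online Resource 4), and replacing the exclamation marks that flag frame shifts with question marks before phylogenetic analysis (Online Resource 9). This is only an illustration under stated assumptions, not the pipeline used to produce the supplementary files. The function names and the motif length range of 2 to 6 bp are hypothetical choices made for the example; only the four-unit threshold and the "!" to "?" substitution come from the descriptions above.

```python
def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_units=4, min_motif=2, max_motif=6):
    """Scan a sequence for perfect SSRs.

    Returns tuples (motif, units, start, end) with 1-based inclusive positions.
    min_units=4 follows the threshold stated for Online Resource 4; the motif
    length range is an assumption made for this sketch.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(min_motif, max_motif + 1):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases and non-primitive motifs.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            units = 1
            while seq[i + units * k:i + (units + 1) * k] == motif:
                units += 1
            if units >= min_units:
                found = (motif, units, i + 1, i + units * k)
                break
        if found:
            hits.append(found)
            i = found[3]  # continue scanning right after the repeat
        else:
            i += 1
    return hits


def mask_frameshifts(aligned_seq):
    """Replace frame-shift markers ('!') with missing-data symbols ('?')."""
    return aligned_seq.replace("!", "?")


if __name__ == "__main__":
    contig = "GGC" + "AT" * 5 + "CCT" + "AAG" * 4 + "T"
    for motif, units, start, end in find_ssrs(contig):
        print(f"({motif}){units} at {start}-{end}")
    print(mask_frameshifts("ATG!CC---GTA!"))
```

When applied to a real NEXUS file, the substitution should be restricted to the sequence data in the matrix block, since an exclamation mark can also appear inside bracketed NEXUS comments.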
About this article
Cite this article
Olsson, S., Pinosio, S., González-Martínez, S.C. et al. De novo assembly of English yew (Taxus baccata) transcriptome and its applications for intra- and inter-specific analyses. Plant Mol Biol 97, 337–345 (2018). https://doi.org/10.1007/s11103-018-0742-9
DOI: https://doi.org/10.1007/s11103-018-0742-9