An annotated transcriptome of highly inbred Thuja plicata (Cupressaceae) and its utility for gene discovery of terpenoid biosynthesis and conifer defense

  • Original Article
Tree Genetics & Genomes

Abstract

Western redcedar (Thuja plicata; Cupressaceae; WRC) is an ecologically and economically important conifer species of the Pacific Northwest. Regeneration of WRC forests is affected by ungulate browsing, which removes current growth and hampers development of young trees. Monoterpenes make WRC foliage less palatable and can deter browsing. Genomic resources are required to advance knowledge of terpene accumulation and breeding of WRC for herbivore resistance. Unlike most conifers, WRC readily selfs to produce genotypes of reduced heterozygosity. We used seedlings of eight different fifth-generation selfed lines for monoterpene analysis and transcriptome sequencing. Trinity, Velvet/Oases, TransABySS, and SOAPdenovoTrans were used to generate independent transcriptome assemblies for each line. Sequence redundancy was reduced using the EvidentialGene pipeline. The best assembly, as determined by metrics of completeness, contiguity, and accuracy, was used to produce a WRC reference gene set of 28,279 sequences, of which 77% were annotated with significant BLASTp hits and 89% with significant InterProScan hits. An orthology-based approach was used to annotate gene families. Manually curated annotation identified 33 putative full-length terpene synthases (TPS). A maximum likelihood phylogeny revealed that WRC TPS cluster apart from those of Pinaceae within the gymnosperm TPS-d clade. Use of selfed lines enabled the development and annotation of a reduced-redundancy gene set for a gymnosperm of the Cupressaceae family. This gene set serves as a foundation for future functional characterization of WRC TPS and other defense genes and as a resource for the annotation of protein coding sequences in the WRC genome.

Acknowledgements

We thank Dr. Carol Ritland and Ms. Karen Reid for excellent project management support, Dr. Timothy J. Sexton for technical assistance, and the McGill University and Génome Québec Innovation Centre for sequencing services. The research was supported with funds from the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery Grant) and funds to JB and JHR from Genome British Columbia, Genome Canada, and the British Columbia Ministry of Forests, Lands, Natural Resource Operations and Rural Development (MFLNRORD) for the CEDaR User Partnership Project (UPP-002, Genome BC) and the CEDaR Applied Genomics Partnership Project (184CED-GAPP, Genome Canada and Genome BC). TJS is supported by an NSERC Postgraduate Doctoral fellowship.

Data archiving statement

The sequence data supporting this work can be found at the NCBI BioProject Database under BioProject ID PRJNA399722. Sequences of the gene lists described in this paper and their annotations are also available in Files S1–S5 and File S9.

Author information

Corresponding author

Correspondence to Jörg Bohlmann.

Additional information

Communicated by C. Dardick

Electronic supplementary material

Figure S1

Representative four-month-old WRC seedling used for RNA isolation and sequencing. (JPEG 52 kb)

High resolution image (EPS 620 kb)

Figure S2

Pipeline for the de novo assembly and redundancy reduction of the WRC gene set, carried out for each inbred S5 line separately. (JPEG 208 kb)

High resolution image (EPS 2924 kb)

Figure S3

Monoterpene profiles of foliar samples for 12 different monoterpenes across the eight S5 lines. (JPEG 937 kb)

High resolution image (EPS 4043 kb)

Figure S4

Results of the BUSCO gene set completeness assessment. The reduced-redundancy gene set for WRC S5 Line 4 was found to be the most complete, with the lowest number of missing orthologs. (JPEG 438 kb)

High resolution image (EPS 848 kb)

Table S1

Summary of transcriptome assemblies for WRC S5 lines. (DOCX 14 kb)

Table S2

Results of the Conditional Reciprocal Best BLAST (CRBB) analysis. (DOCX 15 kb)

Table S3

Results of the BLASTp analysis of transcriptome assemblies against the longest predicted proteins (n = 1000) in the P. glauca and A. thaliana reference gene sets. (DOCX 13 kb)

File S1

Sequences of 241 plant terpene synthases (TPS) used in construction of a maximum-likelihood phylogeny of plant TPS. (TXT 184 kb)

File S2

Sequences of 126 gymnosperm and a single P. patens TPS used in construction of a maximum-likelihood phylogeny of gymnosperm TPS. (TXT 101 kb)

File S3

Sequence data for the core WRC gene set. Gene set containing the 28,279 core, reduced-redundancy protein sequences for predicted ORFs as produced by the EvidentialGene pipeline. (TXT 12858 kb)

File S4

Sequence data for the alternate WRC gene set. Gene set containing 40,691 additional putative protein-coding sequences, which may be potential gene isoforms or paralogs. (TXT 18875 kb)

File S5

Summary of significant BLASTp and InterProScan hits for the main reduced-redundancy gene set of Line 4. BLAST columns are as described in the BLAST Command Line Applications User Manual (https://www.ncbi.nlm.nih.gov/books/NBK279690/). The pipeline for BLASTing and filtering hits is described in the Methods section. GO names are separated by Biological Process (P), Molecular Function (F) and Cellular Component (C). The InterPro ID column lists all InterPro domains found for the queried sequence. Top PFAM hit describes the hit with the highest score against the PFAM database for each sequence, using an e-value cut-off of 1e-5. (XLSX 5321 kb)
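The "Top PFAM hit" rule described for File S5 (highest-scoring hit per sequence, subject to an E-value cut-off of 1e-5) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the tuple layout, the function name `top_hits`, the example identifiers, and the choice to treat the cut-off as inclusive are all assumptions made for illustration.

```python
def top_hits(rows, evalue_cutoff=1e-5):
    """Return the highest-scoring hit per query among hits passing the cut-off.

    rows: iterable of (query_id, hit_name, score, evalue) tuples.
    This layout is hypothetical; it is not the column order of File S5.
    """
    best = {}
    for query, hit, score, evalue in rows:
        # Assumption: hits with E-value exactly at the cut-off are kept.
        if evalue > evalue_cutoff:
            continue
        # Keep the hit only if it beats the best score seen so far for this query.
        if query not in best or score > best[query][1]:
            best[query] = (hit, score, evalue)
    return best


# Made-up example rows, for illustration only.
example_rows = [
    ("seq1", "domain_A", 120.5, 1e-30),
    ("seq1", "domain_B", 200.1, 1e-50),
    ("seq2", "domain_C", 15.0, 0.5),  # fails the E-value cut-off
]
example_best = top_hits(example_rows)
```

In this made-up example, `seq1` keeps `domain_B` because it has the higher score, and `seq2` has no entry because its only hit fails the cut-off.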

File S6

Statistical summary of orthogroup analysis for all sequences assigned to orthogroups. Of the 498,235 protein-coding sequences from 16 different plant species submitted for orthogroup analysis, 391,179 were successfully assigned to 19,660 orthogroups. The majority of orthogroups (11,616) had an average of fewer than one gene per species; the largest orthogroup (3201 genes) had an average of 151–200 genes per species. A large number of orthogroups (5614) had members from only two species; however, a similarly large number (3835) had members from all 16 species. (XLSX 15 kb)
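The counts reported for File S6 can be turned into proportions with a few lines of Python. The numbers are taken from the summary above; the variable names and the helper `pct` are illustrative assumptions, not part of the original analysis.

```python
# Counts as reported in the File S6 summary.
total_sequences = 498_235      # protein-coding sequences submitted
assigned_sequences = 391_179   # sequences assigned to orthogroups
total_orthogroups = 19_660
sparse_orthogroups = 11_616    # orthogroups averaging under one gene per species
two_species_orthogroups = 5_614
all_species_orthogroups = 3_835


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


unassigned_sequences = total_sequences - assigned_sequences

print(f"Assigned to orthogroups: {pct(assigned_sequences, total_sequences):.1f}%")
print(f"Unassigned sequences:    {unassigned_sequences:,}")
print(f"Sparse orthogroups:      {pct(sparse_orthogroups, total_orthogroups):.1f}%")
print(f"Two-species orthogroups: {pct(two_species_orthogroups, total_orthogroups):.1f}%")
print(f"All-species orthogroups: {pct(all_species_orthogroups, total_orthogroups):.1f}%")
```

Under these figures, roughly 78.5% of submitted sequences were assigned to an orthogroup, and about one in five orthogroups contained members from all 16 species.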

File S7

Statistical summary of orthogroup analysis results for each species. The species with the lowest proportion of genes assigned to orthogroups was P. patens, with only 57.5% of sequences assigned; the highest was P. glauca, with 92.4%. Of our WRC gene set, 90.4% was successfully assigned to orthogroups; 0.1% of WRC sequences were in species-specific orthogroups. (XLSX 21 kb)

File S8

Summary of orthogroup composition and function. Lists the number of orthogroup members from each species, together with the total number of genes in each orthogroup and the top five PFAM hits for each orthogroup. The largest orthogroup, with 3201 genes, consisted mainly of pentatricopeptide-repeat-containing protein-coding genes, a large protein family in plants with little functional redundancy (Lurin et al. 2004). (XLSX 1665 kb)

File S9

Sequence data for 33 putative full-length TPS genes from the WRC gene set. Putative TPS were identified using BLASTp, InterProScan, and orthogroup analysis; after removal of partial ORFs and proteins shorter than 400 aa, the set was reduced to 33 putative full-length TPS. (TXT 26 kb)
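The length filter described for File S9 can be sketched in Python as follows. This is a hedged illustration, not the authors' method: the 400 aa threshold comes from the text, but the function names, the minimal FASTA parser, and the "starts with methionine" test used as a stand-in for detecting partial ORFs are assumptions (the source does not say how partial ORFs were identified).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_full_length(records, min_length=400):
    """Keep proteins of at least `min_length` residues that start with 'M'.

    The start-residue check is a hypothetical proxy for 'not a partial ORF'.
    """
    kept = {}
    for name, seq in records.items():
        seq = seq.rstrip("*")  # drop a trailing stop symbol, if present
        if len(seq) < min_length:
            continue
        if not seq.startswith("M"):
            continue
        kept[name] = seq
    return kept


# Made-up sequences, for illustration only.
example_fasta = (
    ">candidate_a\n" + "M" + "A" * 399 + "\n"
    ">candidate_b\n" + "M" + "A" * 100 + "\n"
    ">candidate_c\n" + "A" * 450 + "\n"
)
example_kept = filter_full_length(parse_fasta(example_fasta))
```

Here only `candidate_a` survives: `candidate_b` is shorter than 400 aa, and `candidate_c` lacks a start methionine under the assumed completeness check.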

About this article

Cite this article

Shalev, T.J., Yuen, M.M.S., Gesell, A. et al. An annotated transcriptome of highly inbred Thuja plicata (Cupressaceae) and its utility for gene discovery of terpenoid biosynthesis and conifer defense. Tree Genetics & Genomes 14, 35 (2018). https://doi.org/10.1007/s11295-018-1248-y

  • DOI: https://doi.org/10.1007/s11295-018-1248-y
