Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4

  • Short Communication
  • Published in: Tree Genetics & Genomes


Populus species are widely distributed across the Northern Hemisphere. The genetic diversity makes the genus an ideal study system for traits of ecological or agronomic significance. However, sequence variation between the genome-sequenced Populus trichocarpa Nisqually-1 and many other Populus species and hybrids poses significant challenges for research that employs sequence-sensitive approaches, such as next-generation sequencing and site-specific genome editing. Using the routinely transformed genotype Populus tremula × alba 717-1B4 as a test case, we utilized established variant-calling pipelines with affordable re-sequencing (~20×) and publicly available transcriptome data to generate a variant-substituted custom genome (sPta717). The sPta717 genome harbors over 10 million SNPs or small indels relative to the P. trichocarpa v3 reference genome. When applied to RNA-Seq analysis, the fraction of uniquely mapped reads increased by 13–28 % relative to that obtained with the P. trichocarpa reference genome, depending on read length and sequence type. The enhanced mapping rates enabled detection of several hundred more expressed genes and improved the differential expression analysis. Similar improvements were observed for DNA-Seq and ChIP-Seq data mapping. The sPta717 genome is also instrumental in guide RNA (gRNA) design for CRISPR-mediated genome editing. We showed that a majority of gRNAs designed from the P. trichocarpa reference genome contain mismatches with the corresponding target sequences of sPta717, likely rendering those gRNAs ineffective in transgenic 717. A website is provided for querying the sPta717 genome by gene model or homology search. The same approach should be applicable to other outcrossing species with a closely related reference genome.
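The gRNA mismatch problem described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' pipeline: it substitutes SNPs into a reference sequence to build a custom sequence, then counts mismatches between a gRNA designed from the reference and the corresponding custom target. The function names, the toy sequence, and the SNP positions are all hypothetical, and indels are deliberately not handled.

```python
def substitute_snps(reference, snps):
    """Return a copy of `reference` with SNPs applied.

    `snps` maps a 0-based position to a (ref_base, alt_base) pair.
    Only single-base substitutions are supported in this sketch.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if seq[pos] != ref_base:
            raise ValueError(f"reference base at {pos} is {seq[pos]}, not {ref_base}")
        seq[pos] = alt_base
    return "".join(seq)


def count_mismatches(grna, target):
    """Count positions where an equal-length gRNA and target differ."""
    if len(grna) != len(target):
        raise ValueError("gRNA and target must have the same length")
    return sum(a != b for a, b in zip(grna, target))


# Toy 20-nt protospacer followed by an AGG PAM (made-up sequence).
reference = "ACGTACGTACGTACGTACGTAGG"
# Hypothetical SNPs relative to the reference.
snps = {4: ("A", "G"), 15: ("T", "C")}

custom = substitute_snps(reference, snps)
grna = reference[:20]  # gRNA designed from the reference genome
mismatches = count_mismatches(grna, custom[:20])
print(custom, mismatches)
```

In this toy case the reference-derived gRNA carries two mismatches against the variant-substituted target, which is the kind of discrepancy that designing gRNAs directly from the custom genome is meant to avoid.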

We would like to thank Vanessa Michelizzi for genomic DNA extraction and RNA pooling, Roger Nelson from the Georgia Genomics Facility for assistance in DNA library preparation, IntegenX for providing the necessary kits and reagents for demo runs on the Apollo 324 automated system, Patrick Breen for Trinity-assembled 717 transcripts, and Scott Harding for critical reading of the manuscript. This work was supported in part by the Department of Energy, Office of Biological and Environmental Research (grant no. DE-SC0008470), and by the Georgia Research Alliance-Hank Haynes Forest Biotechnology endowment.

Author information

Corresponding author

Correspondence to Chung-Jui Tsai.

Additional information

Communicated by A. Brunner

This article is part of the Topical Collection on Genome Biology

Supplementary materials

Below is the link to the electronic supplementary material.

Supplemental 1

Table S1. Tissue sources of total RNA used for cDNA-primed genome amplification. Table S2. List of NGS datasets used in this study. Table S3. Identification of 717 genomic variants. Table S4. Number of expressed genes detected using the two different genomes. Table S5. Mapping rates of DNA-Seq and ChIP-Seq reads. Table S6. Re-annotation of Affymetrix probe-sets using the sPta717 genome (DOCX 28.3 kb)

Supplemental 2

Figure S1. Comparisons of bark and xylem RNA-Seq data analysis using the variant-substituted P. tremula × alba 717 (sPta717) genome or the P. trichocarpa (Ptr_v3) reference genome. (a-b) Transcript abundance in bark (a) and xylem (b). Genes with significantly different FPKM values are highlighted in red (higher in sPta717) or blue (higher in Ptr_v3). (c-d) Transcriptional response of bark (c) and xylem (d) to drought. Genes are color-coded according to whether they exhibited significant differences with both genomes (black), with sPta717 only (red), with Ptr_v3 only (blue), or with neither (gray). The significance threshold was Q ≤ 0.05 and fold change ≥ 2. (PDF 2.75 MB)

Cite this article

Xue, LJ., Alabady, M.S., Mohebbi, M. et al. Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4. Tree Genetics & Genomes 11, 82 (2015).
