
Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4


Populus species are widely distributed across the Northern Hemisphere. This genetic diversity makes the genus an ideal study system for traits of ecological or agronomic significance. However, sequence variation between the genome-sequenced Populus trichocarpa Nisqually-1 and many other Populus species and hybrids poses significant challenges for research that employs sequence-sensitive approaches, such as next-generation sequencing and site-specific genome editing. Using the routinely transformed genotype Populus tremula × alba 717-1B4 as a test case, we utilized established variant-calling pipelines with affordable re-sequencing (~20×) and publicly available transcriptome data to generate a variant-substituted custom genome (sPta717). The sPta717 genome harbors over 10 million SNPs or small indels relative to the P. trichocarpa v3 reference genome. When applied to RNA-Seq analysis, the fraction of uniquely mapped reads increased by 13–28% relative to that obtained with the P. trichocarpa reference genome, depending on read length and sequence type. The enhanced mapping rates enabled detection of several hundred more expressed genes and improved the differential expression analysis. Similar improvements were observed for DNA-Seq and ChIP-Seq data mapping. The sPta717 genome is also instrumental in guide RNA (gRNA) design for CRISPR-mediated genome editing. We showed that a majority of gRNAs designed from the P. trichocarpa reference genome contain mismatches with the corresponding target sequences of sPta717, likely rendering those gRNAs ineffective in transgenic 717. A website is provided for querying the sPta717 genome by gene model or homology search. The same approach should be applicable to other outcrossing species with a closely related reference genome.
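The two ideas in the abstract, substituting called variants into a reference sequence and checking a reference-designed gRNA against the substituted sequence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy 23-nt sequence (a 20-nt protospacer followed by an NGG PAM), and the SNP positions are all hypothetical, and only single-base substitutions are handled (indels would shift coordinates and need separate treatment).

```python
from typing import Mapping


def substitute_snps(reference: str, snps: Mapping[int, str]) -> str:
    """Return a copy of `reference` with each 0-based SNP position
    replaced by its alternate base (substitutions only, no indels)."""
    bases = list(reference)
    for pos, alt in snps.items():
        if not 0 <= pos < len(bases):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        if len(alt) != 1:
            raise ValueError("only single-base substitutions are supported")
        bases[pos] = alt
    return "".join(bases)


def count_mismatches(grna: str, target: str) -> int:
    """Count positions at which a gRNA spacer differs from its target site."""
    if len(grna) != len(target):
        raise ValueError("gRNA and target must have the same length")
    return sum(a != b for a, b in zip(grna.upper(), target.upper()))


# Hypothetical reference site: 20-nt protospacer + 3-nt PAM (AGG).
reference_site = "GATTACAGATTACAGATTACAGG"

# Hypothetical SNPs called in the custom genotype (0-based position -> alt base).
snps = {4: "G", 17: "C"}

custom_site = substitute_snps(reference_site, snps)

# A gRNA designed from the reference matches the reference protospacer exactly...
grna = reference_site[:20]
mismatches_vs_reference = count_mismatches(grna, reference_site[:20])

# ...but carries mismatches against the variant-substituted sequence.
mismatches_vs_custom = count_mismatches(grna, custom_site[:20])

print(custom_site)
print(mismatches_vs_reference, mismatches_vs_custom)
```

In this toy case the reference-designed gRNA picks up two mismatches against the substituted site, which is the kind of discrepancy the abstract describes for gRNAs designed from the P. trichocarpa reference.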






We would like to thank Vanessa Michelizzi for genomic DNA extraction and RNA pooling, Roger Nelson from the Georgia Genomics Facility for assistance in DNA library preparation, IntegenX for providing the necessary kits and reagents for demo runs on the Apollo 324 automated system, Patrick Breen for Trinity-assembled 717 transcripts, and Scott Harding for critical reading of the manuscript. This work was supported in part by the Department of Energy, Office of Biological and Environmental Research (grant no. DE-SC0008470), and by the Georgia Research Alliance-Hank Haynes Forest Biotechnology endowment.

Author information

Correspondence to Chung-Jui Tsai.

Additional information

This article is part of the Topical Collection on Genome Biology

Communicated by A. Brunner

Supplementary materials

Below is the link to the electronic supplementary material.

Supplemental 1

Table S1. Tissue sources of total RNA used for cDNA-primed genome amplification. Table S2. List of NGS datasets used in this study. Table S3. Identification of 717 genomic variants. Table S4. Number of expressed genes detected using the two different genomes. Table S5. Mapping rates of DNA-Seq and ChIP-Seq reads. Table S6. Re-annotation of Affymetrix probe-sets using the sPta717 genome (DOCX 28.3 kb)

Supplemental 2

Figure S1. Comparisons of bark and xylem RNA-Seq data analysis using the variant-substituted P. tremula × alba 717 (sPta717) genome or the P. trichocarpa (Ptr_v3) reference genome. (a-b) Transcript abundance in bark (a) and xylem (b). Genes with significantly different FPKM values are highlighted in red (higher in sPta717) or blue (higher in Ptr_v3). (c-d) Transcriptional response of bark (c) and xylem (d) to drought. Genes are color-coded according to whether they exhibited significant differences with both genomes (black), with sPta717 only (red), with Ptr_v3 only (blue), or with neither (gray). The significance threshold was Q ≤ 0.05 and fold change ≥ 2. (PDF 2.75 MB)


About this article


Cite this article

Xue, L., Alabady, M.S., Mohebbi, M. et al. Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4. Tree Genetics & Genomes 11, 82 (2015). https://doi.org/10.1007/s11295-015-0907-5



  • Re-sequencing
  • SNP
  • Substituted genome
  • RNA-Seq