Abstract
Populus species are widely distributed across the Northern Hemisphere. Their genetic diversity makes the genus an ideal study system for traits of ecological or agronomic significance. However, sequence variation between the genome-sequenced Populus trichocarpa Nisqually-1 and many other Populus species and hybrids poses significant challenges for research that employs sequence-sensitive approaches, such as next-generation sequencing and site-specific genome editing. Using the routinely transformed genotype Populus tremula × alba 717-1B4 as a test case, we used established variant-calling pipelines with affordable re-sequencing (~20×) and publicly available transcriptome data to generate a variant-substituted custom genome (sPta717). The sPta717 genome harbors over 10 million SNPs or small indels relative to the P. trichocarpa v3 reference genome. When sPta717 was applied to RNA-Seq analysis, the fraction of uniquely mapped reads increased by 13–28% relative to that obtained with the P. trichocarpa reference genome, depending on read length and sequence type. The enhanced mapping rates enabled detection of several hundred more expressed genes and improved differential expression analysis. Similar improvements were observed for DNA-Seq and ChIP-Seq data mapping. The sPta717 genome is also instrumental in guide RNA (gRNA) design for CRISPR-mediated genome editing. We showed that a majority of gRNAs designed from the P. trichocarpa reference genome contain mismatches with the corresponding target sequences of sPta717, likely rendering those gRNAs ineffective in transgenic 717. A website is provided for querying the sPta717 genome by gene model or homology search. The same approach should be applicable to other outcrossing species with a closely related reference genome.
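As an illustration of the two ideas in the abstract, variant substitution and gRNA mismatch checking, the following minimal Python sketch may help. It is not the pipeline used in the study: it assumes a single haploid sequence, a hand-written list of non-overlapping variants with 0-based coordinates, and a toy 25-nt reference. The names substitute_variants and count_mismatches and all sequences are hypothetical.

```python
from typing import List, Tuple

# (0-based start, reference allele, alternate allele); hypothetical toy format
Variant = Tuple[int, str, str]


def substitute_variants(reference: str, variants: List[Variant]) -> str:
    """Return `reference` with each variant's ref allele replaced by its alt allele.

    Assumes variants do not overlap and that coordinates refer to `reference`.
    """
    pieces = []
    cursor = 0  # next reference position not yet copied
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"ref allele does not match reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged stretch
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # tail after the last variant
    return "".join(pieces)


def count_mismatches(guide: str, target: str) -> int:
    """Count positions at which an equal-length guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return sum(g != t for g, t in zip(guide, target))


# Toy reference: a 20-nt protospacer, an AGG PAM, and two trailing bases.
reference = "GATTACAGATTACAGATTACAGGTT"
variants = [
    (5, "C", "T"),    # SNP inside the protospacer
    (23, "TT", "T"),  # 1-bp deletion downstream of the PAM
]

custom = substitute_variants(reference, variants)
guide = reference[:20]  # gRNA designed from the reference sequence
mismatches = count_mismatches(guide, custom[:20])
print(custom, mismatches)
```

In this toy case the SNP falls inside the protospacer, so a guide designed from the reference no longer matches the substituted sequence perfectly. A heterozygous hybrid such as 717 would need both alleles considered at each site, which this sketch does not attempt.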
Acknowledgments
We thank Vanessa Michelizzi for genomic DNA extraction and RNA pooling, Roger Nelson from the Georgia Genomics Facility for assistance in DNA library preparation, IntegenX for providing the necessary kits and reagents for demo runs on the Apollo 324 automated system, Patrick Breen for Trinity-assembled 717 transcripts, and Scott Harding for critical reading of the manuscript. This work was supported in part by the Department of Energy, Office of Biological and Environmental Research (grant no. DE-SC0008470), and by the Georgia Research Alliance-Hank Haynes Forest Biotechnology endowment.
Additional information
Communicated by A. Brunner
This article is part of the Topical Collection on Genome Biology
Supplementary materials
Below is the link to the electronic supplementary material.
Supplemental 1
Table S1. Tissue sources of total RNA used for cDNA-primed genome amplification. Table S2. List of NGS datasets used in this study. Table S3. Identification of 717 genomic variants. Table S4. Number of expressed genes detected using the two different genomes. Table S5. Mapping rates of DNA-Seq and ChIP-Seq reads. Table S6. Re-annotation of Affymetrix probe-sets using the sPta717 genome (DOCX 28.3 kb)
Supplemental 2
Figure S1. Comparisons of bark and xylem RNA-Seq data analysis using the variant-substituted P. tremula × alba 717 (sPta717) genome or the P. trichocarpa (Ptr_v3) reference genome. (a–b) Transcript abundance in bark (a) and xylem (b). Genes with significantly different FPKM values are highlighted in red (higher in sPta717) or blue (higher in Ptr_v3). (c–d) Transcriptional response of bark (c) and xylem (d) to drought. Genes are color-coded according to whether they exhibited significant differences with both genomes (black), with sPta717 only (red), with Ptr_v3 only (blue), or with neither (gray). The significance threshold was Q ≤ 0.05 and fold change ≥ 2. (PDF 2.75 MB)
About this article
Cite this article
Xue, LJ., Alabady, M.S., Mohebbi, M. et al. Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4. Tree Genetics & Genomes 11, 82 (2015). https://doi.org/10.1007/s11295-015-0907-5