Tree Genetics & Genomes, 11:82

Exploiting genome variation to improve next-generation sequencing data analysis and genome editing efficiency in Populus tremula × alba 717-1B4

  • Liang-Jiao Xue
  • Magdy S. Alabady
  • Mohammad Mohebbi
  • Chung-Jui Tsai
Short Communication
Part of the following topical collections:
  1. Genome Biology

Abstract

Populus species are widely distributed across the Northern Hemisphere. The genetic diversity makes the genus an ideal study system for traits of ecological or agronomic significance. However, sequence variation between the genome-sequenced Populus trichocarpa Nisqually-1 and many other Populus species and hybrids poses significant challenges for research that employs sequence-sensitive approaches, such as next-generation sequencing and site-specific genome editing. Using the routinely transformed genotype Populus tremula × alba 717-1B4 as a test case, we utilized established variant-calling pipelines with affordable re-sequencing (~20×) and publicly available transcriptome data to generate a variant-substituted custom genome (sPta717). The sPta717 genome harbors over 10 million SNPs or small indels relative to the P. trichocarpa v3 reference genome. When applied to RNA-Seq analysis, the fraction of uniquely mapped reads increased by 13–28 % relative to that obtained with the P. trichocarpa reference genome, depending on read length and sequence type. The enhanced mapping rates enabled detection of several hundred more expressed genes and improved the differential expression analysis. Similar improvements were observed for DNA-Seq and ChIP-Seq data mapping. The sPta717 genome is also instrumental in guide RNA (gRNA) design for CRISPR-mediated genome editing. We showed that a majority of gRNAs designed from the P. trichocarpa reference genome contain mismatches with the corresponding target sequences of sPta717, likely rendering those gRNAs ineffective in transgenic 717. A website is provided for querying the sPta717 genome by gene model or homology search. The same approach should be applicable to other outcrossing species with a closely related reference genome.
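The two ideas in the abstract, substituting called variants into a reference sequence and checking whether a reference-designed gRNA still matches its target, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0-based position dictionary, and the toy 20-nt sequence are all assumptions made for illustration, and only SNPs are handled (small indels, which shift coordinates, are left out).

```python
from typing import Dict, List, Tuple


def substitute_snps(reference: str, snps: Dict[int, str]) -> str:
    """Return a copy of `reference` with SNP alleles substituted.

    `snps` maps 0-based positions to an alternate base. This input format is
    hypothetical; a real workflow would read variants from a VCF file.
    """
    bases = list(reference)
    for pos, alt in snps.items():
        if not 0 <= pos < len(bases):
            raise ValueError(f"SNP position {pos} is outside the reference")
        bases[pos] = alt
    return "".join(bases)


def grna_mismatches(grna: str, target: str) -> List[Tuple[int, str, str]]:
    """List (position, gRNA base, target base) for every mismatch."""
    if len(grna) != len(target):
        raise ValueError("gRNA and target must have the same length")
    return [(i, g, t) for i, (g, t) in enumerate(zip(grna, target)) if g != t]


# Toy data (invented): a 20-nt reference target site and two SNPs.
reference_site = "GATTACAGATTACAGATTAC"
toy_snps = {3: "C", 10: "G"}

substituted_site = substitute_snps(reference_site, toy_snps)

# A gRNA designed from the reference now mismatches the substituted target.
mismatches = grna_mismatches(reference_site, substituted_site)
print(substituted_site)
print(mismatches)
```

In this toy case the reference-designed gRNA carries two mismatches against the substituted sequence, which is the kind of discrepancy the abstract describes as likely to render such gRNAs ineffective in 717.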

Keywords

Re-sequencing, SNP, Substituted genome, RNA-Seq, CRISPR

Supplementary material

11295_2015_907_MOESM1_ESM.docx (28 kb)
Supplemental 1: Table S1. Tissue sources of total RNA used for cDNA-primed genome amplification. Table S2. List of NGS datasets used in this study. Table S3. Identification of 717 genomic variants. Table S4. Number of expressed genes detected using the two different genomes. Table S5. Mapping rates of DNA-Seq and ChIP-Seq reads. Table S6. Re-annotation of Affymetrix probe-sets using the sPta717 genome (DOCX 28.3 kb)
11295_2015_907_MOESM2_ESM.pdf (2.8 mb)
Supplemental 2: Figure S1. Comparisons of bark and xylem RNA-Seq data analysis using the variant-substituted P. tremula × alba 717 (sPta717) genome or the P. trichocarpa (Ptr_v3) reference genome. (a-b) Transcript abundance in bark (a) and xylem (b). Genes with significantly different FPKM values are highlighted in red (higher in sPta717) or blue (higher in Ptr_v3). (c-d) Transcriptional response of bark (c) and xylem (d) to drought. Genes are color-coded according to whether they exhibited significant differences with both genomes (black), with sPta717 only (red), with Ptr_v3 only (blue), or with neither (gray). The significance threshold was Q ≤ 0.05 and fold change ≥ 2. (PDF 2.75 mb)

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. School of Forestry and Natural Resources, University of Georgia, Athens, USA
  2. Department of Genetics, University of Georgia, Athens, USA
  3. Institute of Bioinformatics, University of Georgia, Athens, USA
  4. Department of Plant Biology, University of Georgia, Athens, USA
  5. Department of Computer Science, University of Georgia, Athens, USA
