Variant Calling Using Whole Genome Resequencing and Sequence Capture for Population and Evolutionary Genomic Inferences in Norway Spruce (Picea abies)

  • Carolina Bernhardsson
  • Xi Wang
  • Helena Eklöf
  • Pär K. Ingvarsson
Chapter
Part of the Compendium of Plant Genomes book series (CPG)

Abstract

Advances in next-generation sequencing methods and the development of new statistical and computational methods have opened up possibilities for large-scale, high-quality genotyping in most organisms. Conifer genomes, however, are large and contain a high fraction of repetitive elements, and this complex genome structure has implications for approaches that use next-generation sequencing data for genotyping. In this chapter, we provide a detailed description of a workflow for variant calling using next-generation sequencing data in Norway spruce (Picea abies). The workflow starts with raw sequencing reads and proceeds through read mapping to variant calling and variant filtering. We illustrate the pipeline using data from both whole-genome resequencing and reduced-representation sequencing (sequence capture). We highlight possible problems and pitfalls that the complex genome structure of conifers creates when next-generation sequencing data are used for genotyping, and describe how these issues can be mitigated or eliminated.
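The variant-filtering step is where problems from repetitive, paralog-rich conifer genomes often surface. Reads from collapsed repeats or paralogous gene copies tend to pile up at a single reference position, inflating read depth and producing apparent heterozygotes in nearly every sample. The sketch below shows one common way to flag such sites. It assumes a multi-sample VCF whose FORMAT column contains GT and DP, with at least one sample. The functions `parse_site` and `filter_sites` and the thresholds `depth_factor`, `max_het` and `min_called` are hypothetical and illustrative. They are not the filters or values used in this chapter.

```python
# Hypothetical repeat/paralog screen for a multi-sample VCF (illustrative sketch only).
# Assumes: FORMAT contains GT and DP, at least one sample per line; thresholds are made up.
from statistics import median


def parse_site(line):
    """Return (chrom, pos, [(genotype, depth), ...]) for one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    keys = fields[8].split(":")
    gt_i, dp_i = keys.index("GT"), keys.index("DP")
    calls = []
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_i].replace("|", "/")  # treat phased calls like unphased ones
        dp = values[dp_i] if dp_i < len(values) else "."  # trailing fields may be dropped
        calls.append((gt, int(dp) if dp != "." else 0))
    return chrom, pos, calls


def filter_sites(lines, depth_factor=2.0, max_het=0.8, min_called=0.5):
    """Return (chrom, pos) of sites passing call-rate, depth and heterozygosity checks."""
    sites = [parse_site(l) for l in lines if l.strip() and not l.startswith("#")]
    if not sites:
        return []
    # Mean depth across samples at each site; the cap is relative to the median site.
    mean_dp = [sum(dp for _, dp in calls) / len(calls) for _, _, calls in sites]
    depth_cap = depth_factor * median(mean_dp)
    kept = []
    for (chrom, pos, calls), site_dp in zip(sites, mean_dp):
        called = [gt for gt, _ in calls if "." not in gt]
        if not called or len(called) < min_called * len(calls):
            continue  # too much missing data
        if site_dp > depth_cap:
            continue  # excess depth: possibly a collapsed repeat or paralog
        het = sum(len(set(gt.split("/"))) > 1 for gt in called) / len(called)
        if het > max_het:
            continue  # nearly all samples heterozygous: typical of paralogous loci
        kept.append((chrom, pos))
    return kept
```

The depth cap is expressed relative to the median site rather than as an absolute number. This lets the same rule apply to data sets with very different coverage, such as whole-genome resequencing versus sequence capture. Whether a factor of two, or any fixed heterozygosity cutoff, is appropriate would need to be checked against the actual depth and heterozygosity distributions of a given data set.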

Keywords

Genotyping, Next-generation sequencing, Norway spruce, Variant calling, Variant filtering

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Carolina Bernhardsson (1, 2)
  • Xi Wang (1, 2)
  • Helena Eklöf (1)
  • Pär K. Ingvarsson (1), corresponding author
  1. Department of Plant Biology, Linnean Centre for Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
  2. Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden