Next-Generation Sequencing Data Analysis on Pool-Seq and Low-Coverage Retinoblastoma Data

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Next-generation sequencing (NGS), also known as massively parallel or deep deoxyribonucleic acid (DNA) sequencing, has revolutionized genomic research in recent years. Although generating NGS data costs less than it did when the technology emerged, cost can still be an obstacle. Strategies such as pool-seq and low-coverage sequencing have therefore been developed to reduce it. Lower cost alone is not enough, however: it remains important to establish whether these strategies are effective in NGS studies. We applied a bioinformatics pipeline to pool-seq, low-coverage retinoblastoma data obtained from tumor samples only. Retinoblastoma is a childhood eye malignancy that is initiated by RB1 mutation or MYCN amplification and can lead to loss of vision in one or both eyes, and sometimes to death. We ran our pipeline on the retinoblastoma data and on two other datasets, both to test its validity and to compare its performance. High-confidence variant calls from the Genome in a Bottle Consortium were used for both purposes. Our pipeline called more variants than a standard pipeline on all three datasets, and its recall and F-score values were notably higher. We further present our results on the disease data in terms of variants, variant types and disease-related genes. This study provides a guideline for running an NGS data analysis pipeline on data that combine pool-seq and low-coverage sequencing. To obtain more conclusive results on these two strategies, we recommend using cancer data with higher mutation rates and larger pools.
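The recall and F-score comparison mentioned above can be illustrated with a minimal sketch. It assumes that a call set and a high-confidence truth set are each reduced to a set of (chromosome, position, reference allele, alternate allele) tuples and compared by exact match. The function name `evaluate_calls` and the toy variants are hypothetical, and exact tuple matching is a simplification that ignores differences in variant representation; it is not necessarily how the evaluation in this study was performed.

```python
from typing import NamedTuple, Set, Tuple

# A variant is identified here by (chrom, pos, ref, alt); this is an
# illustrative simplification, not the representation used in the study.
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f_score: float


def evaluate_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a call set with a truth set by exact variant match."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-score as the harmonic mean of precision and recall (F1).
    f_score = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f_score)


# Toy data: four truth variants, of which three are recovered,
# plus one call that is not in the truth set.
truth_set = {
    ("chr13", 100, "A", "G"),
    ("chr13", 200, "C", "T"),
    ("chr13", 300, "G", "A"),
    ("chr13", 400, "T", "C"),
}
call_set = {
    ("chr13", 100, "A", "G"),
    ("chr13", 200, "C", "T"),
    ("chr13", 300, "G", "A"),
    ("chr13", 500, "A", "T"),
}
result = evaluate_calls(call_set, truth_set)
```

With these toy sets there are three true positives, one false positive and one false negative, so precision, recall and F-score all equal 0.75. A pipeline that calls more true variants raises recall, while the F-score also reflects any false positives that come with the extra calls.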

Acknowledgments

The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).

Author information

Corresponding author

Correspondence to Hilal Kaya.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Data availability

Disease data were downloaded from the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena/data/view/PRJEB6630) under accession number PRJEB6630. NA12878 low-coverage sequencing data were downloaded from the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) under accession number SRR622461. NA20355 low-coverage sequencing data were downloaded from the data portal of the 1000 Genomes Project (https://www.internationalgenome.org/data-portal/sample/NA20355) under accession numbers ERR251661 and ERR251662. Disease test data are available in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-3515.

Cite this article

Özdemir Özdoğan, G., Kaya, H. Next-Generation Sequencing Data Analysis on Pool-Seq and Low-Coverage Retinoblastoma Data. Interdiscip Sci Comput Life Sci 12, 302–310 (2020). https://doi.org/10.1007/s12539-020-00374-8

  • DOI: https://doi.org/10.1007/s12539-020-00374-8
