Identification of Factors that Affect Reproducibility of Mutation Calling Methods in Data Originating from the Next-Generation Sequencing

  • Roman Jaksik
  • Krzysztof Psiuk-Maksymowicz
  • Andrzej Swierniak
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 935)


Identification of somatic mutations based on next-generation sequencing of DNA has become one of the fundamental research strategies in oncology, with the goal of uncovering the mechanisms underlying carcinogenesis and resistance to commonly used therapies. Despite significant advances in sequencing methods and data processing algorithms, the reproducibility of experiments remains relatively low and depends significantly on the methods used to identify changes in the structure of the DNA. This is mainly due to three factors: (1) high tumor heterogeneity, because of which some mutations are present in only a small number of cells, (2) bias associated with the process of exome isolation, and (3) the specificity of data pre-processing strategies.

The aim of this work was to determine the impact of these factors on the identification of somatic mutations, and thereby to identify the reasons for low reproducibility in such studies.


Bioinformatic data analysis · Next-generation sequencing · Whole exome sequencing · Somatic mutations · Reproducibility of results



This work was partially supported by the National Centre for Research and Development grant No. Strategmed2/267398/4/NCBR/2015 (KPM), the National Science Centre grant No. 2016/23/D/ST7/03665 (RJ), and by an internal grant of the Institute of Automatic Control, BK-204/RAu1/2017 (AS).

Calculations were carried out using the infrastructure of the Ziemowit computer cluster in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 project and expanded in the POIG.02.03.01-00-040/13 project.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
  2. Biotechnology Centre, Silesian University of Technology, Gliwice, Poland
