Next-generation sequencing (NGS) produces an ever-increasing amount of data, requiring increasingly fast computing infrastructure to keep up. We present GNATY, a collection of tools for NGS data analysis that optimizes parts of the sequence analysis process to reduce hardware requirements. The tools are developed with efficiency in mind, using multithreading and other techniques to speed up the analysis. The architecture has been verified by implementing a variant caller based on the VarScan 2 variant calling model, achieving a speedup of nearly 18 times. The flexibility of the algorithm is further demonstrated by applying it to coverage analysis: GNATY produced the same results as BEDtools 2 in only half the time. The speed increase allows faster data analysis and more flexibility to analyse the same sample using multiple settings. The software is freely available for non-commercial use at http://gnaty.phenosystems.com/.
Keywords: Next generation sequencing; Variant calling; Algorithmics
The authors thank Phenosystems SA for the opportunity to release part of their software for free.