Sprites2: Detection of Deletions Based on an Accurate Alignment Strategy
Because humans are diploid, both homozygous and heterozygous deletions are ubiquitous in the human genome, and distinguishing between the two is an important problem for current structural variation detection tools. In addition, because of sequencing errors, micro-homologies and micro-insertions, the breakpoint locations reported by common alignment tools that use a greedy strategy may not be the true deletion locations, which often leads to false structural variation calls. In this paper, we propose a deletion detection method called Sprites2. Compared with Sprites, Sprites2 adds two new modules: (1) it uses the variance of the insert size distribution to determine the type of each deletion, which improves the accuracy of deletion calls; (2) it locates breakpoints with a novel alignment strategy based on AGE (an algorithm that aligns the 5’ and 3’ ends of two sequences simultaneously), which addresses the problems introduced by sequencing errors, micro-homologies and micro-insertions. To evaluate Sprites2, we ran experiments on simulated and real datasets and compared it with several popular structural variation detection tools. The experimental results show that Sprites2 improves deletion detection performance. Sprites2 is publicly available at https://github.com/zhangzhen/sprites2.
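To illustrate the intuition behind module (1), the following is a minimal sketch and not the Sprites2 implementation. It assumes that read pairs spanning a homozygous deletion all show an insert size shifted by roughly the deletion length, so their variance stays close to the library variance, whereas pairs spanning a heterozygous deletion are a mixture of shifted and unshifted inserts and therefore show a much larger variance. The function name `classify_deletion`, the parameter `ratio_threshold` and its default value are hypothetical choices made for this example.

```python
from statistics import pvariance


def classify_deletion(insert_sizes, library_sd, ratio_threshold=4.0):
    """Guess the deletion type from the insert sizes of spanning read pairs.

    Hypothetical rule for illustration only: compare the variance of the
    observed insert sizes with the library variance. A ratio above
    `ratio_threshold` suggests a mixture of two haplotypes (heterozygous);
    otherwise the pairs look like a single shifted population (homozygous).
    """
    if len(insert_sizes) < 2:
        raise ValueError("need at least two spanning read pairs")
    if library_sd <= 0:
        raise ValueError("library_sd must be positive")

    observed_var = pvariance(insert_sizes)
    ratio = observed_var / (library_sd ** 2)
    return "heterozygous" if ratio > ratio_threshold else "homozygous"


# Assumed library: mean insert size 500 bp, standard deviation 20 bp,
# with a 300 bp deletion between the mates.
homozygous_pairs = [790, 810, 805, 795, 800, 820, 780]   # all shifted
heterozygous_pairs = [490, 510, 500, 800, 810, 790]      # mixed haplotypes

hom_call = classify_deletion(homozygous_pairs, library_sd=20)
het_call = classify_deletion(heterozygous_pairs, library_sd=20)
```

A real caller would also need to handle low coverage, outlier pairs and deletions shorter than the library standard deviation, where the two populations overlap and a variance ratio alone is not informative.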
Keywords: Structural variation, Deletion detection, Alignment strategy, Sequence analysis
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61732009, 61622213, 61728211, 61772552, 61772557 and 61602156.