OMGS: Optical Map-Based Genome Scaffolding
Due to the current limitations of sequencing technologies, de novo genome assembly is typically carried out in two stages, namely contig (sequence) assembly and scaffolding. While scaffolding is computationally easier than sequence assembly, the scaffolding problem can be challenging due to the highly repetitive content of eukaryotic genomes, possible mis-joins in assembled contigs, and inaccuracies in the linkage information. Genome scaffolding tools use either paired-end/mate-pair/linked/Hi-C reads or genome-wide maps (optical, physical or genetic) as linkage information. Optical maps (in particular Bionano Genomics maps) have been used extensively in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa, sea bass, among others). However, the most commonly used scaffolding tools have a serious limitation: they can only deal with one optical map at a time, forcing users to alternate or iterate over multiple maps. In this paper, we introduce OMGS, a novel scaffolding algorithm that, for the first time, can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available, and produces comparable scaffolds when only a single optical map is used. OMGS can be obtained from https://github.com/ucrbioinfo/OMGS.
Keywords: De novo genome assembly, Scaffolding, Optical maps, Combinatorial optimization
This work was supported in part by National Science Foundation grants IIS-1814359, IOS-1543963, IIS-1526742 and IIS-1646333, the Natural Science Foundation of China grant 61772197 and the National Key Research and Development Program of China grant 2018YFC0910404.