Orientation of Ordered Scaffolds

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)

Abstract

Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds.

We address the problem of orienting ordered scaffolds as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem within the earlier introduced framework for comparative analysis and merging of scaffold assemblies (CAMSA). We prove that this problem is \(\mathsf{NP}\)-hard, and further present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. This lays the foundation for a follow-up FPT algorithm for the general case. The proposed algorithms are implemented in the CAMSA software version 2.
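To make the optimization objective concrete, the following is a minimal sketch of an exhaustive search over orientations of ordered scaffolds. It is an illustration under simplifying assumptions, not the CAMSA formalization or the algorithms of this paper: the function name `best_orientation`, the `unary` and `pairwise` weight dictionaries, and the '+'/'-' encoding of strands are all hypothetical. The search tries all 2^n orientation assignments, so it is practical only for very small inputs, which is in line with the hardness of the general problem.

```python
from itertools import product


def best_orientation(n, unary, pairwise):
    """Exhaustively search orientations of n ordered scaffolds.

    All names and data layouts here are hypothetical, chosen for illustration:
      unary[i][o]              weight supporting orientation o of scaffold i
      pairwise[(i, j)][(a, b)] weight supporting scaffold i oriented a
                               together with scaffold j oriented b
    Orientations are '+' (forward strand) or '-' (reverse strand).
    Returns (best total weight, orientation string such as '+--').
    """
    best_score, best_assignment = float("-inf"), None
    # Enumerate all 2^n assignments; exponential in n by design.
    for assignment in product("+-", repeat=n):
        # Weight contributed by single-scaffold evidence.
        score = sum(
            unary.get(i, {}).get(o, 0.0) for i, o in enumerate(assignment)
        )
        # Weight contributed by evidence on pairs of scaffolds.
        score += sum(
            weights.get((assignment[i], assignment[j]), 0.0)
            for (i, j), weights in pairwise.items()
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, "".join(best_assignment)


# Toy input (made up): scaffold 0 is likely forward, scaffolds 0 and 1 likely
# have opposite strands, and scaffolds 1 and 2 likely share a strand.
unary = {0: {"+": 2.0}}
pairwise = {
    (0, 1): {("+", "-"): 3.0, ("-", "+"): 3.0},
    (1, 2): {("+", "+"): 1.0, ("-", "-"): 1.0},
}
score, orientation = best_orientation(3, unary, pairwise)
```

In this toy input every scaffold is constrained relative to at most two others, which is the shape of the special case that the abstract states is solvable in polynomial time; the brute-force search above does not exploit that structure.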

Keywords

Genome assembly; Genome scaffolding; Scaffold orientation; Computational complexity; Algorithms

Notes

Acknowledgements

The authors thank the anonymous reviewers for their suggestions and comments that helped to improve the exposition.

The work is supported by the National Science Foundation under grant No. IIS-1462107. The work of SA is also partially supported by the National Science Foundation under grant No. CCF-1053753 and by the National Institutes of Health under grant No. U24CA211000.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Princeton University, Princeton, USA
  2. ITMO University, St. Petersburg, Russia
  3. The George Washington University, Washington, DC, USA
