Multi-genome Scaffold Co-assembly Based on the Analysis of Gene Orders and Genomic Repeats

  • Sergey Aganezov
  • Max A. Alekseyev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9683)


Advances in DNA sequencing technology over the past decades have increased the volume of raw sequenced genomic data available for further assembly and analysis. Although many software tools exist for assembling sequenced genomic material, they often have difficulty reconstructing complete chromosomes. Major obstacles include uneven read coverage and long similar subsequences (repeats) in genomes. As a result, assemblers can often reliably reconstruct only long subsequences of chromosomes, called scaffolds.

We present a method for simultaneous co-assembly of all fragmented genomes (represented as collections of scaffolds rather than chromosomes) in a given set of annotated genomes. The method is based on the analysis of gene orders and relies on an evolutionary model that includes genome rearrangements as well as gene insertions and deletions. It can also use information about genomic repeats and the phylogenetic tree of the given genomes to further improve assembly quality.
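To make the gene-order idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each genome is a list of scaffolds, each scaffold is a list of signed gene names such as "+a" or "-b", and every gene occurs at most once per genome. It only counts how many other genomes contain a gene adjacency that would join the ends of two different scaffolds in a target genome; it ignores gene insertions and deletions, repeats, and the phylogenetic tree. All function names are hypothetical.

```python
from collections import Counter


def extremities(gene):
    """Return the two extremities of a signed gene in traversal order.

    "+a" is traversed tail to head ("at", "ah"); "-a" is traversed head to tail.
    """
    name = gene.lstrip("+-")
    if gene.startswith("-"):
        return (name + "h", name + "t")
    return (name + "t", name + "h")


def adjacencies(scaffold):
    """Unordered pairs of extremities that are neighbours within one scaffold."""
    ends = [extremities(g) for g in scaffold]
    return {frozenset((ends[i][1], ends[i + 1][0])) for i in range(len(ends) - 1)}


def scaffold_ends(scaffold):
    """The two outermost extremities of a scaffold."""
    return {extremities(scaffold[0])[0], extremities(scaffold[-1])[1]}


def candidate_joins(target, others):
    """Count, per candidate join, the other genomes that support it.

    A candidate join is an adjacency seen in another genome whose two
    extremities are ends of two different scaffolds of the target genome.
    """
    end_owner = {}
    for idx, scaffold in enumerate(target):
        for end in scaffold_ends(scaffold):
            end_owner[end] = idx

    support = Counter()
    for genome in others:
        seen = set()
        for scaffold in genome:
            seen |= adjacencies(scaffold)
        for adj in seen:
            if len(adj) != 2:  # skip degenerate pairs (violates the one-copy assumption)
                continue
            a, b = tuple(adj)
            if a in end_owner and b in end_owner and end_owner[a] != end_owner[b]:
                support[adj] += 1
    return support


# Toy data: the target genome is split into two scaffolds; two other genomes
# carry the same gene order (one of them on the reverse strand).
target = [["+a", "+b"], ["+c", "+d"]]
others = [
    [["+a", "+b", "+c", "+d"]],
    [["-d", "-c", "-b", "-a"]],
]
support = candidate_joins(target, others)
```

In this toy input, both other genomes place the head of gene b next to the tail of gene c, so that pair is the only candidate join and it is supported twice. A real method would also have to weigh conflicting candidates and account for the evolutionary events named above.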


Keywords: Genome assembly, Scaffolding, Gene order



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. The George Washington University, Washington, DC, USA
  2. ITMO University, St. Petersburg, Russia
