Orientation of Ordered Scaffolds

Aganezov, Sergey; Alekseyev, Max A.

doi:10.1007/978-3-319-67979-2_10

Conference paper
First Online: 15 September 2017

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Abstract

Despite the recent progress in genome sequencing and assembly, many of the currently available assembled genomes come in a draft form. Such draft genomes consist of a large number of genomic fragments (scaffolds), whose order and/or orientation (i.e., strand) in the genome are unknown. There exist various scaffold assembly methods, which attempt to determine the order and orientation of scaffolds along the genome chromosomes. Some of these methods (e.g., based on FISH physical mapping, chromatin conformation capture, etc.) can infer the order of scaffolds, but not necessarily their orientation. This leads to a special case of the scaffold orientation problem (i.e., deducing the orientation of each scaffold) with a known order of the scaffolds.

We address the problem of orientation of ordered scaffolds as an optimization problem based on given weighted orientations of scaffolds and their pairs (e.g., coming from paired-end sequencing reads, long reads, or homologous relations). We formalize this problem within the previously introduced framework for comparative analysis and merging of scaffold assemblies (CAMSA). We prove that this problem is \(\mathsf{NP}\)-hard, and further present a polynomial-time algorithm for solving its special case, where the orientation of each scaffold is imposed relative to at most two other scaffolds. This lays the foundation for a follow-up FPT algorithm for the general case. The proposed algorithms are implemented in the CAMSA software version 2.

Notes

1. We remark that contigs can be viewed as a special type of scaffolds with no gaps.
2. It can be easily seen that a realization of \(\mathbb{A}\) may exist only if \(\mathbb{A}\) is proper.
3. \(\deg(v)\) denotes the degree of a vertex v, i.e., the number of edges (counted with multiplicity) incident to v.
4. More generally, \(\mathbb{O}\) may be a multiset whose elements have real positive multiplicities (weights).
5. Recall that a vertex is an articulation vertex if its removal from the graph increases the number of connected components.
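
The two graph notions in Notes 3 and 5 can be illustrated with a short Python sketch. It is not code from the paper: the function names and the edge-list representation of a multigraph are assumptions made for this example, and articulation vertices are found directly from the definition (remove a vertex, recount the components) rather than with a linear-time algorithm.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of each vertex, counting parallel edges with multiplicity.

    `edges` is a list of (u, v) pairs; a repeated pair is a parallel edge,
    and a loop (u, u) contributes 2 to the degree of u.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def count_components(vertices, edges):
    """Number of connected components of the graph induced on `vertices`."""
    adj = defaultdict(set)
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    seen = set()
    components = 0
    for start in vertices:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # iterative depth-first search
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return components


def articulation_vertices(vertices, edges):
    """Vertices whose removal increases the number of connected components."""
    vertices = set(vertices)
    base = count_components(vertices, edges)
    return {v for v in vertices
            if count_components(vertices - {v}, edges) > base}


# Hypothetical path a - b - c - d with a doubled edge between a and b.
example_edges = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
example_degrees = degrees(example_edges)
example_cut = articulation_vertices("abcd", example_edges)
```

In this example the vertex b has degree 3 because the doubled edge is counted twice, and removing either b or c disconnects the path, so those two vertices are the articulation vertices.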

Acknowledgements

The authors thank the anonymous reviewers for their suggestions and comments that helped to improve the exposition.

The work is supported by the National Science Foundation under grant No. IIS-1462107. The work of SA is also partially supported by the National Science Foundation under grant No. CCF-1053753 and by the National Institutes of Health under grant No. U24CA211000.

Author information

Authors and Affiliations

Princeton University, Princeton, NJ, USA
Sergey Aganezov
ITMO University, St. Petersburg, Russia
Sergey Aganezov
The George Washington University, Washington, DC, USA
Max A. Alekseyev

Corresponding author

Correspondence to Sergey Aganezov.

Editor information

Editors and Affiliations

University of Campinas, Campinas, São Paulo, Brazil
Joao Meidanis
Rice University, Houston, Texas, USA
Luay Nakhleh

Appendix. Pseudocodes

In the algorithms below we do not explicitly describe the function OrConsCount, which takes 4 arguments:

1. a subgraph c of \(\mathsf{COG}(\mathbb{O}_o)\) with 1 or 2 vertices;
2. a hash table so with scaffolds as keys and their orientations as values;
3. a set of orientation-imposing assembly points \(\mathbb{O}_o\);
4. an assembly \(\mathbb{A}\)

and counts the assembly points from \(\mathbb{O}\) whose orientation is consistent with \(\mathbb{A}\), assuming that the scaffold(s) corresponding to the vertices of c have in \(\mathbb{A}\) the orientation given in so. With a simple hash-table-based preprocessing of \(\mathbb{A}\) and \(\mathbb{O}\) (which can be done in \(\mathcal{O}(k\log k)\) time, where \(k=\max\{|\mathbb{O}|, |\mathbb{S}(\mathbb{A})|\}\)), this function runs in \(\mathcal{O}(n)\) time, where n is the number of assembly points in \(\mathbb{O}\) involving scaffolds that correspond to vertices in c. Hence, the total running time over all invocations of this function is \(\mathcal{O}(|\mathbb{O}|)\).
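
One way such a counting function could look is sketched below in Python. This is an illustration under stated assumptions, not the CAMSA implementation: an assembly point is modeled as a hypothetical tuple (s1, o1, s2, o2) meaning "scaffold s1 in orientation o1 is followed by scaffold s2 in orientation o2", the assembly \(\mathbb{A}\) is reduced to a dictionary `position` giving the known order of the scaffolds, and the preprocessing step is a plain dictionary `index` from scaffolds to the points that mention them. The names `build_index`, `or_cons_count`, `index`, and `position` are invented for this sketch.

```python
from collections import defaultdict

FLIP = {"+": "-", "-": "+"}


def build_index(points):
    """Preprocessing: map each scaffold to the indices of points mentioning it."""
    index = defaultdict(list)
    for i, (s1, _, s2, _) in enumerate(points):
        index[s1].append(i)
        if s2 != s1:
            index[s2].append(i)
    return index


def or_cons_count(c, so, points, index, position):
    """Count points touching the scaffolds in `c` that agree with `so`.

    c        -- one or two scaffold names (the vertices of the subgraph)
    so       -- dict: scaffold -> "+" or "-" (candidate orientations)
    points   -- list of (s1, o1, s2, o2) tuples (assumed model of O)
    index    -- result of build_index(points)
    position -- dict: scaffold -> rank in the known order (stand-in for A)
    """
    seen = set()
    count = 0
    for scaffold in c:
        for i in index.get(scaffold, ()):
            if i in seen:  # a point shared by both scaffolds is counted once
                continue
            seen.add(i)
            s1, o1, s2, o2 = points[i]
            if s1 not in so or s2 not in so:
                continue  # the other scaffold has no orientation in `so`
            if position[s1] > position[s2]:
                # Read the point on the opposite strand so that it follows
                # the order of A: (s1 o1, s2 o2) equals (s2 -o2, s1 -o1).
                s1, o1, s2, o2 = s2, FLIP[o2], s1, FLIP[o1]
            if so[s1] == o1 and so[s2] == o2:
                count += 1
    return count


# Hypothetical data: three ordered scaffolds and four assembly points.
position = {"s1": 0, "s2": 1, "s3": 2}
points = [
    ("s1", "+", "s2", "-"),
    ("s2", "+", "s1", "-"),  # same adjacency, read from the other strand
    ("s2", "-", "s3", "+"),
    ("s1", "-", "s2", "-"),
]
index = build_index(points)
agree = or_cons_count(["s1", "s2"], {"s1": "+", "s2": "-"}, points, index, position)
```

Because each call only walks the index entries of the scaffolds in c, its cost is proportional to the number of points involving those scaffolds, which mirrors the per-call bound stated above.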

About this paper

Cite this paper

Aganezov, S., Alekseyev, M.A. (2017). Orientation of Ordered Scaffolds. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_10

DOI: https://doi.org/10.1007/978-3-319-67979-2_10
Published: 15 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science, Computer Science (R0)
