Optimal Sequence Alignment to ED-Strings

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2022)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13760)

Abstract

Partial Order Alignment (POA) was introduced by Lee et al. in 2002 to allow the alignment of a string to a graph-like structure representing a set of aligned strings (a Multiple Sequence Alignment, MSA). However, the POA edit transcript (the sequence of edit operations that describes the alignment) does not reflect the possible elasticity of the MSA (different gap sizes in the aligned string), leaving room for possible misalignment and its propagation in progressive MSA. Elastic-Degenerate Strings (ED-strings) are strings that can represent the outcome of an MSA by highlighting gaps and variants as a list of strings that can differ in size and that can possibly include the empty string. In this paper, we define a method that optimally aligns a string to an ED-string, the latter compactly representing an MSA, overcoming the ambiguity in the POA edit transcript while maintaining its time and space complexity.
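To make the notion of an ED-string concrete, the following minimal Python sketch represents an ED-string as a list of segments, each segment being a list of alternative strings (possibly including the empty string), and computes the unit-cost edit distance between a plain string and the closest string spelled by the ED-string. It is not the method defined in this paper: it is a textbook dynamic-programming extension under assumed unit costs, and the names `ed_string_distance`, `pattern`, and `ed_string` are hypothetical.

```python
from typing import List, Sequence


def ed_string_distance(pattern: str, ed_string: Sequence[Sequence[str]]) -> int:
    """Unit-cost edit distance from `pattern` to the closest string spelled
    by `ed_string` (one alternative chosen per segment, then concatenated).

    Illustrative assumption: match costs 0, mismatch and gap cost 1.
    """
    m = len(pattern)
    # frontier[i] = cheapest alignment of pattern[:i] against some string
    # spelled by the segments consumed so far (none yet: i insertions).
    frontier: List[int] = list(range(m + 1))
    for segment in ed_string:
        if not segment:
            raise ValueError("each segment needs at least one alternative")
        best = None
        for alt in segment:
            col = frontier  # the empty alternative leaves the frontier as is
            for c in alt:
                nxt = [col[0] + 1]  # text letter c aligned to a gap
                for i in range(1, m + 1):
                    nxt.append(min(
                        col[i] + 1,                          # skip text letter c
                        nxt[i - 1] + 1,                      # skip pattern letter
                        col[i - 1] + (pattern[i - 1] != c),  # match or mismatch
                    ))
                col = nxt
            # Keep, for every pattern prefix, the best alternative seen so far.
            best = col if best is None else [min(x, y) for x, y in zip(best, col)]
        frontier = best
    return frontier[m]


# Hypothetical ED-string {A}{C, empty, GT}{T}, spelling ACT, AT and AGTT.
example = [["A"], ["C", "", "GT"], ["T"]]
print(ed_string_distance("AT", example))   # 0
print(ed_string_distance("AGT", example))  # 1
```

Taking the element-wise minimum at each segment boundary is what lets the sketch avoid enumerating every string spelled by the ED-string; it returns only the optimal cost, not an edit transcript.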

Notes

  1. In [14, 15] the size of the ED-string \(\widetilde{T}\) is defined in a slightly different way, as the actual number of letters or empty strings that appear in \(\widetilde{T}\).

  2. The method can be generalized to the case of cost \(s>0\) for skips.

  3. In an edit distance computation framework, a match typically has null cost in order to fulfill the metric requirement that a string has zero distance to itself; for this reason, in our examples we assume \(a=0\). However, the dynamic programming method we design also works when one wants to compute a similarity score rather than a distance (it suffices to adapt the penalty scores and seek the maximum instead of the minimum). Therefore, in our problem statement as well as in the recurrence formula that describes our algorithm, we parametrize the score of a match with \(a\).
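The remark in Note 3 can be illustrated on ordinary string-to-string alignment. The sketch below is a standard global-alignment recurrence, not the ED-string recurrence of this paper; the function name `align_score` and the parameters `mismatch`, `gap`, and `pick` are hypothetical, while `a` plays the role of the match score from the note. Passing `min` with `a = 0` gives a distance, and passing `max` with adapted scores gives a similarity.

```python
from typing import Callable


def align_score(x: str, y: str, a: int, mismatch: int, gap: int,
                pick: Callable[..., int]) -> int:
    """Global alignment value of x and y.

    a        -- score of a match (0 for a distance, positive for a similarity)
    mismatch -- score of a substitution
    gap      -- score of an insertion or deletion
    pick     -- min for a distance, max for a similarity
    """
    # prev[j] = value of aligning the empty prefix of x with y[:j].
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap]
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (a if x[i - 1] == y[j - 1] else mismatch)
            cur.append(pick(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[len(y)]


# Distance setting: match a = 0, unit penalties, minimise.
print(align_score("AT", "AGT", a=0, mismatch=1, gap=1, pick=min))    # 1
# Similarity setting: match a = 1, negative penalties, maximise.
print(align_score("AT", "AGT", a=1, mismatch=-1, gap=-1, pick=max))  # 1
```

Only the scores and the choice between `min` and `max` change between the two calls; the recurrence itself is the same, which is why the match score is kept as a parameter.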

References

  1. Cisłak, A., Grabowski, S.: SOPanG2: online searching over a pan-genome without false positives. arXiv:2004.03033 [cs] (2020)

  2. Cisłak, A., Grabowski, S., Holub, J.: SOPanG: online text searching over a pan-genome. Bioinformatics 34(24), 4290–4292 (2018)

  3. Loytynoja, A.L., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. 102(30), 10557–10562 (2005)

  4. Aoyama, K., Nakashima, Y., I, T., Inenaga, S., Bannai, H., Takeda, M.: Faster online elastic degenerate string matching. In: 29th Annual Symposium on Combinatorial Pattern Matching (CPM). LIPIcs, vol. 105 (2018)

  5. Darby, C.A., Gaddipati, R., Schatz, M.C., Langmead, B.: Vargas: heuristic-free alignment for assessing linear and graph read aligners. Bioinformatics 36(12), 3712–3718 (2020)

  6. Grasso, C., Lee, C.: Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics 20(10), 1546–1556 (2004)

  7. Lee, C., Grasso, C., Sharlow, M.F.: Multiple sequence alignment using partial order graphs. Bioinformatics 18(3), 452–464 (2002)

  8. The Computational Pan-Genomics Consortium: Computational Pan-Genomics: Status, Promises and Challenges. Brief. Bioinform. 19(1), 118–135 (2018)

  9. Iliopoulos, C.S., Kundu, R., Pissis, S.P.: Efficient pattern matching in elastic-degenerate texts. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds.) LATA 2017. LNCS, vol. 10168, pp. 131–142. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53733-7_9

  10. Feng, D.-F., Doolittle, R.F.: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25(4), 351–360 (1987)

  11. Higgins, D.G., Sharp, P.M.: CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene 73(1), 237–244 (1988)

  12. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)

  13. Birmelé, E., et al.: Efficient bubble enumeration in directed graphs. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 118–129. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34109-0_13

  14. Bernardini, G., Pisanti, N., Pissis, S.P., Rosone, G.: Pattern matching on elastic-degenerate text with errors. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 74–90. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_7

  15. Bernardini, G., Pisanti, N., Pissis, S.P., Rosone, G.: Approximate pattern matching on elastic-degenerate text. Theor. Comput. Sci. 812, 109–122 (2020)

  16. Bernardini, G., Gawrychowski, P., Pisanti, N., Pissis, S.P., Rosone, G.: Even faster elastic-degenerate string matching via fast matrix multiplication. In: 46th International Colloquium on Automata, Languages, and Programming (ICALP). LIPIcs, vol. 132, pp. 21:1–21:15 (2019)

  17. Bernardini, G., Gawrychowski, P., Pisanti, N., Pissis, S.P., Rosone, G.: Elastic-degenerate string matching via fast matrix multiplication. SIAM J. Comput. 51(3), 549–576 (2022)

  18. Li, H., Feng, X., Chu, C.: The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020)

  19. Eizenga, J.M., et al.: Efficient dynamic variation graphs. Bioinformatics 36(21), 5139–5144 (2021)

  20. Alzamel, M., et al.: Degenerate string comparison and applications. In: 18th International Workshop on Algorithms in Bioinformatics (WABI). LIPIcs, vol. 113, pp. 21:1–21:14 (2018)

  21. Alzamel, M., et al.: Comparing degenerate strings. Fundamenta Informaticae 175(1–4), 41–58 (2020)

  22. Rautiainen, M., Marschall, T.: GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol. 21, 253 (2020)

  23. Mwaniki, N.M., Garrison, E., Pisanti, N.: Fast exact string to d-texts alignments. CoRR, abs/2206.03242 (2022)

  24. Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162(3), 705–708 (1982)

  25. Grossi, R., et al.: On-line pattern matching on similar texts. In: 28th Annual Symposium on Combinatorial Pattern Matching (CPM). LIPIcs, vol. 78, pp. 9:1–9:14 (2017)

  26. Grossi, R., et al.: Circular sequence comparison: algorithms and applications. Algorithms Mol. Biol. 11, 12 (2016)

  27. Vaser, R., Sović, I., Nagarajan, N., Šikić, M.: Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27(5), 737–746 (2017)

  28. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

  29. Carletti, V., Foggia, P., Garrison, E., Greco, L., Ritrovato, P., Vento, M.: Graph-based representations for supporting genome data analysis and visualization: opportunities and challenges. In: Conte, D., Ramel, J.-Y., Foggia, P. (eds.) GbRPR 2019. LNCS, vol. 11510, pp. 237–246. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20081-7_23

  30. Gao, Y., Liu, Y., Ma, Y., Liu, B., Wang, Y., Xing, Y.: abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. bioRxiv (2020)

Acknowledgment

This work is part of the ALPACA project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 956229.

Author information

Corresponding author

Correspondence to Nadia Pisanti.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Mwaniki, N.M., Pisanti, N. (2022). Optimal Sequence Alignment to ED-Strings. In: Bansal, M.S., Cai, Z., Mangul, S. (eds) Bioinformatics Research and Applications. ISBRA 2022. Lecture Notes in Computer Science, vol 13760. Springer, Cham. https://doi.org/10.1007/978-3-031-23198-8_19

  • DOI: https://doi.org/10.1007/978-3-031-23198-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23197-1

  • Online ISBN: 978-3-031-23198-8

  • eBook Packages: Computer Science, Computer Science (R0)
