Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree

Cazaux, Bastien; Lecroq, Thierry; Rivals, Eric

doi:10.1007/978-3-319-15579-1_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8977)

Included in the following conference series:

International Conference on Language and Automata Theory and Applications

Abstract

In the life sciences, determining the sequence of bio-molecules is an essential step towards understanding their functions and interactions inside an organism. Powerful technologies make it possible to obtain huge quantities of short sequencing reads, which need to be assembled to infer the complete target sequence. These constraints favour the use of a version of the de Bruijn Graph (DBG) dedicated to assembly. The de Bruijn Graph is usually built directly from the reads, which is time and space consuming. Given a set \(R\) of input words, well-known data structures, like the generalised suffix tree, can index all the substrings of words in \(R\). In the context of DBG assembly, only substrings of length \(k+1\) and some of length \(k\) are useful. A truncated version of the suffix tree can index those efficiently. As indexes are exploited for numerous purposes in bioinformatics, such as read cleaning, filtering, or even analysis, it is important to enable the community to reuse an existing index to build the DBG directly from it. In an earlier work we provided the first algorithms for building the DBG starting from a suffix tree or a suffix array. Here, we exhibit an algorithm that exploits a reduced version of the truncated suffix tree and computes the DBG from it. Importantly, a variation of this algorithm is also shown to compute the contracted DBG, which offers great benefits in practice. Both algorithms are linear in time and space in the size of the output.
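To make the objects in the abstract concrete, here is a minimal Python sketch of the naive approach that the abstract contrasts with: building the graph directly from the reads. It is not the authors' truncated-suffix-tree algorithm. It assumes the common assembly-oriented definition in which nodes are the substrings of length \(k\) occurring in the reads and there is an arc from \(u\) to \(v\) whenever some substring of length \(k+1\) has \(u\) as its prefix and \(v\) as its suffix; the paper's exact definition may differ. The function name `build_dbg` is hypothetical.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Naive de Bruijn graph construction directly from the reads.

    Assumption (not taken from the paper): nodes are all k-mers of the
    reads, and each (k+1)-mer w contributes an arc w[:-1] -> w[1:].
    """
    nodes = set()
    arcs = defaultdict(set)
    for read in reads:
        # Every substring of length k becomes a node.
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])
        # Every substring of length k+1 becomes an arc between two nodes.
        for i in range(len(read) - k):
            kp1 = read[i:i + k + 1]
            arcs[kp1[:-1]].add(kp1[1:])
    # Sort successor lists so the result is deterministic.
    return nodes, {u: sorted(vs) for u, vs in arcs.items()}


# Usage: two overlapping reads, k = 2.
nodes, arcs = build_dbg(["ACGT", "CGTA"], 2)
print(sorted(nodes))  # ['AC', 'CG', 'GT', 'TA']
print(arcs)           # {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TA']}
```

This sketch scans every read and stores every k-mer explicitly, which is the time and space cost the abstract refers to. In this toy example the graph is a single non-branching path, which a contracted DBG would presumably represent as one node.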

This work is supported by ANR Colib’read (http://colibread.inria.fr) (ANR-12-BS02-0008) and by Défi MASTODONS SePhHaDe (http://www.lirmm.fr/mastodons) from CNRS.

Author information

Authors and Affiliations

L.I.R.M.M. and Institut Biologie Computationnelle, Université de Montpellier II, CNRS U.M.R. 5506, Montpellier, France
Bastien Cazaux & Eric Rivals
LITIS EA 4108, NormaStic CNRS FR 3638, Université de Rouen, Rouen, France
Thierry Lecroq

Corresponding author

Correspondence to Eric Rivals.

Editor information

Editors and Affiliations

Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Nice Sophia Antipolis University, Sophia Antipolis, France
Enrico Formenti
Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
Justus-Liebig-Universität, Gießen, Germany
Bianca Truthe

About this paper

Cite this paper

Cazaux, B., Lecroq, T., Rivals, E. (2015). Construction of a de Bruijn Graph for Assembly from a Truncated Suffix Tree. In: Dediu, AH., Formenti, E., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2015. Lecture Notes in Computer Science, vol 8977. Springer, Cham. https://doi.org/10.1007/978-3-319-15579-1_8

DOI: https://doi.org/10.1007/978-3-319-15579-1_8
Published: 24 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15578-4
Online ISBN: 978-3-319-15579-1
eBook Packages: Computer Science; Computer Science (R0)
