Detecting Repeat Families in Incompletely Sequenced Genomes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5251)


Repeats form a major class of sequence in genomes, with implications for functional genomics and for practical problems. Their detection and analysis pose a number of challenges in genomic sequence analysis, especially if the genome is not completely sequenced. The most abundant and evolutionarily active repeats occur as families of long, similar sequences. We present a novel method for repeat family detection and characterization in cases where the target genome sequence is not completely known. To this end, we first establish the sequence graph, a compacted version of sparse de Bruijn graphs. By analyzing the structure of this graph and of its connected components after local modifications, we devise two algorithms for repeat family detection. The applicability of the methods is shown for both simulated and real genomic data sets.
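
To make the idea of a compacted de Bruijn graph concrete, the following is a minimal sketch and not the authors' construction. It assumes that nodes are the k-mers observed in a set of reads, that edges join k-mers adjacent in some read, and that compaction merges each maximal non-branching path into one labelled node. The function names `de_bruijn_edges` and `compact` are hypothetical, and the paper's exact definition of the sequence graph may differ.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Collect the k-mers seen in the reads and the edges between
    k-mers that are adjacent in some read (a sparse de Bruijn subgraph)."""
    nodes = set()
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            nodes.add(kmer)
            if i > 0:
                prev = read[i - 1:i - 1 + k]
                succ[prev].add(kmer)
                pred[kmer].add(prev)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge every maximal non-branching path into a single label.
    Assumption: this simple merge rule stands in for the compaction step."""

    def is_path_start(v):
        # v continues a path only if it has exactly one predecessor
        # and that predecessor has v as its only successor.
        if len(pred[v]) != 1:
            return True
        (p,) = pred[v]
        return len(succ[p]) != 1

    def walk(start, visited):
        label, cur = start, start
        visited.add(start)
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            label += nxt[-1]  # consecutive k-mers overlap by k-1 characters
            visited.add(nxt)
            cur = nxt
        return label

    visited = set()
    labels = [walk(v, visited) for v in sorted(nodes) if is_path_start(v)]
    # Any k-mers left over lie on isolated cycles without a branching node.
    for v in sorted(nodes):
        if v not in visited:
            labels.append(walk(v, visited))
    return labels


reads = ["AAGTC", "GTCCA", "GTCTT"]
labels = compact(*de_bruijn_edges(reads, 3))
print(labels)
```

In this toy input the k-mer GTC is followed by two different k-mers, so the graph branches there and the compaction yields three labelled nodes rather than one. Branching of this kind is what a structural analysis of the graph would inspect.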


Bacterial Genome; Mobile Element; Chronic Granulomatous Disease; Graph Dimension; Repeat Family
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  1. AG Genominformatik, Technische Fakultät
  2. International NRW Graduate School in Bioinformatics and Genome Research, Bielefeld University, Germany
