Phylogenetic Comparative Assembly

  • Peter Husemann
  • Jens Stoye
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5724)


Recent high-throughput sequencing technologies can generate huge amounts of data for bacterial genome sequencing projects. Although current sequence assemblers successfully merge the overlapping reads, several contigs often remain that cannot be assembled any further. Closing all the gaps to obtain the whole genomic sequence is still costly and time-consuming. Here we propose an algorithm that takes several related genomes and their phylogenetic relationships into account to create a contig adjacency graph. From this graph, a layout graph can be computed that indicates putative adjacencies of the contigs, in order to aid biologists in finishing the complete genomic sequence.
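
To make the idea of a contig adjacency graph more concrete, here is a minimal Python sketch. It is not the authors' method: the input format (contig start positions on each related genome), the weighting `1 / (1 + distance)`, the function names, and the greedy layout step are all assumptions chosen for illustration. The sketch also ignores contig orientation (reverse complements) and the circularity of bacterial chromosomes.

```python
from collections import defaultdict


def adjacency_graph(placements, tree_distance):
    """Build weighted contig adjacencies (hypothetical input format).

    placements:    {genome: [(start_position, contig_name), ...]}
    tree_distance: {genome: phylogenetic distance to the target genome}
    Returns {frozenset({contig_a, contig_b}): weight}.
    """
    weights = defaultdict(float)
    for genome, hits in placements.items():
        # Assumed weighting: closer relatives contribute more support.
        support = 1.0 / (1.0 + tree_distance[genome])
        ordered = [contig for _, contig in sorted(hits)]
        # Contigs that are neighbours on this genome get an edge.
        for left, right in zip(ordered, ordered[1:]):
            if left != right:
                weights[frozenset((left, right))] += support
    return dict(weights)


def greedy_layout(weights):
    """Pick heavy edges so each contig keeps at most two neighbours
    and no cycle is formed (a simple greedy heuristic, assumed here)."""
    degree = defaultdict(int)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    layout = []
    by_weight = sorted(weights.items(),
                       key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, weight in by_weight:
        a, b = sorted(edge)
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            layout.append((a, b, weight))
    return layout


# Toy data: two related genomes that disagree on the contig order.
placements = {
    "refA": [(0, "c1"), (100, "c2"), (200, "c3")],
    "refB": [(0, "c1"), (50, "c3"), (120, "c2")],
}
tree_distance = {"refA": 0.1, "refB": 0.9}

graph = adjacency_graph(placements, tree_distance)
layout = greedy_layout(graph)
```

In this toy example, the adjacency between `c2` and `c3` is supported by both genomes and therefore gets the highest weight, while the conflicting `c1`/`c3` adjacency, seen only in the more distant genome, is left out of the layout.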


Keywords: Reference Genome, Edge Weight, Phylogenetic Distance, Related Genome, Reverse Complement
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Peter Husemann (1, 2)
  • Jens Stoye (1)
  1. AG Genominformatik, Technische Fakultät, Bielefeld University, Germany
  2. International NRW Graduate School in Bioinformatics and Genome Research, Bielefeld University, Germany