Genomic Signatures in De Bruijn Chains

  • Lenwood S. Heath
  • Amrita Pati
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4645)

Abstract

Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons to regions of conserved or divergent gene order. This work examines, within a graph-theoretic setting, the unique manner in which oligonucleotides fit together to comprise a genome. A de Bruijn chain (DBC) is a generalization of a finite Markov chain. A DNA word graph (DWG) is a generalization of a de Bruijn graph that records the occurrence counts of nodes and edges in a genomic sequence generated by a DBC. We combine the properties of DWGs and DBCs to obtain a powerful genomic signature, which we demonstrate to be information-rich, efficient, and sufficiently representative of the sequence from which it is derived. We illustrate its practical value in distinguishing genomic sequences and in predicting the origin of short DNA sequences of unknown origin, and we highlight its superior performance compared to existing genomic signatures, including the dinucleotide odds ratio.
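To make the two objects named in the abstract concrete, the following Python sketch counts the nodes and edges of a word graph and computes a dinucleotide odds ratio as a baseline signature. It is an illustration under stated assumptions, not the authors' construction: the abstract does not give the formal DWG or DBC definitions, so the sketch assumes that nodes are words of length k and that an edge joins each word to the word starting one position later. The function names `word_graph_counts` and `dinucleotide_odds_ratio` are hypothetical, and the odds ratio is the single-strand form f_xy / (f_x f_y), without the reverse-complement symmetrization used by Karlin and Burge.

```python
from collections import Counter


def word_graph_counts(seq, k):
    """Count word occurrences (nodes) and word-to-word transitions (edges).

    Assumption for illustration: a node is a length-k word, and an edge
    links the word at position i to the word at position i + 1, so each
    edge corresponds to one word of length k + 1.
    """
    seq = seq.upper()
    nodes = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    edges = Counter(
        (seq[i:i + k], seq[i + 1:i + k + 1]) for i in range(len(seq) - k)
    )
    return nodes, edges


def dinucleotide_odds_ratio(seq):
    """Single-strand odds ratio rho_xy = f_xy / (f_x * f_y).

    Simplification: frequencies come from the given strand only.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for pair, count in di.items():
        f_xy = count / (n - 1)
        f_x = mono[pair[0]] / n
        f_y = mono[pair[1]] / n
        rho[pair] = f_xy / (f_x * f_y)
    return rho


if __name__ == "__main__":
    nodes, edges = word_graph_counts("ACGTACGT", 2)
    print(nodes)
    print(edges)
    print(dinucleotide_odds_ratio("ACGTACGT"))
```

Dividing each edge count by the count of its source node would give empirical transition frequencies, which is one way to view the word graph as data for a Markov-chain-like model. Whether that matches the paper's DBC is not stated in the abstract.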

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Lenwood S. Heath (1)
  • Amrita Pati (1)
  1. Department of Computer Science, Virginia Tech, Blacksburg, VA 24061-0106
