Genomic Signatures from DNA Word Graphs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4463)


Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons and cis-elements through genes and on to regions of conserved or divergent gene order. The DNA Words program aims to identify mathematical structures that characterize genomes at multiple scales. The focus of this work is the fine structure of genomic sequences: the manner in which short nucleotide sequences fit together to form the genome as an abstract sequence, studied within a graph-theoretic setting. A DNA word graph is a generalization of a de Bruijn graph that records the occurrence counts of nodes and edges in a genomic sequence. A DNA word graph can be derived either from a sequence generated by a finite Markov chain or from a subsequence of a sequenced genome. Both theoretically and empirically, DNA word graphs give rise to genomic signatures. Several genomic signatures are derived from the structure of a DNA word graph, including an information-rich and visually appealing genomic bar code. Application of genomic signatures to several genomes demonstrates their practical value in identifying and distinguishing genomic sequences.
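To make the idea of a count-annotated de Bruijn graph concrete, the following is a minimal sketch rather than the authors' construction. It assumes that nodes are the overlapping words of a fixed length k, that an edge joins each word to the word starting one position later, and that both carry their occurrence counts. The function name `dna_word_graph` and the parameter `k` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def dna_word_graph(sequence, k):
    """Count k-letter words (nodes) and consecutive word pairs (edges).

    Assumed definition for illustration only: a node is a word of
    length k, and an edge links the word at position i to the word at
    position i + 1, so the two words overlap in k - 1 letters.
    """
    seq = sequence.upper()
    # All overlapping words of length k, in order of occurrence.
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    node_counts = Counter(words)
    # Each adjacent pair of words is one traversal of a directed edge.
    edge_counts = Counter(zip(words, words[1:]))
    return node_counts, edge_counts


nodes, edges = dna_word_graph("ACGTACGA", 2)
for word, count in sorted(nodes.items()):
    print(word, count)
for (src, dst), count in sorted(edges.items()):
    print(src, "->", dst, count)
```

Under these assumptions, the node and edge count tables together are the raw material from which a count-based signature could be computed; how the paper turns them into its genomic bar code is not specified in the abstract.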


Caenorhabditis Elegans; Codon Bias; Probability Generate Function; Edge Deletion; Count Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  1. Department of Computer Science, Virginia Tech, Blacksburg, VA 24061-0106
