
Distances between Dinucleotides in the Human Genome

  • Carlos A. C. Bastos
  • Vera Afreixo
  • Armando J. Pinho
  • Sara P. Garcia
  • João M. O. S. Rodrigues
  • Paulo J. S. G. Ferreira
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)

Abstract

We developed a methodology to process DNA sequences based on inter-dinucleotide distances and characterized the inter-dinucleotide distance distributions of the human genome. The distance distribution of each dinucleotide was compared with the distance distributions of all other dinucleotides using the Kullback-Leibler divergence. We found that the divergence between the distance distribution of a dinucleotide and that of its reversed complement is very small, indicating that these distributions are very similar. This finding may be evidence of a parity rule stronger than Chargaff’s second parity rule. We also compared the distance distribution of each dinucleotide with a reference distribution, that of a random sequence generated with the same dinucleotide abundances; this comparison revealed CG as the dinucleotide with the highest cumulative relative error over the first 60 distances.
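The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: distances are assumed to be measured between the start positions of consecutive, possibly overlapping occurrences of the same dinucleotide; each distribution is truncated at a hypothetical maximum distance `max_d` and smoothed with a pseudocount `eps` so the Kullback-Leibler divergence stays finite; and the cumulative relative error is assumed to be the sum of |f(k) − r(k)| / r(k) over the first 60 distances. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reversed_complement(dinuc: str) -> str:
    """Reverse-complement a dinucleotide, e.g. 'AC' -> 'GT'."""
    return "".join(COMPLEMENT[b] for b in reversed(dinuc))


def inter_dinucleotide_distances(seq: str) -> dict[str, list[int]]:
    """Distances between start positions of consecutive (possibly
    overlapping) occurrences of each dinucleotide in seq."""
    last_pos: dict[str, int] = {}
    distances: dict[str, list[int]] = defaultdict(list)
    seq = seq.upper()
    for i in range(len(seq) - 1):
        d = seq[i:i + 2]
        if d[0] not in COMPLEMENT or d[1] not in COMPLEMENT:
            # Assumption: dinucleotides containing N etc. are skipped,
            # although a distance may still span them.
            continue
        if d in last_pos:
            distances[d].append(i - last_pos[d])
        last_pos[d] = i
    return distances


def distance_distribution(dists: list[int], max_d: int,
                          eps: float = 1e-9) -> list[float]:
    """Relative frequencies of distances 1..max_d; longer distances are
    dropped and a pseudocount eps keeps every entry positive."""
    counts = Counter(k for k in dists if k <= max_d)
    raw = [counts[k] + eps for k in range(1, max_d + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence D(p || q), in bits (log base 2)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cumulative_relative_error(f: list[float], ref: list[float],
                              n: int = 60) -> float:
    """Sum of |f(k) - ref(k)| / ref(k) over the first n distances."""
    return sum(abs(a - b) / b for a, b in zip(f[:n], ref[:n]))


if __name__ == "__main__":
    # Toy input: an i.i.d. random sequence, not human genome data.
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(100_000))
    dist = {d: distance_distribution(v, max_d=60)
            for d, v in inter_dinucleotide_distances(seq).items()}
    for d in sorted(dist):
        rc = reversed_complement(d)
        print(d, rc, f"{kl_divergence(dist[d], dist[rc]):.4f}")
```

Passing the distance lists of a random sequence with matching dinucleotide abundances through `distance_distribution` would give a reference `ref` for `cumulative_relative_error`; the exact way the authors generated that reference sequence is not described here.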

Keywords

Distance Distribution; Reference Distribution; Parity Rule; Reversed Complement; Relative Frequency Distribution
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Carlos A. C. Bastos (1)
  • Vera Afreixo (2)
  • Armando J. Pinho (1)
  • Sara P. Garcia (1)
  • João M. O. S. Rodrigues (1)
  • Paulo J. S. G. Ferreira (1)

  1. Signal Processing Lab, IEETA and Department of Electronics, Telecommunications and Informatics, University of Aveiro, Aveiro, Portugal
  2. Department of Mathematics, University of Aveiro, Aveiro, Portugal
