Comparing Reverse Complementary Genomic Words Based on Their Distance Distributions and Frequencies

  • Ana Helena Tavares
  • Jakob Raymaekers
  • Peter J. Rousseeuw
  • Raquel M. Silva
  • Carlos A. C. Bastos
  • Armando Pinho
  • Paula Brito
  • Vera Afreixo
Original Research Article


In this work, we study reverse complementary genomic word pairs in human DNA by comparing both the distance distribution and the frequency of a word with those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and the peak dissimilarity is found to work best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs whose distance distributions are very similar even though both are irregular and contain strong peaks. We also explore the association between distribution dissimilarity and frequency discrepancy, and we speculate that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff’s rules. This study uses both the complete human genome and its repeat-masked version.
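As an illustration of the kind of comparison described above, the following minimal Python sketch computes the reverse complement of a word, the empirical distribution of distances between consecutive occurrences of a word, and a dissimilarity between two such distributions. It rests on several assumptions that are not taken from the article: distances are measured between start positions of consecutive (possibly overlapping) occurrences, the sequence contains only the symbols A, C, G and T, and the total variation distance is used as a generic stand-in measure. It is not the peak dissimilarity studied in the article, whose definition is not given here. All function names are hypothetical.

```python
from collections import Counter

# Translation table for the four nucleotides (assumes an A/C/G/T-only alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement each nucleotide, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def occurrences(seq, word):
    """Start positions of all occurrences of word in seq, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def distance_distribution(seq, word):
    """Relative frequency of each distance between consecutive occurrences."""
    pos = occurrences(seq, word)
    gaps = Counter(b - a for a, b in zip(pos, pos[1:]))
    total = sum(gaps.values())
    return {d: c / total for d, c in gaps.items()} if total else {}


def total_variation(p, q):
    """Generic stand-in dissimilarity (NOT the article's peak dissimilarity)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in support)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"  # toy sequence, for illustration only
    word = "ACG"
    rc = reverse_complement(word)
    p = distance_distribution(seq, word)
    q = distance_distribution(seq, rc)
    print(word, len(occurrences(seq, word)), p)
    print(rc, len(occurrences(seq, rc)), q)
    print("dissimilarity:", total_variation(p, q))
```

On a real chromosome, the word frequencies would be compared alongside the distribution dissimilarity, since the article considers both quantities for each symmetric pair.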


Keywords: Chargaff’s rules; Human genome; Distance distribution; Peak dissimilarity; Symmetric word pairs



This work was partially supported by the Portuguese Foundation for Science and Technology (FCT), Center for Research and Development in Mathematics and Applications (CIDMA), Institute of Biomedicine (iBiMED) and Institute of Electronics and Telematics Engineering of Aveiro (IEETA), within projects UID/MAT/04106/2013, UID/BIM/04501/2013 and UID/CEC/00127/2013. A. Tavares acknowledges the Ph.D. Grant PD/BD/105729/2014 from the FCT. The research of P. Brito was financed by the ERDF—European Regional Development Fund through the Operational Programme for Competitiveness and Internationalization—COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by the FCT as part of project UID/EEA/50014/2013. The research of J. Raymaekers and P. J. Rousseeuw was supported by projects of Internal Funds KU Leuven.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Mathematics and CIDMA and iBiMED, University of Aveiro, Aveiro, Portugal
  2. Department of Mathematics, KU Leuven, Leuven, Belgium
  3. Department of Medical Sciences and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
  4. Department of Electronics Telecommunications and Informatics and IEETA, University of Aveiro, Aveiro, Portugal
  5. Faculty of Economics and LIAAD-INESC TEC, University of Porto, Porto, Portugal
  6. Department of Mathematics and CIDMA and iBiMED and IEETA, University of Aveiro, Aveiro, Portugal
