Dissimilar Symmetric Word Pairs in the Human Genome
In this work we explore the dissimilarity between symmetric word pairs (a word and its reversed complement) by comparing the inter-word distance distribution of each word with that of its reversed complement. We propose a new measure of dissimilarity between such distributions. Since symmetric pairs whose distance patterns differ could point to evolutionary features, we search for the pairs with the most dissimilar behaviour. We focus our study on the complete human genome and its repeat-masked version.
Keywords: Inter-word distance, Reversed complements, Dissimilarity measure, Human genome
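The comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: inter-word distances are assumed to be measured between the start positions of consecutive (possibly overlapping) occurrences of a word, and the dissimilarity used here is the total variation distance between the two empirical distance distributions. That measure is a swapped-in stand-in, not the new measure proposed in this work. All function names are hypothetical.

```python
from collections import Counter

# Watson-Crick complement for the four nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reversed_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. ACG -> CGT."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def inter_word_distances(sequence: str, word: str) -> list[int]:
    """Distances between start positions of consecutive occurrences of `word`.

    Assumption: overlapping occurrences are counted.
    """
    k = len(word)
    positions = [i for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(distances: list[int]) -> dict[int, float]:
    """Empirical relative frequency of each inter-word distance."""
    counts = Counter(distances)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()} if total else {}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Hypothetical stand-in dissimilarity (NOT the measure proposed in this work)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in support)


def pair_dissimilarity(sequence: str, word: str) -> float:
    """Compare the distance distribution of `word` with that of its reversed complement."""
    p = distance_distribution(inter_word_distances(sequence, word))
    q = distance_distribution(inter_word_distances(sequence, reversed_complement(word)))
    return total_variation(p, q)
```

For example, `pair_dissimilarity(sequence, "AAC")` compares the distance distribution of AAC with that of its reversed complement GTT. A word that is its own reversed complement, such as ACGT, always scores 0. Ranking all words of a given length by such a score is one way to look for the most dissimilar symmetric pairs; a real analysis of the human genome would need a far more efficient occurrence scan than this string-slicing loop.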
This work was partially supported by the Portuguese Foundation for Science and Technology (FCT), Center for Research & Development in Mathematics and Applications (CIDMA), Institute of Biomedicine (iBiMED) and Institute of Electronics and Informatics Engineering of Aveiro (IEETA), within projects UID/MAT/04106/2013, UID/BIM/04501/2013 and UID/CEC/00127/2013, and by PhD grant PD/BD/105729/2014. The research of P. Brito was financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation (COMPETE 2020) within project POCI-01-0145-FEDER-006961, and by the FCT as part of project UID/EEA/50014/2013. The research of J. Raymaekers and P. Rousseeuw was supported by projects of Internal Funds KU Leuven.