On the Approximation of the Kolmogorov Complexity for DNA Sequences
The Kolmogorov complexity furnishes several ways of studying natural processes that can be expressed as sequences of symbols from a finite alphabet, such as DNA sequences. Although the Kolmogorov complexity is not algorithmically computable, it can be approximated by lossless normal compressors. In this paper, we use a specific DNA compressor to approximate the Kolmogorov complexity and assess its normality. We then apply it to several datasets of DNA sequences representing complete genomes of different species and domains. We show several evolution-related insights associated with the complexity, namely that, globally, archaea have higher relative complexity than bacteria and eukaryotes.
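The approximation described above can be illustrated with a short sketch. It is not the DNA compressor used in this paper, which is not named here. Instead, it swaps in Python's general-purpose `lzma` module as a stand-in compressor. It estimates the complexity C(x) as the compressed size in bits and defines a relative complexity as bits per base divided by log2(4) = 2, the cost of a plain 2-bit code. It also checks, on concrete inputs, three of the properties expected of a normal compressor: idempotency (C(xx) ≈ C(x)), symmetry (C(xy) ≈ C(yx)) and monotonicity (C(xy) ≥ C(x)). The helper names and the choice of LZMA are illustrative assumptions.

```python
import lzma
import random


def compressed_bits(seq: str) -> int:
    """Approximate the complexity C(x) of a DNA string, in bits.

    Assumption: LZMA (default preset) stands in for a dedicated DNA
    compressor. The xz container adds a small constant header, which
    only matters for short inputs.
    """
    return 8 * len(lzma.compress(seq.encode("ascii")))


def relative_complexity(seq: str) -> float:
    """Bits per base divided by log2(4) = 2, the cost of a plain 2-bit code.

    Values near 1: the compressor found no structure.
    Values near 0: the sequence is highly redundant.
    """
    return compressed_bits(seq) / (2 * len(seq))


def normality_report(x: str, y: str) -> dict:
    """Empirically check some normal-compressor properties on x and y."""
    cx = compressed_bits(x)
    cxx = compressed_bits(x + x)
    cxy, cyx = compressed_bits(x + y), compressed_bits(y + x)
    return {
        # Idempotency: C(xx) should be close to C(x).
        "idempotency_ratio": cxx / cx,
        # Symmetry: C(xy) should be close to C(yx).
        "symmetry_gap": abs(cxy - cyx) / max(cxy, cyx),
        # Monotonicity: C(xy) should not be smaller than C(x).
        "monotone": cxy >= cx,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    x = "".join(rng.choices("ACGT", k=100_000))
    y = "".join(rng.choices("ACGT", k=100_000))
    repetitive = "ACGT" * 25_000
    print(f"random:     {relative_complexity(x):.3f}")
    print(f"repetitive: {relative_complexity(repetitive):.3f}")
    print(normality_report(x, y))
```

A uniformly random sequence should score close to 1, because no lossless compressor can average fewer than 2 bits per base on it. A periodic sequence should score close to 0. A general-purpose compressor such as LZMA is only a rough proxy. Dedicated DNA compressors typically reach lower, and therefore tighter, complexity estimates on real genomes.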
Keywords: Kolmogorov complexity; Compression; DNA sequences
This work was partially funded by FEDER (Programa Operacional Factores de Competitividade, COMPETE) and by National Funds through FCT (Foundation for Science and Technology), in the context of projects UID/CEC/00127/2013 and PTCD/EEI-SII/6608/2014.