Abstract
The structure of deoxyribonucleic acid (DNA) is quantified using an algorithm that translates chromosome data into a state-space plot. This scheme converts the information encoded by the four nitrogenous bases {thymine, cytosine, adenine, guanine} into a two-dimensional numerical representation. The resulting locus is then analyzed from the perspective of fractal and information theories. The classical integer-order expressions, namely the entropy and mutual information of two-dimensional distributions, are compared with those obtained using recently proposed fractional formulations. The fractional approach is found to give superior discrimination of the information embedded in the two-dimensional locus. The dataset is further explored through the Jensen–Shannon divergence and multidimensional scaling. The proposed approach is first applied to the 24 human chromosomes. In a second phase, the bonobo, chimpanzee and orangutan are also considered, to establish a comparison between the four species. The results shed some light on the information content of the distinct chromosomes.
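The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the author's algorithm: the abstract does not state the base-to-number mapping, so the `STEPS` table, the histogram binning and all function names below are assumptions chosen for illustration. Only the classical (integer-order) entropy, mutual information and Jensen–Shannon divergence are shown; the fractional formulations are not reproduced here.

```python
import numpy as np

# Hypothetical base-to-step mapping (an assumption, NOT taken from the paper):
# each base moves a walker one unit along one of the four axis directions.
STEPS = {"A": (1, 0), "T": (-1, 0), "C": (0, 1), "G": (0, -1)}


def dna_locus(seq):
    """Cumulative 2D walk for a DNA string; symbols outside ATCG are skipped."""
    moves = np.array([STEPS[b] for b in seq.upper() if b in STEPS], dtype=float)
    return np.cumsum(moves.reshape(-1, 2), axis=0)


def joint_distribution(points, bins=8, extent=None):
    """Normalized 2D histogram of the locus, read as a joint probability P(x, y).

    `extent` is [[xmin, xmax], [ymin, ymax]]; pass the same value for every
    locus that will later be compared cell by cell.
    """
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins, range=extent)
    return hist / hist.sum()


def entropy(p):
    """Shannon entropy in bits, ignoring empty cells."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution P(x, y)."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits between two same-shape distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))
```

For a comparison between two sequences, both loci would be binned with the same `extent` so that the histogram cells coincide before `jensen_shannon` is applied. A matrix of such pairwise divergences is the kind of input that multidimensional scaling takes.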
Acknowledgements
The author thanks the following organizations for allowing access to genome data. Human: The Genome Reference Consortium; Common Chimpanzee: The Chimpanzee Sequencing and Analysis Consortium; Bonobo: Max Planck Institute for Evolutionary Anthropology; Orangutan: Genome Sequencing Center at WUSTL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Machado, J.T. Information analysis of the human DNA. Nonlinear Dyn 98, 3169–3186 (2019). https://doi.org/10.1007/s11071-019-05066-7
DOI: https://doi.org/10.1007/s11071-019-05066-7