Information analysis of the human DNA

  • Original Paper
  • Nonlinear Dynamics

Abstract

The structure of deoxyribonucleic acid (DNA) is quantified using an algorithm that translates chromosome data into a state-space plot. This scheme converts the information encoded by the four nitrogenous bases {thymine, cytosine, adenine, guanine} into a two-dimensional numerical representation. The resulting locus is then analyzed from the perspective of fractal and information theories. The classical integer-order expressions, namely the entropy and mutual information of two-dimensional distributions, are compared with those obtained using recently proposed fractional formulations. The fractional approach is found to discriminate better the information embedded in the two-dimensional locus. The dataset is further explored through the Jensen–Shannon divergence and multidimensional scaling. The proposed approach is first applied to the 24 human chromosomes. In a second phase, the bonobo, chimpanzee and orangutan are also considered in order to compare the four species. The results shed light on the information content of the distinct chromosomes.
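The integer-order quantities named in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the bases are mapped to the plane, so the step table `STEPS` below is a hypothetical choice (a simple two-dimensional walk), and all function names are illustrative rather than the author's. The fractional formulations and the multidimensional scaling step are not covered here.

```python
import math
from collections import Counter

# Hypothetical base-to-step mapping (an assumption, not taken from the paper).
STEPS = {"A": (1, 0), "T": (-1, 0), "C": (0, 1), "G": (0, -1)}

def walk(sequence):
    """Return the 2D points visited by the walk; unknown symbols are skipped."""
    x = y = 0
    points = []
    for base in sequence.upper():
        step = STEPS.get(base)
        if step is None:
            continue
        x, y = x + step[0], y + step[1]
        points.append((x, y))
    return points

def joint_distribution(points):
    """Estimate p(x, y) as the relative visit frequency of each point."""
    counts = Counter(points)
    total = sum(counts.values())
    return {xy: c / total for xy, c in counts.items()}

def entropy(p):
    """Shannon entropy (natural logarithm) of a distribution stored as a dict."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginals(pxy):
    """Marginal distributions p(x) and p(y) of a joint distribution p(x, y)."""
    px, py = Counter(), Counter()
    for (x, y), v in pxy.items():
        px[x] += v
        py[y] += v
    return dict(px), dict(py)

def mutual_information(pxy):
    """I(X;Y) = sum over (x, y) of p(x,y) * ln(p(x,y) / (p(x) p(y)))."""
    px, py = marginals(pxy)
    return sum(v * math.log(v / (px[x] * py[y]))
               for (x, y), v in pxy.items() if v > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))
```

Under these assumptions, two sequences could be compared with `jensen_shannon(joint_distribution(walk(a)), joint_distribution(walk(b)))`, and a matrix of such pairwise values is the kind of input that multidimensional scaling takes.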



Acknowledgements

The author thanks the following organizations for allowing access to genome data. Human: The Genome Reference Consortium, Common Chimpanzee: The Chimpanzee Sequencing and Analysis Consortium, Bonobo: Max Planck Institute for Evolutionary Anthropology, Orangutan: Genome Sequencing Center at WUSTL.

Author information

Corresponding author

Correspondence to J. Tenreiro Machado.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Machado, J.T. Information analysis of the human DNA. Nonlinear Dyn 98, 3169–3186 (2019). https://doi.org/10.1007/s11071-019-05066-7

  • DOI: https://doi.org/10.1007/s11071-019-05066-7
