Revealing the Relation Between Structure of Chloroplast Genomes and Host Taxonomy

  • Michael Sadovsky
  • Anna Chernyshova
Conference paper
Part of the Springer Proceedings in Complexity book series (SPCOM)


The distribution of chloroplast genomes in the 63-dimensional space of triplet frequencies was studied (the 64 triplet frequencies sum to 1, leaving 63 independent coordinates), with the aim of relating host taxonomy to the clusters observed in that distribution. The clusters were identified with K-means, for the number of classes varying from 2 to 8, and the clade composition of each cluster was analyzed. An unexpectedly high regularity was found in how clades occupy the different clusters. This points to a strong synchrony in the evolution of two physically independent genetic entities (chloroplast vs. nuclear genomes): proximity in frequency space was determined from the organelle genomes, whereas proximity in taxonomy was determined morphologically.
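The pipeline described in the abstract (triplet frequencies of each genome, K-means for K = 2 to 8, then the clade composition of every cluster) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: triplets are counted with an overlapping window of step 1, windows containing non-ACGT symbols are skipped, clustering is plain Lloyd's K-means with Euclidean distance and random initial centres, and the function names (`triplet_frequencies`, `kmeans`, `clade_composition`, `cluster_by_k`) are hypothetical. The code works on all 64 frequencies; because they sum to 1, the points lie in a 63-dimensional subspace, which is why the abstract speaks of a 63-dimensional space.

```python
from collections import Counter
from itertools import product
import random

# All 64 nucleotide triplets in a fixed (lexicographic) order.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(seq):
    """Return the 64 overlapping-triplet frequencies of `seq`.

    Assumption: a sliding window of step 1; windows containing any symbol
    other than A, C, G, T are skipped. The frequencies sum to 1, so the
    vector has only 63 independent coordinates.
    """
    seq = seq.upper()
    counts = [0] * len(TRIPLETS)
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no valid triplet")
    return [c / total for c in counts]


def sq_dist(a, b):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means (single random start); one label per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # k distinct start points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clade_composition(labels, clades):
    """Count how many genomes of each clade fall into each cluster."""
    table = {}
    for lab, clade in zip(labels, clades):
        table.setdefault(lab, Counter())[clade] += 1
    return table


def cluster_by_k(sequences, clades, k_values=range(2, 9)):
    """Cluster the triplet-frequency vectors for every K and tabulate clades."""
    freqs = [triplet_frequencies(s) for s in sequences]
    return {k: clade_composition(kmeans(freqs, k), clades) for k in k_values}


if __name__ == "__main__":
    # Toy, made-up sequences and clade names (not real chloroplast genomes).
    seqs = ["ATATATATATATAT", "ATATATATATTTAT",
            "GCGCGCGCGCGCGC", "GCGCGCGGGCGCGC"]
    clades = ["clade_A", "clade_A", "clade_B", "clade_B"]
    for k, table in cluster_by_k(seqs, clades, k_values=[2]).items():
        print(k, {lab: dict(c) for lab, c in table.items()})
```

K cannot exceed the number of genomes, which is why the toy run uses only K = 2. A real analysis would typically run several random restarts per K (or use an established implementation such as scikit-learn's `KMeans`) and keep the solution with the lowest within-cluster sum of squares; the single-start version here only keeps the sketch short.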


Keywords: Chloroplast genome · Nuclear genome · Symbol sequence · Final distribution · Unsupervised classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was partly supported by a research grant No. 14.Y26.31.0004 from the Government of the Russian Federation.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Institute of Computational Modelling SB RAS, Krasnoyarsk, Russia
  2. Siberian Federal University, Krasnoyarsk, Russia
