GRaphical Footprint Based Alignment-Free Method (GRAFree) for Classifying the Species in Large-Scale Genomics

  • Aritra Mahapatra
  • Jayanta Mukherjee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11942)


In this study, we propose a novel GRaphical footprint based Alignment-Free method (GRAFree) that extracts features from mitochondrial genome sequences reflecting their evolutionary traits. These features are used to classify a set of species into different classes. We also propose a novel distance measure in the feature space to quantify the proximity of these species in the evolutionary process; this distance function is found to be a metric. Further, we model the evolutionary relationships among these classes by constructing a phylogenetic tree. Experiments were carried out on 157 species from four classes: Insecta, Actinopterygii, Aves, and Mammalia. We apply the proposed distance function to the selected feature vectors for three different graphical representations of the genome. The inferred trees corroborate accepted evolutionary traits, demonstrating that the proposed distance function and feature representation can be used to classify species and to capture the evolutionary relationships among their classes.
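The abstract does not state how the GRAFree footprint features or the proposed distance are computed, so the following is only a minimal sketch of the general pipeline it describes: map a sequence to a 2D graphical walk, summarise the walk as a fixed-length feature vector, and assign a class with a k-nearest neighbor vote. The nucleotide-to-step mapping `STEPS`, the features in `footprint_features`, and the `euclidean` distance are hypothetical placeholders, not the authors' choices; any metric can be passed to `knn_classify` through its `dist` parameter.

```python
import math
from collections import Counter

# Assumed nucleotide-to-step mapping (illustrative only; the paper's three
# graphical representations may use different directions).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk(seq):
    """Return the 2D points visited by the walk, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases such as N do not move
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def footprint_features(seq):
    """Hypothetical feature vector: mean point and end point of the walk,
    divided by sequence length so genomes of different sizes are comparable."""
    points = walk(seq)
    n = max(len(seq), 1)
    mean_x = sum(p[0] for p in points) / len(points)
    mean_y = sum(p[1] for p in points) / len(points)
    end_x, end_y = points[-1]
    return (mean_x / n, mean_y / n, end_x / n, end_y / n)


def euclidean(u, v):
    """Placeholder metric; GRAFree's own distance is not reproduced here."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, labelled, k=3, dist=euclidean):
    """Majority vote among the k labelled vectors closest to query under dist."""
    nearest = sorted(labelled, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with made-up sequences (not data from the paper).
train = [
    (footprint_features("AAAAAAAA"), "A-rich"),
    (footprint_features("AAAAAAAG"), "A-rich"),
    (footprint_features("CCCCCCCC"), "C-rich"),
    (footprint_features("CCCCCCCT"), "C-rich"),
]
print(knn_classify(footprint_features("AAAAAAGA"), train, k=3))  # → A-rich
```

Dividing by sequence length is one simple way to compare mitochondrial genomes of different sizes; whether and how the paper normalises its features is not stated in the abstract.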


Keywords: Classification, Phylogeny, Mitochondrial genome, Graphical footprint, k-nearest neighbor classifier, Hierarchical clustering
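The abstract models the relationships among the classes as a phylogenetic tree, and the keywords name hierarchical clustering. The abstract does not say which linkage is used, so as an assumption the sketch below uses UPGMA (average-linkage agglomerative clustering) over a symmetric table of class-to-class distances. The function name `upgma` and the distance values are hypothetical and made up for illustration; they are not results from the paper.

```python
from itertools import combinations


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: leaf names.
    dist:   maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree as nested tuples; each merged pair becomes a new cluster.
    """
    sizes = {label: 1 for label in labels}  # active cluster -> number of leaves
    d = dict(dist)  # working copy, extended with distances to merged clusters
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # UPGMA update: distance to the new cluster is the size-weighted
        # average of the distances to its two children.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical class-level distances, for illustration only.
classes = ["Insecta", "Actinopterygii", "Aves", "Mammalia"]
toy = {
    frozenset(("Aves", "Mammalia")): 2.0,
    frozenset(("Actinopterygii", "Aves")): 6.0,
    frozenset(("Actinopterygii", "Mammalia")): 6.0,
    frozenset(("Insecta", "Aves")): 10.0,
    frozenset(("Insecta", "Mammalia")): 10.0,
    frozenset(("Insecta", "Actinopterygii")): 10.0,
}
print(upgma(classes, toy))
# → ('Insecta', ('Actinopterygii', ('Aves', 'Mammalia')))
```

This sketch returns only the tree topology; a full phylogeny would also record the merge heights as branch lengths.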



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
