Abstract
This paper presents an experimental investigation to determine the efficacy and the appropriate order of Frequency Chaos Game Representation (FCGR) for accurate in silico classification of pathogenic viruses. For this study, we curated genomic sequences of selected viral pathogens from the Virus Pathogen Database and Analysis Resource (ViPR) corpus. The viral genomes were encoded using first- to seventh-order FCGRs to produce genomic features for training and testing. Four different kernels of the naïve Bayes classifier were then trained and tested with the generated FCGR features. The highest average classification accuracy, 98%, was obtained with the third- and fourth-order FCGRs. However, weighing memory utilization and computational efficiency against classification accuracy, the third-order FCGR is deemed suitable for accurate classification of viral pathogens from genome sequences. This provides a promising foundation for developing a genomics-based diagnostic toolkit that could be used to promptly address the global incidence of epidemics caused by pathogenic viruses.
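As a rough illustration of the encoding step described in the abstract, the Python sketch below computes a k-th order FCGR as a 2^k × 2^k matrix of relative k-mer frequencies and flattens it into a feature vector. It is not the authors' implementation: the corner assignment (A, C, G, T), the function names `fcgr` and `fcgr_features`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for illustration.

```python
# Hypothetical sketch of k-th order FCGR encoding; not taken from the paper.
# Assumed corner convention, as (x, y) bits: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k matrix (list of rows) of relative k-mer frequencies."""
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # assumption: skip k-mers with ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer selects the coarsest quadrant,
        # so it contributes the most significant bit of each coordinate.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        counts[y][x] += 1  # row index is y, column index is x
        total += 1
    if total == 0:
        return [[0.0] * size for _ in range(size)]
    return [[c / total for c in row] for row in counts]


def fcgr_features(sequence, k):
    """Flatten the FCGR matrix row by row into a vector of length 4^k."""
    return [value for row in fcgr(sequence, k) for value in row]


# Feature-vector length grows as 4^k across the orders studied (k = 1..7).
for order in range(1, 8):
    print(order, len(fcgr_features("ACGTACGTACGT", order)))
```

Because the vector length is 4^k, a third-order FCGR yields 64 features per genome and a fourth-order FCGR yields 256, which is the kind of memory and computation trade-off the abstract weighs against accuracy.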
Acknowledgement
The publication of this study is supported and funded by the Covenant University Centre for Research, Innovation and Development (CUCRID), Covenant University, Canaanland, Ota, Ogun State, Nigeria.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Adetiba, E., Badejo, J.A., Thakur, S., Matthews, V.O., Adebiyi, M.O., Adebiyi, E.F. (2017). Experimental Investigation of Frequency Chaos Game Representation for in Silico and Accurate Classification of Viral Pathogens from Genomic Sequences. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10208. Springer, Cham. https://doi.org/10.1007/978-3-319-56148-6_13
DOI: https://doi.org/10.1007/978-3-319-56148-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56147-9
Online ISBN: 978-3-319-56148-6
eBook Packages: Computer Science, Computer Science (R0)