Abstract
The Helitrons, an important sub-class of the transposable elements (TEs) class II, have been revealed in diverse eukaryotic genomes. They are mobile elements with great impact on genomic evolution. Till today, there is no systematic classification model of helitrons; that’s why we thought of creating an efficient automatic model to identify these sequences. This paper focuses on the discrimination between helitrons and non-helitrons using the Support Vector Machine (SVM). In this study, we use all the SVM kernels and the higher accuracy rates are obtained by reaching the optimal kernels-parameters (d, c and σ). Further, we introduce two methods to represent the genomic sequences in the form of features to be considered later for the classification task: (i) the temporal and the spectral features extracted from the Frequency Chaos Game Signals order 2 (FCGS2) (ii) the features extracted from the Continuous Wavelet Transform (CWT) applied to the FCGS2 signals. The dataset we used regards two types DNA classes in C.elegans: the helitrons and the repetitive DNA sequences that contain microsatellites and do not form helitrons. The classification results prove that the wavelet energy feature is more effective than the FCGS2 features in the helitron’s recognition system. The performance of our system achieves a high recognition rate (Globally accuracy rate) reaching the value of 92.27%.
Similar content being viewed by others
References
Amin HU, Malik AS, Ahmad RF (2015) Feature extraction and classification for EEG signals using wavelet transform and machine learning techniques. Australas Phys Eng Sci Med 38:139–149. https://doi.org/10.1007/s1324
Barbaglia AM, Klusman KM, Higgins J, Shaw JR, Hannah LC, Lal SK (2012) Gene capture by Helitron transposons reshuffles the transcriptome of maize. Genetics 190:965–975. https://doi.org/10.1534/genetics.111.136176
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 2:273–297
Dias GB, Heringer P, Kuhn GC (2016) Helitrons in Drosophila: chromatin modulation and tandem insertions. Mob Genet Elements 62:e1154638
Du C, Caronna J, He L, Dooner HK (2008) Computational prediction and molecular confirmation of Helitron transposons in the maize genome. BMC Genomics 9:51. https://doi.org/10.1186/1471-2164-9-51
Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. https://doi.org/10.1093/nar/gkh340
Furey TS, Cristianini N, Duffy N, Bednarski DW, Schummer M, Haussler D (2000) Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16:906–914. https://doi.org/10.1093/bioinformatics/16.10.906
Ghimire D, Jeong S, Lee J, Park SH (2017) Facial expression recognition based on local region specific features and support vector machines. MTAP 76:7803–7821. https://doi.org/10.1007/s11042-016-3418-y
Grossmann A, Morlet J (1984) Decomposition of hardy functions into square integrable wavelets of constant shape. SIAM J Math Anal 15:723–736. https://doi.org/10.1137/0515056
Gutschoven B, Verlinde P (2000) Multi-modal identity verification using support vector machines (SVM). In: Information Fusion. FUSION 2000. Proceedings of the Third International Conference on IEEE, Vol. 2, pp. THB3–3, July. 2000
Hood ME (2005) Repetitive DNA in the automictic fungus Microbotryumviolaceum. Genetica 124:1–10. https://doi.org/10.1007/s10709-004-6615-y
Huang Y, Yang YB, Gao XC et al (2017) Genome-wide identification and characterization of microRNAs and target prediction by computational approaches in common carp. Gene Reports 8:30–36
Jahankhani P, Kodogiannis V, Revett K (2006) EEG signal classification using wavelet feature extraction and neural networks. In: Modern Computing IEEE John Vincent Atanasoff 2006 International Symposium 120–124
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase update, a database of eukaryotic repetitive elements. Cytogenetic Genome Res 110:462–467. https://doi.org/10.1159/000084979
Kapitonov VV, Jurka J (2001) Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci 98:8714–8719. https://doi.org/10.1073/pnas.151269298
Kapitonov VV, Jurka J (2007) Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet 23:521–529. https://doi.org/10.1016/j.tig.2007.08.004
Kaur B, Singh D, Roy PP (2017) A novel framework of eeg-based user identification by analyzing music-listening behavior. MTAP 76(24):25581–25602. https://doi.org/10.1007/s11042-016-4232-2
Kumar M, Gromiha MM, Raghava GP (2011) SVM based prediction of RNA-binding proteins using binding residues and evolutionary information. J Mol Recognit 24:303–313. https://doi.org/10.1002/jmr.1061
Kuncheva LI (2004) Combining pattern classifiers: methods and algorithms. Wiley
Li L, Luo Q, Xiao W et al (2017) A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features. J Bioinforma Comput Biol 15:01: 1650025. https://doi.org/10.1142/S0219720016500256
Lin HT, Lin CJ (2003) A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Neural Comput 3:1–32
Mateos A, Dopazo J, Jansen R, Tu Y, Gerstein M, Stolovitzky G (2002) Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res 12:1703–1715 http://www.genome.org/cgi/doi/10.1101/gr.192502
Mena-Chalco J, Carrer H, Zana Y, Cesar RM (2008) Identification of protein coding regions using the modified Gabor-wavelet transform. IEEE/ACM TCBB 5:198–207
Merry RJE, Steinbuch M (2005) Wavelet theory and applications. Literature Study, Eindhoven University of Technology, Department of Mechanical Engineering, Control Systems Technology Group
Messaoudi I, Oueslati AE, Lachiri Z (2014) Building specific signals from frequency chaos game and revealing periodicities using a smoothed Fourier analysis. IEEE/ACM Trans Comput Biol Bioinform 11:863–877. https://doi.org/10.1109/TCBB.2014.2315991
Messaoudi I, Oueslati AE, Lachiri Z (2015) 2D DNA representations generated using a new coding and the time-frequency analysis. JMIHI 5:1035–1044. https://doi.org/10.1166/jmihi.2015.1498
NAJMI AH, SADOWSKY J (1997) The continuous wavelet transform and variable resolution time-frequency analysis. Johns Hopkins APL Tech Dig 18:134–140
Nigatu D, Sobetzko P, Yousef M et al (2017) Sequence-based information-theoretic features for gene essentiality prediction. BMC Bioinformatics 18:1: 473. https://doi.org/10.1186/s12859-017-1884-5
Orhan U, Hekim M, Ozer M (2011) EEG signals classification using the K-means clustering and a multilayer perceptron neural network model. Expert Syst Appl 38:13475–13481. https://doi.org/10.1016/j.eswa.2011.04.149
Oueslati AE, Ellouze N, Lachiri Z (2007) 3D spectrum analysis of DNA sequence: application to Caenorhabditis elegans genome. In: Bioinformatics and Bioengineering (BIBE 2007) 864–871
Oueslati AE, Messaoudi I, Lachiri Z, Ellouze N (2015) A new way to visualize DNA’s base succession: the Caenorhabditis elegans chromosome landscapes. Med Biol Eng Comput 53:1165–1176. https://doi.org/10.1007/s11517-015-1304-9
Öz E, Kaya H (2013) Support vector machines for quality control of DNA sequencing. JIAP 2013:85. https://doi.org/10.1186/1029-242X-2013-85
Poulter RTM, Goodwin TJD (2005) DIRS-1 and the other tyrosine recombinase retrotransposons. Cytogenet Genome Res 110:575–588. https://doi.org/10.1159/000084991
Poulter RT, Goodwin TJ, Butler MI (2003) Vertebrate helentrons and othernovel Helitrons. Gene 313:201–212. https://doi.org/10.1016/S0378-1119(03)00679-6
Pritham EJ, Feschotte C (2007) Massive amplification of rolling-circle transposons in the lineage of the bat Myotislucifugus. Proc Natl Acad Sci 104:1895–1900. https://doi.org/10.1073/pnas.0609601104
Schiilkopf B (2001) The kernel trick for distances. Adv Neural Inf Proces Syst 13:301–307
Schlötterer C (2000) Evolutionary dynamics of microsatellite DNA. Chromosoma 109:365–371. https://doi.org/10.1007/s004120000089
Shawe-Taylor J et al (1998) Structural risk minimization over data-dependent hierarchies. IEEE Trans Inf Theory 44:1926–1940. https://doi.org/10.1109/18.705570
Song J, Li F, Takemoto K et all (2018) PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 443:125–137 https://doi.org/10.1016/j.jtbi.2018.01.023
Suo H, Li M, Lu P, Yan Y (2008) Using SVM as back-end classifier for language identification. EURASIP ASMP 2008:674859. https://doi.org/10.1155/2008/674859
Sweredoski M, DeRose-Wilson L, Gaut BSA (2008) Comparative computational analysis of nonautonomous helitron elements between maize and rice. BMC Genomics 9:467. https://doi.org/10.1186/1471-2164-9-467
Takezaki N, Nei M (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics 144:389–399
Tempel S (2007) Dynamique des hélitronsdans le génomed’arabidopsisthaliana: développement de nouvellesstratégiesd’analyse des élémentstransposables. PHD Thesis, IRISA, Université de Rennes I. https://tel.archives-ouvertes.fr/tel-00185256
The NCBI GenBank database. [Online]. Available: http://www.ncbi.nlm.nih.gov/Genbank/. Accessed 15 Sept 2005
Thomas J, Pritham EJ (2015) Helitrons, the eukaryotic rolling-circle transposable elements. Mobile DNAIII ASMscience 3:893–926. https://doi.org/10.1128/microbiolspec.MDNA3-0049-2014
Touati R, Messaoudi I, Oueslati AE, Lachiri Z (2018) Helitron’s periodicities identification in C. Elegans based on the smoothed spectral analysis and the frequency Chaos game signal coding. Int J Adv Comput Sci Appl 9(4). https://doi.org/10.14569/IJACSA.2018.090438
Touati R, Messaoudi I, Oueslati AE, Lachiri, Z (2018) Classification of Helitron’s Types in the C. elegans Genome based on Features Extracted from Wavelet Transform and SVM Methods. Bioinformatics 127–134. https://doi.org/10.5220/0006631001270134
Valli I, Marquand AF, Mechelli A et al (2016) Identifying individuals at high risk of psychosis: predictive utility of support vector machine using structural and functional Mri data. Front Psychiatry 7:52. https://doi.org/10.3389/fpsyt.2016.00052
Vapnik V (2013) The nature of statistical learning theory. Springer Science & Business Media
Vapnik VN, Vapnik V (1998) Statistical learning theory. Wiley, New York
Wicker T, Sabot F, Hua-Van A et al (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982. https://doi.org/10.1038/nrg2165
Xie D, Li A, Wang M, Fan Z, Feng H (2005) LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Res 33:W105–W110. https://doi.org/10.1093/nar/gki359
Xiong W, He L, Lai J, Dooner HK, Du C (2014) HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. Proc Natl Acad Sci 111:10263–10268. https://doi.org/10.1073/pnas.1410068111
Yang L, Bennetzen JL (2009) Structure-based discovery and description of plant and animal Helitrons. Proc Natl Acad Sci 106:12832–12837. https://doi.org/10.1073/pnas.0905563106
Zhou Q et al (2006) Helitron transposons on the sex chromosomes of the Platyfish Xiphophorus maculatus and their evolution in animal genomes. Zebrafish 3:39–52. https://doi.org/10.1089/zeb.2006.3.39
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Touati, R., Messaoudi, I., Oueslati, A.E. et al. A combined support vector machine-FCGS classification based on the wavelet transform for Helitrons recognition in C.elegans. Multimed Tools Appl 78, 13047–13066 (2019). https://doi.org/10.1007/s11042-018-6455-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-018-6455-x