A combined support vector machine-FCGS classification based on the wavelet transform for Helitrons recognition in C.elegans

Multimedia Tools and Applications

Abstract

Helitrons, an important subclass of class II transposable elements (TEs), have been found in diverse eukaryotic genomes. They are mobile elements with a great impact on genome evolution. To date, there is no systematic classification model for helitrons, which motivated us to build an efficient automatic model to identify these sequences. This paper focuses on discriminating between helitrons and non-helitrons using the Support Vector Machine (SVM). We test all the SVM kernels, and the highest accuracy rates are obtained once the optimal kernel parameters (d, c and σ) are reached. Further, we introduce two methods for representing genomic sequences as features for the classification task: (i) temporal and spectral features extracted from the order-2 Frequency Chaos Game Signal (FCGS2); (ii) features extracted from the Continuous Wavelet Transform (CWT) applied to the FCGS2 signals. The dataset covers two classes of DNA in C. elegans: helitrons, and repetitive DNA sequences that contain microsatellites and do not form helitrons. The classification results show that the wavelet energy feature is more effective than the FCGS2 features for helitron recognition. Our system achieves a high recognition rate, with a global accuracy of 92.27%.
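To make the feature pipeline in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an FCGS2 signal assigns to each position the relative frequency, within the sequence, of the dinucleotide starting there, and that the "wavelet energy" of a scale is the sum of squared CWT coefficients at that scale. The function names, the real Morlet wavelet with `w0 = 5`, the truncation of the wavelet support, and the example sequence and scales are all illustrative assumptions.

```python
import math
from collections import Counter


def fcgs2_signal(seq):
    """Assumed FCGS2 coding: each position gets the relative frequency
    of the dinucleotide that starts at that position."""
    seq = seq.upper()
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(dinucs)
    total = len(dinucs)
    return [counts[d] / total for d in dinucs]


def morlet(t, w0=5.0):
    """Real part of a Morlet mother wavelet (normalisation constant omitted)."""
    return math.exp(-0.5 * t * t) * math.cos(w0 * t)


def cwt_energy(signal, scales):
    """Naive CWT by direct correlation; returns one energy value per scale.

    Energy is taken here as the sum of squared coefficients over all
    positions, which is an assumption about the feature definition.
    """
    n = len(signal)
    energies = []
    for a in scales:
        half = int(4 * a)  # truncate the wavelet at +/- 4 scale units
        kernel = [morlet(k / a) / math.sqrt(a) for k in range(-half, half + 1)]
        energy = 0.0
        for b in range(n):
            coeff = 0.0
            for j, w in enumerate(kernel):
                idx = b + j - half  # signal index aligned with kernel sample j
                if 0 <= idx < n:    # samples outside the signal count as zero
                    coeff += signal[idx] * w
            energy += coeff * coeff
        energies.append(energy)
    return energies


# Hypothetical toy sequence; real inputs would be helitron or microsatellite DNA.
seq = "ATATATATGCGCGCGCATATATAT"
signal = fcgs2_signal(seq)
features = cwt_energy(signal, scales=[1, 2, 4])
```

The resulting per-scale energies form a fixed-length feature vector that could then be passed to an SVM classifier. In practice a dedicated wavelet library would replace the quadratic-time loop used here for clarity.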

Author information

Corresponding author

Correspondence to Rabeb Touati.

About this article

Cite this article

Touati, R., Messaoudi, I., Oueslati, A.E. et al. A combined support vector machine-FCGS classification based on the wavelet transform for Helitrons recognition in C.elegans. Multimed Tools Appl 78, 13047–13066 (2019). https://doi.org/10.1007/s11042-018-6455-x

  • DOI: https://doi.org/10.1007/s11042-018-6455-x
