A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Space of Pseudo K-Tuple Nucleotide Composition

A Correction to this article was published on 05 December 2017

This article has been updated

Abstract

Enhancers are short regulatory regions of the genome that are bound by proteins to activate the transcription of a specific gene. On the basis of their structure and strength, enhancers are further categorized into two classes, namely strong enhancers and weak enhancers. Owing to technological improvements, huge numbers of DNA sequences have accumulated in data banks. Identifying these unprocessed data with traditional methods is challenging because of their intricate and vague structures; in addition, only a limited number of recognized structures are available. To overcome the limitations of traditional methods, it is indispensable to adopt intelligent, machine learning-based approaches. In this regard, a two-layer automated model is proposed. The first layer discriminates between enhancers and non-enhancers. If a sequence is an enhancer, the second layer then identifies its type. DNA sequences are expressed using pseudo dinucleotide composition, pseudo trinucleotide composition and pseudo tetranucleotide composition. To combine the strengths of the individual feature spaces, a hybrid feature space is formed by amalgamating these three feature spaces. The results show that the support vector machine, in conjunction with the hybrid feature space, achieved encouraging accuracies of 77.86% and 65.83% on the examined datasets. These results indicate that the proposed model outperforms the existing approaches reported in the literature so far. The developed model is expected to be useful for basic research and academia.
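The hybrid feature space and the two-layer decision described in the abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' implementation: it uses plain k-tuple nucleotide frequencies and omits the additional "pseudo" components, whose exact formulation the abstract does not give. The function names (`kmer_composition`, `hybrid_features`, `two_layer_predict`) and the two classifier callables are hypothetical; in practice the callables would be trained models such as support vector machines.

```python
from itertools import product
from typing import Callable, List, Sequence

ALPHABET = "ACGT"


def kmer_composition(seq: str, k: int) -> List[float]:
    """Return the normalized frequency of every k-tuple over ACGT (4**k values)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def hybrid_features(seq: str, ks: Sequence[int] = (2, 3, 4)) -> List[float]:
    """Concatenate di-, tri- and tetranucleotide compositions (16 + 64 + 256)."""
    features: List[float] = []
    for k in ks:
        features.extend(kmer_composition(seq, k))
    return features


def two_layer_predict(
    seq: str,
    is_enhancer: Callable[[List[float]], bool],  # hypothetical layer-1 classifier
    is_strong: Callable[[List[float]], bool],    # hypothetical layer-2 classifier
) -> str:
    """Layer 1 separates enhancers from non-enhancers; layer 2 assigns the type."""
    x = hybrid_features(seq)
    if not is_enhancer(x):
        return "non-enhancer"
    return "strong enhancer" if is_strong(x) else "weak enhancer"
```

With these assumptions every sequence maps to a 336-dimensional vector, and the second classifier is consulted only for sequences that the first layer accepts as enhancers.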

Change history

  • 05 December 2017

    The original version of this article unfortunately contained a mistake. One character was missing in the title, which was therefore incorrectly given as “A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Pace of Pseudo K-Tuple Nucleotide Composition”.

Author information

Corresponding author

Correspondence to Maqsood Hayat.

About this article

Cite this article

Tahir, M., Hayat, M. & Khan, S.A. A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Pace of Pseudo K-Tuple Nucleotide Composition. Arab J Sci Eng 43, 6719–6727 (2018). https://doi.org/10.1007/s13369-017-2818-2

  • DOI: https://doi.org/10.1007/s13369-017-2818-2

Keywords

  • SVM
  • Dinucleotide composition
  • Trinucleotide composition
  • Hybrid