Abstract
Enhancers are short regulatory regions of the genome that are bound by proteins to activate the transcription of a specific gene. On the basis of their structure and strength, enhancers are further categorized into two classes: strong enhancers and weak enhancers. Owing to technological improvements, huge numbers of DNA sequences have accumulated in data banks. Identifying enhancers in these unprocessed data with traditional methods is challenging because their structures are intricate and vague, and only a limited number of recognized structures are available. To overcome the limitations of traditional methods, it is indispensable to adopt intelligent, machine learning-based approaches. In this regard, a two-layer automated model is proposed. The first layer discriminates between enhancers and non-enhancers. If a sequence is predicted to be an enhancer, the second layer identifies its type. DNA sequences are expressed using pseudo dinucleotide composition, pseudo trinucleotide composition and pseudo tetranucleotide composition. To combine the strengths of these feature spaces, a hybrid feature space is formed by amalgamating all three. The results show that the support vector machine, in conjunction with the hybrid feature space, achieved encouraging accuracies of 77.86% and 65.83% on the examined datasets. These results indicate that the proposed model outperforms the existing approaches reported in the literature so far. The developed model may therefore be useful for basic research and academia.
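The hybrid feature space and the two-layer decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain normalized k-mer frequencies for k = 2, 3 and 4 and omits the pseudo (sequence-order correlation) components of pseudo k-tuple nucleotide composition, and the two classifiers are passed in as hypothetical callables (`is_enhancer`, `is_strong`) standing in for trained support vector machines. All function names are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, List, Sequence

ALPHABET = "ACGT"


def kmer_composition(seq: str, k: int) -> List[float]:
    """Normalized frequency of every k-mer over ACGT (4**k values).

    Simplification: this is plain k-mer composition; the pseudo
    components of PseKNC are NOT computed here.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def hybrid_features(seq: str, ks: Sequence[int] = (2, 3, 4)) -> List[float]:
    """Concatenate the di-, tri- and tetranucleotide feature spaces."""
    features: List[float] = []
    for k in ks:
        features.extend(kmer_composition(seq, k))
    return features  # 16 + 64 + 256 = 336 values for ks = (2, 3, 4)


def two_layer_predict(
    seq: str,
    is_enhancer: Callable[[List[float]], bool],  # hypothetical layer-1 model
    is_strong: Callable[[List[float]], bool],    # hypothetical layer-2 model
) -> str:
    """Layer 1: enhancer vs. non-enhancer. Layer 2: strong vs. weak."""
    x = hybrid_features(seq)
    if not is_enhancer(x):
        return "non-enhancer"
    return "strong enhancer" if is_strong(x) else "weak enhancer"
```

In practice the two callables would wrap classifiers trained on labelled sequences; the second one is only consulted when the first layer predicts an enhancer, which mirrors the cascade described above.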
Change history
05 December 2017
The original version of this article unfortunately contained a mistake. One character was missing in the title, which was therefore incorrectly given as “A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Pace of Pseudo K-Tuple Nucleotide Composition”.
Cite this article
Tahir, M., Hayat, M. & Khan, S.A. A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Space of Pseudo K-Tuple Nucleotide Composition. Arab J Sci Eng 43, 6719–6727 (2018). https://doi.org/10.1007/s13369-017-2818-2
DOI: https://doi.org/10.1007/s13369-017-2818-2
Keywords
- SVM
- Dinucleotide composition
- Trinucleotide composition
- Hybrid