IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy

Wang, Hao; Xi, Qilemuge; Liang, Pengfei; Zheng, Lei; Hong, Yan; Zuo, Yongchun

doi:10.1007/s00726-021-02941-9

Original Article
Published: 23 January 2021

Amino Acids, volume 53, pages 239–251 (2021)

Hao Wang¹ (equal contribution),
Qilemuge Xi¹ (equal contribution),
Pengfei Liang¹,
Lei Zheng¹,
Yan Hong¹ &
Yongchun Zuo¹ (ORCID: orcid.org/0000-0002-6065-7835)

Abstract

Enzymes play considerable roles in disease diagnosis and biological function. For the automatic identification of enzymes, the most critical step is feature extraction that truly reflects the intrinsic properties of proteins. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which can identify whether a protein is a human enzyme and distinguish the functional class of a human enzyme. To improve feature representation, protein sequences were encoded with a new feature vector called the ‘reduced amino acid cluster’. We evaluated 673 reduced amino acid alphabets to determine the optimal feature representation scheme. In tenfold cross-validation, IHEC_RAAC identified human enzymes with an accuracy of 74.66% and discriminated human enzyme classes with an accuracy of 54.78%, which are 2.06% and 8.68% higher, respectively, than the state-of-the-art predictors. Additionally, results on the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes, providing guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac.
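The general idea of a reduced amino acid cluster encoding can be illustrated with a minimal Python sketch. It is not the IHEC_RAAC implementation: the five-cluster grouping `CLUSTERS`, the example sequence, and the helper names `build_mapping`, `reduce_sequence` and `ktuple_composition` are all assumptions made for illustration, and the grouping is not claimed to be one of the 673 alphabets evaluated in the study. The sketch maps each residue to a cluster representative and then computes normalized k-tuple frequencies as a fixed-length feature vector.

```python
from collections import Counter
from itertools import product

# Hypothetical grouping of the 20 residues into 5 clusters by rough
# physicochemical similarity (an illustrative assumption only).
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KRDE", "GP"]


def build_mapping(clusters):
    """Map every residue to the first letter of the cluster it belongs to."""
    return {aa: group[0] for group in clusters for aa in group}


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet, skipping unknown symbols."""
    return "".join(mapping[aa] for aa in seq.upper() if aa in mapping)


def ktuple_composition(reduced, alphabet, k=1):
    """Return normalized k-tuple frequencies in a fixed, reproducible order."""
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    total = len(reduced) - k + 1  # number of overlapping k-tuples
    counts = Counter(reduced[i:i + k] for i in range(max(total, 0)))
    return [counts[key] / total if total > 0 else 0.0 for key in keys]


mapping = build_mapping(CLUSTERS)
alphabet = [group[0] for group in CLUSTERS]

# Made-up 10-residue peptide, used only to exercise the functions.
reduced = reduce_sequence("MKTAYIAKQR", mapping)
features = ktuple_composition(reduced, alphabet, k=2)
print(reduced, len(features))
```

With 5 clusters and k = 2 the feature vector has 25 dimensions, compared with 400 for dipeptide composition over the full 20-letter alphabet. This reduction in dimensionality is what makes it practical to compare many candidate alphabets when searching for a suitable representation.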

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62061034, 61702290, 61861036), the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region (NJYT-18-B01), and the Fund for Excellent Young Scholars of Inner Mongolia (2017JQ04).

Author information

Hao Wang and Qilemuge Xi have contributed equally to this work.

Authors and Affiliations

State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
Hao Wang, Qilemuge Xi, Pengfei Liang, Lei Zheng, Yan Hong & Yongchun Zuo

Contributions

YZ designed this work. HW and QX performed the data analyses and wrote the manuscript. PL and LZ contributed significantly to analysis and manuscript preparation. YH helped perform the analysis with constructive discussions.

Corresponding author

Correspondence to Yongchun Zuo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Human/animal rights statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

No individual participants were included in this study; therefore, no informed consent was necessary.

Additional information

Handling editor: Y. Su.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 17565 KB)

About this article

Cite this article

Wang, H., Xi, Q., Liang, P. et al. IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy. Amino Acids 53, 239–251 (2021). https://doi.org/10.1007/s00726-021-02941-9

Received: 17 October 2020
Accepted: 11 January 2021
Published: 23 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s00726-021-02941-9
