Experiment to Find Out Suitable Machine Learning Algorithm for Enzyme Subclass Classification

  • Conference paper
Micro-Electronics and Telecommunication Engineering (ICMETE 2023)

Abstract

Proteins play a major role in determining many characteristics and functions of living beings. Predicting protein classes and subclasses is a prominent research topic in bioinformatics. Machine learning methods are widely used for prediction and have also been applied to the classification and subclassification of proteins. The problem is to assign each protein to the subclass it belongs to and to choose a machine learning method that gives better subclass classification. The objective is to compare the performance of three existing machine learning methods for protein subclassification: logistic regression, support vector machine (SVM), and random forest. In this study, the methods are implemented and their results are compared while varying the number of samples per subclass and the number of subclasses. Logistic regression and SVM are used as binary classifiers, and multiple classes are predicted with \(\log_2(n)\) binary classifiers for \(n\) class labels. It is observed that random forest and SVM provide almost the same accuracy for smaller datasets, but as the dataset size increases, random forest performs better than SVM.
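The abstract states that the binary classifiers (logistic regression and SVM) handle \(n\) subclasses with \(\log_2(n)\) classifiers. One plausible reading is a binary output code: each subclass gets a \(\lceil \log_2 n \rceil\)-bit index, and one binary classifier is trained per bit. The sketch below assumes that reading and is not the authors' implementation. The names `n_bits`, `TinyLogisticRegression`, and `BinaryCodeMulticlass`, the gradient-descent trainer, the rule that clamps unused codes to the last class, and the toy data are all hypothetical. Any binary classifier wrapped to the same `fit`/`predict` interface (for example an SVM) could be passed in instead.

```python
import math
from typing import Callable, List, Sequence


def n_bits(n_classes: int) -> int:
    """Binary classifiers needed for n classes: ceil(log2 n), at least 1."""
    return max(1, math.ceil(math.log2(n_classes)))


def encode(index: int, bits: int) -> List[int]:
    """Binary code of a class index, most significant bit first."""
    return [(index >> (bits - 1 - b)) & 1 for b in range(bits)]


def decode(code: Sequence[int]) -> int:
    """Inverse of encode: turn a bit vector back into a class index."""
    index = 0
    for bit in code:
        index = (index << 1) | bit
    return index


class TinyLogisticRegression:
    """Hypothetical binary logistic regression fitted by batch gradient descent."""

    def __init__(self, lr: float = 1.0, epochs: int = 2000) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def _score(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "TinyLogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                z = max(-500.0, min(500.0, self._score(xi)))  # avoid exp overflow
                err = 1.0 / (1.0 + math.exp(-z)) - yi  # sigmoid(z) - target
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Average gradient step on the log-loss.
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self._score(x) >= 0.0 else 0


class BinaryCodeMulticlass:
    """n-class classifier built from ceil(log2 n) binary classifiers, one per code bit."""

    def __init__(self, make_classifier: Callable[[], TinyLogisticRegression]) -> None:
        self.make_classifier = make_classifier

    def fit(self, X, labels):
        self.classes = sorted(set(labels))
        self.bits = n_bits(len(self.classes))
        index = {c: i for i, c in enumerate(self.classes)}
        codes = [encode(index[c], self.bits) for c in labels]
        # Train one binary classifier per bit position of the class code.
        self.models = [
            self.make_classifier().fit(X, [code[b] for code in codes])
            for b in range(self.bits)
        ]
        return self

    def predict(self, x):
        index = decode([m.predict(x) for m in self.models])
        # Codes with no assigned class (possible when n is not a power of two)
        # are clamped to the last class; this tie-break is an assumption.
        return self.classes[min(index, len(self.classes) - 1)]


# Toy 2-D features for four hypothetical subclasses A-D (not real enzyme data).
X = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.9], [0.1, 1.0],
     [0.9, 0.0], [1.0, 0.1], [0.9, 1.0], [1.0, 0.9]]
y = ["A", "A", "B", "B", "C", "C", "D", "D"]
clf = BinaryCodeMulticlass(TinyLogisticRegression).fit(X, y)
print(len(clf.models))            # 2
print(clf.predict([0.95, 0.05]))  # C
```

With four subclasses only two binary classifiers are trained, compared with four for one-vs-rest. The trade-off is that a single wrong bit sends a sample to a different subclass, because this code has no redundancy for error correction.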

Author information

Corresponding author

Correspondence to Amitav Saran.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saran, A., Ghosh, P.S., Das, U., Kalvinathan, T.C. (2024). Experiment to Find Out Suitable Machine Learning Algorithm for Enzyme Subclass Classification. In: Sharma, D.K., Peng, SL., Sharma, R., Jeon, G. (eds) Micro-Electronics and Telecommunication Engineering. ICMETE 2023. Lecture Notes in Networks and Systems, vol 894. Springer, Singapore. https://doi.org/10.1007/978-981-99-9562-2_21

  • DOI: https://doi.org/10.1007/978-981-99-9562-2_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9561-5

  • Online ISBN: 978-981-99-9562-2

  • eBook Packages: Engineering, Engineering (R0)