Abstract
Proteins play a major role in determining many characteristics and functions of living organisms. Predicting protein classes and subclasses is a prominent research topic in bioinformatics, and machine learning methods are widely used for both classification and subclassification of proteins. The problem addressed here is to assign each protein to the subclass it belongs to and to identify a machine learning method suited to subclass classification. The objective is to compare the performance of three existing machine learning methods, logistic regression, support vector machine (SVM), and random forest, for protein subclassification. The methods are implemented, and their results are compared while varying the number of samples per subclass and the number of subclasses. Logistic regression and SVM are used as binary classifiers to predict multiple classes, with \(\log_2(n)\) classifiers for \(n\) class labels. Random forest and SVM give almost the same accuracy on smaller data sets, but as the data size increases, random forest performs better than SVM.
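The abstract does not say how \(\log_2(n)\) binary classifiers are combined into a multiclass prediction. One plausible reading is a binary code over the class labels: each class index is written in binary, and one binary classifier is trained per bit. The sketch below illustrates only that encoding and decoding step, under that assumption; the function names are hypothetical and no actual classifier is trained.

```python
def num_binary_classifiers(n_classes):
    """Bits needed to give each class a distinct code: ceil(log2(n_classes))."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, without float rounding.
    return (n_classes - 1).bit_length()


def encode_label(label, n_bits):
    """Binary code of a class index, least significant bit first."""
    return [(label >> i) & 1 for i in range(n_bits)]


def bit_targets(labels, n_bits):
    """One 0/1 target vector per binary classifier (classifier i learns bit i)."""
    return [[(label >> i) & 1 for label in labels] for i in range(n_bits)]


def decode_bits(bits):
    """Combine the per-bit predictions of the binary classifiers into a class index."""
    return sum(bit << i for i, bit in enumerate(bits))


# Hypothetical example with 6 subclasses, indexed 0..5.
n_bits = num_binary_classifiers(6)
code = encode_label(5, n_bits)
targets = bit_targets([0, 3, 5], n_bits)
```

When \(n\) is not a power of two, some bit patterns correspond to no class (here, codes 6 and 7), so a scheme of this kind needs a rule for handling such predictions.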
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Saran, A., Ghosh, P.S., Das, U., Kalvinathan, T.C. (2024). Experiment to Find Out Suitable Machine Learning Algorithm for Enzyme Subclass Classification. In: Sharma, D.K., Peng, SL., Sharma, R., Jeon, G. (eds) Micro-Electronics and Telecommunication Engineering. ICMETE 2023. Lecture Notes in Networks and Systems, vol 894. Springer, Singapore. https://doi.org/10.1007/978-981-99-9562-2_21
DOI: https://doi.org/10.1007/978-981-99-9562-2_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-9561-5
Online ISBN: 978-981-99-9562-2
eBook Packages: Engineering, Engineering (R0)