Abstract
Interrelated, complex protein sequence databases are growing day by day with rapid advances in technology. Sophisticated computational techniques are required to extract information from these large volumes of data, so that the refined information can be readily deployed for the benefit of mankind. Human protein function prediction (HPFP) is a research area whose results support drug discovery, disease detection, crop hybridization, etc. Numerous approaches to HPFP exist because of the breadth and versatility of this domain. Decision tree (DT) based white-box machine learning (ML) approaches offer computational techniques well suited to extracting information in this area. This study uses a decision tree based machine learning approach together with sequence derived features (SDFs) extracted from human protein sequences to predict protein function. The experiment was performed by manually extracting human protein classes and sequences from the Human Protein Reference Database (HPRD) [1]. SDFs were then extracted from the sequences using the proposed HP-SDFE server together with existing web servers, and different DT based classifiers (using boosting, winnowing, pruning, etc.) were used for HPF prediction. The efficacy of the different DT classifiers is examined and compared with the existing benchmark. The importance of input configurations together with enhanced SDFs is thoroughly examined, which raises the individual molecular class prediction accuracy to 97%. The proposed methodology is also applicable to other similar research areas.
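To make the idea of sequence derived features concrete, the following is a minimal sketch of computing a few simple features directly from an amino acid sequence. It is not the feature set of the HP-SDFE server, which the abstract does not specify. The function name `sequence_derived_features`, the feature names, and the residue groupings are assumptions chosen for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative residue groupings (an assumption, not taken from the paper).
HYDROPHOBIC = set("AVILMFWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def sequence_derived_features(sequence: str) -> dict:
    """Return a small dictionary of features computed from a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    counts = Counter(seq)  # missing residues count as 0
    n = len(seq)

    features = {"length": n}
    # Amino acid composition: fraction of each standard residue.
    for aa in AMINO_ACIDS:
        features[f"frac_{aa}"] = counts[aa] / n
    # Grouped composition features.
    features["frac_hydrophobic"] = sum(counts[a] for a in HYDROPHOBIC) / n
    features["frac_positive"] = sum(counts[a] for a in POSITIVE) / n
    features["frac_negative"] = sum(counts[a] for a in NEGATIVE) / n
    return features


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the output shape.
    for name, value in sequence_derived_features("MKKDE").items():
        print(name, value)
```

Each protein then becomes one row of numeric attributes with its molecular class as the label, which is the tabular form a decision tree learner such as See5/C5.0 expects as input.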
References
Johns Hopkins University and Institute of Bioinformatics: Human Protein Reference Database, 2009 Update (January 2009). http://www.hprd.org/. Accessed 21 January 2014
Information on See5/C5.0. http://rulequest.com/see5-info.html. Accessed 05 January 2016
Jansson, J.: Decision Tree Classification of Products Using C5.0 and Prediction of Workload Using Time Series Analysis, 30 HP Stockholm, Sverige (2016)
Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (2005)
Freitas, A.A., Wieser, D.C., Apweiler, R.: On the importance of comprehensible classification model for protein function prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 7(1), 172–182 (2010)
Shehu, A., Barbará, D., Molloy, K.: A survey of computational methods for protein function prediction. In: Wong, K.-C. (ed.) Big Data Analytics in Genomics, pp. 225–298. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41279-5_7
Quinlan, J.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
King, R.D., Karwath, A., Clare, A., Dehaspe, L.: Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 17(4), 283–293 (2000)
Jensen, L.J., et al.: Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol. 319(5), 1257–1265 (2002)
Friedberg, I.: Automated protein function prediction-the genomic challenge. Brief. Bioinform. 7(3), 225–242 (2006)
Singh, M., Sandhu, P., Singh, H.: Decision tree classifier for human protein function prediction. In: International Conference on Advanced Computing and Communications, National Institute of Technology, Surathkal, Karnataka, India (2006)
Clare, A., Karwath, A., Ougham, H., King, R.D.: Functional bioinformatics for Arabidopsis thaliana. Bioinformatics (Data and text mining) 22(9), 1130–1136 (2006)
Singh, M., Singh, P., Wadhwa, P.K.: Human protein function prediction using decision tree induction. Int. J. Comput. Sci. Netw. Secur. 7(4), 92–98 (2007)
Singh, M., Singh, G.: Cluster analysis technique based on bipartite graph for human protein class prediction. Int. J. Comput. Appl. 20(3), 22–27 (2011). (0975–8887)
Yang, A., Li, R., Zhu, W., Yue, G.: A novel method for protein function prediction based on sequence numerical features. MATCH Commun. Math. Comput. Chem. 67, 833–843 (2012). (ISSN: 0340-6253)
Singh, M., Singh, D.G., Kahlon, D.K.S.: Machine learning classifiers for human protein function prediction. Int. J. Comput. Sci. Telecommun. 3(10), 21–25 (2012)
Ofer, D., Linial, M.: ProFET: feature engineering captures high-level protein functions. Bioinformatics 31(21), 3429–3436 (2015)
Singh, R., Singh, R., Kaur, D.P.: Improved protein function classification using support vector machine. Int. J. Comput. Sci. Inf. Technol. 6(2), 964–968 (2015)
Singh, A., Sharma, S., Singh, R., Singh, G., Kaur, A.: Quality of service enhanced framework for disease detection and drug discovery. Int. J. Comput. Sci. Eng. 6(9), 130–136 (2018)
PROFEAT-Server: PROFEAT 2015 Home. http://bidd2.nus.edu.sg/cgi-bin/prof2015/prof_home.cgi. Accessed 15 June 2017
PSORT-WWW-Server: PSORT WWW Server. http://psort.hgc.jp/. Accessed 12 July 2017
TMHMM-Server: TMHMM Server, v. 2.0. http://www.cbs.dtu.dk/services/TMHMM/. Accessed 10 September 2017
NetNGlyc-Server: NetNGlyc 1.0 Server. http://www.cbs.dtu.dk/services/NetNGlyc/. Accessed 03 December 2017
Singh, M.: Machine learning classifiers for human protein function prediction. Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab, India (2013)
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sharma, S., Singh, G., Singh, R. (2019). Human Protein Function Prediction Enhancement Using Decision Tree Based Machine Learning Approach. In: Gani, A., Das, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-15-1384-8_23
DOI: https://doi.org/10.1007/978-981-15-1384-8_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1383-1
Online ISBN: 978-981-15-1384-8
eBook Packages: Computer Science, Computer Science (R0)