Human Protein Function Prediction Enhancement Using Decision Tree Based Machine Learning Approach

  • Conference paper
Information, Communication and Computing Technology (ICICCT 2019)

Abstract

Protein sequence databases are complex, interrelated, and growing daily as technology advances. Sophisticated computational techniques are required to extract data from these large repositories so that the refined information can be readily deployed for the benefit of mankind. Human protein function prediction (HPFP) is a relevant research area because predicting protein function contributes to drug discovery, disease detection, crop hybridization, etc. Numerous approaches to HPFP exist because of the wide and versatile nature of this domain. Decision tree (DT) based white-box machine learning (ML) approaches offer computational techniques for extracting information in this important research area. This study combines a decision tree based machine learning approach with the extraction of sequence derived features (SDFs) from human protein sequences in order to predict protein function. The experiment was performed by manually extracting the human protein classes and sequences from HPRD (Human Protein Reference Database) [1]. The SDFs were then extracted from the sequences with the help of the proposed HP-SDFE server as well as other web servers, and different DT based classifiers using boosting, winnowing, pruning, etc. were used for HPF prediction. The efficacies of the different DT classifiers are examined and compared with the existing benchmark. The importance of input configurations together with enhanced SDFs has been thoroughly examined, which raises the individual molecular class prediction accuracy to 97%. The proposed methodology is also applicable to other similar research areas.
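The two stages named in the abstract (deriving features from each sequence, then inducing a decision tree over those features) can be illustrated with a minimal sketch. It is not the HP-SDFE server or any of the classifiers evaluated in the paper: the feature set (length, amino acid fractions, a simplified hydrophobic fraction, a crude net charge), the helper names `sequence_features` and `best_split`, and the four toy sequences with their `class_a`/`class_b` labels are all assumptions made for illustration. The split search shows only the first step of tree induction, choosing one threshold by information gain.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # simplified grouping, an assumption of this sketch


def sequence_features(seq):
    """Return a dict of simple sequence-derived features for one protein."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    feats = {"length": float(n)}
    for aa in AMINO_ACIDS:
        feats["frac_" + aa] = counts[aa] / n
    feats["frac_hydrophobic"] = sum(counts[a] for a in HYDROPHOBIC) / n
    # Crude net charge: K and R count +1, D and E count -1 (ignores pH and His).
    feats["net_charge"] = float(counts["K"] + counts["R"] - counts["D"] - counts["E"])
    return feats


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, gain) for the split with highest information gain."""
    base = entropy(labels)
    best = (None, None, 0.0)
    for name in rows[0]:
        values = sorted({r[name] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[name] <= thr]
            right = [y for r, y in zip(rows, labels) if r[name] > thr]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if gain > best[2]:
                best = (name, thr, gain)
    return best


# Toy data: invented sequences and labels, NOT taken from HPRD.
toy = [
    ("MKKLLRRKKA", "class_a"),
    ("MRRKKLAKRK", "class_a"),
    ("MDEEDLLDEE", "class_b"),
    ("MEDDEELADE", "class_b"),
]
rows = [sequence_features(s) for s, _ in toy]
labels = [y for _, y in toy]
feature, threshold, gain = best_split(rows, labels)
print(feature, threshold, gain)
```

A full tree would apply `best_split` recursively to each side of the chosen threshold. Options such as boosting, winnowing, and pruning, which the abstract mentions, are layered on top of this basic induction step and are not shown here.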

References

  1. Johns-Hopkins-University and Institute-of-Bioinformatics, Human Protein Reference Database-2009 Update, January (2009). http://www.hprd.org/. Accessed 21 January, 2014

  2. Information on See5/C5.0. http://rulequest.com/see5-info.html. Accessed 05 January 2016

  3. Jansson, J.: Decision Tree Classification of Products Using C5.0 and Prediction of Workload Using Time Series Analysis, 30 HP Stockholm, Sverige (2016)

  4. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (2005)

  5. Freitas, A.A., Wieser, D.C., Apweiler, R.: On the importance of comprehensible classification models for protein function prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 7(1), 172–182 (2010)

  6. Shehu, A., Barbará, D., Molloy, K.: A survey of computational methods for protein function prediction. In: Wong, K.-C. (ed.) Big Data Analytics in Genomics, pp. 225–298. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-41279-5_7

  7. Quinlan, J.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

  8. King, R.D., Karwath, A., Clare, A., Dehaspe, L.: Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast 17(4), 283–293 (2000)

  9. Jensen, L.J., et al.: Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol. 319(5), 1257–1265 (2002)

  10. Friedberg, I.: Automated protein function prediction-the genomic challenge. Brief. Bioinform. 7(3), 225–242 (2006)

  11. Singh, M., Sandhu, P., Singh, H.: Decision tree classifier for human protein function prediction. In: International Conference on Advanced Computing and Communications, National Institute of Technology, Surathkal, Karnataka, India (2006)

  12. Clare, A., Karwath, A., Ougham, H., King, R.D.: Functional bioinformatics for Arabidopsis thaliana. Bioinformatics (Data and text mining) 22(9), 1130–1136 (2006)

  13. Singh, M., Singh, P., Wadhwa, P.K.: Human protein function prediction using decision tree induction. Int. J. Comput. Sci. Netw. Secur. 7(4), 92–98 (2007)

  14. Singh, M., Singh, G.: Cluster analysis technique based on bipartite graph for human protein class prediction. Int. J. Comput. Appl. 20(3), 22–27 (2011). (0975–8887)

  15. Yang, A., Li, R., Zhu, W., Yue, G.: A novel method for protein function prediction based on sequence numerical features. MATCH Commun. Math. Comput. Chem. 67, 833–843 (2012). (ISSN: 0340-6253)

  16. Singh, M., Singh, D.G., Kahlon, D.K.S.: Machine learning classifiers for human protein function prediction. Int. J. Comput. Sci. Telecommun. 3(10), 21–25 (2012)

  17. Ofer, D., Linial, M.: ProFET: feature engineering captures high-level protein functions. Bioinformatics 31(21), 3429–3436 (2015)

  18. Singh, R., Singh, R., Kaur, D.P.: Improved protein function classification using support vector machine. Int. J. Comput. Sci. Inf. Technol. 6(2), 964–968 (2015)

  19. Singh, A., Sharma, S., Singh, R., Singh, G., Kaur, A.: Quality of service enhanced framework for disease detection and drug discovery. Int. J. Comput. Sci. Eng. 6(9), 130–136 (2018)

  20. PROFEAT-Server: PROFEAT 2015 HOME. http://bidd2.nus.edu.sg/cgi-bin/prof2015/prof_home.cgi. Accessed 15 June 2017

  21. PSORT-WWW-Server: PSORT WWW Server. http://psort.hgc.jp/. Accessed 12 July 2017

  22. TMHMM-Server: TMHMM Server, v. 2.0. http://www.cbs.dtu.dk/services/TMHMM/. Accessed 10 September 2017

  23. NetNGlyc-Server: NetNGlyc 1.0 Server. http://www.cbs.dtu.dk/services/NetNGlyc/. Accessed 03 December 2017

  24. Singh, M.: Machine learning classifiers for human protein function prediction. Department of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab, India (2013)

Author information

Corresponding author

Correspondence to Sunny Sharma.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sharma, S., Singh, G., Singh, R. (2019). Human Protein Function Prediction Enhancement Using Decision Tree Based Machine Learning Approach. In: Gani, A., Das, P., Kharb, L., Chahal, D. (eds) Information, Communication and Computing Technology. ICICCT 2019. Communications in Computer and Information Science, vol 1025. Springer, Singapore. https://doi.org/10.1007/978-981-15-1384-8_23

  • DOI: https://doi.org/10.1007/978-981-15-1384-8_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1383-1

  • Online ISBN: 978-981-15-1384-8

  • eBook Packages: Computer Science, Computer Science (R0)
