Abstract
Knowing the roles of unknown proteins is vital to understanding the cellular processes of a parasite and the causes of disease progression, but deciphering the function of an unknown protein is highly challenging, and few methods are available for predicting it. We used a hyperparameter-tuned random forest (RF) classifier, a promising method for reliable function prediction of unknown proteins. The method was tested on a set of unknown cytosolic proteins of Leishmania donovani identified in our previous mass spectrometry-based proteomics study. L. donovani is a protozoan parasite that causes visceral leishmaniasis (VL), a fatal disease in humans worldwide. The results obtained with the RF classifier indicate that this method predicts the function of unknown proteins with high precision and significance. We employed this model to predict, at a reported level of 98%, the roles of the unknown proteins in the cytoplasmic protein pool of L. donovani. This study reports the functions of these unknown L. donovani proteins, which is important information about this parasite. These proteins could be promising targets for new drug discovery and vaccine candidate development. Further characterization and in-depth study of these unknown proteins may open the way to successful therapy for fatal VL.
Acknowledgments
We are grateful to the National Institute of Technology Raipur (CG), India, for providing the facilities, space, and opportunity for this work. This article does not contain any studies with human participants performed by any of the authors.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Singh, P., Kumar, A. Deciphering the function of unknown Leishmania donovani cytosolic proteins using hyperparameter-tuned random forest. Netw Model Anal Health Inform Bioinforma 9, 2 (2020). https://doi.org/10.1007/s13721-019-0208-2
DOI: https://doi.org/10.1007/s13721-019-0208-2