Abstract
Pluripotency is a unique property of stem cells that allows them to differentiate into all types of adult cells or to maintain self-renewal. PluriPred predicts, from the primary protein sequence, whether a protein is involved in pluripotency, using manually curated pluripotent proteins as the training dataset. Our study used machine learning techniques (MLTs), namely Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF), together with the sequence alignment technique BLAST. Our proposed best model, a combination of SVM and PSI-BLAST, achieved a sensitivity of 77.40%, a specificity of 79.72%, an accuracy of 79.2% and an area under the ROC curve (AUC) of 0.82 in 5-fold cross-validation. PluriPred also reports the confidence of each prediction, derived from the SVM score distribution of the training dataset, together with the p-value from BLAST. We validated the proposed model against other existing high-throughput studies using blind/independent datasets. With PluriPred, we predicted with high confidence 233 novel core and 323 novel extended core pluripotent proteins in the mouse proteome, and 167 novel core and 385 extended core pluripotent proteins in the human proteome. The PluriPred web application is available at bicresources.jcbose.ac.in/ssaha4/pluripred/. Many pluripotent genes/proteins take part in protein-protein interaction networks associated with stem cell biology, cancer and developmental biology, and we believe that PluriPred will aid research in these areas.
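The abstract reports threshold metrics (sensitivity, specificity, accuracy) and an AUC for the combined SVM/PSI-BLAST model. The Python sketch below shows one common way to compute such figures from per-protein scores. It is not the PluriPred implementation. The function names, the 0.0 decision threshold, the `hybrid_score` rule (shifting the SVM score by a hypothetical 0.5 when the best significant PSI-BLAST hit is a training positive or negative) and the percentile-style `score_percentile` confidence are all illustrative assumptions. In 5-fold cross-validation, each protein would typically be scored by a model trained on the other four folds, and the metrics would then be computed on the pooled scores.

```python
"""Hypothetical sketch: scoring and evaluating a hybrid SVM + PSI-BLAST classifier.

Nothing here is taken from PluriPred itself; the names, the 0.0 threshold,
the +/-0.5 BLAST adjustment and the percentile confidence are assumptions.
"""
from typing import Optional, Sequence


def hybrid_score(svm_score: float, blast_label: Optional[int], bonus: float = 0.5) -> float:
    """Shift an SVM score using the label of the best PSI-BLAST hit (assumed rule).

    blast_label is 1 if the best significant hit is a training positive,
    0 if it is a training negative, and None if there is no significant hit.
    """
    if blast_label == 1:
        return svm_score + bonus
    if blast_label == 0:
        return svm_score - bonus
    return svm_score


def threshold_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.0) -> dict:
    """Sensitivity, specificity and accuracy when score >= threshold means positive."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1  # positive protein predicted negative
        elif predicted == 1:
            fp += 1  # negative protein predicted positive
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives rejected
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def score_percentile(query_score: float, training_scores: Sequence[float]) -> float:
    """Hypothetical confidence: share of training scores at or below the query score."""
    return sum(1 for s in training_scores if s <= query_score) / len(training_scores)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]  # toy data, not from the paper
    scores = [0.9, 0.4, -0.2, 0.1, -0.5, -0.8]
    print(threshold_metrics(labels, scores))
    print(roc_auc(labels, scores))
```

On the toy data, one positive falls below the threshold and one negative lies above it, so sensitivity, specificity and accuracy all equal 2/3. Eight of the nine positive/negative pairs are correctly ordered, giving an AUC of 8/9. Computing the AUC from ranks rather than from a single cutoff is why it can differ from the threshold-based accuracy.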
Acknowledgements
We thank Tanmoy Jana, Debasree Sarkar, Sumit Mukherjee, and Souvik Sinha for their valuable comments during the development of the server. We also give special thanks to the Bioinformatics Centre, Bose Institute, for providing the computational facilities for this work.
Additional information
[Mandal SD and Saha S 2016 PluriPred: A Web server for predicting proteins involved in pluripotent network. J. Biosci.]
Supplementary materials pertaining to this article are available on the Journal of Biosciences Website.
Electronic supplementary material
ESM 1 (PDF 731 kb)
Cite this article
Mandal, S.D., Saha, S. PluriPred: A Web server for predicting proteins involved in pluripotent network. J Biosci 41, 743–750 (2016). https://doi.org/10.1007/s12038-016-9649-2