Abstract
Predicting the subcellular localization of the proteome is one of the major goals of large-scale genome and proteome sequencing projects, because it helps define gene function and can be carried out with computational modeling techniques. Several methods have previously been developed for this purpose using multi-label classification systems, and they report a high level of accuracy. However, when we validated them on our blind dataset of plant vacuole proteins, they performed poorly, with accuracy ranging from ~1.3% to 48.5%. These results show that the existing methods are not very accurate for plant vacuole protein prediction and emphasize the need for a more accurate and reliable algorithm. In this study, we developed various composition-based and position-specific scoring matrix (PSSM)-based models and achieved higher accuracy than the previously developed methods. Our best model achieved ~63% accuracy on the blind dataset, which is far better than the currently available tools. Furthermore, we have implemented our best models as free GUI-based software called ‘VacPred’, which is compatible with both Linux and Windows platforms. The software is freely available for download at www.deepaklab.com/vacpred.
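As an illustration of what a composition-based feature can look like, the sketch below computes the amino acid composition of a protein sequence as a 20-dimensional percentage vector. This is a minimal example under stated assumptions: the abstract does not specify which composition features VacPred uses, and the function name `amino_acid_composition` and the toy sequence are hypothetical, not taken from the software.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every feature
# vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard amino acid in a sequence.

    Non-standard characters (e.g. X, B, Z) are ignored. This is a generic
    composition feature, assumed here for illustration only.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real vacuole protein.
features = amino_acid_composition("MKTAYIAKQR")
```

A vector of this kind can be passed to any standard classifier; which classifier and which feature combination perform best is an empirical question that the full paper addresses.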
Acknowledgements
The authors are thankful to the DBT-BTISNET for providing the bioinformatics facility at the School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana.
Additional information
Corresponding editor: Sreenivas Chavali
Cite this article
Yadav, A.K., Singla, D. VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques. J Biosci 45, 106 (2020). https://doi.org/10.1007/s12038-020-00076-9
DOI: https://doi.org/10.1007/s12038-020-00076-9