VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques

Journal of Biosciences

Abstract

Predicting the subcellular localization of proteins is one of the major goals of large-scale genome and proteome sequencing projects, because it helps define gene function through computational modeling. Several methods have previously been developed for this purpose using multi-label classification systems, and they report a high level of accuracy. However, when we validated them on our blind dataset of plant vacuole proteins, they performed poorly, with accuracy ranging from ~1.3% to 48.5%. These results show that the previously developed methods are not very accurate for plant vacuole protein prediction and emphasize the need for a more accurate and reliable algorithm. In this study, we developed various composition-based and PSSM-based models that achieve higher accuracy than the previously developed methods. Our best model achieved ~63% accuracy on the blind dataset, compared with at most 48.5% for the currently available tools. Furthermore, we have implemented our best models as free GUI-based software called ‘VacPred’, which is compatible with both Linux and Windows platforms. The software is freely available for download at www.deepaklab.com/vacpred.
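As an illustration of what a composition-based feature can look like, the following minimal Python sketch computes amino acid composition, the percentage of each of the 20 standard residues in a sequence. This is one common sequence descriptor and is shown only as an assumption about the general approach; the abstract does not specify which composition features VacPred uses, and the function name `amino_acid_composition` and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every protein
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the percentage of each standard residue in `sequence`.

    Hypothetical helper for illustration; not taken from VacPred.
    Non-standard characters (e.g. X, B, Z) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Made-up toy sequence, used only to show the call.
    features = amino_acid_composition("MKTAYIAKQR")
    for aa, pct in zip(AMINO_ACIDS, features):
        if pct > 0:
            print(f"{aa}: {pct:.1f}%")
```

A fixed-length vector of this kind can be passed to a standard classifier; PSSM-based features would instead be derived from a position-specific scoring matrix built from an alignment search, which is not sketched here.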

Acknowledgements

The authors are thankful to the DBT-BTISNET for providing the bioinformatics facility at the School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana.

Author information

Corresponding author

Correspondence to Deepak Singla.

Additional information

Corresponding editor: Sreenivas Chavali

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 35 kb)

Cite this article

Yadav, A.K., Singla, D. VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques. J Biosci 45, 106 (2020). https://doi.org/10.1007/s12038-020-00076-9

  • DOI: https://doi.org/10.1007/s12038-020-00076-9
