Prediction of Bacterial Immunogenicity by Machine Learning Methods

  • Protocol
Computational Vaccine Design

Part of the book series: Methods in Molecular Biology (MIMB, volume 2673)

Abstract

Prediction of bacterial immunogens is a prerequisite for the process of vaccine development through reverse vaccinology. The application of in silico methods allows significant reduction in time and cost for the discovery of potential vaccine candidates among proteins of a bacterial species. The steps in the prediction algorithm include collection of protein sequence datasets of known bacterial immunogens and non-immunogens, data preprocessing to transform the protein sequences into numerical matrices suitable for use as training and test sets for various machine learning methods, and derivation of predictive models. The performance of the derived models is evaluated by means of classification metrics.
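
The preprocessing step described above, turning variable-length protein sequences into a fixed-width numerical matrix, can be sketched as follows. The abstract does not state which encoding the protocol uses, so this sketch assumes a simple amino acid composition vector (20 frequencies per protein) as a stand-in. The function names, the toy sequences, and the labels are hypothetical and serve only as illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the matrix columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Return the relative frequency of each standard amino acid.

    Assumption: amino acid composition is used here only as a simple
    illustrative encoding; it is not claimed to be the chapter's descriptor set.
    """
    seq = sequence.upper()
    # Count only standard residues; ambiguous codes such as X or B are skipped.
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def build_matrix(sequences):
    """Encode a list of sequences as a list of equal-length numeric rows."""
    return [composition_vector(seq) for seq in sequences]


# Hypothetical toy data: these are not real immunogens or non-immunogens.
immunogens = ["MKKLLA", "MKTAYIAK"]
non_immunogens = ["GGSGGS"]

X = build_matrix(immunogens + non_immunogens)
y = [1] * len(immunogens) + [0] * len(non_immunogens)  # 1 = immunogen, 0 = non-immunogen
```

Every row of `X` has the same length regardless of protein length, which is what lets the matrix serve as input to standard machine learning methods. In practice the rows would then be split into training and test sets.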

In this chapter, we present a protocol for predicting bacterial immunogenicity by applying machine learning methods. The protocol describes the process of model development from data collection and manipulation to training and validation of the derived models.
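
As an illustration of the validation step, the sketch below computes a few widely used binary classification metrics from true and predicted labels. The abstract does not list the specific metrics used in the protocol, so the choice here (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) is an assumption, and the function name and example labels are hypothetical.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute basic binary classification metrics (1 = immunogen, 0 = non-immunogen).

    Assumption: this metric set is illustrative and may differ from the
    metrics reported in the chapter.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty classes so the ratios never divide by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical labels for six test-set proteins.
example_true = [1, 1, 1, 0, 0, 0]
example_pred = [1, 1, 0, 0, 0, 1]
example_metrics = classification_metrics(example_true, example_pred)
```

Reporting sensitivity and specificity alongside accuracy helps when the immunogen and non-immunogen classes differ in size, since accuracy alone can hide poor performance on the smaller class.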

Acknowledgment

This work was supported by the Science and Education for Smart Growth Operational Program and co-financed by the European Union through the European Structural and Investment funds (Grant No BG05M2OP001-1.001-0003).

Author information

Corresponding author

Correspondence to Ivan Dimitrov.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Dimitrov, I., Doytchinova, I. (2023). Prediction of Bacterial Immunogenicity by Machine Learning Methods. In: Reche, P.A. (eds) Computational Vaccine Design. Methods in Molecular Biology, vol 2673. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3239-0_20

  • DOI: https://doi.org/10.1007/978-1-0716-3239-0_20

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3238-3

  • Online ISBN: 978-1-0716-3239-0

  • eBook Packages: Springer Protocols
