Abstract
Prediction of bacterial immunogens is a prerequisite for vaccine development through reverse vaccinology. In silico methods significantly reduce the time and cost of discovering potential vaccine candidates among the proteins of a bacterial species. The prediction algorithm comprises three steps: collection of protein sequence datasets of known bacterial immunogens and non-immunogens; data preprocessing, which transforms the protein sequences into numerical matrices suitable for use as training and test sets for various machine learning methods; and derivation of predictive models. The performance of the derived models is evaluated by means of classification metrics.
In this chapter, we present a protocol for predicting bacterial immunogenicity by applying machine learning methods. The protocol describes the process of model development from data collection and manipulation to training and validation of the derived models.
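The workflow outlined in the abstract (sequences to numerical matrix, model derivation, evaluation by classification metrics) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the protocol: the amino acid composition encoding, the k-nearest-neighbour classifier, the function names, and the toy sequences, which are made-up strings and not real immunogens or non-immunogens.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Encode a sequence of any length as 20 amino acid frequencies.

    Assumption: a simple stand-in for the preprocessing step; the
    protocol itself may use a different descriptor scheme.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])  # labels are 0 or 1
    return 1 if votes * 2 > k else 0


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy data: label 1 = "immunogen", 0 = "non-immunogen".
train = [("MKKTAIAIAVALAGFATVAQA", 1), ("MKATKLVLGAVILGSTLLAG", 1),
         ("MSDEEDRRQQHHEEDDKKRR", 0), ("MEEDDRKQHEEDRRKKDDEE", 0)]
test = [("MKKLAVALAGLATVAGAAQA", 1), ("MDDEERRKKQQHHDDEERKK", 0)]

train_X = [composition(s) for s, _ in train]  # numerical training matrix
train_y = [y for _, y in train]
test_y = [y for _, y in test]
pred_y = [knn_predict(train_X, train_y, composition(s)) for s, _ in test]

print(classification_metrics(test_y, pred_y))
```

In practice the toy lists would be replaced by curated datasets of known immunogens and non-immunogens, and the hand-written classifier by an established machine learning library; the three-stage structure is the only part meant to carry over.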
Acknowledgment
This work was supported by the Science and Education for Smart Growth Operational Program and co-financed by the European Union through the European Structural and Investment funds (Grant No BG05M2OP001-1.001-0003).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Dimitrov, I., Doytchinova, I. (2023). Prediction of Bacterial Immunogenicity by Machine Learning Methods. In: Reche, P.A. (eds) Computational Vaccine Design. Methods in Molecular Biology, vol 2673. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3239-0_20
DOI: https://doi.org/10.1007/978-1-0716-3239-0_20
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3238-3
Online ISBN: 978-1-0716-3239-0
eBook Packages: Springer Protocols