Building Classifier Ensembles for B-Cell Epitope Prediction

  • Yasser EL-Manzalawy
  • Vasant Honavar
Part of the Methods in Molecular Biology book series (MIMB, volume 1184)


Identification of B-cell epitopes in target antigens is a critical step in epitope-driven vaccine design, immunodiagnostic tests, and antibody production. B-cell epitopes can be linear, i.e., a contiguous amino acid sequence fragment of an antigen, or conformational, i.e., amino acids that are often not contiguous in the primary sequence but lie in close proximity within the folded 3D antigen structure. Numerous computational methods have been proposed for predicting both types of B-cell epitopes. However, the development of tools for reliably predicting B-cell epitopes remains a major challenge in immunoinformatics.

Classifier ensembles are a promising approach for combining a set of classifiers such that the predictive performance of the resulting ensemble is better than that of the best individual classifier. In this chapter, we show how to build a classifier ensemble for improved prediction of linear B-cell epitopes. The method can be easily adapted to build classifier ensembles for predicting conformational epitopes.
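The idea that an ensemble can outperform its best member can be illustrated with a minimal sketch. The example below uses simple majority voting, which is only one possible combination rule and is not necessarily the one used in the chapter. The labels and the three sets of base-classifier predictions are invented toy data (1 = epitope, 0 = non-epitope), and the function names `majority_vote` and `accuracy` are hypothetical helpers.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label lists from several classifiers by plurality vote.

    `predictions` holds one list of labels per base classifier, all of
    the same length. With an odd number of binary classifiers there are
    no ties.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, truth):
    """Fraction of positions where the predicted label matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy data, invented for illustration only.
truth = [1, 0, 1, 1, 0, 0]
clf_a = [0, 1, 1, 1, 0, 0]  # wrong on peptides 0 and 1
clf_b = [1, 0, 0, 0, 0, 0]  # wrong on peptides 2 and 3
clf_c = [1, 0, 1, 1, 1, 1]  # wrong on peptides 4 and 5

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, preds in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("ensemble", ensemble)]:
    print(name, round(accuracy(preds, truth), 3))
```

In this toy case each base classifier is correct on four of six peptides, while the vote is correct on all six, because the three classifiers err on different peptides. When base classifiers make the same mistakes, voting offers no such gain.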

Key words

B-cell epitope prediction; Classifiers ensemble; Random forest; Epitope prediction toolkit



This work was supported in part by a grant from the National Institutes of Health (NIH GM066387) and by the Edward Frymoyer Chair of Information Sciences and Technology at Pennsylvania State University.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
  2. College of Information Sciences and Technology, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, USA
