Novel Machine Learning Methods for MHC Class I Binding Prediction

  • Christian Widmer
  • Nora C. Toussaint
  • Yasemin Altun
  • Oliver Kohlbacher
  • Gunnar Rätsch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6282)


MHC class I molecules are key players in the human immune system. They bind small peptides derived from intracellular proteins and present them on the cell surface for surveillance by the immune system. Prediction of such MHC class I binding peptides is a vital step in the design of peptide-based vaccines and therefore one of the major problems in computational immunology. Thousands of different types of MHC class I molecules exist, each displaying a distinct binding specificity. The lack of sufficient training data for the majority of these molecules hinders the application of Machine Learning to this problem.

We propose two approaches to improve the predictive power of kernel-based Machine Learning methods for MHC class I binding prediction. First, we modify the Weighted Degree string kernel so that it can incorporate amino acid properties. Second, we propose an enhanced Multitask kernel and an optimization procedure to fine-tune the kernel parameters. The combination of both approaches yields improved performance, which we demonstrate on the IEDB benchmark data set.
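To make the two building blocks concrete, the following is a minimal sketch of the standard Weighted Degree kernel and of a generic product-form multitask kernel. It is not the modified kernel or the enhanced Multitask kernel proposed in the paper, whose details are not given in this abstract. The function names, the `task_similarity` dictionary, and the allele labels in the usage note are assumptions made for illustration only.

```python
from typing import Dict, Tuple


def wd_kernel(x: str, y: str, degree: int = 3) -> float:
    """Standard Weighted Degree kernel for two equal-length strings.

    Counts substrings of length 1..degree that match at the same position
    in both sequences, weighted by beta_d = 2(D - d + 1) / (D(D + 1)).
    """
    if len(x) != len(y):
        raise ValueError("WD kernel requires sequences of equal length")
    length = len(x)
    total = 0.0
    for d in range(1, min(degree, length) + 1):
        # Standard WD weighting: shorter substrings receive larger weights.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(
            1 for pos in range(length - d + 1)
            if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total


def multitask_kernel(
    x: str,
    s: str,
    y: str,
    t: str,
    task_similarity: Dict[Tuple[str, str], float],
    degree: int = 3,
) -> float:
    """Generic product-form multitask kernel (an illustrative assumption).

    Scales the sequence kernel by a similarity weight between task s
    (the MHC molecule of x) and task t (the MHC molecule of y).
    """
    # Look the pair up in either order so the kernel stays symmetric.
    gamma = task_similarity.get((s, t), task_similarity.get((t, s), 0.0))
    return gamma * wd_kernel(x, y, degree)
```

As a usage example, `multitask_kernel("SLYNTVATL", "A", "SLYNTIATL", "B", {("A", "B"): 0.5})` compares two 9-mer peptides measured against two hypothetical molecules labelled `"A"` and `"B"`; the similarity value 0.5 is made up. In a setting like the one the abstract describes, such task-similarity weights would be the kernel parameters to fine-tune.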


Keywords: Machine Learning Method; Multiple Kernel Learning; Weighted Degree; Amino Acid Property; Binding Prediction
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Christian Widmer (1)
  • Nora C. Toussaint (2)
  • Yasemin Altun (3)
  • Oliver Kohlbacher (2)
  • Gunnar Rätsch (1)
  1. Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
  2. Center for Bioinformatics Tübingen, Eberhard-Karls-Universität, Tübingen, Germany
  3. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
