Multiple Instance Learning Allows MHC Class II Epitope Predictions Across Alleles

  • Nico Pfeifer
  • Oliver Kohlbacher
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5251)

Abstract

The human adaptive immune response relies on the recognition of short peptides through proteins of the major histocompatibility complex (MHC). MHC class II molecules are responsible for the recognition of antigens external to a cell. Understanding their specificity is an important step in the design of peptide-based vaccines. The high degree of polymorphism in MHC class II makes the prediction of peptides that bind (and then usually cause an immune response) a challenging task. Because these predictions typically rely on machine learning methods, a sufficient number of data points is required. Owing to the scarcity of data, reliable prediction models are currently available for only about 7% of all known alleles.

We show how to transform the problem of MHC class II binding peptide prediction into a well-studied machine learning problem called multiple instance learning. For alleles with sufficient data, we show how to build a well-performing predictor using standard kernels for multiple instance learning. Furthermore, we introduce a new method for training a classifier for an allele without requiring binding peptide data for that target allele. Instead, we use binding peptide data from other alleles and similarities between the structures of the MHC class II alleles to guide the learning process. This makes it possible, for the first time, to construct predictors for about two thirds of all known MHC class II alleles. The average performance of these predictors on 14 test alleles is 0.71, measured as area under the ROC curve.
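As an illustration of the multiple instance view, a minimal sketch is given below. It rests on assumptions that the abstract does not state: each peptide is treated as a bag whose instances are all contiguous windows of an assumed core length of 9 residues, the instance kernel is a simple Gaussian-style function of the Hamming distance, and bags are compared with a normalized set kernel that sums the instance kernel over all pairs of windows. The function names and the `gamma` parameter are hypothetical and chosen for this example only; this is not the authors' implementation.

```python
import math
from itertools import product

CORE_LENGTH = 9  # assumption: length of the binding core window


def peptide_to_bag(peptide, core_length=CORE_LENGTH):
    """Return all contiguous windows of length core_length as the bag's instances."""
    if len(peptide) < core_length:
        raise ValueError("peptide is shorter than the assumed core length")
    return [peptide[i:i + core_length]
            for i in range(len(peptide) - core_length + 1)]


def instance_kernel(a, b, gamma=0.1):
    """Gaussian-style kernel on the Hamming distance of two equal-length windows."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return math.exp(-gamma * mismatches)


def set_kernel(bag_x, bag_y, gamma=0.1):
    """Sum the instance kernel over every pair of instances from the two bags."""
    return sum(instance_kernel(x, y, gamma) for x, y in product(bag_x, bag_y))


def normalized_set_kernel(bag_x, bag_y, gamma=0.1):
    """Set kernel scaled so that every bag has similarity 1 with itself."""
    norm = math.sqrt(set_kernel(bag_x, bag_x, gamma) * set_kernel(bag_y, bag_y, gamma))
    return set_kernel(bag_x, bag_y, gamma) / norm


if __name__ == "__main__":
    # Two made-up peptide sequences, used only to exercise the functions.
    bag_a = peptide_to_bag("ACDEFGHIKLMNP")
    bag_b = peptide_to_bag("ACDEFGHIKLWWY")
    print(len(bag_a), normalized_set_kernel(bag_a, bag_b))
```

A kernel matrix built this way over all training peptides could be handed to any support vector machine that accepts a precomputed kernel. The label of a bag (binder or non-binder) applies to the peptide as a whole, so the learner never needs to be told which window is the actual binding core.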

Availability: The methods are integrated into the EpiToolKit framework, for which a web server is available at http://www.epitoolkit.org/mhciimulti

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nico Pfeifer (1)
  • Oliver Kohlbacher (1)
  1. Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard Karls University Tübingen, Tübingen, Germany
