Original Paper


Volume 65, Issue 2, pp 97-105

Open Access This content is freely available online to anyone, anywhere at any time.

Consensus classification of human leukocyte antigen class II proteins

  • Indrajit Saha (Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw; Department of Computer Science and Engineering, Jadavpur University)
  • Giovanni Mazzocco (Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw)
  • Dariusz Plewczynski (Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw; corresponding author)


Class II human leukocyte antigens (HLA II) are proteins that take part in the human adaptive immune response: they bind pre-processed, non-self peptides and expose them in the extracellular domain, making them recognizable by CD4+ T lymphocytes. Understanding the HLA–peptide binding interaction is a crucial step in designing peptide-based vaccines, but the high rate of polymorphism in HLA class II molecules makes this a major challenge. HLA II proteins can nevertheless be grouped into supertypes, whose members, although distinct, bind a similar pool of peptides. First, we performed a supertype classification of 27 HLA II proteins using their binding affinities and structure-based linear motifs. A well-known clustering method was applied, and a consensus was then built to find stable groups of supertypes and to show the functional and structural correlation of HLA II proteins. The overlap of binding events was also measured, confirming a high degree of promiscuity in HLA II–peptide interactions. Moreover, a very low rate of locus-specific binding events was observed for the HLA-DP genetic locus, suggesting that these proteins differ in binding selectivity from HLA-DR and HLA-DQ proteins. Second, a predictor based on a support vector machine (SVM) classifier was designed to recognize HLA II-binding peptides. Its performance was estimated on randomly subsampled datasets using precision, recall (sensitivity), specificity, accuracy, F-measure, and area under the ROC curve, in comparison with other supervised classifiers. Leave-one-out cross-validation was also performed to establish the efficiency of the predictor. The HLA II–peptide interaction dataset, HLA II-binding motifs, high-quality amino acid indices, peptide dataset for SVM training, and MATLAB code of the predictor are available at http://sysbio.icm.edu.pl/HLA.
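
For readers unfamiliar with the evaluation measures listed above, the following is a minimal sketch of how they are conventionally computed for a binary binder/non-binder classifier. It is not the authors' MATLAB code: the function names (`confusion_counts`, `binary_metrics`, `roc_auc`), the label encoding (1 = binder, 0 = non-binder), and the toy data are assumptions made purely for illustration, and the standard textbook definitions are used throughout.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = binder, 0 = non-binder)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def _ratio(num: float, den: float) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall (sensitivity), specificity, accuracy, and F-measure."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["tp"], c["fp"], c["tn"], c["fn"]
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # F-measure as the harmonic mean of precision and recall (F1).
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, invented for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

The pairwise formulation of the AUC is quadratic in the number of peptides, which is acceptable for a small illustration; a sort-based computation would be preferable for large datasets.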


MHC; HLA class II; Peptide binding; T cell epitopes; Clustering; Machine learning