The Classification of HLA Supertypes by GRID/CPCA and Hierarchical Clustering Methods

  • Pingping Guan
  • Irini A. Doytchinova
  • Darren R. Flower
Part of the Methods in Molecular Biology™ book series (MIMB, volume 409)


Biological experiments often produce enormous amounts of data, which are commonly analyzed by clustering. Cluster analysis refers to statistical methods that assign data with similar properties to a smaller number of more meaningful groups. Two commonly used techniques are introduced in this chapter: principal component analysis (PCA) and hierarchical clustering. PCA analyzes the variance among the variables and summarizes them as a few uncorrelated, mutually orthogonal linear combinations called principal components (PCs). Hierarchical clustering is carried out by first separating the data into many clusters and then merging similar clusters step by step. Here, we use the classification of human leukocyte antigen (HLA) supertypes as an example to demonstrate the use of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that both methods are also available in other software, such as SIMCA, statistiXL, and R.
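As a rough illustration of the two ideas described above, the following minimal sketch uses only the Python standard library. It is not the GOLPE or Sybyl workflow, and it performs ordinary PCA rather than consensus PCA. The data matrix is a hypothetical toy example (six samples, three descriptors), and the function names `center`, `covariance`, `first_pc`, and `agglomerate` are assumptions made for this example. The first PC is estimated by power iteration, and clusters are merged using average linkage, which is one of several possible linkage rules.

```python
import math

# Hypothetical toy data: 6 samples (rows) x 3 descriptors (columns).
data = [
    [1.0, 2.0, 0.5],
    [1.2, 1.9, 0.4],
    [0.9, 2.1, 0.6],
    [5.0, 6.0, 3.0],
    [5.2, 5.8, 3.1],
    [4.9, 6.1, 2.9],
]

def center(rows):
    """Subtract each column mean so variance is measured about the mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered_rows):
    """Sample covariance matrix of already-centered data."""
    n = len(centered_rows)
    cols = list(zip(*centered_rows))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

def first_pc(cov, iters=200):
    """Power iteration: approximate the leading eigenvector (first PC)."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, a, b):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(euclidean(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Start with one cluster per sample; merge the closest pair repeatedly."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: average_linkage(
            points, clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

centered = center(data)
pc1 = first_pc(covariance(centered))
# Score of each sample on the first PC (projection onto pc1).
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
clusters = agglomerate(data, 2)

print("PC1 loadings:", [round(w, 3) for w in pc1])
print("PC1 scores:  ", [round(s, 3) for s in scores])
print("Clusters (sample indices):", clusters)
```

In this toy case the first three samples and the last three samples fall on opposite sides of zero along the first PC, and the two-cluster solution from the agglomerative procedure recovers the same grouping. With real descriptor data, scaling of the variables and the choice of linkage rule would need to be considered.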

Key Words

HLA; MHC; supertype; principal component analysis; hierarchical clustering; GOLPE


Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  • Pingping Guan (1)
  • Irini A. Doytchinova (2)
  • Darren R. Flower (2)
  1. Computational Biology Group, John Innes Centre, UK
  2. The Jenner Institute, University of Oxford, UK
