Ligand design by a combinatorial approach based on modeling and experiment: application to HLA-DR4
- Evensen, E., Joseph-McCarthy, D., Weiss, G.A. et al. J Comput Aided Mol Des (2007) 21: 395. doi:10.1007/s10822-007-9119-x
Combinatorial synthesis and large-scale screening methods are being used increasingly in drug discovery, particularly for finding novel lead compounds. Although these “random” methods sample larger areas of chemical space than traditional synthetic approaches, only a relatively small percentage of all possible compounds are practically accessible. It is therefore helpful to select regions of chemical space that have a greater likelihood of yielding useful leads. When three-dimensional structural data are available for the target molecule, this can be achieved by applying structure-based computational design methods to focus the combinatorial library. This is advantageous over the standard use of computational methods to design a small number of specific novel ligands, because here computation is employed as part of the combinatorial design process and so is required only to determine a propensity for binding of certain chemical moieties in regions of the target molecule. This paper describes the application of the Multiple Copy Simultaneous Search (MCSS) method, an active site mapping and de novo structure-based design tool, to design a focused combinatorial library for the class II MHC protein HLA-DR4. Methods for synthesizing and screening the computationally designed library are presented, and evidence is provided to show that binding was achieved. Although the structure of the protein-ligand complex could not be determined, experimental results, including cross-exclusion of a known HLA-DR4 peptide ligand (HA) by a compound from the library, and computational model building suggest that at least one of the ligands designed and identified by the methods described binds in a mode similar to that of native peptides.
Keywords: Combinatorial library; MCSS; Fragment docking; Structure-based drug design; Active site map
Combinatorial synthesis combined with efficient screening procedures provides a powerful method for generating large numbers of compounds and selecting those that have high affinity for a particular target [1, 2]. Practical considerations, however, limit these methods to sampling a relatively small portion (approximately 10² to 10⁶ compounds) of the chemical space (on the order of 10²⁰⁰ compounds) of small molecules (molecular weight less than 850 Da). Recent developments in computational chemistry aid in designing libraries that focus the sampling of chemical space. This approach is particularly suited to finding ligands for a receptor of unknown structure [4, 5]. If the structure of the receptor is known, an alternative approach is to use computed functional group binding preferences to focus the combinatorial library on the specific target [6–9]. Diversity estimates of computed binding preferences can aid in this process by analyzing the distribution of possible library constituents in chemical space and designing libraries to target specific coverage. This corresponds to using the computed interaction maps (as one would use the initial iterations of a library) to narrow experimental sampling to the region of chemical space that is likely to yield candidate ligands. In this way, targeted libraries can be designed and higher discovery efficiency may be achieved. Subsequent libraries may be used to fill gaps in the information derived from the first-generation results and to probe the most promising portions of chemical space more finely [10–12].
Other fragment-based computational ligand design methods have been described and applied with promising results. These include the GrowMol method, which builds ligands complementary to a protein binding site de novo from a manually selected “root” or seed point by combinatorially adding and evaluating atoms and fragments [19, 20]. The CombiSMoG method generates combinatorial libraries in silico and evaluates them using a knowledge-based potential. Another recently reported method is SkelGen, which grows compounds de novo in a target active site. It is difficult to compare the performance of these and other methods because they have been applied to very different biological targets. It should be noted that the targets in the GrowMol, CombiSMoG and SkelGen publications have more compact binding sites than the large, open, and relatively promiscuous MHC class II binding site targeted here. MCSS has previously been applied to guide the design of much smaller libraries targeted to picornavirus. Other groups have shown that pharmacophore approaches incorporating active site interaction information derived from protein structures improve combinatorial library design [24–28]. Indeed, MCSS molecular probe maps can be used to derive active site pharmacophore models. Somewhat analogous experimental molecular probe mapping and fragment-based ligand discovery approaches have also been successful [30–36].
This paper describes improvements to the MCSS method and its use, in conjunction with combinatorial synthesis and screening, to identify potential ligands for HLA-DR4, an MHC Class II peptide binding protein. The target is of interest because it is involved in modulating the humoral immune response. Aberrant recognition of self peptide-MHC complexes as foreign can lead to autoimmune pathologies; specifically, individuals possessing the HLA-DR4 allotype are more likely to develop rheumatoid arthritis [37, 38]. In addition, HLA-DR4 is linked to autoimmune hepatitis.
The MCSS method was used to determine favorable binding positions and orientations, with respect to the MHC Class II allotypes HLA-DR3 and HLA-DR4, for 23 small molecule probes representing candidate functionalities for the combinatorial library. The resulting molecular probe maps were examined using clustering, diversity analysis, and computer graphics. The skeleton of the chosen library, which includes a novel branched architecture, was suggested by the molecular probe map for propane. The propane map showed how to link several regions in the binding site that bound proportionately larger numbers of functional groups. Most of these regions corresponded to known peptide side chain binding pockets, but one pocket had not previously been observed to be involved in ligand binding. Once the overall architecture of the library had been selected, the specific combinatorial library was developed based on the MCSS-derived data, information on the peptide binding preferences of HLA-DR4, and the availability of chemical starting materials. Synthesis and screening of the library showed that a number of ligands generated by the combinatorial library bound to HLA-DR4 under stringent assay conditions. Characterization of the molecules identified in the selection assay, followed by building and refining a model structure, suggested how the putative ligand could bind in the peptide binding pocket. Experimental studies demonstrated that peptides compete with the ligand for binding to HLA-DR4, which suggests that the ligand is located in the MHC binding site. Crystals suitable for X-ray analysis, however, could not be obtained to verify these results.
The major histocompatibility complex (MHC) spans approximately 4 × 10⁶ base pairs and contains at least 50 genes. It contains a polymorphic set of genes coding for cell-surface glycoproteins (MHC Class I and Class II) responsible for presenting peptide fragments to T cells, as well as genes for the transporters associated with antigen processing (TAP1 and TAP2), the 70 kDa heat shock protein (HSP-70), and other proteins involved in immunity [41, 42]. Class I MHC proteins, found on the surface of all nucleated cells, present peptides that originate from endogenously expressed proteins and thus provide a sample of the cell interior to CD8+ cytotoxic T lymphocytes (CTLs); cells recognized as presenting non-self proteins (e.g., viral proteins) are killed by the CTLs. Class II MHC proteins are present on the surface of antigen presenting cells (APCs), such as B cells and macrophages, and bind antigenic peptide fragments from endocytosed proteins. Recognition by CD4+ helper T cells of non-self peptides presented by class II MHC proteins triggers a humoral immune response. Aberrant recognition of self peptides can lead to autoimmune disorders [39, 43–49].
The MHC class I and II proteins are encoded by highly variable loci of a limited number of genes in the MHC. Each variant of a gene is termed an allele, and some of the loci have more than 100 alleles. Different alleles give rise to variable immunocompatibility between individuals, and MHC mismatches cause graft rejection. Genetic permutations in the MHC produce a set of polymorphic proteins with different peptide sequence binding preferences. Because each individual has a small number of MHC proteins (six class I and eight class II) compared with the number of peptide sequences to be presented (10⁷–10⁸), each protein must be able to bind a large variety of peptides to give sufficient immunosurveillance [51, 52]. This promiscuous peptide binding implies that many different peptide ligands bind a given MHC protein with a narrow range of affinities.
The MHC proteins, called human leukocyte antigens (HLAs) in humans, are named by the allotype from which they arise. The most studied disease association is that of HLA-DR1 with type 1 diabetes. Further, HLA-DR3 is associated with susceptibility to the autoimmune disease myasthenia gravis. Similarly, HLA-DR4 is associated with autoimmune hepatitis, and HLA-DR2 with multiple sclerosis [38, 49]. The key role of MHC proteins in modulating immune system activity makes them attractive targets for therapeutic agents against autoimmune and inflammatory disorders [72–75].
Other groups have approached the identification of ligands for MHC class II proteins, specifically HLA-DR1, through traditional methods, including peptidomimetic design and structure-based modification of known ligands. Smith and co-workers substituted a peptidomimetic pyrrolinone for a four amino acid segment of a known thirteen amino acid peptide ligand (the viral hemagglutinin (HA) peptide) and reported favorable binding (IC50 = 137 nM, compared with IC50 = 89 nM for the native ligand). Further attempts to modify the four-residue substitution based on molecular modeling results yielded lower affinity ligands (IC50 > 100 μM) [76, 77]. Woulfe et al. reported a seven amino acid peptidomimetic inhibitor that competes favorably with the HA peptide (IC50 = 50 nM, compared with IC50 = 62 nM for HA as reported in their work). Another group designed heptapeptide peptidomimetics based on sequence preferences discovered through phage display and studies of random heptapeptides, selecting the mimetic functionalities with the aid of molecular modeling; they discovered compounds with IC50 values as low as 50 nM. The same group reported designing, based on phage display and X-ray crystallographic data, peptide and peptidomimetic HLA-DR ligands with significant binding affinity and robust inhibition of T-cell activation in vitro. In contrast to these studies, the approach presented in this paper uses computational de novo design tools in conjunction with chemical and computational diversity methods to discover novel ligands for HLA-DR4. Other groups have reported designing ligands for MHC class I proteins [80, 81].
As previously mentioned, combinatorial synthesis and high throughput screening are an increasingly popular way to identify starting points for drug discovery. However, because of the relatively non-specific peptide binding of the MHC class I and II proteins described above, discriminating hits from a naive (non-focused) combinatorial library is challenging. It has proven difficult to use the standard strategy, in which a diverse library covering chemical space is first used to identify the region that should yield a specific high affinity ligand, because the protein is likely to bind a large number of compounds with low affinity. Consequently, the exploratory library is likely to yield an uninformatively large number of hits that do not sufficiently limit the chemical space to be searched. The procedure presented in this study addresses this difficulty. First, an active site map is generated via a computational virtual screen of the binding site using the MCSS method. Clustering, diversity methods, and visualization are employed to augment chemical intuition in analyzing the MCSS results. The information derived from this analysis is used to determine a set of functionalities that, when linked to a common molecular framework or backbone architecture, generate a collection of compounds (i.e., a combinatorial library) that samples a portion of ligand space thought more likely to contain compounds with affinity and specificity for the target. This focused library is then synthesized combinatorially and evaluated experimentally. Even though the computational method does not treat protein flexibility explicitly, the complete approach presented here probes binding site plasticity experimentally.
Materials and methods
Preparation of protein coordinates
Coordinates of the MHC Class II proteins HLA-DR3 and HLA-DR4 (accession codes 1A6A and 2SEB, respectively) were supplied by D. Wiley (personal communication) in Protein Data Bank (PDB) format. The co-crystallized peptide ligands were removed from the structures, and the coordinates were converted into CHARMM format coordinate files. Polar hydrogens were added using the HBUILD facility in the CHARMM program [84, 85]. To facilitate specifying a conservative box-shaped binding region (for more efficient MCSS calculations), the coordinates were transformed so that the major axis of the binding region coincides with the Cartesian z-axis. The structures were not minimized prior to MCSS molecular probe mapping. Subsequent minimization of the HLA-DR4 structure for 300 steps of steepest descents changed the all-atom coordinates only slightly (0.25 Å root mean square difference). Selected maps were rerun and compared, and they exhibited no significant differences (unpublished results).
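The reorientation step is not specified in detail. A minimal sketch, assuming the major axis is taken to be the principal (largest-variance) axis of a user-chosen set of binding-region atoms, could look like this in Python with NumPy; the function name and the atom selection are illustrative assumptions, not part of the published protocol.

```python
import numpy as np


def align_major_axis_to_z(coords, site_coords):
    """Rotate coords so the principal axis of site_coords lies along z.

    coords: (N, 3) array of all atom positions to transform.
    site_coords: (M, 3) array of the atoms that define the binding region
        (a hypothetical selection; the original selection is not stated).
    """
    center = site_coords.mean(axis=0)
    # Covariance of the binding-region atoms; np.linalg.eigh returns its
    # eigenvectors (principal axes) as columns, in ascending eigenvalue order.
    cov = np.cov((site_coords - center).T)
    _, axes = np.linalg.eigh(cov)
    rot = axes.copy()
    # Flip one axis if needed so the transform is a proper rotation
    # (determinant +1) and does not mirror the structure.
    if np.linalg.det(rot) < 0:
        rot[:, 0] *= -1.0
    # Project onto the principal axes: smallest -> x, middle -> y, largest -> z.
    return (coords - center) @ rot
```

Because the transform is a rigid-body rotation plus translation, interatomic distances, and therefore force field energies, are unchanged; only the orientation relative to the mapping box changes.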
Molecular probe mapping using the method described below was first performed for the HLA-DR3 structure because HLA-DR4 coordinates were not yet available. HLA-DR4 was chosen as the experimental library target because it was available in sufficient quantity for screening and assaying the library, and because of its pharmaceutical interest. Coordinates for HLA-DR4 became available before the library design was finalized, and the molecular probe maps were recalculated. HLA-DR3 and HLA-DR4 are nearly identical in sequence in the region targeted by the ligand, so very few differences between the molecular probe maps for the two allotypes were expected or found. Structural alignment shows that the two structures are similar, with a root mean square difference in backbone atom coordinates of less than 0.6 Å.
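The backbone comparison requires superimposing the two structures before computing the RMSD. The alignment method used is not stated; the sketch below uses the standard Kabsch algorithm and assumes the backbone atoms of the two allotypes have already been matched one to one (the residue correspondence is an assumption of the example).

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```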
The enhanced MCSS method
The MCSS method efficiently finds favorable positions and orientations for molecular probes in the binding site of macromolecules. The original MCSS algorithm was enhanced and fully reimplemented in the Expect scripting language, an extension of the Tcl scripting language; a version implemented in the C programming language is now available (see below). Version 2.1 of MCSS was developed and applied to generate molecular probe maps for HLA-DR3 and HLA-DR4.
In MCSS, a large number of copies (replicas) of individual molecular probes are simultaneously energy minimized in the force field of a macromolecule according to the Time Dependent Hartree (TDH) approximation, as implemented in the REPLICA facility in CHARMM. In the TDH approximation, the system is divided into two partitions, i and j: atoms belonging to molecules in i are influenced by the average field of all replicas in j, and vice versa. Replicas in the same partition do not interact. The standard application of the MCSS method is a special case of the TDH approximation in which one partition contains the rigid target macromolecule and the other comprises freely moving replicas. Under these conditions, the field influencing the replicas (i.e., that of the target molecule) is constant and explicitly determined; the replicas do not see each other, and the target molecule does not move.
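To make the partitioning concrete, here is a toy sketch of TDH-style mean-field forces; it is not CHARMM's REPLICA implementation. The target is reduced to a single site and `pair_force` is a hypothetical user-supplied function. Each replica feels the full target field, while the target feels only the average of the replica forces; in the standard MCSS setting the target is rigid, so its force is simply discarded.

```python
import numpy as np


def tdh_forces(target, replicas, pair_force):
    """Toy TDH mean-field forces for one target site and N probe replicas.

    target: (3,) position of a single target site (stand-in for the protein).
    replicas: (N, 3) positions of the replicas of one probe.
    pair_force: function of the displacement (replica - target) returning the
        force on the replica (hypothetical, user supplied).
    """
    n = len(replicas)
    # Each replica feels the full field of the target; replicas in the same
    # partition never see one another.
    f_replicas = np.array([pair_force(r - target) for r in replicas])
    # The target feels the average of the reaction forces, i.e. each replica
    # counts with weight 1/N.
    f_target = -f_replicas.sum(axis=0) / n
    return f_replicas, f_target
```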
The new MCSS script was developed using the Expect language, which was designed for automating interactive procedures. Expect’s ability to parse output from CHARMM and to access Tcl language constructs made it possible to prototype rapidly and to implement the flexibility and ease of use desired in MCSS. The novel features of MCSS version 2.1 include: (i) incorporation of a standard version of CHARMM; (ii) simplified creation of new molecular probes; (iii) faster, distance-based generation of the initial distribution of molecular probe replicas; (iv) the ability to generate molecular probe maps in box-shaped regions; (v) user-specified minimization protocols, constraints, replication specifications, non-bond conditions, and target macromolecule structure generation; (vi) non-covalently linked molecular probes (e.g., hydrated N-methylacetamide); and (vii) a hybrid representation (i.e., all-hydrogen protein and polar-hydrogen molecular probe).
The simplified procedure for creating molecular probes was particularly useful for the present project because it allows convenient incorporation of chemical functionalities for specific ligand design studies. The new customizability allows the user to modify MCSS without changing the core source code, by altering templates that the MCSS script reads and uses to generate input to the CHARMM program. Because the Expect language provides access to Tcl constructs, the user can use values read from the input (e.g., the nonbond cutoff) in their input templates; this makes it easier to keep modifications to the default protocols consistent with the rest of the MCSS script. In addition, because Tcl is a well documented, rich, and widely accepted scripting language, the MCSS protocol may be used as part of a larger script and, through the Tk extension to Tcl, may be integrated into a cross-platform graphical user interface. Recently, the Expect script was translated into a C language program by Accelrys (formerly Molecular Simulations, Inc.); the code was back-ported and is distributed as version 2.5 of MCSS, and all features described for version 2.1 are available in version 2.5.
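The template mechanism can be illustrated with Python's `string.Template`, as an analogue of the Tcl substitution that MCSS performs. The template text and parameter names below are hypothetical; the commands only resemble CHARMM minimization input and are not the templates distributed with MCSS.

```python
from string import Template

# Hypothetical user-editable template; the command text only resembles CHARMM
# minimization input and is not the template distributed with MCSS.
MINIMIZE_TEMPLATE = Template(
    "! steepest descents to relieve bad contacts\n"
    "MINI SD NSTEP $sd_steps CUTNB $cutnb\n"
    "! conjugate gradient between gather steps\n"
    "MINI CONJ NSTEP $cg_steps CUTNB $cutnb\n"
)


def render_input(params):
    """Fill the template from run parameters (raises KeyError if one is missing)."""
    return MINIMIZE_TEMPLATE.substitute(params)
```

Because the same `cutnb` value fills every placeholder, an edited protocol stays consistent with the nonbond cutoff used elsewhere in the run, which is the benefit described above.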
Generating molecular probe maps in box-shaped regions allows the binding sites of some target molecules to be mapped more efficiently than with spherical regions, because spherical mapping regions often include large volumes that are not of interest or not in contact with the target. For example, because the binding region of MHC proteins is long and narrow, a sphere large enough to encompass the length of the binding site would include a large amount of empty space above the binding groove.
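A rough volume comparison shows why a box can be the more economical mapping region for a groove. Treating the groove as a box of length L, width w and height h, a sphere that merely spans the length has diameter L and volume πL³/6 (a lower bound, since a sphere enclosing the whole box must be larger still). The dimensions below are hypothetical round numbers chosen for illustration, not measurements of HLA-DR4.

```python
import math


def sphere_volume(diameter):
    """Volume of a sphere of the given diameter."""
    return math.pi * diameter ** 3 / 6.0


def box_volume(length, width, height):
    """Volume of a rectangular box."""
    return length * width * height


# Hypothetical groove dimensions in angstroms, for illustration only.
length, width, height = 30.0, 12.0, 12.0
# A sphere spanning the groove's length is at least this many times larger.
ratio = sphere_volume(length) / box_volume(length, width, height)
print(f"sphere/box volume ratio: {ratio:.2f}")
```

With these assumed dimensions the sphere is roughly 3.3 times the volume of the box, and every extra unit of volume adds replicas that must be placed and minimized.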
The program RMOL generates initial placements of molecular probes as follows. First, the user-specified binding region is discretized into a grid. A random gridpoint is chosen, and a replica is placed in a random orientation with its center of mass within half a grid spacing of the selected gridpoint. The placement is evaluated by checking whether any atom of the replica is within a user-specified distance of any atom of the target macromolecule. To make this check fast, a list of the ten target atoms closest to each gridpoint is computed and stored in advance. To find the distance from each replica atom to the closest target atom, the list for the gridpoint nearest that replica atom is looked up, and distances are computed only for the ten atoms in this list. In the previous version of MCSS, generating the initial placements and orientations of the molecular probes was time consuming because each candidate placement was evaluated using a pseudo-energy function, which required examining many more target atoms than the simple distance-based check does.
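The grid-based check can be sketched as follows. This is a simplified illustration, not the RMOL source: it assumes a placement is rejected when any replica atom lies closer than `min_dist` to a target atom, places the probe's geometric center (rather than its mass-weighted center of mass) near the chosen gridpoint, and, like the method described, inspects only the ten target atoms listed for the gridpoint nearest each replica atom. All function names are hypothetical.

```python
import numpy as np


def build_neighbor_lists(target, grid_min, spacing, shape, k=10):
    """For each gridpoint, store the indices of the k closest target atoms."""
    axes = [grid_min[d] + spacing * np.arange(shape[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    # Brute-force distances, computed once up front (the expensive step).
    d = np.linalg.norm(points[:, None, :] - target[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return nearest.reshape(*shape, -1)


def random_placement(probe, grid_min, spacing, shape, rng):
    """Randomly orient a probe and put its center near a random gridpoint."""
    gp = grid_min + spacing * rng.integers(0, shape)
    offset = rng.uniform(-0.5 * spacing, 0.5 * spacing, size=3)
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # uniformly distributed orthogonal matrix
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # make it a proper rotation
    centered = probe - probe.mean(axis=0)
    return centered @ q.T + gp + offset


def placement_ok(replica, target, nearest, grid_min, spacing, min_dist):
    """True if no replica atom is closer than min_dist to a listed target atom."""
    shape = np.array(nearest.shape[:3])
    idx = np.rint((replica - grid_min) / spacing).astype(int)
    idx = np.clip(idx, 0, shape - 1)  # atoms just outside the grid snap to its edge
    for atom, (i, j, l) in zip(replica, idx):
        candidates = target[nearest[i, j, l]]
        if np.min(np.linalg.norm(candidates - atom, axis=1)) < min_dist:
            return False
    return True
```

The neighbor lists are computed once per map, so each trial placement costs only a handful of distance evaluations per replica atom.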
Batches of initially placed molecular probe replicas are energy minimized using the CHARMM program. The REPLICA facility in CHARMM is used to generate the copies of the molecular probe in the internal CHARMM protein structure file and to normalize the forces between the protein and the replicas. In this study the default MCSS minimization protocol was used. This consists of an initial 800 steps of steepest descents minimization to remove severe bad contacts, followed by up to 10000 steps of conjugate gradient minimization with culling (gathering), every 500 steps, of high energy replicas (the energy assigned to each replica is the sum of its internal energy and its interaction with the target) and of duplicate replicas (i.e., replicas that converged to the same position and orientation). After each gather step, the script lowers the energy criterion for removing high energy minima by replacing the current criterion with the average of the current and final criteria, so that its excess over the final value decreases geometrically. Initial and final energy cutoffs of 500 kcal/mol and 3 kcal/mol, respectively, were used. The script monitors the minimization and stops minimizing the current batch once it has converged (gradient less than 0.01 kcal/(mol·Å)) or no novel replicas remain (i.e., all replicas have converged to local minima found in previous minimization batches). This monitoring allows a large number of minimization steps to be requested, ensuring that the replicas find local minima, while maintaining high efficiency because no excess minimization is performed. Minimization protocols and convergence criteria can be readily adjusted by editing configuration files that are read by the MCSS script; no modification of the script (v2.1) or source code (v2.5) is required.
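The cutoff update can be written as E(k+1) = (E(k) + E_final)/2, so the excess over the final cutoff halves at each gather step: E(k) = E_final + (E(0) − E_final)/2^k. With the values used here, the first cutoffs are 500, 251.5 and 127.25 kcal/mol. A short sketch, assuming one gather every 500 conjugate gradient steps (at most 20 gathers for 10000 steps) and that the first gather uses the initial cutoff:

```python
def cutoff_schedule(initial=500.0, final=3.0, n_gathers=20):
    """Energy cutoffs (kcal/mol) applied at successive gather steps.

    After each gather the cutoff becomes the mean of the current and final
    cutoffs, so its excess over the final value halves every time.
    """
    cutoffs = []
    current = initial
    for _ in range(n_gathers):
        cutoffs.append(current)
        current = 0.5 * (current + final)
    return cutoffs
```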
Replicas are gathered using a criterion based on the root mean square difference (RMSD) between atomic coordinates. A pair of replicas is deemed redundant if it satisfies two criteria: first, the all-atom RMSD is less than a specified cutoff (typically 0.2 Å); second, the RMSD decreases during minimization (otherwise, the replicas are considered to be diverging and are not clustered). The algorithm is accelerated by first checking the distance between the centers of mass of a replica pair, and computing the all-atom RMSD and checking for divergence only if the center-of-mass distance is less than the cutoff. If a set of active replicas is converging to a novel local minimum, one replica is kept as a representative and the duplicates are removed. All replicas converging to previously found local minima are removed.
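The gathering rule can be sketched as below; the function names and data layout are hypothetical. The RMSD is computed in place, without superposition, because the position of each replica in the binding site is exactly what is being compared. The sketch uses the unweighted geometric center instead of the mass-weighted center of mass; for that center the prefilter is exact, because the distance between the centers of two replicas can never exceed their RMSD.

```python
import numpy as np


def rmsd_in_place(a, b):
    """All-atom RMSD without superposition: position in the site matters."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def gather(current, previous, known_minima, cutoff=0.2):
    """Return indices of active replicas to keep after one gather step.

    current, previous: lists of (n_atoms, 3) arrays for the same replicas,
        at this gather and at the previous one.
    known_minima: minima found in earlier minimization batches.
    """
    keep = []
    for i, xi in enumerate(current):
        ci = xi.mean(axis=0)
        # Replicas that have reached a previously found minimum are dropped.
        if any(rmsd_in_place(xi, m) < cutoff for m in known_minima):
            continue
        duplicate = False
        for j in keep:
            xj = current[j]
            # Cheap prefilter: the center distance is a lower bound on the RMSD.
            if np.linalg.norm(ci - xj.mean(axis=0)) >= cutoff:
                continue
            r_now = rmsd_in_place(xi, xj)
            r_before = rmsd_in_place(previous[i], previous[j])
            if r_now < cutoff and r_now < r_before:  # close and converging
                duplicate = True
                break
        if not duplicate:
            keep.append(i)  # first member of a converging set is the representative
    return keep
```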
The minimize-gather cycles are repeated until all batches have been minimized. The results are stored in a CHARMM coordinate file that contains the coordinates and interaction energies of the functional group minima. The interaction energies for each replica are calculated using the BLOCK facility in CHARMM. The interaction energy of a replica is the sum of its internal energy terms and the Coulombic and van der Waals interactions between the replica and the target macromolecule. Energies are normalized by subtracting the ground state in vacuo energy of the molecular probe. In effect, the reported MCSS interaction energy therefore includes the internal strain incurred when the replica adopts a conformation that fits the binding region. This ranks replicas of a given type more accurately because it accounts for the internal strain each replica experiences in forming favorable interactions with the protein. To a first approximation, the normalization also puts sets of minima calculated for different groups (or for the same group using different representations) on the same energy scale so that they can be compared.
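Written as an equation, with symbols introduced here for illustration, the normalized interaction energy of a replica is

```latex
E_{\mathrm{MCSS}} = E_{\mathrm{int}}^{\mathrm{replica}}
  + E_{\mathrm{Coul}}^{\mathrm{replica\text{-}target}}
  + E_{\mathrm{vdW}}^{\mathrm{replica\text{-}target}}
  - E_{\mathrm{int}}^{\mathrm{vac,\,min}}
```

where the last term is the in vacuo ground-state energy of the isolated probe. The difference between the first and last terms is the internal strain described above, which is why strained replicas are penalized relative to unstrained ones of the same type.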
By changing the default constraint specification, MCSS v2.1 can be used to study systems with flexible target macromolecules and systems in which parts of the target macromolecule are replicated. In a specific study of the HLA system, five residues of the protein were allowed to move while computing a map, in order to reproduce known binding preferences for the aspartate side chain. In addition, alternate protocols for quenching molecular probe replicas in the binding site, such as Monte Carlo annealing (R. Putzer and D. Joseph-McCarthy, personal communication) or molecular dynamics, may be incorporated with small modifications to the script and configuration files. These features are under development. Monte Carlo annealing of molecular probe replicas into binding regions can be useful for placing larger probes or ligands because it should allow greater sampling of the internal degrees of freedom of the probe. Dynamics has not been used in production runs with a flexible or partially flexible target because systems with a large number of replicas in one partition and a relatively small number in the other exhibit problematic dynamical properties, such as unequal distributions of kinetic energy between the partitions. In more focused studies, where the ratio of replicas in the two partitions is close to one, molecular dynamics protocols can be useful for probing phenomena such as induced fit in ligand association. When using MCSS results to bias or focus combinatorial libraries, it is not strictly necessary to treat binding site flexibility, because elements of varying size and flexibility can be included in the library to probe protein dynamics experimentally. Coupling MCSS with pharmacophore methods may prove to be an efficient way to implement such loose constraints in library design.
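Monte Carlo annealing is mentioned above only as a protocol under development, so its details are not given. As a generic illustration of the technique, the sketch below applies Metropolis simulated annealing with geometric cooling to a toy energy function. The parameters, the temperature units (energy units, i.e., k_BT) and the function name are assumptions; in practice the coordinates would be the probe's rigid-body and torsional degrees of freedom, evaluated with the CHARMM force field.

```python
import math
import random


def anneal(energy, x0, step=0.5, t_start=10.0, t_end=0.01, n_steps=5000, seed=0):
    """Metropolis simulated annealing of a list of coordinates.

    energy: function mapping a list of floats to an energy.
    The temperature (in energy units) falls geometrically from t_start to t_end.
    Returns the lowest-energy configuration visited and its energy.
    """
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    best_x, best_e = list(x), e
    cooling = (t_end / t_start) ** (1.0 / (n_steps - 1))
    t = t_start
    for _ in range(n_steps):
        trial = [xi + rng.uniform(-step, step) for xi in x]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling
    return best_x, best_e
```

High early temperatures let the probe cross barriers between sub-sites; as the temperature falls, the walk settles into a nearby minimum, which is the behavior that makes annealing attractive for larger, more flexible probes.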
Functional group probes for which maps were computed for HLA-DR3 and HLA-DR4. The “side chain” groups represent the amino acid R-groups with Cβ treated as a terminal CH3 group
Functional group, name
aspartate side chain, acetate ion
arginine side chain, propylguanidinium
asparagine side chain, acetamide
benzene, computed in polar and all hydrogen representations
cysteine side chain, methanethiol
isoleucine side chain, butane
leucine side chain, isobutane
lysine side chain, 1-amino pentane ion
phenylalanine side chain, toluene
threonine side chain, ethanol
vinylogous peptide backbone unit, CH3CONH-CH2CHCHCH2
Clustering molecular probe maps
The molecular probe maps were clustered spatially to facilitate analysis of the results. Clustering was performed using the ART-2’ algorithm as implemented in CHARMM. To compare different molecular probes for the purpose of assigning them to clusters, the center of mass of each minimized molecular probe replica was computed. The centers of mass of all minima for all groups were combined into a single dataset and clustered simultaneously. This produced a map of spatially distinct areas of favorable interaction with the protein; the clusters were then analyzed with respect to the number, and the chemical and spatial diversity, of the minima they contained. The goal of clustering the minima in this way was to derive a semi-quantitative picture of where functionalities should be placed with respect to the protein.
The ART-2’ algorithm iteratively classifies elements into clusters. In each iteration, after all molecular probe minima have been assigned to a cluster (as described below), the centroid of each cluster is recalculated as the average of the centers of mass of the minima belonging to that cluster. Assigning the minima to clusters (binning) and computing the centroids are repeated until there is no residual error (i.e., no center of mass lies farther than a given radius from its cluster centroid) and cluster membership no longer changes when the centroid positions are updated. Binning is performed using a neural network that classifies each minimum as belonging to one cluster. The result of clustering is a list of the molecular probe replica minima belonging to each cluster, together with the coordinates of the cluster center, the cluster radius, and the number of minima in the cluster. Minima were not filtered energetically; that is, all minima computed using the MCSS method (with a final energy cutoff of 3 kcal/mol) were used in computing the cluster positions and were assigned to a cluster. A cluster radius of 2.5 Å was chosen empirically to balance reducing the data to a manageable form against retaining sufficient information. Clusters containing relatively few minima (fewer than 50 out of approximately 7000–10000 total minima) were discarded to facilitate visual analysis and to improve the statistical analysis of the clusters. The discarded low occupancy clusters were generally located outside the binding region.
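The assign-then-update cycle can be illustrated with a simplified radius-based (leader-style) clustering. This is a stand-in, not ART-2’: binning uses a nearest-centroid rule instead of a neural network, and the function names are hypothetical. The 2.5 Å radius and the minimum occupancy of 50 follow the values given above.

```python
import numpy as np


def radius_cluster(points, radius=2.5, max_iter=100):
    """Iterative radius-based clustering of probe centers of mass.

    Each point joins the nearest centroid within `radius`, otherwise it seeds
    a new cluster; centroids are then recomputed, and the cycle repeats until
    cluster memberships stop changing.
    """
    centroids = [points[0]]
    labels = None
    for _ in range(max_iter):
        new_labels = np.empty(len(points), dtype=int)
        for i, p in enumerate(points):
            d = np.linalg.norm(np.asarray(centroids) - p, axis=1)
            k = int(np.argmin(d))
            if d[k] > radius:
                centroids.append(p)
                k = len(centroids) - 1
            new_labels[i] = k
        # Recompute centroids as member averages; drop clusters left empty.
        used = np.unique(new_labels)
        centroids = [points[new_labels == k].mean(axis=0) for k in used]
        remap = {old: new for new, old in enumerate(used)}
        new_labels = np.array([remap[k] for k in new_labels])
        if labels is not None and np.array_equal(labels, new_labels):
            break
        labels = new_labels
    return labels, np.asarray(centroids)


def occupied_clusters(labels, min_size=50):
    """Indices of clusters holding at least min_size minima."""
    counts = np.bincount(labels)
    return [k for k, n in enumerate(counts) if n >= min_size]
```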
Analysis of the clusters
Visual analysis. The clusters were analyzed visually in the context of the proposed framework for the combinatorial library and with respect to the protein structure. Clusters were visualized in VMD  as spheres colored by occupancy (i.e., the number of minima contained in a cluster) and centered at the centroid of each cluster. Clusters corresponding to elements of the combinatorial library were selected by superimposing the cluster spheres with the propane molecular probe map and manually picking the representatives.
Numerical analysis. The clusters were also analyzed quantitatively in terms of the proportion of each functionality present and the corresponding energies of these replicas. After clustering, the molecular probe minima were filtered based on several energy criteria described below. For each molecular probe type in a given cluster, the ratio of the number of minima of that type in the cluster to the total number of minima of that type remaining after energy filtering was used to compare the relative contribution of each group to the population of each cluster. Ratios, rather than absolute counts, were compared in order to normalize for differences in the potential energy surface of each group. Absolute numbers of minima should not be compared because the character of the energy surface (and, accordingly, the number of local minima) varies from group to group: if the energy surface for a particular functionality is relatively flat, more local minima are found than for a group whose energy surface is more sharply defined.
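The ratio can be computed per cluster as follows; the function name is hypothetical and the counts are made up. For example, if 4 of the 40 retained acetate minima and 2 of the 10 retained benzene minima fall in a cluster, the ratios are 0.1 and 0.2, so benzene contributes relatively more to that cluster even though acetate has more minima there.

```python
from collections import Counter


def occupancy_ratios(cluster_types, retained_totals):
    """Per-type occupancy ratio for one cluster.

    cluster_types: probe-type label of each energy-filtered minimum in the cluster.
    retained_totals: probe type -> number of minima of that type retained
        after energy filtering over the whole map.
    """
    counts = Counter(cluster_types)
    return {g: n / retained_totals[g] for g, n in counts.items()}
```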
Energetic cutoff criteria for filtering various functional group probes. All quantities are in units of kcal·mol⁻¹
Average MCSS energy (DR4)
σ MCSS energy (DR4)
Max/min MCSS energy (DR4)
Average MCSS energy (DR3)
σ MCSS energy (DR3)
Max/min MCSS energy (DR3)
Energy of isolated group
Number of minima retained under various energy cutoff criteria
Total number of minima DR3/DR4
Number of minima retained by cutoff
Experimental ΔGsolv DR3/DR4
Electrostatic ΔGsolv DR3/DR4
Average MCSS energy (DR4)
Average−σ MCSS energy (DR4)
Average MCSS energy (DR3)
Average−σ MCSS energy (DR3)
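The headings above name the energy cutoffs that were compared. A sketch of how such cutoffs could be computed and applied for one probe type, assuming that a minimum is retained when its MCSS energy is at or below the cutoff, that σ is the population standard deviation, and that solvation free energies are supplied externally (the value used in the test is made up):

```python
import statistics


def cutoffs_for_group(energies, dg_solv_exp=None, dg_solv_elec=None):
    """Candidate energy cutoffs (kcal/mol) for one probe type.

    energies: MCSS interaction energies of all minima of this type.
    The solvation free energies are optional and supplied by the user.
    """
    mean = statistics.fmean(energies)
    sigma = statistics.pstdev(energies)
    cut = {"average": mean, "average-sigma": mean - sigma}
    if dg_solv_exp is not None:
        cut["experimental dGsolv"] = dg_solv_exp
    if dg_solv_elec is not None:
        cut["electrostatic dGsolv"] = dg_solv_elec
    return cut


def retained(energies, cutoff):
    """Number of minima whose energy is at or below the cutoff."""
    return sum(1 for e in energies if e <= cutoff)
```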
In designing a combinatorial library it is useful to have an indication of which parts of the molecule should be populated with diverse substituents and which with relatively similar substituents. This information can be derived from MCSS results by computing a score that indicates the spatial and chemical similarity of the minima belonging to the cluster or clusters corresponding to a library position. This is a somewhat different problem than what is usually referred to as similarity scoring because in this case the spatial similarity/diversity is information uniquely available from MCSS functionality maps. It is important to note the MCSS maps provide a collection of likely positions for molecular probes in the binding site of the target. This is a key distinction of the application of chemical diversity in this context: MCSS performs a combinatorial docking; that is, it provides a set of molecular probe positions that could be assembled to create ligand molecules rather than providing information on the placement of complete ligands. Therefore, the distinct positions of the molecular probes are a key element of information to be used in designing combinatorial libraries. Any two-dimensional similarity estimate and most three-dimensional similarity scores lose the spatial information provided by MCSS. This follows from the fact that typically these methods overlap the molecules for maximum similarity prior to scoring them. For whole-molecule comparisons this is appropriate because of the expectation that the molecules would be oriented upon binding to place similar functionalities or pharmacophores in similar positions with respect to the binding site; in other words, molecules should be aligned as closely as possible prior to comparing them . By contrast, when considering molecular probes or fragments of molecules it is important to retain the alternate binding modes represented by each unique molecular probe replica position. There are two reasons for this. 
First, the unique positions provide a basis for building up molecules; second, aligning the molecular probe replicas essentially destroys the information content available from the three-dimensional structure of the target. On their own, the molecular probes have more orientational and positional freedom than they would as part of a larger molecule, so the precise positions of the probe atoms are unlikely to match their positions in a bound ligand exactly. Nonetheless, the probes' orientations and positions encode valuable information about the structure and composition of the active site and should therefore be preserved and leveraged in analysis where possible.
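The argument above can be made concrete with a small sketch. The exact diversity formula used in this work is not reproduced in this section, so the score below is a hypothetical stand-in: each pair of minima in a cluster contributes the distance between their centroids in the fixed frame of the target (no superposition) plus a weighted chemical term, 1 minus the Tanimoto similarity of assumed descriptor sets, and the sum is normalized by the number of minima. The `Minimum` record, the descriptor labels, and `chem_weight` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Minimum:
    """One MCSS probe minimum, kept in the fixed frame of the target (never re-aligned)."""
    probe: str                               # probe type label, e.g. "PRPN"
    centroid: tuple[float, float, float]     # Å, target coordinate frame
    features: frozenset[str]                 # hypothetical chemical descriptors


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto similarity of two descriptor sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_dissimilarity(m1: Minimum, m2: Minimum, chem_weight: float = 1.0) -> float:
    """Spatial distance (no superposition) plus weighted chemical dissimilarity."""
    spatial = math.dist(m1.centroid, m2.centroid)
    chemical = 1.0 - tanimoto(m1.features, m2.features)
    return spatial + chem_weight * chemical


def cluster_diversity(cluster: list[Minimum], chem_weight: float = 1.0) -> float:
    """Sum of pairwise dissimilarities, normalized by the number of minima."""
    if len(cluster) < 2:
        return 0.0
    total = sum(pair_dissimilarity(a, b, chem_weight) for a, b in combinations(cluster, 2))
    return total / len(cluster)


if __name__ == "__main__":
    a = Minimum("PRPN", (0.0, 0.0, 0.0), frozenset({"hydrophobic"}))
    b = Minimum("MEOH", (3.0, 0.0, 0.0), frozenset({"polar", "h_donor"}))
    print(cluster_diversity([a, b]))  # (3.0 + 1.0) / 2 = 2.0
```

Because coordinates are never re-aligned, two chemically identical probes sitting 3 Å apart still raise the score, which is precisely the spatial information that alignment-based similarity measures discard.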
Although studies sufficient to fully validate the above formulation for cluster diversity have not been performed, the proposed measure provides, at the very least, a qualitative indication of the heterogeneity of molecular probe preferences in the binding site. This information helps complete the picture of the overall character of the chemical space that is more likely to yield hits for the targeted binding region, and it may be used to assist in designing a combinatorial library to interrogate the binding site. Notably, the numerical trends observed for the diversity measure reflect those seen upon visual and manual inspection of the molecular probe maps.
Based on the MCSS molecular probe maps and on experimental observations that tetrapeptides thought to occupy pockets P1 through P4 of MHC class II proteins inhibit binding of full-length peptides, the library was designed to focus on P1 through P4 of the binding groove, as well as on a region adjacent to P1 and P2 near the break in the α1-helix bounding the binding groove (HLA-DR4 residues S53, F54, and E55). Initial visual inspection of the MCSS molecular probe maps suggested a framework for the library. Monomers were selected for each site on the basis of similarity to the MCSS molecular probes observed at that position, chemical intuition, and availability. In selecting the library composition, the sites were assumed to be relatively independent; in other words, possible interactions between monomers were not taken into account. Although additivity of the monomer binding energies is a desired outcome, the binding energetics of the complete ligands were not assessed computationally, so additivity was not tested. A six-monomer branched library was designed; the details of the library design are described below, in Sect. ‘MCSS maps and design of combinatorial library’. Retrospective quantitative analysis of the maps (clustering, applying energy cutoffs, and computing chemical dissimilarity within clusters), combined with knowledge of the sequence binding preferences of the protein, was used to validate the methods and to verify the architecture and monomer selection for the combinatorial library.
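A minimal sketch of the retrospective map analysis (an energy cutoff followed by spatial clustering) is shown below. The cutoff rule (mean energy minus k standard deviations), the 2.0 Å clustering radius, and the leader-style clustering are assumptions chosen for illustration; they are not necessarily the procedure used to cluster the MCSS maps in this work. Each minimum is represented as a (centroid, energy) pair, with energies in kcal/mol and more negative values being more favorable.

```python
import math
import statistics

# A minimum is (centroid_xyz_in_angstrom, energy_kcal_per_mol); lower energy is better.
Minimum = tuple[tuple[float, float, float], float]


def energy_filter(minima: list[Minimum], k: float = 1.0) -> list[Minimum]:
    """Keep minima whose energy is at or below mean - k * (population) standard deviation."""
    energies = [energy for _, energy in minima]
    cutoff = statistics.mean(energies) - k * statistics.pstdev(energies)
    return [m for m in minima if m[1] <= cutoff]


def leader_cluster(minima: list[Minimum], radius: float = 2.0) -> list[list[Minimum]]:
    """Greedy clustering: the lowest-energy unassigned minimum seeds a cluster and
    absorbs every remaining minimum whose centroid lies within `radius` of its own."""
    remaining = sorted(minima, key=lambda m: m[1])
    clusters = []
    while remaining:
        leader, *rest = remaining
        members = [leader] + [m for m in rest if math.dist(m[0], leader[0]) <= radius]
        remaining = [m for m in rest if math.dist(m[0], leader[0]) > radius]
        clusters.append(members)
    return clusters
```

Cluster populations (`len` of each member list) and the per-cluster diversity can then be tabulated for the clusters nearest each proposed library position.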
Model building and refinement
The consensus ligand selected by experimentally screening the combinatorial library was modeled into the binding site according to the binding mode hypothesized for the library. The chemical structure of the ligand was built with the 2-D builder in Quanta and transferred to the Quanta 3-D builder, where initial coordinates were assigned and chirality was checked. On exiting the 3-D builder, a CHARMM residue topology file for the ligand was written out. In the Quanta modeling mode, key functionalities of the ligand were manually aligned with their corresponding MCSS minima. The ligand coordinates were exported from Quanta in CHARMM coordinate file and Merck file formats. Partial atomic charges were assigned to the ligand atoms from the Merck Molecular Force Field (MMFF) using CHARMM. Missing bond, angle, and dihedral parameters for the ligand were taken from the Quanta 4.1 PARM.PRM file or derived from the corresponding parameters for chemically similar atom types.
The initial model was refined by molecular dynamics (MD) simulated annealing and molecular mechanics minimization to regularize its internal geometry and optimize its interactions with the protein. The model-built structure was first minimized (10,000 steps with the Powell minimizer) in the field of the fixed protein, with the ligand C26 and C59 atoms also constrained. This minimized structure was annealed by MD (LEAPFROG integrator in CHARMM), starting at 3000 K and cooling in 25 K decrements every 300 dynamics steps to a final temperature of 300 K, over 16.35 ps with 0.5 fs timesteps. After annealing, the structure was minimized with the Powell minimizer to a gradient of 0.0001 kcal/(mol·Å). The structures were then reannealed with the same annealing and quenching schedule but with two different schemes for constraining the ligand: in the first, no constraints held the ligand near the protein; in the second, the C26 and C59 atoms were harmonically constrained with a mass-weighted force constant of 1.0 kcal/(mol·Å²), and the ligand was allowed to relax at 300 K. In all cases, the portion of the protein within 6.5 Å of the ligand was flexible but subject to a mass-weighted harmonic constraint with a force constant of 5.0 kcal/(mol·Å²). Each annealing/constraint protocol was carried out twenty-five times with different random number seeds, and the final structures with the best energies were selected. The protein-ligand interaction energy ranged from −73.15 to −133.94 kcal/mol.
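The stated annealing time can be checked arithmetically. Assuming that every temperature stage from 3000 K down to and including 300 K runs for 300 steps, the schedule has (3000 − 300)/25 + 1 = 109 stages, or 32,700 steps; at 0.5 fs per step this is 16.35 ps, matching the figure given above. The helper below is a hypothetical sketch of that bookkeeping, not CHARMM input.

```python
def annealing_schedule(t_start: float = 3000.0, t_end: float = 300.0,
                       decrement: float = 25.0, steps_per_stage: int = 300,
                       timestep_fs: float = 0.5) -> tuple[list[float], int, float]:
    """Return (stage temperatures in K, total MD steps, total time in ps).

    Assumes one stage at every temperature from t_start down to t_end inclusive,
    each lasting `steps_per_stage` dynamics steps."""
    n_stages = round((t_start - t_end) / decrement) + 1
    temps = [t_start - i * decrement for i in range(n_stages)]
    total_steps = n_stages * steps_per_stage
    total_ps = total_steps * timestep_fs / 1000.0
    return temps, total_steps, total_ps


temps, steps, ps = annealing_schedule()
print(len(temps), temps[-1], steps, ps)  # 109 300.0 32700 16.35
```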
Split and pool synthesis
Split and pool combinatorial synthesis was used to generate 243040 potential HLA-DR4 ligands, each bound individually to a solid support (TentaGel resin). The branch position of the library consisted of diamine monomers that could be chemoselectively deprotected under orthogonal conditions.
Library initialization, deprotection, and split. A Glycine-(6-aminocaproic acid)-β-Alanine linker was coupled onto TentaGel S NH2 (Rapp Polymere GmbH, 1.0 g, 0.25 mmol) using an Applied Biosystems Model 431A peptide synthesizer running standard Fmoc-HBTU chemistry. The initialized library was deprotected by treating the resin twice with 20% piperidine in DMF for 10 min. The library was synthesized using a modified “split and pool” strategy, which allows incorporation of “skip codons” (blank positions), resulting in sublibraries [106, 107]. After extensive washing with CH2Cl2 (5 mL × 5), DMF (5 mL × 5), and NMP (5 mL × 5), the library was split into 31 portions by suspending the resin in 31 mL DMF and transferring 1 mL aliquots to numerically labeled reaction vessels.
Library tag coupling. Six binary encoding tags were used to encode the first split of 31 monomers. Each tag solution was prepared by dissolving the tag in DMF (50 μL) and diluting with CH2Cl2 to a final concentration of 1.6 mM. Solutions of HOBT (100.5 mg, 0.74 mmol, 0.24 M) and 1,3-diisopropylcarbodiimide (DIPC, 117 μL, 0.74 mmol, 0.48 M) were made in DMF and CH2Cl2, respectively. To each of the 31 reaction vessels from the first split, HOBT solution [150 molar equivalents (equiv), 100 μL] was added, followed by a unique combination of tag solutions (0.02 equiv, 100 μL each). After addition of CH2Cl2 (500 μL), DIPC solution (150 equiv, 50 μL) was added, and the vessel was mixed carefully while being vented. The vessels were shaken overnight and then washed extensively with NMP (5 mL × 5) and DMF (5 mL × 5). To verify the integrity of the encoding tags, approximately 20% of the vessels were tested by removing a single bead, washing it 10 times in DMF and 10 times in decane, photolyzing it for 1–2 h, and analyzing the released tags by gas chromatography.
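One simple way to realize a “unique combination of tag solutions” is to treat each vessel number as a binary word, with each of the six tags standing for one bit. The sketch below assumes this plain binary assignment (the actual mapping is not given here); six tags allow 2⁶ − 1 = 63 non-empty combinations, more than the 31 needed for the first split. Decoding a bead then amounts to summing the bit values of the tags detected by gas chromatography. `encode`, `decode`, and `N_TAGS` are illustrative names.

```python
N_TAGS = 6  # number of binary encoding tags available for this split


def encode(vessel: int) -> frozenset[int]:
    """Return the set of tag indices (bits) added to a vessel numbered 1..63.

    Hypothetical mapping: tag k is added if bit k of the vessel number is set."""
    if not 1 <= vessel < 2 ** N_TAGS:
        raise ValueError("vessel number must be between 1 and 63")
    return frozenset(k for k in range(N_TAGS) if (vessel >> k) & 1)


def decode(tags_detected) -> int:
    """Recover the vessel number from the tags detected on a single bead."""
    return sum(1 << k for k in set(tags_detected))


codes = {v: encode(v) for v in range(1, 32)}   # 31 vessels in the first split
assert len(set(codes.values())) == 31          # every vessel gets a unique combination
print(sorted(codes[5]), decode(codes[5]))      # [0, 2] 5
```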
Library monomer coupling. The vessel encoded with the skip codon was set aside and stored in a desiccator. To each of the remaining vessels, a DMF solution of a specific monomer (500 μL, 16 μmol, 2 equiv) was added, followed by DMF solutions of HATU (100 μL, 6.8 mg, 18 μmol, 2.2 equiv) and DIEA (100 μL, 2.7 mg, 27 μmol, 3.3 equiv). After shaking for 3 h, the vessels were drained and washed with CH2Cl2 (5 mL × 3), DMF (5 mL × 3), and NMP (5 mL × 3). The coupling reagents (monomer, HATU, DIEA) were then added again to each vessel, and the vessels were shaken overnight to ensure complete coupling. The resin was washed again with CH2Cl2 (5 mL × 5), DMF (5 mL × 5), and NMP (5 mL × 5). With the exception of the resin in the skip codon vessel, the resin was pooled by rinsing it with DMF into a large polypropylene peptide synthesis tube. The pooled resin was capped twice with 20% acetic anhydride (Ac2O) in CH2Cl2 for 10 min each. After the resin was washed with CH2Cl2 (5 mL × 5), DMF (5 mL × 5), and NMP (5 mL × 5), it was deprotected by two treatments with 20% piperidine in DMF for 10 min each. The resin was washed extensively with CH2Cl2 (5 mL × 10), DMF (5 mL × 10), and NMP (5 mL × 10) before being recombined with the contents of the skip codon vessel. The library was split again, and the tag and monomer coupling procedures were repeated as described above. At the branch position, the Boc protecting group was removed by two treatments with 20% TFA in CH2Cl2 for 30 min. After the resin was washed with CH2Cl2 (5 mL × 5), DMF (5 mL × 5), and NMP (5 mL × 5), it was neutralized by treatment with 20% DIEA in CH2Cl2. The t-butyl ester on the aspartic acid side chain was removed by shaking the resin in 95% TFA in CH2Cl2 for 2 h.
After being washed with CH2Cl2 (5 mL × 5), DMF (5 mL × 5), NMP (5 mL × 5), and methanol (5 mL × 5), the resin was neutralized as above.
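The reagent quantities can be cross-checked against the resin loading. Assuming the 0.25 mmol of resin is divided evenly among the 31 vessels (about 8.06 μmol each), 6.8 mg of HATU (molar mass 380.23 g/mol) is about 17.9 μmol, or 2.2 equiv, and 100 μL of 1.6 mM tag solution is 0.16 μmol, or 0.02 equiv. The 150 equiv quoted for HOBT and DIPC (24 μmol each) appears to be relative to the amount of a single tag rather than to the resin; that reading is an inference from the numbers, not a statement from the protocol. The helper names below are illustrative.

```python
RESIN_MMOL = 0.25      # TentaGel S NH2 loading used for the library
N_VESSELS = 31         # portions in the first split
HATU_MW = 380.23       # g/mol

per_vessel_umol = RESIN_MMOL * 1000 / N_VESSELS   # assumes an even split, about 8.06 μmol


def umol_from_solution(volume_ul: float, molarity: float) -> float:
    """Volume in μL times concentration in mol/L gives μmol."""
    return volume_ul * molarity


def umol_from_mass(mass_mg: float, molar_mass: float) -> float:
    """mg divided by g/mol gives mmol; multiply by 1000 for μmol."""
    return mass_mg * 1000 / molar_mass


hatu_equiv = umol_from_mass(6.8, HATU_MW) / per_vessel_umol     # relative to resin
tag_umol = umol_from_solution(100, 1.6e-3)                      # one 100 μL tag aliquot
tag_equiv = tag_umol / per_vessel_umol                          # relative to resin
hobt_per_tag = umol_from_solution(100, 0.24) / tag_umol         # relative to one tag
dipc_per_tag = umol_from_solution(50, 0.48) / tag_umol          # relative to one tag
print(f"{per_vessel_umol:.2f} {hatu_equiv:.1f} {tag_equiv:.2f} {hobt_per_tag:.0f} {dipc_per_tag:.0f}")
# 8.06 2.2 0.02 150 150
```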
Screening with antibody based assay
Ligands with affinity for HLA-DR4 were selected using an on-bead enzyme-linked immunosorbent assay (ELISA). A portion of the completed, deprotected library (750 mg, estimated to contain three beads bearing each individual compound) was screened in parallel by the following protocol. The 750 mg sample was placed in a polypropylene peptide synthesis vessel and washed five times with water and with screening buffer (0.5 M NaCl, 75 mM sodium acetate, pH 4.5, and 1% glycerol). The resin was suspended in 10 mL screening buffer, to which HLA-DR4 solution (42 μL, 428.4 pmol, 42 nM) was added. The suspension was held upright in a rotary-mixing incubator at 37 °C for 36 h. Anti-HLA-DR antibody (L-243, 0.6 μg, approximately 4 pmol, 2 nM) and goat anti-mouse IgG conjugated to alkaline phosphatase (Gibco BRL, 0.3 μg, 1.5 pmol, 1 nM) were mixed in phosphate-buffered saline (pH 7.4) containing 0.1% Tween 20 and 1% bovine serum albumin (PBS-T + BSA, 1.5 mL) and left at room temperature for 1 h. The vessel containing the library was drained, and the resin was washed with PBS-T + BSA (5 mL × 4). The antibody solution was then added to the library resin, and the suspension was mixed on a rotator for 1 h at room temperature. The vessel was drained again, and the resin was washed with PBS-T + BSA (5 mL × 3). The resin was then treated with alkaline phosphatase substrates [5-bromo-4-chloro-3-indolyl phosphate, p-toluidine salt (BCIP) and nitro-blue tetrazolium (NBT)] in 100 mM Tris (pH 9.5), 100 mM NaCl, and 5 mM MgCl2. Alkaline phosphatase converts these substrates into an insoluble colored precipitate, staining beads that present compounds with affinity for HLA-DR4. After 5 to 20 min, the reaction was quenched by draining the vessel and resuspending the resin four times in 6 M guanidinium hydrochloride (pH 7.0). The beads were placed in a petri dish containing 0.01% Triton X-100 in water.
The darkest-stained beads (approximately 70) were removed with a 50 μL gastight syringe and washed with NMP (3 ×), methanol (3 ×), and water (10 ×). These beads were reassayed as described above but in the absence of HLA-DR4, and false positives (beads presenting ligands that bind the antibodies rather than HLA-DR4; four in total) were removed.
To make the selection more stringent, the procedure was then repeated with 7.5 nM HLA-DR4 in the presence of a competing peptide ligand. The sequences of the resulting hits were determined by reading the binary encoding tags. The frequency of each monomer at each position among the selected compounds was plotted as a histogram, and consensus ligands composed of the most frequently occurring monomers were selected.
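Consensus selection from the decoded hits reduces to counting monomers position by position. A minimal sketch follows; the monomer labels are placeholders, hits are assumed to be tuples of equal length (one entry per library position, with a skip codon represented as its own label), and ties are broken by first occurrence, which is how `Counter.most_common` orders equal counts. The function names are illustrative.

```python
from collections import Counter


def position_histograms(hits: list[tuple[str, ...]]) -> list[Counter]:
    """One Counter per library position, tallying monomers across all hits."""
    n_positions = len(hits[0])
    return [Counter(hit[i] for hit in hits) for i in range(n_positions)]


def consensus(hits: list[tuple[str, ...]]) -> tuple[str, ...]:
    """Most frequent monomer at each position (ties go to the first one seen)."""
    return tuple(h.most_common(1)[0][0] for h in position_histograms(hits))


# Placeholder monomer labels for three decoded hits.
hits = [("A1", "B2", "C3"), ("A1", "B2", "C1"), ("A2", "B2", "C1")]
print(consensus(hits))  # ('A1', 'B2', 'C1')
```

Taking the top two entries of `most_common` at positions where counts are close is one way to generate more than one consensus ligand.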
Individual ligand synthesis. The consensus ligands were synthesized on Rink amide methylbenzhydrylamine (MBHA) resin by manual solid-phase synthesis, under conditions identical to those described above for the library. All ligands were cleaved from the resin with 95% TFA in water for 2.5 h and purified by reverse-phase HPLC, using a gradient of 100:0 to 50:50 0.1% TFA/H2O:CH3CN over 30 min. Low-resolution mass spectra (LRMS) of the isolated compounds were obtained on a JEOL AX-505 or JEOL SX-102 mass spectrometer using fast atom bombardment (FAB). Positive-ion spectra were recorded with m-nitrobenzyl alcohol (NBA) containing added NaI as the matrix, whereas negative-ion spectra used a glycerol matrix. Ligand 6A: LRMS (FAB, NBA/NaI) calculated for C45H51F2N7O9 + H+ = 872, found 872. Ligand 6B: LRMS (FAB, negative ion, glycerol) calculated for C36H47F2N7O9 − H+ = 758, found 758.
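The LRMS values can be reproduced with nominal (integer) masses: C45H51F2N7O9 sums to 871, so [M + H]⁺ is 872, and C36H47F2N7O9 sums to 759, so [M − H]⁻ is 758. The sketch below performs this check; it covers only the elements present in these formulas and uses nominal rather than monoisotopic masses, which suffices at low resolution.

```python
import re

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "F": 19}


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a simple formula such as 'C45H51F2N7O9' into element counts."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def nominal_mass(formula: str) -> int:
    """Sum of nominal atomic masses for the neutral molecule."""
    return sum(NOMINAL_MASS[el] * n for el, n in parse_formula(formula).items())


print(nominal_mass("C45H51F2N7O9") + 1)  # [M + H]+ of 6A: 872
print(nominal_mass("C36H47F2N7O9") - 1)  # [M - H]- of 6B: 758
```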
Characterization of putative ligand
The following assays were used to verify binding by the selected ligands.
On-bead ELISA. Ligands identified during screening were resynthesized both as conjugates to TentaGel resin and in unconjugated form. The resin-conjugated ligands were subjected to the on-bead ELISA described above and were shown to bind HLA-DR4 but not the anti-mouse antibody. This experiment verifies that ligands identified during library screening bind HLA-DR4 when conjugated to a solid support. The on-bead assay was then repeated using HLA-DR4 pre-loaded with biotinylated HA peptide; peptide-loaded HLA-DR4 failed to bind the resin-conjugated ligand. This shows that peptide binding and ligand binding to HLA-DR4 are mutually exclusive, which suggests that the selected ligands occupy the peptide binding groove; however, other modes of binding to HLA-DR4 consistent with these data are also possible.
Further binding studies. Unconjugated 6A was shown to bind HLA-DR4. First, 6A stabilizes empty HLA-DR4 αβ heterodimers during native gel electrophoresis (data not shown). Second, 6A forms a complex with HLA-DR4 that can be isolated by gel filtration chromatography. Empty (ligand-free) HLA-DR4 is unstable under both electrophoresis and chromatography conditions. These experiments demonstrate that 6A binds HLA-DR4 to form stable complexes with properties similar to those of HLA-DR4/peptide complexes.
Results and discussion
MCSS maps and design of combinatorial library
Table: Summary of clusters of functional group minima for HLA-DR4 that correspond to proposed combinatorial library elements
Dissimilarity normalized by number of groups: 2.625 × 10⁵; 1.005 × 10⁶; 8.414 × 10⁴; 5.934 × 10⁶; 1.980 × 10⁵; 1.662 × 10⁵; 3.368 × 10⁵
Scaffold. The propane molecular probe (PRPN) map served as the basis for designing the scaffold onto which the combinatorial library was built, giving an approximate shape for the library. PRPN may in fact prove to be a good general-purpose group for defining library shape; that is, it may serve as a “groove-finding” probe that highlights the shape of the binding site into which a ligand could be built. This is plausible because PRPN is small yet has enough atoms to show an orientational preference, and because all PRPN atoms are charge neutral, so the probe is not excluded from any region by electrostatic repulsion.
The propane replicas depict a well-defined skeleton with tetrahedral geometry near a possible branching position for the library. The map traces out the peptide binding groove as well as a groove near a break in the α1 helix. The resulting chemical library is peptide-like because the chemistry for coupling monomers through peptide bonds is well established and the reagents were readily available. Thus, the overall library design uses the PRPN map to define the approximate shape of the library skeleton and other MCSS groups to define the chemical character of specific parts of the library. Combinatorial chemistry was then used to explore a region of chemical space that fits within the MCSS-derived shape.
The propane map, together with the locations of clusters with relatively high occupancy and knowledge of established binding pockets, defined the positions of the library elements described below.
Diversity Position 1. This position lies between the linker from the bead and peptide pocket one (P1); it is within 6.5 Å of atoms in residues Phe 51, Ala 52, Ser 53, Phe 54, His 260, and Val 264 of the protein (F47, A48, S49, F50, H253, and V257 in DR3). Of these residues, the backbone atoms of DR4 residues F51, S53, and F54 would contact a group placed at this position, and the side chain atoms of A52, H260, and V264 are nearby. In the HLA-DR4 active site map, cluster 12 (clusters 48 and 112 in the HLA-DR3 map) is closest to this position. Cluster 12 of the DR4 map contains hydrophobic, polar, and negatively charged probes. This preference is mirrored in cluster 48 of the DR3 map; cluster 112 has a smaller probe population and shows a stronger preference for polar probes.
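Residue lists such as the one above can be generated with a simple distance cutoff. The sketch assumes a hypothetical flat atom list of (residue name, residue number, xyz) tuples and counts a residue as nearby if any of its atoms lies within 6.5 Å of the reference point (a library position or a cluster centroid); whether the original analysis used an any-atom criterion is an assumption. The coordinates in the example are toy values, not taken from the HLA-DR4 structure.

```python
import math

# Hypothetical atom record: (residue name, residue number, (x, y, z) in Å).
Atom = tuple[str, int, tuple[float, float, float]]


def centroid(points: list[tuple[float, float, float]]) -> tuple[float, ...]:
    """Arithmetic mean of a set of 3-D points, e.g. the minima in one cluster."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def residues_near(atoms: list[Atom], center, cutoff: float = 6.5) -> list[tuple[str, int]]:
    """Residues with at least one atom within `cutoff` Å of `center`, sorted by number."""
    near = {(name, num) for name, num, xyz in atoms if math.dist(xyz, center) <= cutoff}
    return sorted(near, key=lambda residue: residue[1])


atoms = [("PHE", 51, (1.0, 0.0, 0.0)), ("PHE", 51, (9.0, 0.0, 0.0)),
         ("ALA", 52, (0.0, 6.0, 0.0)), ("HIS", 260, (0.0, 0.0, 7.0))]
print(residues_near(atoms, centroid([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])))
# [('PHE', 51), ('ALA', 52)]
```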
Specificity Position 1. This position corresponds to P1. Both the DR3 and DR4 allotypes of HLA have a preference for phenylalanine at this position [38, 58, 71]. The entrance to this pocket is occupied by cluster 10 in HLA-DR4 (82 in HLA-DR3) and the pocket itself by cluster 6 in DR4 (45 in HLA-DR3) maps. DR4 residues within 6.5 Å of cluster 6 are Phe 24, Ile 31, Phe 32, Trp 43, Ala 52, Ser 53, Phe 54, His 260, Asn 261, Tyr 262, Val 264, Gly 265, and Phe 268 (F20, I27, F28, W39, A48, S49, F50, N254, V257, V258, F261 in HLA-DR3). The side chains of all of these except residues 53, 260, 262, and 265 (HLA-DR4 numbering) potentially contact ligand atoms in this pocket. They create a predominantly hydrophobic pocket. The contents of cluster 6 in DR4 (45 in HLA-DR3) show an enriched population of hydrophobic and aromatic probes (2BTN, ILER, PHER) as well as the hydrophobic tails of the ARGR probe in the pocket. These results agree with the observed preference for hydrophobic side chains in peptides that bind HLA-DR4 (e.g., [37, 58]).
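Statements such as “an enriched population of hydrophobic and aromatic probes” can be quantified by comparing each probe type's share of a cluster with its share of the whole map. The ratio below is one hypothetical way to do this (values above 1 indicate enrichment); it is not necessarily the measure used in this work, and it assumes every probe in the cluster also appears in the map-wide list. The counts in the example are toy values.

```python
from collections import Counter


def enrichment(cluster_probes: list[str], map_probes: list[str]) -> dict[str, float]:
    """Fraction of each probe type in the cluster divided by its fraction in the whole map.

    Values above 1 indicate enrichment. Assumes every cluster probe also occurs in map_probes."""
    in_cluster, in_map = Counter(cluster_probes), Counter(map_probes)
    n_cluster, n_map = len(cluster_probes), len(map_probes)
    return {probe: (count / n_cluster) / (in_map[probe] / n_map)
            for probe, count in in_cluster.items()}


# Toy counts only; probe labels follow the names used in the text.
map_probes = ["PHER", "ILER", "2BTN", "ARGR", "ARGR", "ARGR", "ARGR", "ARGR"]
cluster_probes = ["PHER", "ILER", "2BTN", "ARGR"]
print(enrichment(cluster_probes, map_probes))  # PHER, ILER, 2BTN: 2.0; ARGR: 0.4
```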
Branch Position. This position originates near the Cα of Arg 369 of the CLIP peptide in the complex with HLA-DR3 and leads out toward Diversity Position 2, described below. The side chain of Phe 54 in HLA-DR4 (F50 in HLA-DR3) forms a hydrophobic patch connecting the peptide binding groove to Diversity Position 2. The Branch Position lies above and between P1 and P2. In the HLA-DR4 functional group maps, the centroid of cluster 9 lies at the branch point; in the HLA-DR3 maps, cluster 52 occupies the branch point and cluster 104 occupies the region leading out of the binding groove. In HLA-DR4, residues Gln 9, Phe 24, Phe 54, Thr 256, Tyr 257, His 260, and Asn 261 are within 6.5 Å of the centroid of cluster 9. (In HLA-DR3, residues Q5, F20, F50, E51, N249, Y250, H253, and N254 are within 6.5 Å of the centroid of cluster 52, and residues N249, Y250, H253, and N254 are within 6.5 Å of the centroid of cluster 104.) Cluster 9 for HLA-DR4 (cluster 52 for HLA-DR3) shows a fairly even population of molecular probes, with a preference for aliphatic moieties (ILER and PRPN probes).
Diversity Position 3. Diversity Position 3 roughly occupies P2 and partly overlaps the side chain of Met 370 of the CLIP peptide in the HLA-DR3 co-crystal structure. Cluster 49 in HLA-DR4 (60 in HLA-DR3) occupies the substituent pocket for this position; cluster 11 (181 in HLA-DR3) lies at the backbone position for this part of the library. In HLA-DR4, residues Phe 54, Glu 55, Gln 57, Gly 58, Ala 59, and Asn 62 are within 6.5 Å of the centroid of cluster 49 (in HLA-DR3, F50, E51, Q53, G54, and A55 are within 6.5 Å of the centroid of cluster 60). DR4 residues Gln 9, Phe 22, Phe 24, Phe 54, Gly 58, Ala 59, Asn 62, Tyr 257, and Asn 261 are within 6.5 Å of the centroid of cluster 11 (in DR3, Q5, E7, F18, F20, F50, N58, S185, R246, Y250, and N254 are within 6.5 Å of the centroid of cluster 181).
Overall. In total, the combinatorial library contained 243040 compounds. These compounds were synthesized on solid phase (see Methods), with three resin beads displaying each library member.
Identity of ligands
Proposed structure and binding mode
Chemical space is a vast resource for pharmaceutical discovery. Combinatorial synthesis methods enable an increased rate of sampling of this space. Nevertheless, large-scale random or uninformed exploration of chemical space remains an inefficient approach to identifying leads for pharmaceutical development. The coupling of computational structure-based ligand design methods, combinatorial synthesis, and large-scale screening presented in this paper illustrates a procedure for efficiently identifying regions of chemical space that yield compounds with affinity and specificity for a given target molecule.
By using the MCSS method to compute the functional group binding preferences of a target with known structure, it is possible to rapidly generate hypotheses about the types of compounds that are likely to bind to the target. As a de novo ligand design method, MCSS is valuable because it not only aids chemical intuition but also suggests novel ideas for formulating a scaffold and selecting substituents with which to populate the combinatorial library. In the case of the HLA-DR4 ligands described in this paper, the MCSS results led directly to the branched library architecture. In addition, the clustering and diversity methods described helped clarify which positions should contain diverse substituents and which should be conservatively populated.
One limitation of computational ligand design is the difficulty of determining ligand binding affinities, which makes it hard to discriminate among promising compounds. Because combinatorial synthesis and large-scale screening methods enable the evaluation of large numbers of compounds, they complement the computational methodology: the computational method narrows the search for promising ligands to a particular neighborhood of chemical space, and the experimental methods search that neighborhood more efficiently and accurately than available computational methods can.
The results suggest that the mixed computational-experimental approach presented here works in practice. In the first synthetic combinatorial library, several ligands were identified that compete favorably with known peptide ligands for binding to HLA-DR4. Although direct crystallographic verification of the designed ligand in complex with the protein was not possible, the peptide-binding exclusion results suggest that the designed ligand occupies the peptide binding groove of HLA-DR4.
Designing a strongly binding ligand is an iterative process in which a lead is identified and hypotheses for improving binding and specificity are generated and tested. In the hybrid combinatorial approach, the computational results were used to formulate a first-generation library focused on the HLA-DR4 binding site. The calculations replaced the experimental screen of an exploratory library, which would have been difficult given the promiscuous ligand binding characteristics of the target. Experimental methods were used to test the proposed library, and computational analyses rationalized the experimental data. In addition, the computational models provide a framework for using the experimental results to guide refinement of the library to optimize binding; that is, to explore more finely the chemical space centered on the identified lead molecules. Even without crystallographic data, model-building results provide a context for using structure-based design methods to suggest refinements to the library, such as rigidifying the framework to fix the favored library elements in their positions.