A computational study of somatostatin subtype-4 receptor agonist binding

The somatostatin subtype-4 receptor (sst4) is highly expressed in neocortical and hippocampal areas, which are affected by amyloid beta accumulation. Sst4 agonists enhance downstream activity of amyloid beta peptide catabolism through neprilysin and may slow the progression of Alzheimer’s disease (AD). Sst4 is a G protein coupled receptor (GPCR), the structure of which has yet to be resolved. A newly constructed sst4 homology model, along with a previously reported model-built sst4 receptor structure, was used in the present study to gain insights into the binding requirements of sst4 agonists, employing a set of compounds patented by Boehringer Ingelheim. Besides delineating the binding of these recently disclosed compounds at the macromolecular level, our objectives included the generation of a global quantitative structure-activity relationship (QSAR) model to explore the relationship between chemical structure and affinity. Through the implementation of model building, docking, and QSAR, plausible correlations between structural properties and binding affinity are established. This study sheds light on the binding requirements at the sst4 receptor.


Introduction
Somatostatin (somatotropin release-inhibiting factor, SRIF) is a small cyclic peptide hormone with two active forms, a tetradecapeptide (SRIF-14, see Fig. 1) and an N-terminally extended form (SRIF-28) [1]. A disulfide bond between cysteines at positions 3 and 14 of both SRIFs helps stabilize the structures. Somatostatin is produced by many tissues in the body, principally in the nervous and digestive systems. Specifically, SRIF-14 can be found in the brain, pancreatic islets, stomach, retina, enteric neurons, and peripheral nerves, while SRIF-28 is found predominantly in intestinal mucosal cells. Somatostatin primarily inhibits endocrine and exocrine secretions; affects locomotor, cognitive, and autonomic functions in the brain, where it acts as a neurotransmitter and neuromodulator; has direct effects on the thyroid; and regulates cellular differentiation and proliferation [2][3][4].
Somatostatin exerts its effects by binding to high-affinity somatostatin receptors. The somatostatin receptor family consists of five subtypes (sst1-5). These receptors were cloned in the early 1990s and are widely distributed in the body, including the central nervous system and several peripheral tissues, namely stomach, intestine, and pancreas [5,6]. Sst receptors mediate the inhibitory effects of somatostatin on secretion and proliferation. Of the five subtypes, sst2 and sst4 have the highest brain expression (cortex and hippocampus) [7][8][9]. Somatostatin has been shown to regulate neuronal neprilysin activity, which catabolizes beta amyloid (Aβ) [10][11][12][13]. Considering that decreased clearance of Aβ is thought to predispose to late-onset AD, somatostatin could have therapeutic utility. However, due to somatostatin's poor bioavailability and rapid degradation, efforts have steered clear of using somatostatin or peptide-based therapeutics. While an agonist at sst2 could have therapeutic utility, sst2 is also expressed in the pituitary and is involved in inhibiting hormone secretion, so its activation would result in adverse side effects. Administration of NNC 26-9100 (see Fig. 1), a non-peptide sst4 receptor agonist with over 100-fold selectivity versus other sst receptors [14], increased brain neprilysin activity, decreased levels of soluble Aβ42 oligomers, and improved learning and memory behavior in SAMP8 mice [13,15]. This suggests that agonists at sst4 may offer a promising therapeutic mechanism for the treatment of AD and/or enhancing learning and/or memory.
The sst receptors are rhodopsin-like GPCRs of the Gi subtype. GPCRs facilitate signal transduction between cells, hormones, and neurotransmitters [16], with approximately 800 of them found in the human genome [17,18]. Sst1 and sst4 belong to one family, while sst2, sst3, and sst5 constitute another. It is believed that high sequence similarity implies structural similarity, which in turn parallels pharmacological similarity. The amino acid sequences of human sst1 and sst4 receptors are 58% identical and 78% similar, while the identity of human sst4 with sst2 and sst3 is 43% and 41% (66% and 67% similarity), respectively [3,5,19]. GPCRs exist in multiple conformational states ranging from the inactive ground state to fully activated states [20,21]. Ligand-free (apo) receptors are not in a fully inactive conformation; however, the ratio of inactive to active states varies depending on the receptor. In particular, those GPCRs that do not bind diffusible ligands most likely will not have an equilibrium between active and inactive conformational states in the absence of a bound ligand [22]. However, upon agonist binding, conformational changes are initiated that lead to stabilization of the activated state of the receptor, in support of the induced fit paradigm. In contrast, agonists with low affinity and fast dissociation rates will favor conformational selection [23]. Antagonists, on the other hand, stabilize the inactive state of the receptor.
Despite recent advances in crystallography, there are still several GPCRs that have yet to be resolved. Thus, homology modeling can be employed to generate model-built structures with varying degrees of accuracy. The foundation of homology modeling lies in the choice of template(s) and the alignment between the template protein and the amino acid sequence of the query, the structure of which needs to be generated. In our earlier work [24], we generated an active state homology model of sst4 using the crystal structure of the nanobody-stabilized β2 adrenergic receptor (βAR) [25], and were able to validate it with standard techniques, along with a structure-based design strategy. The model corresponds to the activated state due to our interest in sst4 agonists. Structure-activity relationship (SAR) studies using SRIF indicate that Trp8 and Lys9 are key residues of the peptide and essential for binding [3]. In addition, alanine scanning studies by Lewis et al. demonstrated that whereas Trp8 and Lys9 are necessary for binding to all sst subtypes, Phe6 is specifically important for sst4 activation [26,27]. Furthermore, it has been reported that an aspartic acid in transmembrane helix 3 is essential for binding [28,29], which would be the counterpart of SRIF's Lys9. We performed docking experiments [24] of a series of sst4 agonists that had been reported in the literature, along with a virtual screening experiment. Some of these derivatives are structurally similar, yet quite diverse in their affinities. However, the receptor can also accommodate substantially diverse ligands. These earlier docking experiments led us to suggest two binding modes that could explain the observed high-affinity data. In one, involving mostly the ureas and thioureas, the urea nitrogen hydrogens interact with Asp90, while the urea carbonyl oxygen hydrogen bonds with His258.
Imidazole formed a hydrogen bond with Gln243 and was also predicted to participate in π-π stacking interactions with phenylalanines 239 and 175. The aromatic substituents were buried in the hydrophobic pocket consisting of valines 67 and 259, Ala71, Leu263, and Ile262. The second proposed binding mode included mainly the sulfonamide sst4 compounds and had the basic amine functionality interact with Asp90. The sulfonamide and amide oxygens were observed to hydrogen bond with His258 [30,31]. In Fig. 2 the general prototypes of these compounds are depicted. Additionally, a new crystal structure of the active human mu-opioid receptor (µOR) bound to the agonist (2S,3S,3aR,5aR,6R,11bR,11cS)-3a-methoxy-3,14-dimethyl-2-phenyl-2,3,3a,6,7,11c-hexahydro-1H-6,11b-(epiminoethano)-3,5a-methano-naphtho[2,1-g]indol-10-ol (BU72) was published (PDB ID: 5C1M) [32]. Because the homology between sst4 and the µOR is higher than between sst4 and the βAR, a new homology model of sst4 based on this crystal structure was constructed in the present study. An integrated computational approach was undertaken, encompassing docking experiments of the BI compounds into both model-built sst4 structures (µOR- and βAR-based) for comparative results. We are able to reaffirm the importance of certain amino acids in forming stabilizing complexes with the agonists. Additionally, a consensus QSAR model was developed using the BI compounds and all reported sst4 agonists to date with molecular weight less than 700. The goal of QSAR was to generate a robust, predictive, and global sst4 model in order to be able to rank order future synthetic targets.

Methods
From this point onward, the investigated BI compounds will be named MCL, PCT, and EP followed by respective numbers (see Fig. 2). Specifically, the dataset includes 18 MCL, four PCT, and nine EP structures.

Model building
An NCBI protein BLAST-p search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) was run to identify the most appropriate template. The sequence similarity, E value, and total score were considered for selection (Table S1). Even though two structures ranked higher than the human mu-opioid receptor (PDB ID: 5C1M), the latter is an active state receptor bound to an agonist, contrary to all others in the list which are bound to antagonists and are thus at a basal level of activation. It should be noted that the sequence similarity between sst4 and 5C1M is 55.3%. Clustal Omega [33] within UniProt [34] was employed to align the sequences of the model and the template 5C1M. The sequence alignment was used as input to Modeler 9.14. One hundred models were generated per run with optimization level set to high. All models were constructed with a high-level loop refinement (three models per structure) at a high level of optimization. The N- and C-terminal residues that were not aligned with the template were cut, while the discrete optimized protein energy (DOPE) potentials [35] were checked for calculation. Models were evaluated using DOPE, probability density functions (PDF) [35], and QMEAN [36,37] scoring by Swiss-Model [38]. DOPE measures the relative stability of one conformation versus other conformations of the same protein. It derives an atomic distance-dependent statistical potential from a sample of native structures, and is based on a state that corresponds to non-interacting atoms in a sphere, the radius of which depends on a sample native structure. The DOPE energy is not normalized and therefore the absolute score is not meaningful, whereas the relative energies are informative. PDF Total Energy [35] is the sum of the scoring function value of all homology-derived pseudo-energy terms and stereochemical pseudo-energy terms. During model construction, a large number of restraints are generated in order to force a model closer to the structure of the template(s) and to maintain an acceptable geometry. In the end, the value of the objective function (PDF energy) associated with each residue gives an indication of the quality of the model in and around each residue. QMEAN is a composite scoring function that derives absolute quality estimates for the entire structure (global) and individual residues (local) on the basis of one single model. There are two global score values, with one being a linear combination of four statistical potential terms, and the other also evaluating the consistency of structural features with sequence-based predictions. The local scores are linear combinations of the global and local terms on a per residue basis. The QMEAN Z-score provides an estimate of the "degree of nativeness" observed in the model on a global scale.
Fig. 2 From left to right: MCL (3-aza-bicyclo[3.1.0]hexane-6-carboxylic acid amide), PCT (morpholine and 1,4-oxazepane amides), and EP (aryl and heteroaryl-fused tetrahydro-1,4-oxazepine amides). A can be a C1-6-alkyl or a hydrogen. R1 and R2 can be C1-6-alkyl, C3-6-alkyl, or hydrogen; R1 or R2 must be C1-6-alkyl or C3-6-alkyl, while the second R group can be any of the three proposed substituents. Additionally, C1-6-alkyl or C3-6-alkyl may be replaced by MeO- or halogen atoms, or by an alkylene bridge with up to 2 heteroatoms such as N, O, or S. W is either a mono- or a bicyclic moiety consisting of aryl, heteroaryl, heterocyclyl, and cycloalkyl ring structures. Finally, Y can be one of the following: -CH2-, -CH2CH2-, or CH2O
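To make the identity figure used in template selection more concrete, the following minimal sketch counts identical columns in a gapped pairwise alignment. The function name `percent_identity`, the choice of denominator (columns where neither sequence has a gap), and the toy sequences are assumptions made for illustration only; BLAST and Clustal Omega report their own statistics, and sequence similarity additionally requires a substitution matrix, which is not modeled here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned (non-gap) columns to compare")
    return 100.0 * identical / compared


# Toy gapped sequences (hypothetical; not taken from sst4 or 5C1M).
query = "MKT-LLVAG"
template = "MRTALLV-G"
print(round(percent_identity(query, template), 1))
```

Other denominators (for example, the full alignment length or the shorter sequence length) are also in common use and give different percentages, so the convention should always be stated alongside the number.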

Model refinement
Protonation states were assigned using the H++ system (http://biophysics.cs.vt.edu) [39][40][41], which computes pK values of ionizable groups and adds missing hydrogens based on the pH of the environment, which was set at 7.4. To identify unfavorable interactions among amino acids, we calculated the energy using Prime 4.1 within Maestro 10.3 (Schrödinger, LLC, New York, NY). With the exception of Pro169, all energies were negative. Finally, the Add Membrane [42][43][44] protocol within Discovery Studio 4.5 (Biovia, San Diego, CA) was employed in order to explore the effect of the lipid bilayer, since sst4 is an integral membrane protein. The protocol uses a stepwise search algorithm to position the protein structure relative to an implicit membrane in an optimal orientation. Two parallel planes define the membrane boundary, while the nonpolar part of the lipid bilayer is represented as a low-dielectric region between the two planes.
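The relationship between a computed pK value and the protonation state assigned at pH 7.4 can be illustrated with the Henderson-Hasselbalch equation. The sketch below is not the H++ algorithm; it only shows how a pKa translates into a protonated fraction at a given pH. The function name and the generic side-chain pKa values are illustrative assumptions, since actual values depend on the local protein environment.

```python
def fraction_protonated(pka: float, ph: float = 7.4) -> float:
    """Fraction of an ionizable group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Generic, approximate side-chain pKa values (assumed for illustration only).
examples = [("Asp", 3.9), ("His", 6.0), ("Lys", 10.5)]
for residue, pka in examples:
    print(f"{residue}: {fraction_protonated(pka):.4f} protonated at pH 7.4")
```

Under these assumed values an aspartate is essentially fully deprotonated (negatively charged) at pH 7.4, which is consistent with its role as the counterpart of a basic amine in the ligands.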

Model assessment and final selection
Assessment of the constructed models was performed using the Verify 3D (Profiles-3D) [45] protocol within Discovery Studio 4.5. Verify 3D relies on the principle that a protein's structure must be compatible with its own sequence. It first reduces the 3D structure to a 1D string of residue environments. These environments are then categorized based on the area of the side-chain that is buried, and the fraction of the side-chain area that is exposed to polar environment. The second step involves calculation of the profile score from the string of residue environments and a precalculated scoring matrix. The scoring matrix is calculated from the probabilities of finding each of the twenty amino acids in environment classes observed in a database of known structures and related sequences. In the case of integral membrane proteins, the effect of a lipid membrane is included in the calculation of Verify 3D scores. PROCHECK [46] in the automated PDBsum [47] workspace was also employed for evaluation of the models. Specifically, PROCHECK assesses the stereochemical quality of the model-built structures by analyzing overall and residue-by-residue geometries, and by checking the ϕ and ψ dihedral angles.
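For readers unfamiliar with how the ϕ and ψ angles examined in a Ramachandran analysis are obtained, the sketch below computes a signed dihedral angle from four atomic positions using only the standard library. It is a generic geometric calculation, not PROCHECK's implementation; the function names and test coordinates are assumptions for illustration. For residue i, ϕ is the dihedral C(i-1)-N(i)-CA(i)-C(i) and ψ is N(i)-CA(i)-C(i)-N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, in [-180, 180]."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: a cis (0 degrees) and a perpendicular arrangement.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `dihedral_deg` to the backbone atoms listed above for every residue yields the (ϕ, ψ) pairs that are plotted and compared against allowed regions.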

Protein preparation
Protein Preparation Wizard, a function within Maestro 10.3, was employed to prepare the protein for subsequent docking experiments. Bond orders were assigned, and disulfide bonds were created. Hydrogens were assigned at physiological pH, followed by optimization and restrained minimization for heavy atom convergence to RMSD 0.3 Å. The same protocol was also used for the previously constructed active state model of sst 4 [24], since both models are used herein.

Ligand preparation
Compounds were generated in Discovery Studio 4.5, and imported into Maestro 10.3. LigPrep (Schrödinger Release 2015-3: LigPrep, Schrödinger, LLC, New York, NY, 2020) was used to prepare each library. Tautomeric states were set at target pH 7.0 ± 2 using Epik [48,49]. Specified chiralities were retained, with at most 32 stereoisomers generated per ligand and one low-energy conformation selected per ligand. Ligands were then selected using Epik's State Penalty. The conformation with the lowest state penalty for each ligand was included in the library for subsequent docking.

Ligand docking
Docking was performed into both model-built sst4 structures (µOR- and βAR-based), using Glide 6.8 [50,51]. A receptor grid was generated with an inner box (indicating where the ligand center can be) of 14 Å, and an outer box of 30 Å (ligand length). Residues Asp90 and His258 defined the binding pocket of the βAR-based model, whereas Asp90 and Gln243 were used for the µOR-based model, because of their different topologies. MCL and PCT compounds were docked using canonicalization, and only trans conformations were allowed. A scaling factor of 0.8 was set with a partial charge cutoff at 0.15. Six thousand poses per ligand were kept after the initial docking phase, while the best 500 poses for each ligand were kept for energy minimization, with default settings otherwise and 15 resultant poses. Regarding the EPs, the settings were the same as with the PCT and MCL compounds; however, thirty poses per ligand were generated. EP-33 in particular was re-docked with a scaling factor of 0.7, and 40 resultant poses per ligand. Finally, EP-19 and EP-20 did not result in viable poses, thus different starting conformations were employed. All poses were subsequently refined with MM-GBSA [52], where the ligands were minimized and the receptor remained rigid. Final pose selection was based on visual inspection and the Emodel scoring function [50], along with DG_Bind (binding free energy) [52]. Emodel includes contributions from GlideScore, internal energy, and the Coulomb-van der Waals energy; DG_Bind is calculated as E_complex (minimized) − (E_ligand (minimized) + E_receptor (minimized)).
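The binding energy estimate used for pose refinement is an energy difference between the minimized complex and its separated components. The sketch below shows that arithmetic and a simple ranking of poses by the result. The function name `dg_bind`, the dictionary layout, and all energy values are hypothetical and serve only to illustrate the calculation; they are not output from Glide or Prime.

```python
def dg_bind(e_complex: float, e_ligand: float, e_receptor: float) -> float:
    """Binding energy estimate: E_complex - (E_ligand + E_receptor), all minimized."""
    return e_complex - (e_ligand + e_receptor)


# Hypothetical minimized energies (kcal/mol) for two poses of one ligand.
poses = {
    "pose_1": {"e_complex": -10250.0, "e_ligand": -35.0, "e_receptor": -10160.0},
    "pose_2": {"e_complex": -10240.0, "e_ligand": -36.0, "e_receptor": -10160.0},
}

# More negative values indicate more favorable predicted binding.
ranked = sorted(poses, key=lambda name: dg_bind(**poses[name]))
for name in ranked:
    print(name, dg_bind(**poses[name]))
```

Because the complex and component energies are large numbers of similar magnitude, only the difference is interpretable, and comparisons are most meaningful between poses of the same ligand or closely related ligands.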

QSAR modeling
A statistical analysis of the BI compounds, along with all reported sst4 agonists to date with molecular weight less than 700 and reliable reported affinity data in human-based assays (see Table S2 for all structures in the dataset), totaling 147 compounds, was undertaken using Canvas [53]. At first, radial fingerprints [54] were generated for 100 compounds, with atom/bond typing set to 'daylight invariant', 32-bit precision, no scaling, and 'on/off frequency' to discard bits common to more than 95% of molecules. Direct kernel-based PLS (DKPLS) regression [55] was subsequently selected to generate the model, with pKi being the dependent variable. In the Canvas adaptation of the method, before a QSAR model is constructed, a pool of independent x variables is assembled from the union of fingerprint bits that are "on" in at least one compound in the training set. Thus, each x variable represents a chemical fragment found at least once. Kernel non-linearity was set at 0.05, the maximum number of KPLS factors was set to three, and uncertainty on test set predictions was estimated with the default value of 10 bootstrapping cycles. Random assignment was 70:30% for training:test sets, while the original seed was 12345. Several models were built by modifying some of the above parameters in order to generate the best model(s). To further validate the models, an additional 47 compounds were screened using a maximum of three KPLS factors with kernel non-linearity set to the default 0.05, and 10 bootstrapping cycles for uncertainties. In the end, three global, predictive models with two and three KPLS factors were chosen based on the following criteria: test set Q2 greater than 0.5, Q2 for the validation set > 0.5, training set R2 in the range 0.8-0.9, and test set root mean square error (RMSE) < 1.5 × the standard deviation of regression (SD). It should be noted that two of these models were constructed by eliminating two compounds, one at a time. These three global models combined constitute the final 'consensus' QSAR model, where the final predicted Ki for a compound is calculated by taking the average predicted Ki of all three models.
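The quantities used to select and combine the models can be sketched in a few lines. The code below is a generic illustration and not the Canvas implementation: it assumes Ki values expressed in nM, defines Q2 as one minus the ratio of the residual to the total sum of squares about the mean of the observed values supplied (other definitions use the training set mean), and uses hypothetical numbers throughout.

```python
import math
from statistics import mean


def pki_from_ki_nm(ki_nm: float) -> float:
    """Convert Ki in nM to pKi = -log10(Ki in M)."""
    return 9.0 - math.log10(ki_nm)


def consensus_ki(predicted_kis):
    """Consensus prediction: arithmetic mean of the Ki predicted by each model."""
    return mean(predicted_kis)


def q2(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of `observed`."""
    y_bar = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - y_bar) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


# Hypothetical Ki predictions (nM) from three models for one compound.
ki = consensus_ki([10.0, 20.0, 30.0])
print(ki, round(pki_from_ki_nm(ki), 2))

# Hypothetical observed versus predicted pKi values for a small test set.
obs, pred = [6.0, 7.0, 8.0, 9.0], [6.2, 6.9, 8.3, 8.8]
print(round(q2(obs, pred), 3), round(rmse(obs, pred), 3))
```

Note that averaging Ki and then converting to pKi is not the same as averaging pKi directly; the former, as described in the text, weights weaker predicted affinities more heavily.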

Template selection
Our first aim was to investigate the binding of a new set of compounds in conjunction with compounds used previously to shed light on requirements for binding to sst 4 at the macromolecular level. However, since our last study, several new active state GPCR crystal structures have been released. Thus, we first determined whether these new templates had higher homology to sst 4 than the βAR (PDB ID: 3P0G) using NCBI's BLAST-p search (see Table S1). Given that we intended to dock agonists into the structure, it is imperative that the templates are in the active state. Of all the highly homologous structures on the table, only the µOR receptor was in the active state (PDB ID: 5C1M). Ideally, we would want a template with naturally occurring peptide agonists and crystallized in the active state, because as mentioned in the introduction the natural agonist of sst 4 is a peptide. Delta-opioid receptor (δOR) fits these criteria; however, the crystal structures of δOR (PDB IDs: 4RWA and 4N6H) are bound to antagonists (Table S1). A recently published homology model of sst 4 was based on 4N6H [56]. It should also be noted that several structures appear to have higher homology to sst 4 than the template used in our earlier work (PDB ID: 3P0G), which further reinforced the need to construct a new model using the active µOR receptor structure as a template.

Model generation and refinement
The sequence of sst 4 was aligned to the sequence of the crystal structure of µOR (PDB ID: 5C1M) using Clustal Omega [33] within uniprot.org (see Fig. 3). It can be seen that the highly conserved residues and motifs of GPCRs are aligned. Models were generated using this alignment and loop refinement at high level of optimization. To assess the models, we employed DOPE [35], PDF [35], and QMEAN [36,37] scores in an effort to reduce the number of models for further evaluation. The lower the DOPE score, the better the model out of a set of predicted model structures.
Similarly, a lower PDF total energy indicates a model that is better optimized against the homology restraints. Finally, QMEAN Z-scores around zero indicate good agreement between the model structure and experimental structures of similar size, whereas scores below −4.0 are representative of low-quality models. Because, in our experience, no single score is better than the others for evaluating the quality of a model, we proceeded with the top 13 models for subsequent evaluation.
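When several scores are considered jointly, one simple way to shortlist candidates is to rank the models by each score and sum the ranks. The sketch below illustrates this idea; it is an assumed procedure shown for clarity and not necessarily how the 13 models above were chosen. The model names and score values are hypothetical. DOPE and PDF are ranked with lower values first, and QMEAN Z-scores are ranked by closeness to zero.

```python
def shortlist(models, top_n):
    """Return the names of the top_n models by summed per-score rank (lower is better)."""
    score_keys = [
        lambda m: m["dope"],          # lower DOPE is better
        lambda m: m["pdf"],           # lower PDF total energy is better
        lambda m: abs(m["qmean_z"]),  # Z-score closer to zero is better
    ]
    totals = {m["name"]: 0 for m in models}
    for key in score_keys:
        for rank, m in enumerate(sorted(models, key=key), start=1):
            totals[m["name"]] += rank
    return sorted(totals, key=totals.get)[:top_n]


# Hypothetical scores for three candidate models.
candidates = [
    {"name": "m1", "dope": -38000.0, "pdf": 2100.0, "qmean_z": -3.1},
    {"name": "m2", "dope": -38500.0, "pdf": 2300.0, "qmean_z": -2.8},
    {"name": "m3", "dope": -37000.0, "pdf": 2500.0, "qmean_z": -4.5},
]
print(shortlist(candidates, top_n=2))
```

Rank aggregation avoids having to place scores with different units and scales on a common axis, at the cost of ignoring how large the differences between models are.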
To select the final model we included the "Add Membrane" [42][43][44] protocol in which the low-dielectric environment of the bilayer is taken into consideration. The top-ranked models were assessed with Verify-Protein (Profiles-3D) [45] and PROCHECK [46]. For integral membrane proteins the membrane is modeled as a planar slab, and a zero solvent accessibility is assigned to the atoms inside the membrane. The Profiles-3D method measures each residue's compatibility. Regions with low scores (a low score is given to a hydrophobic residue on a protein's surface and a polar residue in the protein's core) are indicative of a backbone being incorrectly threaded or suggest the need for additional structural refinement. Calculations also provide expected scores based on statistical analyses of high-resolution structures from the PDB. In addition, PROCHECK [46] was used to check the stereochemical quality of the top ranked model-built structures. Scores for the selected model are shown in Table 1. It can be seen that the model has a Verify score higher than the expected values, while the Ramachandran plots, generated within PROCHECK, indicate that no amino acids are in the disallowed regions; therefore, no regions with unusual geometry are present.

Comparison of µOR-and βAR-based constructed sst 4 models
In comparing the µOR against the βAR models (see Fig. 4), we noted differences that would potentially impact our docking results. Specifically, while the rotamers for Asp90 and Gln243 seem to be superimposable onto the previous model, both His258 and Trp171 point to opposite directions. Docking experiments were carried out with compounds (see Table 2) for which the Ki values were reported. Poses were selected based on

Comparative docking of MCL compounds into the two model-built sst 4 structures
Our findings point to a consistent binding mode of all MCLs into the βAR-based homology model, with MCL-80 selected as a prototype and depicted in Fig. 5. We observe three hydrogen bonds in the βAR-based sst4-MCL complexes: the amide nitrogen interacts with Asp90, the amide carbonyl oxygen forms a hydrogen bond with His258, and the nitrogen of the azabicyclo-hexenyl hydrogen bonds with Gln243. Moreover, Trp171 is interacting with the positively charged nitrogen of the azabicyclo-hexenyl through a cation-π interaction. This is in agreement with observations made from reported protein crystal structures where, among all aromatic residues, Trp is featured more prominently in forming this type of interaction [57]. The azabicyclo-hexenyl is buried deeply into the binding pocket and fits well within the hydrophobic  . Furthermore, MCL-11 is missing the interaction between His258 and the amide carbonyl oxygen; the latter forms a hydrogen bond with the backbone of Trp171 instead. In addition, a halogen bond is predicted between the 5-chlorobenzimidazole and Gln243. With MCL-22 the rest of the interactions are consistent in that the carbonyl oxygen hydrogen bonds with His258, while the azabicyclo-hexenyl nitrogen interacts with Gln243 and also forms a cation-π interaction with Trp171. The hydrophobic and aromatic-aromatic interactions are the same as with MCL-80. Another slight variation was observed with MCL-78 where Asp90 is hydrogen-bonding with azabicyclo-hexenyl, whereas the amide carbonyl oxygen interacts with His258 and Trp171, but no interactions are observed with Gln243. When the same compounds were docked into the µOR-based homology model, several of the above key interactions were observed repeatedly. These are contacts with Asp90, Gln243, Phe175, Val67, and Phe239. Specifically, the amide nitrogen and/or carbonyl oxygen hydrogen bonds with Gln243, while the azabicyclo-hexenyl nitrogen forms a hydrogen bond with Asp90.
Some variations are observed where both the amide nitrogen and the azabicyclo-hexenyl nitrogen form hydrogen bonds with Asp90, whereas the carbonyl oxygen or the heterocycle (Y in Fig. 2) interacts with Gln243. Figure 6 compares complexes of MCL-80, MCL-22, MCL-11, and MCL-78 with the βAR- and µOR-based sst4 homology structures side-by-side. Because Trp171 points to the extracellular domain and His258 has shifted in the µOR-based model, these two residues no longer make contacts with the ligands. In summary, the docking results for the earlier generated sst4 model and the one reported herein are in line with previous reports regarding residues critical for binding [26][27][28][29], along with the structure-based strategy developed in our laboratory [24].

Comparative docking of PCT compounds into the two model-built sst 4 structures
With respect to the PCT derivatives bound to the βAR-based homology model, hydrogen bonding interactions involve the 1,4-oxazepane functionality, the oxygen of which is interacting with His258, while its amino group hydrogen bonds with Asp90 (data not shown). The isoquinoline is forming hydrophobic interactions with Val259, Ala71, Leu263, and Tyr18, whereas the oxazepane group is in the vicinity of Trp171, Leu87, and Phe239. Because it was not reported which stereoisomer is active for PCT-44 and PCT-47, we set out to delineate differences in their binding modes. In comparing the R and S configurations of PCT-47, there are distinct differences in the Emodel scores. The S isomer has a lower score than the R (-37.6 versus -29.1, respectively), suggesting that (S)-PCT-47 is more favorable than the R isomer. Both isomers of PCT-44 are predicted to form hydrogen bonds between the 1,4-oxazepane and His258 and Asp90, and hydrophobic interactions with Tyr265, Leu87, Phe239 (oxazepane) and Val259, Tyr18 and Leu263 (isoquinoline). However, there is a difference in the Emodel score, with the S configuration being more favorable (-37.6 for S versus -36.5 for R). Additionally, both amine hydrogens in the protonated oxazepane of 44R can potentially hydrogen bond with Asp90, contrary to only one hydrogen bond observed for 44S. Similarly, the S configuration of PCT-44, when docked into the µOR-based homology model, has a more favorable DG_Bind score than the R configuration. Further, both are hydrogen bonding with Asp90 through a bidentate bond between the amide hydrogen and the isoquinoline hydrogen. However, in the S isomer, a hydrogen bond is formed between the amide carbonyl oxygen and Gln243 as well.

Comparative docking of EP compounds into the two model-built sst4 structures
When docking the EP compounds (see Table 2) into the βAR-based homology model, the binding mode is consistent across all high-affinity structures. Specifically, the amide nitrogen forms a hydrogen bond with Asp90 (EP-33), the carbonyl oxygen hydrogen bonds with His258. Finally, two sets of hydrophobic residues stabilize the sst4-EP complex, namely Phe239, Val176 and Tyr240, and Trp171; the second set includes Val67 and Trp76. With regard to the µOR-based homology-EP complexes, the amide nitrogen is consistently hydrogen-bonding with Asp90, while the nitrogen-containing heterocycle (Y in Fig. 2) is interacting with Gln243 in the high-affinity compounds. The low-affinity compounds lack the interaction with Gln243.

A global QSAR model and graphical projections onto chemical functionalities
With the number of sst 4 agonists expanding, we are now able to explore plausible correlations between structural properties and Ki for sst 4 using QSAR, thus furthering our ability to expedite drug design efforts by constructing predictive models to rank order future synthetic targets. Towards this goal, the compounds in the present study (see Table 2), along with those reported in the literature to date (Table S2) having less than 700 molecular weight, were used to construct a global, consensus QSAR model. Had we only incorporated the BI compounds, the outcome would not be as applicable to a different chemical space because these structures are not as diverse. The objective is to have a sufficient representation of the chemical space and a minimal number of compounds lacking close analogs. Several models combining DKPLS with the Canvas 2D radial [53] fingerprints were generated (see Table 3 for the models and respective statistics). It should be noted that the Q 2 for all models is higher than 0.7, while typically a Q 2 > 0.5 for the validation set is indicative of a very good model. Our final global sst 4 QSAR model is the consensus of all six models (models 1-3 with two and three KPLS factors) of Table 3. Results for the average Ki and pKi predictions for each compound in the dataset are presented in Table S3, whereas the scatterplot is shown in Fig. 7. It can be seen that the predicted pKi values are very close to the observed. An added advantage in pursuing this work is that it allowed us to project the statistical model onto the chemical structures in order to delineate favorable and unfavorable functionalities. Toward that end, we employed model 1 with three KPLS factors (see Table 3) for projections. Red represents atoms that increase predicted activity, blue are atoms that decrease it, while the color intensity is reflective of the strength of the effect. This is better depicted in Fig. 8 where a visual representation for a set of six compounds is shown. 
It should be noted that minor modifications in each scaffold can lead to substantial differences in affinity, which is reflected in the color changes. Specifically, within the MCL series, substitution of isoquinoline by quinazoline (MCL-22 versus MCL-138) or addition of a methyl substituent (MCL-148 versus MCL-138) is depicted to decrease activity, given the intensity and extent of blue on the nitrogen-containing heterocycle. While the N-isopropyl acetamide functionality seems to be a steady contributor to the overall affinity, the 3-azabicyclo[3.1.0]hexenyl does not impact the affinity as strongly in the moderate and weak MCL compounds (see Fig. 8, top panel, middle and right structures). Similarly, the imidazole is important in CHEMBL54832; however, when it is substituted by a benzene the atomic contributions are neutral (see Fig. 8, middle structure, bottom panel). An additional replacement of the pyridine by another benzene results in further weakening of the activity (see Fig. 8, CHEMBL291474 versus CHEMBL282129). The reason for the observed differences is that features are not independent, since each atom is associated with a host of other fingerprint fragments that collectively signify the statistical relationship between affinity and the presence of that particular feature. Thus, the total atomic contributions of the carbonyl oxygen for the BI compounds allowed us to identify environments which enhance the effect of that atom, and are thus more relevant to SAR. Figure 9 shows three EP compounds, with high, moderate, and low affinity. The difference between EP-33 and EP-6 is the heterocyclic substituent.
Even though it would be reasonable to deduce that the replacement of indazole by quinoline is responsible for the reduced activity, when we examine the fragments associated with the carbonyl oxygen, we note that both structures have a fragment extending towards the fused aryl-oxazepine group; however, the contribution of this fragment is much lower in the moderately active EP-6 versus the active EP-33 (see Fig. 9: 0.214 versus 0.944, respectively). In contrast, both fragments are negative for the weak EP-42, which differs from EP-33 by an additional methyl group on the indazole and a heteroaryl-oxazepine group (see Fig. 9). Consequently, atom types are sensitive to the number of heavy atoms connected to each atom so that the ring atoms do not have the same type in these three structures.
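The dependence of an atom's color on the fragments that contain it can be made concrete with a small sketch. It assumes, for illustration only, that an atom's total contribution is the plain sum of the contributions of every fingerprint fragment covering that atom; the scheme actually used by Canvas may differ, and the atom indices and fragment values below are toy numbers, not data from this study.

```python
from collections import defaultdict


def atomic_contributions(fragments):
    """Sum fragment contributions onto the atoms each fragment covers.

    `fragments` is an iterable of (atom_indices, contribution) pairs.
    Assumption: an atom's total is the sum over all fragments containing it.
    """
    totals = defaultdict(float)
    for atom_indices, contribution in fragments:
        for atom in atom_indices:
            totals[atom] += contribution
    return dict(totals)


# Toy molecule with three atoms (0, 1, 2) and three hypothetical fragments.
toy_fragments = [
    ((0, 1), 0.5),     # fragment covering atoms 0 and 1, favorable
    ((1, 2), -0.25),   # fragment covering atoms 1 and 2, unfavorable
    ((1,), 0.125),     # fragment covering atom 1 only
]
totals = atomic_contributions(toy_fragments)
# Atom 1 takes part in all three fragments, so its total mixes favorable
# and unfavorable terms.
```

Under this assumption, changing a distant substituent alters which fragments exist around an unchanged atom, which is one way the same carbonyl oxygen could receive different totals in closely related structures.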

Descriptors of the compounds under investigation
Compounds that are potential Alzheimer's therapeutics need to penetrate the blood brain barrier (BBB) in order to exhibit CNS activity. Non-polar drugs cross the BBB by passive diffusion; however, hydrogen bonding significantly affects the CNS uptake profiles of these drugs [58]. In addition, active efflux transporters can potentially 'pump out' drugs into the blood, and in turn influence pharmacological efficacy at a desired dose. P-glycoprotein (Pgp) is one of the best known and most widely studied transporters; it can pump a drug back into the blood, thus changing its absorption, distribution, and elimination processes [59]. A set of physicochemical descriptors has been found to be relevant for CNS drug design and Pgp efflux modulation, with emphasis on more optimal drug-like properties [60,61]. Specifically, cLogP (calculated logarithm of the octanol/water partition coefficient), cLogD (calculated logarithm of the octanol/water distribution coefficient at physiological pH), molecular weight (MW), polar surface area (PSA), number of hydrogen bond donors (HBD) and acceptors (HBA), and pKa (for the most basic atom) are important. Marketed CNS drugs have on average a cLogP close to 2.8 [58,61,62]. Polar surface area is the surface area occupied by nitrogen and oxygen atoms along with their polar hydrogens; the median total PSA for CNS drugs is 44.8 Å² [61,63,64], with a cutoff for optimal CNS exposure and design of less than 60-70 Å² [63,64]. The mean molecular weight of all CNS drugs is 305.3, and it is thought that for an ideal agent to cross the BBB, MW should be less than 450 [60,61,64-66]. Moreover, hydrogen bonding is closely associated with PSA and polarity; increased hydrogen bonding potential negatively impacts BBB distribution [58,66,67]. Currently marketed CNS drugs have on average 2.12 HBA and 1.5 HBD [61,66,68]. Finally, the pKa should be in the range of 4 to 10, and limited flexibility is desired [58,69].
CNS drugs are less flexible than other therapeutics because they need to traverse the membrane; thus, the number of rotatable bonds should be taken into account. It is reported that fewer than 8 rotatable bonds are required for a successful CNS candidate. We calculated the properties for all compounds under investigation (see Table S2), while summarized comparative data of the structures under investigation against CNS drugs are shown in Table 4 (see also Table S4). It can be seen that the MW is close to that of CNS marketed drugs (< 350), PSA falls within the acceptable range (< 70 Å²), while hydrophobicity is less than 2.7. Hydrogen bonding does not exceed 4, and the total of hydrogen bonds is less than 8 in each case. Finally, the BI compounds are relatively flexible, with 6 rotatable bonds in some cases, which falls within the range of all marketed oral drugs. The latter may prove to be important given that cLogP, HBD, and flexibility constitute significant attributes in differentiating CNS-marketed from non-CNS drugs [58,67].
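The cutoffs quoted in this section can be gathered into a simple screening sketch. It encodes only the limits stated in the text (MW below 450, PSA below the upper 70 Å² cutoff, pKa between 4 and 10, fewer than 8 rotatable bonds); the class name, function name, and example values are hypothetical, and averages reported for marketed CNS drugs (cLogP, HBA, HBD) are deliberately not treated as hard limits.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Calculated properties of one compound (field names are illustrative)."""
    mw: float              # molecular weight
    psa: float             # polar surface area in square angstroms
    pka: float             # pKa of the most basic atom
    rotatable_bonds: int   # number of rotatable bonds


def cns_flags(d: Descriptors) -> dict:
    """Check each property against the cutoffs quoted in the text."""
    return {
        "mw_below_450": d.mw < 450,
        "psa_below_70": d.psa < 70,
        "pka_4_to_10": 4 <= d.pka <= 10,
        "fewer_than_8_rotatable_bonds": d.rotatable_bonds < 8,
    }


# Hypothetical compound, not taken from Table 4 or Table S4.
example = Descriptors(mw=340.0, psa=65.0, pka=8.5, rotatable_bonds=6)
flags = cns_flags(example)
passes_all = all(flags.values())
```

Returning one flag per property, rather than a single pass/fail value, makes it easy to see which descriptor a compound violates when prioritizing analogs.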

Conclusion
Using a combination of model-building, docking, and QSAR, this study aims at delineating binding requirements at the macromolecular level and developing a strategy that can be used to rank order future synthetic targets. Comparison of the newly constructed active-state sst4 receptor structure, which is based on a recently solved mu-opioid receptor crystal structure, with our earlier sst4 model points to two residues, Asp90 and Gln243, as being critical. When the novel patented BI compounds are docked into the β2-based sst4 model-built structure, Asp90, Gln243, His258, and Trp171 are prominent, along with the two hydrophobic clusters, one formed by Phe239, Phe175, and Val176 and the other by Trp76, Val67, and Val259. Docking of the same compounds into the mu-based sst4 model de-emphasizes His258 and Trp171, but maintains all other interactions. At the same time, we generated a comprehensive, consensus QSAR model with sst4 agonists reported to date that have a molecular weight of less than 700. The model is based on radial fingerprints and kernel-based partial least squares, can accurately predict experimental pKi values, and thus provides a guide for prioritization in future synthetic efforts. Moreover, we are able to represent which specific features of the compounds in the dataset contribute to affinity by visualizing favorable and unfavorable structural moieties. The findings herein provide an invaluable tool for future prioritization of synthetic efforts for sst4 agonists.

Fig. 8 Visual representation for the Canvas DKPLS model 1 built from radial fingerprints for compounds with strong, moderate, and weak affinities (first, second, and third columns, respectively). Top panel shows the atomic effects on affinity from 3-methyl-isoquinoline to 2,8-dimethyl-quinazoline to 2-methyl-quinazoline (left to right). Red and blue denote atoms that increase or decrease affinity, respectively, while the intensity of the color indicates the strength of that effect. Similarly, visualization of atomic effects on affinity by substitution of the imidazole with a benzene (bottom left to middle compound) or replacement of the pyridine by a benzene ring (bottom middle to right) identifies parts of the structures that are more relevant to activity.

Fig. 9 Three EP structures with different affinities. Fragments associated with the carbonyl oxygen atom are shown in green. Blue denotes atoms that decrease affinity, red indicates atoms that increase affinity. Replacement of indazole (EP-33) by a quinoline (EP-6) results in two fragments that are common, one of which has a markedly reduced contribution (0.214 in EP-6 versus 0.944 in EP-33). Fragments are negative in EP-42, which carries an additional methyl group on the indazole and a heteroaryl-oxazepine group, when compared to EP-33.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.