User-centric design of a 3D search interface for protein-ligand complexes

Diedrich, Konrad; Ehrt, Christiane; Graef, Joel; Poppinga, Martin; Ritter, Norbert; Rarey, Matthias

doi:10.1007/s10822-024-00563-3

Open access
Published: 30 May 2024

Volume 38, article number 23 (2024)

Journal of Computer-Aided Molecular Design

Abstract

In this work, we present the frontend of GeoMine and showcase its application, focusing on the new features of its latest version. GeoMine is a search engine for ligand-bound and predicted empty binding sites in the Protein Data Bank. In addition to its basic text-based search functionalities, GeoMine offers a geometric query type for searching binding sites with a specific relative spatial arrangement of chemical features such as heavy atoms and intermolecular interactions. In contrast to a text search that requires simple and easy-to-formulate user input, a 3D input is more complex, and its specification can be challenging for users. GeoMine’s new version aims to address this issue from the graphical user interface perspective by introducing an additional visualization concept and a new query template type. In its latest version, GeoMine extends its query-building capabilities primarily through input formulation in 2D. The 2D editor is fully synchronized with GeoMine’s 3D editor and provides the same functionality. It enables template-free query generation and template-based query selection directly in 2D pose diagrams. In addition, the query generation with the 3D editor now supports predicted empty binding sites for AlphaFold structures as query templates. GeoMine is freely accessible on the ProteinsPlus web server (https://proteins.plus).

Introduction

A large number of experimentally determined three-dimensional (3D) structures of biological macromolecules are publicly available thanks to the substantial growth of the Protein Data Bank (PDB) [1] and are easily accessible through its web service. This wealth of data is a fundamental scientific resource for understanding macromolecule-ligand interactions and their functional impact. However, to fully exploit this data resource, search engines have to go beyond basic querying on a textual level and enable direct searching of the most central part of the data, the 3D structures themselves. The capability to retrieve all structures with a similar relative spatial arrangement of chemical features like atoms, functional groups, or intermolecular interactions from the PDB can support numerous applications in life science research. For example, a query that covers a ligand binding mode within a binding site may retrieve potential off-target binding sites with similarly interacting ligands and thereby help explain side effects. Further applications include mining for interaction geometries [2], searching for residue motifs [3], and assisting drug repurposing [4].

In addition to the web service of the PDB itself [5], several tools have been developed that enable specific types of spatial queries for the PDB: CSD-CrossMiner [6], PRDB [7], PROLIX [8], Relibase and Relibase+ [9], PDBeMotif [10], PELIKAN [11], GSP4PDB [12], GeoMine [13, 14], and nAPOLI [15]. In addition, some commercial and unpublished software applications such as Proasis4 [16] and 3decision [17] offer similar search capabilities. Of the published tools, PRDB, PROLIX, Relibase, Relibase+, PDBeMotif, and nAPOLI are no longer available. CSD-CrossMiner and PELIKAN are desktop applications, while GSP4PDB and GeoMine are accessible on the web. GSP4PDB and GeoMine are freely available, whereas CSD-CrossMiner and PELIKAN require a commercial or academic license, respectively. The tools differ significantly regarding the supported query content and which regions of the structures in the PDB are searchable. For example, while the PDB web service allows searching the relative spatial arrangement of α- and β-carbon atoms of specific residues in complete protein structures, PELIKAN permits a query that describes a relative spatial arrangement of arbitrary user-specified heavy atoms and intermolecular interactions to screen ligand-bound binding sites defined by a radius of 6.5 Å of the ligand’s heavy atoms. For a comprehensive overview of the query differences and technical aspects of the different tools, like the underlying data storage approach, see [13].

Due to the multidimensional nature of the data and the varying complexity of the supported 3D query, spatial searches are highly challenging, not only from the developer’s point of view but also from the user’s perspective. In contrast to the simple text-based user input of keywords, scalar values, sequences, or even substructures, specifying relative spatial arrangements of chemical features is a complex task. In general, drawing with a graphical editor substantially simplifies query generation, in contrast to defining the query purely textually.

A two-dimensional (2D) or 3D editor provides a more intuitive interface for placing and specifying chemical features and their geometric constraints. Additionally, both editor types already give life scientists a familiar environment for visualizing chemical structures. Generating queries with such an editor can be further simplified by visualizing a structure of interest as a template, in which the user can select the arrangement of chemical features to search for. Nevertheless, an additional textual specification of the 3D query as a manual post-processing step is useful for adapting its chemical and spatial precision to individual needs.

The PDB web service, CSD-CrossMiner, GeoMine, and PELIKAN provide a 3D editor. A template structure for query design can be used in all tools. While a query can be designed anywhere in a loaded 3D representation of a PDB entry using the PDB web service, the query options in CSD-CrossMiner, GeoMine, and PELIKAN are limited to corresponding 3D-visualized binding sites as structural templates. Query generation from scratch is possible with CSD-CrossMiner and GeoMine via the 3D editor and PELIKAN via a textual and tabular representation. PROLIX enables a purely textual approach. All other tools offer a 2D editor for template-free query generation.

Considering a 2D and 3D editor in comparison, both visualization concepts have advantages and disadvantages for generating spatial queries. A 3D environment is a natural choice because it provides precise spatial information. However, the drawback of a 3D editor is that its usage requires practice and time, especially when using a template for query selection. Due to the high amount of visualized structural information, the chemical features of interest might be visually buried inside the structure and must, therefore, be focused on by users by zooming, translating, and rotating the scene extensively. Therefore, query generation can still be challenging, even though a 3D visualization provides all required information.

In contrast, a 2D environment provides only distorted spatial information due to the dimensionality reduction. Furthermore, converting a 3D template structure into a planar representation prevents the visualization of the entire structure due to consequential structural overlaps causing suboptimal 2D layout quality. Therefore, 2D visualization requires a reduction in the amount of visualized structural information. Even though a 2D visualization provides less information than a 3D visualization, it visualizes and highlights only the most relevant chemical information a user might want to search for. Furthermore, 2D visualization offers chemical structure representation as structure diagrams that are very familiar to scientists. A 2D visualization permits an instant overview of the most relevant selectable chemical features, simplifying query generation.

In this article, we will introduce the latest version of GeoMine. First, we will provide a user-focused overview of GeoMine, including its new features: the 2D query editor and the 3D template type based on the artificial intelligence-predicted AlphaFold structures [18] that are retrieved from the corresponding database at https://alphafold.ebi.ac.uk. We will then present the 2D editor in more detail and showcase the application of the new features of the latest tool version, which exploits all the above-mentioned query generation approaches to design a graphical user interface with the highest usability possible for spatial searching within known and predicted binding sites.

Methods

Features overview

The key features of the most recent release of GeoMine are summarized in the list below. Subsequently, some of these points are illustrated in detail, including the integration of the new features of GeoMine, the 2D query editor, and the AlphaFold-based 3D template type:

A graphical user interface that is freely accessible via the ProteinsPlus [19,20,21] web server (https://proteins.plus).
A fast and precise search functionality that enables 3D querying of ligand-bound and predicted empty binding sites of protein or nucleic acid structures in the entire PDB. In the new GeoMine version, binding sites are predicted by DoGSite3 [22]. The binding sites are post-processed by Protoss [23, 24] to calculate the presence and coordinates of polar hydrogen atoms.
On-the-fly loading of ligand-bound and predicted empty binding sites as query templates created from a PDB structure, an AlphaFold structure, or an uploaded custom structure file in PDB format.
An interactive and user-friendly query generation process in a 2D and a 3D editor that allows synchronized query selection in a 2D ligand interaction diagram and in a ligand-bound or predicted empty 3D template binding site, as well as query generation from scratch.
A large number of selectable chemical features that include all buried and solvent-exposed heavy atoms of all ligands (e.g., solvent molecules, cofactors, small molecules), simple ions like metal ions, and biomolecular residues (amino acid and nucleic acid residues) in a binding site, as well as visualized aromatic ring centers, secondary structure elements, and hydrogen bond, pi-stacking, cation-pi, metal, and ionic interactions. GeoMine allows combining all of these chemical features into a single complex 3D query.
The placement of hypothetical chemical features in 2D and 3D space for template-free query generation.
A comprehensive specification of the spatial relationships between chemical features through geometric constraints, which include orientations, distance ranges, and angle ranges.
A simple verification and arbitrarily precise specification of the query due to its additional representation in tables, which show various properties of the chemical features and geometric constraints that can be adjusted in detail or kept more generic. For example, users can specify whether a selected atom of the polar residue serine matches only serine residues or all polar residues or residues of any type and class.
The automatic loading of the primary or even all properties of a chemical feature into the query table by chemical feature selection.
A clear visual correspondence of the query visualized in the 2D and 3D editors and the tables achieved by synchronized mouse-over highlighting and individual coloring for the chemical features and geometric constraints of the query.
A user-specified further restriction of the search in the PDB by an optional list of PDB identifiers and by the inclusion or exclusion of results based on 53 additional textual and numerical filter criteria, such as the source organism, the protein class, or the root-mean-square deviation (RMSD) between the match and query points.
An iterative search process of query editing and subsequent searching in already-detected results enabled by a refinement functionality and results history.
The download and upload of a GeoMine query in JavaScript Object Notation (JSON) file format for sharing, archiving, reusing, and later editing.
A comprehensive presentation and comparative analysis of the resulting binding sites by a table with information about the 150 best results and the visualization of these in the 3D editor together with the 3D template binding site as superimpositions of the template and matching binding sites with various 3D visualization options. The ranking and superimpositions of results are based on the RMSD between the chemical point features of the query (atoms, aromatic ring centers, secondary structure elements) and the corresponding ones of the matches.
The download of the table content in JSON or comma-separated values (CSV) format, of the superposed binding sites of the 150 best results in PDB format, and of a file that contains the statistics for all matches.
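
The RMSD-based ranking mentioned in the list above can be illustrated with a short Python sketch. It is not GeoMine's implementation: it assumes that the matched points have already been superimposed onto the query points and that both are given in the same order, and the function names and the dictionary layout of the matches are hypothetical.

```python
import math

def rmsd(query_points, match_points):
    """Root-mean-square deviation between paired 3D points (same order, same length)."""
    if not query_points or len(query_points) != len(match_points):
        raise ValueError("point lists must be non-empty and of equal length")
    squared = sum(
        (q - m) ** 2
        for query_point, match_point in zip(query_points, match_points)
        for q, m in zip(query_point, match_point)
    )
    return math.sqrt(squared / len(query_points))

def rank_matches(query_points, matches, limit=150):
    """Sort matches by ascending RMSD and keep the best `limit` entries.

    `matches` maps a hypothetical binding site name to its superimposed points.
    The default of 150 mirrors the number of results presented in the table.
    """
    scored = sorted((rmsd(query_points, points), name) for name, points in matches.items())
    return scored[:limit]
```

For example, `rank_matches(query, {"siteA": pointsA, "siteB": pointsB})` returns a list of `(rmsd, name)` tuples with the closest match first.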

New features integration

The user can specify a template structure for query generation on the ProteinsPlus landing page (Fig. 1) in several ways. Besides the specification of a Protein Data Bank structure by its 4-letter identifier (Fig. 1a) or a custom structure by a file in PDB format (Fig. 1b), the user can now directly access predicted structures in the AlphaFold database through their UniProt accession numbers. Additional ligands can be uploaded in Structural Data File (SDF) format (Fig. 1c) for the specified template structure. The linked advanced search functionality (Fig. 1d) allows the user to query the Protein Data Bank (Fig. 2a) and AlphaFold database (Fig. 2b) by keywords to search for potential input structures (Fig. 2c).

After input confirmation on the ProteinsPlus landing page (Fig. 1e), the user is forwarded to the ProteinsPlus main page (Fig. 3), which consists of three scrollable sections. The user can select GeoMine from the tool list in the right section to access the tool-specific graphical user interface components, including the new 2D query interface. The central section provides two scrollable lists: the Pockets and Ligands lists (Fig. 3b). The Ligands list contains information about all ions and small molecules of the input structure. The Pockets list provides information about on-the-fly calculated ligand-bound and DoGSite3-predicted empty binding sites. Ligand-bound binding sites are predicted with a so-called “ligand-bias” option, i.e., the solvent grids are biased by the buried fragments of the ligand to enforce these parts to be included in the predicted sites (ligand-biased predicted sites, see [22] for details). A ligand might not be contained in any DoGSite3-predicted binding site, i.e., less than 20% of its heavy atoms lie in the pocket. In this case, a ligand radius-based binding site is created instead, including the ligand and all residues, other small molecules, and simple ions within a radius of 6.5 Å of the ligand’s heavy atoms. In the case of AlphaFold-based input, only predicted empty binding sites are available, as those structures do not contain ligands. A 2D ligand interaction diagram created with PoseEdit [25] and PoseView [26,27,28] as a template for query selection can be loaded for a user-specified ligand from the Ligands list into the 2D editor on the right (Fig. 3c). The corresponding ligand-bound 3D binding site from the Pockets list is then automatically visualized in the 3D editor on the left (Fig. 3a).
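
The fallback from a predicted binding site to a ligand radius-based binding site can be sketched as follows. The 20% coverage threshold and the 6.5 Å radius are taken from the text. Everything else is an assumption made for illustration: the function names, the input layout, and the rule that a neighbouring residue, small molecule, or ion is included as soon as any of its heavy atoms lies within the radius.

```python
import math

MIN_POCKET_COVERAGE = 0.20  # from the text: below 20% of heavy atoms in the pocket
SITE_RADIUS = 6.5           # from the text: radius in Å around the ligand's heavy atoms

def needs_radius_site(atoms_in_pocket, total_heavy_atoms):
    """True if too few ligand heavy atoms lie in the predicted pocket."""
    if total_heavy_atoms <= 0:
        raise ValueError("ligand must have at least one heavy atom")
    return atoms_in_pocket / total_heavy_atoms < MIN_POCKET_COVERAGE

def radius_based_site(ligand_atoms, neighbours, radius=SITE_RADIUS):
    """Names of neighbours with any heavy atom within `radius` of any ligand heavy atom.

    `neighbours` maps a hypothetical identifier (residue, small molecule, ion)
    to a list of heavy atom coordinates. The any-atom rule is an assumption.
    """
    site = []
    for name, atoms in neighbours.items():
        if any(math.dist(atom, lig) <= radius for atom in atoms for lig in ligand_atoms):
            site.append(name)
    return sorted(site)
```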

Figure 4 provides a detailed view of the 2D ligand interaction diagram content, the 2D editor functionality, and the supported components of the query. A 2D ligand interaction diagram (Fig. 4b) shows an excerpt of the corresponding ligand-bound 3D binding site. The selectable chemical feature types are the same as in the 3D binding sites, but the visualized content is restricted to a specific ligand and directly interacting metals and macromolecular residues. Hydrophobic contacts with residues are not visualized in atomic detail but are indicated by green splines labeled by the corresponding residue identifiers. It is not possible to generate 2D diagrams for predicted empty binding sites. The substantial quantity of solvent-exposed residues in such a binding site cannot be effectively limited, as it is difficult to automatically specify which residues might be more important than others for query selection. A 2D diagram that displays all binding site residues is overly crowded and does not provide any chemical reference point to the user on what to select, rendering the query formulation in 2D space an ineffective alternative. In contrast, a query selection in a predicted empty 3D binding site is more feasible since residues are distinguishable on a spatial level. For example, a user might want to select specific solvent-exposed atoms of nearby residues surrounding a distinct subsection of the binding site. However, for ligand-bound 3D binding sites, it is possible to highlight the ligand, its interaction partners, and the intermolecular interactions in a 2D diagram. This focus increases the clarity of 2D diagrams while providing chemical information useful for query selection even without spatial information.

In addition to the input specification via the Ligands list, users can upload a diagram file in JSON format (Fig. 4d). This upload functionality is particularly useful when users want to improve the automatically generated 2D layout for query selection. With the 2D diagram editing tool PoseEdit, which is also accessible on ProteinsPlus, the user can load and visualize the same 2D diagram to manually rearrange its content for resolving graphical issues like overlapping residues or intersecting intermolecular interactions. The optimized 2D diagram can be downloaded from PoseEdit as a JSON file and can then be uploaded into the 2D editor of GeoMine.

The 2D editor has the same query-building functionality as the 3D editor. Furthermore, the 2D editor is synchronized with the 3D editor and the query tables regarding query generation, visualization, mouse-over highlighting, and coloring. This synchronization allows the simultaneous usage of all query input types in a complementary manner. The query consists of chemical features and geometric constraints that can be added without a template or selected in a template via several user modes (Fig. 4a). A legend below the 2D drawing area (Fig. 4c) explains the precalculated chemical features.

In the Point mode, the user can select so-called points, i.e., heavy atoms, aromatic ring centers, and secondary structure elements, represented by α-carbon atoms of central or terminal protein residues in helices and strands. Solvent-exposed heavy atoms are highlighted by big colored spheres. As in the 3D editor, hypothetical points can be placed and moved in 2D space. They are automatically placed in the center of the ligand-bound 3D binding site that corresponds to the 2D ligand interaction diagram. The relative position of a hypothetical point can be adjusted via the 3D editor and by distance ranges. Intermolecular interactions are visualized by colored dashed lines and can be selected in the Interaction mode. It is also possible to specify a hypothetical intermolecular interaction between two points in that mode. Any two points can be connected by a distance range in the Distance mode. Lastly, angle ranges can be placed between connected distance pairs and interactions in the Angle mode.
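
To make the geometric constraints of the Distance and Angle modes concrete, the following Python sketch checks whether candidate points satisfy a distance range and an angle range. It only illustrates the underlying geometry and is not taken from GeoMine. The symmetric tolerance around a measured value and all function names are assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.dist(a, b)

def angle_deg(a, vertex, b):
    """Angle in degrees between the segments vertex-a and vertex-b."""
    v1 = [x - y for x, y in zip(a, vertex)]
    v2 = [x - y for x, y in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("angle is undefined for coinciding points")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def range_from_tolerance(measured, tolerance):
    """Assumed convention: a symmetric range around the measured value."""
    return measured - tolerance, measured + tolerance

def in_range(value, lower, upper):
    """True if the value lies within the closed interval [lower, upper]."""
    return lower <= value <= upper
```

A candidate pair of points would satisfy a distance constraint derived from a template if `in_range(distance(p, q), *range_from_tolerance(template_distance, tolerance))` holds.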

The corresponding tables in the scrollable section below the 2D editor list defined points, distances, interactions, and angles (Fig. 3d). The tables allow further verification and modification of their properties, for example, the residue an atom belongs to or the tolerance value of a distance range. The user can specify that all properties of a selected chemical feature are automatically recognized and set in its corresponding table entry after selection by enabling the checkbox next to the list of modes. Otherwise, only its main properties are set automatically, i.e., the element for atoms and the molecule type for atoms, aromatic ring centers, and secondary structure elements. For a screen recording video demonstrating how to apply the user modes, see Online Resource 1. A 2D diagram of the inhibitor with the internal ProteinsPlus ID 4SP_A_1298 interacting with a cyclin-dependent kinase (PDB code: 1H1S) is shown in the video to exemplify query generation with the 2D editor.
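
The difference between setting all properties and only the main properties of a selected feature can be sketched with a small, purely hypothetical query representation. The field names below are invented for illustration and do not reflect GeoMine's actual JSON query format. The sketch shows how dropping a property such as the residue name makes a point match more candidates, and how such a structure could be stored as JSON.

```python
import json

# Hypothetical field names; GeoMine's real JSON query schema is not described here.
query = {
    "points": [
        {"id": "p1", "type": "atom", "element": "O",
         "molecule_type": "protein", "residue_name": "SER"},
        {"id": "p2", "type": "atom", "element": "N",
         "molecule_type": "ligand"},
    ],
    "distances": [
        {"points": ["p1", "p2"], "min": 2.5, "max": 3.5},
    ],
}

# Main properties as described in the text: element and molecule type.
MAIN_PROPERTIES = ("id", "type", "element", "molecule_type")

def generalize(point):
    """Keep only the main properties so the point matches more candidates."""
    return {key: value for key, value in point.items() if key in MAIN_PROPERTIES}

def point_matches(query_point, candidate):
    """Every property set on the query point must agree with the candidate."""
    return all(candidate.get(key) == value
               for key, value in query_point.items() if key != "id")

def save_query(query):
    """Serialize the query to a JSON string."""
    return json.dumps(query, indent=2, sort_keys=True)

def load_query(text):
    """Restore a query from its JSON string."""
    return json.loads(text)
```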

Technical implementation details

The graphical user interface is primarily implemented with HTML, Vanilla JavaScript, and the Bootstrap 3 library (https://getbootstrap.com). Several JavaScript libraries were used to integrate specific frontend components. The 3D viewer uses the NGL library [30, 31] (https://nglviewer.org). The query tables employ the DataTables library (https://datatables.net). The 2D editor is based on the InteractionDrawer JavaScript library (https://github.com/rareylab/InteractionDrawer), which draws interactive 2D ligand interaction diagrams in Scalable Vector Graphics (SVG) format. The web server’s backend is implemented using the Ruby on Rails framework (https://rubyonrails.org) and a MySQL database (https://www.mysql.com).

GeoMine’s searches are performed on a server using a PostgreSQL (https://www.postgresql.org) database, 200 GB of main memory, up to 30 cores of a 2x Intel Xeon Gold 6248 processor (2.5 GHz), and a Dell 1.6 TB NVMe HHHL AIC PM1725b solid-state drive with an XFS file system.

Application

Binding site function prediction and off-target analyses for methyltransferases in Leishmania

In our case study, we want to illustrate how GeoMine can be used to analyze AlphaFold models and assist in suggesting ligands and their binding modes for a predicted protein structure of interest. The resulting complexes can subsequently be used to assess the uniqueness of the 3D arrangement of ligand-interacting binding site atoms using 2D query design. Here, we want to focus on neglected tropical diseases threatening millions worldwide [32]. Their treatment is restricted to a few medications that often harbor severe side effects [33]. Causative agents for these diseases are, among others, parasites of the genus Leishmania. The search for potential therapeutic agents became the focus of academic infection research, which identified several pharmaceutically promising targets [34]. Understanding their structure and function is crucial for future early-phase drug discovery and development.

The protein of interest in this case study is an enzyme called sterol 24-C methyltransferase (SMT) in Leishmania species. The enzyme uses S-adenosyl methionine (SAM) as a cosubstrate and catalyzes the C-C bond formation between a methyl group and the C24 of zymosterol to form ergosterol [35], the major sterol component of these parasites. Several substrate-based inhibitors of the enzyme from Leishmania amazonensis are known [36] and a recent computational study aimed to design novel inhibitors [37]. The authors focused their analyses on the zymosterol binding site of the protein to find novel inhibitors. In contrast, we wondered whether the SAM-binding site might provide a suitable starting point for structure-based design. Due to the unavailability of experimental structures, we used the AlphaFold model of the enzyme from L. donovani (UniProt Accession Q6RW42).

Upon loading the structure on ProteinsPlus by entering its UniProt Accession, we can see the ligand-free structure of the protein. In the Pockets tab, we see two pockets predicted by DoGSite3 for the structure. The first is very large, with a volume of 587 Å³ (P1), while the second is much smaller and mainly occupied by charged residues (P2). We conclude that the first pocket might be the active site responsible for SAM and zymosterol binding. DoGSite3 detects three subpockets in this binding site: a large one with many aromatic atoms and a hydrophobicity ratio of 0.76, which is flanked by residues with low pLDDT scores (P1_1), and two smaller ones with lower predicted hydrophobicity and high pLDDT scores (P1_2 and P1_3, Fig. 5). Therefore, we hypothesized that the smaller subpockets might be the site binding to SAM and relied on these subpockets with an overall higher predicted accuracy in terms of pLDDT.
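
Per-residue pLDDT scores such as those used above to compare the subpockets can be read directly from an AlphaFold model, because AlphaFold structure files in PDB format store the pLDDT value in the B-factor column. The following Python sketch averages the score over a set of residues. The function names and the way a subpocket is passed in as a list of (chain, residue number) pairs are assumptions for illustration.

```python
def residue_plddt(pdb_text):
    """Map (chain, residue number) to pLDDT, read from the B-factor column.

    Uses the fixed-column PDB layout: chain in column 22, residue number in
    columns 23-26, B-factor in columns 61-66. All atoms of a residue carry
    the same pLDDT in AlphaFold models, so the first atom is sufficient.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))
            scores.setdefault(key, float(line[60:66]))
    return scores

def mean_plddt(scores, residues):
    """Average pLDDT over the given (chain, residue number) pairs."""
    values = [scores[residue] for residue in residues]
    if not values:
        raise ValueError("no residues given")
    return sum(values) / len(values)
```

Comparing `mean_plddt` for the residues lining two subpockets gives a quick indication of which region of the model is predicted with higher confidence.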

We performed a molecular docking of SAM with JAMDA [38, 39] into these combined subpockets. However, we obtained highly diverse potential poses partially extending to the P1_1 subpocket. Due to structural uncertainties of the structural model representing a considerable challenge for molecular docking [40], the best-scored pose might not correspond to the native binding mode. To find the most probable of the predicted binding poses, we built a GeoMine query based on flanking solvent-exposed binding site residues (Fig. 5) and a point indicating the position of the ligand and screened for similar binding sites in complex with SAM, its enzymatic product S-adenosyl homocysteine (SAH), or their analog sinefungin (SFG). The corresponding query file in JSON format is available in the Supplementary Information (Online Resource 2). The search finished in 21 s. Intriguingly, we found only one protein-ligand complex with SAH that did not clash considerably with the query protein residues: the SAM-binding pocket of ribosomal RNA large subunit methyltransferase K/L from Escherichia coli (strain K12, PDB code 3v97). The JAMDA pose on rank 6 is similar to the one in the RNA methyltransferase aligned with GeoMine and might provide a reliable binding hypothesis.

One well-known issue of targeting SAM-binding sites is the comparatively high risk of off-target effects and corresponding toxicity when addressing similar conserved interaction patterns in related enzymes [41]. Although we find highly specialized classes of SAM-binding enzymes in nature [42], a close examination of the interaction pattern similarities might help to identify selectivity-mediating site properties and prevent the design of non-selective inhibitors. Therefore, we further explored the unique features of the binding site. We saved the JAMDA pose on rank 6 and uploaded it as a complex to ProteinsPlus. The corresponding PDB file is available in the Supplementary Information (Online Resource 3). Next, we used the 2D query feature of GeoMine to model residue atoms potentially interacting with SAM-related compounds. As the binding site is highly buried and the number of interactions is high, it is more convenient to use the 2D representation of the interacting partners in this case. We modeled the pharmacophoric properties of all interacting atoms except for the residues interacting with the carboxylic group of the methionyl moiety and backbone atom of Ile177, as those atoms are far apart from the adenosyl moiety. The resulting query was used to screen for related binding sites of human protein structures in the PDB. The corresponding query file in JSON format is available in the Supplementary Information (Online Resource 4). The search took 19 s. Intriguingly, we could not identify similarities in the SAM binding mode predicted for SMT to the one observed for any human enzymes of known structure in complex with SAM, SAH, or SFG, indicating a unique interaction pattern in this protein.

To compare this finding to the results of similar approaches with other SAM-binding enzymes from L. donovani, we used another SAM binding site of the enzyme alpha N-terminal protein methyltransferase 1 (UniProt accession number A0A3S7X350). A SIENA [43] search in the PDB revealed a highly related SAM-bound structure of the enzyme of L. major (PDB entry 1xtp by the Structural Genomics of Pathogenic Protozoa Consortium). The tool searches for closely related binding sites of other proteins based on perfect k-mer sequence matches in an indexed database of the PDB. As the residues of both active sites overlap nearly perfectly and there are no mutations or gaps in a 5 Å environment, we used a similar GeoMine search strategy to find potentially related sites for this target. As described previously for SMT, we modeled all interacting residue atoms and their distances, omitting the atoms interacting with the carboxylic group of the methionyl moiety and the backbone oxygen atom of Gln165. We omitted the oxygen atom of Thr167 as the ether might represent a comparably weak acceptor. The corresponding query file in JSON format is available in the Supplementary Information (Online Resource 5). The search was performed in 31 s. In contrast to our findings for SMT, we identified several binding sites in human enzymes that are structurally highly related (Fig. 6). The low RMSD values of the matched points indicate a high validity of the hits in terms of matching interacting atoms. A visual inspection of the matches highlights that mainly human N-terminal Xaa-Pro-Lys N-methyltransferase 1, N-terminal Xaa-Pro-Lys N-methyltransferase 2, and mRNA cap guanine-N7 methyltransferase should be considered potential off-targets of compounds addressing similar interacting residues of the SAM site. The match with actin-histidine N-methyltransferase does not lead to a convincing ligand alignment, indicating that the site of this enzyme is different regarding the atoms interacting with SAM. This result suggests that selectively inhibiting this binding site might be more challenging than addressing the one for SMT with a SAM-competitive inhibitor.
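
The idea of searching an indexed database for perfect k-mer sequence matches, as mentioned for SIENA above, can be illustrated with a toy Python sketch. This is not SIENA's implementation. It only shows the general technique of indexing all substrings of length k and looking up the k-mers of a query sequence, and the function names and the hit-counting scheme are assumptions.

```python
from collections import defaultdict

def build_kmer_index(sequences, k):
    """Map every k-mer to the set of (sequence name, start position) pairs."""
    index = defaultdict(set)
    for name, sequence in sequences.items():
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].add((name, start))
    return index

def find_perfect_matches(index, query, k):
    """Count, per indexed sequence, the exact k-mer occurrences shared with the query."""
    hits = defaultdict(int)
    for start in range(len(query) - k + 1):
        for name, _ in index.get(query[start:start + k], ()):
            hits[name] += 1
    return dict(hits)
```

Sequences with many shared k-mers are candidates for closely related binding sites, which a real tool would then examine on the structural level.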

In summary, this study illustrates how GeoMine can support the analysis of protein structures concerning ligand binding in just one of the numerous imaginable workflows. Using DoGSite3, putative sites, e.g., from predicted protein structures, can be used as starting points. The fully integrated 2D and 3D query design options paired with the efficient database search capabilities of GeoMine enable on-the-fly structural investigations exploiting data from hundreds of thousands of protein structures. The new functionalities provide easy access to binding site function prediction and automated searches for potential off-targets.

Conclusion

In this article, we present features and exemplary applications of the new version of GeoMine, a search engine for 3D searching in ligand-bound and predicted empty protein binding sites. Exploiting the full capabilities of such a search engine is a considerable challenge from the user’s perspective due to the complexity of 3D molecular arrangements on the atomistic level being part of the query. In related tools, the 3D query formulation is based on either 2D, 3D, or text input. Each of these input types has advantages and disadvantages.

The new version of GeoMine seamlessly integrates all three input types to maximize the usability of the complex 3D query-building process. The newly implemented 2D editor enables a simplified template-free query generation and template-based query selection for ligand-bound binding sites. The 2D templates make optimal use of the editor’s limited 2D space by highlighting only those chemical aspects of the binding site that are most relevant to the ligand’s interaction with a macromolecule and, therefore, particularly interesting to search for. The 2D editor is instantaneously synchronized with the 3D editor and the textual query representation in tables, enabling a synergistic query generation process complemented by all three input types. A seamless integration of the PoseEdit features into GeoMine might further improve the usability of the 2D interface. Finally, predicted empty binding sites of artificial intelligence-based protein structure models can now be used as 3D templates in the 3D editor, giving the user a new starting point to tailor queries of interest to elucidate potential ligands.

The search engine’s extended graphical user interface will support life scientists in effortlessly generating structural 3D queries on the PDB for the functional analysis of macromolecule-ligand interfaces.

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

3D: Three-dimensional
PDB: Protein Data Bank
2D: Two-dimensional
RMSD: Root-mean-square deviation
JSON: JavaScript Object Notation
CSV: Comma-separated values
SDF: Structural Data File
SVG: Scalable Vector Graphics
SMT: Sterol 24-C methyltransferase
SAM: S-adenosyl methionine
SAH: S-adenosyl homocysteine
SFG: Sinefungin

Acknowledgements

The authors thank the whole development team of the NAOMI library and ProteinsPlus web server for forming the basis of this work.

Funding

Open Access funding enabled and organized by Projekt DEAL. Development of ProteinsPlus was supported by de.NBI (in part); German Federal Ministry of Education and Research (BMBF) [031L0105]; Development of GeoMine was supported by the German Federal Ministry of Education and Research as part of CompLS and de.NBI [031L0172, 031L0105 to KD and JG]; CE acknowledges financial support from grant HIDSS-0002 DASHH (Data Science in Hamburg - Helmholtz Graduate School for the Structure of Matter).

Author information

Authors and Affiliations

Universität Hamburg, ZBH - Center for Bioinformatics, Albert-Einstein-Ring 8-10, 22761, Hamburg, Germany
Konrad Diedrich, Christiane Ehrt, Joel Graef & Matthias Rarey
Universität Hamburg, Department of Informatics, Vogt-Kölln-Straße 30, 22527, Hamburg, Germany
Martin Poppinga & Norbert Ritter

Contributions

The concepts behind GeoMine were developed by KD, JG, MP, and MR. The concept of the GeoMine web-based interface was developed and implemented by KD. The case studies were designed by CE. The original draft of the manuscript was written by KD and CE. The project and manuscript writing were supervised by MR. The database was designed by JG and MP, supervised by NR and MR. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Matthias Rarey.

Ethics declarations

Competing interests

The authors declare the following competing financial interest(s): ProteinsPlus and the NAOMI ChemBioSuite use some methods that are jointly owned and/or licensed to BioSolveIT GmbH, Germany. MR is a shareholder of BioSolveIT GmbH.

Software availability

The GeoMine application is available at https://proteins.plus. The code of the InteractionDrawer library used for the drawing of interactive 2D diagrams is available at https://github.com/rareylab/InteractionDrawer.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Supplementary Material 5

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Diedrich, K., Ehrt, C., Graef, J. et al. User-centric design of a 3D search interface for protein-ligand complexes. J Comput Aided Mol Des 38, 23 (2024). https://doi.org/10.1007/s10822-024-00563-3

Received: 03 April 2024
Accepted: 17 May 2024
Published: 30 May 2024
DOI: https://doi.org/10.1007/s10822-024-00563-3
