Background

One of the defining features of organic chemistry is the extremely large diversity of possible molecules. The concept of chemical space, whereby molecules are annotated with a set of quantitative molecular properties and placed in a high-dimensional property space with each dimension corresponding to a different property, offers a practical approach to represent the structural diversity of large molecule collections [1–28]. Such high-dimensional spaces cannot be visualized directly but can be subjected to various dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection [29–32].

To make chemical space easier to inspect, we recently reported an interactive Java Applet representing databases of molecules as color-coded maps produced by projection of high-dimensional property spaces, defined by various molecular fingerprints, into two dimensions [32–37]. In these so-called Mapplets the computer screen shows a color-coded 2D-image where each pixel contains one or several molecules projected at that point. The average molecule contained in each pixel is displayed in a side-window on mouse over, with an option to open the complete list of molecules in the pixel in a secondary window, and subsequently to link selected molecules to the database entry, or to perform similarity searches in the parent high-dimensional property space. These Mapplets unfortunately suffer from the typical folding effects encountered when projecting high-dimensional property spaces into 2D [2, 6, 9, 28, 30, 32], which result in (a) many pixels containing molecules piled up on top of each other, and (b) a poor correlation between distances on the 2D-map and distances in the original high-dimensional property space. In addition, the Java Applets must be downloaded and run separately and are not platform independent.

Herein we report webDrugCS, a web application freely accessible at www.gdb.unibe.ch, which addresses these limitations by enabling access to molecules via interactive color-coded 3D-spaces in a manner similar to the 2D-Mapplet. The website visualizes DrugBank (http://www.drugbank.ca/), a public database listing over 6000 compounds currently in medical use either as FDA-approved and marketed drugs or as investigational drugs [38]. Similarly to our recently reported PDB-Explorer website to visualize the Protein Databank [39], webDrugCS uses the internet browser of the user to generate the display. DrugBank is represented in the form of color-coded 3D-spaces obtained by principal component analysis (PCA) of five different multidimensional property spaces defined by five different fingerprints. These fingerprints describe constitution and topology (42D molecular quantum numbers, MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp) (Table 1). The 3D-spaces are generated using three.js (http://threejs.org/), an open-source JavaScript library/API for animated 3D computer graphics in a web browser. Although less sophisticated than other chemical space visualization tools designed to assess compound collections [40–46], webDrugCS provides an unprecedented tool to look at DrugBank and rapidly learn about the structural diversity of small molecule drugs. This feature is offered neither at the DrugBank website nor by other currently available online tools such as eDrug3D [47], SuperDrug [48], SuperPred [49], or BalestraWeb [50], which are primarily designed to address specific queries such as drug name, substructure, molecular formula or protein target by providing a limited number of answers.

Table 1 Fingerprints used in this study

Results and discussion

PCA of multidimensional property spaces

In a multidimensional property space, the dimensions and the position (coordinates) of any molecule are defined by a set of molecular descriptors. PCA is performed as a dimensionality reduction method to obtain 3D- or 2D-representations. In these projections the position of any molecule is defined by its coordinates in the first three or the first two principal components (PCs), respectively. Here PCA is used to project DrugBank from each of the five property spaces defined by the fingerprints MQN, SMIfp, APfp, Xfp and Sfp onto the corresponding 3D-space or 2D-map. The cumulative coverage of data variance within the first three PCs is larger than 75 % in the case of the fingerprints MQN, SMIfp and APfp, which are relatively simple descriptions of the molecules resulting in a relatively low number of dimensions (Fig. 1a). In these cases a very good correlation is observed between distances in the original high-dimensional property space and the 3D-projection (Fig. 1b). The situation is less favorable for the more complex and higher dimensional fingerprints Xfp and Sfp, where only 42 % and 20 % of data variance, respectively, is covered within the first three PCs. Nevertheless the correlation between distances in the original property space and the 3D-space resulting from PCA is still acceptable (Pearson’s correlation coefficient, Xfp: 0.8, Sfp: 0.6), implying that these 3D-spaces still contain relevant information about the position of molecules in the original high-dimensional Xfp and Sfp spaces. In particular, nearest neighbours in each of the 3D-spaces are for the most part closely related molecules in the corresponding high-dimensional property space.
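The two quantities discussed here, the cumulative variance covered by the leading PCs and the correlation between pairwise distances before and after projection, can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names are hypothetical, and the inputs (a list of PCA eigenvalues, and two lists of coordinate vectors for the same molecules) are assumed to be available already.

```python
import math
from itertools import combinations


def cumulative_variance(eigenvalues):
    """Cumulative percentage of variance covered by the 1st, 2nd, ... PC."""
    total = sum(eigenvalues)
    running, covered = 0.0, []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        covered.append(100.0 * running / total)
    return covered


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pairwise_distance_pearson(original, projected):
    """Correlate Euclidean distances of all molecule pairs in two spaces.

    original[i] and projected[i] must describe the same molecule.
    """
    pairs = list(combinations(range(len(original)), 2))
    d_original = [math.dist(original[i], original[j]) for i, j in pairs]
    d_projected = [math.dist(projected[i], projected[j]) for i, j in pairs]
    return pearson(d_original, d_projected)
```

A value close to 1 from `pairwise_distance_pearson` means that distances in the projection rank molecule pairs in much the same way as distances in the original fingerprint space. The loop over all pairs grows quadratically with the number of molecules, so for large collections a random sample of pairs would be a reasonable substitute.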

Fig. 1

Analysis of the DrugBank database. a Percentage of data variance covered by increasing numbers of PCs obtained by PCA of the MQN-, SMIfp-, APfp-, Xfp- and Sfp-datasets of DrugBank. b Pearson’s correlation coefficient between pairwise Euclidean distances in the n-dimensional PC-subspace and the respective original MQN, SMIfp, APfp, Xfp and Sfp fingerprint spaces, calculated from 36 M molecule pairs in the DrugBank database. c Percentage of the DrugBank database found in singly occupied bins in the original fingerprint space (black), grid points in 3D-space (blue) and pixels in 2D-space (red). A bin is defined as one particular fingerprint value combination. The 3D-spaces were generated by projecting DrugBank onto a grid of 300 × 300 × 300 grid points. The 2D-maps were generated by projecting DrugBank onto a map of 300 × 300 pixels

One of the remarkable aspects of the 3D-spaces concerns the resolution of compounds into individual 3D-grid positions after assigning molecules to a 3D-grid point in a 300 × 300 × 300 box covering the range of (PC1, PC2, PC3) values. In the original multidimensional property spaces an excellent resolution is obtained for DrugBank in the sense that almost all DrugBank molecules are encoded by a unique fingerprint bit value combination. This resolution is largely preserved upon PCA and assignment to the 3D-grid, as can be judged by the fact that the percentage of molecules appearing in singly occupied 3D-grid points is comparable to the percentage of molecules having a unique fingerprint bit value combination. The 3D-space is clearly superior in this respect to the 2D-map, where compounds are assigned to 2D-pixels in a 300 × 300 square covering the range of (PC1, PC2) values. In this case significant folding occurs and only 40–60 % of the compounds appear in singly occupied 2D-pixels (Fig. 1c).
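The resolution measure used here, the percentage of molecules that sit alone in their bin, is simple to compute once every molecule has been assigned a bin key. The following Python sketch is illustrative only; the function name is hypothetical, and a key can be a fingerprint value tuple, a 3D-grid index triple or a 2D-pixel index pair.

```python
from collections import Counter


def percent_singly_occupied(keys):
    """Percentage of molecules whose bin contains no other molecule.

    keys holds one hashable bin key per molecule.
    """
    occupancy = Counter(keys)
    singles = sum(1 for key in keys if occupancy[key] == 1)
    return 100.0 * singles / len(keys)


# Toy data: four molecules that are resolved on a 3D-grid ...
cells_3d = [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 2, 2)]
# ... but partly fold onto each other when the third index is dropped.
pixels_2d = [(x, y) for x, y, _ in cells_3d]
```

With these toy keys, `percent_singly_occupied(cells_3d)` gives 100 % while `percent_singly_occupied(pixels_2d)` gives 25 %, which mirrors the folding effect seen when going from the 3D-space to the 2D-map.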

As an additional noticeable feature, the 3D-representations of the various property spaces present DrugBank in an intuitively logical spatial organization which can be visualized by color-coding each grid point with a selected property value. As illustrated by screenshots taken from the web application webDrugCS (details discussed below), striking features include for example parallel stripes grouping compounds of increasing ring count in the MQN 3D-space (Fig. 2a), the separation of molecules according to their number of aromatic carbon atoms in the SMIfp 3D-space (Fig. 2b) and according to their rotatable bond count in the APfp 3D-space (Fig. 2c), and the global separation of the Sfp 3D-space according to the fraction of aromatic atoms (Fig. 2d).

Fig. 2

Color-coded 3D-spaces of the DrugBank chemical space obtained by taking snapshots from the webDrugCS website (www.gdb.unibe.ch). The color changes in the range blue → cyan → green → yellow → red → magenta with increasing property value. a MQN 3D-space color coded by ring count, shown with open control panel. b SMIfp 3D-space color coded by the number of aromatic carbon atoms. c APfp 3D-space color coded by rotatable bond count. d Sfp 3D-space color coded by the fraction of aromatic atoms. The molecule shown in the viewer window is located at the mouse-over grid point, which is marked as a white sphere in the image

webDrugCS

webDrugCS (www.gdb.unibe.ch) is an online application for interactive visualization and exploration of DrugBank in color-coded 3D property spaces. The application works on computers, tablets and phones. The starting page of webDrugCS (Fig. 3a) provides two options: (1) Selection of molecular fingerprint: choose between the MQN, SMIfp, APfp, Xfp and Sfp fingerprint 3D-spaces by clicking the corresponding field, which opens a new browser tab. (2) External chemical library: the user can input up to 1000 additional molecules in SMILES format, which will be displayed together with DrugBank in any of the selected 3D-spaces. Each line in the text box must represent an individual molecule as SMILES followed by a space and its name or tag. External molecules are shown by default as dark violet grid points.
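The input format for the external library (one molecule per line, SMILES followed by a space and a name or tag, at most 1000 molecules) can be prepared or checked before pasting with a short script. The Python sketch below is a hypothetical helper and not part of webDrugCS; it only checks the line layout and the molecule count, and does not validate the SMILES strings themselves.

```python
MAX_MOLECULES = 1000  # upper limit stated for the external library option


def parse_external_library(text, limit=MAX_MOLECULES):
    """Split pasted text into (smiles, name) pairs, one molecule per line.

    Everything up to the first space is taken as the SMILES string and the
    remainder of the line as the name or tag. Blank lines are skipped.
    """
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        smiles, _, name = line.partition(" ")
        entries.append((smiles, name.strip()))
    if len(entries) > limit:
        raise ValueError(f"{len(entries)} molecules given, limit is {limit}")
    return entries
```

For example, the text `CCO ethanol` on one line and `CC(=O)O acetic acid` on the next parses to the pairs `("CCO", "ethanol")` and `("CC(=O)O", "acetic acid")`.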

Fig. 3

The webDrugCS website and its functionalities. a Starting page of webDrugCS. The MQN, SMIfp, APfp, Xfp and Sfp 3D-spaces of the DrugBank database can be accessed by clicking on the respective buttons. A list of molecules to be mapped on any 3D-space of the DrugBank database can be entered (format: SMILES) into the text box provided in the lower part of the page. See main text for the exact input format and details. b Interactive visualization window for the MQN 3D-space obtained by clicking the button corresponding to MQN on the starting page. The 3D-space is shown with color coding using HBA atom count. On mouse over, the panel at top left displays the molecule at the corresponding grid point. The example shown is cyproheptadine. c The DrugBank page for cyproheptadine, obtained by clicking the drug code displayed in the left panel in b. d Multifingerprint browser window for DrugBank with cyproheptadine as query, obtained by clicking the “Link to browser” option in the control panel (top right panel in b). e Results window displaying the MQN-nearest neighbors of the query cyproheptadine in DrugBank

The graphical user interface (GUI) of the interactive visualization window is exemplified here with the MQN 3D-space. The GUI consists of a main panel, a molecule view panel, and a control panel. The main panel occupies the entire screen area and displays the 3D-space (Fig. 3b). Each point in the 3D-space is represented as a sphere whose displayed size depends on its distance to the camera. The view angle is rotated by dragging the mouse with the left button pressed, and the mouse wheel controls zooming in and out.

The view panel is positioned at upper left and shows the structural formula and DrugBank ID of the molecule at the current mouse-over 3D-grid point. Upon selecting a grid point by double click, one can then link to the molecule page at the DrugBank webpage by clicking on the DrugBank ID displayed below the structural formula (Fig. 3c), or access a similarity browser to search for nearest neighbours in the original high-dimensional fingerprint space via the control panel (Fig. 3d/e).

The control panel at top right lists options to change the 3D-space view. Lines 1–3: select a color code according to a descriptor, or a single color code for DrugBank and the uploaded molecule list. Line 4: display the reference 3D-axes. Line 5: hide the DrugBank grid points, leaving only the molecules uploaded by the user as visible points. Line 6: change the 3D-grid point sphere size. Line 7: set the currently selected 3D-grid point as reference pivot point for the 3D-space (after selecting a grid point by double click). Line 8: Reset the view to the default entry view. Line 9: Link to the fingerprint similarity browser, which opens as an additional tab. This browser allows one to perform nearest neighbour searches in DrugBank in any of the five original high-dimensional fingerprint spaces. The browser is built in the same manner as our recently reported ChEMBL similarity browser [32]. Line 10: help function listing the different options.
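A nearest neighbour search of the kind offered by the similarity browser can be pictured with the following Python sketch. It is illustrative only: the function name and the dictionary layout are hypothetical, and Euclidean distance is an assumption (it is the distance used for the analysis in Fig. 1b), since the metric used by the browser is not specified here.

```python
import heapq
import math


def nearest_neighbours(query, database, k=3):
    """Return the k (distance, molecule_id) pairs closest to the query.

    database maps a molecule identifier to its fingerprint vector, which
    must have the same length as the query vector.
    """
    scored = ((math.dist(query, fp), mol_id) for mol_id, fp in database.items())
    return heapq.nsmallest(k, scored)
```

The search runs in the original high-dimensional fingerprint space rather than on the 3D-grid, so two molecules that share a grid point after projection can still be ranked against each other.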

The external chemical library option in the entry panel (Fig. 3a) is illustrated here for mapping 20 drugs from DrugBank annotated in ChEMBL as β1-adrenergic receptor antagonists. These typical drugs contain a short aliphatic amine or aminoalcohol connected to a mono- or bicyclic aromatic nucleus. Due to their comparable overall composition, molecular shape, pharmacophore and substructural elements, these 20 drugs form a relatively tight group in each of the five property spaces in webDrugCS (Fig. 4). In general, series of structurally related molecules appear grouped in the various 3D-spaces available with webDrugCS. Note that the option “hideDB” in the control panel allows one to remove the DrugBank compounds, which leaves only the external library as visible points.

Fig. 4

Mapping of 20 DrugBank compounds annotated with β1-adrenergic receptor antagonist activity in ChEMBL using the external library option in webDrugCS. The 20 external compounds are shown as white dots, overlaid on DrugBank shown in color-coded representation. Heavy atom count color coding was used for the MQN map (a), SMIfp map (b), APfp map (c) and Xfp map (d), while the N–C=C substructure count is used for Sfp (e). Four of the 20 selected drugs are shown in e

Conclusion

webDrugCS represents the first online application for visualizing DrugBank in five different 3D property spaces on computers, tablets or phones. In contrast to other database exploration tools, webDrugCS can be used for curiosity-driven exploration independently of specific queries, and is particularly suitable for rapidly gaining an overview of the structures of drug molecules. While the present web-based application is currently limited to displaying a few thousand points, the method might be applicable to displaying larger databases of millions of molecules if significant coding progress can be made.

Methods

Databases

The DrugBank database was downloaded in SDF format from http://www.drugbank.ca/. Molecules were processed by checking for valency errors, removing counter ions and adjusting their ionization state to pH 7.4, using an in-house Java program built on the Java chemistry library JChem from ChemAxon, Pvt. Ltd. Duplicates and molecules larger than 50 heavy atoms were removed from the database.

Fingerprints

The calculation of the MQN, SMIfp, APfp and Xfp fingerprints is discussed in detail in the respective publications from our group. Fingerprints were calculated as described previously using plugins provided in the JChem chemistry library.

Principal component analysis

The PCA for each database was performed using an in-house Java program utilizing some of the mathematical functions available in JSci (A science API for Java: http://jsci.sourceforge.net/). The Java source code is based on the tutorial of Lindsay I. Smith (http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf).

3D-space and color coding

The PC-1, PC-2 and PC-3 values were calculated for each molecule in the database. The largest (PCmax) and smallest (PCmin) values found among all PC-1, PC-2 and PC-3 values were used to define the value range ΔPC = PCmax − PCmin and to set the binning scale as ΔPC/300. The PC-1, PC-2 and PC-3 values were binned onto a 300 × 300 × 300 3D-grid using the same absolute bin size on the PC-1, PC-2 and PC-3 axes. Each molecule was assigned to a point on this 3D-grid. The Hue–Saturation–Lightness (HSL) color space was used for color coding, setting the hue value according to the average value of the selected molecular property across all molecules residing at that grid point, and the saturation according to the standard deviation of that value across all molecules within ±5 grid points in each direction. As a result the color change blue–cyan–green–yellow–red–magenta shows an increasing average property value in a grid point, and loss of saturation towards grey indicates a strong gradient of the value in the vicinity.
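The binning and color coding described above can be sketched in Python as follows. Several details are assumptions made for this illustration and are not taken from the webDrugCS code: the index formula (floor of the offset divided by the bin size, with the maximum value clamped into the last bin), the linear hue ramp from 240° (blue) down through 0° (red) to 300° (magenta), the fixed lightness of 0.5, and the linear loss of saturation with standard deviation scaled by a hypothetical reference value `std_ref`.

```python
import colorsys
import statistics

GRID = 300  # grid points per axis


def grid_indices(scores, grid=GRID):
    """Bin (PC-1, PC-2, PC-3) triples using one common bin size for all axes."""
    values = [value for row in scores for value in row]
    pc_min, pc_max = min(values), max(values)
    bin_size = (pc_max - pc_min) / grid  # the binning scale, delta PC / 300
    if bin_size == 0:  # degenerate case: all values identical
        return [(0, 0, 0) for _ in scores]
    return [tuple(min(int((value - pc_min) / bin_size), grid - 1) for value in row)
            for row in scores]


def hue_fraction(t):
    """Map t in [0, 1] to a hue running blue, cyan, green, yellow, red, magenta."""
    return ((240.0 - 300.0 * t) % 360.0) / 360.0


def cell_color(cell_values, neighbourhood_values, v_min, v_max, std_ref):
    """RGB color of one grid point.

    cell_values: property values of the molecules at this grid point.
    neighbourhood_values: property values of all molecules within the
    surrounding window (e.g. +/- 5 grid points in each direction).
    """
    t = (statistics.fmean(cell_values) - v_min) / (v_max - v_min)
    spread = statistics.pstdev(neighbourhood_values)
    saturation = max(0.0, 1.0 - spread / std_ref)  # assumed linear fall-off
    # colorsys takes its arguments in hue, lightness, saturation order.
    return colorsys.hls_to_rgb(hue_fraction(t), 0.5, saturation)
```

Because a single bin size is derived from the overall range of all three PCs, an axis with a narrower value range occupies only part of its 300 grid positions, which preserves the relative proportions of the three axes.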

webDrugCS

The core 3D-rendering and visualization part of webDrugCS is built on Three.js (http://threejs.org/), an open-source JavaScript library/API to create and display animated 3D computer graphics in a web browser. Three.js uses WebGL and runs across various browsers without the need for any additional plugins. webDrugCS has been successfully tested on the IE, Chrome and Opera browsers. The only requirement for webDrugCS is to have JavaScript enabled in the web browser. The source code of the webDrugCS visualizer is available for download at https://github.com/mahendra-awale/webDrugCS.