Introduction

Approximately 3 % of testosterone (T), the most abundant androgen, is free in serum and biologically active [1]. In prostatic stromal and basal cells, T is converted to 5\(\upalpha \)-dihydrotestosterone (DHT) by the 5\(\upalpha \)-reductase (5aR) enzyme through the irreversible reduction of the \(\Delta ^{4}\) bond [2]. DHT is the preferred ligand for the androgen receptor (AR) [3]. Once the DHT–AR complex has formed, the activated AR undergoes dimerization, phosphorylation, and translocation to the nucleus. Then, it activates the transcription of certain genes, including those for transcription co-regulators and transcriptional machinery, which triggers the synthesis of specific proteins and cell proliferation [4]. Elevated levels of DHT are related to benign prostatic hyperplasia (BPH) and prostate cancer (PCa), two of the most common diseases in men [5, 6].

Due to the crucial role of 5aR in the progression of prostate disease, various prostatic 5aR inhibitors have been developed in an effort to block the conversion of T and inhibit the effects of DHT. Currently, a wide range of azasteroids has been reported, such as finasteride (FIDE), a drug broadly used for BPH treatment [7]. However, the side effects of FIDE [8, 9] have encouraged the development of new steroidal compounds with improved selectivity toward 5aR.

Over the past several years, the research group of Bratoeff has reported a large set of pregnane and androstane molecules as inhibitors of 5aR. Most of these steroidal compounds have shown a potent inhibitory effect on the prostatic 5aR enzyme. Moreover, in in vivo experiments, several of these compounds reduced the weight of the prostate in gonadectomized hamsters more than FIDE did [10–24]. However, few studies have reported a computational analysis of the structure–activity relationships (SARs) of pregnane and androstane molecules. This is in part due to the lack of experimental three-dimensional structures of the 5aR enzyme. Recently, self-organizing molecular field analysis (SOMFA) [25], a 3D-QSAR technique, was employed to explore the SAR of a set of pregnane and androstane derivatives [26] and different azasteroids [27]. In those studies, the master grid for various SOMFA models indicated that bulky groups around C-3, C-6, and C-17 of the steroidal skeleton are associated with favorable interactions, whereas electronegative groups around C-3, C-6, and C-17 as well as electropositive groups at C-4 are responsible for the observed variations in 5aR inhibition. However, the drawbacks of 3D-QSAR methods, such as molecular alignment and the selection of bioactive conformations, are well known and make this technique challenging to implement [28]. Therefore, it is useful to apply additional computational strategies based on a fast and robust method to further advance the understanding of the SAR and structure–property relationships of 5aR inhibitors.

Herein, we report a comprehensive SAR study of 54 5aR inhibitors (53 novel compounds plus FIDE) using the concept of activity landscape modeling (ALM) [29, 30]. ALM has emerged as an approach to rapidly navigate through the SAR of datasets [29, 31]. Of note, a number of ALM methodologies are ligand-based only, so no data on the three-dimensional structure of the receptor or prior knowledge of the putative binding site are required. Indeed, ligand-based methods are particularly useful when no detailed structural information on the putative ligand–target interactions is available or when there is high uncertainty in the bioactive conformations [32]. ALM emphasizes the identification of compounds with very similar chemical structures but large potency differences, i.e., activity cliffs [33]. ALM is part of the broader concept of ‘property landscape modeling’ used in chemistry [34]. Here, ‘activity’ (typically referring to biological activity) is a specific case of ‘property.’ In this context, ALM does not necessarily assume that all compounds have exactly the same mechanism of action, as other typical quantitative SAR methods such as QSAR do. Indeed, ALM has proved useful to gain insights from datasets where the biological activity has been tested in cell-based assays (where the precise mechanism of action is actually unknown) [35]. This is because, in contrast to traditional QSAR analysis such as 3D-QSAR, ALM does not assume continuous SARs [36, 37].

Methods

Dataset

The chemical structures and biological activity of 53 in-house pregnane and androstene molecules were retrieved from the literature. The activity data for all compounds were obtained in vitro in enzymatic inhibition assays using the same experimental conditions. The experimental \(\mathrm{IC}_{50}\) values against the prostatic 5aR were converted to \(\mathrm{pIC}_{50}\) (\(-\log \mathrm{IC}_{50}\)) values. The \(\mathrm{pIC}_{50}\) values ranged from 5.00 to 10.60. The chemical structure of FIDE was retrieved from the ZINC database [38].
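
The conversion can be illustrated with a minimal Python sketch. It assumes that \(\mathrm{IC}_{50}\) is expressed in mol/L before taking the base-10 logarithm, which is consistent with the value pairs quoted in this paper (for example, 0.063 nM and 10.20 for EB-13). The function name `pic50_from_ic50_nm` is hypothetical and is not part of the original workflow.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Assumption: the logarithm is base 10 and IC50 is taken in mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_nm * 1e-9)


# IC50 values (nM) quoted in the text for three compounds of the dataset
for name, ic50_nm in [("EB-13", 0.063), ("EB-37", 0.025), ("EB-34", 110.0)]:
    print(name, round(pic50_from_ic50_nm(ic50_nm), 2))
```

Rounded to two decimals, the sketch reproduces the \(\mathrm{pIC}_{50}\) values reported for these compounds (10.20, 10.60, and 6.96).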

Chemical space

A visual representation of the chemical space [39] was generated by conducting a principal components analysis (PCA) of the similarity matrix of the dataset with 54 molecules. This is a well-established method to generate visual representations of chemical spaces [40]. The similarity matrix was calculated using the extended connectivity fingerprints (ECFPs) available in MayaChemTools (http://www.mayachemtools.org) with a neighborhood radius of two [41]. ECFPs have been used to perform activity landscape analysis with interpretable results [32, 42]. The structural similarity was computed with the Tanimoto coefficient (Tc) [43]. The PCA was performed with the FactoMineR R package version 1.29. PCA of the similarity matrix has been used as a strategy to visualize the chemical space of several datasets [43, 44].
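
As an illustration of the similarity step, the following Python sketch computes a pairwise Tanimoto matrix from fingerprints represented as sets of feature identifiers. It is not the MayaChemTools or FactoMineR workflow used in this study: the feature sets are hypothetical toy data, and the helper names `tanimoto` and `similarity_matrix` are assumptions introduced here.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tc = |A intersect B| / |A union B| for two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(fingerprints):
    """Return the symmetric n x n Tanimoto matrix (diagonal set to 1.0)."""
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return matrix


# Hypothetical feature identifiers standing in for ECFP features of 3 molecules
fps = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}), frozenset({7, 8})]
sim = similarity_matrix(fps)
for row in sim:
    print(row)
```

Each row of such a matrix describes one molecule by its similarity to all the others, and these rows are the input on which the PCA is run.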

Activity landscape

Several methods have been developed for ALM [29]. In this work, the activity landscape of the 5aR inhibitors was studied by means of structure–activity similarity (SAS) maps [45], which have been extensively employed to characterize the SAR of datasets with activity data for one or more biological endpoints [34, 35, 46, 47]. A detailed description of the construction of SAS maps is provided elsewhere [48]. In summary, the relationship between structural similarity and potency difference (or potency similarity) can be represented in a two-dimensional graph called a SAS map. Usually, the structural similarity (which can be obtained with any single similarity approach or a combination of them) is plotted on the x axis and the potency difference is plotted on the y axis [49]. SAS maps can be roughly divided into four zones. The most significant one explored in this work was the upper-right region of the plot, which identifies pairs of molecules with high structure similarity but large potency difference. Therefore, this zone identified activity cliffs and was considered the ‘activity cliff region’ of the SAS maps. In order to define the four zones of the SAS maps, we used two thresholds following established criteria [48]: one standard deviation above the mean structure similarity (\(\hbox {ECFPs/Tanimoto}=0.43\)) along the x axis, and one standard deviation above the mean activity difference (\(\Delta \mathrm{pIC}_{50} = 2.75\)) along the y axis. Since the visual interpretation of SAS maps can be difficult if many data points are present, we implemented the recently developed density SAS maps [32]. In a density SAS map, the frequency of the data points is represented with a continuous color scale as detailed elsewhere [32].
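
The construction of the SAS map coordinates and of the activity cliff region can be sketched in Python as follows. The compound labels, \(\mathrm{pIC}_{50}\) values, and similarities are hypothetical toy data, and the helper names are assumptions. Whether the original thresholds were computed with the population or the sample standard deviation is not stated, so the use of `pstdev` here is also an assumption.

```python
from itertools import combinations
from statistics import mean, pstdev


def sas_points(pic50, sim):
    """Return one (name_a, name_b, similarity, |delta pIC50|) tuple per pair."""
    points = []
    for a, b in combinations(sorted(pic50), 2):
        points.append((a, b, sim[(a, b)], abs(pic50[a] - pic50[b])))
    return points


def mean_plus_sd(values):
    """Threshold defined as the mean plus one (population) standard deviation."""
    return mean(values) + pstdev(values)


def cutoffs(points):
    """Data-driven cut-offs for the x axis (similarity) and y axis (potency)."""
    return (mean_plus_sd([p[2] for p in points]),
            mean_plus_sd([p[3] for p in points]))


def cliff_region(points, sim_cut, diff_cut):
    """Pairs in the upper-right zone: high similarity, large potency difference."""
    return [(a, b) for a, b, s, d in points if s > sim_cut and d > diff_cut]


# Hypothetical toy data: four compounds and their pairwise Tanimoto similarity
pic50 = {"M1": 10.2, "M2": 7.1, "M3": 7.0, "M4": 5.5}
sim = {("M1", "M2"): 0.55, ("M1", "M3"): 0.20, ("M1", "M4"): 0.15,
       ("M2", "M3"): 0.60, ("M2", "M4"): 0.25, ("M3", "M4"): 0.30}

points = sas_points(pic50, sim)
# Apply the cut-offs reported in the text (Tc = 0.43, delta pIC50 = 2.75)
cliffs = cliff_region(points, sim_cut=0.43, diff_cut=2.75)
print(cliffs)
```

In this toy example, only the pair M1/M2 combines a similarity above 0.43 with a potency difference above 2.75, so it is the only pair assigned to the activity cliff region. For a real dataset, `cutoffs(points)` would supply the thresholds instead of the fixed values.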

Activity cliff generators

An activity cliff generator is a molecule frequently found in activity cliffs [50]. In medicinal chemistry, the most attractive activity cliff generators are compounds with high biological activity. The identification of activity cliff generators and of compounds with similar chemical structure but large potency difference is particularly relevant because it points to specific structural features with a significant impact on the biological activity. Herein, the activity cliff generators were identified as molecules with high frequency in the ‘activity cliff region’ of the SAS maps.
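
Counting how often each molecule appears in the activity cliff region can be sketched in a few lines of Python. The pair list below is hypothetical toy data, and the function name `cliff_generator_counts` is an assumption.

```python
from collections import Counter


def cliff_generator_counts(cliff_pairs):
    """Count how many activity cliffs each compound takes part in."""
    counts = Counter()
    for a, b in cliff_pairs:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical pairs located in the activity cliff region of a SAS map
cliff_pairs = [("M1", "M2"), ("M1", "M3"), ("M1", "M4"), ("M5", "M6")]
counts = cliff_generator_counts(cliff_pairs)

# Compounds sorted by decreasing frequency; the top entries are the
# candidate activity cliff generators
for name, frequency in counts.most_common():
    print(name, frequency)
```

Here M1 takes part in three cliffs and would be flagged as the generator. In practice the frequency is read together with the potency of the compound, since highly active generators are the most attractive ones.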

Results and discussion

Since the activity landscape of a dataset is the association between the chemical space and the structure representation [42], the “Results and discussion” section of the paper is organized in two parts: (1) exploration of the chemical space of the dataset, followed by (2) activity landscape analysis. The analysis of the chemical space provides information on the chemical diversity of the dataset, while the second part discusses the pairwise relationships between chemical structure and biological activity.

Fig. 1 Visual representation of the chemical space of the 54 molecules in the dataset. The PCA shows the relative position of finasteride (FIDE). The visualization was obtained by principal component analysis of the similarity matrix computed with ECFPs. Data points are colored by the \(\mathrm{pIC}_{50}\) values in a continuous scale. Five major groups or clusters (A–E) are readily distinguished (see also Table 1)

Chemical space

Figure 1 shows a visual representation of the chemical space of the 53 pregnane and androstene molecules and FIDE. In this plot, each data point represents one compound. Data points are colored by the corresponding \(\mathrm{pIC}_{50}\) value using a continuous scale indicated in the figure. The relative position of FIDE and two representative compounds (activity cliff generators EB-13 and EB-37, discussed below) is shown. The visual representation of chemical space in Fig. 1 is an approach to rapidly classify the compounds in the dataset. Such visual classification provides a first and rapid idea of the different groups (or clusters) of compounds that can be found in the set of 53 pregnane and androstene molecules. Furthermore, adding activity data to the plot enables the identification of regions in chemical space with, for example, smooth SAR or groups of compounds enriched with activity. The distribution in chemical space distinguishes several groups, both by visual inspection and by k-means analysis (data not shown). The most significant activity cliff generators, i.e., EB-13 and EB-37 (cf. chemical structures in Figs. 5 and 6, respectively), are included in two different groups. Table 1 contains the main clusters that can be distinguished and the compounds present in each group. The chemical structures are given in the Supporting Information.

Cluster A is composed of 6-halopregna-4,6-diene-3,20-dione, pregna-4-en-3,20-dione, and methylenepregna-4,6-diene-3,20-dione skeletons having different aliphatic and aromatic esters as well as aromatic carbamates at \({\text {C-}}17\upalpha .\) Some of these compounds have an alpha epoxide group at C-4, C-5 or C-6. Cluster B includes pregna-4,16-diene-6,20-dione derivatives with aliphatic and aromatic esters at the \({\text {C-}}3\upbeta \) position; the 4-azasteroid FIDE is also present in this cluster. Cluster C contains the modified D-homo lactone androstane and androstane skeletons having different aliphatic esters at \({\text {C-}}3\upbeta .\) Interestingly, Cluster D has the same skeletons as Cluster C, but it differs in the presence of a 5\(\upalpha ,6\upbeta \)-dibromo group. Finally, Cluster E contains the 17\(\upalpha \)-methylpregna-4,6-diene-3,20-dione, 17\(\upalpha \)-methylpregna-1,4,6-triene-3,20-dione, pregna-4,6-diene-3,20-dione, and 17\(\upbeta \)-methyl-16\(\upbeta \)-phenyl-D-homoandrost-4,6-diene-3,17a-dione skeletons having different ester groups at the \({\text {C-}}17\upalpha \) position. Of note, the ECFPs were able to distinguish the structures even though most of the compounds in the dataset have a steroid skeleton. This is due to the high resolution of this fingerprint-based structure representation [51].

Table 1 Major clusters identified in the visual representation of the chemical space in Fig. 1

Fig. 2 SAS map for the 54 molecules in the dataset. Each data point (1431 total) represents a pairwise comparison. a Relative position of the 1431 pairwise structure and potency difference comparisons. b Activity SAS map identifying the data points by the categorical activity of the molecules in each pair. ‘AA’ represents pairs of active molecules, ‘AI’ pairs with one active and one inactive molecule, and ‘II’ pairs in which both compounds are inactive (see text for details). Activity cliffs are located in the upper-right region of the plot, herein defined using established criteria [45]. Representative activity cliffs are labeled EB-13/EB-46 (‘x’) and EB-37/EB-38 (‘y’)

Fig. 3 Density SAS map of paired comparisons plotting structure similarity versus potency difference. The map is colored by the frequency of data points in the coordinates given

Fig. 4 Activity cliff generators: bar graph of compounds in the cliff region of the SAS map. The bars are colored by the relative biological activity (\(\mathrm{pIC}_{50}\) values) of the compounds with respect to the entire set: red for compounds with an activity more than one standard deviation above the mean, blue for compounds with an activity below the mean, and gray for compounds with an activity between the mean and one standard deviation above it. Compounds EB-13 and EB-37 are the most active activity cliff generators

Activity landscape analysis

Figure 2a depicts a typical SAS map for this dataset. Each point in the SAS map represents a paired comparison (there are 1431 possible pairs for this dataset of 54 compounds). Importantly, it can be seen that at higher similarity values (x axis), the range of the difference in activity decreases (y axis). This feature is in overall agreement with a continuous SAR, i.e., as two compounds become structurally more similar, their activity values also become more similar. This result is further illustrated in panel b. In the SAS map of Fig. 2b, points are distinguished using a categorical classification of the biological activity taking as reference the \(\mathrm{pIC}_{50}\) of FIDE: a compound was defined as ‘active’ in this figure if \(\mathrm{pIC}_{50} \ge 8.0\). Thus, in Fig. 2b, data points labeled ‘AA’ are pairs where both compounds are active. Data points labeled ‘AI’ contain one active and one inactive molecule. Finally, data points in gray (II pairs) contain two inactive compounds. Figure 2b clearly shows that the proportion of red (AA) points increases as the structure similarity increases. This result highlights the proficiency of the chemical representation used and also suggests a smooth SAR of the dataset.
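
The number of pairs and the categorical labels can be checked with a short Python sketch. The cut-off of 8.0 is the one stated above, while the function name `pair_class` is an assumption introduced for illustration.

```python
from math import comb


def pair_class(pic50_a: float, pic50_b: float, active_cutoff: float = 8.0) -> str:
    """Label a compound pair as 'AA', 'AI' or 'II' using a pIC50 cut-off."""
    n_active = (pic50_a >= active_cutoff) + (pic50_b >= active_cutoff)
    return {2: "AA", 1: "AI", 0: "II"}[n_active]


# Number of unordered pairs among 54 compounds: 54 * 53 / 2
n_pairs = comb(54, 2)
print(n_pairs)

# pIC50 values quoted in the text for EB-13, EB-37 and EB-34
print(pair_class(10.20, 10.60))
print(pair_class(10.20, 6.96))
```

The first call returns ‘AA’ because both values are at or above 8.0, and the second returns ‘AI’ because only one of them is.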

In the SAS map of Fig. 2a, there are several pairs of compounds with both low structure similarity and low activity difference (lower-left quadrant). These are called ‘similarity cliffs’ [52] or scaffold hops. The relative amount of data points in the similarity cliff region can be visually analyzed in the density SAS map of Fig. 3. Note that the density SAS map differs from the SAS maps in Fig. 2 in that Fig. 3 focuses on the counts of data points in each region of the map. Indeed, density SAS maps have been recently developed as complementary tools to alleviate the issue of crowded SAS maps that may be difficult to interpret [53].

Figure 2a, b show that there are few but notable data points in the upper-right zone of the plot, that is, pairs of compounds with high structure similarity but large potency difference. These exceptions to the continuous SAR are the activity cliffs and are discussed in the next section. Of note, activity cliffs are not systematically identified in 3D-QSAR studies such as SOMFA (vide supra). The identification and interpretation of the activity cliffs is an outcome of particular significance of the ALM study reported in this work.

Activity cliff generators: identification

Figure 4 shows the distribution of molecules in the activity cliff region of the SAS map. The most frequent compounds with high activity were EB-13 (\(\mathrm{pIC}_{50} = 10.20;\, \mathrm{IC}_{50} = 0.063\,\mathrm{nM}\)) and EB-37 (\(\mathrm{pIC}_{50} =10.60;\,\mathrm{IC}_{50} = 0.025\,\mathrm{nM}\)) with frequencies of five and three, respectively. In other words, EB-13 forms five activity cliffs and EB-37 forms three activity cliffs (see discussion below). Figure 4 also shows that there were additional activity cliff generators with high activity (red bars) but lower frequencies, such as EB-32, EB-33, and EB-42 (frequency of one). Finally, there were compounds with relatively low \(\mathrm{pIC}_{50}\) values (lower than the median \(\mathrm{pIC}_{50}\) value of the entire dataset) that form several activity cliffs. One example was EB-34 (\(\mathrm{pIC}_{50} = 6.96;\, \mathrm{IC}_{50} = 110\,\mathrm{nM}\)), which forms four activity cliffs (Fig. 4). The SAR of the two most prominent activity cliff generators is discussed in the next section.

Fig. 5 SAR of the activity cliff generator EB-13

Activity cliff generators: interpretation of the SAR

As discussed above, analysis of the chemical space and activity landscape of the dataset of inhibitors of 5aR led to the rapid identification of two activity cliff generators (EB-13 and EB-37) that occupy different regions in the chemical space of this dataset. In this section, we discuss an interpretation of the SAR of each of the two generators.

SAR of activity cliff generator EB-13

Figure 5 shows the chemical structure, \(\mathrm{pIC}_{50}\) value, and structural similarity (Tc/ECFPs) values of the five compounds forming activity cliffs with EB-13, which is the 17\(\upalpha \)-acetoxy-16-methylpregna-4,6-diene-3,20-dione derivative (\(\mathrm{pIC}_{50} = 10.20\)). In this figure, the standard numbering for the molecular scaffold is shown for EB-13. All compounds in this figure have a potency difference \(\Delta \mathrm{pIC}_{50}\,{>}2\) with EB-13. The position in the SAS maps of a representative activity cliff, the compound pair EB-13/EB-46, is shown in Fig. 2 (data point labeled with an ‘x’). Comparison of the structure and activity of the activity cliff EB-13/EB-46 (Fig. 5) led to the remarkable finding that the 17\(\upbeta \)-methyl group in EB-13 can drastically change the inhibitory potency, by three log units.

Overall, the molecules structurally related to the activity cliff generator EB-13 are pregnane derivatives with three conjugated double bonds and different esters attached to the \({\text {C-}}17\upalpha \) position (Fig. 5). The inhibitory effect of these compounds has been explained by a plausible mechanism in which the inhibition of the 5aR enzyme is based on a Michael-type addition of the enzyme to the steroidal enone, dienone, or trienone to form irreversible adducts [54]. For EB-46 and EB-53, the different \(\mathrm{pIC}_{50}\) values could be explained by the number of unsaturated bonds in the steroidal skeleton. We can infer that the trienone in EB-53 has more resonance stabilization than EB-46 and is thus less susceptible to Michael addition, which would make EB-53 the least active molecule in Fig. 5.

Fig. 6 SAR of the activity cliff generator EB-37

An additional intriguing finding that emerged from the analysis of the activity cliff generator in Fig. 5 is that the steroidal compounds most similar to EB-13 (i.e., with Tc \(>\)0.40) have in common two double bonds, between C-4 and C-5 and between C-6 and C-7. Consequently, the changes in activity can be attributed to the \({\text {C-}}17{\upalpha }\) substituent. In general, a bulky ester group at \({\text {C-}}17{\upalpha }\) decreases the activity for this series. However, comparing the \(\mathrm{pIC}_{50}\) value of EB-6 with those of EB-7 and EB-9, it can be concluded that the p-substituted aromatic esters slightly increase the potency.

SAR of activity cliff generator EB-37

Figure 6 illustrates the three compounds forming activity cliffs with EB-37 (the standard numbering for the molecular scaffold is shown for EB-37). All compounds in this figure have a potency difference \(\Delta \mathrm{pIC}_{50}\,{>}3\) with EB-37, which is \(5\upalpha ,6\upbeta \)-dibromo-17a-oxa-D-homoandrostane-\(3\upbeta \)-yl-3\(^{\prime }\)-oxahexanoate (\(\mathrm{pIC}_{50} = 10.60\)). The position in the SAS map of the representative activity cliff EB-37/EB-38 is shown in Fig. 2 (labeled with a ‘y’). The structural difference in this activity cliff is the side chain at \({\text {C-}}3\upbeta \), which suggests that the more lipophilic character of the hexanoate moiety in EB-37 versus the ethoxyacetate in EB-38 favors the interaction with 5aR.

All molecules in Fig. 6 lack an unsaturated ketone (in contrast to EB-13 and all compounds in Fig. 5). This observation is consistent with the different relative positions of EB-13 and EB-37 in chemical space (Fig. 1). Such a structural difference can be related to a different mechanism of action. For example, the electrophilic C-5 and C-6 positions could form irreversible adducts with nucleophilic residues and thereby promote the inhibition of the catalytic activity of the 5aR enzyme. Another suggested mechanism is that these compounds could have an alternative binding mode due to the pseudosymmetry of C-19 steroids, causing the C-3 end to be in the usual position of C-17. This behavior was previously observed for the steroid DHT in the \(17\upbeta \)-HSD1 enzyme [55]. It supports the hypothesis that the interchange of the D-ring with the A-ring in the androstene series illustrated above makes the ester moiety at \({\text {C-}}3\upbeta \) (the pseudo \({\text {C-}}17\upbeta \) position) responsible for the lipophilic interaction at the hypothetical pocket in 5aR, in a manner similar to that found for the 4-azasteroids, wherein lipophilic ketones and amides at \({\text {C-}}17\upbeta \) increase the inhibitory effect on the 5aR type II enzyme [7].

Conclusions and perspectives

The systematic characterization of the chemical space of the 53 5aR inhibitors presented here demonstrated their structural uniqueness with respect to the approved drug FIDE and distinguished different chemical classes, for example, those represented by the activity cliff generators EB-13 and EB-37. Systematic comparison of the structural similarity and potency difference of each pair of molecules in the dataset showed that, in general, as the structural similarity increases, the potency difference decreases, in overall agreement with the similarity principle. However, the dataset analyzed in this work has two chemically distinct and potent activity cliff generators, EB-13 (\(\mathrm{pIC}_{50}=10.20;\, \mathrm{IC}_{50} = 0.063\) nM) and EB-37 (\(\mathrm{pIC}_{50}=10.60;\, \mathrm{IC}_{50} = 0.025\) nM), that is, compounds with high potency against 5aR that are very similar to their analogues but show large potency differences. Although the current study is based on a relatively small set of steroidal derivatives, the findings of this investigation complement those of previous studies. The outcome of the activity landscape analysis also led to the hypothesis that the compounds may inhibit 5aR in different ways, for example, by forming irreversible adducts through a Michael-type addition or by having binding modes different from that of FIDE. The application of activity landscape analysis to identify possible different mechanisms of action or alternative binding modes is related to the concept of ‘activity landscape sweeping’ introduced recently [53]. Of note, in contrast to other quantitative approaches to analyze SARs (such as QSAR), ALM does not require that the compounds analyzed have exactly the same mechanism of action. This is because activity landscape studies do not assume continuous SARs [36, 37].

Although further experimental investigations are needed to assess this hypothesis, novel non-4-azasteroidal inhibitors can be developed in order to improve the potency and selectivity toward the prostatic 5aR enzyme and then be used for the treatment of BPH and PCa. Indeed, once the underlying SAR of the set of 53 5aR inhibitors has been explored, which was the focus of this work, the next logical steps are to design new compounds and predict their activity. As previously discussed in the literature, understanding the SAR of a data set comes before prediction [37].

Supporting information

List of the 54 compounds used in this study (53 pregnane and androstene compounds and FIDE).