Objective

UHRF1 functions as an epigenomic controller and is involved in various cellular mechanisms that lead to tumorigenesis [1]. UHRF1 has been shown to increase the activity and specificity of DNMT1 [2]. The SRA domain of UHRF1 is a DNA-binding domain that recognizes 5-methylcytosine (5mC) in hemimethylated CpG dinucleotides [3,4,5,6,7]. Because of the architecture of its 5mC-binding epitope, the SRA domain is a highly promising site for small-molecule targeting [8]. The SRA domain of UHRF1 interacts directly with DNMT1 and thereby improves access of the substrate (hemimethylated DNA) to the catalytic center of DNMT1, which increases DNA methylation activity [9]. In vitro studies have shown that UHRF1 can cause a fivefold increase in DNMT1 activity, and that the SRA domain on its own can cause a 1.9-fold increase. The interaction between UHRF1 and DNMT1 also causes a nearly twofold increase in the preferential targeting of hemimethylated DNA by DNMT1 [2]. Significantly, the expression levels of UHRF1 in healthy tissues were reported to be 5- to 70-fold lower than those of HDAC1 and DNMT1. Any adverse effects that result from inhibiting UHRF1 expression or function are therefore expected to be reasonably manageable compared with the consequences of directly inhibiting DNMT1 [10]. Preventing the interaction between the SRA domain and hemimethylated DNA with small molecules is thus a viable strategy to prevent aberrant DNA methylation [2]. Additional information about targeting the SRA domain for anti-cancer drug development was published earlier [1].

Data description

Small molecules predicted to bind to the SRA domain of UHRF1 were identified by virtual screening using Schrödinger’s Small Molecule Drug Discovery Suite. Crystal structures of UHRF1 are available in the public domain, and the structure of the SRA domain and its interaction with hemimethylated DNA have been published [3, 5, 6]. The small molecule libraries were screened against the SRA domain (PDB ID: 3DWH) [7]. The downloaded PDB structure was prepared using the Protein Preparation Wizard, which confirmed structural correctness at the start of the screening work. The Asp469 residue, which forms a hydrogen bond with the methylcytosine [6], was chosen as the active site, and a primary grid extending 10 Å from the Asp469 residue was prepared [1]. The other residues selected to define the grid were Tyr466 and Tyr478, which sandwich 5-methylcytosine, and Thr479, which is known to play a crucial role in the preferential recognition of cytosine [6].
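The geometric idea behind the grid definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by Glide's receptor grid generation: the function names, the optional chain filter, and the reading of "10 Å from Asp469" as a distance cutoff around a centroid are all hypothetical. The parser relies only on the fixed-column layout of PDB ATOM records.

```python
import math
from typing import Iterable, List, Optional, Set, Tuple

Coord = Tuple[float, float, float]

# Residues named in the text as defining the grid (numbering as in the text).
GRID_RESIDUES = {("ASP", 469), ("TYR", 466), ("TYR", 478), ("THR", 479)}


def residue_atoms(pdb_lines: Iterable[str],
                  residues: Set[Tuple[str, int]] = GRID_RESIDUES,
                  chain: Optional[str] = None) -> List[Coord]:
    """Collect coordinates of ATOM records belonging to the given residues.

    Uses the fixed columns of the PDB format: residue name in columns 18-20,
    chain ID in column 22, residue number in columns 23-26, x/y/z in 31-54.
    The chain filter is an assumption added for multi-chain files.
    """
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if chain is not None and line[21] != chain:
            continue
        key = (line[17:20].strip(), int(line[22:26]))
        if key in residues:
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def centroid(coords: List[Coord]) -> Coord:
    """Arithmetic mean of a non-empty list of coordinates."""
    if not coords:
        raise ValueError("no atoms selected")
    n = len(coords)
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n,
            sum(c[2] for c in coords) / n)


def atoms_within(coords: List[Coord], center: Coord,
                 radius: float = 10.0) -> List[Coord]:
    """Keep the coordinates that lie within `radius` angstroms of `center`."""
    return [c for c in coords if math.dist(c, center) <= radius]
```

With the lines of a locally downloaded copy of the 3DWH structure, `centroid(residue_atoms(lines))` would give one candidate grid center, and `atoms_within` would list the atoms inside the 10 Å cutoff. The actual grid used in the study was built with Schrödinger's tools.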

A personal computer with an i7-4700MQ quad-core processor and 32 GB of memory was used for this work. The small molecule libraries, supplied in SDF format, were prepared with LigPrep to generate accurate 3D molecular models for virtual screening. Epik was used to estimate pKa values consistently and to return chemically sensible structures. The compounds were filtered to eliminate reactive compounds and analyzed with QikProp to predict the ADME properties of the small molecules. The structure-based screening was performed using Schrödinger’s virtual screening workflow, which runs Glide HTVS, Glide SP, and Glide XP sequentially on the prepared compound libraries. The workflow removed 90% of the compounds at each phase, so that only the top 10% of the small molecules passed on to the next step [1].
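The effect of keeping the top 10% at each phase can be made concrete with a small calculation. The sketch below is an illustration only: the function name `funnel`, the use of floor rounding, and the assumption that the 10% cutoff also applies after Glide XP are not taken from the Schrödinger documentation.

```python
from typing import List, Sequence, Tuple

# Docking stages named in the text, in the order they are run.
STAGES = ("Glide HTVS", "Glide SP", "Glide XP")


def funnel(n_compounds: int, keep_percent: int = 10,
           stages: Sequence[str] = STAGES) -> List[Tuple[str, int, int]]:
    """Return (stage, compounds_in, compounds_kept) for each stage.

    Integer arithmetic with floor rounding is an assumption; the real
    workflow may round differently or skip compounds that fail to dock.
    """
    rows = []
    remaining = n_compounds
    for stage in stages:
        kept = remaining * keep_percent // 100
        rows.append((stage, remaining, kept))
        remaining = kept
    return rows


if __name__ == "__main__":
    # Hypothetical input size, chosen only to make the arithmetic easy to read.
    for stage, n_in, n_kept in funnel(1_000_000):
        print(f"{stage}: {n_in} in, {n_kept} kept")
```

Under these assumptions, three successive 10% cutoffs retain 0.1% of the input, so a hypothetical library of 1,000,000 prepared compounds would leave 1,000 after the final stage.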

Nearly 2.4 million small molecules were screened using the SDF files of compound libraries from ChemDiv (San Diego, CA) and Timtec (Newark, DE). The numbers in parentheses are the numbers of small molecules in each library. The Timtec libraries include the Actimol collection (127,937), HTS part I (400,000), and HTS part II (491,349). The ChemDiv libraries that were screened were Discovery Chemistry 1, 2, and 3 (350,000, 350,000, and 277,772, respectively) and New Chemistry 1 and 2 (250,000 and 206,249, respectively). The focused libraries from ChemDiv that were screened include the bromodomain (6114), cancer stem cells (19,956), 3D mimetics (9461), soluble diversity (9624), targeted diversity (46,817), and methyltransferase (11,647) libraries. These libraries were chosen to facilitate the identification of diverse drug-like molecules that are likely to interact with an anti-cancer drug target with a crucial role in epigenomic regulation.
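For bookkeeping, the library sizes listed above can be held in a small table and totaled per vendor. This is a sketch only: the dictionary layout and function names are hypothetical, and the assignment of 400,000 and 491,349 to HTS part I and part II follows the order in which they are listed. The raw sum of the listed sizes need not equal the number of molecules reported as screened, for instance if libraries share compounds; the text does not describe such a reconciliation.

```python
from typing import Dict

# Library sizes as listed in the text (vendor -> library -> number of molecules).
LIBRARIES: Dict[str, Dict[str, int]] = {
    "Timtec": {
        "Actimol": 127_937,
        "HTS part I": 400_000,
        "HTS part II": 491_349,
    },
    "ChemDiv": {
        "Discovery Chemistry 1": 350_000,
        "Discovery Chemistry 2": 350_000,
        "Discovery Chemistry 3": 277_772,
        "New Chemistry 1": 250_000,
        "New Chemistry 2": 206_249,
        "Bromodomain": 6_114,
        "Cancer stem cells": 19_956,
        "3D mimetics": 9_461,
        "Soluble diversity": 9_624,
        "Targeted diversity": 46_817,
        "Methyltransferase": 11_647,
    },
}


def vendor_totals(libraries: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the listed library sizes for each vendor."""
    return {vendor: sum(sizes.values()) for vendor, sizes in libraries.items()}


def listed_total(libraries: Dict[str, Dict[str, int]]) -> int:
    """Sum of all listed library sizes, counting shared compounds repeatedly."""
    return sum(vendor_totals(libraries).values())


if __name__ == "__main__":
    for vendor, total in vendor_totals(LIBRARIES).items():
        print(f"{vendor}: {total:,}")
    print(f"All listed libraries: {listed_total(LIBRARIES):,}")
```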

The data are available as Maestro pose viewer files output by Glide, a docking program optimized for docking accuracy and database enrichment. A pose viewer file contains a set of selected entries in Maestro in which the first entry is the protein (SRA domain) and all the other entries are poses of the docked ligands. After entering the pose viewing mode in Maestro, the ligand poses can be navigated. The output files thus provide information about the identified molecules and allow the predicted interactions with the SRA domain of UHRF1 to be visualized (Table 1).
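The entry layout described above (receptor first, ligand poses after it) suggests a simple way to handle the contents once they have been read into memory. The sketch below assumes each entry is already available as a plain Python dictionary; reading the Maestro files themselves requires Maestro or Schrödinger's own tools and is not shown. The key names `title` and `docking_score` and both function names are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Entry = Dict[str, Any]


def split_pose_viewer(entries: Sequence[Entry]) -> Tuple[Entry, List[Entry]]:
    """Split pose viewer entries into (receptor, ligand_poses).

    Relies on the convention stated in the text: the first entry is the
    protein and every later entry is a docked ligand pose.
    """
    if not entries:
        raise ValueError("pose viewer file has no entries")
    return entries[0], list(entries[1:])


def rank_poses(poses: Sequence[Entry],
               score_key: str = "docking_score") -> List[Entry]:
    """Sort poses from best to worst; more negative docking scores are better."""
    return sorted(poses, key=lambda pose: pose[score_key])


if __name__ == "__main__":
    # Toy entries with made-up scores, for illustration only.
    entries = [
        {"title": "SRA domain"},
        {"title": "ligand_a", "docking_score": -6.1},
        {"title": "ligand_b", "docking_score": -8.4},
    ]
    receptor, poses = split_pose_viewer(entries)
    print(receptor["title"], [p["title"] for p in rank_poses(poses)])
```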

Table 1 Overview of data files/data sets

Limitations

  • The present investigation is limited to the selected small molecule libraries from ChemDiv and Timtec.

  • The structure-based virtual screening was carried out using mostly the default parameters of Schrödinger’s Small Molecule Drug Discovery Suite.

  • The small molecule hits that were identified in the present study only narrow down the number of compounds that need to be evaluated initially in an in vitro assay.

  • The small molecules identified in this study have not been evaluated in a biochemical or biophysical assay. Some of the identified small molecules may not show a binding response to the SRA domain of UHRF1 in a biochemical or biophysical assay. If a successful binding interaction is detected in an in vitro assay, the molecules need to be validated further in a series of biochemical, biophysical, and cell-based assays.