INTRODUCTION

The functioning of a eukaryotic promoter is a complex process involving interactions between the cis-regulatory elements of the promoter and trans-acting factors: transcription factors (TFs), the enzymes of the transcription complex, and regulatory RNAs [1]. Epigenetic mechanisms, such as conformational changes in chromatin, DNA methylation, and histone modifications [2], as well as physicochemical processes related to phase separation [3], play a substantial role in transcription regulation. However, the overall regulation of promoter activity is based on TF binding to specific DNA sequences and on interactions between the factors.

At present, more than 1600 human TFs are known, and their number continues to grow, but the DNA binding site sequences remain unknown for many of them (more than 500 factors) [4]. Essential features of TF binding sites are their degeneracy and the cross-affinity of different TFs for the same DNA sequence [5]. As a result, transcription regulation is probabilistic and competitive.

The peculiarities of gene transcription, together with metabolic rearrangements in transformed cells, enable tumors to grow beyond the control of the surrounding tissues and the organism as a whole; therefore, the study of these processes is of theoretical and practical importance for oncology. Some of the known TFs are regarded as diagnostic markers for tumors or their metastases. For example, CDX2 is a sensitive marker of colorectal adenocarcinomas, TTF1 is a significant marker of lung adenocarcinomas, PAX8 can serve as a marker of gynecologic tumors, etc. [6]. That is why the study of TFs, in particular the presence of their recognition profiles in promoters, remains relevant.

At present, the accumulated data on the structure of promoters have already opened up great prospects for their use in practical medicine, e.g., as components of genetic engineering vectors for tumor gene therapy [7, 8]. However, choosing optimal promoters and searching for new promoters for therapeutic genetic engineering vectors remain nontrivial problems. Interest in promoters has recently been renewed by the need to design vaccine vectors for the prevention and treatment of infectious diseases and cancer [9, 10].

Previously, we cloned a number of promoters of genes that regulate cell proliferation and showed that these promoters exhibit tumor-specific activity [11, 12] (Table 1). The present work aimed to reveal the features of these promoters, as well as of chimeric promoters derived from them [13], that determine their enhanced activity in tumor cells. To this end, we compared these promoters with a set of housekeeping gene promoters with respect to the composition of known TF recognition profiles, and searched for nucleotide motifs de novo in the promoters.

Table 1. The promoters studied in the present work

RESULTS

Predominant profiles in tumor-specific promoters. We have previously shown that relatively short (several hundred nucleotides) promoters of some genes involved in the regulation of proliferation (Table 1) have tumor-specific activity [11, 12]. The most important regulatory elements of the analyzed promoters are known to be concentrated within approximately 500 bp upstream of the transcription start site (TSS) [12, 14, 15]. Therefore, from the EPDnew eukaryotic promoter database we took the seven DNA sequences with the coordinates [–499;+100] relative to the TSS that correspond to the previously studied tumor-specific promoters. Since more than one promoter is known for some of these genes, we selected only the promoters that had been studied in direct experiments (Table 1). For comparison, we used 23 housekeeping gene promoters of the same length. Using the CiiiDER software [16] and the JASPAR database of transcription factor binding profiles (http://jaspar.genereg.net/) [5], we performed an enrichment analysis comparing tumor-specific promoters with housekeeping gene promoters with respect to the composition of TF recognition profiles (Fig. S1, Supplementary Information). In this analysis, for each TF profile we compared the numbers of promoters containing it in the two groups and assessed the difference with the Mann–Whitney test. The complete results of the promoter comparison are given in Table S1 in Supplementary Information. Tables 2 and S2 (see Supplementary Information) list the TF profiles that are statistically significantly (p < 0.05) predominant in tumor-specific promoters compared to housekeeping gene promoters (hereinafter referred to as conditionally tumor-specific TFs).

Table 2. The transcription factors with the profiles predominant in tumor-specific promoters compared to the promoters of housekeeping genes
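The group comparison described above can be illustrated with a minimal sketch of the two-sided Mann–Whitney U test in plain Python. It assumes that, for a single TF profile, each promoter is summarized by one number (for example, the number of sites found, or 1/0 for presence/absence); this is our assumption for illustration and not necessarily how CiiiDER computes its statistics. The p-value uses the tie-corrected normal approximation without continuity correction; for groups as small as 7 and 23 promoters an exact test would be preferable. The counts in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected variance).

    Returns (U statistic for sample x, approximate two-sided p-value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-promoter site counts for one TF profile
tumor_specific = [2, 1, 3, 0, 2, 1, 2]     # 7 promoters
housekeeping = [0] * 18 + [1] * 5           # 23 promoters
u, p = mann_whitney_u(tumor_specific, housekeeping)
print(f"U = {u:.1f}, approximate p = {p:.4g}")
```

Repeating this for every profile in the database would, of course, call for a multiple-testing correction, which the sketch does not include.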

Among the seven promoters, the MCM2, CKS1B and PLK1 promoters are more enriched in the TF profiles of this group than the others (13, 10 and 9 profiles, respectively) (Table S3 in Supplementary Information). More than half of the tumor-specific promoters contain recognition profiles of the factors SREBF2, ZNF75D, Zfx (each in six promoters), RUNX2 and ETS2 (each in five promoters), and Creb3l2 (in four promoters) (Table 2). Preferential locations in the promoters could be identified for three TFs (Fig. 1). The SREBF2 profiles in four of the seven promoters are located in the region [–458;–384] relative to the TSS on the noncoding DNA strand. The ZNF75D profiles occupy the region [–253;–184] relative to the TSS, also on the noncoding strand, in four promoters, and the region near the TSS [–24;+30] on the coding strand in three promoters. The RUNX2 profiles are located on the coding strand in the region [–414;–247] in four promoters.

Fig. 1.

The positions of recognition profiles of transcription factors SREBF2, ZNF75D and RUNX2 in promoters. Transcription start sites (0) are indicated according to the EPDnew eukaryotic promoter database. The bold arrows indicate Motif 1.
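Profile positions such as those in Fig. 1 come from scanning each promoter on both strands with a position weight matrix. The sketch below shows the idea with a log-odds matrix, a fixed score threshold and a conversion of 0-based sequence indices into TSS-relative coordinates. It assumes that the sequence starts at position –499 and that the TSS is position 0 (as in Fig. 1), so a 0-based index i maps to i - 499; the toy motif, pseudocount and threshold are hypothetical, and CiiiDER's own scoring and thresholds differ.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def log_odds_matrix(counts: List[Dict[str, float]], background: float = 0.25,
                    pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Turn per-column base counts into log2(probability / background) scores."""
    matrix = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((col[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan_both_strands(seq: str, matrix, threshold: float,
                      upstream: int = 499) -> List[Tuple[int, int, str, float]]:
    """Return (start, end, strand, score) for windows scoring >= threshold.

    Coordinates are TSS-relative, assuming seq[0] is position -upstream and the
    TSS is position 0. Strand '+' is the strand given in seq (the coding strand
    for a promoter oriented along transcription); '-' is the opposite strand."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            s = sum(col[b] for col, b in zip(matrix, site))
            if s >= threshold:
                hits.append((i - upstream, i + width - 1 - upstream, strand, s))
    return hits


# Hypothetical 6-bp motif built from a consensus; real matrices come from JASPAR
toy_counts = [{b: (10 if b == c else 0) for b in BASES} for c in "GGGATT"]
pwm = log_odds_matrix(toy_counts)
promoter = "C" * 200 + "AATCCC" + "C" * 394     # 600 bp, toy sequence
print(scan_both_strands(promoter, pwm, threshold=10.0))
```

Collecting the start coordinates per strand across all seven promoters is then enough to spot shared intervals such as [–458;–384].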

Interaction between transcription factors and promoters. Using the Pathway Commons resource [17], we constructed a network of direct regulatory interactions between the identified TFs and the products of the studied tumor-specific genes (Fig. 2). The network also includes the proliferating cell nuclear antigen (PCNA), because it interacts with the products of the genes whose promoters were studied. PCNA is involved in the regulation of cell proliferation and shows markedly tumor-specific expression [18]. However, the PCNA promoter that we cloned [19] was active in both tumor and normal cells [20] and was therefore not included in the initial differential analysis of promoters. In the present work, the PCNA promoter (FP026732) is considered nonspecific.

Fig. 2.

The direct regulatory interactions between the products of the genes under study (in black circles) and transcription factors (Pathway Commons): (a) conditionally tumor-specific transcription factors; (b) conditionally nonspecific transcription factors. Dash-dotted arrows show regulation of expression; dashed arrows show modifications; the solid arrow shows transport control; dashed lines without arrows show binding into a complex.

As Fig. 2a shows, MCM2 and PLK1 occupy a central position in this regulatory network, which is consistent with the large number of conditionally tumor-specific TF profiles in their promoters. The Creb3l2, CENPB, SIX1, ZNF75D, Zfx, HOXD11 and KLF11 factors do not appear in the regulatory network, although their recognition profiles are present in tumor-specific promoters much more often than in housekeeping gene promoters. There are no direct data on the involvement of these seven TFs in the regulation of the promoters under study. However, their selection does not seem to be random: according to the GEPIA2 database, these TFs are differentially expressed or associated with prognosis in some tumors (Table S2 in Supplementary Information) and, consequently, may play important roles in carcinogenesis.
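The network in Fig. 2 can be reproduced in outline from Pathway Commons data in the Simple Interaction Format (SIF), whose first three tab-separated columns are participant A, interaction type and participant B. The sketch below keeps only edges linking a TF to a studied gene product and counts node degrees as a crude indicator of a "central position". The mapping of SIF interaction types to the line styles of Fig. 2 is our assumption, and the names in the example are hypothetical placeholders, not real interactions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Assumed correspondence between SIF interaction types and the Fig. 2 line styles
LINE_STYLE = {
    "controls-expression-of": "expression regulation (dash-dotted arrow)",
    "controls-state-change-of": "modification (dashed arrow)",
    "controls-transport-of": "transport control (solid arrow)",
    "in-complex-with": "binding into a complex (dashed line)",
}


def direct_edges(sif_lines: Iterable[str], tfs: Set[str],
                 targets: Set[str]) -> List[Tuple[str, str, str]]:
    """Keep SIF edges that connect a TF to a studied gene product (either direction)."""
    edges = []
    for line in sif_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        a, kind, b = fields[:3]
        if kind not in LINE_STYLE:
            continue
        if (a in tfs and b in targets) or (b in tfs and a in targets):
            edges.append((a, kind, b))
    return edges


def degree(edges: List[Tuple[str, str, str]]) -> Counter:
    """Edges per node, a crude proxy for how central a node is in the network."""
    deg = Counter()
    for a, _, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical SIF lines; TF_A, TF_B, GENE_X, GENE_Y are placeholders
sif = [
    "TF_A\tcontrols-expression-of\tGENE_X",
    "TF_B\tin-complex-with\tGENE_X",
    "TF_A\tcontrols-expression-of\tGENE_Y",
    "GENE_X\tcontrols-state-change-of\tGENE_Y",  # gene-gene edge: dropped
]
edges = direct_edges(sif, tfs={"TF_A", "TF_B"}, targets={"GENE_X", "GENE_Y"})
print(degree(edges).most_common())
```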

Nonspecific transcription factors. Additionally, we identified the TF profiles that occur more often in the promoters of housekeeping genes than in tumor-specific promoters (Tables S4 and S5 in Supplementary Information; hereinafter referred to as conditionally nonspecific factors). These factors are also differentially expressed and can serve as prognostic markers in some cancers. At the same time, six of the seven TFs of this group participate in the development and differentiation of various cell types (Mafb [21], Arid3a [22], MEIS3 [23], BHLHA15 [24], BSX [25], E2F1 [26]). The TBP factor is related to the general transcription machinery [27]. According to Pathway Commons, only TBP and E2F1 directly interact with the products of the tumor-specific genes under study (Fig. 2b).

De novo search for nucleotide motifs. To extend the search for regulatory DNA sequences characteristic of tumor-specific promoters, we searched for motifs de novo using the MEME Suite software [28].

We performed a discriminative search for nucleotide motifs in the seven tumor-specific promoters, with housekeeping gene promoters as the control set. The 44-bp Motif 1 found in this way is present in all tumor-specific promoters (Fig. S2, Supplementary Information). According to GOMO, Motif 1 is associated with olfactory receptor activity, the sensory perception of smell, and the G-protein coupled receptor signaling pathway. In six of the seven promoters under study, Motif 1 is located approximately within the region [–490;–210] relative to the TSS. In the CKS1B promoter, Motif 1 is located in the region [–47;–4] in the reverse orientation, probably because the CKS1B/SHC1 promoter is bidirectional. Motif 1 appears to consist of three shorter submotifs. Therefore, we analyzed these submotifs separately using the Tomtom program from MEME Suite and a database of vertebrate TF profiles (Table S6, Supplementary Information).
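Tomtom compares a query motif with database motifs column by column; Pearson correlation is one of the column comparison functions it supports. The following simplified sketch slides one probability matrix along another without gaps and reports the offset with the highest mean column correlation. It does not reproduce Tomtom's reverse-complement handling or its p-values, and the matrices in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Column = Sequence[float]  # probabilities of A, C, G, T


def pearson(u: Column, v: Column) -> float:
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0  # flat column -> 0


def best_offset(query: List[Column], target: List[Column],
                min_overlap: int = 4) -> Tuple[float, int]:
    """Ungapped alignment of query over target maximizing mean column correlation.

    Returns (mean correlation, offset), where offset is the target column aligned
    with query column 0 (negative if the query overhangs the target's start)."""
    best_score, best_off = -math.inf, 0
    for off in range(-(len(query) - min_overlap), len(target) - min_overlap + 1):
        pairs = [(query[i], target[i + off]) for i in range(len(query))
                 if 0 <= i + off < len(target)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(q, t) for q, t in pairs) / len(pairs)
        if score > best_score:
            best_score, best_off = score, off
    return best_score, best_off


def column(base: str) -> List[float]:
    """Hypothetical column favouring one base (0.7) over the others (0.1 each)."""
    return [0.7 if b == base else 0.1 for b in "ACGT"]


motif = [column(b) for b in "ACGTTGCA"]        # hypothetical 8-column motif
submotif = motif[2:7]                           # its columns 2..6
print(best_offset(submotif, motif))             # expected offset: 2
```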

Submotif 1a contains the recognition profiles of the myocyte enhancer factors MEF2, which can activate genes induced by growth factors and stress and are involved in both the suppression and the progression of cancer under various conditions [29]. It also contains the profile of the forkhead E1 factor, associated with thyroid cancer [30], and the profile of the POU homeobox transcription factor, one of the master regulators of small cell lung cancer [31]. The NR2E1 factor is also important for carcinogenesis; in particular, it is associated with breast cancer metastasis [32]. PHOX2B has been identified as a key regulator of differentiation and of the maintenance of stem cell properties in neuroblastoma [33]. Sp100 is a component of promyelocytic leukemia (PML) nuclear bodies [34].

Submotif 1b contains the recognition profile of retinoid X receptors (RXRs) and retinoic acid receptors (RARs), the nuclear receptors that mediate the biological effects of retinoids associated with tumor development [35]. Activation of the DMRT1 gene has been found in testicular germ cell neoplasia [36]. Interferon regulatory factor 3 (IRF3) is involved in important processes such as anticancer immunity and resistance to some bacterial and viral infections [37].

Submotif 1c includes a cAMP response element (CRE) capable of binding TFs of the CREB3 family, which regulate the proliferation and migration of cancer cells [38], as well as the profile of the ATF6 factor involved in the unfolded protein response; ATF6 overexpression correlates with tumor aggressiveness [39]. ZNF135 is a target of regulation by microRNAs differentially expressed in nasopharyngeal carcinoma [40].

Owing to the degeneracy of binding sites, Motif 1 may well contain recognition profiles of TFs other than those listed above, including unknown ones. In any case, our data suggest that Motif 1 may also be characteristic of other tumor-specific promoters.

We then looked for other promoters containing Motif 1. To this end, we scanned 29 598 human promoters from the EPDnew database (all in the [–499;+100] coordinates relative to the TSS) with the FIMO program from MEME Suite. We found 4733 matching sequences on either strand with p < 0.0001, including matches in all seven promoters under study. The ten promoters with the closest matches to Motif 1 are presented in Table 3. According to The Human Protein Atlas, nine of the ten corresponding genes (90%) are prognostic markers for tumors (mostly unfavorable). Notably, the position of Motif 1 relative to the TSS in these promoters approximately corresponds to its position in the seven analyzed tumor-specific promoters.

Table 3. The ten promoters most similar to Motif 1 from the EPDnew database
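FIMO reports a p-value for each match, i.e., the probability that a random sequence window scores at least as high as the observed one under a background model. A minimal sketch of how such a p-value can be computed exactly is shown below: column scores are rounded onto an integer grid and the score distribution is built by dynamic programming. The uniform background, the scale factor and the toy two-column matrix are assumptions for illustration; FIMO's own background model and score discretization may differ. Note that the threshold p < 0.0001 applies per window position: a 600-bp promoter scanned on both strands with a 44-bp motif offers 2 × (600 - 44 + 1) = 1114 positions.

```python
from collections import defaultdict
from typing import Dict, List

Matrix = List[Dict[str, float]]  # per-column log-odds scores


def int_score(matrix: Matrix, site: str, scale: int) -> int:
    """Site score with each column score rounded onto an integer grid."""
    return sum(round(col[b] * scale) for col, b in zip(matrix, site))


def null_distribution(matrix: Matrix, background: Dict[str, float],
                      scale: int = 100) -> Dict[int, float]:
    """Exact distribution of the integer score for a random window whose bases
    are drawn independently from `background` (zero-order model)."""
    dist = {0: 1.0}
    for col in matrix:
        nxt: Dict[int, float] = defaultdict(float)
        for s, p in dist.items():
            for b, q in background.items():
                nxt[s + round(col[b] * scale)] += p * q
        dist = dict(nxt)
    return dist


def site_pvalue(matrix: Matrix, site: str, background: Dict[str, float],
                scale: int = 100) -> float:
    """P(random window scores >= this site). For a genome-wide scan the null
    distribution should be computed once and reused, not rebuilt per site."""
    observed = int_score(matrix, site, scale)
    dist = null_distribution(matrix, background, scale)
    return sum(p for s, p in dist.items() if s >= observed)


# Toy 2-column matrix (hypothetical): +1 for the consensus base, -1 otherwise
toy = [{b: (1.0 if b == c else -1.0) for b in "ACGT"} for c in "AC"]
uniform = {b: 0.25 for b in "ACGT"}
for site in ("AC", "AG", "TG"):
    print(site, site_pvalue(toy, site, uniform))
```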

Among the seven promoters under study, the MCM2 and BIRC5 promoters contained the closest matches to Motif 1. A separate de novo motif analysis of these two promoters using MEME Suite revealed a common 184-bp Motif 2 located at [–579;–396] in MCM2 and [–496;–303] in BIRC5 relative to the TSS (Fig. S3, Supplementary Information). Interestingly, the region of similarity between the MCM2 and BIRC5 promoters is noticeably wider: it spans 296 bp and contains the shorter motifs in the same order. FIMO also detected sequences similar to Motif 2 in the other five promoters, but with very low scores (data not shown). Therefore, we consider this similarity to be a specific feature of the MCM2 and BIRC5 promoters.
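The 296-bp region of similarity between the MCM2 and BIRC5 promoters is the kind of feature a pairwise local alignment can delimit. This is a different technique from the MEME analysis used above: a minimal Smith–Waterman sketch with a linear gap penalty, which returns the best local score and the aligned spans in both sequences. The scoring parameters (match +2, mismatch -1, gap -2) are arbitrary assumptions, and the sequences in the usage lines are hypothetical.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Best local alignment score and the aligned spans a[i0:i1], b[j0:j1]."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, end = h[i][j], (i, j)
    # Trace back from the best cell until a zero cell marks the alignment start
    i, j = end
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end[0]), (j, end[1])


# Hypothetical sequences sharing a 9-bp block
seq_a = "TTTT" + "ACGTACGGA" + "TTTT"
seq_b = "GGG" + "ACGTACGGA" + "CC"
score, span_a, span_b = smith_waterman(seq_a, seq_b)
print(score, span_a, span_b, seq_a[span_a[0]:span_a[1]])
```

The quadratic table is unproblematic for two 600-bp promoters; spans can be converted to TSS-relative coordinates by subtracting 499 under the convention that the window starts at –499.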

Analysis of chimeric promoters. We studied chimeric promoters that showed higher activity in A431 epidermoid carcinoma cells than in normal fibroblasts (Fig. S4, Supplementary Information) [13]. CiiiDER was used to search for conditionally tumor-specific and nonspecific TF profiles in the chimeric promoters, as well as in the nonspecific CMV and PCNA promoters. The profiles of Creb3l2, ETS2, HEY1, HEY2 and SREBF1 (var. 2) were found only in the chimeric promoters with enhanced activity in A431 cells (CH2, CH20, CH26; Fig. S4 in Supplementary Information) and are absent from the nonspecific CH10, CMV and PCNA promoters (Table S7, Supplementary Information). All profiles of conditionally nonspecific TFs prevailing in housekeeping gene promoters (Table S5, Supplementary Information) occur in at least one of the three nonspecific promoters (Table S8, Supplementary Information).

Although this observation can hardly be assessed statistically, we are inclined to regard it as a trend that supports our conclusions but requires further verification. Only two chimeric promoters, CH8 and CH16, contain the 44-bp Motif 1, located within their POLD1 promoter fragment (data not shown).
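The filtering applied to the chimeric promoters amounts to simple set operations on profile presence. A minimal sketch, assuming a hypothetical dictionary that maps each promoter to the set of profiles found in it; the profile names P1–P4 are placeholders, not real results.

```python
from typing import Dict, Iterable, Set


def union_of(presence: Dict[str, Set[str]], promoters: Iterable[str]) -> Set[str]:
    """All profiles found in at least one of the given promoters."""
    found: Set[str] = set()
    for p in promoters:
        found |= presence[p]
    return found


def exclusive_profiles(presence, enhanced, nonspecific) -> Set[str]:
    """Profiles present in >= 1 enhanced promoter and absent from every nonspecific one."""
    return union_of(presence, enhanced) - union_of(presence, nonspecific)


def covered_by(profiles: Set[str], presence, promoters) -> bool:
    """True if each profile occurs in at least one of the given promoters."""
    return profiles <= union_of(presence, promoters)


# Hypothetical presence data; P1..P4 are placeholder profile names
presence = {
    "CH2": {"P1", "P2"}, "CH20": {"P2", "P3"}, "CH26": {"P1"},
    "CH10": {"P3"}, "CMV": set(), "PCNA": {"P4"},
}
enhanced, nonspecific = ["CH2", "CH20", "CH26"], ["CH10", "CMV", "PCNA"]
print(sorted(exclusive_profiles(presence, enhanced, nonspecific)))
print(covered_by({"P3", "P4"}, presence, nonspecific))
```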

DISCUSSION

The accumulated data on the structure and functions of promoters have already opened up great prospects for their use in practical medicine, e.g., as components of genetic engineering vectors for tumor gene therapy [7, 8]. Clinical trials of gene therapy constructs containing viral and human promoters have been completed (ClinicalTrials.gov Identifiers: NCT01455259, NCT00891748, NCT00197522, NCT00051480; see the review by Ginn et al. [8], etc.). In addition to cancer gene therapy, native promoters can be used in genetic engineering vectors for treating other diseases, in industry, and in various biotechnological processes. Interest in promoters has recently been renewed by the need for vaccine vectors to prevent or treat infectious diseases and cancer [9, 10]. Approaches based on the construction of hybrid [41, 42] and chimeric [13] promoters are being developed. Therefore, the study of the primary structure of promoters, in particular tumor-specific promoters, remains relevant.

In the present work, we studied seven tumor-specific promoters (Table 1) with respect to their content of recognition profiles of known TFs. Although transcription regulation involves DNA sequences both adjacent to and remote from the TSS, certain regulatory functions have been shown to be retained by relatively short proximal promoters, i.e., core promoters with adjacent cis-regulatory elements, several hundred nucleotides long in total [43]. This is also true for the promoters under study, in which many important regulatory elements are concentrated within approximately 500 bp upstream of the TSS [12, 14, 15]. This size is convenient for genetic manipulation of these promoters.

The comparison of the promoters revealed 17 TF recognition profiles that are more frequent in tumor-specific promoters than in housekeeping gene promoters (conditionally tumor-specific TFs, Table 2). Table S2 (see Supplementary Information) presents the recognition profiles of these factors according to the JASPAR database [5]. The profiles of the SREBF2, ZNF75D, Zfx, RUNX2, ETS2 and Creb3l2 factors were found in more than half of the tumor-specific promoters. The remaining 11 of the 17 profiles occurred in some tumor-specific promoters but in none of the 23 promoters of the reference group. We were also able to determine the preferential locations of the recognition profiles of SREBF2, ZNF75D and RUNX2 in tumor-specific promoters (Fig. 1). Notably, the CKS1B and PLK1 promoters, which contain 10 and 9 TF profiles of this group, respectively (Table S9, Supplementary Information), previously showed the highest tumor specificity among the promoters studied [12].

Some tumor-specific promoters contain recognition profiles of TFs for which there are insufficient data on their involvement in the regulation of these promoters: Creb3l2, CENPB, SIX1, ZNF75D, HOXD11, Zfx and KLF11.

Our results may serve as an argument for studying the role of these TFs in the regulation of the respective genes.

In addition, we detected seven TF profiles that are more common in housekeeping gene promoters than in the promoters under study (Tables S4 and S5, Supplementary Information). Six of the seven TFs of this group participate in the development and differentiation of various cell types (see above). Cell differentiation is often considered a process opposite to malignant transformation, and its induction has been proposed for treating tumors [44]. Our observations suggest that the set of TFs enriched in nonspecific promoters is not random in this case. At the same time, the differential expression of these TFs in some tumors and their association with prognosis (Table S4, Supplementary Information) show that this problem needs more thorough investigation.

Obviously, the binding of a transcription factor to DNA depends not only on the nucleotide sequence but also on many other parameters: profile orientation, adjacent sequences, interaction between this TF and other regulatory molecules and sequences, etc. [43]. In addition, conditionally tumor-specific TFs can be related not to the malignant transformation of cells but rather to the proliferative functions of genes, and this difference should be studied in each particular case. Nevertheless, the findings provide an opportunity to investigate promoter regulation more precisely in direct experiments.

Using a discriminative de novo motif search, we found a 44-bp sequence (Motif 1) that is present in all seven tumor-specific promoters but is not characteristic of housekeeping gene promoters. In the EPDnew database, we found a group of promoters with this motif whose genes are prognostic markers for tumors. We believe that at least some of these promoters may have tumor-specific activity.

We have shown that two promoters, BIRC5 and MCM2, contain a highly similar 184-bp motif. Moreover, the region of similarity between these two promoters extends to 296 bp. We assume that this similarity is a specific feature of these two promoters associated with their regulation, rather than a common feature of tumor-specific promoters. Though interesting, this observation is beyond the scope of our research.

Our conclusions can be partially verified by the analysis of TF profiles in chimeric promoters. Five of the 17 profiles of conditionally tumor-specific TFs, namely, the Creb3l2, ETS2, HEY1, HEY2 and SREBF1 (var. 2) profiles, passed the selection on A431 cells and were retained in the chimeric promoters selected for tumor specificity [13]. Only two chimeric promoters, CH8 and CH16, contain the 44-bp Motif 1 found in the seven native tumor-specific promoters. Together with other data, this probably means that the presence of Motif 1 in a promoter is a sufficient but not necessary condition for tumor specificity. Obviously, these issues require more comprehensive investigation.

EXPERIMENTAL

Human promoter sequences were obtained from the EPDnew database (Eukaryotic Promoter Database, https://epd.epfl.ch/EPDnew_database.php) [45] in the [–499;+100] coordinates relative to the transcription start site (TSS) (Table 1). The search and comparison of TF profiles in promoters and the de novo motif search were performed using CiiiDER [16] and MEME Suite [28] (https://meme-suite.org/meme/) with default parameters and the following databases of transcription factor binding profiles: JASPAR (http://jaspar.genereg.net/) [5] and Jolma, 2013 [46]. The differences between the groups of promoters in the frequency of TF profiles were assessed with the Mann–Whitney test, primarily taking into account the “significance score” and “gene p‑value” parameters (CiiiDER). TF expression in tumors is presented according to the GEPIA2 database (http://gepia2.cancer-pku.cn/). The prognostic significance of TFs in tumors was determined using The Human Protein Atlas (https://www.proteinatlas.org/) [47]. The networks of regulatory interactions were constructed using the Pathway Commons resource (http://www.pathwaycommons.org/) [17].
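For readers reproducing the [–499;+100] windows from a genome assembly rather than downloading them from EPDnew, the sketch below shows the coordinate arithmetic, including reverse-complementing promoters of genes on the minus strand. It assumes a 0-based TSS index and counts the TSS as position 0, giving 600 bp per window; EPDnew's own coordinate conventions should be checked before relying on it.

```python
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_window(chrom_seq: str, tss: int, strand: str,
                    upstream: int = 499, downstream: int = 100) -> str:
    """Return the [-upstream; +downstream] window around a TSS, read 5'->3' in
    the direction of transcription. `tss` is a 0-based index into chrom_seq and
    the TSS is counted as position 0, so the window is upstream + downstream + 1 long."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends beyond the available sequence")
    window = chrom_seq[start:end]
    # Minus-strand genes: reverse-complement so the TSS sits at index `upstream`
    return window if strand == "+" else window.translate(_COMP)[::-1]


toy_chrom = "ACGTACGTAC"  # hypothetical sequence
print(promoter_window(toy_chrom, 4, "+", upstream=2, downstream=1))  # GTAC
print(promoter_window(toy_chrom, 4, "-", upstream=2, downstream=1))  # CGTA
```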

CONCLUSIONS

It has been shown that the tumor-specific promoters under study differ from nonspecific housekeeping gene promoters in the presence of recognition profiles for 17 transcription factors, as well as a 44-bp motif; these may be promising objects for studying tumor-specific regulation of gene expression. The presence of these sequences in an uncharacterized promoter may indicate its tumor specificity. We believe that our results can help in choosing promising native promoters for genetic engineering antitumor constructs, such as vectors with killer genes or vector vaccines. Among the seven promoters under study, the MCM2, CKS1B and PLK1 promoters are more enriched in the profiles of conditionally tumor-specific TFs than the other promoters, which agrees with the data on the considerable tumor specificity of these promoters [12]. Some of the chimeric promoters that we obtained previously may also be of practical importance [13].

It should be emphasized that the conclusions of this work are based on a theoretical analysis of the nucleotide sequences of the promoters. We considered TF recognition profiles only as nucleotide motifs with a certain variability. Therefore, all conclusions relate only to the presence of such sequences in the promoters. The actual role of transcription factors in the tumor-specific regulation of promoters should be studied in direct experiments. We hope that our work will contribute to a better understanding of tumor-specific gene transcription and will open up new opportunities for creating artificial promoters and genetic engineering vectors for tumor gene therapy.