Ribosomal RNA Transcription Machineries in Intestinal Protozoan Parasites: A Bioinformatic Analysis

Purpose Ribosome biogenesis is a key process in all living organisms, energetically expensive and tightly regulated. Currently, little is known about the components of the ribosomal RNA (rRNA) transcription machinery that are present in intestinal parasites, such as Giardia duodenalis, Cryptosporidium parvum, and Entamoeba histolytica. Thus, in the present work, an analysis was carried out to identify the components of the rRNA transcription machinery that are conserved in intestinal parasites and to assess whether these could be used to design new treatment strategies. Methods The different components of the rRNA transcription machinery were searched for in the studied parasites with the NCBI BLAST tool in the EuPathDB Bioinformatics Resource Center database. The sequences of the RRN3 and POLR1F orthologs were aligned and important regions identified. Subsequently, three-dimensional models were built with different bioinformatic tools and a structural analysis was performed. Results Among the protozoa examined, C. parvum is the parasite with the fewest identifiable components of the rRNA transcription machinery. TBP, RRN3, POLR1A, POLR1B, POLR1C, POLR1D, POLR1F, POLR1H, POLR2E, POLR2F and POLR2H subunits were identified in all species studied. Furthermore, the interaction regions between RRN3 and POLR1F were found to be conserved and could be used to design drugs that inhibit rRNA transcription in the parasites studied. Conclusion The inhibition of the rRNA transcription machinery in parasites might be a new therapeutic strategy against these microorganisms. Supplementary Information The online version contains supplementary material available at 10.1007/s11686-022-00612-7.


Introduction
Intestinal protozoal infections are a major health problem, especially in developing countries, where poor household hygiene practices, inadequate sanitary facilities, and low socioeconomic conditions favor their spread [1]. In particular, protozoa are responsible for important intestinal diseases in humans, with high morbidity and, in some cases, mortality [2]. Giardia duodenalis and Cryptosporidium parvum are the most common pathogenic intestinal protozoan parasites with an annual incidence of about 10,000 cases each in the United States and Europe alone, whereas for Entamoeba histolytica, a worldwide annual incidence of 100 million cases is estimated [3]. In general, these parasites cause poor digestion, impair absorption and increase nutrient loss, among other things. Indeed, even in asymptomatic infections, subtle damage and disturbances of intestinal function may occur [4]. One aspect to highlight is the increase in treatment failure and the appearance of strains resistant to current drugs due to their massive and inappropriate use, which has created the need to devise new treatment strategies [2,5,6].
The efficient growth and proliferation of parasites, in turn, require a balanced production of ribosomes for protein synthesis [7][8][9][10]. Notably, the rate-limiting step of ribosome biogenesis is the synthesis of ribosomal RNA (rRNA) by RNA polymerase I (Pol I) [10][11][12]. The rRNA transcription machinery comprises three main components: the Pol I enzyme, the TBP (TATA-binding protein)-TAF (TBP-associated factor) complex SL1 (selectivity factor 1) and the trans-activator protein UBF (upstream binding factor) [13]. Currently, little is known about the components of the rRNA transcription machinery that are present in intestinal parasites, but it is known that if any of these do not function properly, the parasites die due to cell cycle arrest and apoptosis [14][15][16][17]. The rRNA transcription machinery is therefore a feasible target for the design of new anti-parasitic drugs. The purpose of this work was to identify the putative components of the ribosomal RNA transcription machinery in the three most prominent intestinal protozoan pathogens, G. duodenalis, C. parvum, and E. histolytica. Furthermore, special emphasis was placed on the interaction between RRN3 and POLR1F, which is a key step in linking Pol I with the rest of the components of the transcriptional machinery and against which anti-parasitic drugs might be designed.

Database Screening
The amino acid sequences of the proteins involved in the initiation of rRNA transcription in humans were obtained from the UniProt Knowledgebase (UniProtKB) [18] using the name of each protein. To find orthologs of human proteins, the whole genome sequences of the intestinal parasites G. duodenalis (Assemblage A_isolate_WB), C. parvum (Iowa II) and E. histolytica (HM1-IMSS) were searched in the corresponding databases of the EuPathDB Bioinformatics Resource Center [19] using the NCBI BLAST tool [20] with default search parameters. The rRNA transcription machinery of Saccharomyces cerevisiae and the genome of Entamoeba dispar were also analyzed for comparative purposes.
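As an illustration of how BLAST search results of this kind could be post-processed, the following Python sketch keeps the best-scoring hit per human query from BLAST+ tabular output (`-outfmt 6`, default 12 columns). This is not the authors' pipeline: the text only states that default search parameters were used, so the E-value cutoff of 1e-5 is an assumption, and the identifiers and scores in `EXAMPLE` are made up.

```python
import csv
import io
from typing import Dict, NamedTuple

# Column order of BLAST+ tabular output (-outfmt 6) with the default fields.
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return, per query, the highest-bitscore hit with evalue <= max_evalue.

    The 1e-5 default is an assumed cutoff, not a value taken from the paper.
    """
    best: Dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                  float(rec["evalue"]), float(rec["bitscore"]))
        if hit.evalue > max_evalue:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best


# Hypothetical example rows: identifiers and scores are invented.
EXAMPLE = "\n".join([
    "human_RRN3\tparasite_gene_A\t24.1\t410\t280\t12\t35\t440\t20\t415\t3e-18\t85.5",
    "human_RRN3\tparasite_gene_B\t21.0\t150\t110\t5\t60\t205\t10\t155\t0.004\t38.1",
    "human_UBF\tparasite_gene_C\t19.5\t90\t70\t2\t100\t188\t5\t92\t2.1\t27.3",
])

if __name__ == "__main__":
    for query, hit in best_hits(EXAMPLE).items():
        print(query, hit.subject, hit.evalue)
```

A query with no hit below the cutoff simply does not appear in the result. As the Discussion notes for highly divergent proteins, such an absence does not by itself show that the ortholog is missing from the genome.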

Sequence Analysis
The structural domains of rRNA transcription machinery proteins present in all organisms were predicted and analyzed using InterPro [21]. Multiple sequence alignments of RRN3 orthologs and POLR1F orthologs were performed using Clustal Omega in CLC Genomics Workbench 21 (Qiagen Bioinformatics, Aarhus C, Denmark). Based on these alignments, the identity and similarity percentages between the orthologs were calculated. Three-dimensional structures were predicted using SWISS-MODEL [22], and illustrations were made using UCSF Chimera software [23].
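The identity and similarity percentages mentioned above can be computed from a pairwise row of an alignment roughly as follows. This is a minimal sketch, not the calculation performed by CLC Genomics Workbench: it assumes that percentages are taken over columns where neither sequence has a gap, and it uses a hand-picked set of similar amino acid groups (`SIMILAR_GROUPS`, an illustrative assumption) instead of a substitution matrix. The two example sequences are invented.

```python
from typing import Tuple

# Assumed groups of physicochemically similar amino acids (illustrative only).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ"), set("AG")]


def identity_similarity(aligned_a: str, aligned_b: str) -> Tuple[float, float]:
    """Return (% identity, % similarity) over columns without gaps.

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    Identical residues also count as similar.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Invented toy rows: 6 gap-free columns, 3 identical, 3 more similar.
    ident, simil = identity_similarity("MSS-KLVE", "MTSAKI-D")
    print(f"identity={ident:.1f}% similarity={simil:.1f}%")
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence) give different percentages, so the denominator should be stated whenever such values are reported.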

TBP is the Only Subunit of the TBP-TAF Complex SL1 that has Identifiable Orthologs in All Species Analyzed. Pol I-Specific Factor RRN3 is also Conserved
The only component of the TBP-TAF complex SL1 that was found in all the organisms analyzed was TBP, with one ortholog in G. duodenalis, two in C. parvum and three in both E. histolytica and E. dispar. Since TAF II 12 orthologs were identified only in E. histolytica and E. dispar, G. duodenalis and C. parvum were the species with the fewest identifiable subunits of the SL1 transcription factor. RRN3 orthologs were found in the genome of all organisms, but the sequences of the E. histolytica and E. dispar orthologs diverged widely. Regarding the UBF transcription activator, two orthologs were identified in G. duodenalis and three in both E. histolytica and E. dispar. No orthologs for this protein were found in C. parvum. The summary of the results is presented in Tables 1 and 2.

Intestinal Protozoan Parasites Lack Orthologs of the POLR1E, POLR1G, and RPA3 Subunits in RNA Polymerase I Complex
An ortholog search for the 14 major subunits of RNA polymerase I was also performed. For the subunits POLR1A, POLR1B, POLR1C, POLR1D, POLR1F, POLR1H, POLR2E, POLR2F and POLR2H, at least one ortholog was found in each species. In contrast, for the POLR1E, POLR1G and RPA3 subunits, no orthologs were found in any of the analyzed parasites. Orthologs of the POLR2K subunit were identified in G. duodenalis and C. parvum, but not in E. histolytica and E. dispar. Interestingly, no orthologs were found for the POLR2L subunit in E. dispar. Thus, E. dispar is the organism with the fewest identifiable components of RNA polymerase I. The summary of the results is presented in Tables 1 and 2. Figure 1 shows the rRNA transcription machinery in each species analyzed according to our bioinformatic analysis. All the orthologs of TBP, POLR1D and POLR1F identified in parasites were proteins smaller than those in humans. The reduction in the number of amino acids was between 26.5% and 42.5% for TBP, between 15.8% and 26.3% for POLR1D, and between 24.3% and 37.3% for POLR1F. Except for C. parvum, the RRN3 orthologs had between 14.6% and 28.4% fewer amino acids than the human protein, but RRN3 HEAT repeats are conserved in all species studied. POLR1A orthologs in G. duodenalis and C. parvum had between 4.9% and 24.5% more amino acids than the human protein, but the E. histolytica and E. dispar orthologs had between 4.9% and 19.8% fewer amino acids than the human counterpart. Meanwhile, POLR1B orthologs from G. duodenalis and C. parvum were proteins with 6% to 13.4% more amino acids than the human protein, but almost all orthologs from E. histolytica and E. dispar were similar in size to it. In contrast, POLR1C orthologs in G. duodenalis and C. parvum maintained a similar number of amino acids to the human protein, but orthologs in E. histolytica and E. dispar were proteins with 15.9% to 18% fewer amino acids.
Most of the identified orthologs of POLR1H, POLR2E, POLR2F, and POLR2H in parasites were very similar in size to human and yeast proteins. A schematic representation of these data appears in Supplementary Fig. 1.
[...] and Uaf30 in yeast. Yeast Pol I consists of A190 and A135 (POLR1A and POLR1B in humans) plus Rpb5, Rpb6, Rpb8, Rpb10 and Rpb12 (POLR2E, POLR2F, POLR2H, POLR2L, POLR2K in humans) and the heterodimer AC40-AC19 (POLR1D-POLR1C in humans). The Pol I core is completed with A12.2 (POLR1H in humans), the A43-A14 heterodimer (POLR1F-RPA3 heterodimer in humans) and the A49 and A34.5 subunits (POLR1E and POLR1G in humans). For the subunits appearing in gray, no orthologs were found in the parasites studied
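The presence and absence results described in the two preceding subsections can be cross-checked with a small table in code. The Python sketch below transcribes the prose of this section (Tables 1 and 2 are not reproduced here) into a presence map and derives which components are shared by all four organisms. Two points are readings of the text and not explicit statements: POLR2L is assumed to be present in every organism except E. dispar, and "TAF II 12" is kept under the label used in the text.

```python
from typing import Dict, FrozenSet, List

SPECIES = ("G. duodenalis", "C. parvum", "E. histolytica", "E. dispar")
ALL: FrozenSet[str] = frozenset(SPECIES)
ENTAMOEBA: FrozenSet[str] = frozenset({"E. histolytica", "E. dispar"})
NONE: FrozenSet[str] = frozenset()

# Component -> organisms in which at least one ortholog was reported.
PRESENCE: Dict[str, FrozenSet[str]] = {
    "TBP": ALL,
    "TAF II 12": ENTAMOEBA,
    "RRN3": ALL,
    "UBF": ALL - {"C. parvum"},
    "POLR1A": ALL, "POLR1B": ALL, "POLR1C": ALL, "POLR1D": ALL,
    "POLR1F": ALL, "POLR1H": ALL,
    "POLR1E": NONE, "POLR1G": NONE, "RPA3": NONE,
    "POLR2E": ALL, "POLR2F": ALL, "POLR2H": ALL,
    "POLR2K": ALL - ENTAMOEBA,
    "POLR2L": ALL - {"E. dispar"},  # inferred: only the E. dispar absence is stated
}


def conserved_in_all(presence: Dict[str, FrozenSet[str]]) -> List[str]:
    """Components with at least one ortholog in every organism."""
    return sorted(name for name, found in presence.items() if found == ALL)


def absent_everywhere(presence: Dict[str, FrozenSet[str]]) -> List[str]:
    """Components with no identifiable ortholog in any organism."""
    return sorted(name for name, found in presence.items() if not found)


def missing_in(species: str, presence: Dict[str, FrozenSet[str]]) -> List[str]:
    """Components without an identifiable ortholog in one organism."""
    return sorted(name for name, found in presence.items() if species not in found)


if __name__ == "__main__":
    print("Conserved in all:", ", ".join(conserved_in_all(PRESENCE)))
    print("Absent everywhere:", ", ".join(absent_everywhere(PRESENCE)))
    for sp in SPECIES:
        print(sp, "lacks:", ", ".join(missing_in(sp, PRESENCE)))
```

Under these assumptions, the conserved set matches the eleven components listed in the abstract (TBP, RRN3, POLR1A, POLR1B, POLR1C, POLR1D, POLR1F, POLR1H, POLR2E, POLR2F and POLR2H).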

The RRN3 and POLR1F Orthologs Maintain Important Residues for Their Interaction in All the Species Studied
Despite poor sequence conservation (Table 3), sequence alignments of POLR1F orthologs showed that the region mediating its interaction with the RRN3 subunit is conserved in intestinal protozoan parasites (Fig. 2A). Furthermore, three-dimensional predictions of these parasitic proteins revealed strong structural similarity to their human and yeast counterparts, where the binding area remains exposed in all cases to facilitate their interaction with RRN3 (Fig. 2B).
Similarly, sequence alignment analysis of the RRN3 orthologs revealed low overall conservation (Table 3), but the residues of the serine patch through which this protein binds to POLR1F were highly conserved (Fig. 3A), particularly those corresponding to residues S101, S102 and S185 of yeast RRN3. The three-dimensional predictions showed a high structural similarity among the RRN3 orthologs, in which the characteristic HEAT repeat fold (repeats of alpha helices joined by a short loop) is maintained. In addition, the identified serine patch residues are exposed in all cases, which would allow their interaction with POLR1F orthologs (Fig. 3B). These findings suggest that the interaction points between RRN3 and POLR1F are conserved in different species, including intestinal protozoan parasites.
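Checking whether specific reference residues, such as S101, S102 and S185 of yeast RRN3, are conserved requires mapping residue numbers of the reference sequence onto alignment columns, because gaps shift the column index. The Python sketch below illustrates that mapping on an invented three-sequence toy alignment. The sequence names, the sequences and the positions 3 and 4 are hypothetical; a real analysis would pass the actual alignment and the yeast positions named above.

```python
from typing import Dict, List


def column_of_residue(aligned_ref: str, residue_number: int) -> int:
    """Return the 0-based alignment column of a 1-based residue number.

    Residues are counted along the reference row, skipping gap characters.
    """
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError(f"reference has fewer than {residue_number} residues")


def residues_at(alignment: Dict[str, str], ref_name: str,
                positions: List[int]) -> Dict[int, Dict[str, str]]:
    """For each reference position, list the residue every sequence has there."""
    ref = alignment[ref_name]
    result: Dict[int, Dict[str, str]] = {}
    for pos in positions:
        col = column_of_residue(ref, pos)
        result[pos] = {name: seq[col] for name, seq in alignment.items()}
    return result


def is_conserved(column: Dict[str, str], residue: str = "S") -> bool:
    """True if every sequence carries the given residue in this column."""
    return all(ch == residue for ch in column.values())


# Invented toy alignment (all rows have the same length; '-' marks a gap).
TOY = {
    "yeast_ref":  "MK-SSLE",
    "ortholog_1": "MRASS-E",
    "ortholog_2": "M--STLD",
}

if __name__ == "__main__":
    for pos, column in residues_at(TOY, "yeast_ref", [3, 4]).items():
        print(pos, column, "conserved" if is_conserved(column) else "variable")
```

In the toy data, reference position 3 is a serine in all three rows, whereas position 4 is replaced by threonine in one ortholog. A stricter or looser conservation criterion (for example, accepting S/T substitutions as phosphorylatable residues) would be a separate design decision.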

Discussion
Transcription of rRNA by Pol I is the key regulatory step in ribosome production and is tightly controlled by an intricate network of signaling pathways and epigenetic mechanisms [24]. Transcription by Pol I requires the formation of a preinitiation complex (PIC) that directs promoter-specific transcription of rDNA and whose components are the Pol I enzyme, the TBP-TAF complex SL1 and UBF [25]. The only subunit of the TBP-TAF complex SL1 that was identified in all the species analyzed was TBP. This is consistent with the fact that TBP is considered the most conserved initiation factor in archaeo-eukaryotic transcription initiation complexes [26]. No orthologs of the three Pol I-specific TAFs were identified, but all species have other members of the TFIID (transcription factor II D) protein family encoded in their genomes that could carry out this function. UBF plays an essential role in maintaining a euchromatic state on rDNA and enhancing rRNA expression [27], which is why it is interesting that no UBF ortholog was found in C. parvum. UBF is not essential for the initiation of transcription in vitro, but it is essential for the formation of the PIC in vivo and functions in the pre- and post-initiation steps [25,27]. However, the genome of C. parvum encodes other proteins with HMG (high mobility group) boxes whose specific function has not yet been characterized. Almost all the core subunits of Pol I were identified in the parasites, but the POLR1E-POLR1G heterodimer, which is required for RNA elongation by Pol I [28], was not found in any of the analyzed parasite species. Therefore, the question arises of how these organisms carry out rRNA elongation. The heterodimer formed by the POLR1F and RPA3 subunits plays a role in recruiting Pol I to the promoter region [29]. In the analysis, orthologs were found only for the POLR1F subunit; thus, all the intestinal protozoan parasites analyzed lack identifiable RPA3 orthologs. RPA3 prevents DNA rehybridization during transcription and, in parallel, recruits and activates different proteins and complexes [30].
The failure to find orthologs for some proteins does not necessarily mean that these proteins are absent from these organisms. Their sequence identity may be so low that a specialized search is required to find them, as was the case with RRN3 of E. histolytica [31]. Using the BLAST tool with default search parameters, RRN3 orthologs were found in the genome of E. histolytica and E. dispar. However, Srivastava et al. [31] previously reported a putative E. histolytica ortholog of RRN3, of which there is a homologue in E. dispar. Differences in the components of the rRNA transcription machinery may be due to differences in the promoter, in how the rDNA is organized in the genome of the parasites, and in the number of copies, among other things. The way in which rDNA is organized in intestinal protozoan parasites varies between species: in G. duodenalis and C. parvum the classical arrangement of tandem repeats is maintained [32,33], but in E. histolytica these genes are located on extrachromosomal circular DNA molecules [34]. Regarding the number of rRNA gene copies, G. duodenalis has approximately 86 copies [32], C. parvum 5 copies [33] and E. histolytica approximately 200 copies [35]. Furthermore, the rDNA promoters of G. duodenalis and C. parvum have not been identified, but that of E. histolytica has [36]. Interestingly, in G. duodenalis, binding sequences for TBP and TAF were identified in the intergenic region of the rDNA [8].
The interaction between RRN3 and the A43 subunit (POLR1F in humans) is essential for the recruitment of Pol I into the preinitiation complex at the rDNA promoter [37,38]. Important to this interaction are a conserved region of 22 amino acids in A43 [37] and a conserved serine patch on the surface of RRN3, which is formed by residues S101, S102, S109, S110, S145, S146, S185 and S186 [39]. This interaction is also regulated by phosphorylation of both proteins [39,40]. In the present study, both regions were found to be partially conserved in intestinal protozoan parasites, with 8 residues (out of 22) highly conserved in the A43 counterparts and the residues corresponding to S101, S102 and S185 conserved in the RRN3 counterparts (Figs. 2A, 3A). There are currently two drugs in cancer clinical trials that target RNA polymerase I transcription (CX-5461 and CX-3543) and, in particular, CX-5461 does this by preventing the interaction between SL1 and Pol I at the rRNA promoter [41]. Thus, based on our analysis, molecules similar to CX-5461 could be designed against intestinal parasite rRNA transcription machineries as a new treatment strategy. However, it should also be considered that the subspecies and variants of the mentioned protozoa may differ in sequence and structure. Given the differences between human and parasitic proteins, it may be possible to design molecules that specifically inhibit this machinery in parasites (and thus do not affect the hosts), taking the residues and regions highlighted in this work as a starting point.
Funding Open access funding provided by Uppsala University. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest
The authors declare no conflict of interest.
Ethics statement This material is the original work of the author and has not been previously published elsewhere. Bioinformatic work does not require an ethical permit and the goal is to reduce work with animals.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Fig. 3 RRN3 orthologs conserve some residues that constitute a serine patch and are known to interact with POLR1F. A Sequence alignments of the identified RRN3 orthologs. The degree of conservation is shown in color and arrowheads are placed to indicate the residues that could interact with POLR1F. Predicted structures of human RRN3 (B) and their orthologs in Saccharomyces cerevisiae (C), Giardia duodenalis (D), Cryptosporidium parvum (E), Entamoeba histolytica (F), and Entamoeba dispar (G). The locations of the serine residues that interact with POLR1F are marked with red arrows (color figure online)