Our recent progress in epigenetic research using the model ciliate, Tetrahymena thermophila

Epigenetic research focuses on heritable changes beyond the DNA sequence, which has led to a revolution in biological studies and benefits in many other fields. The well-known model ciliate, Tetrahymena thermophila offers a unique system for epigenetic studies due to its nuclear dimorphism and special mode of sexual reproduction (conjugation), as well as abundant genomic resources and genetic tools. In this paper, we summarize recent progress made by our research team and collaborators in understanding epigenetic mechanisms using Tetrahymena. This includes: (1) providing the first genome-wide base pair-resolution map of DNA N6-methyladenine (6mA) and revealed it as an integral part of the chromatin landscape; (2) dissecting the relative contribution of cis- and trans- elements to nucleosome distribution by exploring the unique nuclear dimorphism of Tetrahymena; (3) demonstrating the epigenetic controls of RNAi-dependent Polycomb repression pathways on transposable elements, and (4) identifying a new histone monomethyltransferase, TXR1 (Tetrahymena Trithorax 1), that facilitates replication elongation through its substrate histone H3 lysine 27 monomethylation (H3K27me1).


Introduction
Eukaryotic genomes have multiple layers. Beyond genetic basis, DNA methylation, noncoding RNA and histone modifications/variants serve as a platform of epigenetic signals which regulate the gene expression leading to profound impact on cell development (Badeaux and Shi 2013;Bird 1992;Jeffares et al. 1998;Poole et al. 1998;Reisenauer et al. 1999;Zacharias 1993).
Tetrahymena thermophila is a eukaryotic unicellular model organism (see review in Orias et al. 2011). Having long been a genetic model system of choice, abundant genomic resources and tools have expanded Tetrahymena's utility for epigenetic studies. Tetrahymena contains one diploid germline micronucleus (MIC) and one polyploid somatic macronucleus (MAC) within the same cell compartment ( Fig. 1a) (Karrer 2012). MAC is transcriptionally active while MIC is inert during vegetative growth. During the sexual reproduction stage-(conjugation), the parental MAC degrades and a new MAC develops from the zygotic nucleus. This nuclear dimorphism features the distinct epigenetic landscapes established on the isogenic background, which makes Tetrahymena an ideal model for Edited by Jiamei Li.
Ting Cheng, Yuanyuan Wang, Jie Huang, Xiao Chen, and Xiaolu Zhao contributed equally to this article. epigenetic studies. For example, histone variants and chromatin modifications were identified and were linked to the different transcription states by comparing the transcriptionally silent MIC with the active MAC during vegetative reproduction (Chalker et al. 2013). Among these, the histone variant H2A.Z was found to be associated with active gene transcription activity in the Tetrahymena MAC genome (Allis et al. 1980;Chalker et al. 2013;Hayashi et al. 1984;Henikoff and Smith 2015).
Our group is dedicated to understanding epigenetic regulation mechanisms using Tetrahymena thermophila as a model system (Fig. 1b-e). A summary of our recent progress is presented here.
The genomic distribution and determinants of DNA N 6 -methyladenine (6mA) in Tetrahymena N 6 -methyladenine (6mA) is reported to be involved in many biological processes in prokaryotes including the restriction modification system (Harrison and Karrer 1985). By contrast, 6mA in eukaryotes was largely neglected due to its low abundance in most genomes until recent studies that reported the existence of 6mA in multiple eukaryotes including Chlamydomonas reinhardtii (Fu et al. 2015), Caenorhabditis elegans (Greer et al. 2015), Drosophila melanogaster (Zhang et al. 2015), mouse embryonic stem cells (Schiffers et al. 2017;Wu et al. 2016), and humans (Xiao et al. 2018;Xie et al. 2018). Tetrahymena thermophila was among the first eukaryotes reported to contain 6mA (Gorovsky et al. 1973). 6mA is the only detectable DNA methylation in Tetrahymena and its 6mA level (0.6%-0.8%) is at least one order of magnitude higher than that in most of higher eukaryotes, thus rendering it an ideal system to study 6mA (Gorovsky et al. 1973;Wang et al. 2017a, b). Moreover, 6mA is present in the transcriptionally active MAC but absent in the transcriptionally silent MIC (Fig. 2a) (Gorovsky et al. 1973;Wang et al. 2017a), providing a near-perfect internal parallel control to more rigorously interrogate the impact of 6mA. However, lack of genome-wide distribution information drastically hinders further investigation of the functions and regulation mechanisms of 6mA.  Wang et al. 2017a;Xiong et al. 2016;Gao et al. 2013;Zhao et al. 2019 We generated the first genome-wide, base pair-resolution map of 6mA in Tetrahymena, by employing single-molecule real-time (SMRT) sequencing ( Fig. 2b) (Wang et al. 2017a) 6mA preferentially located in the consensus sequence of 5′-AT-3′ (Fig. 2c), indicating that DNA sequence per se provides necessary information to locate 6mA. However, in a genome as AT rich as Tetrahymena, only a small percentage of adenines (0.66%) are methylated, suggesting that other factors beyond sequence also contribute to 6mA distribution. Further analysis revealed that 6mA accumulated in linker DNA regions between well-positioned nucleosomes (Fig. 2d, g), but the direction of the causal effect remains to be determined . Moreover, nucleosomes adjacent to 6mA were enriched with the conserved histone variant H2A.Z (Fig. 2e) which is dynamically regulated by ATP-dependent chromatin remodelers (Havas et al. 2001;Jaskelioff et al. 2000). Given that 6mA accumulated right downstream of the transcription start sites (TSS of RNA polymerase II (Pol II) transcribed genes) (Fig. 2f), we posit that 6mA was recruited to the promoter region by a similar mechanism to H2A.Z. It was previously reported that 6mA is either positively or negatively correlated with transcription (Fu et al. 2015;Liang et al. 2018;Wang et al. 2018;Wu et al. 2016;Zhou et al. 2018), and usually enriched in specific pathways in metazoa such as in mouse brain (Yao et al. 2017) and in human cancer cells (Xiao et al. 2018;Xie et al. 2018). In Tetrahymena, however, the correlation between gene expression level and the level/amount of 6mA was rather weak (Fig. 2h). Given that most Tetrahymena genes (> 90%) are decorated with 6mA, it is not surprising that 6mA in unicellular eukaryotes may have not evolved to regulate specific pathways. Collectively, the evidence suggests that distribution of 6mA in Tetrahymena is determined by both DNA sequence and the chromatin environment, and its special pattern is probably achieved by specific methyltransferase(s) deposition.
To better decipher the function and biological significance of 6mA, it is essential to perturb 6mA levels in vivo by manipulating its catalyzing enzymes. Our recent work identified a candidate 6mA methyltransferase (unpublished  (Wang et al. 2017a). a 6mA is present in the transcriptionally active MAC but absent in the transcriptionally silent MIC). Note the absence of 6mA signal in the MIC (white arrowheads). b Genome-wide distribution of 6mA and its motif, as well as nucleosomes and H2A.Z signals on ten longest scaffolds. c SMRT sequencing identified 5′-AT-3′ as the most representative motif. d Distribution profiles of nucleosome (blue), H2A.Z (green) and 6mA (pink) around the transcription start site (TSS). e Linker DNA with 6mA is enriched with H2A.Z. f 6mA preferentially distributed on RNA polymerase II transcribed genes. g Relative distance of 6mA to nucleosome dyad. h Correlation matrix of 6mA and gene expression level data), the deletion of which impaired cell growth and development. Systematic investigations are underway to find out how this methyltransferase regulates 6mA level and how the change in 6mA level affects cell fitness. It will also be interesting to look into other factors involved in the regulation of 6mA such as demethylase(s), reader(s), and chromatin modifications and how 6mA interacts with them to jointly shape the chromatin environment.

Epigenetic regulation of nucleosome distribution
How cis-and trans-determinants coordinate with each other and shape nucleosome distribution has long been debated. For example, an early study focusing on in vitro reconstituted nucleosomes showed that nucleosome distribution might be dependent largely on intrinsic DNA sequence rather than trans-acting factors (Sekinger et al. 2005). However, comprehensive in vivo mapping of nucleosomes in budding yeast, using high-throughput sequencing following micrococcal nuclease (MNase) digestion, showed that nucleosome distribution was affected by its chromatin contexts in which gene regulatory elements function on a genomic scale (Lee et al. 2007).
Tetrahymena offers an ideal system to resolve the relative contributions of cis-and trans-determinants as it benefits from nuclear dimorphism whereby the MAC and MIC share almost identical genetic sequences but possess different chromatin landscapes. A previous study, which reported the genome-wide nucleosome maps in the macronuclear genome of Tetrahymena thermophila, revealed the presence of stereotypical nucleosome pattern near TSSs, which is guided by DNA sequence and trans-acting factors (Beh et al. 2015). To acquire the nucleosome landscape in the MAC and MIC, we first optimized the nuclei isolation, MNase digestion and mono-nucleosome purification procedures in Tetrahymena (Fig. 3a). A detailed analysis of MNase-seq data revealed that the nucleosome positioning (e.g., translational positioning, the exact 147 bp of genomic DNA occupied by a subset of nucleosomes) differ significantly between Tetrahymena MAC and MIC (Xiong et al. 2016). Well-positioned and precisely phased nucleosomes are abundant in the transcriptionally active MAC, but are diminished in the transcriptionally inert MIC (Fig. 3b-c), suggesting that transcription-associated transdeterminants promote nucleosome positioning. Nonetheless, the nucleosome occupancy (the probability of a DNA sequence being associated with any nucleosome in a population) is similar between MAC and MIC, as well as with in vitro reconstitution and predictions based upon DNA sequence features (Fig. 3d), revealing strong contributions from cis-determinants (DNA sequence features). It should be noted that well-positioned nucleosomes in the MAC are often matched with GC content oscillations while stereotypical nucleosome arrays in the MIC, although diminished, are still detectable (Fig. 3e), suggesting the coordinate contribution of cis-and trans-determinants. Recently, N 6 -methyladenine (6mA) DNA methylation, as one of the potential cis-determinants, has been reported to play a role in nucleosome positioning (Fu et al. 2015). The result from our work showed that, in Tetrahymena, linker DNA regions with 6mA are usually flanked by well-positioned nucleosomes (Fig. 2e, 3f) (Wang et al. 2017a). By in vitro nucleosome assembly in Tetrahymena, a recent study revealed that the distribution pattern of nucleosomes could be recapitulated in native genomic DNA but not in DNA without 6mA . A study of another ciliate, Oxytricha trifallax, demonstrated that nucleosome fuzziness increased after loss of 6mA in the linker DNA (Beh et al. 2019). Overall, our studies and those of others have revealed the intrinsic repulsion between 6mA and nucleosomes which contributes to determining the nucleosome position. However, the causal functions of related factors and the precise mechanisms of how they coordinate with each other remain to be elucidated.

RNAi-dependent Polycomb repression controls transposable elements in Tetrahymena
Transposable elements (TEs) are DNA sequences which can induce mutations and affect the organism's genome structure by changing their positions within a genome (Bourque et al. 2018). They are drivers of host genome evolution as well as threats to host genome integrity (Bennetzen and Wang 2014;Freeling et al. 2015;Lynch 2007). Therefore, different hosts have developed a wide variety of defense mechanisms to control TE expression, including small RNA, sequencespecific repressors, DNA, and chromatin modification pathways (Berrens et al. 2017;Ecco et al. 2017;Goodier 2016;Liu et al. 2017;Molaro and Malik 2016). In flies and mammals, Piwi-interacting RNAs (piRNAs), which are produced from piRNA clusters genomic loci, mediate TE silencing (Guzzardo et al. 2013;Siomi et al. 2011). In nematodes, Argonaute-small RNA complexes are used as transgenerational binary signals and program the ON/OFF expression state for TEs (Seth et al. 2013). Plants and yeasts identify and repress TEs by recognizing their intrinsic features (Lee et al. 2013;Slotkin et al. 2009). Yet little is known about how TEs are controlled in the single cell organism Tetrahymena thermophila.
Under unfavorable environmental conditions, Tetrahymena cells of two different mating types pair with each other and initiate conjugation. The zygotic MIC differentiates into the new MAC, accompanied by a large quantity of programmed genome rearrangement events (Chalker et al. 2013;Noto et al. 2015;Xu et al. 2019). Thousands of MIC-specific internally eliminated sequences (IES), where most TEs and their remnants are nested, are removed, leaving behind MAC-destined sequences (MDS) (Fig. 4a) (Coyne et al. 2012;Fillingham et al. 2004;Wuitschick et al. 2002). Previous studies have revealed an RNAi and Polycomb group (PcG) proteins-related pathway (Fig. 4b) (Noto and Mochizuki 2017). First, this is initiated by the RNA polymerase II (Pol II)-catalyzed bi-directional transcription of long noncoding RNA (ncRNA) in the meiotic MIC (Chalker and Yao 2001;Mochizuki and Gorovsky 2004). Second, Tetrahymena scnRNAs (~ 26-32 nt siRNAs), produced from ncRNA in the MIC by the Dicer protein Dcl1p (Malone et al. 2005;Mochizuki and Gorovsky 2005), are bound to the Argonaute protein Twi1p (Noto et al. 2010;Woehrer et al. 2015). Third, H3K27 and H3K9 methylation are subsequently deposited by the RNAi machinery and EZL1 (Polycomb group protein, an E(z) homologue in Tetrahymena) complex (Liu et al. 2004(Liu et al. , 2007. Lastly, these conserved histone modifications are recognized by PDD1 (chromodomain-containing effector, analogous to HP1), which plays an important role in forming heterochromatic structures containing the IES sequences (Coyne et al. 1999;Madireddi et al. 1996;Schwope and Chalker 2014). It was previously reported that many Tetrahymena IES sequences are from TEs (Fillingham et al. 2004), but the mechanism of how TEs are propagated and controlled in Tetrahymena remains elusive.
In our recent work, we and our collaborators analyzed the polyadenylated RNA transcripts from ~ 10,000 IESs in knockout strains of three key players (DCL1, EZL1 and PDD1) in the RNAi-dependent Polycomb repression pathway (Zhao et al. 2019). We found a significantly higher level of polyadenylated RNA in the mutants than that in the wild type (Fig. 4c). These IES-specific polyadenylated transcripts (many containing TE-related sequences) have mRNA features, such as poly-A tailing, strand specificity, splice sites and a similar codon usage pattern with that of MDS (Fig. 4d), demonstrating the capacity of protein coding, and probably TE coding. As evidence, we detected a dramatic increase in germline mobilization of a recently active TE in the mutants, this mobilization being achieved via a "cut-andpaste" manner, i.e., TE is excised from the genomic locus, leaving an AT footprint in the original locus (Fig. 4e). Both mRNA and ncRNA (scnRNA) were detected in the same IES locus, but mRNA was only produced in the developing MAC where key components in Pol II-driven mRNA biogenesis are present. More intriguingly, mRNA levels of TEcontaining IESs are positively correlated with late-scnRNA, while are negatively correlated with early-scnRNA (Fig. 4f, g) . Together, these results indicate that the balance between ncRNA and mRNA production was probably mediated by RNAi-dependent Polycomb repression and co-transcriptional processing, which is essential for TE control. Considering the conservation of key components in this RNAi-dependent Polycomb repression pathway and the wide distribution of similar pathways throughout eukaryotes (Fig. 4h), we posit that interplay between RNAi and Polycomb repression may be a universal way for TE silencing and transcriptional repression of developmental genes.
Previous studies have revealed the recent transposition of TEs in Tetrahymena based on TE insertion polymorphisms in some IES (Huvos 2004) and purifying selection in predicted coding sequences of some potentially active TEs (Fillingham et al. 2004;Gershan and Karrer 2000;Hamilton et al. 2006Hamilton et al. , 2016. Our results demonstrate that Polycomb repression defects result in not only the activation of TE transcription but also the germline mobilization of TE. Future studies will focus on the mechanism underlying the molecular nature of TEs, such as where the mobilized TEs reassemble into the genome and which factors determine their insertion sites.
We and our collaborators identified the sole homolog to Arabidopsis ATXR5 and ATXR6 in Tetrahymena, i.e., Tetrahymena Trithorax-Related 1 (TXR1) (Fig. 5a)  . TXR1 is present in the transcriptionally active MAC but is absent in the silent MIC and can specifically deposit one methyl group to H3K27 (Fig. 5b)  . Knockout of TXR1 resulted in severe replication stress, manifested by slower growth with a prolonged S-phase (Fig. 5c), excessive accumulation of single-stranded DNA (ssDNA) near replication origins (Fig. 5d, e), production of aberrant replication intermediates (RI) such as single-strand breaks (SSBs) and double-strand breaks (DSBs) (Fig. 5f), and the consequent activation of robust DNA damage response (DDR) including key players involved in ssDNA binding/ sensing and DNA repair (Fig. 5g)  . It should be noted that all the above-mentioned phenotypes occurred only in ΔTXR1 cells, but not in ΔEZL2 cells, in which one of the Tetrahymena E(z) homologues responsible for H3K27me2 and H3K27me3 was deleted, emphasizing the specific role of TXR1-dependent H3K27me1. Furthermore, a point mutation at histone H3 lysine 27 could partially phenocopy TXR1 knockout, corroborating H3K27me1 as a key player in DNA replication. A more recent study discovered that K27Q mutation in the replacement variant H3.3 further aggravated the replication stress of K27Q mutation in the canonical H3, supporting the assertion that both H3 and H3.3 are physiologically relevant substrates of TXR1 (Zhao et al. 2016). This is, however, distinct from the strong canonical H3 preference in Arabidopsis homologues ATXR5 and ATXR6 (Jacob et al. 2014). Additionally, TXR1 could functionally interact with the replication key player PCNA (proliferating cell nuclear antigen) via its PIP (PCNA-interacting protein) motif, which is potentially important for its recruitment to active replication forks (Raynaud et al. 2006). These findings support the presence of a conserved pathway through which TXR1 and H3K27me1 facilitate replication elongation at newly initiated origins.
Our work revealed a new scenario that deletion of a histone methyltransferase could severely impair the replication elongation instead of replication license, but the regulatory mechanism of lysine monomethylation in DNA methylation has yet to be elucidated. Focusing on the modulating mechanism of TXR1, we recently discovered that, in addition to its important role in replication, TXR1 may also be  ). a Domain structure comparison of Tetrahymena TXR1 with Arabidopsis ATXR5 and ATXR6. b TXR1 is required for H3K27me1 and EZL2 for H3K27me2 and H3K27me3. c ΔTXR1 cells have significantly increased doubling time (t 2 ). d Accu-mulation of single-strand DNA (ssDNA) in ΔTXR1 cells. Note the absence of ssDNA signal in the MIC (white arrowheads). e ssDNA is specifically accumulated in intergenic regions, including replication origins. f Higher level of single-strand breaks (SSB) and doublestrand breaks (DSB) in ΔTXR1 cells. g The activation of DNA damage response (DDR) in ΔTXR1 cells involved in transcription regulation (unpublished data). This is consistent with the finding that H3K27me1 in Arabidopsis exhibits two distinct distribution patterns in constitutive heterochromatin and genic regions, suggesting its role in transcription regulation (Roudier et al. 2011). These bring us to one of the most basic questions in biology: with the same DNA template, how can cells solve the conflicts of replication and transcription? (García-Muse and Aguilera 2016; Hamperl et al. 2017;Lin and Pasero 2017). It will be interesting to dissect roles played by TXR1 in replication and transcription, and how it contributes to coordinating conflicts between these two essential processes.

Summary and perspectives
Epigenetic studies have kept yielding cutting-edge knowledge in different eukaryotic models including Tetrahymena. Benefiting from the unique biological properties such as nuclear dimorphism, Tetrahymena has become a powerful model for epigenetic studies. In this paper, we review recent progress made by our group using Tetrahymena to elucidate the distribution and mechanism of DNA 6mA methylation, nucleosome occupancy driven by both cis-and trans-determinants, transposable elements regulated by RNAi-dependent Polycomb repression and H3K27me1-related regulation of DNA replication.
To improve the understanding of epigenetics using Tetrahymena, future studies will focus on: (1) the regulation mechanisms of 6mA together with its co-factors; (2) the relationship of 6mA and nucleosomes in the chromatin environment; (3) the molecular nature of TEs; and (4) roles of TXR1 in replication and transcription.