Mapping exosome-substrate interactions in vivo by UV cross-linking

The RNA exosome complex functions in both the accurate processing and rapid degradation of many classes of RNA in eukaryotes and Archaea. Functional and structural analyses indicate that RNA can either be threaded through the central channel of the exosome or more directly access the active sites of the ribonucleases Rrp44 and Rrp6, but in most cases, it remains unclear how many substrates follow each pathway in vivo. Here we describe a method for using a UV cross-linking technique termed CRAC to generate stringent, transcriptome-wide mapping of exosome-substrate interaction sites in vivo and at base-pair resolution.


Introduction
We present a protocol for the identification of RNA interaction sites for the exosome, using UV cross-linking and analysis of cDNA (CRAC) [1,2]. A number of related protocols for the identification of sites of RNA-protein interaction have been reported, including HITS-CLIP, CLIP-Seq, iCLIP, eCLIP, and others [3][4][5][6]. These all exploit protein immunoprecipitation to isolate protein-RNA complexes. CRAC is distinguished by the inclusion of tandem affinity purification and denaturing purification, allowing greater stringency in the recovery of authentic RNA-protein interaction sites.
To allow CRAC analyses, strains are created that express a "bait" protein with a tripartite tag. This generally consists of His6, followed by a TEV-protease cleavage site, then two copies of the z-domain from Protein A (HTP). The tag is inserted at the C terminus of the endogenous gene within the chromosome. The fusion construct is the only version of the protein expressed and this is under the control of the endogenous promoter. Several alternative tags have been successfully used, including a version with N-terminal fusion to a tag consisting of 3× FLAG-PreScission protease (PP) cleavage site-His6 (FPH) [7]. This is a smaller construct and is suitable for use on proteins with structures that are incompatible with C-terminal tagging. An additional variant is the insertion of a PP site into a protein that is also HTP tagged. This allows the separation of different domains of multidomain proteins. Importantly, the intact protein is cross-linked in the living cell, with domain separation in vitro. This has been successfully applied to the exosome subunit Rrp44/Dis3 to specifically identify binding sites for the PIN endonuclease domain [8].
Briefly, during standard CRAC analyses, covalently linked RNA-protein complexes are generated in vivo by irradiation with UV-C (254 nm). This generates RNA radicals that rapidly react with proteins in direct contact with the affected nucleotide (zero-length cross-linking). The cells are then lysed and complexes with the bait protein are purified using an IgG column. Protein-RNA complexes are specifically eluted by TEV cleavage of the fusion protein and cross-linked RNAs are trimmed using RNase A/T1, leaving a protected "footprint" of the protein binding site on the RNA. Trimmed complexes are denatured using 6 M guanidinium, immobilized on Ni-NTA affinity resin, and washed under denaturing conditions to dissociate copurifying proteins and complexes. The subsequent enzymatic steps are all performed on-column, during which RNA 3′ and 5′ ends are prepared, labeled with 32P (to allow RNA-protein complexes to be followed during gel separation), and linkers are ligated. Note, however, that alternatives to using 32P labeling have been reported (e.g., [6]). The linker-ligated RNA-protein complexes are eluted from the Ni-NTA resin and size selected on a denaturing SDS-PAGE gel. Following elution, the bound RNA is released by degradation of the bait protein using treatment with Proteinase K. The recovered RNA fragments are identified by reverse transcription, PCR amplification, and sequencing using an Illumina platform.
Relative to CLIP-related protocols, CRAC offers the advantages of stringent purification, which substantially reduces background, and on-bead linker ligation, which simplifies separation of reaction constituents during successive enzymatic steps. It also avoids the need to generate the high-affinity antibodies required for immunoprecipitation. A potential disadvantage is that, despite their ubiquitous use in yeast studies, tagged constructs may not be fully functional. This can be partially mitigated by confirming the ability of the tagged protein to support normal cell growth and/or RNA processing, or by comparing the behavior of N- and C-terminal tagged constructs. Additionally, because linkers are ligated to the protein-RNA complex, UV cross-linking of the RNA at, or near, the 5′ or 3′ end may sterically hinder on-column (de)phosphorylation and/or linker ligation. With these caveats, CRAC has been successfully applied to >50 proteins in budding yeast, and in other systems ranging from pathogenic bacteria to virus-infected mouse cells [7,9].

Materials
All steps should be performed wearing disposable gloves and materials should be free of DNase and RNase. Prior to each CRAC experiment, pipettes should be cleaned with DNAZap (ThermoFisher; AM9890) to avoid DNA contamination at the PCR step, followed by RNaseZAP (ThermoFisher; AM9890) treatment, and rinsed with deionized water. All the buffers should be prepared with deionized water and free of RNases; however, DEPC treatment is not normally essential. To minimize buffer contamination, adjust the pH by taking small aliquots for measurements. Filter-sterilize stock solutions following preparation, and store at 4 °C. Where required, add β-mercaptoethanol and protease inhibitors to the buffers shortly before use. Wash buffers should be prepared immediately before starting the CRAC experiment.

Yeast Strains
Purification of the RNA-protein complex requires that the protein of interest is tagged, generally with the HTP (His×6-TEV protease cleavage site-Protein A×2) tandem affinity tag [1,2]. In order to study RNA targets of the exosome, strains were prepared carrying tagged, intact Rrp44 and versions that lacked exonuclease or endonuclease activity, expressed from the chromosomal RRP44 locus or from a single copy plasmid in rrp44Δ strains. Both were studied by CRAC to confirm that recovered RNAs are similar [10]. Then, strains expressing mutant and wild-type versions of Rrp44 from a single copy plasmid were used for CRAC.
We also tagged genomic copies of the nuclear exosome exonuclease Rrp6, the exosome core subunits Csl4 (exosome cap) and Rrp41 (exosome channel), and both wild-type and mutated components of the TRAMP complex (exosome cofactors) Mtr4, Mtr4arch, Air1, Air2, Trf4 and Trf5. The untransformed, parental yeast strain (BY4741) was used as a negative control throughout the analyses.

Growth Media
Tryptophan absorbs 254 nm light, potentially interfering with cross-linking, and should be omitted from growth media. We use Yeast Nitrogen Base (YNB, Formedium) supplemented with 2% glucose and amino acids without tryptophan, unless other amino acids need to be omitted for plasmid maintenance.

Buffers and Solutions
To avoid potential contamination, check pH of buffers by pipetting a small volume onto pH paper.
12. RNace-IT (Agilent) RNase A+T1, working stock prepared by diluting 1:100 in water, store long term at −20 °C.
13. ATP, 100 mM and 10 mM solutions in water, aliquot and store at −20 °C, avoid repeated freezing and thawing.
15. Proteinase K (Roche Applied Science), prepare 20 mg/ml stock in deionized water, aliquot and store at −20 °C.

Oligonucleotides
All oligonucleotides were supplied by Integrated DNA Technologies (IDT) and are listed in

Methods
Appropriate negative controls and experimental replicates are required to determine the background signal and true positive binding sites. We routinely use the (untagged) yeast parental strain as a negative control, performing a minimum of two biological and technical replicates for each sample. It is commonly observed that technical replicates (even samples from the same culture) processed in two independent CRAC experiments show more differences than two biological replicates (independent cultures) processed together.
3. Take 5 μl aliquots of the cleared lysate ("crude lysate") for troubleshooting the purification, and store at −20 °C.

Cell Culture and UV Cross-Linking
4. Mix remaining lysates with IgG Sepharose beads and rotate for a minimum of 2 h at 4 °C. This step can be extended to overnight.
5. Collect the beads by pulsing to 1000 rpm at 4 °C and remove most of the supernatant.
6. Take a 5 μl aliquot of supernatant for troubleshooting ("IgG supernatant") and store at −20 °C.
7. Wash beads twice with 10 ml of TN1000 and twice with 10 ml of TN150. Buffers used in this and all subsequent steps should not contain protease inhibitors. For each wash, gently agitate at 4 °C for 5 min.
8. Gently resuspend the beads in 600 μl of TN150 and transfer to a 1.5 ml tube.

TEV Cleavage
1. Add 20-30 units of TEV protease to the beads and mix by inverting tube.
2. Incubate at 18 °C for 2 h with shaking (make sure beads remain in suspension).
3. Pass the mixture through a microcentrifuge column (SnapCap) to remove the beads. Spin column (1000 rpm) to collect all the eluate in a 1.5 ml tube. Do not put eluate on ice.

RNase Digestion and Binding to Ni-NTA Resin
The concentration of RNace-IT used to footprint (trim) RNAs on the protein of interest is determined empirically. Ideally, the reads will be long enough to map uniquely (~17 nt) but short enough to give good resolution of the protein-binding site. We aim to generate an average RNA length of ~30 nt. The commercially available RNase stock is highly concentrated, so to minimize discrepancies between experiments it is practical to prepare a working stock of RNases (we use a 1:100 dilution in water), store it at 4 °C, and use it for all subsequent experiments.
10. Western blotting with "Crude lysate," "IgG supernatant," and "TEV eluate" controls can be carried out in parallel with the steps below (Fig. 1a). If needed, additional controls can be prepared by taking aliquots at other steps of the experiment, such as pipetting out 30 μl of IgG beads before addition of TEV protease and 30 μl after TEV treatment to control binding and cleavage efficiency.

Linker Ligation at both Ends of RNAs on Beads
Enzymatic reactions are performed on beads (Ni-NTA resin) contained in SnapCap columns. A metal rack for 1.5 ml microcentrifuge tubes greatly simplifies working with these columns by keeping them upright and cold when placed in the ice bucket. To prevent contamination caused by buffer dripping from the column, it is very important to first open the column lid, then open the press-on bottom stopper, before transferring the column between tubes. It is also essential to close the bottom of columns before closing the cap. All washes are performed under gravity flow. However, some batches of columns do not drip, or drip very slowly; in that case centrifugation might be necessary.
Guanidine contained in Wash Buffer I inhibits enzymatic reactions and must be removed completely before each enzymatic step. For efficient removal of guanidine traces, Wash Buffer I should be pipetted directly onto the beads at the bottom of the column. Then, 1× PNK buffer should be pipetted so that it rinses the sides of the column.
2. Incubate at 37 °C for 30 min.
3. Wash the resin once with 400 μl of Wash Buffer I and three times with 400 μl of 1× PNK buffer.

On Bead Ligation of 3′ miRCat-33 Linker
The 3′-linker is a DNA oligonucleotide that has a blocked 3′ end to prevent self-ligation and a 5′-end that is preactivated by adenylation (AppN...). T4 RNA ligase usually activates its substrate by preadenylation using ATP. Employing a preadenylated linker allows the reactions to be performed in the absence of ATP. This decreases the risk of circularizing any remaining 5′-phosphorylated RNA, a side reaction that would otherwise be expected. Moreover, addition of ATP to the mix could inhibit the reaction, as the active site of T4 RNA ligase would become adenylated and could not transfer the adenosine to any substrate, because the linker is already adenylated.
3. Wash the resin four times with 400 μl of Wash Buffer I and three times with 1× PNK buffer. Additional washes can be done to remove most free radioactive ATP and decrease the chance of radioactive contamination at later stages. Perform the washes until the radioactivity of the flow-through, measured with a manual Geiger counter, falls to approximately 10-15 cps.

On-Column Ligation of the 5′ Adapters
These linkers have a blocked 5′ end to prevent self-concatenation. Moreover, they contain barcodes, allowing samples to be distinguished when multiplexing, and random nucleotides to distinguish molecules with the same 5′ and 3′ ends (allowing removal of PCR duplicates). It is crucial to use different barcodes for each sample.
5. Expose the membrane (wrapped in cling film or protected by a transparent plastic film) to a high-sensitivity X-ray film at −80 °C. If samples are highly radioactive, a 30-60 min exposure time should be enough. Overnight exposure is often required for samples with weaker radioactive signal. Ensure that a chemiluminescent marker is included to realign membrane and film after developing.
6. Develop the X-ray film and align it to the membrane using the chemiluminescent rulers. Cut out the smear corresponding to the size of the protein-RNA complex for all the samples. Cut at the same place in the negative control lane. Use a clean scalpel for each sample. The first incision can be made in the middle of the band corresponding to the protein of interest plus the smear above to get most cross-linked species. Once membrane fragments have been excised, they can be stored overnight (or longer) at −20 °C or −80 °C. An example of a radiolabeled blot for Rrp44-HTP is shown in Fig. 1b.
2. Add 50 μl of 3 M sodium acetate (pH 5.2) and 500 μl of phenol-chloroform-isoamyl alcohol (25:24:1). Vortex and centrifuge for 5 min at room temperature.
3. Transfer the aqueous phase to a clean microcentrifuge tube and add 1 ml of ice-cold absolute ethanol and 20 μg of GlycoBlue.
Incubate at −80 °C for 30 min and centrifuge at 16,000 × g and 4 °C for 30 min. Wash the pellet with 500 μl of ice-cold 70% ethanol and centrifuge for 20 min. Aspirate the supernatant and air dry.

Reverse Transcription of Purified RNA
To increase the efficiency of this step, prepare a fresh dNTP dilution prior to RT, or aliquot and store at −20 °C to avoid multiple freeze-thaw cycles.

PCR Amplification of cDNA Libraries
The number of cycles used to prepare cDNA libraries should be optimized for the template and limited to minimize artifacts due to overamplification, that is, the frequency of PCR duplicates. Generally, 21-22 cycles have been sufficient to produce complex libraries from cDNA generated from exosome subunit-bound RNA; however, we typically vary between 19 and 24 cycles and will increase the number of independent PCR reactions (up to 5) for samples with low abundance of cDNA.
1. To 3 μl of cDNA template, add 47 μl of PCR master mix containing: 5 μl of 10× LA Taq buffer, 1 μl of 10 μM P5 Solexa primer, 1 μl of 10 μM pE_miRCat reverse primer, 5 μl of (fresh) 10 mM dNTPs, 0.5 μl of LA TaKaRa Taq polymerase, and 37.5 μl of nuclease-free water. We prepare three or more PCR reactions per sample to increase the complexity of our libraries.
2. The reaction is run with the following cycling conditions:

Size Selection of cDNA Libraries on Gel
At this stage, it is possible to adjust library size distribution and enrich the DNA library for cDNA of a certain length before sequencing. This size selection is dependent on the length of sequencing that will be used, the protein, and the biological questions CRAC is supposed to answer. If 50 bp sequencing length is planned, it is not useful to recover extra-long cDNAs; moreover, longer sequences will decrease resolution of protein binding sites.
On the other hand, for most proteins, it is preferable to avoid overpopulation of the library by short sequences (shorter than 20 nt), which are difficult to map confidently. In some cases, these general guidelines have to be adjusted for biological relevance: for instance, cDNA libraries from Rrp44-HTP are cut just above 130 nt to also recover short sequences enriched in cDNAs corresponding to RNAs bypassing the long exosome channel and directly accessing Rrp44.
1. Prepare a 3% Metaphor agarose gel using 1× TBE buffer (with 1:1000 SYBR Safe) and store it at 4 °C for a minimum of 30 min. Preparing a Metaphor gel takes longer than preparing a standard agarose gel, and it is common for the agarose to form "lumps" which are hard to dissolve. One option is to let the Metaphor powder soak for 30 min in 1× TBE before agitating it on a magnetic stirrer hot plate. A second option is to microwave the mixture before agitating it on a magnetic stirrer hot plate. The gel can be prepared the day before and stored at 4 °C wrapped in cling film.
11. Dry the columns by spinning at 16,000 × g for 2 min at room temperature. Transfer the columns to clean 1.5 ml microcentrifuge tubes.
12. Add 20 μl MilliQ water on the membrane and let stand for 2-5 min. Elute the purified cDNA by spinning at 16,000 × g for 1 min at room temperature.
13. Quantify the cDNA library using a Qubit high sensitivity DNA assay kit and fluorometer and store the libraries at −20 °C.

Sequencing
The samples can be submitted for single-end sequencing on Illumina MiSeq, HiSeq, MiniSeq, or NextSeq platforms. The read depth required for sufficient coverage of binding sites will depend on the number of RBP binding sites and complexity of the library generated (i.e., number of PCR duplicates). The exosome binds a huge diversity of targets. Since the highest proportion of the reads are aligned to ribosomal RNA, it is necessary to sequence deeply enough to detect less frequently bound targets. We generally aim to generate 17-35 nt trimmed RNA fragments that contain enough sequence information for a unique alignment, and that are short enough to ensure the protein interaction site is contained within the sequenced portion. We routinely use Illumina 50 bp single-end sequencing, which is long enough to sequence into the 3′ adapter sequence.

Analysis of CRAC Datasets
Analysis of sequences obtained from exosome subunit CRAC experiments was done using custom scripts and software packages. The pyCRAC [11] software, a suite of Python scripts that can be used to analyze sequencing data obtained from protein-RNA UV cross-linking protocols, includes most of the necessary tools. Here, we describe the main processing steps and the pyCRAC modules most commonly used in our analysis.

Preprocessing Step: Demultiplexing, Quality Filtering, Trimming of Adapters
The 5′ adapters mentioned above contain barcodes allowing multiplexing of several samples in a sequencing lane. In addition to barcodes, 5′ adapters contain three random nucleotides allowing removal of PCR duplicates. This allows detection of reads with the same start and end positions that arise from PCR duplication of a single cDNA rather than independent linker ligation events.
For multiplexed samples, we first split the output file from sequencing by barcodes, using the pyCRAC package. Here, barcodes.list is a tab-delimited text file containing the list of barcodes used in the experiment with the corresponding sample names, which are used in the output file names. Here is an example of how the file should appear:
The random nucleotides will be stripped in this step and will be placed into the header of each sequence of the output fastq files. Later steps can make use of this information in order to collapse PCR duplicates (see Subheading 3.6.2). It is important to note that the standard version of this script requires the adapters to be designed as shown in Table 1.
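The logic of this demultiplexing step can be illustrated with a minimal Python sketch. This is not the pyCRAC implementation: the read layout (three random nucleotides, then the barcode, then the insert), the barcode sequences, the sample names, and the "##" header separator are all assumptions made for illustration only.

```python
from collections import defaultdict

def demultiplex(reads, barcodes, n_random=3):
    """Assign reads to samples by their 5' barcode.

    reads: iterable of (header, sequence, quality) tuples.
    barcodes: dict mapping barcode sequence -> sample name (hypothetical).
    Assumed read layout: n_random random nucleotides, barcode, insert.
    """
    by_sample = defaultdict(list)
    for header, seq, qual in reads:
        random_nt = seq[:n_random]
        rest = seq[n_random:]
        for barcode, sample in barcodes.items():
            if rest.startswith(barcode):
                cut = n_random + len(barcode)
                # Keep the random nucleotides in the header so that
                # PCR duplicates can be collapsed later.
                new_header = f"{header}##{random_nt}"
                by_sample[sample].append((new_header, seq[cut:], qual[cut:]))
                break
    return dict(by_sample)

# Hypothetical barcodes and reads
barcodes = {"TAAGC": "Rrp44-HTP", "ATTAGC": "BY4741"}
reads = [
    ("@read1", "ACGTAAGCGGGTTTCCC", "I" * 17),
    ("@read2", "TTTATTAGCAAACCC", "I" * 15),
    ("@read3", "GGGCCCCCCCCCC", "I" * 13),  # no known barcode: discarded
]
samples = demultiplex(reads, barcodes)
```

Reads without a recognized barcode are dropped, and the barcode and random nucleotides are removed from both the sequence and the quality string so that only the insert is passed on to adapter trimming.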
Sequencing data are then quality filtered and adapters trimmed using Flexbar (https://github.com/seqan/flexbar) [12] with parameters -at 1 -ao 4, where input.fastq and flexbar.fastq are the input and output fastq file names, respectively. When useful, for instance when the proportion of 3′ oligoadenylated reads must be calculated (see Subheading 3.6.6), the "-g" parameter can be added to tag reads with a 3′ adapter. Then "grep" can be used to retain only these reads.

Collapsing
Sequences can then be collapsed using the random nucleotides present in the 5′ linker, as mentioned in Subheading 3.6.1, with the pyFastqDuplicateRemover.py script from the pyCRAC software, so that reads having identical ends and identical random nucleotides in the 5′ barcode are counted as one. This step can be skipped if the analysis aims to study ribosomal RNA. Indeed, with the linkers mentioned above, collapsing retains at most 64 alternative sequences (3 random nucleotides = 4^3 possibilities); since the exosome strongly binds to pre-rRNA, collapsing would flatten exosome binding peaks across pre-rRNA. However, this step is essential for the study of exosome binding on RNA polymerase II transcripts.
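The principle of collapsing, and the reason it saturates at 64 reads per identical sequence, can be sketched as follows. This is an illustrative simplification rather than the pyFastqDuplicateRemover.py code: it assumes the random nucleotides were appended to the read header after a "##" separator (a hypothetical format) and uses identical sequence as a stand-in for identical read ends.

```python
def collapse_duplicates(reads):
    """Keep one read per (sequence, random nucleotides) combination.

    reads: iterable of (header, sequence) tuples whose header ends with
    '##' followed by the random nucleotides (hypothetical header format).
    """
    seen = set()
    unique = []
    for header, seq in reads:
        random_nt = header.rsplit("##", 1)[-1]
        key = (seq, random_nt)
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique

reads = [
    ("@r1##ACG", "GGGTTT"),
    ("@r2##ACG", "GGGTTT"),  # same sequence, same random nt: PCR duplicate
    ("@r3##TTT", "GGGTTT"),  # same sequence, different random nt: kept
    ("@r4##ACG", "GGGTTA"),  # different sequence: kept
]
collapsed = collapse_duplicates(reads)

# With 3 random nucleotides, at most 4**3 = 64 reads can survive
# collapsing for any one sequence, however abundant it is.
n_random = 3
max_distinct = 4 ** n_random
```

For a very abundant species such as pre-rRNA, many independent ligation events share the same random nucleotides by chance, which is why collapsing caps the signal and flattens strong peaks.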

Alignment
Reads are then aligned to the Saccharomyces cerevisiae genome (SGD v64) using Novoalign (Novocraft) with genome annotation from Ensembl (EF4.74) [13], supplemented with noncoding sequences as described [14], with parameter -r Random, where Saccharomyces_cerevisiae.EF4.74.novoindex is the genome-specific index file generated by novoindex, and flexbar_comp.novo is the output file name. The "-r Unique" or "-r All" parameters are useful especially for study of exosome binding across tRNAs which share common sequences [10]. "-r Unique" will lead to preferential loss of a subset of sequences (e.g., ribosomal sequences which are represented by two identical RDN37 sequences in the yeast reference genome).
By default, NovoAlign filters out all reads shorter than 17 nt (as shorter reads are unlikely to map uniquely to the yeast genome). For datasets obtained from Rrp44 CRAC, it was useful to align shorter sequences [15] enriched for species targeted to Rrp44 exonuclease site and bypassing the exosome channel (Rrp44 protects 9 nt while exosome + Rrp44 protects 31-33 nt). In some analyses, we then used "-l 9" parameter (instead of -l 17 default).
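The choice of minimum read length can be motivated by a rough back-of-the-envelope estimate, sketched below. It assumes a genome of random sequence with equal base frequencies and a yeast genome size of about 12.1 Mb (an approximate figure); real genomes contain repeats, so this only indicates the order of magnitude.

```python
def expected_chance_matches(read_length, genome_size=12_100_000):
    """Expected number of exact matches of a random sequence of the given
    length in a random genome of genome_size nt, counting both strands."""
    return 2 * genome_size / 4 ** read_length

for length in (9, 17):
    print(length, expected_chance_matches(length))
```

Under these assumptions a 17 nt read is expected to match by chance far less than once, whereas a 9 nt read is expected to match at dozens of positions. Alignments obtained with "-l 9" therefore need to be interpreted with more caution than those obtained with the default.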

Counting Overlaps with Genomic Features
To study the distribution of reads across the genome, we use pyReadCounters.py from the pyCRAC package. A GTF format file for genome annotation is required by the pyCRAC software and is critical to the interpretation of the output of the pyCRAC pipeline. pyCRAC is sensitive to the formatting within the GTF file and we find it useful to check the annotated GTF file using the pyCheckGTFfile.py command to ensure that the GTF file is suitable for use with the pyCRAC software. The output files are (1) a GTF file that can be used as input in numerous analyses within the pyCRAC package, and (2) a hit table file presenting the counts of reads mapped to each genomic feature within each defined RNA class, as absolute values and as read numbers normalized per kilobase per million (if the -rpkm parameter is specified in the command line).
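For reference, the standard definition of reads per kilobase per million mapped reads (RPKM) can be written as a short function. This is the textbook formula; the exact normalization applied by pyReadCounters.py (for example, which reads count toward the total) is not specified here and may differ.

```python
def rpkm(read_count, feature_length_nt, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_nt * total_mapped_reads)

# Hypothetical example: 500 reads on a 2000 nt feature, 10 million mapped reads
example = rpkm(500, 2000, 10_000_000)
```

Normalizing by feature length and sequencing depth allows binding to be compared between features of different size and between datasets of different depth.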

Distribution along Genes
To observe the binding distribution of exosome subunits across individual genes, we use pyPileup.py from the pyCRAC package. The output is a tab-delimited file that can be plotted to obtain a visual overview of binding along the gene of interest. This gives particularly good quality plots for RNAs that are strongly targeted by the exosome. Here, sequence.tab is a tab-delimited file with gene names and sequences, and gene.list is a text file with the names of genes for which output files are to be generated.
The -r parameter allows the user to indicate the length of the flanks to be added at the 5′ and 3′ ends of genes.
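The kind of per-nucleotide profile produced in this step can be sketched in a few lines. This is an illustration of the concept, not the pyPileup.py code: it uses hypothetical 1-based inclusive coordinates, ignores strand, and takes the flank length as a plain argument.

```python
def pileup(gene_start, gene_end, read_intervals, flank=0):
    """Per-nucleotide read coverage over a gene plus flanking regions.

    Coordinates are 1-based and inclusive; strand is ignored.
    Returns the first reported position and the list of coverage values.
    """
    start = gene_start - flank
    end = gene_end + flank
    coverage = [0] * (end - start + 1)
    for read_start, read_end in read_intervals:
        # Clip each read to the reported window.
        lo = max(read_start, start)
        hi = min(read_end, end)
        for pos in range(lo, hi + 1):
            coverage[pos - start] += 1
    return start, coverage

# Hypothetical gene at positions 101-110 with 5 nt flanks
first_pos, cov = pileup(101, 110, [(98, 103), (100, 105), (114, 120)], flank=5)
```

Plotting the coverage values against position gives the binding profile along the gene, with the flanks showing signal immediately upstream and downstream.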
To study binding across a particular class of RNA, metagene plots are generated. We used custom-made scripts that are not yet available online. However, the computeMatrix, plotProfile, and plotHeatmap modules of the deepTools software allow for similar analyses [16].

Oligo-A Reads
Selection of reads containing 3′ nonencoded A tracts allows identification of targets oligoadenylated by TRAMP prior to binding of the exosome. We use custom-made scripts giving as output files (1) a fasta file containing only oligo-A reads, used for downstream analyses, (2) a text file with the ratio of oligo-A to total reads, and (3) a text file with the list of nonencoded 3′ tails.
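Since the custom scripts are not described, the following is only a minimal sketch of one way such a selection could work, not the authors' method. It assumes that each read is supplied together with the genomic sequence starting at its aligned 5′ position (a hypothetical input), and it uses an arbitrary minimum tail length of 2 nt.

```python
def nonencoded_a_tail(read, genome_ref, min_len=2):
    """Length of the 3' A tail of a read that is not explained by the genome.

    read: read sequence after adapter removal.
    genome_ref: genomic sequence starting at the read's aligned 5' position
        (hypothetical input, ideally at least as long as the read).
    min_len: arbitrary minimum tail length to call a read oligo-A.
    """
    # Find where the terminal run of A residues starts in the read.
    i = len(read)
    while i > 0 and read[i - 1] == "A":
        i -= 1
    # Skip A residues that are also present in the genome (encoded).
    while i < len(read) and i < len(genome_ref) and genome_ref[i] == "A":
        i += 1
    tail = len(read) - i
    return tail if tail >= min_len else 0

# Hypothetical reads paired with their genomic reference sequences
pairs = [
    ("GCTTCGAAAAA", "GCTTCGATGCC"),  # one encoded A, then four nonencoded
    ("GCTTCGAA", "GCTTCGAATT"),      # both A residues are encoded
]
tails = [nonencoded_a_tail(read, ref) for read, ref in pairs]
oligo_a_ratio = sum(1 for t in tails if t > 0) / len(tails)
```

A residues that match the genome are treated as encoded, so only the part of the terminal A run that extends beyond the genomic A residues is counted as a tail.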