Human samples for iPSC generation
We reprogrammed fibroblasts from two sALS patients (hereafter referred to as sALS-1-1/2 and sALS-2-1/2) and two age-matched controls (Ctrl-1-1/2 and Ctrl-2-1/2, respectively), and included two additional controls (CV-B and Ctrl-3-1/2) and two fALS cases (fALS-1-1 and fALS-2-1) with pathogenic variants in TDP-43 in this study. Study approval for cell lines originating from Ctrl-1, Ctrl-2, sALS-1 and sALS-2 was granted by the local ethics committee (No. 4485, FAU Erlangen-Nürnberg). The two sALS patients had no family history of ALS and did not harbor known ALS-causing mutations, as determined by C9ORF72 repeat length analysis and exome sequencing. Both patients were diagnosed with clinically definite ALS according to the revised El Escorial criteria. Patient sALS-2 presented with cortical and spinal motor neuron involvement starting in the right arm. At the latest follow-up (11 years after onset), he had been on a ventilator for the past 6 years and had a revised ALS functional rating scale (ALSFRSr) score of 1/48. In contrast, patient sALS-1 exhibited slower progression, with an ALSFRSr score of 39/48 12 years after disease onset. The use of cell lines originating from Ctrl-3, fALS-1 and fALS-2 was approved by the Institutional Review Board of the University of California, San Diego. The CV-B iPSC line, originating from Craig Venter, is publicly available and has been described previously. Ctrl-3 and fALS-1 have been described previously. Line fALS-2 was obtained courtesy of Kevin Eggan and has been described before. Detailed clinical information and the reprogramming method for each line are given in Online Resource Table 1.
For the cases of hereditary spastic paraplegia with pathogenic variants in SPG11 (SPG11-HSP), we used 6 previously characterized lines from 3 SPG11-HSP patients (UKERi4AA-S006, UKERi4AA-S-14A, UKERiK22-S-001, UKERiK22-S-003, UKERiG7G-S-001, UKERiG7G-S-008) and 6 previously characterized lines from 3 Ctrls (UKERi33Q-S-006, UKERi33Q-R-106, UKERi82A-S-004, UKERi82A-S-022, UKERi55O-S-002, UKERi55O-S-004) [25, 53]. Additionally, a previously characterized isogenic pair consisting of wt HuES6 and a homozygous SPG11 knock-out was used. Pluripotency and a normal karyotype were confirmed in all iPSC lines.
Induced pluripotent stem cell (iPSC) culture
iPSCs were derived from fibroblasts by reprogramming with Yamanaka factors [27, 62]. They were cultured on Matrigel-coated plates in mTeSR1 or mTeSR plus at 37 °C with 5% CO2. The media was exchanged every 24 hours for mTeSR1 and every 48 hours for mTeSR plus. When iPSCs reached about 80-90% confluency, the cells were passaged at a ratio of 1:3 to 1:6, either using Accutase and plating in mTeSR1 + 10μM ROCK-inhibitor (RI) or using ReLeSR and plating in mTeSR plus. The media was changed the day after passaging.
Generation of NOVA1 knock out iPSC
The CV-B iPSC line was chosen to generate the NOVA1 K.O. Two adjacent gRNAs targeting exon 2 of NOVA1 (gRNA 1: GGATCTATAATTGGGAAGGG; gRNA 2: GGACTTAGACAGCTTGATGG) were cloned into the lentiCRISPR v2 backbone (Addgene plasmid #52961). The media was aspirated from CV-B iPSC growing in a 6-well format and 1ml of OptiMEM was added to the well. The two gRNA plasmids were jointly transfected (3µg per gRNA plasmid) into CV-B iPSC using Lipofectamine 3000. After 2h, the OptiMEM solution was aspirated and mTeSR plus was added. To select for transfected cells, the media was supplemented with 0.5μg/ml puromycin 24h after transfection, with 1μg/ml puromycin 48h after transfection and with 0.25μg/ml puromycin 72h after transfection; puromycin was withdrawn from the media 96h after transfection. After 3d of recovery, the cells were dissociated with Accutase and 2000 cells were plated onto a 6-cm dish in mTeSR plus containing CloneR. Two days later, the media was changed to mTeSR plus and the cells were cultured for an additional 10 days until iPSC clones became clearly visible. The clones were then picked into a 96-well plate that was duplicated 4 days later. One 96-well plate was lysed in QuickExtract buffer (Lucigen) for genotyping using GoTaq 2x MasterMix and two primers flanking the two gRNA target sites (fwd: CCCAGTGCTTTAGTTGCTGT; rev: TCTTCACCATTGCACCTACCT). Candidate clones were identified by a reduction in PCR product size. All clones were expanded, checked for expression of pluripotency markers, and cryobanked. The presence of frameshift mutations was confirmed via Sanger sequencing of the PCR products. As controls, we used clones that underwent the CRISPR/Cas9 process but in which no genome editing was achieved.
Motor neuron (MN) differentiation
A dual-SMAD inhibition-based protocol was used to generate MNs (Online Resource Fig. 1a) as described previously [32, 42, 44]. Briefly, iPSC were dissociated into single cells using Accutase. After cell counting, 2×10⁵ cells per cm² (i.e. 2×10⁶ cells for one well of a six-well plate) were plated on a Matrigel-coated well (17.5µg/cm² of Matrigel in DMEM/F12) in mTeSR1 or mTeSR plus supplemented with 5μM RI. Around 24h later, the cells reached 95% confluency. For the differentiation into MNs, a basal medium, hereafter referred to as N2B27 (DMEM/F12, 1xN2, 1xB27, 100μM ascorbic acid, 1xPen/Strep), was used for culture and dilution of compounds. From day one to day six, the cells were incubated with N2B27 supplemented with 1μM Dorsomorphin (Dorso), 10μM SB431542 (SB) and 3μM CHIR99021 (CHIR). From day 3 on, 5μM RI was added and the media volume was gradually increased (0.2 ml/cm² on days 1 and 2, 0.3 ml/cm² on days 3 and 4, and 0.4 ml/cm² on the following days). The media was exchanged on a daily basis. From day six to day 15, the cells were incubated with 0.5xN2B27 containing Dorso, SB and, in addition, 1.5μM retinoic acid (RA), 200nM Smoothened agonist (SAG) and 5μM RI at 0.416 ml/cm². When the cells reached the stage of MN progenitors (MNPs, day 15 of differentiation), the MNPs were dissociated into single cells using Accutase. 0.5×10⁶ cells/ml were plated on poly-D-lysine and laminin (PDL/Lam)-coated 6-well plates in N2B27 with RA, SAG, RI, and 2ng/ml each of brain-derived neurotrophic factor (BDNF), glial-derived neurotrophic factor (GDNF), and ciliary neurotrophic factor (CNTF). This combination of growth factors is referred to as neurotrophic factors (NFs) from here on. At this point, cells were also frozen for future use in vials containing convenient cell numbers (1×10⁶, 5×10⁶ or at most 10×10⁶ cells). The appropriate number of cells was centrifuged again and resuspended in 500μl N2B27 and 500μl of a freezing solution consisting of 80% FBS and 20% DMSO.
The vials were placed in a freezing container at −80 °C for three hours and then immediately transferred to liquid nitrogen or to −150 °C.
For further culture, the media containing RA, SAG, 2μM RI and NFs was exchanged completely every other day until day 22. On day 22 of MN differentiation, the media was switched to N2B27 with NFs, 2μM RI and the γ-secretase inhibitor DAPT (2μM). On day 24, the cells were fed with the same media again. From day 25 on the cells were cultured until day 30 solely with N2B27 supplemented with NFs and 2μM RI.
To ensure appropriate cell densities for immunofluorescence (IF) staining and imaging, a portion of the cells was passaged with Accutase between day 22 and day 25; within each differentiation round, all lines were passaged on the same day. The procedure followed the dissociation at day 15 described above.
Lentivirus production and infection of iPSC-derived MNs
Lentivirus was produced as described previously. Plasmids compatible with lentiviral packaging were generated by assembling the NOVA1 ORF or EGFP ORF with a P2A-PuroR DNA fragment into a CAG promoter-containing backbone. Briefly, 2nd generation lentivirus packaging plasmids were used to produce VSV-G pseudotyped lentivirus containing CAG-ORF-V5-P2A-PuroR in Lenti-X-293T cells, with PEI as the transfection reagent (10µg VSV-G, 15µg psPAX2, 20µg ORF plasmid), in 15-cm dishes. Five hours after transfection, the media was changed to DMEM/F12+Glutamax containing 10% FBS. The media was harvested 24h and 48h after transfection and pooled. It was then centrifuged at 1,000g for 5 mins to remove cell debris. Lenti-X-concentrator was used according to the manufacturer’s instructions to concentrate the virus 100x. The virus stock was stored at −80 °C. Different viruses used for a given experiment were always prepared together.
To overexpress NOVA1 or EGFP in iPSC-MN for the RNA-seq datasets analyzed here, 30µl of virus concentrate was applied per well of a 24-well dish with MNs at day 22 of differentiation. At day 24 of differentiation, 1μg/ml puromycin was added to the media to eliminate non-transduced cells and was kept in the media until day 29 of differentiation. At day 26, a second round of infection with 30μl of virus concentrate was performed to achieve a sufficient degree of overexpression. At day 30, one well of each condition was lysed in TriZOL for RNA extraction and another in RIPA buffer to confirm overexpression by Western blot.
All FACS experiments were performed with 6 control and 6 ALS lines in parallel. For flow cytometry, the MNPs or MNs were dissociated with Accutase for 30 min at 37 °C and resuspended in FC buffer (2% FCS, 0.01% sodium azide in PBS). Cells were dispensed into 5 ml tubes (Sarstedt) at 500,000 cells per tube. For intracellular antigens, cells were fixed and permeabilized with 100µl BD Fixation/Permeabilization Solution (BD Bioscience) for 10 minutes; then 1ml of BD Perm/Wash Buffer was added and the cells were incubated for 5 minutes and subsequently centrifuged at 1500 rpm for 3min. For intracellular staining of MNPs, the cells were incubated with an OLIG2 antibody (AB9610, Millipore, 1:100) in BD Perm/Wash Buffer for 30 minutes and, after washing, with an AlexaFluor-488 anti-rabbit IgG (1:500), anti-PAX6-APC and anti-NESTIN-PerCP-Cy5.5 for an additional 30 minutes. After a wash step, the cells were resuspended in 350µl FACS buffer containing DAPI (1μg/ml). For intracellular staining of neurons, the cells were stained with anti-βIII-tubulin (NB600-1018AF405, NovusBio, 1:100) or anti-ISL1 (562547, BD Bioscience, 1:200) for 30min. The gating strategy for all FACS experiments included undifferentiated iPSC as a negative control to validate the specificity of the antibodies, because iPSC did not express any of the genes of interest at relevant RNA levels, as determined by an in-house RNA-seq dataset from iPSC (RPKM OLIG2: 0; RPKM PAX6: 0.06; RPKM ISL1: 0.01; RPKM βIII-tubulin: 0.002). Additional controls consisted of antibody cocktails lacking one antibody of the full cocktail (“minus 1 control”), which were used to determine potential bleed-through of the fluorophores. The flow cytometry experiments were performed with a Cytoflex S (lasers: 405nm, 488nm, 561nm, 638nm; Beckman Coulter) and analyzed with the CytExpert 2.4 software.
To determine cell death via FACS, we used a commercially available kit based on a fluorescent 660-DEVD-FMK caspase-3/7 inhibitor reagent (ab270785, abcam) and a fixable cell permeability dye (Live-or-Dye, 32008-T, Biotium). The caspase assay and Live-or-Dye assay reagents were each dissolved in 50µl DMSO, aliquoted and stored at −20 °C. For the assay, MNs were grown in 24-well plates. On the day of analysis, the media was aspirated from the plate and 150µl DMEM/F12+Glutamax containing 0.48µl 660-DEVD-FMK caspase-3/7 inhibitor reagent and 0.15μl Live-or-Dye reagent was applied. The cells were then incubated for 45 minutes at 37 °C. Subsequently, the cells were dissociated, fixed and stained as stated above. To correctly determine bleed-through, single incubation controls (either with 660-DEVD-FMK caspase-3/7 inhibitor reagent or with Live-or-Dye reagent) were used. The number of Casp3/7+ ISL1+ Live-or-Dye− cells relative to Casp3/7− ISL1+ Live-or-Dye− cells was determined as the final readout.
Immunofluorescence staining of cultured cells
Cells were fixed in 4% paraformaldehyde (PFA) for 10 minutes at room temperature and subsequently washed 3x with PBS for 3 mins each. A total of 6 different ALS and 6 different Ctrl lines were processed for all stainings in parallel.
The cells were permeabilized and blocked in 0.3% Triton X-100 and 5% donkey serum in PBS for 30 mins at room temperature. Afterward, the cells were incubated with primary antibodies (ISL1: 39.4D5-s, DSHB, 1:250; beta-III-Tubulin: ab18207, Abcam, 1:1000; NOVA1, ab183024, Abcam, 1:500; TDP-43, A303-223A, Bethyl, 1:1000) at 4 °C overnight. After washing, incubation with secondary antibodies and nuclei staining using 1µg/ml DAPI was performed. The slides were mounted using Mowiol solution.
Imaging was performed with a Zeiss Observer.Z1 including Apotome technology. For the quantification of ISL1-positive cells, a custom ImageJ script was used, counting at least 250 neurons per sample. To determine NOVA1 and TDP-43 intensities in iPSC-MN, 15 positions with relatively sparsely located ISL1-positive cells were imaged using a 63x objective and 9 Z-stacks with 0.5μm thickness. Using Zen blue software (Zeiss), raw Apotome phase correction was performed, the Z-stacks were merged using extended depth of focus (maximum intensity projection) with default settings, and the images were exported as .tiff files. CellProfiler was used for subsequent image analysis. Briefly, all nuclei and ISL1-positive cells were identified as primary objects from the DAPI and ISL1 images, respectively (Otsu thresholding method with 3 classes). Neuron somas were identified using the propagation method on the log-transformed beta-III-tubulin images (Otsu thresholding method with 3 classes, middle class assigned to the foreground). To reduce false-positive objects, somas with a size > 30,000 pixels were not considered further. Objects were related to each other so that only ISL1-positive nuclei and the corresponding somata were considered further. The cytoplasmic area was determined by subtracting the nuclear area from the soma area. Using the measure intensity plugin, the cytoplasmic and nuclear intensities of our proteins of interest in MNs were measured. A median filter (window: 5) was applied before intensity measurement to reduce salt-and-pepper noise. For NOVA1, the first 28 MNs per line were considered, and for TDP-43, 20 MNs per line.
The MNs, which form a dense network of cells, were washed with warm DPBS. The sheet of cells was transferred to a 1.5ml Eppendorf tube and centrifuged. After removing the supernatant, the pellet weight was determined. For 1μg cell pellet, 4μl of ice-cold RIPA buffer was added to lyse the cells. The cell lysate was sonicated for five minutes with 30s on/off at a low intensity level using a Bioruptor. Afterward, the lysate was centrifuged at 100,000g for 30min at 4 °C. The resulting supernatant represents the soluble protein fraction. The pellet was washed once with ice-cold RIPA and then resuspended in 4μl urea buffer (7M urea, 3M thiourea, 4% CHAPS, 30mM Tris) per original μg of pellet weight. Subsequently, the suspension was sonicated using a Bioruptor under the same conditions as before. The sonicated suspension was again centrifuged at 100,000g for 30min at 4 °C. The resulting supernatant represents the insoluble fraction. Loading buffer and DTT (final concentration of 100mM) were added to each fraction and the samples were incubated at 55 °C for 30min. The samples were then used for Western blotting. Western blot band intensities were quantified by densitometry and normalized to total protein determined from separately run Coomassie-stained gels.
For regular total cell lysates, cells were lysed in radioimmunoprecipitation assay (RIPA) buffer followed by sonication with a Bioruptor, and the protein concentration was estimated by bicinchoninic acid (BCA) assay. Equal concentrations were loaded. All immunoblots were run on 4-12% Bis-Tris gels with NuPAGE MOPS running buffer for 90 minutes at 180V. Proteins were transferred to a PVDF membrane with 10% methanol in NuPAGE transfer buffer at 30V overnight at 4 °C. The membrane was then blocked for 1h in 5% bovine serum albumin (BSA) in TBS-T and incubated with the primary antibody (NOVA1: ab183024, Abcam, 1:2000; NOVA2: 55002-1-AP, ProteinTech, 1:1000; RBFOX2: A300-864A, Bethyl Laboratories Inc., 1:1000; RBFOX3: ab177487, abcam, 1:1000; ELAVL4: sc-48421, Santa Cruz Biotech., 1:1000; FXR2: MA1-16767, ThermoFisher, 1:1000; GAPDH: CB1001, Millipore, 1:5000; TDP-43: A303-223A, Bethyl Laboratories Inc., 1:5000) overnight at 4 °C. Afterward, the membrane was washed twice for approximately seven minutes in 5% BSA in TBS-T and then incubated with the secondary HRP-conjugated antibody for one hour at room temperature. The membrane was washed three times with TBS-T and incubated in the dark with ECL solution. Film development was performed in the dark with various exposure times. For quantification of Western blots, densitometric analysis was performed using Fiji.
Label-free quantification of peptides from urea-soluble fractions (representing insoluble protein; no DTT and loading buffer added) was done using a Thermo Scientific Orbitrap Fusion instrument at the Core Facility for Medical Bioanalytics at the University Hospital Tübingen. Peptide identification and assignment were performed using MaxQuant. LFQ intensities representing peptide quantities were used for further calculations of proteins enriched in the urea fraction. Controls (CV-B, Ctrl-3-1, Ctrl-1-1, Ctrl-1-2, Ctrl-2-1, Ctrl-2-2) were compared to ALS samples (sALS-1-1, sALS-1-2, sALS-2-1, sALS-2-2, fALS-1-1, fALS-2-1). For further calculations, +1 was added to all values to avoid division by 0. A fold change was computed as mean(LFQ intensity + 1) of the ALS samples divided by mean(LFQ intensity + 1) of the Ctrl samples. To test for statistical significance, Welch’s test was performed on log2-transformed (LFQ intensity + 1) values. A P value below 0.05 was considered significant. Additionally, we used a fold change cutoff of greater than 1.5. The SciPy package was used for statistical calculations and the seaborn package for visualization in Python 3.0.
TriZOL was used for RNA extraction according to the manufacturer’s instructions. All steps were performed on ice or at 4 °C. The RNA was resuspended in DNase/RNase-free H2O. To obtain ultrapure RNA without any residual phenol-chloroform contamination, the Zymo RNA Clean & Concentrator-5 kit was used.
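The fold change and Welch statistic described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used in the study; the function names and the example intensities are hypothetical. In practice the P value would be obtained from SciPy, for example with `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def fold_change(als, ctrl, pseudocount=1.0):
    """mean(LFQ + 1) of ALS samples divided by mean(LFQ + 1) of controls."""
    return mean(v + pseudocount for v in als) / mean(v + pseudocount for v in ctrl)


def welch_t(als, ctrl, pseudocount=1.0):
    """Welch t statistic and degrees of freedom on log2(LFQ + 1) values."""
    a = [math.log2(v + pseudocount) for v in als]
    c = [math.log2(v + pseudocount) for v in ctrl]
    # Per-group variance of the mean (sample variance / n)
    va = variance(a) / len(a)
    vc = variance(c) / len(c)
    t = (mean(a) - mean(c)) / math.sqrt(va + vc)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vc) ** 2 / (va ** 2 / (len(a) - 1) + vc ** 2 / (len(c) - 1))
    return t, df


# Hypothetical LFQ intensities for one protein (6 ALS and 6 control samples)
als_lfq = [7, 15, 31, 63, 15, 31]
ctrl_lfq = [0, 1, 3, 1, 0, 3]
fc = fold_change(als_lfq, ctrl_lfq)
t_stat, dof = welch_t(als_lfq, ctrl_lfq)
```

The pseudocount keeps both the ratio and the logarithm defined when a protein is not detected in one group, which is why it is applied before both calculations.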
RNA-seq library preparation
For the sALS vs. Ctrl RNA-seq dataset of this study, the RNA-seq libraries were prepared using the Illumina TruSeq Stranded mRNA Library Prep Kit according to the manufacturer’s protocol. In total, 1μg of RNA was used as input and poly-A selection was performed. The library was quantified using a Tapestation (Agilent Technologies 2100 Bioanalyzer). The fragment size was checked to ensure that library quality was suitable. The prepared library was sequenced on an Illumina HiSeq4000 instrument at the IGM Genomics Center at UCSD. Libraries were sequenced in 100bp, paired-end mode. For all other RNA-seq datasets generated in this study (fALS vs. Ctrl, NOVA1 overexpression vs. EGFP overexpression, NOVA1 K.O. vs. NOVA1 wt, SPG11-HSP vs. Ctrl), library preparation and sequencing were performed at Genewiz Germany GmbH with Illumina Stranded mRNA library preparation and sequencing in 150bp paired-end mode. All libraries were sequenced to a depth of > 40×10⁶ reads.
Additional RNA-seq data from ALS patients were obtained from the AnswerALS consortium (http://data.answerals.org). Data available as of December 2020 were downloaded. To ensure adequate comparability of patients and controls, stringent inclusion criteria were set (Online Resource Fig. 3b). All ALS patients considered were diagnosed with at least clinically probable, laboratory supported ALS and presented lower motor neuron signs in at least 3 different sites. Furthermore, ALS patients were subdivided according to their C9ORF72 hexanucleotide repeat expansion (C9) status and sex. Additionally, for C9-negative ALS patients and controls, a family history of neurological disorders was excluded. Because an insufficient number of male individuals passed these criteria, only female individuals were included, resulting in datasets from 9 sALS, 5 C9-ALS and 9 Ctrl individuals that were used for further analysis.
To investigate TDP-43-related changes in NOVA1 expression, 3 TDP-43 depletion datasets (GSE121569, GSE122069, GSE27394) and 2 datasets with cytoplasmic TDP-43 (GSE65973, GSE157467) were used.
RNA-seq mapping and alternative splicing analysis
All RNA-seq datasets were analyzed in the same fashion unless stated otherwise. The RNA-seq reads were adaptor-trimmed using Cutadapt (version 1.10), aligned to hg19 (GRCh37) with STAR (version 2.5.0b) and sorted using samtools (version 0.1.18). The aligned reads were assigned to the gencode annotation (version 19) using featureCounts, and Reads Per Kilobase of transcript, per Million mapped reads (RPKM) were calculated.
For differential splicing analysis, rMATS (version 4.1.0) was used with the following flags: -b1 samples-in-control-condition.bam -b2 samples-in-target-condition.bam -t paired --variable-read-length --anchorLength 1 --tstat 6 --novelSS --libType fr-firststrand. The output file considering junction counts only was used for further analysis. Due to the order of the group assignment in -b1 and -b2, a negative value of the InclusionLevelDifference reflects inclusion of a given exon in the samples of the target condition and a positive value reflects exclusion of a given exon in the samples of the target condition.
All downstream analyses were performed in Python 2.7. Only exon junctions that were covered by at least 10 counts in each sample of a given dataset were considered. A unique index referring to a specific AS event was generated to identify the exact same exon junction in separate rMATS analyses. For pair-wise comparisons of datasets, the files were joined and only exons that passed the coverage threshold in both datasets were considered. An exon was called differentially alternatively spliced in a dataset if the FDR was below 0.05 and the absolute value of the InclusionLevelDifference was greater than 0.1. The overlap of differentially spliced events was visualized with the Venn function in the matplotlib library. The significance of the overlap and the odds ratio (OR) were determined by Fisher’s exact test using the stats module in SciPy. The value and significance of the Spearman correlation of the InclusionLevelDifference in two given datasets were also computed with the stats module in SciPy. For k-means clustering, the k-means method from the sklearn.cluster module of scikit-learn was used (specifications: init='random', n_clusters=8, n_init=10, max_iter=300, random_state=42). AS events significant in any of the four ALS datasets were clustered into 8 clusters according to the inclusion level differences in the four ALS datasets, the SPG11-HSP dataset, and the NOVA1 o.e. and K.O. datasets.
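As an illustration of the RPKM normalization named above, the following sketch converts per-gene read counts into RPKM values. It is not the authors' code: the function name is hypothetical, and using the sum of assigned counts as the "mapped reads" denominator is an assumption, since the text does not specify how the library size was defined.

```python
def rpkm(gene_counts, gene_lengths_bp):
    """Reads Per Kilobase of transcript, per Million mapped reads.

    gene_counts: dict of gene -> read count (e.g. from a featureCounts table)
    gene_lengths_bp: dict of gene -> transcript length in base pairs
    Assumption: library size = total reads assigned to genes.
    """
    total_reads = sum(gene_counts.values())
    return {
        gene: count * 1e9 / (total_reads * gene_lengths_bp[gene])
        for gene, count in gene_counts.items()
    }


# Hypothetical toy counts and lengths, for illustration only
counts = {"NOVA1": 500, "STMN2": 1500, "GAPDH": 8000}
lengths = {"NOVA1": 2000, "STMN2": 1500, "GAPDH": 1000}
values = rpkm(counts, lengths)
```

The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling in one step.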
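The coverage filter, unique event index and significance call described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the analysis code of the study: rows are assumed to be dictionaries keyed by the column names of an rMATS skipped-exon junction-count table (`IJC_SAMPLE_1`, `SJC_SAMPLE_1`, `FDR`, `IncLevelDifference`, and the exon coordinates), "10 counts in each sample" is interpreted as inclusion plus skipping counts per sample, and all function names are hypothetical.

```python
def event_index(row):
    """Build a coordinate-based key so the same exon junction can be
    matched across separate rMATS runs (hypothetical key layout)."""
    keys = ("chr", "strand", "upstreamEE", "exonStart_0base", "exonEnd", "downstreamES")
    return ":".join(str(row[k]) for k in keys)


def passes_coverage(row, min_counts=10):
    """True if every sample in both groups has at least min_counts
    junction reads (inclusion + skipping; this sum is an assumption)."""
    for group in ("1", "2"):
        inc = [int(x) for x in row["IJC_SAMPLE_" + group].split(",")]
        skp = [int(x) for x in row["SJC_SAMPLE_" + group].split(",")]
        if any(i + s < min_counts for i, s in zip(inc, skp)):
            return False
    return True


def is_differential(row, fdr_cutoff=0.05, dpsi_cutoff=0.1):
    """FDR below 0.05 and absolute inclusion level difference above 0.1."""
    return (float(row["FDR"]) < fdr_cutoff
            and abs(float(row["IncLevelDifference"])) > dpsi_cutoff)


def direction(row):
    """Group 1 = control, group 2 = target: a negative difference means
    the exon is more included in the target condition."""
    if float(row["IncLevelDifference"]) < 0:
        return "included_in_target"
    return "excluded_in_target"


# Hypothetical example event
event = {
    "chr": "chr14", "strand": "-", "upstreamEE": "100",
    "exonStart_0base": "200", "exonEnd": "260", "downstreamES": "400",
    "IJC_SAMPLE_1": "30,25,40", "SJC_SAMPLE_1": "10,12,9",
    "IJC_SAMPLE_2": "12,15,11", "SJC_SAMPLE_2": "30,28,35",
    "FDR": "0.001", "IncLevelDifference": "0.35",
}
if passes_coverage(event) and is_differential(event):
    call = (event_index(event), direction(event))
```

Joining two datasets on the value returned by `event_index` then restricts a pair-wise comparison to events that passed the coverage filter in both.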
eCLIP-seq library preparation and raw data processing
The standard eCLIP procedure was performed for TDP-43 [67, 68], and the single-end sequencing version of the assay was applied to the eCLIP-seq experiments for NOVA1, NOVA2 and RBFOX2, with some modifications to adapt it to in vitro cultured MNs. Briefly, for iPSC-derived MNs, one full 10-cm culture plate (containing ~5×10⁷ cells) was UV cross-linked (254 nm, 400 mJ/cm²), flash frozen and stored at −80 °C. For eCLIP-seq, the cells were lysed, sonicated and cleared by centrifugation (15min at 15,000g, 4 °C). This lysate was used for further downstream preparations. The antibody (TDP-43: A303-223A, Bethyl Laboratories, Inc.; NOVA1: ab183034, abcam; NOVA2: 55002-1, ThermoFisher; RBFOX2: A300-864A, Bethyl Laboratories, Inc.) was bound to Dynabeads and incubated for 2h at room temperature with the pre-cleared lysate. Afterward, the beads were washed and the barcoded 3’ RNA linkers were ligated to the bound RNA on the beads. The IP and input samples were loaded on a gel and transferred onto nitrocellulose. For each lane, a region of the nitrocellulose membrane spanning 70 kDa, starting from the size of the protein of interest, was excised to capture most of the immunoprecipitated RNA, together with a size-matched input (SMI) as a control. Membrane fragments were treated with protease and the released RNA was extracted using Zymo RNA Clean & Concentrator-5 columns. The RNA was reverse transcribed with a primer specific to the ligated RNA adapter. Excess primers were digested with ExoSAP, the RNA was removed by NaOH treatment, the cDNA was purified using MyONE Silane beads, and 5’ adapters were ligated to the cDNA. After quantification, the library was amplified using Q5 PCR mix (NEB) and size selected. The libraries were quantified using a D1000 tape on an Agilent Bioanalyzer. For the TDP-43 eCLIP-seq experiments, the libraries were sequenced in 50bp paired-end mode, while the NOVA1, NOVA2 and RBFOX2 eCLIP-seq libraries were sequenced in 75bp single-end mode.
To obtain a peak file with enrichments over inputs with log2(FC) and P values, a standard peak calling pipeline was used as described previously (code available on GitHub: https://github.com/YeoLab/eclip). Briefly, the adapters were trimmed and the reads were aligned to the UCSC hg19 genome build using STAR. A custom script removed PCR duplicates and CLIPper was used to call peaks. To calculate enrichments over input, reads in IP and SMI were compared at regions identified by CLIPper in the IP sample. Fold change and P values were calculated by Fisher’s exact test. A threshold was applied to call significantly enriched peaks. For TDP-43, the threshold was set to log2(FC) > 5 and –log10(P value) > 5, and for NOVA1, NOVA2 and RBFOX2, the threshold was set to log2(FC) > 3 and –log10(P value) > 3.
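The protein-specific thresholds above amount to a simple filter on the input-normalized peak table. The sketch below illustrates this; the dictionary keys (`log2_fc`, `neg_log10_p`) and the function name are hypothetical and do not reflect the column layout of the actual pipeline output.

```python
# (minimum log2 fold change, minimum -log10 P value) per protein, as stated in the text
THRESHOLDS = {
    "TDP-43": (5.0, 5.0),
    "NOVA1": (3.0, 3.0),
    "NOVA2": (3.0, 3.0),
    "RBFOX2": (3.0, 3.0),
}


def significant_peaks(peaks, protein):
    """Keep peaks that exceed both thresholds for the given protein.

    peaks: list of dicts with hypothetical keys 'log2_fc' and 'neg_log10_p'.
    """
    min_log2_fc, min_neg_log10_p = THRESHOLDS[protein]
    return [
        p for p in peaks
        if p["log2_fc"] > min_log2_fc and p["neg_log10_p"] > min_neg_log10_p
    ]


# Hypothetical peaks, for illustration only
example_peaks = [
    {"name": "peak_a", "log2_fc": 6.2, "neg_log10_p": 12.0},
    {"name": "peak_b", "log2_fc": 4.1, "neg_log10_p": 7.5},
    {"name": "peak_c", "log2_fc": 3.5, "neg_log10_p": 2.0},
]
nova1_hits = significant_peaks(example_peaks, "NOVA1")
tdp43_hits = significant_peaks(example_peaks, "TDP-43")
```

Both conditions are strict inequalities, matching the ">" used in the thresholds.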
Region analysis and enrichment of k-mers
To determine the distribution of transcript regions within the significantly enriched peaks of an eCLIP-seq dataset, the annotator tool was used (https://github.com/byee4/annotator). To analyze sequence preferences in the significantly enriched peaks, we performed k-mer analysis (https://github.com/byee4/clip_analysis). The occurrence of each possible 4-mer or 6-mer was calculated for a peak file and a Z test was performed to assess the enrichment of each 4-mer or 6-mer over the background.
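To make the k-mer enrichment idea concrete, the sketch below counts k-mers in peak and background sequences and computes a two-proportion Z score per k-mer. This is an assumption about one reasonable form of the test; the exact statistic and background definition used by the clip_analysis tool are not stated in the text, and the function names and sequences are hypothetical.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_z_scores(peak_seqs, background_seqs, k):
    """Two-proportion Z score of each k-mer frequency in peaks vs. background.

    Assumes both sequence sets contain at least one k-mer.
    """
    fg = count_kmers(peak_seqs, k)
    bg = count_kmers(background_seqs, k)
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    z = {}
    for kmer in set(fg) | set(bg):
        p_fg = fg[kmer] / n_fg
        p_bg = bg[kmer] / n_bg
        # Pooled proportion under the null of equal frequencies
        p = (fg[kmer] + bg[kmer]) / (n_fg + n_bg)
        se = math.sqrt(p * (1 - p) * (1 / n_fg + 1 / n_bg))
        z[kmer] = (p_fg - p_bg) / se if se > 0 else 0.0
    return z


# Hypothetical sequences, for illustration only
scores = kmer_z_scores(["TCATTCATTCAT"], ["ACGTACGTACGTACGT"], k=4)
```

A positive score marks a k-mer that is more frequent in peaks than in the background; a negative score marks depletion.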
Determination of enrichment of eCLIP-seq peaks in AS events
To determine whether an eCLIP-seq peak was present in an exon junction, the rMATS output of interest was converted into BED format and intersected with the significant eCLIP-seq peak file using pybedtools (-u True). The different regions within the exon junction were also extracted from the rMATS output, taking into account that upstream regions of exon junctions on the negative strand are reported as downstream regions by rMATS. The statistical significance of the enrichment was computed using a hypergeometric test with all events that passed the coverage threshold as the background.
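The hypergeometric enrichment test described above can be written out directly. The sketch below uses only the standard library; the function name and the example numbers are hypothetical, and the assignment of the four quantities (background events, background events with a peak, differentially spliced events, and differentially spliced events with a peak) is one plausible reading of the text.

```python
from math import comb


def hypergeom_enrichment_p(k, N, K, n):
    """P(X >= k) for a hypergeometric distribution.

    N: all AS events passing the coverage threshold (background)
    K: background events that overlap an eCLIP-seq peak
    n: differentially spliced events
    k: differentially spliced events that overlap an eCLIP-seq peak
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical numbers, for illustration only
p_value = hypergeom_enrichment_p(k=40, N=20000, K=1500, n=300)
```

Because the sum runs from the observed overlap upward, a small value indicates that peaks are found in differentially spliced events more often than expected from the background.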
Human postmortem tissue immunofluorescence staining and imaging analysis
All postmortem CNS tissues were acquired by way of an Investigational Review Board and Health Insurance Portability and Accountability Act compliant process. All samples (n = 9 sALS; n = 7 Ctrl, including 2 AD patients) had postmortem intervals <8 h. The sALS tissues were from patients who had been followed during the clinical course of their illness and met El Escorial criteria for definite ALS. Genetic variants that are known to cause ALS were excluded.
Six-micrometer-thick tissue sections from lumbar spinal cord were cut from a block of formalin-fixed paraffin-embedded tissue and deparaffinized using CitriSolv (2x15mins) and a series of gradually decreasing ethanol wash steps (2x100%, 1x90%, 1x70% and 1x0%, 5 minutes each). Subsequently, antigen retrieval was performed using high pH antigen retrieval solution at 120 °C for 15 minutes. The staining vessel containing the samples was allowed to cool down on the benchtop for 30 minutes; the samples were then washed 2x for 5 minutes with PBS and permeabilized in 0.2% Triton-X-100 in PBS for 10 minutes. Afterward, the tissue was blocked with 2% FBS in PBS for 60 minutes, followed by primary antibody incubation in 2% FBS in PBS overnight at 4 °C (TDP-43: H00023435-M01, Novus Biologicals, 1:500 dilution; NOVA1: ab183034, abcam, 1:500 dilution; beta-III-Tubulin: ab107216, abcam, 1:1000). The next day, the tissue was brought to room temperature for 3 minutes, then washed 3x5 minutes in PBS and blocked in 2% donkey serum with 0.2% Triton-X-100 in PBS for 30 minutes. Subsequently, the tissue was incubated with the secondary antibodies for 1h at room temperature (AlexaFluor-488, anti-rabbit; AlexaFluor-555, anti-mouse; AlexaFluor-633, anti-chicken; all at 1:500 dilution). After three 5-min wash steps in PBS, the nuclei were stained with 1µg/ml DAPI for 10 min and the samples were washed again twice for 5 min in H2O. To reduce autofluorescence, the samples were incubated in 0.1% (w/v) Sudan Black in 70% ethanol for 15 seconds and immediately washed twice for 5 min each in PBS. The slides were submerged in H2O and mounted with ProLong Gold Antifade with DAPI (ThermoFisher).
All slides were imaged using identical imaging conditions with a Zeiss Spinning Disc Axio Observer Z1. To promote unbiased image selection, a 10x overview of the beta-III-tubulin signal of the anterior horn of the spinal cord was taken and, using the tiles setup, regions containing MNs were selected and subsequently imaged with a Plan-Apochromat 63x oil objective and 23x0.5µm z-stacks. Using Zen software, the tiles were stitched, and a maximum intensity projection was computed and converted to a TIFF, which was used for further analysis. Image analysis was performed manually and blinded on a single-MN level in Fiji. Only MNs with a visible nucleus as determined by DAPI were analyzed. The mean fluorescence intensity (MFI) of NOVA1 and beta-III-tubulin was measured in the nucleus, soma and lipofuscin areas of each neuron. An area was considered lipofuscin when the typical autofluorescence pattern was observed in all channels. Additionally, the area of the three locations of interest was measured and the lipofuscin-free cytoplasmic area was calculated by subtracting the nuclear and lipofuscin areas from the soma area. The MFI of the cytoplasm was then calculated as follows: $\mathrm{MFI}_{\mathrm{cytoplasm}} = \frac{(\mathrm{MFI}_{\mathrm{soma}} \times \mathrm{area}_{\mathrm{soma}}) - (\mathrm{MFI}_{\mathrm{nucleus}} \times \mathrm{area}_{\mathrm{nucleus}}) - (\mathrm{MFI}_{\mathrm{lipofuscin}} \times \mathrm{area}_{\mathrm{lipofuscin}})}{\mathrm{area}_{\mathrm{cytoplasm}}}$. Qualitatively, each MN was assessed for TDP-43 pathology, specifically whether there was a loss of nuclear TDP-43, whether cytoplasmic TDP-43 aggregation was present, and whether the aggregation was dot-like or skein-like/Lewy-body-like.
Analysis of the data was performed in Python 3. Only MNs with a soma area > 1000 µm² were considered. For the analysis between MNs within sALS patients, the raw MFI was used, and the MFIs of MNs with nuclear TDP-43 and of MNs with loss of nuclear TDP-43 within a single patient were considered matched values. To compare MNs from different individuals with each other, the MFIs of NOVA1 and TDP-43 were normalized to the beta-III-tubulin MFI of the lipofuscin-free soma area, with the aim of reducing inter-individual variability due to technical differences such as postmortem interval, variability in the fixation procedure and storage time of the tissue.
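The cytoplasmic MFI calculation described above follows from treating MFI times area as the integrated intensity of a region. A minimal sketch, with a hypothetical function name and made-up measurements:

```python
def cytoplasmic_mfi(mfi_soma, area_soma, mfi_nucleus, area_nucleus,
                    mfi_lipofuscin=0.0, area_lipofuscin=0.0):
    """Mean fluorescence intensity of the lipofuscin-free cytoplasm.

    Integrated intensity (MFI x area) of the nucleus and of the lipofuscin
    region is subtracted from that of the soma, and the remainder is divided
    by the remaining (cytoplasmic) area.
    """
    area_cytoplasm = area_soma - area_nucleus - area_lipofuscin
    if area_cytoplasm <= 0:
        raise ValueError("cytoplasmic area must be positive")
    integrated = (mfi_soma * area_soma
                  - mfi_nucleus * area_nucleus
                  - mfi_lipofuscin * area_lipofuscin)
    return integrated / area_cytoplasm


# Hypothetical measurements (arbitrary intensity units, areas in µm²)
mfi_cyto = cytoplasmic_mfi(
    mfi_soma=100, area_soma=1500,
    mfi_nucleus=200, area_nucleus=300,
    mfi_lipofuscin=50, area_lipofuscin=200,
)  # (150000 - 60000 - 10000) / 1000 = 80.0
```

Setting the lipofuscin arguments to zero reduces the expression to the simpler soma-minus-nucleus case for neurons without lipofuscin.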
NOVA1-STMN2 expression correlation analysis in postmortem ALS tissues
To investigate changes in NOVA1 expression in ALS patient tissue with varying TDP-43 dysfunction, we used 4 publicly available sALS gene expression datasets: GSE76220 and GSE18920, containing spinal motor neuron RNA-seq and microarray datasets, respectively, from laser capture microdissected tissue; and GSE122649 and GSE124439, containing RNA-seq data from motor and non-motor frontal cortex. For GSE76220, reads were aligned to the human genome, assigned to genes and converted to RPKM values as described before. For GSE56504, microarray gene expression values of NOVA1 and STMN2 were extracted using GEO2R, considering only the probe with the highest signal for NOVA1 (3558755) and STMN2 (3104516). For GSE122649 and GSE124439, the counts table provided via the GEO accession number was used and counts were transformed into reads per million (RPM) values. To correlate NOVA1 and STMN2 expression values in the different datasets, Spearman correlation analysis was performed in Python 3 using the SciPy package, and scatter plots were visualized using the lmplot function of the seaborn package in Python 3. To pool the samples of different datasets within the two anatomical regions (spinal motor neurons and frontal cortex), NOVA1 and STMN2 expression values were converted into z-scores based on the mean and standard deviation of the controls in the respective dataset. Since GSE76220 and GSE18920 have a partially overlapping set of patients, the duplicated patients were omitted from the GSE18920 dataset. ALS samples were then assigned to either a STMN2 regular or a STMN2 low group by setting the z-score cutoff value at -2. Using the SciPy package in Python, statistical significance of expression differences among the 3 groups (Ctrl, ALS STMN2 regular, ALS STMN2 low) was computed using the Kruskal-Wallis test, and individual differences between two groups were determined using Wilcoxon rank-sum tests.
GraphPad Prism 9.0 was used for the generation of simple plots and the computation of P values of simple pair-wise or grouped analyses. Wherever possible, non-parametric tests were used. For all pair-wise analyses, a 2-sided Mann-Whitney U test was performed. To compare groups consisting of multiple individuals (postmortem imaging), nested one-way ANOVA was used. The Kruskal-Wallis test was used to determine differences among more than 2 groups, and Dunn’s post hoc test was applied to identify differences between individual groups. A mixed-effects model analysis was used to test for differences involving more than one categorical independent variable when matched values were compared. 2-way ANOVA was used for similar comparisons of unmatched samples. A P value < 0.05 was considered significant (*P value < 0.05; **P value < 0.01; ***P value < 0.001, ****P value < 0.0001).
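The control-based z-score conversion and the STMN2 grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and expression values are hypothetical, the sample standard deviation (n − 1) is assumed, and samples are assumed to fall into the low group when their z-score is strictly below the cutoff of -2.

```python
from statistics import mean, stdev


def z_scores(values, control_values):
    """Convert expression values to z-scores relative to the controls
    of the same dataset (sample standard deviation is an assumption)."""
    mu = mean(control_values)
    sd = stdev(control_values)
    return [(v - mu) / sd for v in values]


def stmn2_group(z, cutoff=-2.0):
    """Assign an ALS sample to the STMN2 low or STMN2 regular group."""
    return "STMN2 low" if z < cutoff else "STMN2 regular"


# Hypothetical STMN2 expression values within one dataset
ctrl_stmn2 = [10, 12, 14]
als_stmn2 = [12, 6, 9]
als_z = z_scores(als_stmn2, ctrl_stmn2)
groups = [stmn2_group(z) for z in als_z]
```

Because each dataset is standardized against its own controls, RPKM, RPM and microarray values end up on a common scale before the datasets are pooled.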