Single-nucleus chromatin accessibility profiling highlights distinct astrocyte signatures in progressive supranuclear palsy and corticobasal degeneration

Tauopathies such as progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD) exhibit characteristic neuronal and glial inclusions of hyperphosphorylated Tau (pTau). Although the astrocytic pTau phenotype upon neuropathological examination is the most informative feature for distinguishing the two diseases, the regulatory mechanisms controlling the transition of astrocytes into disease-specific states are poorly understood to date. Here, we provide accessible chromatin data of more than 45,000 single nuclei isolated from the frontal cortex of PSP, CBD, and control individuals. We found a strong association of disease-relevant molecular changes with astrocytes and demonstrate that tauopathy-relevant genetic risk variants are tightly linked to astrocytic chromatin accessibility profiles in the brains of PSP and CBD patients. Unlike the established pathogenesis in the secondary tauopathy Alzheimer disease, microglial alterations were relatively sparse. Transcription factor (TF) motif enrichments in pseudotime as well as modeling of the astrocytic TF interplay suggested a common pTau signature for CBD and PSP that is reminiscent of an inflammatory immediate-early response. Nonetheless, machine learning models also predicted discriminatory features, and we observed marked differences in molecular entities related to protein homeostasis between the two diseases. Predicted TF involvement was supported by immunofluorescence analyses in postmortem brain tissue for their highly correlated target genes. Collectively, our data expand the current knowledge on risk gene involvement (e.g., MAPT, MAPK8, and NFE2L2) and molecular pathways leading to the phenotypic changes associated with CBD and PSP.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00401-022-02483-8.


Suppl.Fig.02 Epidemiological and sequencing metadata
A Projections of technical metadata onto the UMAP embedding of high-quality filtered barcodes, with the respective variable indicated by the color code. Shown are read depth, fraction of reads in peaks ('FriP') score, duplicate likelihood, and promoter and enhancer ratios. A pie chart displays group-wise contributions to the entire cell pool. As a reference, the first and second panels include cluster identifiers (ID) and cell type assignments, respectively. B Boxplot comparisons of nuclei sequencing read depth across included cases. Outliers are depicted as black dots. The hinges of each box correspond to the 25th and 75th percentiles, with medians drawn as black bars. The 1.5-times interquartile ranges are shown as black whiskers. Red dots show the means of the distributions. C Boxplot comparisons of patient-specific and disease-relevant parameters, such as age at death, disease duration, and post mortem interval. The plot structure is the same as in B. Color coding indicates the neuropathological diagnosis, while asterisks denote the degree of significance with 'ns' = 'not significant', *p<.05, and **p<.01. Analysis of variance (ANOVA) results are shown at the bottom of each triple comparison. D Correlation matrix heatmap of epidemiological and technical parameters between average-aggregated barcodes of single human cases. Pearson's R is displayed by the color shading and labels. Paired correlations that do not reach p<.05 are shown as empty boxes. Abbreviations: PMI, post mortem interval; TSS, transcription start site; UMAP, uniform manifold approximation and projection.
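
The boxplot elements described for panels B and C can be reproduced numerically. The following is a minimal sketch, not the authors' plotting code: it assumes linearly interpolated quartiles and Tukey-style whiskers that end at the most extreme value within 1.5 interquartile ranges of the hinges. The function name `boxplot_stats` and the input values are hypothetical.

```python
import statistics

def boxplot_stats(values):
    """Summarise one group with the elements named in the caption."""
    data = sorted(values)
    # Hinges and median: 25th, 50th and 75th percentiles (linear interpolation).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Points beyond the fences are drawn as individual dots.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
        # The mean is the value marked by the red dot.
        "mean": statistics.fmean(data),
    }

# Hypothetical read depths for one case (illustrative only, not study data).
summary = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy input the single extreme value is reported as an outlier and pulls the mean well above the median, which is why the caption distinguishes the red mean dots from the median bars.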

Suppl.Fig.04 Cross-case correlation of cell type frequencies and Thal phases
Stratified by previously defined cell types and subpopulations, each scatter plot depicts the relation between Thal-phase (Aβ+ plaque distribution) on the x-axis and the relative cell type frequency across cases in a group-agnostic way. A regression line is drawn in black, while the confidence interval is shown in grey. Pearson's R and p-values are depicted in the upper right corners. Color code indicates case identity.
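
Each scatter plot reports Pearson's R together with a regression line. As a hedged illustration, the sketch below computes both from paired observations using the standard least-squares formulas. The function name `pearson_and_line` and the numbers are hypothetical, and the p-value and confidence band shown in the figure are not reproduced here.

```python
import math

def pearson_and_line(x, y):
    """Return Pearson's R plus slope and intercept of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical Thal phases and relative frequencies of one cell type per case.
thal_phase = [0, 1, 1, 2, 3, 4]
frequency = [0.10, 0.12, 0.11, 0.15, 0.16, 0.18]
r, slope, intercept = pearson_and_line(thal_phase, frequency)
```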

Suppl.Fig.09 Biological pathways indicated by altered TFs in tauopathy brains
A GO enrichment analysis of TFs that exhibited significant TFME deviations in PSP astrocytes. Only the top 25 enrichment results with an adjusted p-value <.05 are depicted. Rows correspond to the GO terms comprising MF, BP, and CC, while TF Ensembl IDs are given on the x-axis. The color shading indicates the enrichment score from negative (blue) through zero (grey) to high (red) values. B GO enrichment analysis of TFs that exhibited significant TFME deviations in CBD astrocytes. The plot structure is the same as in A. C Combined heatmap-upset plot of the top 10 GO terms enriched in the set of differentially active TFs in PSP astrocytes. The lower matrix shows the co-enriched GO term logic. The resulting intersection of GO terms involves those TFs that are highlighted in the heatmap above. The color code indicates whether these TFs exhibit positive or negative changes in TFME values. D Combined heatmap-upset plot of the top 10 GO terms enriched in the set of differentially active TFs in CBD astrocytes. The plot structure is the same as in C. E Bubble-connection graph depicting the top 10 co-enriched GO terms of significant TF alterations in PSP and CBD. The yellow dots represent specific GO terms, while their size corresponds to the number of term-assigned genes (# genes) that are found in the queried data sets. These genes are displayed as red (co-downregulated), green (co-upregulated), or grey dots (conflicting direction). F The extent of TFME alterations differs between astrocytes of PSP and CBD origin. Y-axes depict TFME median value deviations from Ctrls' medians in astrocytes. The dots represent single TFMs, while violin convexities indicate the distribution over TFME deviations. For easier reading, a horizontal line marks zero. The degree of statistical significance is given for the CBD vs. PSP comparison (Wilcoxon rank-sum test).
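
Panel F compares two sets of TFME deviations with a Wilcoxon rank-sum test. The sketch below shows one common way to compute a two-sided p-value, using the normal approximation with tie correction and without continuity correction. This is an assumption: the software and settings the authors used may differ, for example by using exact p-values for small samples. The function name `rank_sum_test` and the input values are hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    values = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        ranks[values[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(ranks[v] for v in x)   # rank sum of the first sample
    u = w - n1 * (n1 + 1) / 2      # Mann-Whitney U of the first sample
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                 # all observations identical
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Hypothetical TFME median deviations from Ctrl (illustrative only).
cbd_dev = [0.8, 1.1, 0.9, 1.4, 1.2]
psp_dev = [0.1, 0.3, 0.2, 0.5, 0.4]
u_stat, p_value = rank_sum_test(cbd_dev, psp_dev)
```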

Suppl.Fig.10 Loss of immaturity markers and acquisition of a reactive inflammatory state
A Heatmap displaying the pseudotime changes in TFME of the TFs belonging to the two major, inversely running clusters as determined by k-means clustering in tradeSeq. Results relate to Ctrl/CBD astrocytes and their pseudotemporal transition only. TFME courses (rows) were clustered hierarchically (Euclidean distance, complete method), and the results are indicated as a dendrogram on the left. B Heatmap displaying the pseudotime changes in GA of tauopathy-associated as well as significantly altered genes in the start-vs.-end and association tests in tradeSeq. Candidates had to fulfill the criterion of p<.05 (Wald statistic, Bonferroni correction). The results relate to Ctrl/CBD astrocytes and their pseudotemporal transition only. GA courses (rows) were clustered hierarchically (Euclidean distance, complete method), and the results are indicated as a dendrogram on the left.
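
The candidate filter in panel B combines a p<.05 threshold with Bonferroni correction. A minimal sketch of that correction follows, assuming that every tested gene counts toward the number of comparisons. The function name `bonferroni`, the gene labels, and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of tests (capped at 1) and filter."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    keep = [adj < alpha for adj in adjusted]
    return adjusted, keep

# Hypothetical raw p-values for three genes (illustrative only).
raw = {"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.3}
adjusted, keep = bonferroni(list(raw.values()))
selected = [gene for gene, ok in zip(raw, keep) if ok]
```

In this example only the first gene remains below the threshold after correction, although two of the three raw p-values were below .05.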

Suppl. Fig.11 Analysis of discretized TFME changes across pseudotime in CBD astrocytes
A Projections of pseudotime steps (left) and the MA0099.2_FOS::JUN enrichment (right) onto the UMAP embedding of Ctrl/CBD astrocytes, with the respective variable indicated by the color code. B-E Boxplots showing the TFME values of selected TFMs in Ctrl/CBD astrocytes over 5 discrete pseudotime steps. The color code emphasizes the assigned pseudotime bin. Each time step > 1 was compared statistically with the first one (Wilcoxon rank-sum test), and the results are expressed as asterisks, where *p<.05, **p<.01, ***p<.001, and ****p<.0001. On the right, the position weight matrix is displayed via motif sequence logos. The information content is depicted as bits for every base position of the motif sequence.
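
The bit scale of a sequence logo follows from the base probabilities at each motif position. The sketch below uses the standard definition for DNA, 2 bits minus the Shannon entropy of the position, and omits the small-sample correction that some logo tools apply. The function name `information_content` and the probabilities are hypothetical and are not the FOS::JUN matrix.

```python
import math

def information_content(column):
    """Information content in bits for one position of a DNA motif."""
    # Shannon entropy of the base distribution; zero-probability bases add 0.
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for four bases

# Hypothetical three-position motif (illustrative only).
motif = [
    {"A": 1.00, "C": 0.00, "G": 0.00, "T": 0.00},  # fully conserved
    {"A": 0.50, "C": 0.50, "G": 0.00, "T": 0.00},  # two equally likely bases
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
]
bits = [information_content(col) for col in motif]
# In a logo, each letter's height is its probability times the column's bits.
heights = [{base: p * b for base, p in col.items()} for col, b in zip(motif, bits)]
```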

Suppl. Fig.14 Assessing the involvement of protein degradation pathways on a system level
A-C Gene accessibility (GA) heatmaps of genes associated with three major protein homeostasis pathways (UPS, A; CMA, B; UPR, C) in astrocytes. Every column corresponds to a single nucleus and every row to a specific gene, while the color gradient indicates the extent of GA from negative (blue) through intermediate (black) to high (red) values. Rows were clustered hierarchically (Manhattan distance, Ward-D2 method), and the results are indicated as a dendrogram on the left. The column order was fixed, while the colored bars at the top indicate the cell type and disease diagnosis. Gene names comply with the Ensembl identifiers. D Heatmap displaying the pseudotime changes in GA of tauopathy- and degradation pathway-associated, significantly altered genes in the group-wise comparisons (Fig. 6H, main manuscript). The results relate to Ctrl/CBD astrocytes and their pseudotemporal transition exclusively. Gene assignments to these pathways are indicated by colored bars on the left. E Bubble-connection graph depicting the top 3 co-enriched and top 2 exclusively enriched GO terms of TF signatures in PSP and CBD. The yellow dots represent specific GO terms, while their size corresponds to the number of term-assigned genes (# genes) that are found in the queried data sets. These genes are displayed as red (co-downregulated), green (co-upregulated), or grey dots (conflicting direction). 'A-only terms' correspond to PSP-related and 'B-only terms' to CBD-related ones. Abbreviations: CMA, chaperone-mediated autophagy; UPS, ubiquitin-proteasome system; UPR, unfolded protein response.

Suppl. Fig.15 Re-analysis of the dataset with exclusion of #CBD3 - part 1: dataset characterization
A Plot is analogous to Fig.2a. Projections of cluster-cell type assignments and metadata onto the UMAP embedding. Color coding and labels indicate the cell type or sub-cell type identity where applicable.
B Plot is analogous to Fig.2c. Boxplots of relative cell type frequencies show reductions in excitatory neurons in PSP (left) and in all neuronal populations in CBD. Higher astrocyte and oligodendrocyte frequencies can be detected in the remaining CBD samples when compared to the Ctrls' mean (vertical dashed line). Total numbers of cells (# of cells) are indicated as bar plots on the right. C Plot is analogous to Fig.3c.

Suppl. Fig.16 Re-analysis of the dataset with exclusion of #CBD3 - part 2: Pseudotime trajectory of CBD astrocytes
A Plot is analogous to Fig.4a. All PSP-, CBD-, and Ctrl-derived astrocytes (excluding #CBD3) re-embedded in UMAP, stratified by group entity (first, second, third panel), and depicted after k-means clustering in a merged UMAP (fourth panel). One cluster (#3) remains specific for CBD astrocytes. Color code indicates group entity or cluster assignments in the first three or the fourth panel, respectively. B Plot is analogous to Fig.4b.

Suppl. Fig.17 Re-analysis of the dataset with exclusion of #CBD3 - part 3: TF networks associated with an astrocytic tauopathy state
A Plot is analogous to Fig.5b. Evaluation parameters of classification performance of the trained XGB model on the 20% test set-split of astrocytic nuclei (excluding #CBD3). Overall, more than 81% of predictions were correct (overall accuracy), and the model reaches substantial agreement with a Cohen's kappa of 68.1%. B Plot is analogous to Fig.5a. Confusion matrix displaying the intersections of the XGB model's predictions (rows) and the actual labels (columns). Each square contains the percentage (large digits) and total number (small digits) of test set samples (=20% astrocytic nuclei excluding #CBD3) with the assigned prediction-label relation. The sums of each row or column are depicted in the rightmost column or bottom row, respectively. The total sample number (i.e., nuclei of the 20% test set-split) is shown in the bottom right corner.
C Plot is analogous to Fig.5d-f. LIME feature importance bar diagrams of the most confidently and correctly classified barcodes of each group entity. The bar direction and bar color indicate the feature weights (~importance) assigned to the TFMs, which are given as y-axis breaks. Note that feature weights were assigned to specific TFME value ranges. Each panel is complemented by the group entity label, the model's calculated probability, and the explanatory model's fit value.

Suppl. Fig.19 Co-accessibility at JUNB and TFEB target genes in CBD & PSP astrocytes

A-C Co-accessibility plots at genomic loci including genes that are correlated with TFEB and/or JUNB motif enrichment (A CTSD, B MAP3K8) in astrocytes. Chromosomal location is depicted at the top of each panel, followed by Cicero's co-accessibility links in PSP/CBD (upper) and Ctrl (middle) astrocytes. The y-axes correspond to the extent of co-accessibility, and a value of 0.1 was chosen as the cutoff to reduce the depiction of technical or biological noise. Annotations for this genomic frame (gene location +/-10e4 bp), including protein-coding gene isoforms as well as lincRNAs, are given in the bottom panel. The distribution of summed ATAC peaks is given beneath each co-accessibility plot. Abbreviations: lincRNAs, long intervening/intergenic noncoding RNAs.
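
The overall accuracy and Cohen's kappa reported for the XGB model in Suppl. Fig.17 are both derived from a confusion matrix. The sketch below shows the standard calculation under stated assumptions: the function name `accuracy_and_kappa` and the counts are hypothetical and do not reproduce the matrix of the study. Rows are predictions and columns are actual labels, as in the caption.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical counts for three groups, e.g. Ctrl, PSP, CBD (illustrative only).
confusion = [
    [40, 5, 5],
    [5, 30, 5],
    [5, 5, 40],
]
accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa is lower than the accuracy because it discounts the agreement that the marginal class frequencies alone would produce by chance.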