PLCG2 protective variant p.P522R modulates tau pathology and disease progression in patients with mild cognitive impairment.

A rare coding variant (rs72824905, p.P522R) conferring protection against Alzheimer's disease (AD) was identified in the gene encoding the enzyme phospholipase-C-γ2 (PLCG2) that is highly expressed in microglia. To explore the protective nature of this variant, we employed latent process linear mixed models to examine the association of p.P522R with longitudinal cognitive decline in 3595 MCI patients, and in 10,097 individuals from population-based studies. Furthermore, association with CSF levels of pTau181, total tau, and Aβ1-42 was assessed in 1261 MCI patients. We found that MCI patients who carried the p.P522R variant showed a slower rate of cognitive decline compared to non-carriers and that this effect was mediated by lower pTau181 levels in CSF. The effect size of the association of p.P522R with the cognitive decline and pTau181 was similar to that of APOE-ε4, the strongest genetic risk factor for AD. Interestingly, the protective effect of p.P522R was more pronounced in MCI patients with low Aβ1-42 levels suggesting a role of PLCG2 in the response to amyloid pathology. In line with this hypothesis, we observed no protective effect of the PLCG2 variant on the cognitive decline in population-based studies probably due to the lower prevalence of amyloid positivity in these samples compared to MCI patients. Concerning the potential biological underpinnings, we identified a network of co-expressed proteins connecting PLCG2 to APOE and TREM2 using unsupervised co-regulatory network analysis. The network was highly enriched for the complement cascade and genes differentially expressed in disease-associated microglia. Our data show that p.P522R in PLCG2 reduces AD disease progression by mitigating tau pathology in the presence of amyloid pathology and, as a consequence, maintains cognitive function. Targeting the enzyme PLCG2 might provide a new therapeutic approach for treating AD.


The German study on aging, cognition and dementia cohort (AgeCoDe)
The AgeCoDe study is a general practice (GP) registry-based longitudinal study in elderly individuals that recruited patients aged 75 years and above in six German cities from 2003 to 2004 [29]. Exclusion criteria were consultations only by home visits by the GP, residence in a nursing home, severe illness the GP would deem fatal within 3 months, insufficient facility in German, deafness or blindness, lacking the ability to consent and not being a regular patient of the participating practice. A total of 3,327 patients gave informed consent for participation and received follow-up assessments every 18 months. Up to 8 follow-up visits were considered with a maximal follow-up time of 12.3 years. All assessments were performed by trained physicians and psychologists at the patient's home environment using standardized questionnaires. Valid PLCG2 genotype data were available in 1,978 participants. Of those, 1,967 participants were all included in the analysis of the cognitive decline in the population-based studies. Nine participants were excluded due to missing APOE-ε4 genotype and two participants were excluded due to age below 75 indicating failure to meet the AgeCoDe inclusion criteria.
At baseline and each follow-up, AgeCoDe participants received an MCI diagnosis according to Winblad criteria [60] which were operationally defined by evidence for cognitive impairment (-1SD) in the neuropsychological test battery of the SIDAM, self-report of deterioration of memory functions and preserved activities of daily living (SIDAM-ADL-Scale) [63]. Using the data from the baseline assessment of the full sample, the prevalence of MCI was estimated at 15.4% [29] which closely resembles meta-analytic prevalence estimates form general population-based studies [48]. For the analysis of the cognitive decline in MCI patients, 929 participants who received an MCI diagnosis at any visit of the AgeCoDe cohort were included. However, for this analysis, we used the time point of MCI diagnosis as the new analysis baseline, i.e. no observations recorded before the MCI diagnoses were considered. This approach was chosen to align the data from AgeCoDe MCI patients with the data from the other three memory clinic cohorts with longitudinal data (i.e. DCN, ADC, and FACE) since the observation also started at the time point of MCI diagnosis in these cohorts. For the MCI analysis up to 8 followup visits were included with an observation time of up to 12.2 years.
The present study was approved by the respective ethics committees, and written informed consent was obtained from all participants before inclusion. All study procedures complied with national legislation and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association.

Three City study (3C)
The 3C study enrolled 9,294 non-institutionalized participants aged 65 years or above, sampled from the electoral rolls of three French cities of Bordeaux, Dijon, and Montpellier between 1999 and 2001 [2]. Inclusion criteria were (1) living in these cities or their suburbs and registered on the electoral rolls, (2) aged 65 years and over and (3) not institutionalized.
Each participant signed informed consent. Partial refusal of the participation in specific parts of the examination (e.g. blood sampling or magnetic resonance imaging) did not result in exclusion from the study. Details about the study design of 3C were reported previously [2]. Health-related data were collected using standardized questionnaires during face-to-face interviews. Participants were assessed for up to 10 years in approximately 2 years intervals.
The 3C study did not implement a standardized, regular assessment of MCI diagnosis at each visit but an assessment of dementia diagnosis. 2.2% of the 3C study participants were classified as demented at baseline [2].
The study protocol was approved by the ethical committee of the University Hospital of Kremlin-Bicêtre. The study was conducted according to the principles expressed in the Declaration of Helsinki.

Longitudinal Aging Study Amsterdam (LASA)
The prospective cohort study Longitudinal Aging Study Amsterdam (LASA) consists of a nationally representative sample of older adults aged between 55 and 85 years who were recruited from the regions around Zwolle, Oss, and Amsterdam in the Netherlands. The LASA study incorporates three cohorts. The first LASA cohort included 3,107 participants and was selected from the municipal registries of the regions of the Netherlands named above between 1991 and 1992, with an oversampling of the oldest old and older men. The subjects of the first LASA cohort were followed-up for up to 24 years in 3-year intervals. A second cohort was recruited between 2002 and 2003 in the same regions as the original cohort and consisted of 1,002 subjects that were assessed for up to 13 years again in three-year intervals. The third cohort included 1,023 participants that were recruited between 2012 and 2013 and received two assessments (in three-year intervals) so far. At all visits, interviews were performed by trained interviewers in the respondents' home environment. Further information is provided by Huisman and colleagues [18]. In LASA, no formal assessment of dementia or MCI was performed. In total 5,132 participants were included at baseline (3,107 participants were included in cohort 1, 1,002 in cohort 2 and 1,023 in cohort 3). The LASA study is conducted in line with the Declaration of Helsinki and was approved by the medical ethics committee of the VU University medical center.

Supplementary text 2. Genotyping
To genotype the rs72824905 variant (p.P522R) in samples from FACE, UKB, DCN and AgeCoDe, a custom TaqMan® SNP genotyping assay was designed using the available Applied Biosystems online assay design tool. Oligonucleotide primers were ordered from Applied Biosystems (Thermo Fisher Scientific) and performed according to the manufacturer's instructions in a Quantstudio-6™ Real-Time PCR Systems (Thermo Fisher Scientific). The quality of the assay was confirmed by direct sequencing of heterozygous samples. Assays' accuracy was checked by including positive and negative controls in each experiment.
ADC was genotyped using the Illumina Global Screening Array (Infinium-global-screening-array-24-v1 with GSAsharedCUSTOM_20018389_A2) and applied established quality control methods [8]. We used high-quality genotyping in all individuals (individual call rate > 98%, variant call rate > 98%), individuals with sex mismatches were excluded and Hardy-Weinberg equilibrium-departure was considered significant at p < 1 × 10 −6 . The PLCG2 variant was part of the custom content of the GSA array.
In ADNI, genotyping was performed using the Illumina Human610-Quad BeadChip, HumanOmniExpress BeadChip and Illumina Omni 2.5M platforms as previously described and applied established quality control methods were applied including exclusion of duplicates, highly related individuals and non-caucasian subjects (based on a ±6 SD from the man cut-off for the first principal component and a ±3 SD cut-off for the second principal component). The p.P522R genotypes were derived from imputation using the HRC Michigan Imputation Server [34]. At the HRC server, SHAPEIT was used to phase the data, and imputation was performed using the HRC reference panel (hrc.r1.1.2016) and a cut-off for imputation quality of R²=0.3. Imputed genotypes were returned by the service. For the current analysis, we used "best-guess" genotypes of p.P522R based on the platform with the highest quality (genotype probability>=0.8).
In the 3C study, genomic DNA samples of 6,636 individuals were transferred to the French Centre National de Génotypage (CNG) as part of a previous replication effort [52]. Samples that passed DNA quality control were genotyped using the Agena Bioscience MassARRAY platform. We also excluded samples with more than three missing genotypes and males heterozygous for X chromosome variants present within the panel. Variants were excluded based on missingness >5%, Hardy-Weinberg equilibrium (in cases and controls separately) < 1 × 10 −5, and differential missingness between cases and controls < 1 × 10 −5. After exclusion, 6,201 samples were available.
The LASA cohorts were genotyped using commercial genotyping arrays and then followed a genotyping quality control and an imputation step. For genotyping, the Infinium-global-screening-array-24-v1 was used in 1,899 samples and the AXIOM-NL array was utilized for 625 [11]. The following quality control (QC) criteria were used with the individuals genotyped: individuals with a call rate <98%, an excess heterozygosity rate or a gender mismatch, and individuals which were duplicates or PCA outliers using a PCA projection of the study samples onto 1KG were removed. A total of 2,358 samples passed the QC (1,779 GSA and 579 Axiom array). At the variant level following QC criteria were used: all monomorphic markers and markers with Hardy-Weinberg p>10 -6 or call rate <98% were removed. For imputation, the dataset was prepared using scripts provided online (Haplotype Reference Consortium [HRC] imputation preparation and checking, version 4.2.5). Imputation to the HRC was performed at the HRC Michigan Imputation Server [34]. At the HRC server, SHAPEIT2 (version 2, .r790) was used to phase the data, and imputation was performed using the HRC reference panel (version 1.0) using Minimac 3. Imputed genotypes were returned by the service. Imputation quality (R2) was 0.89 for GSA

Supplementary text 3. Examination of the effect of population stratification
To evaluate the impact of adjusting for potential population stratification using principal components (PC), we restricted the analysis of the effect of p.P522R in MCI to those samples with GWAS data available. Thus, we could calculate PC for the sample and include them in the analysis. All cases from ADNI and ADC had GWAS data available, while 632 (68.2%) MCI patients of AgeCoDe and 348 (65.8%) DCN and 1,060 (97.1%) of the FACE cohort provided GWAS data. We then rerun only analyses showing significant associations in the main analyses in the paper with and without inclusion of the first two PCs in our model. We assessed the perceptual change in the parameters. As suggested previously on confounding effects [31], changes of less than 10% were considered an indication that the effects found in our study are independent of adjustments for PCs. Data was not pooled across cohort as PCs were computed per cohort and therefore represent a cohort-specific measure to correct for population stratification that could not be harmonized across cohorts.
In the analyses of cognitive decline, we did not re-analyze the data from ADC since the low number of carriers and the short follow-up for these carriers had already affected the stability of our estimations in the main analyses (see supplementary figure 2). in the main analysis and in all individual cohorts derived from the interaction of p.P522R and the linear term indicating that the stronger change for the quadratic term could represent random noise. In line with this, the pvalues for the association of p.P522R with the cognitive decline that takes into account both, the linear and the quadratic term, remained unchanged by PC adjustment. We therefore conclude that PC adjustment had no effect on the associations of p.P522R in each cohort.
For the analyses on CSF biomarkers, no change above 10% in the estimates for the effect of p.P522R was observed for pTau 181 and tTau for which we observed a significant influence in our main analysis. Changes in estimates for Aβ 1-42 may again arise from random noise in the non-significant association with p.P522R. Of note, the effect observed on Aβ 1-42 in DCN is in the opposite direction as in the ADNI cohort suggesting again a random noise more than a true association.

Supplementary text 4.1: CSF collection
In the ADC cohort, a lumbar puncture was performed using a 25G needle. The CSF was collected in polypropylene tubes and centrifuged at 1800/2100 g for 10 min at 4°C. The CSF was immediately stored at −20°C until further analysis (maximum 2 months). CSF levels of Aβ 1-42 and pTau 181 and total Tau were measured using commercial ELISA immunoassays. Quantification was performed in the neurological laboratory of the VU University Medical Center in Amsterdam following the protocols described by Mulder and colleagues [39]. For the definition of AD biomarker abnormality, we applied previously published cut-off values, which are the standard in the Amsterdam CSF laboratory [39,66]. Abnormal CSF-Aβ 1-42 was defined as values <640 pg/ml, abnormal CSF-total tau (tTau) was defined as values >375 pg/ml, and CSF-pTau 181 >52 pg/ml.
In the DCN and the memory clinic Bonn samples, lumbar punctures from the L3/L4 or L4/L5 intervertebral region were performed at the respective department of neurology or psychiatry. The CSF samples were kept on ice for a maximum of 1 h and then centrifuged for 10 min (2000 g at 4°C). Samples were aliquoted to 0.25 ml and stored in polypropylene tubes at −80 °C until analysis. All CSF samples from DCN and 257 samples from the memory clinic of Bonn were sent to the Department of Psychiatry in Erlangen for quantification. Levels of Aβ 1-42 and pTau 181 and total Tau were measured using commercially available ELISA immunoassays INNOTEST ® β-amyloid  and INNOTEST ® PhosphoTAU (181p) (Innogenetics), in accordance with the protocols described by Lewczuck and colleagues in an ISO9001-certified laboratory under a routine qualitycontrol regimen (intra-assay coefficients of variation: 2.3%-5.9%; interassay coefficients of variation: 9.8%-13.7%) [24,25]. We performed all analyses in duplicate and used the mean of the two. We defined abnormally low CSF Aβ 1-42 < 600 pg/mL, abnormally high CSF tTau > 300 pg/mL, and abnormally high CSF pTau 181 > 60 pg/mL based on DCN specific, previously published cutoff values.
In 27 CSF samples collected at the university hospital Bonn, CSF samples were processed using the protocol established by the local biomarkers laboratory. Briefly, CSF samples kept at 4 °C until processing for biobank storage at -80 °C. Processing was completed on the day of lumbar puncture and samples were stored for no longer than 4 weeks until analysis. Samples and calibrators were run in duplicates, and samples with a coefficient of variation (CV) > 20% were repeated. A pooled and aliquoted CSF sample was run as an internal standard on each assay plate to control for inter-run variance. For AD core biomarkers, the V-PLEX Aβ Peptide In the ADNI cohort, CSF was collected as previously described [50]. In brief, CSF samples were collected in the morning after an overnight fast. After collection and transfer into polypropylene tubes, CSF samples were frozen on dry ice within 1 hour after collection followed by shipping to the ADNI Biomarker Core Laboratory at the University of Pennsylvania Medical Center on dry ice. Afterward, thawing (1 hour) at room temperature and gentle mixing, aliquots (0.5 mL) were prepared and stored in bar code-labeled polypropylene vials at −80°C. All CSF biomarkers were quantified using the multiplex xMAP Luminex platform (Luminex Corp, Austin, TX) with Innogenetics (INNO-BIA AlzBio3; Ghent, Belgium; for research use-only reagents) immunoassay kit-based reagents. We used < 192 pg/mL as the cut-off for Aβ 1-42 , >23 pg/mL as the cut-off for pTau 181 and 93 pg/mL as the cut-off for tTau as previously recommended.

Supplementary text 4.2: Harmonization of batch effects
Due to technical differences, CSF samples analyzed with different quantification batches are not measured on the same scale. To pool and jointly analyze CSF samples quantified with different batches, a harmonization procedure is required. In this study, the method described by Zhou and colleagues was applied [65]. This method allows the harmonization of non-overlapping samples. To do so a transformation is determined that converts CSF samples analyzed with different batches to the same scale and provides a p-value to assess the accuracy of the transformation. The Matlab (MATLAB Release 2016b The MathWorks, Inc., Natick, Massachusetts, United States) implementation created by Zhou and colleagues was used (https://github.com/hzhoustat/PNAS2018). For the harmonization efforts, we included patients with dementia of the Alzheimer's type in the analyses to broaden the range of pathologies and the sample size. This might improve the removal of methodological differences.
To ensure that the transformation only harmonizes differences in the scales induced by batches it is necessary to account for sample differences due to sample selection bias or different population characteristics. To this end, a set of relevant covariates have to be identified whose influence is then removed by stratifying the sample according to these covariates. To identify those covariates, graphical cause model techniques can be applied. Zhou and colleagues demonstrated that for the harmonization of CSF in AD research, age and diagnosis are important determinants. However, especially in the context of genetics and samples with highly different age ranges, the APOE-ε4 allele is another important factor since it influences CSF levels and the age at onset of Alzheimer's disease [28,41]. Hence, APOE-ε4 was added to the causal graph proposed by Zhou and colleagues [65]. A graphical representation of the extended graph is presented below: As can be seen from the figure, bias induced by sample selection and population characteristics is selectively linked to the CSF measures via the three variables age, APOE-ε4 and diagnosis. Removing their influence will enable correction for differences in scales between different CSF batches.
On the other hand, no differences in population characteristics between the samples from the DCN and UKB cohort were expected, regardless of the laboratory and the batch used for quantification since patients referring to the memory clinic of the university hospital of Bonn were included in both cohorts and since inclusion criteria for both cohorts were largely similar. Both cohorts did indeed not differ in frequency of age groups (i.e. <60/60-70/71-75/>75), χ²(3)=2.19, p=0.534), APOE-ε4 (χ²(1)=2.80, p=0.094) and gender (χ²(1)=2.37, p=0.124). However, the frequency of diagnoses (i.e. MCI or dementia) was different and the diagnosis was therefore used for stratification during the computation of the harmonization transformation.
MCI patients of the DCN and the ADC cohort are expected to differ concerning relevant sample characteristics because they were recruited in different countries using different study protocols. Both cohorts differed concerning age (χ²(3)=14.140, p=0.002) and APOE-ε4 (χ²(1)=8.36, p=0.004) but not with regard to gender (χ²(1)=2.37, p=0.124). The former two variables were used to stratify the samples during the harmonization.
For the ADNI cohort, sample characteristics were also expected to differ from those of the CSF Erlangen sample. In line with this expectation, we observed significant differences regarding age (χ² (3)

Supplementary text 5. Neuropsychological assessments
In all MCI samples, the MMSE was considered for the assessment of cognition [13]. The test is a global screening for dementia consisting of 30 items. This test was chosen because it was the only test available across all cohorts that provided a sensitive global assessment of cognitive change across different cognitive domains. Previous research has shown that it provides good sensitivity to cognitive changes in MCI patients [46].
In the 3C study, the global cognition of the participants was also assessed using the MMSE. Besides, the Isaac set test (IST) [20] and the Benton visual retention test (BVRT) [4] were assigned as a measure of verbal fluency and episodic memory, respectively. In the 3C study, the IST requires the naming of words of different, consecutively administered semantic categories (i.e. colors, animals, fruits and cities) for 15 seconds. The BVRT consists of 15 stimulus cards presented for 10 seconds. Afterward, the participants are asked to select each stimulus among 3 distractors.
In AgeCoDe, the CERAD delayed word list recall and animal fluency task of the CERAD neuropsychological test battery [38] were considered for cognitive assessments in addition to the MMSE. Similar to the 3C study, they assess verbal fluency, episodic memory, and global cognition, respectively. The CERAD delayed word list recall consists of 10 words that are presented and have to be recalled in three consecutive learning trials. After a delay, the participant is asked to freely recall from long-term memory. The animal fluency task requires the participant to name as many animals as possible in one minute.
In LASA, the global cognition of the participants was also assessed using the MMSE. Episodic memory was assessed using the 15 Words Test, the Dutch version of the Auditory Verbal Learning Test [47,49]. The tests consisted of 15 words which were learned during 3 trials. After every trial, the respondent was asked to recall as many words as possible. After a distraction period of 20 minutes, the respondent was asked to name again the learned words. The total number of words learned during 3 tests is the recall score (range 0-45). The number of words reproduced after 20 minutes is the delayed recall score (range 0-15) which was used in this research [9].

Supplementary text 6.1: Latent process linear mixed models for the analysis of the cognitive decline
We chose to model the continuous cognitive decline in patients with MCI rather than conversion to dementia since the analysis of continuous traits can improve statistical power to detect genetic determinants [51]. Besides, simulation has demonstrated that the application of linear mixed models of cognitive decline may outperform survival analysis concerning the sensitivity for modulators of disease progression [10].
However, cognitive tests used as outcomes in these analyses frequently show unequal interval scaling and ignoring this issue can introduce bias in the analysis of cognitive decline [44]. Taking this source of bias into account is recommended by current guidelines for longitudinal dementia research [59]. Therefore, we used linear mixed models with a latent process as implemented in the R package LCMM [45]. These models jointly estimate a latent process representing the true change of cognition over time and a link function that relates this process to the observed cognitive measurements. To select the most appropriate link function adjusting for unequal interval scaling, we compared several options: a linear link function, a beta link function, and quadratic I-splines with varying numbers of knots placed at the percentiles or equidistant along with the distribution of the outcome. All possible link functions were fitted using models including a random intercept and a linear or quadratic polynomial trend of time in the random and/or fixed effects (i.e. linear fixed and random effect, quadratic fixed and linear random effect, and quadratic random and fixed effect). An evaluation of the need for non-linear decline is recommended by guidelines for statistical analysis in dementia research [59]. Besides, the omission of a relevant non-linear effect from the mixed model can induce spurious effects for predictors that are correlated with the person-specific mean of the variable showing the non-linear effect that was omitted [3]. In our model, omitting a non-linear trend of time would possibly introduce spurious effects for p.P522R because the variant is related to survival [56] so that p.P522R can be expected to show a higher mean observation time.
Only polynomial trends up to the second-order (i.e. quadratic) were examined. More complex models including higher polynomial terms of time could not be reliably fitted in the limited number of p.P522R carriers in our cohorts and would have resulted in overfitting to random variation in the longitudinal cognitive data of the carriers. In the pooled cohort of MCI patients, we allowed the fixed effects of the polynomial trends of the time to vary across cohorts. All models were also combined with different zero-mean Gaussian stochastic processes that take into account correlations between the observations besides the random effects. We considered an uncorrelated errors structure, a first-order autoregressive and Brownian motion process.
Among all fitted models, those with the lowest BIC and appropriate model fit according to a visual inspection of residual plots were chosen to assess the effect of p.P522R. These factors (i.e. link function, the shape of the cognitive decline trajectory, residual error structure) were evaluated in combination since the interaction of these factors concerning model fit is hard to predict a priori.
The procedures described above were repeated for each outcome in each of the population-based cohorts (AgeCoDe, 3C, and LASA) and the sample of MCI patients pooled across cohorts. However, for the analysis of the MMSE in LASA and the 3C study, a beta-link function was chosen a priori since this link function has been shown to provide a good modulation of the unequal interval scaling of this test in the literature and provided the best model fit in the AgeCoDe cohort and the MCI sample [44]. The selection of the same link function in the analyses may also contribute to the comparability of results across cohorts.
Of note, the time scales used to set up the linear mixed model with a latent process differed between the MCI samples and the population-based cohorts. In the population-based cohorts, chronological age at each visit was used as the most natural time scale to study the cognitive change in elderly populations [59]. We also included age at baseline to model the convergence of age-related cognitive trajectories of individuals from different birth cohorts [16,59]. Analyses, therefore, focus on the age-related cognitive change over time.
In the MCI samples, however, time from diagnosis was used because for this analysis, the focus was more on disease progression in the at-risk stage for AD. Nevertheless, we controlled for age at baseline to take into account different chronological ages of the MCI patients and to correct for birth cohort differences. All MCI samples were pooled and analyzed jointly. This approach is called integrative data analysis and it is recommended in case of rare predictor variables to enable the application of more complex, accurate statistical models, reduce the influence of outlying observations and to maximize the power to detect associations [7]. The use of the MMSE as a common test across cohorts assured that the cognitive decline was assessed using a homogenous, harmonized outcome measure in all cohorts.
Only after the selection of the best fitting combination of link-function, the shape of the trajectory of the cognitive decline and additional correlation structure between observations, covariates and p.P522R were included in the model as well as their interaction with all polynomials of time. The significance of the effect of p.P522R on the cognitive decline was assessed using a multivariate Wald test of the joint effect of all rare variant-time interactions as implemented in the LCMM package.

Supplementary text 6.2: Computation of effect sizes in linear mixed models with non-linear effects
In case of a non-linear trajectory of cognition over time, the effect of a predictor under study (i.e. p.P522R or APOE-ε4 in this analysis) changes may change over time as the speed of cognitive changes is not constant over the observation period. It is therefore not straightforward to provide a single effect size of the association of the predictor and the cognitive decline. Consequently, we computed effect sizes at multiple time points of the cognitive trajectory to evaluate the possible changes in the association. To this end, we derived expected differences in cognition at each time point concerning the baseline level of cognition based on the fixed effect estimates in the latent process linear mixed models. As those estimates were on an arbitrary, latent scale, we used the expected variance of the latent process at the last evaluated tie point to standardize the estimates. Estimates of this variance were derived from the random effect variance-covariance matrix of the latent process linear mixed models.

Supplementary text 6.3: Generalized additive models
Generalized additive models are a flexible method to assess the relationship between a dependent variable and a set of independent variables where the functional form of the relationship does not have to be linear. To this end, it is assumed that the dependent variable is related to the independent variables via a smooth function. The model can be written as: where Y is the dependent variable, E() is the expected value, a is the intercept of the model, s n () are smooth functions and 1 , …, x n are independent variables. The smooth functions can be estimated from the data while taking into account the model complexity and fit to the data. In the current analyses, the R-package mgcv [62] was used to estimate the generalized additive models. The smooth functions were represented by thin plate regression splines [61].
To assess the interplay of p.P522R and Aβ 1-42 levels in CSF concerning tTau and pTau 181 , we estimated a varying-coefficient model that estimated a smooth relationship between Aβ 1-42 levels and tTau or pTau 181 levels as well as an effect for p.P522R that was allowed to vary smoothly with the levels of amyloid pathology: E( ) = a + s 1 (Aβ 1−42 )+s 2 (Aβ 1−42 )•p.P522R+ 3 ( )+sex+APOE-ε4 +CSF sample Analyses were controlled for sex, APOE-ε4 and CSF sample as parametric linear effects. For age, a smooth term was estimated to take into account a potential non-linear relationship to CF biomarkers.
To infer those levels of Aβ 1-42 at which p.P522R is associated with tTau or pTau 181 levels, we performed posterior simulations as implemented in the "plot_smooth" and "plot_diff" functions form the R-package itsadug [57]. Estimates of the expected relationship of Aβ 1-42 and tTau or pTau 181 for p.P522R carrier and non-carrier as well as expected differences between the two groups were derived. Simultaneous confidence intervals (as opposed to point-wise confidence intervals) were computed to account for multiple testing.

Supplementary text 6.4: Structural Equation modeling
We fitted structural equation models to assess mediation of the effect of p.P522R on the cognitive decline via CSF biomarkers in Mplus version 7.31 [40]. As recommended we did structural equation modeling instead of estimates from separate regression and linear mixed models to estimate mediation since this can slightly increase the efficiency of the estimation and simplifies the use of multiple mediators and the modeling of their interplay [19]. To assess the fit of all structural equation models, we provide the root mean square error of approximation (RMSEA), the comparative fit index (CFI) and the standardized root mean square residual (SRMR). An RMSEA<0.5, a CFI>0.95 and an SRMR<0.08 indicate a good fit to the data [17].
We modeled the effect of p.P522R on Aβ 1-42 and pTau 181 taking into account the effect of Aβ 1-42 on pTau 181 that is postulated by the amyloid cascade hypothesis [21]. All three variables were allowed to predict the cognitive decline that was derived using latent growth curve models. In the second series of models, pTau 181 was replaced by tTau. We did not include tTau and pTau 181 in a single model due to the high correlation between the two biomarkers in our sample (r=0.82).
Since the structural equation model does not allow a straightforward implementation of link functions taking into account unequal interval scaling in cognitive tests as implemented in the latent process linear mixed models described above, the normalized version of the MMSE was used [43]. This version of the MMSE is corrected for unequal interval scaling due to a transformation derived from several large cohort studies.
The latent growth curve models used to model the cognitive decline in the normalized MMSE require data to be assessed at fixed time points that should not show large inter-individual variation and reasonable overlap of observations at different time points for the same individuals (covariance coverage) to result in a successful estimation. We, therefore, included data that were assessed in yearly intervals for up to 4 years which reflects a common follow-up scheme across cohorts. One additional assessment after 6 months was included due to sufficient covariance coverage.
All estimations were performed using full information maximum likelihood (FIML) that can handle missing data which is missing at random. We modeled the cognitive decline using three latent factors representing cognitive level at baseline, linear and quadratic change over time, respectively. The need for a quadratic slope was assessed using the Akaike information criterion (AIC) and sample-size adjusted BIC. Finally, 95%-Confidence intervals for the indirect effects were derived using 1000 bootstrap samples [30]. When examining the interaction effect of p.P522R and CSF biomarkers significance of the interaction was assessed using a multivariate Wald test on the effect of the interaction on the linear and the quadratic slope.

Supplementary text 6.5: Selection of covariates
When assessing the influence of p.P552R, we considered age, gender, education, APOE-ε4 status and cohort as covariates. Education was operationalized as participation in secondary education. APOE-ε4 status was defined as the presence or absence of the APOE ε4 allele.
We chose the covariates to exclude the possibility that a random association between these standard demographic variables and p.P522R might bias our results. Likewise, APOE-ε4 as the strongest genetic risk factor for AD might influence our results due to a random co-occurrence of p.P522R and APOE-ε4. Since none of these variables is thought to mediate the effect of p.P522R on the outcomes under study (i.e. CSF AD biomarkers and cognitive decline) inclusion for these variables might increase the statistical power to detect associations [36]. Cohort was included as a covariate to control for residual differences between samples that were not removed by harmonization strategies (i.e. common neuropsychological test across cohorts, statistical harmonization of continuous CSF AD biomarkers; [65]). In the CSF analyses, CSF sample was used instead of recruiting cohort as a covariate since batch effects are expected to be the most important source of heterogeneity between samples.
In some analyses, additional covariates were included. In the analysis of the cognitive decline in the 3C study, study center was included as an additional fixed effect to control for the difference between the study centers as it is common practice in analyses of the 3C study. In LASA, the platform used for genotyping was included to control for differences between assessment methods of p.P522R. This was not necessary for other cohorts because here, identical genotyping procedures were applied.
Importantly, in the case of the detection of significant associations of p.P522R, analyses were repeated without adjustment for covariates to examine the sensitivity of the results to covariate selection.

Supplementary text 6.6: Enrichment analyses for the gene co-expression network
Fisher's exact test were applied to assess enrichments for genes co-expressed with PLCG2 and other gene lists.
We assumed a total population size of 19,080 genes since according to the number of all genes examined by the GeneFriends tool [54].
To assess the enrichment of PLCG2-related genes co-expressed with APOE or TREM2, respectively, we used Fisher's exact test (two-sided). When computing these tests, APOE, TREM2, and PLCG2 themselves were not considered as a part of the population because those genes were chosen to construct the co-expression network.
To assess the enrichment of differentially expressed genes in microglia under neurodegenerative conditions among genes in the gene set shared between APOE, TREM2, and PLCG2, we first derived lists of those differentially expressed genes from the literature. From the work of Keren-Shaul and colleagues [22], 500 mouse genes differentially expressed in disease-associated compared to homeostatic microglia (see supplementary table  1 of the publication by Keren-Shaul and colleagues [22]) were merged to their human homologs using the homologene package in R and the genes included in the GeneFriends database. This resulted in a list of 320 genes. From the work of Friedman and colleagues [14], we extracted human genes differentially expressed in microglia of wild type compared to hMAPT -P301S mice (see supplementary data 2 of the publication by Friedman and colleagues [14]) were extracted. Of those 318 were included in the GeneFriends database. Regarding differential expressions in human AD patients, we selected the list of the genes published by Mathys and colleagues [33] (see supplementary table 7 of the publication by Mathys et al. 2019 [33]) yielding 75 genes that were also included in the GeneFriends database. For all analyses on differentially expressed genes in microglia under neurodegenerative conditions, we considered APOE, TREM2, and PLCG2 as part of the underlying population of genes. Note. Sample size n=3595. Unequal interval scaling was adjusted using a beta-link function. A Brownian motion process zero-mean Gaussian stochastic process was used to model the correlation between observations besides the random effects. Multivariate Wald test refers to the joint test of the interaction of the genotype (i.e. p.P522R or APOE-ε4) with polynomials of time. a: Analysis adjusted for cohort, age at baseline, sex, education, and APOE-ε4 status. b: effect on cognitive function at the median of observation time (i.e. two years). Est=Estimate; SE=Standard error; p=p-value; df=degrees of freedom; χ²=statistic of the multivariate Wald test.

Supplementary Figure 2. Cohort-specific effects of p.P522R on the cognitive decline in the Mini-Mental State Examination
Note. a: Fundacio ACE (FACE), b: German study on aging, cognition, and dementia (AgeCoDe), c: Alzheimer's disease neuroimaging (ADNI) cohort, d: Dementia competence network (DCN) cohort e: Amsterdam Dementia cohort (ADC). In the ADC cohort, only one carrier provided more than 1 year of followup (and up to 4 years of follow-up) so that the displayed trajectory relies almost completely on one observation and is therefore prone to overfitting and measurement error. f: Difference between p.P522R carrier and noncarrier on the cognitive change from baseline. Differences were derived based on the p.P522R*time and p.P522R*time² terms of the linear mixed model with a latent process using the pooled sample of all cohorts but an interaction effect between p.P522R and polynomials of time and cohort. Differences are displayed on the scale of the latent process that was standardized using the expected variance of the latent process of the last time point considered (i.e. 12 years after baseline, see supplementary text 6.2). The prediction was not derived from stratified analyses to avoid different scaling of the latent process across cohorts due to the difference in the estimated link function relating the latent process to the observed MMSE values. . a unequal interval scaling of the outcome was adjusted using a beta-link function. b unequal interval scaling of the outcome was adjusted using a I-spline function with 6 equidistant interior knots. c unequal interval scaling of the outcome was adjusted using a I-spline function with 4 interior knots placed at the quartiles of the outcome distribution. d unequal interval scaling of the outcome was adjusted using a I-spline function with 3 equidistant interior knots.

Supplementary
In the 3C study and AgeCoDe, a Brownian motion process residual error structure was used. In LASA, a first-