A comprehensive study of the genetic impact of rare variants in SORL1 in European early-onset Alzheimer’s disease

The sortilin-related receptor 1 (SORL1) gene has been associated with increased risk for Alzheimer’s disease (AD). Rare genetic variants in the SORL1 gene have also been implicated in autosomal dominant early-onset AD (EOAD). Here we report a large-scale investigation of the contribution of genetic variability in SORL1 to EOAD in a European EOAD cohort. We performed massively parallel amplicon-based re-sequencing of the full coding region of SORL1 in 1255 EOAD patients and 1938 age- and origin-matched control individuals in the context of the European Early-Onset Dementia (EOD) consortium, originating from Belgium, Spain, Portugal, Italy, Sweden, Germany, and Czech Republic. We identified six frameshift variants and two nonsense variants that were exclusively present in patients. These mutations are predicted to result in haploinsufficiency through nonsense-mediated mRNA decay, which could be confirmed experimentally for SORL1 p.Gly447Argfs*22 observed in a Belgian EOAD patient. We observed a 1.5-fold enrichment of rare non-synonymous variants in patients (carrier frequency 8.8 %; SkatOMeta p value 0.0001). Of the 84 non-synonymous rare variants detected in the full patient/control cohort, 36 were only detected in patients. Our findings underscore a role of rare SORL1 variants in EOAD, but also show a non-negligible frequency of these variants in healthy individuals, necessitating pathogenicity assays. Premature stop codons due to frameshift and nonsense variants have so far exclusively been found in patients, and their predicted mode of action corresponds with evidence from in vitro functional studies of SORL1 in AD.
Electronic supplementary material: The online version of this article (doi:10.1007/s00401-016-1566-9) contains supplementary material, which is available to authorized users.


Introduction
Alzheimer disease (AD) is the most common neurodegenerative disease and the predominant cause of dementia worldwide. Up to 10 % of AD patients are diagnosed with early-onset AD (EOAD), manifesting symptoms before the age of 65 years [18]. EOAD has a very strong genetic component, with a heritability estimate of 92-100 % [53]. A positive family history of AD is present in 35-60 % of EOAD patients, with up to 15 % of familial EOAD cases showing autosomal dominant inheritance [8,19]. However, mutations in the known causal genes, encoding Amyloid Precursor Protein (APP), Presenilin 1 and 2 (PSEN1 and PSEN2), explain only 5-10 % of EOAD cases [4,19,53].
Rare variants in the sortilin-related receptor (SORL1) gene have been shown to contribute to early-onset as well as late-onset familial AD [35,37,48]. SORL1 was originally identified as a risk gene for AD in a candidate-gene based association study [42]. Early replication studies showed discrepant findings, possibly due to allelic heterogeneity, locus heterogeneity or lack of statistical power due to small cohort size. Nonetheless, the association was subsequently confirmed in meta-analyses [20,24,39,50] and in genome-wide association studies (GWAS) including Korean, Japanese and Caucasian individuals [24,33,39]. The protein encoded by SORL1 is a type-1 transmembrane, mosaic protein showing homology to the vacuolar protein sorting 10 (Vps10p) family, and the lipoprotein receptor-related proteins (LRP) [52]. The SORL1 protein is unique among the Vps10p-family proteins as it contains additional ligand-binding structures within the LRP domains, including a β-propeller domain, a low-density lipoprotein receptor class A domain, and a fibronectin type-3 domain [2,16]. The SORL1 protein interacts directly with the APP protein through its complement-type repeats within the low-density lipoprotein receptor class A domain [1,2], and via a six amino acid-stretching FANSHY motif located in the cytoplasmic tail of SORL1 [14]. Interaction with APP results in sequestering of APP away from the secretase cleavage route, inhibiting formation of the amyloid-β (Aβ) peptide [2,14,32,36]. Functional characterization of downstream effects of variants identified in familial early-onset and late-onset AD patients elucidated a protective role for SORL1 in the amyloidogenic pathway. Investigation of the functional implications of the familial variant, p.Gly511Arg, showed disrupted interaction of the Vps10p domain with amyloid-β monomers, resulting in reduced lysosomal targeting of Aβ peptide by SORL1 [7].
Two additional rare variants, p.Glu270Lys and p.Thr947Met, were reported in familial late-onset AD patients of Caribbean-Hispanic origin. Both increased Aβ1-40 and Aβ1-42 secretion, and APP levels at the cell surface in transfected cell lines [48].
In this study, we investigated the contribution of genetic variants in the SORL1 coding region to the occurrence of AD in pan-European cohorts of 1255 early-onset AD patients and 1938 age-matched non-affected control individuals.

Study population
The cohort under study consisted of 1255 EOAD patients originating from Flanders-Belgium (n = 312), Spain (n = 342), Portugal (n = 106), Italy (n = 205), Sweden (n = 183), Germany (n = 100), and Czech Republic (n = 7), and 1938 age-matched European control individuals originating from Flanders-Belgium (n = 748), Spain (n = 306), Portugal (n = 130), Italy (n = 444), Sweden (n = 303), and Czech Republic (n = 7) (supplementary table 1a). An additional set of patients (n = 30), from the same source population, carrying a known pathogenic mutation in APP, PSEN1 or PSEN2, were not included in the study cohort, but used for comparison of clinical characteristics. Mean onset age of the patient cohort was 59.0 ± 6.2 years. Mean age at inclusion for the control cohort was 66.4 ± 9.8 years. In both the patient and the control cohort, 60 % were female. In the patient cohort, information on familial history of AD was available for 759 (60 %) individuals. A positive familial history (defined as presence of at least one first-degree relative with AD) was present for 327 (43 %) individuals, while 432 (57 %) individuals were considered sporadic patients. DNA and medical/demographic information on patients and control individuals from Spain, Portugal, Italy, Sweden, Germany, and Czech Republic was ascertained through the EU EOD consortium as previously described (details are provided in supplementary table 1b) [5,11,45,46]. Consensus diagnosis of possible, probable or definite AD was given according to the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer Disease and Related Disorders Association (NINCDS-ADRDA) [29] and/or the National Institute on Aging-Alzheimer's Association (NIA-AA) diagnostic criteria [17,30]. Belgian patients were ascertained at the memory clinics of Middelheim and Hoge Beuken, Hospital Network Antwerp (ZNA), Antwerp [12], and the University Hospitals of Leuven (UHL), Leuven [30].
Belgian control individuals were either partners of patients, screened for neurological or psychiatric antecedents, neurological complaints, or organic disease involving the central nervous system, or community-recruited individuals who were included after an interview concerning medical and familial history and cognitive screening by means of the Mini Mental State Examination (MMSE > 26) [15].

SORL1 sequencing
Sequencing of SORL1 exons 2-48, and at least 15 nt of each exon-intron flanking region, was performed by target enrichment using MASTR technology (Multiplicom, Niel, Belgium). PCR primers flanking each target region were designed using mPCR software (Multiplicom, Niel, Belgium). Target region size for amplification was set at 500 nt. In total, all target regions were covered by 46 amplicons in nine multiplex PCR reactions. For subsequent indexing and sequencing, target-specific primer sequences were extended with universal tag sequences (5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-Fwd and 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-Rev). Optimal annealing temperature and relative amounts of PCR primers for all targets were established for uniform amplification of each target in the multiplex reaction. Multiplex PCR reactions were performed on 20 ng genomiphied DNA (Illustra GenomiPhi V2; Thermo Fisher, MA, USA). Amplification quality and efficacy were verified by fragment analysis on an ABI 3730 automated sequencer (Applied Biosystems, CA, USA). Subsequently, multiplex PCR amplicons of each individual were pooled to obtain equimolar concentrations of all amplicons. Library purification was performed with AMPureXP beads (Beckman Coulter, CA, USA). Amplicon-specific barcodes (Nextera XT; Illumina, CA, USA) were incorporated in a universal PCR step on the pooled libraries. Barcoded samples were subjected to bridge amplification and bead purification prior to sequencing. Sequencing was performed on the Illumina MiSeq platform, using the Illumina reagent kit v2, generating 2 × 250 bp paired-end reads. Trimming of Illumina adapters from raw sequencing Fastq files was performed by Fastq-mcf. Read alignment and mapping was done against whole reference genome hg19 using the Burrows-Wheeler Aligner [26]. Variant calling was performed using GATK version 2.2 [28], and variants were annotated using the Genomecomb software pipeline [40].
Variants with a read depth below 20 reads or with an imbalanced reference/variant allele read depth exceeding 3:1 were considered false calls. All remaining variants with predicted effect on protein sequence were included in subsequent manual read inspection using the Integrative Genomics Viewer software [41]. In total, 92 % of the SORL1 target sequence was sequenced at >20× read depth for all included individuals.
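The two call-quality criteria above can be illustrated with a minimal sketch. It assumes that read depth is the sum of reference and variant reads at the position and that the 3:1 criterion is applied as a reference-to-variant read ratio; the `VariantCall` container and its field names are hypothetical and are not part of the GATK or Genomecomb output formats.

```python
from dataclasses import dataclass

MIN_DEPTH = 20        # calls below this total read depth are treated as false
MAX_REF_TO_ALT = 3    # reference/variant read imbalance above 3:1 is treated as false


@dataclass
class VariantCall:
    """Hypothetical container for the read counts supporting one variant call."""
    ref_reads: int
    alt_reads: int


def passes_filters(call: VariantCall) -> bool:
    """Return True when the call satisfies both thresholds described in the text."""
    depth = call.ref_reads + call.alt_reads  # assumption: depth = ref + variant reads
    if depth < MIN_DEPTH:
        return False
    # Multiplication avoids division by zero when no variant reads are present.
    if call.ref_reads > MAX_REF_TO_ALT * call.alt_reads:
        return False
    return True
```

Under these assumptions a call at exactly 3:1 is retained, because the text excludes only ratios exceeding 3:1. Calls that pass would still go through manual read inspection.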
Due to high GC content (74 %), SORL1 exon 1 was sequenced using simplex PCR amplification followed by Sanger sequencing using the BigDye termination cycle sequencing kit v3.1 on the ABI 3730 DNA Analyzer. Sequences were analyzed using Seqman (DNAstar, WI, USA) and NovoSNP software [51]. Rare variant validation was performed on genomic DNA by Sanger sequencing, as was segregation analysis of variant p.Tyr1816Cys. Variant position on genomic level was based on Genbank accession number NC_000011.9, transcript position was based on NM_003105.5, and protein-level position on NP_003096.1. Effects on protein stability were modeled for missense variants located within the Vps10p domain using the method of [47], implemented within the YASARA molecular graphics suite (http://www.yasara.org/).

RNA sequencing
RNA sequencing data were generated for the p.Gly447Argfs*22 frameshift variant carrier. Total RNA was isolated from Epstein-Barr virus immortalized lymphoblast cells derived from whole blood lymphocytes. RNA isolation was performed using 1.0 × 10^7 lymphoblast cells with the RNeasy mini kit (Qiagen Inc., Valencia, CA, USA) according to manufacturer's protocol. Depletion of genomic DNA from the RNA sample was performed by turboDNase treatment (Life Technologies, Carlsbad, CA, USA). RNA quality control to determine RNA concentration and RIN value was performed using the Agilent Technologies 2100 Bioanalyzer. RIN value was measured at 9.3 at a concentration of 86 ng/µl total RNA. The sequencing library was constructed using the TruSeq stranded mRNA Library Prep Reagent Set (Illumina, San Diego, CA, USA). Library preparation was performed using 1 mg total RNA and included poly-A selected RNA extraction, RNA fragmentation, and random-hexamer-primed reverse transcription. Sequencing of prepared libraries was performed using an Illumina HiSeq 2000 sequencer, generating 126,949,218 101-nucleotide paired-end sequence reads. Data analysis was performed using an in-house developed processing pipeline. Removal of read adapters and trimming of read ends was performed using Trimmomatic [3]. Trimmed reads were mapped against the UCSC human reference genome hg19 [43] using the Bowtie short read aligner integrated in Tophat2 [21]. Post-alignment QC and filtering of mapped reads was performed with RSeQC [49]. Variant calling was performed by employing GATK [28], VARSCAN [23] and VEP [31] software.

Nonsense-mediated mRNA decay
Nonsense-mediated mRNA decay (NMD) was investigated for p.Gly447Argfs*22. NMD was inhibited in Epstein-Barr virus immortalized lymphoblast cell lines derived from the p.Gly447Argfs*22 carrier and two non-carrier controls with 150 μg/mL cycloheximide (CHX; Sigma, St Louis, MO, USA) at 37 °C for 4 h, as previously described [10]. After incubation, RNA was isolated using the RNeasy mini kit. Depletion of genomic DNA from the RNA sample was performed by turboDNase treatment. Subsequent cDNA synthesis was performed using the superscript III first-strand cDNA kit with oligo(dT) and random hexamer primers (Life Technologies, Carlsbad, CA, USA). Real-time quantitative PCR was performed to investigate the effect of p.Gly447Argfs*22 on SORL1 expression using SYBR Green technology (Life Technologies, Carlsbad, CA, USA). SORL1 expression levels were measured in triplicate, with three measurements per experiment in two separate experiments. Expression of SORL1 in untreated lymphoblast cells was quantified and analyzed with qBase-Plus (Biogazelle, Ghent, Belgium). Effect of CHX incubation on SORL1 expression in the p.Gly447Argfs*22 carrier and two non-carrier controls was quantified using the 2^−ΔΔCT (Livak) method [27].
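As a worked illustration of the Livak calculation, the sketch below computes the fold change 2^−ΔΔCT of a target transcript in CHX-treated relative to untreated cells. It assumes normalization to a single reference (housekeeping) gene, which the text does not specify; the function name, parameter names, and CT values are hypothetical and are not data from this study.

```python
def livak_fold_change(
    ct_target_treated: float,
    ct_reference_treated: float,
    ct_target_untreated: float,
    ct_reference_untreated: float,
) -> float:
    """Relative expression of the target gene (treated vs. untreated) as 2^-ddCT."""
    # dCT: target CT normalized to the reference gene within each condition.
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_untreated = ct_target_untreated - ct_reference_untreated
    # ddCT: difference in normalized CT between treated and untreated cells.
    delta_delta_ct = delta_ct_treated - delta_ct_untreated
    return 2.0 ** (-delta_delta_ct)


# Illustrative values only: the target CT drops by one cycle after treatment
# while the reference gene is unchanged, which corresponds to a two-fold increase.
fold_change = livak_fold_change(24.0, 18.0, 25.0, 18.0)
```

A fold change above 1 after CHX treatment would be the pattern expected when NMD normally degrades part of the transcript pool.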

Statistical analysis
Low-frequency (MAF between 0.01 and 0.05) and common (MAF ≥ 0.05) SORL1 coding variants were tested for deviations from Hardy-Weinberg equilibrium using PLINK [38]. Allele frequencies of common and low-frequency variants in patients and controls were compared by χ² statistics. Odds ratios and 95 % confidence intervals were calculated by logistic regression modeling, corrected for gender and APOE ε4 allele carrier status using PLINK. Nominal p values were corrected for the number of variants tested using Bonferroni correction. SORL1 variants with MAF <0.01 were included in rare variant burden analysis for individuals originating from Spain, Italy, Portugal, Sweden, and Belgium. Individuals originating from Czech Republic (7 patients, 7 controls) and Germany (100 patients, 0 controls) were excluded from the analysis based on cohort size. Rare variant burden analysis was performed by collapsing alleles of all rare coding variants across the full SORL1 coding sequence or separately for each functional protein domain using an optimized sequence kernel association test (SKAT-O test), adjusted for sample size <2000. Rare variant association tests were performed using the R package seqMeta [9]. SKAT-O meta-analysis was performed using standard beta-weights, and correction for gender and APOE ε4 carrier status of included individuals. Presented SKAT-O meta p values represent minimal p values over ρ as proposed by Lee et al. [25]. Correction for multiple testing was performed using Šidák correction. Functional protein domains were determined according to Pottier et al. [37]. Differences in relative lymphoblast SORL1 expression were calculated using an unpaired nonparametric (Mann-Whitney) test.
Figure legend (fragment): [37], and based on UniProt information. Protein-level variant position was based on NP_003096. Vps10p, vacuolar protein sorting 10 domain; LDLR, low-density lipoprotein receptor domain; TM, transmembrane domain.
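The two multiple-testing corrections named in the statistical analysis can be compared with a short sketch. The formulas are the standard ones; the function names and the example inputs (a nominal p value of 0.01 across 5 tests) are illustrative assumptions and do not correspond to values from this study.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p value: p multiplied by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def sidak_adjust(p_value: float, n_tests: int) -> float:
    """Sidak-adjusted p value: 1 - (1 - p)^n, which assumes independent tests."""
    return 1.0 - (1.0 - p_value) ** n_tests


# Illustrative inputs only (not study data).
nominal_p = 0.01
number_of_tests = 5
p_bonferroni = bonferroni_adjust(nominal_p, number_of_tests)
p_sidak = sidak_adjust(nominal_p, number_of_tests)
```

For the same inputs the Šidák-adjusted value is never larger than the Bonferroni-adjusted value, so the Šidák correction is slightly less conservative.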
Blocking of nonsense-mediated decay by CHX treatment showed a significant increase of SORL1 expression in the p.Gly447Argfs*22 carrier compared to non-carrier control individuals (Mann-Whitney p value 0.03) (Fig. 2b).
In addition to these 8 PTC mutations, we observed 84 rare missense variants in the patient/control cohort. Of the total identified rare missense variants and PTC mutations, 44 (48 %) were only observed in the patient cohort (supplementary table 2). In addition, 19 rare missense variants (21 %) were present in both patients and controls, and 29 variants (32 %) were only observed in controls (supplementary tables 3, 4). One patient carried a frameshift (p.Cys1103Valfs*4) and a missense variant (p.Asp2065Val); three patients and two controls carried double missense variants. Of the rare variants observed in this study, 30 (33 %) were not previously reported in any of the screened databases, the majority of which [22 (73 %)]
The variant was also present in an affected sister, and not present in an unaffected sister (supplementary figure 1).
Additional clinical information was available for 6 PTC carriers. All presented with an insidious memory dysfunction. In one carrier (p.Cys752Serfs*21), disease onset was also accompanied by apathy. Further progression of disease in the carriers was typical of AD, with progression to a global cognitive deterioration and functional dependence. Of note, in patient DR12.1 (p.Gly447Argfs*22), the onset of visual hallucinations, a fluctuating extrapyramidal syndrome and a REM-sleep behavior disorder, after a disease duration of 11 years, led to the suspicion of a concomitant Lewy body pathology.
Neuropathological examination was not available for SORL1 PTC carriers, but has been performed in 3 SORL1 missense carriers [DR112.1 (p.Leu762Pro), CS540 (p.Ala1548Thr) and CS770 (p.Gly1447Ser)]. All three had high-level AD neuropathologic changes (A3B3C3) [34], confirming the clinical AD diagnosis. Neuronal loss, gliosis and abnormal protein deposition, mostly in the form of senile plaques and neurofibrillary tangles (Fig. 4), were most pronounced in the neocortical areas, amygdala, hippocampus and parahippocampal cortex, while the striatum, thalami, brainstem and cerebellum were more spared. A diffuse amyloid angiopathy, in DR112.1 most pronounced in the occipital cortex and cerebellum, was present in all three patients. Hippocampal sclerosis was absent. Isolated α-synuclein immunoreactive Lewy bodies and Lewy neurites were observed in the amygdala of CS770, but absent from CS540. No α-synuclein immunohistochemistry was performed in DR112.1.

Rare variant association analysis
The frequency of rare PTC mutations and missense variants in SORL1 was 8.8 % (111 carriers/1255 patients) in the  Table 2). The most significant enrichment of these variants in patients was found for the fibronectin type III protein domain (SKAT-O p value 0.01) (Supplementary table 8). The fibronectin type III domain is the largest protein domain of the SORL1 protein, spanning amino acids 1527-2108 [37,44]. The cumulative minor allele frequency was largest for this domain, yet variants were identified in each of the SORL1 functional protein domains (Fig. 1). When PTC variants were excluded from this analysis, findings remained the same, with association over the full protein (p value 0.0007), and strongest association for the fibronectin type III domain (p value 0.013).

Single variant analysis of low-frequency and common variants
We identified six common (MAF ≥ 0.05) variants in the SORL1 coding sequence, including one missense variant p.Ala528Thr, and five synonymous variants (p.His269His, p.Thr833Thr, p.Ser1187Ser, p.Asn1246Asn, and p.Ala1584Ala). In addition, we identified five low-frequency variants (MAF 0.01-0.05), including three missense variants and two silent variants. Low-frequency missense variant p.Glu270Lys was previously associated with AD in Caribbean-Hispanic familial late-onset AD patients and Northern-European sporadic late-onset AD patients with MAF below 0.01, and was shown to segregate within affected Caribbean-Hispanic families [48]. Although one synonymous variant showed nominal significance, none of the low-frequency and common variants showed significant association with EOAD after correction for multiple testing (supplementary table 6).
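The frequency classes used throughout the analysis (rare: MAF < 0.01; low-frequency: MAF 0.01-0.05; common: MAF ≥ 0.05) can be written as a small helper. This is a sketch only: the function name is hypothetical, and assigning a MAF of exactly 0.01 to the low-frequency class is an assumption, since the text does not state how that boundary value was handled.

```python
def classify_by_maf(maf: float) -> str:
    """Assign a variant to the frequency class used in the text, based on its MAF."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"           # included in the rare variant burden analysis
    if maf < 0.05:
        return "low-frequency"  # tested by single variant analysis
    return "common"             # tested by single variant analysis
```

For example, `classify_by_maf(0.003)` falls in the rare class, whereas `classify_by_maf(0.05)` is treated as common.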

Discussion
We performed a systematic screening of the complete coding sequence of SORL1 in a large EOAD patient/control cohort in the frame of the BELNEU and EU EOD consortia. We found an increased burden of rare PTC and non-synonymous variants in the EOAD patients, of whom 8.8 % carried one or more SORL1 variants. These independent findings corroborated previous reports of an increased frequency of rare SORL1 variants in EOAD [35,37]. Strikingly, PTC mutations were exclusively observed in patients. These variants most likely lead to a significant loss of SORL1 protein due to NMD of the mutant transcript. Indeed, we observed reduced SORL1 expression in lymphoblast cells of the p.Gly447Argfs*22 carrier, which increased upon blocking of NMD, indicative of haploinsufficiency. Further, the mode of action of these predicted loss-of-function mutations is in line with the observation of reduced SORL1 expression in post-mortem brain [6] and in human neuronal stable cell lines [1] leading , suggesting a less aggressive disease process. Because our study is limited to EOAD, the upper limit of onset age reported here is determined by the clinical criteria of EOAD, but this does not exclude a role for rare SORL1 variants in LOAD. In fact, rare SORL1 variants have previously been associated with familial LOAD by Vardarajan et al. [48]. In contrast to PTC mutations, the frequency of rare missense variants in healthy controls was non-negligible (5.6 %), and included a substantial proportion (69 %) of predicted pathogenic missense variants. This can in part be explained by a lower penetrance of SORL1 missense variants compared to PTC variants, and/or a variable degree of pathogenic relevance of the identified missense variants for AD. Pathogenicity may differ depending on parameters like the nature of the amino acid substitution, or location of the mutation in specific protein domains, at methylation sites or within adapter protein binding motifs.
This necessitates functional follow-up to investigate effects of SORL1 variants, e.g., on APP trafficking, amyloid-β formation and clearance, to define functional relevance of each rare missense variant. Whereas others have reported that missense variants in SORL1 may lead to autosomal dominant AD, the relatively high frequency of predicted pathogenic variants in healthy controls in our study indicates that in the absence of functional evidence of pathogenicity, the observation of a SORL1 missense variant should be interpreted with caution. This caveat notwithstanding, meta-analysis showed a significant enrichment of rare missense variants in patients, which remained significant after exclusion of PTC mutations. This adds to the growing evidence that SORL1 missense variants may play a role in AD susceptibility. Of note, we obtained evidence of rare variant association in this hypothesis-driven, single gene resequencing study, but in the context of a whole exome sequencing study, this finding would not have survived multiple testing correction, illustrating the need for large sample sizes in hypothesis-free rare variant studies.
We observed missense variants in SORL1 throughout the different protein coding domains from Vps10p to the FANSHY motif, only sparing the propeptide (Fig. 1). One of the missense variants, exclusively found in the patient cohort, p.Gly511Arg, had been detected in two affected relatives of a French autosomal dominant EOAD family [37]. This missense variant was shown to disrupt APP sorting from the trans-Golgi network to the lysosomal degradation pathway through abolished interaction of SORL1 with amyloid-β [7]. We observed p.Gly511Arg in a sporadic patient from Italy with an age at onset of 55 years. We could not perform segregation analysis for this variant due to absence of DNA of relatives. For missense variant p.Tyr1816Cys, located in the fibronectin type III domain, and detected in a patient from Italy with an age at onset of 63 years and a reported familial history of AD, we demonstrated the presence of the variant in an affected relative and its absence in an unaffected relative. The elucidation of the crystal structure of the Vps10p and β-propeller domains suggested that amyloid-β monomers are bound by the SORL1 Vps10p domain through beta-sheet interaction, binding amyloid-β inside a tunnel structure formed by a 10-bladed beta-sheet propeller [22]. Rare variants located in the Vps10p domain, such as p.Gly511Arg, putatively affect SORL1-amyloid-β interaction by destabilization of the SORL1 beta-sheet structure or disruption of the amyloid-β binding motif. Interestingly, patient-only missense variants affecting the Vps10p domain showed the strongest Gibbs free energy changes, indicating the strongest effects on SORL1 protein stability.
An alternative functional consequence of rare coding variants involves disruption of the anti-amyloidogenic APP trafficking pathway mediated by SORL1. A binding region for APP at the SORL1 protein is located at the cytoplasmic tail of the protein, where a six amino acid-stretching FANSHY motif is involved in binding the retromer adapter complex. We identified one patient-only missense variant, p.Asn2174Ser, in an Italian sporadic patient with onset age 57 years, altering the third amino acid in the FANSHY sequence from asparagine to serine. The retromer complex functions as the seed of direct interaction of SORL1 with APP. Site-directed mutagenesis disrupting the FANSHY motif in vitro has been shown to be amyloidogenic by ablation of the sequestering of APP by SORL1 to the trans-Golgi network [13,14].
In contrast with previous investigations of SORL1 coding variants in late-onset AD cohorts, we could not identify a significant association of common and low-frequency variants (MAF > 0.01) with disease status. Initial association of variants p.Ala528Thr and p.Glu270Lys was reported for familial late-onset cases of Caribbean-Hispanic origin [48]. Discrepancies between variant frequency and direction of effect could be due to cohort ethnicity and founder effects. Absence of significant association for these variants in our EOAD patient/control cohort analysis might also reflect reduced pathogenic relevance of these variants in EOAD compared to late-onset AD.
In conclusion, the study we performed represents one of the largest systematic screenings of SORL1 in EOAD patients and control persons. PTC variants were identified exclusively in patients, and their mode of action corresponds with evidence on the inverse relation between SORL1 expression and amyloid-β formation from in vitro functional studies of SORL1 in AD. The increased proportion of familial disease among PTC variant carriers is indicative of a strong effect on AD pathogenesis. Rare missense variants were associated with increased risk of early-onset AD. Some of these rare missense variants may also exert a strong effect on individual and familial risk of AD. The substantial frequency of (predicted pathogenic) variants in healthy controls, however, necessitates further research on the functional impact of the identified rare SORL1 variants to elucidate the affected pathways.

Compliance with ethical standards
All participants and/or their legal guardians gave written informed consent for participation in clinical and genetic studies. Autopsied patients or their legal guardians gave written informed consent for inclusion in neuropathological studies. The clinical study protocol and the informed consent forms for patient ascertainment were approved by the ethics committees of the respective hospitals at the cohort sampling sites. The genetic study protocols and informed consent forms were approved by the Ethics Committees of the University of Antwerp and the University Hospital of Antwerp, Belgium.