Background

Parkinson’s disease (PD) is a multifaceted and highly complex neurodegenerative disorder. Multiple genes have been implicated in PD. Variants in GBA1 (Glucocerebrosidase A) are considered a common genetic risk factor for PD [1,2,3,4,5,6]. Biallelic (homozygous or compound heterozygous) variants in GBA1 classically cause Gaucher’s disease (GD), and an increased PD risk has been observed in patients with GD and in asymptomatic carriers of heterozygous variants [7,8,9,10]. Glucocerebrosidase enzymatic activity is reduced in patients with PD who carry a GBA1 heterozygous variant compared to non-carriers, and it is even lower in GBA1 homozygotes/compound heterozygotes [10]. Common GBA1 variants in PD include p.E365K (NM_000157.4, c.1093G>A), p.T408M (NM_000157.4, c.1223C>T), p.N409S (NM_000157.4, c.1226A>G), and p.L483P (NM_000157.4, c.1448T>C). However, the classification of the pathogenicity of GBA1 variants and their effect on PD is still ongoing. For this reason, the reported frequencies of GBA1 variants across studies are rather inconsistent, ranging from 1.8% up to 47% depending on the ethnicity of the samples and the GBA1 variants investigated [2, 5, 11]. In a previous study on the Norwegian population, 311 patients with PD were screened for the two common GBA1 variants (i.e., p.N409S and p.L483P) [12]. Seven patients (2.3%) carried a heterozygous GBA1 variant: four had a p.N409S (1.3%) and three had a p.L483P (1.0%) substitution.

Another challenge that arises when sequencing the GBA1 gene is the nearby pseudogene GBAP1. GBAP1 shares 96% exonic sequence homology with the GBA1 coding region, with the highest homology between exons 8 and 11. Most pathogenic variants have been reported in this region, usually resulting from recombination events, e.g., gene conversion, fusion, or duplication [13]. This complex regional genomic structure complicates PCR and DNA sequencing. One way to analyze GBA1 while avoiding the pseudogene is long-read sequencing [14]. This technology provides full-length GBA1 sequencing to detect exonic and intronic variants and recombinant alleles in combination with phase information, at high multiplex capacity [15, 16].

Herein, we have comprehensively characterized GBA1 in a sample from the Norwegian population by (1) employing and evaluating Oxford Nanopore sequencing as a strategy, (2) determining the frequency of variants within the GBA1 gene in patients with PD and healthy controls by providing an update on previous reports [12, 17], and (3) reviewing current literature on newly identified variants that add to pathogenicity determination.

Results

Long-read sequencing of GBA1

After Nanopore sequencing of the long-range PCR products, we obtained a mean read length of 5.2 kb (SD = ± 2.1 kb) and a mean read quality Phred score of 14.2 (SD = ± 0.4) across all raw sequencing data. After length and quality filtering and read trimming, we obtained a mean read length of 9.0 kb (SD = ± 0.2 kb) and a mean read quality Phred score of 15.7 (SD = ± 0.5). For the filtered samples the mean coverage was 193.1X (SD = ± 187.5X), ranging between 20.6X and 1820.8X across all samples. Nevertheless, the coverage per sample was consistent over all positions (Supplementary Fig. 1).

Bioinformatic pipeline comparison

In total, 79 rare GBA1 variants (gnomAD frequency < 2%) were detected in the Nanopore sequencing analysis after filtering. Out of the 79 rare GBA1 variants, 18 variants were categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance” (Supplementary Table 1). The remaining 61 rare GBA1 variants were categorized as “benign” or “likely benign”. The number of GBA1 variants differed across all six analysis pipelines (Table 1).

Table 1 GBA1 variants sequenced with Oxford Nanopore and analyzed with six different pipelines using NGMLR and Minimap2 aligners and BCFtools, Clair3, and Pepper-Margin-Deepvariant callers

BCFtools

With BCFtools, 64 rare annotated GBA1 variants were detected in 313 samples, resulting in 433 calls in total. The calls were identical with both aligners (i.e., NGMLR and Minimap2). Of these, 15 rare variants, categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”, were found in 111 samples (i.e., 120 calls), again independent of the aligner. After Sanger sequencing, 13 variants in 110 samples (i.e., 115 calls) were validated, indicating five false-positive calls but no false-negative calls.

Clair3

For the pipeline using Clair3 as a variant caller after a preceding alignment with NGMLR, 65 rare annotated GBA1 variants were detected in 308 samples, resulting in 426 calls. Of these calls, 117 referred to 14 rare variants, categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”, and were detected in 108 samples. After Sanger sequencing, 112 calls in 107 samples were validated, implying that five calls were false-positive, with an additional three calls being false-negative. When Minimap2 was used as an aligner, 65 rare annotated GBA1 variants were again detected, but in 311 samples (i.e., 429 calls). With this pipeline, 15 rare annotated GBA1 variants, categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”, were found in 110 samples (i.e., 119 calls). Of these, 113 calls in 108 samples could be validated with Sanger sequencing, indicating six false-positive calls and an additional two false-negative calls.

Pepper-Margin-Deepvariant

Lastly, the Pepper-Margin-Deepvariant pipeline was used for variant calling. After preceding alignment with NGMLR, 453 calls of 72 rare annotated GBA1 variants were detected in 318 samples, including 123 calls of 17 rare variants, categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”, in 114 samples. Only 108 calls in 103 samples were validated with Sanger sequencing, leading to 15 false-positive and an additional seven false-negative calls. Similarly, when Minimap2 was used for the alignment, 76 rare annotated GBA1 variants were detected in 331 samples, resulting in 481 calls. Of these, 18 rare variants, categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”, were detected in 120 samples (i.e., 130 calls). Here, 109 calls in 104 samples were Sanger validated, indicating 21 false-positive and six false-negative calls.

GBA1 variant frequencies in the Norwegian population

In total, 13/18 rare distinct variants within GBA1 were validated in 462 Norwegians with PD and 367 healthy controls (Fig. 1, Supplementary Table 2), whereas 5/18 could not be validated. Two of the 13 rare GBA1 variants were predicted to be “pathogenic” or “likely pathogenic” (p.L483P, p.S146X) and eleven GBA1 variants were of “uncertain significance” (p.G493D, p.N409S, p.T408M, p.A380T, p.R368C, p.E365K, p.D337G, p.S310G, p.R301H, p.R159W, p.R78C). The total carrier frequency of rare GBA1 variants predicted as “pathogenic”, “likely pathogenic”, or of “uncertain significance” was 17.1% (79/462) in the PD cases and 8.4% (31/367) in the controls (OR = 2.24 [1.44, 3.47]) (Table 2). The carrier frequency of known GBA1 risk variants for PD, including p.L483P, p.N409S, p.T408M, p.E365K, and p.R159W, was 15.2% in the PD cases and 7.9% in the controls (OR = 2.08 [1.32, 3.29]) (Table 2). With regard to the common and frequently investigated GBA1 risk variants for PD, the frequency of carrying either a p.L483P or p.N409S variant was 4.3% in patients with PD and 1.1% in healthy controls (OR = 4.11 [1.39, 12.12]) (Table 2).

Fig. 1 Schematic representation of the exonic (gray boxes) and intronic (gray lines) structure of the GBA1 gene highlighting “pathogenic” (red)/“likely pathogenic” (orange)/“uncertain” (yellow) variants found in our Norwegian series

Table 2 Number and frequency of patients and controls with GBA1 variants in our Norwegian series

In addition, samples with a rare “pathogenic”, “likely pathogenic”, or “uncertain” GBA1 variant were further examined for possible structural variants (SVs). However, all samples tested negative for SVs.

Literature review

We performed a systematic literature review to summarize GBA1 variants detected in PD and their frequencies across different populations. In total, 100 articles on GBA1 variant frequencies across populations were included in the overview (Supplementary Table 3). Most of the studies assessed the p.L483P and p.N409S variants, with p.L483P variant frequencies ranging between 0% and 8.3% for cases with PD and between 0% and 1.4% in the general population. The highest frequency in PD patients was observed in a sample from the Japanese population [18]. For p.N409S, the frequencies ranged from 0% to 26.3% in patients with PD and from 0% to 5.96% in controls, with the highest frequency reported in a sample from the Ashkenazi Jewish population [19]. In a previous study on the Norwegian population, the frequency of p.L483P was reported to be 0.5% in patients with PD (2/442) and 1.4% in controls (6/419) (OR = 0.31 [0.06, 1.56]) [17]. For the p.N409S variant, the reported frequency was 0.2% in patients with PD (1/442), comparable to that in controls (1/419) (OR = 0.95 [0.06, 15.2]) [17]. In addition, the study reported GBA1 p.E365K, detected in 4.3% of patients with PD (18/442) and in 6.6% of controls (29/419) (OR = 0.57 [0.13, 1.04]), and p.T408M, found in 1.7% of patients with PD (7/442) and in 3.6% of controls (16/419) (OR = 0.41 [0.17, 1]).

Discussion

Variants in the GBA1 gene are known to affect PD risk; however, the frequency and pathogenicity of these variants are still under debate. The latter is further complicated as the frequencies of GBA1 variants vary by population. In our study, we used the pathogenicity scoring of ACMG, Varsome, ClinVar, SIFT, Polyphen2, CADD, and GERP++ and categorized the variants found into “pathogenic”, “likely pathogenic”, and variants of “uncertain significance”. Here we report 13 rare variants within GBA1 in Norwegian PD cases, with two of them predicted to be “pathogenic” or “likely pathogenic”, and eleven GBA1 variants of “uncertain significance”. In total, we found a “pathogenic” or “likely pathogenic” variant (i.e., p.L483P or p.S146X) in 1.5% of PD cases and in 0% of healthy controls. However, pathogenicity scoring with ACMG does not take the association with PD into account and consequently underestimates the frequency of risk variants in GBA1. Therefore, we further investigated known GBA1 risk variants associated with PD. In our study, 15.2% of patients with PD carried a common GBA1 risk variant (i.e., p.L483P, p.N409S, p.T408M, p.E365K, and p.R159W), compared to 7.9% of controls.

In a systematic literature review, we evaluated the frequency of variants in the GBA1 gene in patients with PD and in the general population in 100 studies (Supplementary Table 3). The frequencies of GBA1 risk variants range between 0% and 26.3% in patients with PD, highlighting the importance of investigating GBA1 variants across populations. In a previous study on the Norwegian population, two known risk variants, p.L483P and p.N409S, were investigated among others [17]. The frequency of p.L483P was reported to be 0.5% in patients with PD and 1.4% in controls, while 0.2% of the patients with PD and controls had a p.N409S variant [17]. In that study, the odds of carrying a p.L483P or a p.N409S variant were therefore lower in patients with PD than in healthy controls (OR = 0.31 [0.06, 1.56] and OR = 0.95 [0.06, 15.2], respectively). In contrast, we found slightly higher frequencies in our Norwegian series. The p.L483P variant was found in 1.3% of Norwegian PD cases and in 0% of controls, and the p.N409S variant in 2.8% of PD patients and 1.1% of controls, leading to higher odds of carrying these variants in patients with PD compared to controls (OR = 2.63 [0.85, 8.13]).

These variants are classically found in GD but are also associated with PD risk. However, some variants in GBA1 show associations with PD but do not cause GD, e.g., the GBA1 variants p.E365K and p.T408M [6]. In our Norwegian series, the p.E365K variant was found in 7.1% of cases with PD and in 2.7% of the general population (OR = 2.75 [1.33, 5.65]). The p.T408M variant was found in 2.2% of cases with PD and in 3.5% of the general population (OR = 0.6 [0.26, 1.39]). A previous study on the Norwegian population reported p.E365K in 4.3% of patients with PD and in 6.6% of controls (OR = 0.57 [0.13, 1.04]), and p.T408M in 1.7% of patients with PD and in 3.6% of controls (OR = 0.41 [0.17, 1]) [17]. Several other studies have also reported higher variant frequencies of p.E365K or p.T408M in their control population compared to their patients with PD [10, 20,21,22,23,24,25,26,27], questioning the pathogenicity of these GBA1 variants and their contribution to causing PD that was previously assessed in two meta-analyses. In the summary of our systematic literature review (Table 3), we have also found on average higher frequencies of these two GBA1 variants in patients with PD compared to healthy controls. Nevertheless, a better definition of variants associated with PD is strongly needed by classifying variants into categories relevant to the disease.

Table 3 Summary of frequencies in patients with PD and controls in publications on GBA1 variants

Although the pathogenicity and the pathomechanism of several GBA1 variants in patients with PD are still under discussion, gene-targeted therapy might help to treat GBA1-PD. Several strategies for the treatment of GBA1-PD are currently in clinical trials. Therapeutic approaches for GBA1 include substrate reduction therapy targeting glucosylceramide synthase inhibition, the inhibition of glucocerebrosidase transportation, the development of glucocerebrosidase activators, and gene therapy targeting the replacement of mutated GBA1 with wild-type (WT) copies of the gene [28].

In addition to variable inclusion criteria, the sequencing method, the data analysis, and the time at which they were performed have a major influence on the detection rate and the GBA1 variant frequencies reported. Over the years, DNA sequencing technologies have evolved tremendously, with new sequencing techniques and especially new prediction tools improving the accuracy of variant detection. Continual refinement of these tools enables more comprehensive identification of variants and highlights the need to re-evaluate known genes as time goes by. Long-read sequencing in combination with the latest data analysis tools enabled us to determine the frequency of GBA1 variants in the Norwegian population with higher precision than before. We evaluated the consensus and accuracy of six different pipelines using two different aligners (NGMLR and Minimap2), as well as three different variant callers (BCFtools, Clair3, and the Pepper-Margin-Deepvariant pipeline). BCFtools performed best with regard to the number of true-positive, false-positive, and false-negative hits, independently of the aligner used. However, one limitation is that we could not fully evaluate pipeline sensitivity. As we only used Sanger sequencing to confirm variants called by our Nanopore analysis pipelines, we underestimate false-negative variants that were not initially called. With Oxford Nanopore technology we assessed the precision of long-read sequencing and consensus data analysis and detected 115 real GBA1 variant calls, while five variant calls were false-positives. Thus, >95% of called variants were true-positive. Nanopore long-read sequencing is an accurate tool to detect genetic variations, and with further development in flow cells and sequencing kits, the accuracy of variant detection is likely to increase. Another advantage of this technology is the capacity to multiplex samples, which decreases the cost of analyzing the full 8.9 kb GBA1 gene to USD 13 per sample, lower than with other DNA sequencing methods [29]. A further strength of Oxford Nanopore long-read sequencing is that it specifically targets the GBA1 gene without sequencing the pseudogene GBAP1 and can detect all disease-causing variants, including information on phase [15, 16].

Conclusions

In conclusion, we have demonstrated that Oxford Nanopore sequencing is an efficient and scalable tool for investigating GBA1 variants. We thoroughly evaluated six different data analysis pipelines and found the pipeline consisting of an alignment with either NGMLR or Minimap2 and variant calling with BCFtools to perform best with regard to detecting variants in GBA1. With our established and validated workflow, we demonstrated that the frequency of the two common GBA1 risk variants for PD (i.e., p.L483P and p.N409S) in Norwegian patients with PD is 4.3%, higher than in the general population (1.1%). Furthermore, we reviewed current literature on GBA1 variant frequencies in PD across populations, thereby adding to pathogenicity determination. Given the importance of this gene, further functional studies on the pathogenicity of GBA1 variants are needed to assess their effect on PD.

Methods

Demographics

A sample of 462 Norwegian patients with PD was included in this study (Table 4). All patients were referred by general practitioners and other hospitals and were clinically examined and observed longitudinally at the outpatient clinics of three hospitals in Central Norway. One hundred eighty (39%) of the patients were men, and 282 (61%) were women. The mean age at disease onset in the patient group was 60.3 years (SD = ± 9.7 years, range 26 to 88 years). Forty-three out of 462 patients were probands with a family history of PD. Patients with a known genetic cause of PD were not included. In the clinical assessment, patients with PD had an average score of 2.72 (SD = ± 0.86) on the Hoehn and Yahr scale; 357 patients reported a tremor, while 62 had no tremor. In addition, a group of 367 healthy Norwegian individuals (mean age 64.0 years) originating from the same geographic region and without signs of a movement disorder was included to determine the variant frequency in the general population (Table 4). Some of the patients and controls included in this study were also included in a previous report [12]. However, in that report only the p.N409S and p.L483P variants were screened, by PCR amplification and subsequent digestion of the PCR product with restriction enzymes and separation of the resulting fragments by agarose gel electrophoresis.

Table 4 Demographics of the Norwegian patients with PD and healthy controls

Genetic analysis

Long-read Oxford Nanopore sequencing

We used blood-derived genomic DNA samples from all PD patients and controls. Informed consent was obtained from all participating individuals. We enriched for GBA1 by amplifying an 8.9 kb sequence, which covered all coding exons, the introns between them, and part of the 3’ UTR (hg38: chr1:155,232,501–155,241,415), using the LongAmp Taq PCR Kit as described previously [15, 16]. Subsequently, 1.3 µg of each PCR product was barcoded with the Native 96 Barcoding Kit (EXP-NBD196) and multiplexed. The libraries were generated with the Ligation Sequencing Kit (SQK-LSK109) for long-read Nanopore sequencing on R9.4.1 flow cells (FLO-MIN106) on a GridION.

Bioinformatic analyses

Data acquisition and run monitoring were carried out with MinKNOW (v21.05.25 and later). The integrated Guppy algorithm (v5.0.16 and later) was used for base-calling with the super-accurate base-calling model, de-multiplexing, and FAST5 and FASTQ file generation. The base-called reads were filtered with Filtlong (v0.2.0) (https://github.com/rrwick/Filtlong) to only include the best 50% of the reads, based on Phred quality scores (q-score) in the FASTQ files, with a minimum read length of 8 kb. Afterwards, the reads were trimmed with NanoFilt [30] (v2.8.0): 75 bp were cropped from the front of the reads and 20 bp from the end. Subsequently, the Nanopore reads were aligned against the reference sequence (hg38). We used two different aligners: NGMLR [31] (v0.2.7) and Minimap2 [32] (v2.22). Then, the alignments were sorted and indexed with SAMtools [33], using v1.9 for the NGMLR alignment and v1.15 for the Minimap2 alignment. In addition, the coverage for each sample was calculated using SAMtools [33] (v1.15). The processed BAM files were analyzed with three different variant callers: BCFtools [33] (v1.9), Clair3 (v0.1-r11), and the Pepper-Margin-Deepvariant pipeline [34] (Supplementary Fig. 2). Finally, the resulting VCF files containing the SNPs within GBA1 were annotated using ANNOVAR [35] (version 2020-06-11).
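For orientation, the filtering, trimming, alignment, and calling steps described above can be sketched as a chain of command lines for one of the six pipelines (Minimap2 + BCFtools). This is a minimal sketch, not the authors' actual script: the file names, the reference path, the Minimap2 preset (map-ont), and the BCFtools options are assumptions, and only the 8 kb minimum length, the 50% read retention, and the 75 bp/20 bp crops come from the text. Note that Filtlong ranks reads by its own quality score, so the retention step here is an approximation of the described q-score filtering. The helper only builds the command strings and executes nothing.

```python
import shlex


def build_commands(sample: str, reference: str = "hg38.fa") -> list[str]:
    """Return shell command strings for one barcoded sample (hypothetical file names)."""
    raw = shlex.quote(f"{sample}.fastq.gz")
    trimmed = shlex.quote(f"{sample}.trimmed.fastq")
    bam = shlex.quote(f"{sample}.minimap2.sorted.bam")
    vcf = shlex.quote(f"{sample}.bcftools.vcf.gz")
    ref = shlex.quote(reference)
    return [
        # Keep reads of at least 8 kb and the best 50%, then crop 75 bp (front) and 20 bp (end).
        f"filtlong --min_length 8000 --keep_percent 50 {raw} "
        f"| NanoFilt --headcrop 75 --tailcrop 20 > {trimmed}",
        # Align with the Nanopore preset (assumed), then coordinate-sort the alignments.
        f"minimap2 -ax map-ont {ref} {trimmed} | samtools sort -o {bam} -",
        # Index the sorted BAM file.
        f"samtools index {bam}",
        # Pile up reads and call variants; caller options are assumptions.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


if __name__ == "__main__":
    for command in build_commands("barcode01"):
        print(command)
```

Swapping the alignment command for NGMLR, or the calling command for Clair3 or Pepper-Margin-Deepvariant, would give the other five combinations; their exact invocations are not stated in the text and are therefore not sketched here.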

Sanger sequencing and structural variant detection

Sanger sequencing was performed for all individuals with a rare “pathogenic”/“likely pathogenic”/“uncertain” GBA1 variant, as previously described [36]. Individuals with a rare “pathogenic”/“likely pathogenic”/“uncertain” GBA1 variant were further examined for possible structural variants (SVs) due to the high exonic sequence homology between GBA1 and GBAP1. We used two sets of primers, as previously described [16], to detect reciprocal crossovers between the gene and pseudogene resulting in a 20.6 kb deletion or a 20.6 kb duplication using the LongAmp Taq PCR Kit. The resulting PCR products were subsequently run in a 1.5% agarose gel.

Sensitivity assessment

To assess the performance of variant calling, we stratified the detected GBA1 variants into true-positive, false-positive, and false-negative calls for each data analysis pipeline (NGMLR + BCFtools, NGMLR + Clair3, NGMLR + Pepper-Margin-Deepvariant, Minimap2 + BCFtools, Minimap2 + Clair3, Minimap2 + Pepper-Margin-Deepvariant). True-positive variants were defined as GBA1 variants detected with Nanopore sequencing that were validated with Sanger sequencing. False-positive variants were defined as those identified with Nanopore sequencing but not confirmed with Sanger sequencing. False-negative variants were defined as those not called by a given data analysis pipeline but later validated with Sanger sequencing. Evaluation of each data analysis pipeline was based on the ratio of false-positive to false-negative calls.
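The stratification above can be illustrated with the call counts reported in the Results for variants categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance”. The sketch below is illustrative only: the class and property names are hypothetical, ranking by the total number of erroneous calls is an assumption rather than the authors' stated criterion, and the recall value is relative to Sanger-validated calls only, because variants missed by all pipelines were never tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineCounts:
    name: str
    tp: int  # Nanopore calls confirmed by Sanger sequencing
    fp: int  # Nanopore calls not confirmed by Sanger sequencing
    fn: int  # Sanger-validated calls missed by this pipeline

    @property
    def precision(self) -> float:
        # Fraction of this pipeline's calls that were confirmed.
        return self.tp / (self.tp + self.fp)

    @property
    def recall_vs_validated(self) -> float:
        # Fraction of Sanger-validated calls that this pipeline found.
        return self.tp / (self.tp + self.fn)


# Counts taken from the Results section (true-positive, false-positive, false-negative calls).
PIPELINES = [
    PipelineCounts("NGMLR + BCFtools", 115, 5, 0),
    PipelineCounts("Minimap2 + BCFtools", 115, 5, 0),
    PipelineCounts("NGMLR + Clair3", 112, 5, 3),
    PipelineCounts("Minimap2 + Clair3", 113, 6, 2),
    PipelineCounts("NGMLR + Pepper-Margin-Deepvariant", 108, 15, 7),
    PipelineCounts("Minimap2 + Pepper-Margin-Deepvariant", 109, 21, 6),
]

# Assumed ranking criterion: fewest erroneous calls (false-positive plus false-negative) first.
ranked = sorted(PIPELINES, key=lambda p: p.fp + p.fn)

if __name__ == "__main__":
    for p in ranked:
        print(f"{p.name}: precision={p.precision:.3f}, "
              f"recall_vs_validated={p.recall_vs_validated:.3f}")
```

Under these counts, both BCFtools pipelines have 115 of 120 calls confirmed, which corresponds to the >95% true-positive rate quoted in the Discussion.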

Pathogenicity scoring

GBA1 single nucleotide variants were first filtered based on gnomAD frequency < 2%. Pathogenicity classification and scoring were assessed with American College of Medical Genetics and Genomics (ACMG) [37] criteria. This included using Varsome [38], ClinVar [39], SIFT [40], Polyphen2 [41], CADD [42], and GERP++ [43]. GBA1 variants were categorized as “pathogenic”, “likely pathogenic”, or of “uncertain significance” and further validated by Sanger sequencing. The p.E365K variant, which is a known risk factor for PD, was additionally categorized as of “uncertain significance”, despite the “likely benign” score from ACMG. Variants categorized as of “uncertain significance” without information from mutation predictors were not Sanger sequenced.

Statistical analysis

To compare the frequency of GBA1 variants in Norwegian patients with PD to that in controls, odds ratios (ORs) were calculated. The OR was computed as the number of patients with PD carrying a GBA1 variant multiplied by the number of controls without a GBA1 variant, divided by the product of the number of patients with PD without a GBA1 variant and the number of controls carrying a GBA1 variant.
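As a worked example, the odds ratio described above can be computed from the carrier counts given in the Results (79/462 patients with PD and 31/367 controls). The text does not state how the intervals in brackets were derived; the sketch below assumes a 95% Wald interval on the log odds ratio, which for this comparison reproduces the reported OR = 2.24 [1.44, 3.47]. The function name and signature are hypothetical.

```python
import math


def odds_ratio(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Return (OR, lower, upper) using a Wald interval (assumed method)."""
    a = case_carriers                      # patients with PD carrying a variant
    b = case_total - case_carriers         # patients with PD without a variant
    c = control_carriers                   # controls carrying a variant
    d = control_total - control_carriers   # controls without a variant
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts of the 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


if __name__ == "__main__":
    or_, lower, upper = odds_ratio(79, 462, 31, 367)
    print(f"OR = {or_:.2f} [{lower:.2f}, {upper:.2f}]")
```

This form is undefined when any cell of the 2x2 table is zero (for example, a variant absent from controls); how such cases were handled is not described in the text.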

Literature review

We performed a systematic literature review to summarize GBA1 variants detected in PD and the frequency across different populations (Supplementary Fig. 3). We searched for literature via PubMed that was published before August 4, 2022, using the search term “GBA” AND “Parkinson” AND “prevalence” OR “GBA” AND “Parkinson” AND “frequency”, while setting the species filter to “Human” and the language filter to “English”, resulting in 94 articles. These were screened based on the title, abstract, and full text, excluding all articles not directly screening for variants in the GBA1 gene in patients with PD. Of these 94 articles, 41 articles were excluded. Reasons for exclusion were reviews or comments without new data (n = 11), articles that did not perform GBA1 variant screening or examine GBA1 variant frequency in their study population (n = 22), and articles that did not include patients with PD (n = 10). Multiple reasons for exclusion were possible. In addition to the articles found via the search term, suitable articles that were referenced in this literature were also included in the overview.