Whole-genome sequencing identifies variants in ANK1, LRRN1, HAS1, and other genes and regulatory regions for stroke in type 1 diabetes

Antikainen, Anni A.; Haukka, Jani K.; Kumar, Anmol; Syreeni, Anna; Hägg-Holmberg, Stefanie; Ylinen, Anni; Kilpeläinen, Elina; Kytölä, Anastasia; Palotie, Aarno; Putaala, Jukka; Thorn, Lena M.; Harjutsalo, Valma; Groop, Per-Henrik; Sandholm, Niina

doi:10.1038/s41598-024-61840-7

Article
Open access
Published: 11 June 2024

Volume 14, article number 13453 (2024)

Scientific Reports

Anni A. Antikainen^1,2,3,
Jani K. Haukka^1,2,3,
Anmol Kumar^1,2,3,
Anna Syreeni^1,2,3,
Stefanie Hägg-Holmberg^1,2,3,
Anni Ylinen^1,2,3,
Elina Kilpeläinen⁴,
Anastasia Kytölä⁴,
Aarno Palotie^4,5,6,
Jukka Putaala⁷,
Lena M. Thorn^1,2,3,8,
Valma Harjutsalo^1,2,3,
Per-Henrik Groop^1,2,3,9,
Niina Sandholm^1,2,3 &
the FinnDiane Study Group

Abstract

Individuals with type 1 diabetes (T1D) carry a markedly increased risk of stroke, with distinct clinical and neuroimaging characteristics as compared to those without diabetes. Using whole-exome or whole-genome sequencing of 1,051 individuals with T1D, we aimed to find rare and low-frequency genomic variants associated with stroke in T1D. We analysed the genome comprehensively with single-variant analyses, gene aggregate analyses, and aggregate analyses on genomic windows, enhancers and promoters. In addition, we attempted replication in T1D using a genome-wide association study (N = 3,945) and direct genotyping (N = 3,263), and in the general population from the large-scale population-wide FinnGen project and UK Biobank summary statistics. We identified a rare missense variant on SREBF1 exome-wide significantly associated with stroke (rs114001633, p.Pro227Leu, p-value = 7.30 × 10^–8), which replicated for hemorrhagic stroke in T1D. Using gene aggregate analysis, we identified exome-wide significant genes: ANK1 and LRRN1 displayed replication evidence in T1D, and LRRN1, HAS1 and UACA in the general population (UK Biobank). Furthermore, we performed sliding-window analyses and identified 14 genome-wide significant windows for stroke on 4q33-34.1, of which two replicated in T1D, and a suggestive genomic window on LINC01500, which replicated in T1D. Finally, we identified a suggestively stroke-associated TRPM2-AS promoter (p-value = 5.78 × 10^–6) with borderline significant replication in T1D, which we validated with an in vitro cell-based assay. Due to the rarity of the identified genetic variants, future replication of the genomic regions presented here is required with sequencing of individuals with T1D. Nevertheless, we here report the first genome-wide analysis on stroke in individuals with diabetes.

Introduction

Stroke is a notable cause of mortality and long-term disability worldwide, with diabetes among the most important risk factors. The standardized incidence ratio is roughly sixfold among individuals with type 1 diabetes (T1D) compared to the general population¹. Furthermore, 537 million adults live with diabetes today and the prevalence is rising². Even though much of this trend is driven by an increase in obesity and insulin-resistant type 2 diabetes (T2D), the incidence of insulin-dependent T1D has increased as well³. T1D is a lifelong condition caused by an autoimmune reaction towards the pancreas and treated with daily insulin injections. The strokes themselves may be of hemorrhagic (20%) or ischemic (80%) origin and can be classified into even more specific subtypes. Interestingly, the two diabetes types affect stroke risk differentially: T1D increases the risk of both ischemic and hemorrhagic stroke^4,5, while the risk imposed by T2D has been estimated to be more modest for hemorrhagic strokes⁵. Importantly, T1D predisposes individuals to cerebral small-vessel disease and strokes of microvascular origin^6,7. Diabetes also causes other complications, of which diabetic kidney disease (DKD) and severe retinopathy predict cerebrovascular disease in T1D⁸. Understanding stroke pathophysiology in diabetes is important for improving treatment and quality of life for individuals with T1D.

Stroke heritability has been estimated to vary between 30 and 40% in the general population⁹. Stroke heritability varies greatly depending on the subtype, with the largest heritability estimates for large artery atherosclerotic stroke and lobar intracranial hemorrhage, and the lowest for small vessel disease⁹. To date, 126 common genomic loci have been associated with stroke or its subtypes with genome-wide significance^10,11. Associations at many of the known common stroke loci overlap with other cardiovascular phenotypes, e.g., coronary artery disease (CAD)⁹. Our previous study suggested a heritable component of stroke in individuals with T1D as a history of maternal stroke was associated with hemorrhagic stroke in T1D¹². However, very few studies have investigated genetic risk factors for stroke in diabetes^13,14,15, and no genome-wide studies in individuals with diabetes yet exist. On the other hand, genetic studies on CAD in diabetes have identified a few diabetes-specific loci^16,17, although still pending external replication, and have replicated three known general population CAD risk loci in diabetes: CDKN2B-AS1, PSRC1 and LPA^15,16,18.

A substantial proportion of heritability remains unexplained for stroke⁹. Rare genetic variants with minor allele frequency (MAF) of ≤ 1% may significantly contribute to stroke heritability. In fact, some rare monogenic disorders have stroke as one of their manifestations^9,10,19. In GWASs, the imputation accuracy of rare variants may be limited, and largely depends on the minor allele count (MAC) in the reference sample²⁰. Rare variants can be reliably studied with next-generation sequencing-based techniques such as whole-genome sequencing (WGS) and whole-exome sequencing (WES). We have previously used WES to identify protein coding variants associated with lipid and apolipoprotein traits in T1D²¹. In the general population, novel stroke risk loci have been identified with WGS²². However, UK Biobank WES analysis for cardiometabolic traits did not discover exome-wide significant stroke risk genes²³.

Historically, the Finnish population has been isolated and, thus, represents a unique genetic background with enrichment of low-frequency deleterious variants²⁴, which may in part enable the discovery of rare disease-associated variants. Here we studied genetics of stroke and its subtypes with WGS and WES in Finnish individuals with T1D with multiple statistical approaches by focusing on rare and low-frequency genomic variants. We aimed both to find stroke-risk loci specific to individuals with T1D, and to identify risk loci generalizable to the non-diabetic population, since discovery of rare variants is more probable in a high-risk Finnish diabetic population. Finally, we performed cell-based in vitro experiments to further validate a discovered promoter region. Altogether, here we report the first genome-wide study on stroke genetics in diabetes.

Results

Study design

The study is part of the Finnish Diabetic Nephropathy (FinnDiane) Study, an ongoing nationwide multicenter study established to identify factors leading to diabetic complications²⁵. We studied WGS in 571 and WES in 480 unrelated and non-overlapping individuals with T1D, comprising 112 and 74 stroke cases, respectively (Table 1, Table S1, Fig. S1 and S2). We aimed to find rare and low-frequency genetic variants associated with stroke in T1D. Therefore, we performed single variant analyses across the genome (MAC ≥ 5), using fixed-effects meta-analysis for variants available in both data sets, with a minimal adjustment setting, i.e., the calendar year of diabetes onset, sex and the first two genomic data principal components, and repeated the analyses with an additional DKD adjustment (Fig. 1). We performed gene aggregate analyses (cumulative MAC, CMAC ≥ 5) with the minimal adjustment separately with protein-altering variants (PAVs) and protein-truncating variants (PTVs), and repeated the analyses with an additional DKD adjustment. Finally, we conducted minimally adjusted intergenic aggregate analyses within genomic windows by statistically up-weighting functionally important and rare variants, and within established enhancers and promoters by weighting variants according to their rarity. Furthermore, we performed stroke subtype association analyses for the lead findings.
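To make the MAC ≥ 5 inclusion criterion concrete, the following minimal sketch counts minor alleles for a single biallelic autosomal variant. It is an illustration only and not the study pipeline: the function names, the 0/1/2 genotype encoding and the toy data are assumptions introduced here.

```python
def minor_allele_count(genotypes):
    """Return (MAC, MAF) for one biallelic autosomal variant.

    `genotypes` holds alternate-allele counts per individual (0, 1 or 2);
    None marks a missing call and is skipped. This encoding is an assumption.
    """
    called = [g for g in genotypes if g is not None]
    n_alleles = 2 * len(called)        # two alleles per diploid individual
    alt = sum(called)
    mac = min(alt, n_alleles - alt)    # the rarer allele defines the MAC
    maf = mac / n_alleles if n_alleles else 0.0
    return mac, maf


def passes_mac_threshold(genotypes, min_mac=5):
    """Apply the single-variant inclusion criterion (MAC >= 5 by default)."""
    mac, _ = minor_allele_count(genotypes)
    return mac >= min_mac


# Toy data: 1,051 individuals, five of them heterozygous carriers.
toy = [1] * 5 + [0] * 1046
mac, maf = minor_allele_count(toy)
print(mac, f"{100 * maf:.2f}%")
```

With 1,051 sequenced individuals, the MAC threshold of five corresponds to a minor allele frequency of roughly 0.24%, which illustrates how rare the tested variants can be.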

Table 1 Clinical characteristics of study participants in the next-generation sequencing data sets.

Single variant analyses

We searched for genetic variants associated with stroke using non-overlapping WES and WGS data, and discovered a suggestively stroke-associated locus, 4q33-34.1, with the minimally adjusted model (4:170787127, p-value = 8.83 × 10^–8, MAF = 3.7%, Table 2, Fig. 2). The variant was unavailable for replication in the T1D specific GWAS and in the FinnGen general population GWAS summary statistics. However, the variant with the third lowest p-value on 4q33-34.1 was available but did not replicate for stroke in T1D or in the general population (Table 2). As DKD is a common diabetic complication that has been reported to predict incident stroke in T1D⁸, we performed additional analyses adjusted for DKD, and discovered a rare missense variant on SREBF1 exome-wide significantly (p-value < 3 × 10^–7) associated with stroke (rs114001633, p.Pro227Leu, p-value = 7.30 × 10^–8, MAF = 0.26%) (Table 2, Fig. S3). Due to the rarity of the variant, we performed additional genotyping for replication, whereby the variant did not replicate for stroke (Table 2), but replicated for hemorrhagic stroke in T1D (p-value = 0.02, N = 3,263, Table S2). Since rs114001633 did not pass the MAC threshold in the hemorrhagic stroke sub-analysis of the discovery cohort (Table S2), further replication in additional individuals with T1D is needed to confirm the potential association with stroke, specifically with hemorrhagic stroke, in T1D.

Table 2 Lead variants discovered with single variant association analyses (p-value < 5 × 10^–7).

Gene aggregate analyses

To improve statistical power for rare and low-frequency variants, we performed gene aggregate analyses. With the minimally adjusted models, low-frequency PAVs on ANK1 were associated with stroke (p-value = 2.23 × 10^–6, CMAC = 247), even more strongly with ischemic stroke (p-value = 1.31 × 10^–6, CMAC = 225) (Fig. 3A, Fig. S4, Tables 3, S3 and S4). Furthermore, nine genes were suggestively associated with stroke through rare or low-frequency PAVs (Fig. 3A). Of these, the aggregate of PAVs on TARBP2 was associated with ischemic stroke (p-value = 1.71 × 10^–7, CMAC = 5, MAF ≤ 1%), and on CLEC4M with hemorrhagic stroke (p-value = 4.74 × 10^–15, CMAC = 11, MAF ≤ 1%). Of note, rare PAVs on GCDH were suggestively associated with stroke (p-value = 3.26 × 10^–5, CMAC = 6): GCDH loss-of-function variants have been previously associated with metabolic stroke and cerebral hemorrhage²⁶.

Table 3 Variants in ANK1, LRRN1, HAS1, and UACA (SKAT-O).

With the models additionally adjusted for DKD, rare PAVs on LRRN1 were associated with stroke (p-value = 3.49 × 10^–6, CMAC = 15), and suggestively with ischemic stroke (p-value = 8.69 × 10^–6, CMAC = 12; Fig. 3B, Fig. S5, Tables 3, S3 and S5). Furthermore, eight genes were suggestively associated with stroke through rare or low-frequency PAVs (Fig. 3B). In the stroke subtype analysis for the lead genes, the aggregate of rare PAVs on MAP3K12 was associated with ischemic stroke (p-value = 1.72 × 10^–7, CMAC = 17), and on MTRNR2L7 with hemorrhagic stroke (p-value = 2.24 × 10^–6, CMAC = 6). MAP3K12 and TARBP2 are located close to each other on the genome; thus, they may represent the same association signal through linkage disequilibrium (LD) or modifier effects onto the causal gene (Fig. S6).

We then investigated the role of more severe PTVs, i.e. putative loss-of-function variants, for stroke. Low-frequency PTVs on ARPC5 were associated with stroke, while rare PTVs on HAS1 (i.e., hyaluronan synthase 1) were suggestively associated with stroke (Fig. 3A, Fig. S4, Tables 3 and S4). Furthermore, in the analysis for stroke subtypes, the aggregate of rare PTVs on HAS1 was associated with ischemic stroke (p-value = 7.39 × 10^–7, CMAC = 7). With the additional DKD adjustment, rare PTVs on HAS1 (p-value = 3.11 × 10^–5, CMAC = 7), rare PTVs on UACA (p-value = 6.77 × 10^–5, CMAC = 6), and low-frequency PTVs on ARPC5 (p-value = 4.15 × 10^–5, CMAC = 39), were associated with stroke (Fig. 3B, Fig. S5, Tables 3 and S5).

Replication of gene aggregate findings

We attempted T1D specific replication within the FinnDiane GWAS data, also including five directly genotyped variants, both using the gene aggregate approach and by inspecting the exonic variants individually. Despite the uncertainty of genotype imputation and our limited statistical power for rare variants, ANK1 and LRRN1 showed weak evidence of replication in T1D. Although ANK1 did not reach significance for stroke with SKAT-O (Tables 4 and S6), one of the available fifteen variants was associated with stroke (rs779805849, p-value = 0.017) (Table 3, Fig. 4), and two additional variants with hemorrhagic stroke (rs146416859 and rs61753679, p-value < 0.05) (Table S4). LRRN1 did not replicate for stroke in FinnDiane with rare PAVs (p-value = 0.50, N_variant = 4) (Tables 4 and S7). However, when we extended the model to low-frequency PAVs (Tables 4 and S7), thus improving statistical power and imputation quality, LRRN1 replicated for ischemic stroke (p-value = 0.039, N_variant = 6). UACA contained two rare PTVs associated with stroke, of which one replicated through genotyping (p-value = 0.0030, Tables 3 and S5). However, the variant was ultra-rare, and the replication is thus uncertain. We were unable to replicate HAS1 in T1D due to missing data; we directly genotyped one variant but found no rare allele carriers. ARPC5 did not replicate.

Table 4 Lead genes with replication in individuals with T1D and in the general population.

We further attempted replication in the general population by look-ups from two UK Biobank WES studies^23,27 (Tables 4, S8 and S9). Importantly, HAS1 replicated for stroke with rare loss-of-function variants (MAF ≤ 1%: p-value = 0.035²⁷) and with ultra-rare deleterious variants (MAF ≤ 0.1%: p-value = 0.012²³), while UACA replicated with ultra-rare deleterious variants (MAF ≤ 0.01%: p-value = 0.035²⁷). Finally, LRRN1 replicated for stroke with an ultra-rare deleterious variant model (MAF ≤ 0.001%: p-value = 0.026²⁷), although not for ischemic stroke. ANK1 did not replicate with the deleterious missense variant model in the general population²⁷.

Out of the suggestive genes, FOXO1, TARBP2, and MAP3K12 showed weak replication in T1D (Tables 4, S4-S7). One variant within FOXO1 replicated for hemorrhagic stroke with the minimal adjustment (p-value = 0.012), two within MAP3K12 for hemorrhagic stroke with the additional DKD adjustment (p-value = 0.013), and TARBP2 replicated for hemorrhagic stroke with the minimally adjusted SKAT-O (p-value = 2.59 × 10^–4, MAF ≤ 1%). UK Biobank general population gene burden WES analysis look-ups supported stroke associations for UTS2, MAP3K12, and FOXO1 (Tables 4 and S9)^23,27.

Known Mendelian stroke genes in T1D

Variants on Mendelian stroke risk genes may for instance cause small vessel disease or cerebral cavernous malformations, which can eventually lead to stroke⁹. We inspected the association of 17 autosomal genes previously linked to stroke through nonsynonymous variants (ABCC6, KRIT1, ADA2, COL3A1, COL4A1, COL4A2, COLGALT1, HTRA1, NOTCH3, RNF213, TREX1, CCM2, PDCD10, CTSA, APP, CST3, ITM2B)¹⁹ (Fig. S7). Rare PAVs on KRIT1 were associated with stroke (p-value = 0.018) and ischemic stroke (p-value = 0.0092). Furthermore, rare PAVs on ADA2 and on TREX1 were associated with hemorrhagic stroke (p-value = 0.027 and p-value = 0.010, respectively). Loss-of-function variants on KRIT1 cause vascular malformations, while ADA2 has been linked to autoinflammatory small vessel vasculitis and TREX1 to small vessel disease^9,19.

Sliding-window analyses

To increase statistical power for low-frequency and rare variants on non-coding regulatory regions, we performed genome-wide sliding-window aggregate analyses with the minimal adjustment. We found further evidence for the 4q33-34.1 genomic region as we discovered fourteen windows within the region, with a genome-wide significant association between an aggregate of low-frequency variants and stroke (MAF ≤ 5%; Fig. 5A, Table S10). Importantly, two of these windows (4:170782001–170786000, p-value = 3.40 × 10^–8, CMAC = 934; and 4:170784001–170788000, p-value = 1.10 × 10^–8, CMAC = 1190) and ten individual variants within the 4q33-34.1 genomic region replicated for stroke in T1D (FinnDiane GWAS: p-value < 0.05; Table S11). To identify the most likely effector genes for the 4q33-34.1, we inspected variant expression quantitative trait loci (eQTL) from GTEx Portal and eQTLGen Consortium²⁸, and functional genomics from the 3D Genome Browser²⁹. 4q33-34.1 is located in the same topologically associating domain with distal promoters of GALNTL6, MFAP3L and AADAT in the frontal lobe and hippocampus (Fig. S8). In addition, promoter capture high-throughput chromosome conformation capture (PCHi-C) links could be identified for a few individual variants, e.g., for GALNTL6 in the hippocampus, and AADAT and MFAP3L in the dorsolateral prefrontal cortex.

When we inspected rare variants (MAF ≤ 1%), we discovered multiple suggestive windows, e.g., close to or within the CNTN1, CNTN4, LINC01500, and TGOLN2 genes (Fig. 5B, Table S10). In stroke subtype analysis, the CNTN1 window was genome-wide significantly associated with hemorrhagic stroke (12:40950001–40954000: p-value = 2.10 × 10^–8, CMAC = 24). Interestingly, CNTN1 and CNTN4 are located on different chromosomes, but belong to the same contactin protein family; however, replication is pending. The suggestive window near LINC01500 (14:59004001–59008000: p-value = 2.53 × 10^–7, CMAC = 19) replicated for stroke in T1D (FinnDiane GWAS: p-value = 0.015, CMAC = 56). Four variants within the window were available in the FinnGen general population GWAS, and one replicated for stroke (rs1281241634, p-value = 0.029) (Table S11). According to PCHi-C, the LINC01500 intronic window looped to the DACT1 promoter on the dorsolateral prefrontal cortex (Fig. S9). Finally, the TGOLN2 window replicated for hemorrhagic stroke in T1D (FinnDiane GWAS: p-value = 0.037).
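The window coordinates reported above are consistent with 4,000 bp windows that advance in 2,000 bp steps from position 1. The sketch below enumerates windows under that layout; the window size and step are inferred from the reported coordinates rather than stated in the text, and the function names are introduced here for illustration only.

```python
def sliding_windows(chrom_length, size=4000, step=2000):
    """Yield 1-based inclusive (start, end) windows along a chromosome.

    The size and step defaults are inferred from the reported window
    coordinates and are assumptions of this sketch.
    """
    start = 1
    while start <= chrom_length:
        yield start, min(start + size - 1, chrom_length)
        start += step


def windows_containing(pos, size=4000, step=2000):
    """Return every window (start, end) that overlaps the 1-based position."""
    k_min = max(0, -((size - pos) // step))   # ceil((pos - size) / step)
    k_max = (pos - 1) // step
    return [(k * step + 1, k * step + size) for k in range(k_min, k_max + 1)]


# The lead single variant on 4q33-34.1 sits at 4:170787127.
print(windows_containing(170787127))
# [(170784001, 170788000), (170786001, 170790000)]
```

Because consecutive windows overlap by half their length, each variant contributes to two windows, which helps explain why neighbouring windows on 4q33-34.1 reach significance together.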

Promoters and enhancers

As a more targeted aggregate approach to explore the non-coding genome, we studied rare and low-frequency variants on established regulatory regions using the minimal adjustment. We discovered three enhancers with suggestive stroke-associated enrichment of rare or low-frequency variants within intronic regions of TRPM3, LOC105378983, and BDNF, encoding brain-derived neurotrophic factor (Tables S12 and S13, Fig. S10). The BDNF enhancer was significant after multiple testing correction for ischemic stroke (p-value = 1.01 × 10^–6, CMAC = 6). Regional aggregate replications were not possible in the T1D specific GWAS (N_variant < 2), and individual variants were missing or did not replicate. PCHi-C linked the BDNF enhancer to its promoter on specific brain regions (Fig. S9).

We did not identify stroke-associated promoters after correction for multiple testing (p-value < 3 × 10^–7, Fig. S11). The strongest associations were two TGOLN2 promoters (p-value = 5.60 × 10^–6, CMAC = 9, MAF ≤ 1%), located on the previously mentioned TGOLN2 window, and a TRPM2-AS promoter (p-value = 5.78 × 10^–6, CMAC = 33, MAF ≤ 1%; Tables S14 and S15). The aggregate of rare variants on the TRPM2-AS promoter nearly replicated for stroke in T1D (FinnDiane GWAS: p-value = 0.053). When we inspected variants individually, one out of nine available variants replicated in the general population for ischemic stroke (FinnGen GWAS: p-value = 0.038). In GTEx, rs762428 within the TRPM2-AS promoter was significantly associated with TRPM2 expression level in whole blood (NES = -0.63) and lungs (NES = -0.41, p < 0.001), and nominally in other tissues such as the hypothalamus (NES = -0.42). TRPM2 encodes a calcium-permeable and non-selective cation channel expressed mainly in the brain. The gene has been linked to ischemic stroke³⁰, and belongs to the same protein subfamily as the above mentioned TRPM3. TRPM2 inhibitors have been proposed as drug candidates for central nervous system diseases³¹; thus, our results suggest that these inhibitors could also be beneficial for stroke in T1D, although further validation of the genetic associations is needed.

We performed luciferase promoter analysis of the stroke-associated sequence within the TRPM2-AS promoter region to experimentally confirm its promoter activity (Fig. 6). As we detected TRPM2-AS expression in HeLa cells but not in HUVEC or HEK-293 cells using semi-quantitative RT-PCR, the luciferase analysis was performed in HeLa cells, which indicated strong promoter activity. The most strongly stroke-associated variant, rs753589764, did not significantly affect luciferase activity under normal cell culture conditions (p-value = 0.27, 22 technical repeats). However, we cannot rule out a variant effect under cellular stress, e.g., oxidative stress, or in other cell lines, and therefore, further promoter experiments should be performed in the future.

Discussion

Stroke heritability has been estimated to range between 30 and 40%, but the genomic loci identified thus far explain only a small fraction of heritability⁹. One potential explanation for the missing heritability is rare variants missed by GWAS. Therefore, we performed WES and WGS in a total of 1,051 Finnish individuals with T1D to discover rare and low-frequency variants associated with stroke and its major subtypes, either specific for T1D, or generalizable to the non-diabetic population. We identified multiple significant loci with evidence of replication, including protein altering or truncating variants on ANK1, HAS1, UACA, and LRRN1, as well as a 4q33-34.1 intergenic region.

With single variant analyses, we identified a missense variant on SREBF1 (rs114001633, p.Pro227Leu), which was exome-wide significantly associated with stroke, and further replicated for hemorrhagic stroke in T1D. As the variant was rare (MAF = 0.26%), and we had a relatively small number of hemorrhagic stroke cases, further replication is needed in T1D to confirm this finding. SREBF1 encodes a transcription factor involved in lipid metabolism and insulin signaling³².

Gene aggregate tests (SKAT-O) detected four genes within which PAVs (ANK1 and LRRN1) or PTVs (HAS1 and UACA) were associated with stroke with evidence of replication; LRRN1, HAS1, and UACA after adjustment for DKD. ANK1 did not replicate in T1D with the gene aggregate approach; however, one out of the fifteen available variants replicated for stroke in T1D (rs779805849, p.Val136Glu). Of note, SIFT and PolyPhen predicted many ANK1 variants as deleterious^33,34. ANK1 encodes ankyrin-1, within which variants cause hereditary spherocytosis, an inherited disease that changes the shape of red blood cells³⁵. Previous genome-wide association studies have linked the gene to T2D³⁶, while another gene from the ankyrin protein family, ANK2, is a previously identified stroke risk locus³⁷.

Rare PAVs on LRRN1 were associated with stroke. LRRN1 did not replicate with the corresponding model in T1D; however, with a model extended to low-frequency PAVs, LRRN1 replicated for ischemic stroke. Rare variant replication is problematic with GWAS data due to the uncertainty of the imputation, which may explain the need to increase the allele frequency threshold to observe a successful replication. Furthermore, LRRN1 was nominally associated with stroke in the general population through an aggregate of ultra-rare loss-of-function and deleterious missense variants²⁷. LRRN1 encodes leucine rich repeat neuronal protein 1, with a brain-enriched expression profile.

HAS1 consistently replicated for stroke with rare loss-of-function and deleterious variant aggregate models in the general population^23,27, while UACA replicated for stroke with one ultra-rare deleterious variant model²⁷. HAS1 encodes an enzyme that produces hyaluronan and whose expression is induced by inflammation and glycemic stress³⁸. Of note, an increased hyaluronan turnover has been suggested to follow ischemic stroke³⁹. No additional HAS1 PTV carriers were identified among the T1D replication cohort; thus, a diabetes-specific replication is pending. Nevertheless, HAS1 PTVs may be of particular importance in T1D, as dysregulation of endothelial glycocalyx hyaluronan has been suggested to contribute to diabetic complications⁴⁰. Finally, it must be noted that PTVs have not been functionally confirmed as loss-of-function, but the annotations are predictions; a PTV at the beginning of a gene is likely more severe than one at the end, and in fact, PTVs closer to the HAS1 transcription start site were more strongly associated with stroke.

To increase statistical power on regulatory regions, we performed statistical aggregate tests in genomic windows, enhancers and promoters^41,42. Of note, we extended genomic window length from the default to increase statistical power, which however also reduced precision as the causal region might be narrower. We found fourteen genome-wide significant stroke-associated windows with low-frequency variants on 4q33-34.1, of which two replicated for stroke in T1D. According to eQTLs and PCHi-C interactions, 4q33-34.1 variants most likely target GALNTL6, MFAP3L or AADAT. We also discovered a suggestively stroke-associated window through rare variants within LINC01500, which replicated for stroke in T1D. According to PCHi-C, the LINC01500 window targets a promoter of DACT1. Finally, an aggregate of rare variants was suggestively associated with stroke on the TRPM2-AS promoter, which nearly replicated in T1D (p-value = 0.053). Importantly, transient receptor potential melastatin 2 (TRPM2) has been previously associated with ischemic stroke^30,31. Our functional cell-based assay validated the TRPM2-AS region promoter activity. However, the most strongly stroke-associated variant, rs753589764, did not associate with TRPM2-AS promoter activity under normal cell culture conditions in HeLa cells.

Limitations of the study include the limited statistical power due to moderate sample size at the discovery stage, replication of rare variants with imputed GWAS data, and non-conservative statistical estimates for the rarest variants due to case–control imbalance (≈1:6), especially for the stroke subtypes. We were able to improve the statistical power on exomes by meta-analyzing WES and WGS, and we performed the stroke-subtype specific analyses only for a limited number of suggestive findings to avoid spurious signals due to unstable statistical estimates. To further improve statistical power, we performed statistical aggregate tests on gene exons and on intergenic regions, i.e., enhancers, promoters, and genomic windows. Of note, we studied only transcribed enhancers, and thus, some enhancers could have been missed. We defined promoters with an arbitrarily selected 1,000 bp extension downstream of the transcription start site (TSS), which may not always have been optimal as promoter lengths vary. Further limitations are the lack of sequencing-based replication data in individuals with T1D, and that we regarded nominal significance as replication (p-value < 0.05). However, we sought replication by combining available data sources, i.e., FinnGen (Finnish general population GWAS), UK Biobank (general population WES), and FinnDiane (GWAS and genotyping in Finnish individuals with T1D). Of note, stroke cases were younger and had a shorter diabetes duration than controls in the FinnDiane cohorts, with the difference being most extreme in the discovery cohorts, which may have led to unsuccessful replication for variants with an age or diabetes duration dependent effect. Importantly, the gene burden variant selection criteria within UK Biobank WES^23,27 did not perfectly match ours, especially with the low-frequency protein altering variant models, which may explain some unsuccessful gene aggregate replications.
Finally, while conducting the analyses in an isolated population has certain advantages for variant discovery, it also raises the question of generalizability of the findings to other populations. In addition to the replication attempted in the UK Biobank, further research is needed to validate our findings in non-Finnish individuals with T1D.

The strengths of this study include a well characterized cohort and comprehensively performed single variant and aggregate analyses both for the coding and non-coding regions of the genome. Stroke is a challenging phenotype to address with ICD codes, and many loci associated with rare stroke phenotypes may go unnoticed even with large population-wide genetic studies. We performed analyses for well-defined stroke phenotypes verified by trained neurologists. Furthermore, as we conducted the analyses in specific high-risk individuals from an isolated population, thus with less genetic and phenotypic diversity, we had a better opportunity to identify genetic risk loci.

In conclusion, we studied rare and low-frequency stroke-associated genetic variants with whole-exome or whole-genome sequencing in 1,051 individuals with T1D and report the first genome-wide study on stroke genetics in diabetes. The results highlight 4q33-34.1, SREBF1, and ANK1 for stroke in T1D; and HAS1, UACA, LRRN1, LINC01500, and the TRPM2-AS promoter as stroke risk loci that likely generalize to the non-diabetic population. The presented results require future validation with next-generation sequencing in a larger cohort of individuals with T1D.

Methods

Materials

We studied WGS in 571 and WES in 480 unrelated individuals with T1D, comprising 112 and 74 stroke cases, respectively (Table 1, Table S1, Fig. S1 and S2). Patients in WGS and WES were non-overlapping. The patient selection for both data sets was originally designed for DKD, such that half of the individuals had severe DKD, and half had no DKD (i.e., normal albumin excretion rate) despite a long duration of T1D^21,43. Importantly, this resulted in stroke cases being younger and having shorter diabetes duration than controls, contrary to expectation. Individuals in the present study were diagnosed with T1D by their attending physician and had diabetes onset age < 40 years and insulin initiated within one calendar year from the diabetes diagnosis. Stroke cases were identified for the participants from Finnish registries based on ICD codes until the end of 2017 (Table S16). The phenotypes were verified, and stroke cases classified into ischemic and hemorrhagic strokes, by trained neurologists using medical files and brain imaging data. For individuals without data verified by neurologists (N_WGS = 27, N_WES = 2), we considered only the registry data, excluded controls with intermediate stroke phenotypes (e.g., transient ischemic attack), and were unable to classify stroke cases into ischemic and hemorrhagic subtypes. Importantly, we required stroke to have occurred after T1D diagnosis, and controls to be > 35 years of age and to have > 20 years of diabetes duration. Next-generation sequencing data were processed against the GRCh38 reference genome, and variants were annotated with SnpEff v.5 software⁴⁴ (Fig. S12). In variant QC, for autosomal variants, we required Hardy–Weinberg equilibrium (HWE) p-value > 10^–10 and variant call rate > 98%; and for X chromosome variants, only variant call rate > 98%. The pipeline is described in Detailed Methods of the Supplementary Information.
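The variant QC thresholds above can be illustrated with a short filter. This is a minimal sketch and not the authors' pipeline: the text does not state which HWE test was used, so a simple 1-df chi-square test stands in for it here, and the function names and genotype-count inputs are assumptions.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) HWE p-value from genotype counts.

    Stand-in for whichever HWE test the pipeline used (an assumption);
    no continuity correction is applied.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))     # survival function, 1 df


def passes_variant_qc(n_aa, n_ab, n_bb, n_missing, chrom,
                      min_call_rate=0.98, min_hwe_p=1e-10):
    """Call rate > 98% for all variants; HWE p > 1e-10 for autosomes only."""
    n_called = n_aa + n_ab + n_bb
    total = n_called + n_missing
    call_rate = n_called / total if total else 0.0
    if call_rate <= min_call_rate:
        return False
    if chrom in ("X", "chrX"):                # X chromosome: call rate only
        return True
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p


print(passes_variant_qc(490, 420, 90, 0, "4"))    # genotypes in HWE
print(passes_variant_qc(500, 0, 500, 0, "4"))     # no heterozygotes at all
```

The very lenient HWE threshold removes only gross genotyping artefacts, such as the second toy variant with no heterozygotes, while retaining variants with modest deviations.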

Within the FinnDiane study, we have GWAS data for almost the entire cohort, i.e., 6,458 individuals with T1D or their relatives. The GWAS data had previously been processed against the GRCh37 reference genome. For this study, we lifted the genotype positions over to GRCh38, re-imputed the data with the SISu v3 reference panel, and annotated the variants with SnpEff v.5 software⁴⁴ (Fig. S13). We attempted replication in individuals with T1D in two ways: within the FinnDiane GWAS data, restricted to individuals not included in the sequencing data (N = 3,945, Table S17 and S18, Fig. S14) and to variants with high imputation quality (r² > 0.80); and by directly genotyping twelve lead variants (N = 3,263, Table S19, Fig. S15). As in the discovery cohorts, stroke cases in the replication cohorts were younger and had a shorter diabetes duration than controls, although the difference was less extreme. Of note, variant genotyping was performed with one Agena iPlex multiplexing assay at the Institute for Molecular Medicine Finland, Helsinki, Finland (Table S20), and the genotyping replication was limited to individuals within the GWAS data in order to perform relatedness adjustment. Stroke phenotype and control criteria in the T1D replication were defined as in the WES and WGS data.

Single variant analyses

We analyzed the genome with an additive inheritance model. For variants available in both WES and WGS data, we performed a score test with rvtests (version 20190205)⁴⁵, followed by a fixed-effect inverse-variance-based meta-analysis (total MAC ≥ 5, and MAC ≥ 2 in WES and WGS) with metal (version 20110325)⁴⁶. For variants available in only one data set, we used exact Firth regression (MAC ≥ 5)⁴⁵. Importantly, Firth logistic regression has been suggested to be the most conservative statistical test for joint rare variant analyses, especially with case–control imbalance, while the score test has been suggested to have the highest statistical power for rare variant meta-analyses⁴⁷. The additive single variant analyses were adjusted for the calendar year of diabetes onset, sex, and the first two genomic principal components (i.e., the minimal adjustment setting), and additionally for DKD, which is one of the most important risk factors for stroke in T1D⁴⁸. Because patient selection for sequencing was optimized for DKD and took T1D duration into account, the WGS and WES stroke controls are older and have a longer T1D duration than the cases, contrary to true stroke predisposition. Thus, to avoid statistical bias, we adjusted for the calendar year of diabetes onset, a major stroke risk factor correlated with age, T1D duration, and T1D treatment quality.
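As a generic illustration of fixed-effect inverse-variance weighting, the sketch below combines per-study effect estimates and standard errors into one pooled estimate. It is not the rvtests or metal implementation, and the function name and inputs are assumptions made for this example.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.

    betas: effect estimates (e.g. log odds ratios), one per study
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    w_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    z = pooled_beta / pooled_se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p
```

With two studies, such as WES and WGS here, the study with the smaller standard error contributes more to the pooled estimate.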

Gene aggregate analyses

In order to improve statistical power for rare (MAF ≤ 1%) and low-frequency (MAF ≤ 5%) variants, we performed gene aggregate analyses with an optimal unified sequence kernel association test (SKAT-O) meta-analysis with MetaSKAT (version 0.81)⁴⁹, separately within two distinct classes (Table S21): protein-altering variants (PAVs) and protein-truncating variants (PTVs), i.e., the more severe putative loss-of-function variants⁵⁰. Importantly, the PAV class includes PTVs in addition to variants that alter the amino acid sequence. Of note, SKAT-O maximizes statistical power by optimally combining the sequence kernel association test and the burden test⁵¹. All variable sites (MAC ≥ 1) were accepted into the gene aggregate analysis, and each aggregate test was required to include at least two variants (N_variant ≥ 2), with a cumulative MAC (CMAC) across all included variants within the gene ≥ 5. We adjusted the analyses for the calendar year of diabetes onset, sex, and the first two genomic principal components, and additionally for DKD. We did not report genes with all variants in perfect LD, and we inspected the stroke associations of individual variants within the genes using the score test fixed-effect meta-analysis^45,46. Multiple testing correction, based on the number of tested genes, resulted in significance thresholds of p-value < 4 × 10^–6 for PAVs (MAF ≤ 1%: N_gene = 11,954; MAF ≤ 5%: N_gene = 13,069), p-value < 8 × 10^–5 for PTVs with MAF ≤ 1% (N_gene = 663), and p-value < 6 × 10^–5 for PTVs with MAF ≤ 5% (N_gene = 908). In addition, we investigated stroke associations for 17 autosomal Mendelian stroke risk genes regardless of CMAC¹⁹, and were able to report associations for 13 of them.
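The reported gene-level thresholds are consistent with a Bonferroni correction of the form α / N_gene. The sketch below reproduces the arithmetic under the assumption that α = 0.05, which the text does not state explicitly; the dictionary keys are labels invented for this example.

```python
ALPHA = 0.05  # assumed family-wise error rate; not stated explicitly in the text

# Numbers of tested genes as reported in the text.
tested_genes = {
    ("PAV", "MAF<=1%"): 11954,
    ("PAV", "MAF<=5%"): 13069,
    ("PTV", "MAF<=1%"): 663,
    ("PTV", "MAF<=5%"): 908,
}


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold for n_tests independent tests."""
    return alpha / n_tests


thresholds = {key: bonferroni_threshold(n) for key, n in tested_genes.items()}

for (variant_class, maf), threshold in thresholds.items():
    print(f"{variant_class} {maf}: p < {threshold:.2e}")
```

Rounded to one significant digit, these values give roughly 4 × 10^–6 for both PAV settings, 8 × 10^–5 for PTVs with MAF ≤ 1%, and 6 × 10^–5 for PTVs with MAF ≤ 5%, in line with the thresholds quoted above.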

Sliding-window and regulatory region aggregate analyses with whole-genome sequencing

To increase statistical power for low-frequency and rare variants in intergenic regions, we performed functionally informed sliding-window analyses, i.e., aggregate analyses within 4,000 base pair (bp) regions (N_variant ≥ 2, CMAC ≥ 5) separated by 2,000 bp, with variants statistically weighted according to their rarity and functional importance using STAAR-O (STAAR R package 0.9.6)^41,52. Functional importance was defined with Combined Annotation-Dependent Depletion (CADD) data⁵² using variant MAF (to up-weight rarer variants), the pre-computed CADD score, and the first annotation principal component from seven annotation classes (Fig. S16, Table S22), calculated following the guidelines⁴¹. Of note, the scores were used on the PHRED scale. We adjusted the analyses for the calendar year of diabetes onset, sex, and the first two genomic principal components.
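The window layout can be sketched as follows. The sketch assumes that "separated by 2,000 bp" means consecutive window start positions are 2,000 bp apart, so that neighbouring 4,000 bp windows overlap by half; the text does not spell this out. The function names are hypothetical, and the weighting and testing done by STAAR-O are not reproduced here.

```python
def sliding_windows(start, end, window=4000, step=2000):
    """Yield 1-based inclusive (win_start, win_end) windows covering [start, end].

    Assumption: 'step' is the distance between consecutive window starts.
    """
    if start > end:
        return
    pos = start
    while True:
        win_end = min(pos + window - 1, end)
        yield pos, win_end
        if win_end >= end:
            break
        pos += step


def window_is_testable(macs, min_variants=2, min_cmac=5):
    """Inclusion rule from the text: N_variant >= 2 and cumulative MAC >= 5.

    macs: minor allele counts of the variants that fall inside one window.
    """
    return len(macs) >= min_variants and sum(macs) >= min_cmac
```

For a 10,000 bp region starting at position 1, `list(sliding_windows(1, 10000))` yields four windows starting at 1, 2001, 4001, and 6001.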

We studied established regulatory regions, i.e., enhancers and promoters (N_variant ≥ 2, CMAC ≥ 5), as defined in FANTOM5 cap analysis of gene expression (CAGE) human data reprocessed to the GRCh38 reference genome⁴². Promoters were defined as the transcription start site (TSS) extended to 1,000 bp, and variants were weighted by their rarity on the PHRED scale. FANTOM5 atlases have been measured with multiple human primary cell lines, tissues, and cancer cell lines^53,54. The regulatory regions were analyzed with the STAAR R package 0.9.6⁴¹, adjusting for the calendar year of diabetes onset, sex, and the first two genomic principal components. With low-frequency variants, the multiple testing corrected significance thresholds were p-value < 2.9 × 10^–7 for promoters (N_region = 172,134) and p-value < 2.6 × 10^–6 for enhancers (N_region = 19,472). For rare variants, the thresholds were p-value < 3.5 × 10^–7 (N_region = 141,779) and p-value < 4.3 × 10^–6 (N_region = 11,665), respectively. We did not report regions with all variants in perfect LD.

Replication

Within the FinnDiane GWAS data, we attempted replication of genetic variants with high imputation quality (r² > 0.80) with a score test (rvtests 20190205⁴⁵) and had good statistical power (> 80%) to detect a nominal association with an odds ratio (OR) ≥ 2.5 for additive low-frequency variants (MAF = 1%) (Fig. S17)⁵⁵. However, for rare variants with MAF = 0.1% and OR < 9, we had only limited power to detect an association even at nominal significance (p-value < 0.05). Thus, we considered nominal significance (p-value < 0.05) as the replication threshold. We attempted direct genotyping of twelve variants for replication, but minor allele carriers were observed for only seven of them (Table S20). We likewise analyzed the genotyped variants with a score test in single variant analyses, except for one LRRN1 variant, which we analyzed with linear regression and no relatedness adjustment (stats R package 4.2.1) because there were no alternative allele carriers among individuals with the required relatedness information. Most variants within the aggregate discoveries were rare or ultra-rare (MAF ≈ 0.1%), making replication with imputed genomic data problematic. Nevertheless, we attempted replication of the aggregate findings (SKAT-O, STAAR-O) within the FinnDiane GWAS data (r² > 0.80), also including the directly genotyped variants. We performed SKAT-O using the GMMAT R package 1.3.2, imputing missing genotype dosages to the mean⁵⁶, while intergenic aggregate analyses were performed similarly with the STAAR R package⁴¹. Replication analyses were adjusted comparably to the discovery stage analyses, except that relatedness in replication was accounted for with relatedness matrices instead of genomic principal components (a Balding-Nichols approximation kinship matrix in single variant analysis and a GEMMA relatedness matrix in aggregate analyses)^45,57.

We attempted replication in the general population for genetic variants using GWAS data from release 6 of the large-scale population-wide FinnGen project, with the phenotypes best matching our definitions (https://www.finngen.fi/en) (Table S23), and for the gene aggregate discoveries using UK Biobank summary statistics^23,27. Of note, no proxies in LD were found for the lead single variant findings (rs4435704, rs4401420), and thus we did not consider linkage disequilibrium in replication beyond the traditional imputation approach.
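The mean imputation of missing genotype dosages mentioned in the replication analyses can be illustrated with a standalone sketch. It only shows the idea for a single variant and is not GMMAT code; the function name and the use of `None` for a missing dosage are assumptions.

```python
def impute_missing_to_mean(dosages):
    """Replace missing dosages (None) with the mean of the observed dosages.

    dosages: per-individual genotype dosages for one variant, each in [0, 2]
             or None when missing.
    """
    observed = [d for d in dosages if d is not None]
    if not observed:
        raise ValueError("no observed dosages for this variant")
    mean = sum(observed) / len(observed)
    return [mean if d is None else d for d in dosages]
```

For a rare variant, the imputed value is close to zero, so individuals with missing data contribute little to the aggregate test statistic.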

Functional characterization of the genetic variants and regions

We inspected genetic variant characteristics from GTEx Portal, eQTLGen Consortium (p-value < 0.05)²⁸, RegulomeDB⁵⁸, YUE Lab²⁹, and the Ensembl Variant Effect Predictor^33,34,59. Functional characterization of the TRPM2-AS promoter is described in Detailed Methods of the Supplementary Information. In short, we assessed TRPM2-AS expression in three cell lines (HeLa, HEK-293, HUVEC) and noted expression in HeLa cells. We then assessed the influence of the chromosomal location and the genotype of the most strongly stroke-associated variant (rs753589764) on promoter activity in HeLa cells under normal cell culture conditions with a dual-luciferase reporter assay (22 technical repeats).

Detailed methods

Detailed Methods are available in the Supplementary Information.

Ethical approval

The study protocol has been approved by the ethics committee of the Helsinki and Uusimaa Hospital District (491/E5/2006, 238/13/03/00/2015, and HUS-3313-2018), and performed in accordance with the Declaration of Helsinki. All participants gave informed consent before participation.

Data availability

Individual-level data for the study participants are not publicly available because of restrictions in the study consent provided by the participants at the time of data collection. Readers may propose a collaboration to study the individual-level data by contacting the lead investigator. Gene aggregate test summary statistics for stroke are provided in the Supplementary Data.

Code availability

We used publicly available software for the statistical analyses: plink2 (https://www.cog-genomics.org/plink/2.0/), metal (http://csg.sph.umich.edu/abecasis/metal/), rvtests (http://zhanxw.github.io/rvtests/), MetaSKAT (https://cran.r-project.org/web/packages/MetaSKAT/), STAAR-O (https://github.com/xihaoli/STAAR), and GMMAT (https://github.com/hanchenphd/GMMAT).

References

1. Harjutsalo, V., Barlovic, D. P. & Groop, P.-H. Long-term population-based trends in the incidence of cardiovascular disease in individuals with type 1 diabetes from Finland: A retrospective, nationwide, cohort study. Lancet Diabetes Endocrinol. 9, 575–585 (2021).
2. Sun, H. et al. IDF Diabetes Atlas: Global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res. Clin. Pract. 183 (2022).
3. Forlenza, G. P. & Rewers, M. The epidemic of type 1 diabetes: what is it telling us?. Curr. Opin. Endocrinol. Diabetes Obes. 18, 248–251 (2011).
4. Ståhl, C. H. et al. Glycaemic control and excess risk of ischaemic and haemorrhagic stroke in patients with type 1 diabetes: a cohort study of 33 453 patients. J. Intern. Med. 281, 261–272 (2017).
5. Janghorbani, M. et al. Prospective study of type 1 and type 2 diabetes and risk of stroke subtypes: the Nurses’ Health Study. Diabetes Care 30, 1730–1735 (2007).
6. Thorn, L. M. et al. Clinical and MRI features of cerebral small-vessel disease in type 1 diabetes. Diabetes Care 42, 327–330 (2019).
7. Putaala, J. et al. Diabetes mellitus and ischemic stroke in the young. Neurology 76, 1831–1837 (2011).
8. Hägg, S. et al. Incidence of stroke according to presence of diabetic nephropathy and severe diabetic retinopathy in patients with type 1 diabetes. Diabetes Care 36, 4140–4146 (2013).
9. Dichgans, M., Pulit, S. L. & Rosand, J. Stroke genetics: discovery, biology, and clinical applications. Lancet Neurol. 18, 587–599 (2019).
10. Debette, S. & Markus, H. S. Stroke genetics: discovery, insight into mechanisms, and clinical perspectives. Circ. Res. 130, 1095–1111 (2022).
11. Mishra, A. et al. Stroke genetics informs drug discovery and risk prediction across ancestries. Nature 611, 115–123 (2022).
12. Ylinen, A. et al. The impact of parental risk factors on the risk of stroke in type 1 diabetes. Acta Diabetol. 58, 911–917 (2021).
13. Syreeni, A. et al. Haptoglobin genotype does not confer a risk of stroke in type 1 diabetes. Diabetes 71, 2728–2738 (2022).
14. Dahlström, E. H. et al. The low-expression variant of FABP4 is associated with cardiovascular disease in type 1 diabetes. Diabetes 70, 2391–2401 (2021).
15. Vujkovic, M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691 (2020).
16. Antikainen, A. A. V. et al. Genome-wide association study on coronary artery disease in type 1 diabetes suggests beta-defensin 127 as a risk locus. Cardiovasc. Res. 117, 600–612 (2021).
17. Qi, L. et al. Association between a genetic variant related to glutamic acid metabolism and coronary heart disease in individuals with type 2 diabetes. JAMA 310, 821–828 (2013).
18. Fall, T., Gustafsson, S., Orho-Melander, M. & Ingelsson, E. Genome-wide association study of coronary artery disease among individuals with diabetes: the UK Biobank. Diabetologia 61, 2174–2179 (2018).
19. Grami, N. et al. Global assessment of Mendelian stroke genetic prevalence in 101 635 individuals from 7 ethnic groups. Stroke 51, 1290–1293 (2020).
20. Si, Y., Vanderwerff, B. & Zöllner, S. Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms. Genetics 217, 0011 (2021).
21. Sandholm, N. et al. Whole-exome sequencing identifies novel protein-altering variants associated with serum apolipoprotein and lipid concentrations. Genome Med. 14, 132 (2022).
22. Hu, Y. et al. Whole-genome sequencing association analyses of stroke and its subtypes in ancestrally diverse populations from trans-omics for precision medicine project. Stroke 53, 875–885 (2022).
23. Jurgens, S. J. et al. Analysis of rare genetic variation underlying cardiometabolic diseases and traits among 200,000 individuals in the UK Biobank. Nat. Genet. 54, 240–250 (2022).
24. Locke, A. E. et al. Exome sequencing of Finnish isolates enhances rare-variant association power. Nature 572, 323–328 (2019).
25. Thorn, L. M. et al. Metabolic syndrome in type 1 diabetes: Association with diabetic nephropathy and glycemic control (the FinnDiane study). Diabetes Care 28, 2019–2024 (2005).
26. Zinnanti, W. J. et al. Mechanism of metabolic stroke and spontaneous cerebral hemorrhage in glutaric aciduria type I. Acta Neuropathol. Commun. 2, 13 (2014).
27. Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK biobank participants. Nature 599, 628–634 (2021).
28. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
29. Wang, Y. et al. The 3D genome browser: A web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 19, 151 (2018).
30. Zong, P., Lin, Q., Feng, J. & Yue, L. A systemic review of the integral role of TRPM2 in ischemic stroke: From upstream risk factors to ultimate neuronal death. Cells 11, 491 (2022).
31. Belrose, J. C. & Jackson, M. F. TRPM2: A candidate therapeutic target for treating neurological diseases. Acta Pharmacol. Sin. 39, 722–732 (2018).
32. DeBose-Boyd, R. A. & Ye, J. SREBPs in lipid metabolism, insulin signaling, and beyond. Trends Biochem. Sci. 43, 358–368 (2018).
33. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
34. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
35. Park, J. et al. Mutational characteristics of ANK1 and SPTB genes in hereditary spherocytosis. Clin. Genet. 90, 69–78 (2016).
36. Yan, R. et al. A novel type 2 diabetes risk allele increases the promoter activity of the muscle-specific small ankyrin 1 gene. Sci. Rep. 6, 25105 (2016).
37. Malik, R. et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat. Genet. 50, 524–537 (2018).
38. Siiskonen, H., Oikari, S., Pasonen-Seppänen, S. & Rilla, K. Hyaluronan synthase 1: A mysterious enzyme with unexpected functions. Front. Immunol. 6, 1–11 (2015).
39. Katarzyna Greda, A. & Nowicka, D. Hyaluronidase inhibition accelerates functional recovery from stroke in the mouse brain. J. Neurochem. 157, 781–801 (2021).
40. Wang, G., Tiemeier, G. L., van den Berg, B. M. & Rabelink, T. J. Endothelial glycocalyx hyaluronan: Regulation and role in prevention of diabetic complications. Am. J. Pathol. 190, 781–790 (2020).
41. Li, X. et al. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat. Genet. 52, 969–983 (2020).
42. Abugessaisa, I. et al. FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies. Sci. Data 4, 1–10 (2017).
43. Sandholm, N. et al. The genetic landscape of renal complications in type 1 diabetes. J. Am. Soc. Nephrol. 28, 557–574 (2017).
44. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
45. Zhan, X., Hu, Y., Li, B., Abecasis, G. R. & Liu, D. J. RVTESTS: An efficient and comprehensive tool for rare variant association analysis using sequence data. Bioinformatics 32, 1423–1426 (2016).
46. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: Fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
47. Ma, C., Blackwell, T., Boehnke, M. & Scott, L. J. Recommended joint and meta-analysis strategies for case-control association testing of single low-count variants. Genet. Epidemiol. 37, 539–550 (2013).
48. Hägg, S. et al. Different risk factor profiles for ischemic and hemorrhagic stroke in type 1 diabetes mellitus. Stroke 45, 2558–2562 (2014).
49. Lee, S., Teslovich, T. M., Boehnke, M. & Lin, X. General framework for meta-analysis of rare variants in sequencing association studies. Am. J. Hum. Genet. 93, 42–53 (2013).
50. Rivas, M. A. et al. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348, 666–669 (2015).
51. Lee, S., Wu, M. C. & Lin, X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13, 762–775 (2012).
52. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
53. Fantom Consortium. A promoter-level mammalian expression atlas. Nature 507, 462 (2014).
54. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
55. Moore, C. M., Jacobson, S. A. & Fingerlin, T. E. Power and sample size calculations for genetic association studies in the presence of genetic model misspecification. Hum. Hered. 84, 256–271 (2019).
56. Chen, H. et al. Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole-genome sequencing studies. Am. J. Hum. Genet. 104, 260–274 (2019).
57. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
58. Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).
59. McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
60. Pruim, R. J. et al. LocusZoom: Regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
61. Hahne, F. & Ivanek, R. Visualizing genomic data using gviz and bioconductor. In Statistical Genomics: Methods and Protocols Vol. 1418 (eds Mathé, E. & Davis, S.) 335–351 (Humana Press, New York, 2016).

Acknowledgements

We are indebted to the late Carol Forsblom (1964–2022), the international coordinator of the FinnDiane Study Group, for his considerable contribution. The skilled technical assistance of Heli Krigsman, Hanna Olanne, Maikki Parkkonen, Mira Rahkonen, Anna Sandelin, and Jaana Tuomikangas is gratefully acknowledged. We also want to acknowledge all the physicians and nurses at each FinnDiane center participating in the recruitment and characterization of the individuals with T1D (Table S24) and the FinnDiane participants. In addition, we acknowledge the participants and investigators of the FinnGen study. We acknowledge the ELIXIR Finland node, hosted at the CSC – IT Center for Science for ICT resources, enabling the WES and WGS data processing. Finally, we want to acknowledge Bert Vogelstein and Jukka Kallijärvi for material provided for the in vitro promoter experiments: pBV-Luc plasmids (Bert Vogelstein) and renilla control plasmid (Kallijärvi lab, Folkhälsan Research Center). We utilized data provided by GTEx. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data reported here were obtained from the GTEx Portal on 01/04/2022: https://gtexportal.org/home/. The full list of doctors and nurses examining the participants is supplied in Supplementary Table S24.

Funding

This work was supported by grants from Folkhälsan Research Foundation; Wilhelm and Else Stockmann Foundation; “Liv och Hälsa” Society; Sigrid Juselius Foundation (220027); Helsinki University Central Hospital Research Funds [TYH2018207]; Novo Nordisk Foundation [NNF OC0013659 and NNF23OC0082732], Academy of Finland [299200 and 316664]; European Foundation for the Study of Diabetes (EFSD) Young Investigator Research Award funds; an EFSD award supported by EFSD/Sanofi European Diabetes Research Programme in Macrovascular Complications; Finnish Foundation for Cardiovascular Research; Finnish Diabetes Research Foundation; Aarne Koskelo Foundation; and the Ida Montini Foundation. The funders had no role in the study design, collection, analysis, interpretation and writing of the manuscript.

Author information

A list of authors and their affiliations appears at the end of the paper.

Authors and Affiliations

Folkhälsan Institute of Genetics, Folkhälsan Research Center, Helsinki, Finland
Anni A. Antikainen, Jani K. Haukka, Anmol Kumar, Anna Syreeni, Stefanie Hägg-Holmberg, Anni Ylinen, Lena M. Thorn, Valma Harjutsalo, Per-Henrik Groop & Niina Sandholm
Department of Nephrology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
Anni A. Antikainen, Jani K. Haukka, Anmol Kumar, Anna Syreeni, Stefanie Hägg-Holmberg, Anni Ylinen, Lena M. Thorn, Valma Harjutsalo, Per-Henrik Groop & Niina Sandholm
Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Anni A. Antikainen, Jani K. Haukka, Anmol Kumar, Anna Syreeni, Stefanie Hägg-Holmberg, Anni Ylinen, Lena M. Thorn, Valma Harjutsalo, Per-Henrik Groop & Niina Sandholm
Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
Elina Kilpeläinen, Anastasia Kytölä & Aarno Palotie
Analytic and Translational Genetics Unit, Department of Medicine, Department of Neurology and Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
Aarno Palotie
The Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Aarno Palotie
Neurology, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
Jukka Putaala
Department of General Practice and Primary Health Care, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
Lena M. Thorn
Department of Diabetes, Central Clinical School, Monash University, Melbourne, VIC, Australia
Per-Henrik Groop

Consortia

the FinnDiane Study Group

Anni A. Antikainen, Jani K. Haukka, Anmol Kumar, Anna Syreeni, Stefanie Hägg-Holmberg, Anni Ylinen, Jukka Putaala, Lena M. Thorn, Valma Harjutsalo, Per-Henrik Groop & Niina Sandholm

Contributions

A.A.A. analyzed the data, wrote the manuscript, and contributed to interpretation of the data, conception and study design, and pre-processing of whole-exome sequencing data. J.H. pre-processed the whole-genome sequencing data and contributed to computational analyses and conception and study design. N.S. contributed to acquisition of phenotypic and genotypic data, conception and study design, manuscript writing and interpretation of data. A.Ku. performed laboratory experiments. A.S., E.K., A.Ky., and A.P. contributed to acquisition and data processing of genetic data. S.H.-H. and A.Y. contributed to acquisition of phenotypic data. J.P., L.M.T., V.H. and P.-H.G. contributed to interpretation of data, acquisition of phenotypic data, and to conception and study design. J.H., A.Ku., A.S., S.H.-H., A.Y., E.K., A.Ky., A.P., J.P., L.M.T., V.H., P.-H.G., and N.S. revised the manuscript critically for important intellectual content. All authors gave final approval of the version to be submitted and any revised version.

Corresponding authors

Correspondence to Per-Henrik Groop or Niina Sandholm.

Ethics declarations

Competing interests

P.-H.G. has received investigator-initiated research grants from Eli Lilly and Roche, is an advisory board member for AbbVie, Astellas, AstraZeneca, Bayer, Boehringer Ingelheim, Cebix, Eli Lilly, Janssen, Medscape, Merck Sharp & Dohme, Mundipharma, Nestlé, Novartis, Novo Nordisk and Sanofi; and has received lecture fees from AstraZeneca, Bayer, Boehringer Ingelheim, Eli Lilly, Elo Water, Genzyme, Merck Sharp & Dohme, Medscape, Novartis, Novo Nordisk, PeerVoice, Sanofi, and Sciarc. The other authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Antikainen, A.A., Haukka, J.K., Kumar, A. et al. Whole-genome sequencing identifies variants in ANK1, LRRN1, HAS1, and other genes and regulatory regions for stroke in type 1 diabetes. Sci Rep 14, 13453 (2024). https://doi.org/10.1038/s41598-024-61840-7

Received: 25 September 2023
Accepted: 10 May 2024
Published: 11 June 2024
DOI: https://doi.org/10.1038/s41598-024-61840-7
Springer Nature Limited
