Complex trait susceptibilities and population diversity in a sample of 4,145 Russians

Usoltsev, Dmitrii; Kolosov, Nikita; Rotar, Oxana; Loboda, Alexander; Boyarinova, Maria; Moguchaya, Ekaterina; Kolesova, Ekaterina; Erina, Anastasia; Tolkunova, Kristina; Rezapova, Valeriia; Molotkov, Ivan; Melnik, Olesya; Freylikhman, Olga; Paskar, Nadezhda; Alieva, Asiiat; Baranova, Elena; Bazhenova, Elena; Beliaeva, Olga; Vasilyeva, Elena; Kibkalo, Sofia; Skitchenko, Rostislav; Babenko, Alina; Sergushichev, Alexey; Dushina, Alena; Lopina, Ekaterina; Basyrova, Irina; Libis, Roman; Duplyakov, Dmitrii; Cherepanova, Natalya; Donner, Kati; Laiho, Paivi; Kostareva, Anna; Konradi, Alexandra; Shlyakhto, Evgeny; Palotie, Aarno; Daly, Mark J.; Artomov, Mykyta

doi:10.1038/s41467-024-50304-1

Article
Open access
Published: 23 July 2024

Volume 15, article number 6212, (2024)

Dmitrii Usoltsev ORCID: orcid.org/0000-0001-8072-310X^1,2,3,4,5,
Nikita Kolosov^1,2,3,4,5,
Oxana Rotar¹,
Alexander Loboda^1,2,3,
Maria Boyarinova¹,
Ekaterina Moguchaya¹,
Ekaterina Kolesova¹,
Anastasia Erina¹,
Kristina Tolkunova¹,
Valeriia Rezapova^1,2,3,
Ivan Molotkov^4,5,
Olesya Melnik¹,
Olga Freylikhman ORCID: orcid.org/0000-0002-2850-728X¹,
Nadezhda Paskar¹,
Asiiat Alieva¹,
Elena Baranova¹,
Elena Bazhenova¹,
Olga Beliaeva¹,
Elena Vasilyeva¹,
Sofia Kibkalo¹,
Rostislav Skitchenko ORCID: orcid.org/0000-0002-7203-8206¹,
Alina Babenko¹,
Alexey Sergushichev ORCID: orcid.org/0000-0003-1159-7220²,
Alena Dushina⁶,
Ekaterina Lopina⁶,
Irina Basyrova⁶,
Roman Libis⁶,
Dmitrii Duplyakov ORCID: orcid.org/0000-0002-6453-2976^7,8,
Natalya Cherepanova⁸,
Kati Donner⁹,
Paivi Laiho ORCID: orcid.org/0009-0004-3302-4321¹⁰,
Anna Kostareva^1,2,
Alexandra Konradi ORCID: orcid.org/0000-0001-8169-7812^1,2,
Evgeny Shlyakhto¹,
Aarno Palotie ORCID: orcid.org/0000-0002-2527-5874^3,9,11,
Mark J. Daly ORCID: orcid.org/0000-0002-0949-8752^3,9,11 &
Mykyta Artomov ORCID: orcid.org/0000-0001-5282-8764^1,2,3,4,5,9,11

Abstract

The population of Russia consists of more than 150 local ethnicities. The ethnic diversity and geographic origins, which extend from eastern Europe to Asia, make the population uniquely positioned to investigate the shared properties of inherited disease risks between European and Asian ancestries. We present the analysis of genetic and phenotypic data from a cohort of 4,145 individuals collected in three metro areas in western Russia. We show the presence of multiple admixed genetic ancestry clusters spanning from primarily European to Asian and high identity-by-descent sharing with the Finnish population. As a result, there was notable enrichment of Finnish-specific variants in Russia. We illustrate the utility of Russian-descent cohorts for discovery of novel population-specific genetic associations, as well as replication of previously identified associations that were thought to be population-specific in other cohorts. Finally, we provide access to a database of allele frequencies and GWAS results for 464 phenotypes.

Introduction

Linking inherited DNA variation to disease risk is one of the main goals of modern predictive medicine. Large-scale projects such as the UK Biobank¹, FinnGen² and Biobank Japan³ have made a substantial contribution to the understanding of human biology and the advancement of personalized medicine. The increasing ethnic diversity of genetic studies has resulted in the discovery of population-specific susceptibility loci that could not be identified in other ancestries⁴. Inclusion of understudied populations in biobank initiatives improves both the fine-mapping accuracy of previously identified GWAS signals and novel risk gene discovery⁵.

Genetic studies in the multinational Russian population, which is geographically located at the crossroads of Europe and Asia, could have the power to detect the historic origins of population-specific variants. Additionally, genetic data from the Russian population could be a powerful source of replication for population-specific associations found in the three largest biobanks: UK Biobank¹, FinnGen², and Biobank Japan³.

Historically, polygenic trait genetics has been neglected in Russia, resulting in a lack of GWAS based on local cohorts. Individuals of Russian descent were included primarily either as a small part of consortium datasets or as a basis for population genetic studies lacking phenotypic information^6,7,8,9,10.

At the same time, there are more than 150 local ethnicities in Russia that would greatly benefit from large local studies of genetic variation. This becomes especially important in light of the lack of transferability of polygenic risk score models between ancestries^11,12,13. This variety of ethnicities represents the genetic history of populations between Eastern Europeans, Finns, and Asians. For example, a study of whole genome sequencing data previously showed that populations in the northern regions of Russia, west of the Ural Mountains, not only belong to a Finno-Ugric language group but are also genetically close to Finns, while populations in the central regions of western Russia showed similarities with eastern Europeans¹⁴. Furthermore, a close genetic relationship between north-western Russians and Finns was also observed in a comparison of Balto-Slavic-speaking populations¹⁵. In addition, another study showed that Siberian populations separated from other East Asian populations 8800–11,200 years ago and significantly contributed to the formation of Eastern European populations 4700–8000 years ago¹⁶. Thus, a gene flow from Asia through the Ural Mountains to eastern Europe was hypothesized. Consistent with this, a notable genetic relationship between Finns and Mongolian tribes was observed¹⁷. Additional evidence of the great diversity of the Russian population, linking European, Asian, and Native American populations, was presented in a country-wide study, the Genome Russia Project; however, no phenotypic information was collected at that time, resulting in a lack of biomedical applications for these data¹⁸.

For studies reflecting population history, a small sample set (~100 individuals) is usually sufficient; however, personalized medicine, biobank assembly and GWAS require a larger sample size and extensive data collection, which is impossible without epidemiological studies. In 2012–2013, a national study, “Epidemiology of cardiovascular diseases in different regions of the Russian Federation” (ESSE-RF), was launched in 12 regions of Russia. Within the framework of this study, a stratified multistage random sample of approximately 22,000 residents with blood biobanking and detailed phenotyping was collected¹⁹.

Here, we present the first results of the analysis of clinical and genetic data from three metro areas that participated in the ESSE-RF study: St. Petersburg, Samara, and Orenburg. Our results reflect the genetic structure of the Western Russian population and phenotypic susceptibilities from 4145 participants across 464 phenotypes.

Results

Data collection

A cohort of 4800 residents of three areas in Russia, St. Petersburg (N = 1600, N males = 573, N females = 1027), Orenburg (N = 1600, N males = 656, N females = 944) and Samara (N = 1600, N males = 697, N females = 903), was recruited in 2012–2013 through an ambulatory visit to local hospitals and polyclinics (aged 46 ± 12 SD years) (Fig. 1a). Detailed phenotypic data and a blood sample were collected. All participants signed informed consent. The study protocol was approved by local ethics committees (Almazov National Medical Research Center, St. Petersburg) (Methods, Data collection; Supplementary Note 1).

**Fig. 1: Cohort description and study design.**

Each patient was invited for a one-day ambulatory visit to collect phenotypic information (Fig. 1b). In 2018–2019, 289 of the 1600 original patients from St. Petersburg were invited for an additional ambulatory visit as part of several local studies (familial hypercholesterolemia, metabolically healthy obesity, premature vascular aging). These patients were subjected to a detailed follow-up data collection protocol (Methods, Data collection; Supplementary Note 2).

For all participants, biannual updates on phenotypic and vital status were recorded through direct contact (phone calls) and indirect contact (mail/e-mail letters, information from local clinical district databases, Supplementary Data 1). An additional independent cohort of 138 individuals was recruited in 2017–2018 to serve as controls in a local study of early childhood starvation effects. They were evaluated using a similar short clinical protocol, and no follow-up data collection was performed (Supplementary Note 3).

Genetic data generation and imputation

A total of 4723 individuals, comprising 4594 population samples (ESSE) and 129 starvation controls, were genotyped using the FinnGen ThermoFisher Axiom microarray². Genotype imputation was conducted using the Haplotype Reference Consortium (HRC) panel^12,20 and Beagle 5.2 software²¹ (Methods, Genetic data generation and imputation). The quality of the imputation was assessed using a masking experiment (Supplementary Note 4.1, Supplementary Fig. 1).

Samples with sex-mismatch (224 ESSE and 28 Starvation Controls) (Supplementary Materials, Supplementary Note 4.2, Supplementary Fig. 2) and duplicated samples (190) were excluded. Additionally, 347 related individuals were marked for further analysis (Supplementary Note 4.3, Supplementary Fig. 3).

The final dataset included 4281 individuals (4183 ESSE and 98 Starvation Controls) and 11,077,763 variants. Overall, 37,439 variants failed Hardy–Weinberg equilibrium (p < 1 × 10⁻⁴) and 371 variants were discordant with the HRC imputation panel. Allele frequencies were compared against gnomAD Finnish and non-Finnish European populations (Supplementary Note 4.4, Supplementary Fig. 4).
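The sample counts reported above can be reconciled with a few lines of arithmetic. The sketch below is only a consistency check, not part of the study's pipeline: it assumes that the sex-mismatch and duplicate exclusions do not overlap, and the per-cohort split of the 190 duplicates is inferred from the totals rather than stated in the text.

```python
# Counts as reported in the text (ESSE = population samples).
genotyped = {"ESSE": 4594, "Starvation Controls": 129}
sex_mismatch = {"ESSE": 224, "Starvation Controls": 28}
final = {"ESSE": 4183, "Starvation Controls": 98}
duplicates_total = 190  # reported only as a total

total_genotyped = sum(genotyped.values())

# ASSUMPTION: sex-mismatch and duplicate exclusions are disjoint sets.
total_final = total_genotyped - sum(sex_mismatch.values()) - duplicates_total

# Inferred (not stated in the text): duplicates removed per cohort.
implied_dups = {
    cohort: genotyped[cohort] - sex_mismatch[cohort] - final[cohort]
    for cohort in genotyped
}

print("genotyped:", total_genotyped)
print("after exclusions:", total_final)
print("implied duplicates per cohort:", implied_dups)
```

Under these assumptions the exclusions account exactly for the difference between the genotyped and final sample counts. The 347 related individuals are not subtracted because the text says they were marked, not excluded.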

Population structure analysis

Initially, we performed principal component analysis (PCA) to identify ancestral clusters within the dataset (Methods, Population structure analysis; Fig. 2a; Supplementary Note 5; Supplementary Figs. 5 and 6). Joint PCA with the 1000 Genomes dataset indicated that the western Russian population is represented by individuals of admixed ancestry, spanning from European to East Asian continental ancestry (Fig. 2b, Supplementary Fig. 7a). A separate joint PCA with only the European subpopulations from 1000 Genomes demonstrated closer relatedness of Russians to the Finnish population. This relationship was particularly evident in the overlap observed in the PC1:PC2 plane (Supplementary Note 6, Fig. 2c, Supplementary Fig. 7b). We were able to distinguish between the Russian and Finnish populations at higher principal components (PC3-PC4; Supplementary Fig. 8). Six clusters within the Russian data set were identified in the PCA space (Supplementary Note 7, Fig. 2d, Supplementary Fig. 9).

We calculated the contribution of haplotypes from geographically proximate ancestries represented in 1000 Genomes to each cluster using ADMIXTURE²² (Supplementary Note 8, Fig. 2e, Supplementary Fig. 10). The first cluster included mostly individuals related to southern Europeans. The second cluster predominantly included a population close to Utah residents (CEU) with Northern and Western European ancestry and the British, with a small admixture of Finnish and Asian ancestry. In the next four clusters, the proportion of Finnish and Asian haplotypes increased and the proportion of Northern and Western European ancestry haplotypes decreased, which was consistent with the location of these clusters in the principal component space; these clusters co-localized with the Finnish population in PC1-PC2 (Supplementary Note 6, Fig. 2c). Furthermore, to establish that our findings do not depend on the genotyping platform, we repeated the ADMIXTURE analysis using only HapMap variants. This analysis revealed no qualitative changes in our observations (Supplementary Fig. 11).

Next, F_st and IBD-sharing statistics between Russians and other populations from 1000 Genomes were calculated. According to F_st, the Russian population as a whole is close to CEU (Supplementary Fig. 12). However, a more precise cluster-level comparison showed, for example, that cluster 4 was close to the Finnish population and cluster 6 was closest to Asian populations, which was consistent with the PCA and ADMIXTURE analyses (Supplementary Note 9, Supplementary Fig. 13).

To verify the relationship between Russians and Finns, we calculated IBD-sharing statistics between all Russian clusters and 1000 Genomes populations. We collected all IBD regions with a LOD score greater than 3 for each pair of individuals. Next, we merged adjacent IBD regions if the gap between them was no greater than 0.6 cM and contained no more than one discordant homozygous variant. We summed the lengths of the resulting IBD regions for each pair of individuals and calculated the median IBD length between pairs of populations (Supplementary Fig. 14). The resulting heatmap for all populations is shown in Fig. 2f. According to the clustering analysis, the Finns are closer to the first five Russian clusters than to the European populations from 1000 Genomes (Supplementary Note 10).
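The segment filtering and merging steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple layout `(start_cM, end_cM, LOD)`, the representation of discordant homozygous variants as cM positions, the function names, and the toy values are all assumptions made for the example, and a single chromosome per call is assumed.

```python
from statistics import median

MIN_LOD = 3.0         # keep segments with LOD > 3
MAX_GAP_CM = 0.6      # merge if the gap is no greater than 0.6 cM
MAX_DISCORDANT = 1    # ...and holds at most one discordant homozygous variant


def merge_segments(segments, discordant_sites=()):
    """Merge (start_cM, end_cM) segments separated by small, clean gaps."""
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            # Count discordant homozygous sites strictly inside the gap.
            n_disc = sum(prev_end < pos < start for pos in discordant_sites)
            if gap <= MAX_GAP_CM and n_disc <= MAX_DISCORDANT:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


def pair_ibd_length(segments, discordant_sites=()):
    """Total merged IBD length (cM) for one pair of individuals."""
    kept = [(s, e) for s, e, lod in segments if lod > MIN_LOD]
    return sum(e - s for s, e in merge_segments(kept, discordant_sites))


# Hypothetical segments for one pair: (start_cM, end_cM, LOD).
segs = [(0.0, 2.0, 5.0), (2.5, 4.0, 4.0), (5.0, 6.0, 3.5), (7.0, 9.0, 2.0)]
clean = pair_ibd_length(segs)
noisy = pair_ibd_length(segs, discordant_sites=[2.1, 2.3])

# Median over all pairs drawn from two populations (toy totals).
population_median = median([clean, noisy, 1.0])
print(clean, noisy, population_median)
```

In the toy data the first two segments are merged across a 0.5 cM gap unless two discordant sites fall inside it, and the low-LOD segment is dropped before merging.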

In summary, the Russian population sampled in the large metro areas represents a heterogeneous combination of individuals of admixed ancestries between European and Asian populations. In addition, for a subgroup of the sixth cluster with the highest PC1 values, more than half of the haplotypes are of Asian origin. This finding is expected, as these samples primarily come from Orenburg, a region located near the border with Kazakhstan (Central Asia). It has previously been shown that individuals from Central Asia have a mixture of European and Asian haplotypes^23,24,25.

To rule out systematic errors introduced by imputation, we repeated the population structure analysis using only genotyped variants. Cluster assessments based on genotyped variants correlated strongly with assessments based on all variants (R² = 0.87), indicating that imputation had no significant effect on the results (Supplementary Note 11, Supplementary Fig. 15).

Enrichment of Finnish and Russian variants

The analysis of population structure revealed an intricate pattern of relatedness among Russians, Finns, and East Asians. The Finnish population is historically unique; however, the genetic similarity of the Russian population illustrated above suggests that DNA variants enriched in Finns might also be found in Russia.

To assess the population-specific properties of DNA variants, we computed the distributions of log₂ allele frequency ratios between the target population and non-Finnish Europeans (NFE) from gnomAD for 265,624 Finnish-enriched variants (Methods, Enrichment of Finnish and Russian variants; Fig. 3a, Supplementary Fig. 16). The medians of these log-ratio distributions (excluding variants that were not observed) increased from cluster 2 (0.56) to cluster 6 (3.05). In the sixth cluster, the median was even higher than in the Finnish population (3.032). Because the fraction of East Asian haplotypes increases from cluster 2 to cluster 6 of the Russian population, we investigated the distribution of log-ratios between allele frequencies of Finnish-enriched variants in East Asians and in non-Finnish Europeans (Fig. 3a, Supplementary Data 2). The median enrichment was 3.962, higher than that observed in the Finnish population. A substantial fraction (58.35%, N = 154,981) of the Finnish-enriched variants have a non-zero frequency in the East Asian population. Furthermore, 45.25% (N = 120,204) of the variants have a log-ratio greater than 2, indicating that these variants are, in fact, more frequent in East Asians than in Finns. To assess any potential bias linked to imputed variants, we repeated the frequency analysis on directly genotyped variants. The results were consistent with our earlier findings (Supplementary Fig. 17).
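The enrichment statistic used here is a log₂ ratio of allele frequencies, summarized by its median over variants. The sketch below illustrates that calculation under stated assumptions: the function names and the toy frequencies are hypothetical, and returning `None` for a zero reference frequency is a choice made for the example (the text only says that variants not observed in the target population were excluded).

```python
import math
from statistics import median


def log2_af_ratio(af_target, af_nfe):
    """log2(AF_target / AF_NFE), or None when the ratio is undefined."""
    # Variants not observed in the target population are excluded, as in the
    # text; treating a zero NFE frequency the same way is an ASSUMPTION.
    if af_target <= 0 or af_nfe <= 0:
        return None
    return math.log2(af_target / af_nfe)


def median_enrichment(af_pairs):
    """Median log2 ratio over (AF_target, AF_NFE) pairs, skipping undefined ones."""
    ratios = [log2_af_ratio(t, n) for t, n in af_pairs]
    return median(r for r in ratios if r is not None)


# Hypothetical (AF_target, AF_NFE) pairs; the third variant is unobserved.
toy_pairs = [(0.5, 0.125), (0.25, 0.03125), (0.0, 0.1)]
print(median_enrichment(toy_pairs))
```

On this scale a log-ratio of 2 corresponds to a four-fold higher allele frequency than in NFE, so a median near 3 corresponds to roughly an eight-fold enrichment.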

**Fig. 3: Analysis of prevalence of Finnish-enriched variants in the Russian cohort and TreeMix population structure analysis.**

To investigate the prevalence of Finnish-enriched variants in the Russian sample, we selected only the 110,643 variants that were not observed in the East Asian population in gnomAD and found that the highest median enrichments in the Russian cohort were observed in clusters 3–6, which represent the admixed Finno-Asian haplotype structure (Supplementary Note 12; Fig. 3b, Supplementary Data 3). Previously reported Finnish-enriched variants associated with clinical phenotypes were generally more prevalent in the Russian population than in NFE (Supplementary Fig. 18)². The summary of VEP annotations for Finnish-enriched variants is shown in Supplementary Fig. 19a, b.

An increase in haplotype frequency usually occurs as a result of a population bottleneck. For example, several bottlenecks were previously reported for the Finnish population²⁶. We estimated the population size history for the Russian cohort using Finnish samples from the 1000 Genomes as a comparison (Fig. 3c). No bottleneck was observed in the Russian population, despite the presence of Finnish-enriched variants. We also computed population size based on the cluster structure in the Russian population and observed signs of a bottleneck for clusters with the largest fraction of Asian ancestry (Supplementary Note 13; Supplementary Fig. 20).

We sought to understand the historical origins of the Russian subpopulations and their connection to Finnish and Asian populations, which could potentially explain the presence of population-specific variants. We calculated a maximum likelihood tree using the TreeMix model with the following reference populations: 1000 Genomes; the populations closest to Russia from the Estonian Genome Diversity Panel (EGDP)⁸; and Russians from the Human Genome Diversity Panel (HGDP)⁷. Interestingly, Finnish samples were the closest to Russians from HGDP and cluster 4, while cluster 5 was close to Ural populations (Fig. 3d). The only Russian cluster within the Asian branch is cluster 6, which is further supported by its close proximity to Central Asia in principal component space (Supplementary Fig. 21). Furthermore, we performed an additional TreeMix analysis using all HGDP populations (Supplementary Note 14; Supplementary Fig. 22).

Using ADMIXTURE in unsupervised mode with 8 clusters, we obtained a clustering consistent with our tree and detected a genetic component (yellow) widely represented in the Siberian population. The percentage of this genetic component decreased from Siberia to Eastern Europe across the Urals. This component is also present in the Finnish population, which may be the result of a known gene flow from Asia to Europe through the Ural Mountains (Supplementary Fig. 23).

The Russian population also showed enrichment of population-specific variants compared to Europeans, some of which have been described previously¹⁸. We defined Russian-enriched variants as those with a log-ratio between RUS and NFE from gnomAD greater than 2 and identified 44,936 such variants. Among the Russian-enriched variants, 40,743 (90.61%) had higher AF in East Asians than in Russians and only 2045 (4.55%) were not observed in the East Asian population in gnomAD (Supplementary Fig. 19c, d).

GWAS

We performed GWAS for 464 phenotypes. Results are available at https://biobank.almazovcentre.ru.

Although 4600 samples is a relatively modest cohort size for GWAS, we provide examples of replication of findings from other biobanks as well as newly identified associations specific to the Russian population.

Several associations observed in the UK Biobank were replicated, for example, rs7412 and rs4970834 for LDL, and rs4697701 and rs4549940 for uric acid levels (Supplementary Fig. 24a, b).

Interestingly, rs13266066, which showed only a nominal association with the initiation of smoking in the UK Biobank (UKBB phenocode 20116_0, beta = −0.0043, p = 0.00022)¹ that was later confirmed with the MTAG approach using multiple addiction phenotypes (beta = −0.007, p = 1 × 10⁻¹⁰, beta was reversed to match models)²⁷, was significantly associated with the “never smoked” phenotype in the Russian cohort (N never smoked = 2391; AF never smoked = 0.414, N controls = 1488; AF controls = 0.475, beta = −0.28, p = 3.74 × 10⁻⁸, Supplementary Fig. 24c). To reduce the possibility that this observation reflects a technical artifact, we examined only the directly genotyped variants in this locus and confirmed the presence of the strongly associated rs11781072 (p = 1.45 × 10⁻⁶). rs13266066 is an eQTL for PTK2 expression in the cerebellum (p = 8.8 × 10⁻¹²). We also performed a gene prioritization analysis, which indicated a putative causal role of PTK2 (Supplementary Fig. 25).

Several novel genome-wide significant associations were identified: current smoking (rs7972723, with AF of 0.1465 in RUS, 0.1447 in NFE, and 0.1680 in FIN populations; 834 individuals as current smokers, AF of 0.189, and 3045 controls with AF of 0.137; beta = 0.43, p = 2.08 × 10⁻⁸; Supplementary Fig. 24d), abdominal obesity (rs56046524, with AF of 0.3537 in RUS, 0.3938 in NFE, and 0.3632 in FIN populations; 1405 cases with allele frequency of 0.306, and 2462 controls with AF of 0.378; beta = −0.324, p = 3.7 × 10⁻⁹; Supplementary Fig. 24e), and increased blood pressure in the second half of pregnancy (rs11948871, with AF of 0.2024 in RUS, 0.1727 in NFE, and 0.1479 in FIN populations; 366 cases with AF of 0.279, and 1642 controls with AF of 0.185; beta = 0.55, p = 1.4 × 10⁻⁸; Supplementary Fig. 24f). However, given the modest size of the discovery cohort, thorough replication is necessary for these findings (Supplementary Note 15).
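As a rough plausibility check, the case and control allele frequencies quoted above can be converted into a crude allelic log odds ratio and compared with the reported effect sizes. This sketch assumes that the reported betas are on the log-odds scale, which this section does not state; the crude value ignores covariates, relatedness, and the actual regression model, so only approximate agreement should be expected. The function name is hypothetical.

```python
import math


def allelic_log_odds(af_cases, af_controls):
    """Unadjusted log odds ratio of carrying the allele, cases vs controls."""
    odds_cases = af_cases / (1.0 - af_cases)
    odds_controls = af_controls / (1.0 - af_controls)
    return math.log(odds_cases / odds_controls)


# (AF in cases, AF in controls, reported beta), copied from the text.
reported = {
    "rs7972723 (current smoking)": (0.189, 0.137, 0.43),
    "rs56046524 (abdominal obesity)": (0.306, 0.378, -0.324),
    "rs11948871 (blood pressure in pregnancy)": (0.279, 0.185, 0.55),
}

for label, (af_case, af_ctrl, beta) in reported.items():
    crude = allelic_log_odds(af_case, af_ctrl)
    print(f"{label}: crude log-OR = {crude:+.3f}, reported beta = {beta:+.3f}")
```

For all three variants the crude estimate has the same sign as the reported beta and lies within about 0.05 of it, which is consistent with, but does not prove, the log-odds interpretation.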

Due to the presence of six genetic clusters in our population, we analyzed the effect sizes of the discussed variants in each cluster independently. We observed overall consistency in effect sizes. Some clusters were too small to detect a significant deviation of the effect size from 0 (Supplementary Fig. 26).

The population structure of the Russian cohort indicated that it could be suitable for replication of Finnish-specific genetic associations. Despite cohort size limitations, we attempted to illustrate this through a systematic approach. First, we selected only potentially replicable (MAF RUS > 0.01) Finnish-enriched variants that were less frequently observed in the East Asian population than in Finns (log-ratio between EAS and NFE < 2). There were 20,050 such variants, which were not LD-clumped at this stage (the set includes variants in LD with one another). Among them, we found 142 variants with genome-wide significant associations with 177 traits in FinnGen (773 variant-phenotype pairs in total). Overlaps with the Russian cohort were found for 62 variants in 53 traits (332 variant-phenotype pairs, Supplementary Fig. 27a).

We combined the phenotypes by similarity into 8 phenotypic groups (Diabetes, Sleep apnea, Asthma, Statin, Alzheimer, Hypothyroidism, Arthritis, Hypertension), and for each group we independently performed LD-clumping (R² < 0.1; Supplementary Data 3). The resulting data set contained 11 independent variants in 8 phenotypic groups. Only one variant, rs74800719, passed the Bonferroni-adjusted replication threshold (p = 0.05/11 variants/8 traits = 5.7 × 10⁻⁴). This variant was associated with an increased risk of Alzheimer disease in FinnGen (p = 1.14 × 10⁻³², beta = 0.5718). In the Russian cohort, it was associated with an increase in a comorbid phenotype, apolipoprotein B levels (p = 4.4 × 10⁻⁴, beta = 0.15) (Supplementary Fig. 27b)²⁸.
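The replication threshold quoted above is a plain Bonferroni correction over every variant-by-group combination. A minimal sketch of the arithmetic follows; the function name is hypothetical and the numbers are taken from the text.

```python
def bonferroni_threshold(alpha, n_variants, n_groups):
    """Per-test significance level when all variant x group pairs are tested."""
    return alpha / (n_variants * n_groups)


threshold = bonferroni_threshold(alpha=0.05, n_variants=11, n_groups=8)
p_rs74800719 = 4.4e-4  # apolipoprotein B association in the Russian cohort

print(f"threshold = {threshold:.2e}")
print("rs74800719 passes:", p_rs74800719 < threshold)
```

The division is by 88 tests, which gives a threshold of about 5.7 × 10⁻⁴, so the reported p-value of 4.4 × 10⁻⁴ passes by a narrow margin.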

Furthermore, we investigated the genetic correlations between GWAS from the Russian Biobank and their counterparts in the UK Biobank and FinnGen datasets. Initially, we excluded all GWAS that had heritability estimates outside the range [0,1] or a heritability standard error greater than 50%. This left us with a total of 35 GWAS, 26 of which were matched with corresponding traits from the UK Biobank (Supplementary Data 4). Complex phenotypes were compared with all relevant components from the UK Biobank dataset, which gave 34 pairwise comparisons. Of these 34 comparisons, 26 were nominally significant (p < 0.05), and 22 passed the Bonferroni significance threshold (0.05/34 = 0.00147). From the FinnGen traits, we specifically chose dyslipidemia, hypertension, type 2 diabetes (T2D), obesity, myocardial infarction, ischaemic heart disease, anxiety, depression, smoking, and sleep apnoea. We then examined the genetic correlation between these selected traits and their counterparts from our pool of 35 traits. This led to the construction of 18 pairs, 15 of which showed nominal significance and 8 of which passed the Bonferroni significance threshold (0.05/18 = 0.00278) (Supplementary Data 4).
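The heritability-based pre-filter and the two significance thresholds can be sketched as below. The text does not say whether the 50% limit refers to the standard error relative to the heritability estimate or to an absolute value; this sketch assumes the relative reading (SE / h² ≤ 0.5). The trait names and values are hypothetical.

```python
def passes_h2_filter(h2, h2_se, max_relative_se=0.5):
    """Keep a GWAS if 0 <= h2 <= 1 and SE/h2 <= max_relative_se.

    ASSUMPTION: '50%' is read as a relative standard error.
    """
    if not 0.0 <= h2 <= 1.0:
        return False
    if h2 == 0.0:
        return False  # relative SE is undefined for a zero estimate
    return h2_se / h2 <= max_relative_se


# Hypothetical (h2, SE) estimates for four traits.
toy_estimates = {
    "trait_a": (0.25, 0.05),   # kept
    "trait_b": (0.10, 0.08),   # relative SE too large
    "trait_c": (1.30, 0.20),   # h2 above 1
    "trait_d": (-0.05, 0.10),  # h2 below 0
}
kept = [name for name, (h2, se) in toy_estimates.items() if passes_h2_filter(h2, se)]

# Bonferroni thresholds for the comparisons reported in the text.
ukb_threshold = 0.05 / 34
finngen_threshold = 0.05 / 18
print(kept, round(ukb_threshold, 5), round(finngen_threshold, 5))
```

If the authors instead meant an absolute standard error above 0.5, only the last comparison in `passes_h2_filter` would need to change.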

Discussion

The Russian biobank resource presented here is an essential step towards making precision medicine accessible to patients with a wide variety of genetic makeups not currently represented in other major genetic studies.

Our cohort illustrates that the genetic structure of the Russian population, sampled in metropolitan areas in the European part of the country, consists of a number of subpopulations with high relatedness to Finnish and East Asian populations. We also identified a subgroup that has Central Asian origins. This subgroup exhibited the highest proportion of Asian haplotypes and represents a mixed population with significant genetic similarity to Central Asian populations. This finding is supported by the geographic distribution of the samples and previous studies on Central Asian populations^23,24,25,29,30,31. The Finno-Ugric subpopulations in Russia are historically found west of the Ural Mountains, which is in good agreement with previous whole genome studies conducted in this area^9,14. The gene flow from Asia through Siberia and the Ural Mountains to eastern Europe, together with the previously found relationship between Finns and Asians, suggests that the unique Finnish variants could also be found on the territory of modern Russia. The bottleneck that the Finnish population went through increased the frequencies of alleles inherited from the common Finno-Ugric ancestral population. Moreover, our IBD and ADMIXTURE analyses provide potential explanations for the high relatedness between Finnish and Mongolian populations reported previously¹⁷.

The replication studies presented here indicate that even with a relatively modest cohort size, previously reported associations from the UK Biobank¹ and FinnGen² could be directly observed in the Russian cohort. The unique genetic structure of the Russian population provides the power to discover and replicate associations that were often considered population-specific. Importantly, such replication can be achieved with a relatively modest cohort size and resources.

This suggests that the susceptibility to polygenic diseases in Russia could potentially be driven by a mixture of variants from multiple ancestral populations. Some of the ancestral populations in this case have not been studied before. Assessing population-dependent contributions of many associated alleles would be critically important for creating informative polygenic risk score models for individual inherited risk evaluation³².

We continue to monitor the participants of this study; the latest data update took place in the fall of 2023. Our proof-of-concept study shows that infrastructure, logistics, and research resources are sufficient to conduct polygenic trait studies in Russia. The major challenge, yet to be resolved, is how to scale such efforts to the size of other major biobanks.

Finally, we anticipate that Biobank Russia (https://biobank.almazovcentre.ru), the first local resource for polygenic trait genetics studies in Russia, which provides the largest public reference for allele frequencies and genetic associations, will become a core for further expansion of complex trait genetics research to yet understudied populations.