Abstract
Recurrent pregnancy loss (RPL) is a major reproductive health issue with multifactorial causes, affecting 2.6% of all pregnancies worldwide. Nearly half of the RPL cases lack clinically identifiable causes (e.g., antiphospholipid syndrome, uterine anomalies, and parental chromosomal abnormalities), referred to as unexplained RPL (uRPL). Here, we perform a genome-wide association study focusing on uRPL in 1,728 cases and 24,315 female controls of Japanese ancestry. We detect significant associations in the major histocompatibility complex (MHC) region at 6p21 (lead variant=rs9263738; P = 1.4 × 10−10; odds ratio [OR] = 1.51 [95% CI: 1.33–1.72]; risk allele frequency = 0.871). The MHC associations are fine-mapped to the classical HLA alleles, HLA-C*12:02, HLA-B*52:01, and HLA-DRB1*15:02 (P = 1.1 × 10−10, 1.5 × 10−10, and 1.2 × 10−9, respectively), which constitute a population-specific common long-range haplotype with a protective effect (P = 2.8 × 10−10; OR = 0.65 [95% CI: 0.57–0.75]; haplotype frequency=0.108). Genome-wide copy-number variation (CNV) calling demonstrates rare predicted loss-of-function (pLoF) variants of the cadherin-11 gene (CDH11) conferring the risk of uRPL (P = 1.3 × 10−4; OR = 3.29 [95% CI: 1.78–5.76]). Our study highlights the importance of reproductive immunology and rare variants in the uRPL etiology.
Introduction
Recurrent pregnancy loss (RPL), also referred to as recurrent miscarriage, is a major issue in reproductive medicine, defined as the occurrence of two or more spontaneous losses of pregnancy1,2,3. The prevalence of RPL was reported to be approximately 2.6% of all pregnancies worldwide, and most of these losses occur in the first trimester1. Maternal age at conception is a strong risk factor for miscarriage. In developed countries, where maternal age at pregnancy has been rising year by year, the prevalence of RPL is increasing (e.g., 5.0% in Japan), making it a topic of growing importance in reproductive health.
The underlying etiology of RPL is considered highly heterogeneous, involving immunologic, anatomic, cytogenetic, endocrinological, and infectious factors4. Clinically identifiable causes of RPL include antiphospholipid syndrome (APS), uterine anatomic anomalies, and parental chromosomal abnormalities1,2,3,5. While these established causes account for nearly half of RPL cases, the remaining half show no apparent clinical causes, referred to as unexplained RPL (uRPL). The lack of etiological explanations poses a tough challenge for clinical management and exacerbates the patients’ psychological distress. uRPL can be classified into aneuploid and euploid miscarriage according to the embryonic karyotype by examining their aborted products6. Aneuploid and euploid miscarriage are different in their epidemiological characteristics and the cumulative live birth rate7, indicating distinct underlying biology.
Genetic determinants of uRPL are a major subject of interest for elucidating the underlying etiology. A number of articles concerning uRPL genetics have been published after an association of the C677T variant in the methylenetetrahydrofolate reductase (MTHFR) gene was reported8. Susceptibility genes implicated to date include those involved in immune response, coagulation, metabolism, and angiogenesis. However, most of the studies employed a candidate gene approach with a limited sample size (n < 200) and varied case definitions. A recent systematic review carried out a meta-analysis of these studies and demonstrated predominantly inconsistent results, warranting hypothesis-free study design with a large sample size9.
Recently, Laisk et al. carried out a European ancestry genome-wide association meta-analysis of RPL with the case defined as having a history of three or more consecutive miscarriages (ncase = 750), reporting three genome-wide significant loci10. No association was ascertained between RPL and variants which were reported significant in the aforementioned systematic review9. This report was based on the existing biobank data, and most of the studied cases were not examined for the clinically identifiable causes of RPL and were not characterized for relevant clinical features, such as embryonic karyotypes. Given that the clinical heterogeneity in the studied cases potentially masked susceptibility loci responsible for uRPL, genome-wide association studies (GWAS) with detailed phenotyping of the cases are required for uncovering the underlying etiological factors of uRPL.
Here, by focusing on uRPL cases clinically ascertained not to have conventional causes, we perform the largest GWAS of uRPL to date, accompanied by stratified analyses according to clinical features, to provide insights into the genetic basis of uRPL. Our data reveal an association between uRPL and the major histocompatibility complex region (MHC), which is fine-mapped to specific HLA alleles by an HLA imputation analysis. Finally, given the strong purifying selection pressure imposed on the uRPL risk alleles, we conduct genome-wide rare copy-number variation (CNV) calling and explore its contribution to the uRPL predisposition.
Results
Study participants
In this study, a total of 1800 patients with uRPL and 25,999 female controls were enrolled and genotyped using the Illumina Infinium Asian Screening Array. During patient enrollment, those with known causes of RPL (i.e., APS, uterine anomalies, and parental chromosomal abnormalities) were exhaustively excluded from the analysis through systematic clinical, cytogenetic, and serological examinations (Methods). After stringent quality control (QC) filters were applied to the genotyped data, 1728 uRPL cases and 24,315 controls, both of Japanese ancestry, were retained for the genetic association analysis. The demographic characteristics are summarized in Table 1. The age at the first visit and number of pregnancy losses of the cases were 34.1 ± 4.7 years (range, 19–48 years) and 2.7 ± 0.95 times (range, 2–12 times), respectively. The 1728 uRPL cases contained 843 patients with a history of three or more pregnancy losses (48.8%). The embryonic karyotypes of abortus were examined and classified into the following six categories (Methods): euploid miscarriage (n = 204, 11.8%); aneuploid (n = 125, 7.2%); 45,X (n = 46, 2.7%); triploid (n = 31, 1.8%); other abnormalities (n = 30, 1.7%); and unknown miscarriage (n = 1292, 74.8%). Antinuclear antibodies (ANA) titer in the serum was measured for 1635 uRPL cases, 543 of which were positive (≥1:40). Free T4 level was measured for 1621 uRPL cases, 285 of which showed a low value (≤0.9). One hundred and forty-four patients with uRPL had a past medical history of autoimmune diseases, including systemic lupus erythematosus (n = 1), rheumatoid arthritis (n = 5), chronic thyroiditis (n = 98), and other diseases (n = 40).
Genome-wide association study
After whole-genome genotype imputation using a population-specific reference panel (n = 4561), we obtained 8,717,431 autosomal and X-chromosome variants fulfilling the post-imputation QC criteria (i.e., minor allele frequency [MAF] > 0.5% and Rsq > 0.7). Based on the imputed genotype dosages, we performed a single-variant GWAS of uRPL. To robustly control for population stratification and sample relatedness, we employed a generalized linear mixed model (GLMM) implemented in the SAIGE11 software for the association test. The overall distribution of the association statistics did not show systematic inflation due to potential confounding biases such as residual population stratification (genomic inflation factor [λGC] = 1.026; linkage disequilibrium [LD] score regression intercept = 0.992 ± 0.010; the quantile–quantile plot is shown in Supplementary Fig. 1). The liability scale heritability of uRPL estimated by LD score regression was 0.307 ± 0.143. We observed the MHC region at 6p21 surpassing the genome-wide significance threshold (the smallest P = 1.4 × 10−10; Fig. 1; Table 2). The lead variant rs9263738 has two alleles of T/C, and the T allele showed a susceptible effect on uRPL (odds ratio [OR] = 1.51; 95% confidence interval [CI] = 1.33–1.72). We note that the C677T variant in the MTHFR gene previously implicated with uRPL (rs1801133)8 did not reach nominal significance in our GWAS (P = 0.12; OR = 1.06; 95% CI = 0.98–1.14).
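For readers unfamiliar with the genomic inflation factor, the following minimal sketch shows how it is conventionally computed from a set of association P-values: each P-value is converted to a 1-degree-of-freedom chi-square statistic, and the median of those statistics is divided by the median of the null distribution (about 0.455). This illustrates the general definition only and is not the pipeline used in this study; the function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# obtained from the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for two-sided P-values in the interval (0, 1]."""
    norm = NormalDist()
    # A two-sided P-value p corresponds to chi2 = z^2 with z = Phi^-1(p / 2).
    # Using the lower tail keeps very small P-values numerically stable.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical, perfectly uniform P-values: no inflation is expected.
    n = 1001
    uniform_p = [(i + 0.5) / n for i in range(n)]
    print(round(genomic_inflation(uniform_p), 3))
```

A value close to 1, such as the 1.026 reported above, indicates that the bulk of the test statistics follows the null distribution.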
In our primary GWAS above, the control group included all the female participants available in the Biobank Japan project (BBJ) to maximize statistical power for detecting susceptibility loci. However, given the hospital-based cohort design of BBJ, our primary GWAS could have the following potential limitations which may introduce false positive associations: i) high prevalence for other MHC-associated diseases in the controls and ii) difference in the body mass index (BMI), a known risk factor for RPL12, between cases and controls. To address these potential limitations, we additionally performed a sensitivity analysis. We confined the controls to 9,955 individuals who had a definitive record of the ICD10 code and were explicitly free of Crohn’s disease (ICD10 K50), ulcerative colitis (K51), and antiphospholipid syndrome (D686). To account for the BMI difference, we estimated age-matched BMI based on the relationship between age and BMI (Methods). When we repeated the association test using the strictly defined control set with adjustment for the age-matched BMI, the lead variant rs9263738 remained genome-wide significant with comparable effect size (P = 7.5 × 10−10; OR = 1.52 [95% CI = 1.33–1.74]), indicating the robustness of the association between uRPL and MHC.
MHC fine-mapping analysis
To fine-map the significant association within the MHC region, we performed an HLA imputation analysis using a high-resolution reference panel of 1118 Japanese individuals13 (Methods). Applying the post-imputation quality control filter (MAF > 0.5% and r2 by DEEP*HLA > 0.7), we obtained genotype dosages of 108 two-digit, 164 four-digit, and 173 six-digit HLA alleles, as well as 1666 amino acid polymorphisms of classical and nonclassical HLA genes in the entire MHC region. When evaluating the association of the imputed HLA variants with the risk of uRPL based on the strictly defined control set, we observed the most significant signal at HLA-C (Fig. 2a; Supplementary Data 1). The lead HLA-C alleles were HLA-C*12:02:02 (P = 4.8 × 10−11; OR = 0.66; 95% CI = 0.58–0.75) and its four-digit allele, HLA-C*12:02 (P = 1.1 × 10−10; Table 3). Notably, HLA-C*12:02 constitutes a Japanese-specific common long-range haplotype spanning the entire HLA class I and class II regions (HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02)14,15, which has a susceptible effect on ulcerative colitis but a protective effect on Crohn’s disease14. We note that the association at HLA-C showed the most significant signal when we performed the analysis with all the available control individuals (i.e., the controls with a known history of ulcerative colitis or Crohn’s disease were not explicitly excluded). In line with the earlier studies, the other HLA alleles constituting the long-range haplotype, HLA-B*52:01 and HLA-DRB1*15:02, showed comparable associations (P = 1.5 × 10−10 and P = 1.2 × 10−9, respectively; Fig. 2a; Table 3). When conditioned on any single HLA allele of HLA-C*12:02, HLA-B*52:01, and HLA-DRB1*15:02, no variants in the MHC region reached the significance threshold (Fig. 2b,c,d). Motivated by this observation, we further performed genotype imputation of the long-range haplotype of HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02 (Methods) and evaluated its association with uRPL. 
As expected, HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02 haplotype showed a significant protective effect (P = 2.8 × 10−10; OR = 0.65; 95% CI = 0.57–0.75; Table 3). When conditioned on HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02, no significant association was observed in the MHC region (Fig. 2e).
Stratified analysis according to clinical features
Given the heterogeneity in the underlying biology of uRPL, we stratified the GWAS participants according to clinical features, such as the embryonic karyotype and immunological characteristics. We assumed that uRPL with different embryonic karyotypes involves different etiologies; in particular, abnormal embryonic karyotypes would in themselves serve as a predominant cause of miscarriage7. To reduce heterogeneity in the etiological background of the uRPL cases, we excluded the cases showing embryonic karyotype abnormalities, defining the remaining 1480 cases as case group (A) (Supplementary Fig. 2a). Motivated by the significant association at MHC, the genetic locus well known to be involved in immune function, we further stratified case group (A) based on the immunological features. Specifically, we defined 459 cases with positive ANA as case group (B) (Supplementary Fig. 2b). High ANA titer is commonly detected in patients with RPL and is debated to be involved in the unexplained part of the disease etiology. Given that the MHC region is involved in susceptibility to a wide range of autoimmune diseases, we hypothesized that the association between MHC and uRPL may be mediated by ANA production. If so, restricting the GWAS cases to those with positive ANA would result in a substantially larger effect size of the lead HLA haplotype HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02. Despite this expectation, the odds ratio was nearly equivalent whether limiting to positive ANA or not (Fig. 3). We note that this observation does not refute ANA involvement in the etiology of uRPL, since ANA may mediate uRPL pathogenesis independent of the MHC locus. Next, by excluding the cases with positive ANA, past medical history of autoimmune diseases, and hypothyroidism, we defined the remaining 694 cases as case group (C).
When comparing the case group (C) with the controls, the HLA association remained significant, indicating that the HLA association did not originate from the risk alleles of the known autoimmunity (Fig. 3; Supplementary Fig. 2c). These results collectively suggest that the MHC association is independent of autoantibody presence. No genome-wide significant association was observed outside the MHC region in the GWAS of the case group (A), (B), or (C) (Supplementary Fig. 2a,b,c). We note that case groups (A), (B), and (C) contained uRPL cases with unknown karyotypes to retain sample size for the stratified analysis. To focus exclusively on euploid and aneuploid miscarriage, we defined the 203 cases with confirmed embryonic euploidy as case group (D) and the 124 cases with confirmed embryonic aneuploidy as case group (E). We performed a GWAS of case group (D) and (E), finding no genome-wide significant association (Supplementary Fig. 2d,e). In these settings, HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02 haplotype did not reach statistical significance due to the reduced sample size (P = 0.16 for (D) and P = 0.30 for (E); Fig. 3).
Rare copy-number variation associated with uRPL
From an evolutionary perspective, given that uRPL is directly related to reproductive fitness, risk alleles of uRPL should be exposed to particularly strong purifying selection pressure. Rare functional genetic variants are therefore expected to be a key heritable component of uRPL susceptibility. Among rare genetic variants, a rare CNV affects a larger fraction of the genome sequence (i.e., >50 bp) than a single-nucleotide variant or short insertion/deletion and is thus presumed to have a strong functional effect. To investigate the contribution of rare functional CNVs to uRPL susceptibility, we performed genome-wide rare CNV calling using the HI-CNV software, which leverages haplotype-sharing information in the biobank-scale data to increase the sensitivity for CNV detection. We detected a median of 12 CNV calls per individual for both the case and control group (Supplementary Fig. 3). To identify the genes whose deleterious CNV burden contributes to uRPL, we compared the number of predicted loss-of-function (pLoF) CNV carriers between uRPL cases and controls (Methods). Rare pLoF CNVs of CDH11 (carried by 0.93% of the cases and 0.28% of the controls) showed a significant association with the increased risk of uRPL after Bonferroni correction (P = 1.3 × 10−4; OR = 3.29; 95% CI = 1.78–5.76; Fig. 4a; Supplementary Data 2). All the CDH11 pLoF CNVs detected were deletions (Fig. 4b), and the carriers in the cases and controls were all heterozygous. CDH11 encodes cadherin-11, which is prominently expressed in the female reproductive system according to the Human Protein Atlas database16 (Supplementary Fig. 4) and plays a vital role in the differentiation and fusion of trophoblastic cells in vitro17.
Discussion
In this work, we performed a GWAS of uRPL with the largest case sample size ever reported, identifying genome-wide significant associations in the MHC region. The association was fine-mapped to a population-specific HLA haplotype of HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02, which was previously reported to increase the risk of ulcerative colitis and decrease the risk of Crohn’s disease. Furthermore, we interrogated the contribution of rare CNVs to the uRPL risk, revealing a significantly high pLoF CNV burden of CDH11 in patients with uRPL.
The HLA genes play a critical role in adaptive immune responses and maintenance of self-tolerance. Despite the fact that the fetus is a semi-allograft for the mother, it escapes immunological rejection. As Medawar pointed out in 1953 with the ‘immunological paradox of pregnancy’, pregnancy is an immunologically unique time when two genetically distinct individuals coexist18. Extravillous trophoblasts (EVT), the invasive form of differentiated trophoblast cells in direct contact with all maternal decidual cells, express a unique combination of MHC molecules, including HLA-C, HLA-E, and HLA-G antigens but not HLA-A, HLA-B, or class II antigens19. Since HLA-E and HLA-G have limited genetic variation in human populations, the polymorphism in the HLA-C alleles has been regarded as a major genetic determinant of allorecognition in the mother. The HLA-C allotypes are recognized by uterine natural killer (uNK) cells, a key immune cell type for pregnancy accounting for 70% of decidual leukocytes in the first trimester20. HLA-C variants have been implicated in pregnancy disorders, including pre-eclampsia and high birthweights, via the uNK allorecognition system21,22. Our data provide the first evidence of the association between HLA polymorphism and RPL with genome-wide significance, corroborating the importance of immunological tolerance in a successful pregnancy.
High ANA titer and hypothyroidism (generally accompanied by anti-thyroid peroxidase antibody) are commonly observed in patients with RPL; however, their presence does not serve as a predictor of subsequent miscarriages, and their role in RPL pathogenesis is controversial5,23,24. In the stratified analysis, we demonstrated that the MHC association was consistently observed independent of the autoantibody presence in the cases (Fig. 3). Our data suggest that the MHC association is mediated by cell-mediated immunity rather than antibody-mediated immunity, which is also in line with the putative involvement of the uNK allorecognition system discussed above.
The MHC association was unreported in the previous GWAS of RPL in the European population10. One probable explanation for the discrepancy is the difference in the case definition. The GWAS cases in the previous study include those with established causes, which may result in attenuation of the association signal for unexplained RPL. Another probable explanation is the difference in allele frequency between the populations. The associated HLA alleles are rare variants in the European population (e.g., allele frequency = 0.0038 for HLA-C*12:02, HLA-B*52:01, and HLA-DRB1*15:02 according to the 1000 Genomes project European population25). The low allele frequency in European populations may limit statistical power for detecting the association. We note that the population-specificity and long-range LD of the protective HLA haplotype suggest positive selection due to enhanced reproductive success. The details of the positive selection process and the effect of this HLA haplotype on other traits warrant further investigation.
The comparative analysis of gene-damaging CNV burden between uRPL cases and controls nominated CDH11, one of the type 2 classical cadherins from the cadherin superfamily that mediates calcium-dependent cell-cell adhesion. CDH11 is expressed in the epithelium of the placenta as well as endometrial stroma, where it is thought to play a role in anchoring trophoblasts to the decidua26. The expression of CDH11 promotes differentiation and fusion of cytotrophoblasts to form syncytiotrophoblasts in vitro17. The damaged CDH11 coding sequence in the mother is inherited by the fetus with a 50% chance; thus, the pLoF CNVs potentially confer the disease risk by impairing the physiological function of the fetal tissues, including trophoblasts, although the direct target of our analysis is the maternal genomes.
Our study has some potential limitations. First, the control participants for the association analyses were derived from the existing biobank, and not all of them were confirmed to be free of uRPL. However, the potential misclassification of the controls in our GWAS does not undermine the robustness of our findings. Even if some controls had a history of pregnancy loss, it would not lead to false positive associations but rather conservative results with underestimated heritability and odds ratio. Second, although we performed the sensitivity analysis to account for the patients with MHC-associated diseases in the controls, the highly pleiotropic nature of the MHC locus means that other MHC-associated diseases potentially enriched in the controls could still affect the association signal. Third, in the sensitivity analysis, we adjusted for BMI by modeling the BMI trajectory as a function of age. Our model was relatively simple and may not fully capture the potential BMI difference at reproductive age between cases and controls. Last, the rare CNVs at CDH11 were detected using an SNP array intensity-based method. Since the SNP array probes are designed to be distributed genome-wide at intervals of more than 5 kbp, locating the precise genomic coordinates at which the variants start or end involves technical challenges. We also note another possibility that the rare variant association potentially represents more complex structural variants rather than simple deletions.
Collectively, we conducted a large-scale GWAS of uRPL and revealed the significant contribution of the HLA polymorphism to the disease predisposition. Through a genome-wide rare CNV analysis, we also demonstrated that deleterious rare variants confer the risk of uRPL. Our findings should shed light on the key role of reproductive immunology and rare genetic variants in the currently unexplained etiology of uRPL.
Methods
Study population
We enrolled 1800 Japanese patients with a history of two or more unexplained pregnancy losses. All patients were recruited from Nagoya City University Hospital between May 2007 and July 2022. All medical information, including the history of pregnancy losses, was obtained through medical interviews by the obstetricians. Chemical pregnancies were not included in pregnancy losses. All patients underwent a systematic examination, including 3D-ultrasound sonography, chromosome analysis of both partners, determination of antiphospholipid antibody, including lupus anticoagulant, by diluted activated partial thromboplastin time, diluted Russell viper venom time and β2 glycoprotein I-dependent anticardiolipin antibody, and blood tests for hypothyroidism and diabetes mellitus, before a subsequent pregnancy. Patients with APS, an abnormal chromosome in either partner, or uterine anomaly were excluded. When a missed miscarriage was diagnosed, a dilatation and curettage or manual vacuum aspiration was performed and cytogenetic analysis of products of conception was carried out. Subsequent pregnancy outcomes were followed up until May 2023 by a review of the medical records. Embryonic karyotypes were primarily examined in the latest pregnancy loss for the patients. Some cases were examined multiple times, and if both euploid and aneuploid were observed in a case, they were classified into aneuploid miscarriage. As the control population, DNA samples of 26,037 females were obtained from the Biobank Japan Project (BBJ)27,28. This study was conducted with the approval of the Research Ethics Committee of Nagoya City University Graduate School of Medical Sciences, the University of Tokyo, and Osaka University. All participants provided written informed consent after being given a full explanation of the purpose of the study and the methods to be employed. This study complies with the Declaration of Helsinki.
Genotyping, quality control, and imputation
The genomic DNA was isolated with the standard protocols from the peripheral blood and genotyped using the Infinium Asian Screening Array (Illumina, San Diego, CA, USA). This genotyping array was built using an East Asian reference panel including whole-genome sequences, designed for effectively capturing genetic variation in East Asian populations. The genotyping probe intensity was converted to SNP genotype calls using Illumina GenomeStudio version 2.0.4 (Illumina, San Diego, CA, USA). We applied stringent QC filters to the genotype data using PLINK229 as described previously30. We excluded samples with a genotyping call rate <0.98. We included only the samples of the estimated East Asian ancestry, based on the principal component analysis with the samples of HapMap project31 (Supplementary Fig. 5). We further filtered out SNPs with (i) call rate <0.99; (ii) minor allele count <5; (iii) P-value for Hardy-Weinberg equilibrium <1.0 × 10−10; and (iv) more than 5% allele frequency difference when compared with the representative reference panels of Japanese ancestry (i.e., the reference panel used for the genotype imputation in this study and the allele frequency panel of Tohoku Medical Megabank Project32). After QC, we obtained genotype data of 519,668 autosomal and 17,359 X-chromosome SNPs for 1728 uRPL cases and 24,315 controls. To extend the coverage of the genetic variants to be tested, we performed genome-wide genotype imputation. We used SHAPEIT4 software33 version 4.2.1 for haplotype phasing and Minimac4 software34 version 1.0.1 for genotype imputation. For imputation, we used our in-house and Japanese-specific reference panel composed of n = 4,561 whole-genome sequence (WGS) data from multiple studies (e.g., n = 1939 from the BBJ study35 and n = 141 WGS from the previous study36). Variants imputed with MAF > 0.5% and Rsq > 0.7 were used for the subsequent analyses.
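The four SNP-level exclusion criteria listed above can be expressed as a single predicate. The sketch below is illustrative only: the actual filtering was performed with PLINK2, the class and function names are hypothetical, and the frequency criterion is assumed here to mean an absolute difference in allele frequency of more than 0.05 against each reference panel.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    call_rate: float                 # fraction of samples with a genotype call
    minor_allele_count: int          # number of minor alleles observed
    hwe_p: float                     # Hardy-Weinberg equilibrium P-value
    freq: float                      # allele frequency in the study data
    ref_freqs: Tuple[float, ...]     # same allele's frequency in reference panels


def passes_snp_qc(s: SnpStats) -> bool:
    """Return True if the SNP survives all four filters described in the text."""
    if s.call_rate < 0.99:           # (i) call rate
        return False
    if s.minor_allele_count < 5:     # (ii) minor allele count
        return False
    if s.hwe_p < 1.0e-10:            # (iii) Hardy-Weinberg equilibrium
        return False
    # (iv) assumed reading: absolute frequency difference above 0.05 fails
    return all(abs(s.freq - r) <= 0.05 for r in s.ref_freqs)


if __name__ == "__main__":
    # Made-up example values, not taken from the study data.
    example = SnpStats(0.995, 120, 0.3, 0.21, (0.23, 0.20))
    print(passes_snp_qc(example))
```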
HLA genotype imputation
We also performed HLA genotype imputation for fine-mapping of the MHC region. We extracted the genotyped SNPs in the extended MHC region (24-36 Mb on chromosome 6, NCBI Build 37). Based on these SNPs, we imputed the classical and non-classical HLA alleles (two-, four-, and six-digits) and corresponding amino acid sequences using DEEP*HLA37, a multi-task convolutional deep learning method. We used the high-resolution HLA reference panel of the Japanese population13 (n = 1,118). The HLA imputation procedure produced binary markers indicating the presence or absence of an investigated HLA allele or an amino acid sequence. To impute the dosage of HLA-C*12:02–HLA-B*52:01–HLA-DRB1*15:02 haplotype, we encoded each combination of the four-digit alleles of HLA-C, HLA-B, and HLA-DRB1 present in the reference panel as a single allele and trained the prediction model of DEEP*HLA. HLA variants imputed with MAF > 0.5% and an imputation quality score (r2 in cross-validation) > 0.7 were used for the subsequent analyses.
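The haplotype encoding step can be pictured as collapsing each phased combination of HLA-C, HLA-B, and HLA-DRB1 four-digit alleles into one label, so that a haplotype is treated like a single allele. The sketch below shows only this labeling idea on hard-called phased data; it does not reproduce DEEP*HLA training or its dosage output, the function names are hypothetical, and the second haplotype in the example is made up.

```python
def haplotype_label(hla_c, hla_b, hla_drb1):
    """Join three four-digit alleles on one chromosome into a single label."""
    return f"{hla_c}-{hla_b}-{hla_drb1}"


def haplotype_dosage(phased_pair, target_label):
    """Count copies (0, 1, or 2) of the target among two phased haplotypes."""
    return sum(haplotype_label(*hap) == target_label for hap in phased_pair)


# The long-range haplotype examined in this study, encoded as one label.
TARGET = haplotype_label("C*12:02", "B*52:01", "DRB1*15:02")

if __name__ == "__main__":
    # Hypothetical individual: one copy of the target haplotype and one
    # unrelated haplotype (example alleles only).
    individual = (
        ("C*12:02", "B*52:01", "DRB1*15:02"),
        ("C*01:02", "B*54:01", "DRB1*04:05"),
    )
    print(haplotype_dosage(individual, TARGET))
```

Note that an individual carrying all three alleles on different chromosomes would not count as a haplotype carrier under this encoding, which is the reason phased data are needed.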
Case-control GWAS
We performed genome-wide association tests between uRPL and imputed allelic dosages using a generalized linear mixed model (GLMM) as implemented in SAIGE11 version 0.44.6.1.26. In addition to the employment of GLMM, which controls population stratification and sample relatedness in the association test, we included the top five principal components as covariates in the regression model to robustly correct for potential population stratification. We set the genome-wide significance threshold of P-value < 5.0 × 10−8.
Sensitivity analysis accounting for MHC-associated diseases in the controls and BMI
Of the 24,315 control individuals from BBJ, 10,179 have a definitive record of disease status mapped to ICD10 codes. We confined the controls to 9,955 individuals who have a recorded BMI value and do not have a known history of diseases that potentially cause false positive association signals at MHC, including Crohn’s disease (ICD10 K50), ulcerative colitis (K51), and antiphospholipid syndrome (D686). To account for the age-dependent change in BMI, we modeled BMI trajectory as a quadratic polynomial function of age. We assigned an age-matched BMI for each control individual, with age aligned to the median value in the cases. We then re-analyzed the association between the lead variant rs9263738 and uRPL with the age-matched BMI additionally incorporated into the regression model.
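One way to picture the age-matched BMI is as follows: fit a quadratic curve of BMI against age, then move each control's BMI along that curve to a common target age while keeping the individual's deviation from the curve. The text does not specify the exact procedure, so this is one plausible reading rather than the study's implementation. The function names and the `target_age` parameter are hypothetical, and the fit uses a plain least-squares solve on mean-centered age.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x^2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]               # power sums
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def age_matched_bmi(ages, bmis, target_age):
    """Shift each BMI along the fitted age curve to target_age (assumed scheme)."""
    mean_age = sum(ages) / len(ages)
    centered = [age - mean_age for age in ages]    # centering aids stability
    c0, c1, c2 = fit_quadratic(centered, bmis)

    def curve(age):
        x = age - mean_age
        return c0 + c1 * x + c2 * x * x

    # Keep the individual's residual, replace the age-dependent part.
    return [b - curve(a) + curve(target_age) for a, b in zip(ages, bmis)]


if __name__ == "__main__":
    # Made-up control ages and BMI values; target_age stands in for the
    # median age of the cases, which is not given in the text.
    ages = [25, 38, 47, 56, 63, 71, 78]
    bmis = [20.8, 22.1, 23.4, 23.9, 23.2, 22.7, 21.9]
    print([round(v, 2) for v in age_matched_bmi(ages, bmis, target_age=34.0)])
```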
Estimation of confounding biases and heritability in the uRPL GWAS
To evaluate the confounding biases and heritability in our GWAS, we performed LD score regression38. We used the East Asian LD score provided with the software. The liability scale heritability was calculated with the population prevalence of uRPL being 5%.
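For orientation, heritability estimated on the observed (case-control) scale is commonly converted to the liability scale using the population prevalence K and the proportion of cases in the sample P. The sketch below implements that widely used transformation with the standard library only. Whether the software applied exactly this form in the present analysis is an assumption, the function name is hypothetical, and the observed-scale input in the example is a made-up value rather than a result from this study.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Convert observed-scale heritability to the liability scale.

    h2_liab = h2_obs * [K(1-K)]^2 / (z^2 * P(1-P)), where z is the standard
    normal density at the liability threshold for prevalence K.
    """
    norm = NormalDist()
    threshold = norm.inv_cdf(1.0 - prevalence)   # liability threshold
    z = norm.pdf(threshold)                      # density at the threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Sample composition from the text: 1728 cases and 24,315 controls,
    # with an assumed population prevalence of 5%.
    case_fraction = 1728 / (1728 + 24315)
    # 0.05 is a hypothetical observed-scale estimate used only for illustration.
    print(liability_scale_h2(0.05, prevalence=0.05, case_fraction=case_fraction))
```

Because the conversion depends strongly on K, the liability-scale estimate should be read together with the assumed prevalence.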
Association analysis of the HLA variants
We performed association tests between uRPL and the imputed HLA variants using a logistic regression model as implemented in R statistical software version 3.6.3. Accounting for the employment of a logistic regression model, we excluded 3rd-degree or more closely related individuals (KING39 kinship coefficient cutoff > 0.0884) from the GWAS dataset. We assumed additive effects of the allelic dosages on a log-odds scale. We defined the HLA variants as biallelic single-nucleotide variants in the MHC region, two-, four-, and six-digit biallelic HLA alleles, biallelic HLA amino acid polymorphisms corresponding to their respective residues, and multiallelic HLA amino acid polymorphisms for each amino acid position. We incorporated the same covariates as in the GWAS sensitivity analysis into the regression model. For multiallelic amino acid variants, we estimated their significance by an omnibus test for each amino acid position using a log-likelihood ratio test, comparing the likelihood of the fitted model with that of the null model. The significance of the improvement of the model fitting was evaluated by the deviance, which follows a χ2 distribution with m – 1 degree(s) of freedom for an amino acid position with m polymorphic residues. The conditional association analysis was performed to find additional HLA genes with independent uRPL risk effects by additionally including the HLA allele/haplotype as covariates in the regression model.
Rare CNV calling
We performed haplotype-informed SNP array intensity-based CNV calling with the use of the HI-CNV40 software version 1.0. The LRR and θ values of the genotyping probes for 625,738 autosomal variants were exported from Illumina GenomeStudio. HI-CNV was run using the phased haplotype and imputed genotype data obtained for the GWAS. The CNV calls were deduplicated and merged with the default parameters. We excluded deletions with <75 bp and duplications with > 500 bp from the subsequent analyses. We excluded 104 individuals with aberrantly high CNV calls (>50).
Gene-based association test of CNV burden
We performed association tests between uRPL and gene-level predicted loss-of-function (pLoF) burden. Referring to canonical transcripts for 20,091 genes (from https://github.com/im3sanger/dndscv/blob/master/data/refcds_hg19.rda), we annotated a CNV as a pLoF variant if it was a deletion affecting any part of the coding sequence or a duplication contained within the coding sequence40. We excluded 3rd-degree or more closely related individuals from the analysis. Genes with pLoF variant carriers in more than 0.5% of cases were evaluated using a two-sided Fisher’s exact test.
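The carrier comparison reduces to a 2 × 2 table (carriers and non-carriers among cases and controls) evaluated with a two-sided Fisher's exact test. The sketch below is a minimal standard-library implementation that sums the hypergeometric probabilities of all tables no more likely than the observed one. It is illustrative and not the code used in the study; the function name is hypothetical, and the carrier counts in the example are made up because exact counts are not given in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows: cases / controls. Columns: carriers / non-carriers.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases,
        # with all table margins held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over tables at most as probable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the call.
    cases_carrier, cases_noncarrier = 15, 1600
    controls_carrier, controls_noncarrier = 60, 22000
    print(fisher_exact_two_sided(cases_carrier, cases_noncarrier,
                                 controls_carrier, controls_noncarrier))
```

Because many genes are tested, the resulting P-values would still need a multiple-testing correction such as the Bonferroni correction mentioned in the Results.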
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The summary statistics of the GWAS results have been deposited in the National Bioscience Database Center (NBDC) Human Database (https://humandbs.dbcls.jp/en/) under accession code hum0197 (https://humandbs.dbcls.jp/en/hum0197-latest). Data can also be browsed at our pheweb.jp41 website (https://pheweb.jp/). GWAS genotype data of the BBJ are available at the NBDC Human Database (research ID: hum0311).
Code availability
We used publicly available software for the data analysis. The software used is described in the Methods section.
References
Quenby, S. et al. Miscarriage matters: the epidemiological, physical, psychological, and economic costs of early pregnancy loss. Lancet 397, 1658–1667 (2021).
ESHRE Guideline Group on RPL et al. ESHRE guideline: recurrent pregnancy loss: an update in 2022. Hum. Reprod. Open 2023, hoad002 (2023).
Practice Committee of the American Society for Reproductive Medicine. Definitions of infertility and recurrent pregnancy loss: a committee opinion. Fertil. Steril. 99, 63 (2013).
Ford, H. B. & Schust, D. J. Recurrent pregnancy loss: etiology, diagnosis, and therapy. Rev. Obstet. Gynecol. 2, 76–83 (2009).
Coomarasamy, A. et al. Recurrent miscarriage: evidence to accelerate action. Lancet 397, 1675–1682 (2021).
Ogasawara, M., Aoki, K., Okada, S. & Suzumori, K. Embryonic karyotype of abortuses in relation to the number of previous miscarriages. Fertil. Steril. 73, 300–304 (2000).
Sugiura-Ogasawara, M. et al. Abnormal embryonic karyotype is the most frequent cause of recurrent miscarriage. Hum. Reprod. 27, 2297–2303 (2012).
Nelen, W. L., Steegers, E. A., Eskes, T. K. & Blom, H. J. Genetic risk factor for unexplained recurrent early pregnancy loss. Lancet 350, 861 (1997).
Pereza, N., Ostojić, S., Kapović, M. & Peterlin, B. Systematic review and meta-analysis of genetic association studies in idiopathic recurrent spontaneous abortion. Fertil. Steril. 107, 150–159.e2 (2017).
Laisk, T. et al. The genetic architecture of sporadic and multiple consecutive miscarriage. Nat. Commun. 11, 5980 (2020).
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Ng, K. Y. B. et al. Systematic review and meta-analysis of female lifestyle factors and risk of recurrent pregnancy loss. Sci. Rep. 11, 7081 (2021).
Hirata, J. et al. Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population. Nat. Genet. 51, 470–480 (2019).
Okada, Y. et al. HLA-Cw*1202-B*5201-DRB1*1502 Haplotype Increases Risk for Ulcerative Colitis but Reduces Risk for Crohn’s Disease. Gastroenterology 141, 864–871.e5 (2011).
Okada, Y. et al. Construction of a population-specific HLA imputation reference panel and its application to Graves’ disease risk in Japanese. Nat. Genet. 47, 798–802 (2015).
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Getsios, S. & MacCalman, C. D. Cadherin-11 modulates the terminal differentiation and fusion of human trophoblastic cells in vitro. Dev. Biol. 257, 41–54 (2003).
Medawar, P. Some immunological and endocrinological problems raised by the evolution of viviparity in vertebrates. Symp. Soc. Exp. Biol. 7, 320–328 (1953).
Apps, R. et al. Human leucocyte antigen (HLA) expression of primary trophoblast cells and placental cell lines, determined using single antigen beads to characterize allotype specificities of anti-HLA antibodies. Immunology 127, 26–39 (2009).
King, A., Balendran, N., Wooding, P., Carter, N. P. & Loke, Y. W. CD3- leukocytes present in the human uterus during early placentation: phenotypic and morphologic characterization of the CD56++ population. Dev. Immunol. 1, 169–190 (1991).
Hiby, S. E. et al. Combinations of maternal KIR and fetal HLA-C genes influence the risk of preeclampsia and reproductive success. J. Exp. Med. 200, 957–965 (2004).
Hiby, S. E. et al. Maternal activating KIRs protect against human reproductive failure mediated by fetal HLA-C2. J. Clin. Invest. 120, 4102–4110 (2010).
Ogasawara, M., Aoki, K., Kajiura, S. & Yagami, Y. Are antinuclear antibodies predictive of recurrent miscarriage? Lancet 347, 1183–1184 (1996).
van Dijk, M. M. et al. Levothyroxine in euthyroid thyroid peroxidase antibody positive women with recurrent pregnancy loss (T4LIFE trial): a multicentre, randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Diabetes Endocrinol. 10, 322–329 (2022).
Abi-Rached, L. et al. Immune diversity sheds light on missing variation in worldwide genetic diversity panels. PLOS ONE 13, e0206512 (2018).
MacCalman, C. D. et al. Regulated expression of cadherin-11 in human epithelial cells: a role for cadherin-11 in trophoblast-endometrium interactions? Dev. Dyn. 206, 201–211 (1996).
Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Hirata, M. et al. Cross-sectional analysis of BioBank Japan clinical data: A large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 27, S9–S21 (2017).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Sonehara, K. et al. A common deletion at BAK1 reduces enhancer activity and confers risk of intracranial germ cell tumors. Nat. Commun. 13, 4478 (2022).
Altshuler, D. & Donnelly, P. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
Tadaka, S. et al. jMorp updates in 2020: large enhancement of multi-omics data resources on the general Japanese population. Nucleic Acids Res. 49, D536–D544 (2021).
Delaneau, O., Zagury, J.-F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
Sonehara, K. et al. Genetic architecture of microRNA expression and its link to complex diseases in the Japanese population. Hum. Mol. Genet. 31, 1806–1820 (2022).
Naito, T. et al. A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes. Nat. Commun. 12, 1639 (2021).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Hujoel, M. L. A. et al. Influences of rare copy-number variation on human complex traits. Cell 185, 4233–4248.e27 (2022).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Acknowledgements
The authors sincerely thank all the participants involved in this study. This research was supported by the KAKENHI Grants-in-Aid from the Japanese Society for the Promotion of Science (JSPS) [grant number 23K14451 (to K.S.), 23K08850 (to Y.Y.), and 22H00476 (to Y.O.)], the Japan Agency for Medical Research and Development (AMED) [grant number JP23km0405211, JP23km0405217, JP23ek0109594, JP23ek0410113, JP23kk0305022, JP223fa627002, JP223fa627010, JP233fa627011, JP23zf0127008, JP23tm0524002 (to Y.O.)], JST Moonshot R&D [grant number JPMJMS2021 and JPMJMS2024 (to Y.O.)], the Japanese Ministry of Education, Science, and Technology (MEXT) Promotion of Distinctive Joint Research Center Program [grant number JPMXP0621467963 (to M.S.-O.)], the Takeda Science Foundation, Bioinformatics Initiative of Osaka University Graduate School of Medicine, Institute for Open and Transdisciplinary Research Initiatives, Center for Infectious Disease Education and Research (CiDER), and Center for Advanced Modality and DDS (CAMaD), Osaka University.
Author information
Contributions
K.S., Y.Y., Y.O., and M.S.-O. designed the study and wrote the manuscript. K.S., Y.Y., T. Naito, T.O., T. Nishiyama, Y.O., and M.S.-O. conducted data analysis. S.G., H.Y., F.O., and T.K. collected the samples. K.M. and the members of the Biobank Japan Project constructed the data. Y.O., and M.S.-O. supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Abin Abraham and Triin Laisk for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sonehara, K., Yano, Y., Naito, T. et al. Common and rare genetic variants predisposing females to unexplained recurrent pregnancy loss. Nat Commun 15, 5744 (2024). https://doi.org/10.1038/s41467-024-49993-5