Introduction

In late December 2019, a previously unidentified coronavirus, later named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged and rapidly caused a worldwide pandemic [1]. Despite the active efforts of governments, the emergence of several variant strains of the virus and difficulties in accessing an effective vaccine have prevented the outbreak from being effectively controlled worldwide [2]. According to the latest data from the World Health Organization, coronavirus disease 2019 (COVID-19), caused by this novel coronavirus, has led to over 6 million deaths and poses a major threat to global health.

Awareness of the relationship between COVID-19 and other diseases or risk factors can help in developing disease prevention strategies. Based on available studies, COVID-19 is probably associated with cardiovascular disease, diabetes, and obesity, as well as hypertension, gender, and blood type [3,4,5,6,7]. Among the known causes of miscarriage, infection could be an important factor, in addition to chromosomal abnormalities, anatomical abnormalities of the uterus, and endocrine disorders [8]. A previous study found that the risk of miscarriage was increased during outbreaks of other coronaviruses, such as SARS-CoV and MERS-CoV [9]. However, reliable evidence on the association between COVID-19 and the risk of miscarriage, which has become a concern for both patients and obstetricians, is still lacking.

The first study of 116 infected pregnant women in Wuhan showed that COVID-19 was not associated with an increased risk of spontaneous miscarriage or spontaneous preterm birth, and found no evidence of vertical transmission in late pregnancy [10]. In Italy, Zelini’s data suggested that SARS-CoV-2 infection does not appear to cause first-trimester miscarriage, and that asymptomatic or mildly symptomatic infections have an even more limited effect [11]. In India, by contrast, an increased rate of miscarriage was observed during the second wave of the COVID-19 pandemic [12]. Furthermore, a retrospective study conducted in Turkey and a prospective cohort study in the USA both revealed an increased miscarriage rate [13, 14]. Given the variation in results across countries at different economic levels and across populations, it is essential to obtain rigorous evidence on the association between COVID-19 and miscarriage.

This study applies two-sample Mendelian randomization (MR) to explore the potential causal association between genetically predicted COVID-19 and miscarriage. MR is a method for inferring causal associations with less bias than conventional observational designs; it relies on genetic variants as instrumental variables (IVs) to assess whether an exposure leads to a corresponding outcome [15]. In general, the gold standard for establishing causality is the randomized controlled trial (RCT). However, owing to complex experimental design, cumbersome implementation, and strict ethical restrictions, RCTs are difficult and costly. MR is analogous to an RCT, except that in MR different alleles, rather than the interventions of an RCT, allocate participants to different groups [16]. Alleles segregate according to Mendel’s second law of heredity and are assigned before the exposure and outcome occur, so the process is largely unaffected by confounding. In other words, genetic variation is randomly distributed in the population after meiosis, and this distribution is fixed at conception [17]. Compared with retrospective designs such as case–control studies, MR is therefore less susceptible to confounding and reverse causation and provides stronger evidence [18].

In view of these advantages of MR, we performed a two-sample MR analysis of GWAS summary data from the UK Biobank and the EBI database to investigate a potential causal association between COVID-19 and miscarriage and to provide novel evidence in this field of research.

Material and methods

The basic principle of MR is to use genetic variants (e.g., SNPs) associated with the exposure as IVs to determine whether an observational association between a risk factor and an outcome is consistent with a causal effect. The general steps are: acquisition of GWAS data, selection and evaluation of SNPs, statistical analysis, and sensitivity analyses. Data analysis and visualization were performed in R (version 4.1.3). Three core assumptions should be considered throughout the process (Fig. 1). First, the relevance assumption: the genetic variants must be strongly associated with the exposure. Second, the independence assumption: the genetic variants must not be associated with any confounders of the exposure-outcome relationship. Third, the exclusivity assumption: the genetic variants must affect the outcome only through the exposure, not directly [19].

Fig. 1 A brief illustration of two-sample MR for COVID-19 and miscarriage

Data sources

We searched for the keywords “COVID-19” and “miscarriage” in the IEU OpenGWAS project (https://gwas.mrcieu.ac.uk/) and obtained summary data on genetic variants from genome-wide association studies (GWAS) of European populations in the UK Biobank and the EBI database.

The exposure data were taken from the COVID-19 Host Genetics Initiative release (Dataset ID: ebi-a-GCST010779, released in October 2020). This cohort consisted of 6406 hospitalized COVID-19 patients and 902,088 matched population controls, all of European origin. The inpatients received some degree of symptomatic treatment and ranged from mild cases to severe cases requiring mechanical ventilation. The total sample size was 908,494, with 111,272,365 SNPs reported [20]. Summary-level data for miscarriage were obtained from UK Biobank (Dataset ID: ukb-b-12621), which included 79,047 cases and 168,133 controls of European ancestry.

Selection and evaluation of SNPs

To satisfy the relevance assumption, SNPs first had to be associated with the exposure at the genome-wide significance level (p < 5 × 10⁻⁸). SNPs in linkage disequilibrium were then removed, retaining only SNPs with r² < 0.05 within a 10 Mb window. Eventually, we selected 7 of the 111,272,365 SNPs in the COVID-19 Host Genetics Initiative release as IVs for the exposure (Table 1).
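The two selection steps can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which ran in R. The thresholds come from the text, but the `Snp` record, the `ld_r2` lookup table, and all toy identifiers and values are hypothetical. Real clumping tools estimate LD from a reference panel rather than from a hand-written table.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8        # genome-wide significance level from the text
R2_THRESHOLD = 0.05       # LD r^2 cut-off from the text
WINDOW_BP = 10_000_000    # 10 Mb window from the text


@dataclass(frozen=True)
class Snp:                # hypothetical record type, for illustration only
    rsid: str
    chrom: int
    pos: int              # base-pair position
    pval: float           # p value of the SNP-exposure association


def select_instruments(snps, ld_r2):
    """Keep genome-wide significant SNPs, then greedily clump them by LD.

    ld_r2 maps frozenset({rsid_a, rsid_b}) -> r^2; missing pairs count as 0.
    """
    significant = sorted((s for s in snps if s.pval < P_THRESHOLD),
                         key=lambda s: s.pval)
    kept = []
    for cand in significant:  # strongest association first
        in_ld = any(
            cand.chrom == k.chrom
            and abs(cand.pos - k.pos) <= WINDOW_BP
            and ld_r2.get(frozenset((cand.rsid, k.rsid)), 0.0) >= R2_THRESHOLD
            for k in kept
        )
        if not in_ld:
            kept.append(cand)
    return [s.rsid for s in kept]


# Hypothetical toy input (identifiers and values are made up).
toy_snps = [
    Snp("rsA", 3, 45_800_000, 1e-20),
    Snp("rsB", 3, 45_900_000, 1e-12),   # in LD with rsA, so it is dropped
    Snp("rsC", 21, 33_000_000, 3e-9),
    Snp("rsD", 12, 112_000_000, 2e-6),  # not genome-wide significant
]
toy_ld = {frozenset(("rsA", "rsB")): 0.8}
selected = select_instruments(toy_snps, toy_ld)
```

Processing candidates from the smallest p value upward means that, within each LD block, the SNP most strongly associated with the exposure is the one retained.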

Table 1 Data for 7 SNPs as instrumental variables in exposure and outcome

We then checked these 7 SNPs in PhenoScanner, a database of human genotype–phenotype associations (http://www.phenoscanner.medschl.cam.ac.uk/), to make sure they were not related to any confounding factors (independence assumption). In addition, we evaluated the strength of association between these instrumental variables and the exposure using R² values and F statistics (Table 2) [21]. The formulae are given below, with the relevant variables defined:

Table 2 Evaluation of instrumental variables
$$\begin{array}{c}{R}^{2}=2\times (1-MAF)\times (MAF)\times {\left(\frac{\beta }{SE\times \sqrt{N}}\right)}^{2}\\ F=\frac{N-k-1}{k}\times \frac{{R}^{2}}{1-{R}^{2}}\end{array}$$

MAF, minor allele frequency; β, effect size; R², proportion of variance in the exposure explained by the IV; SE, standard error; N, sample size; k, number of SNPs.
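As a worked check of the two formulae, the following Python sketch evaluates them directly. The function names are illustrative. The inputs for the F statistic are the values reported in this paper: N = 908,494, k = 7, and a total R² of 0.000148.

```python
import math


def snp_r2(maf, beta, se, n):
    """Per-SNP R^2, exactly as written in the formula above."""
    return 2.0 * (1.0 - maf) * maf * (beta / (se * math.sqrt(n))) ** 2


def f_statistic(r2, n, k):
    """F statistic for k instruments explaining a fraction r2 of the variance."""
    return (n - k - 1) / k * r2 / (1.0 - r2)


N_EXPOSURE = 908_494   # exposure sample size reported in the text
K_SNPS = 7             # number of instruments reported in the text
TOTAL_R2 = 0.000148    # total R^2 reported in the Results

f_total = f_statistic(TOTAL_R2, N_EXPOSURE, K_SNPS)
print(f"F = {f_total:.2f}")
```

With these inputs the formula gives F ≈ 19.21, which agrees with the value reported in the Results.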

Statistical analysis

R 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria) and the R package “TwoSampleMR” were used for the statistical analysis [22]. We harmonized the exposure and outcome data into a single dataset for MR analysis (Table 1). Several methods were used to estimate causal effects: inverse variance weighted, inverse variance weighted (fixed effects), MR-Egger, weighted median, maximum likelihood, and mode-based estimate. The random-effects inverse variance weighted (IVW) method was taken as the primary method for the MR results, and the other methods were taken as auxiliary [15].
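To make the primary estimator concrete, here is a minimal Python sketch of the Wald ratio and the IVW estimate computed from summary statistics. It is an illustration under simplifying assumptions, not a reimplementation of the TwoSampleMR package. It uses first-order weights (ignoring uncertainty in the SNP-exposure estimates) and a multiplicative random-effects correction. All input numbers are hypothetical.

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP causal estimates (beta_out / beta_exp) and first-order SEs."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exp, se_out)]
    return ratios, ses


def ivw(beta_exp, beta_out, se_out, random_effects=True):
    """Inverse-variance-weighted mean of the Wald ratios.

    Returns (estimate, standard error, Cochran's Q). With random_effects=True
    the SE is inflated by sqrt(Q / (k - 1)) when that factor exceeds 1
    (multiplicative random effects).
    """
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    if random_effects and len(ratios) > 1:
        se *= math.sqrt(max(1.0, q / (len(ratios) - 1)))
    return estimate, se, q


# Hypothetical toy data in which every SNP implies the same effect of 2.0.
toy_beta_exp = [0.1, 0.2, 0.4]
toy_beta_out = [0.2, 0.4, 0.8]
toy_se_out = [0.05, 0.05, 0.05]
estimate, se, q = ivw(toy_beta_exp, toy_beta_out, toy_se_out)
```

When the SNPs agree, Q is near zero and the fixed-effect and random-effects standard errors coincide; heterogeneity widens only the random-effects interval and leaves the point estimate unchanged.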

Sensitivity analyses and visualization

After obtaining the MR estimates, we performed sensitivity analyses to check their robustness, covering three aspects: a heterogeneity test, a pleiotropy test, and a leave-one-out analysis. In addition, we validated the robustness of our results with the more recent “MR-PRESSO” method [23].

To present the results from different perspectives, we used several visualizations: scatter plot, forest plot, leave-one-out plot, and funnel plot.

Results

Seven SNPs qualify as instrumental variables

Seven SNPs met the selection criteria, and each satisfied the relevance and independence assumptions. Table 1 presents the allele frequency, effect estimates, standard errors, and p values of these SNPs for the exposure (COVID-19) and the outcome (miscarriage). By comparing the allele information, we confirmed that none of these SNPs was palindromic. We also calculated the minor allele frequency and R² (the proportion of variance in the exposure explained by the IVs), from which we derived the F statistic (Table 2). The total R² for the instrumental variables was 0.000148, and the F statistic of 19.21 exceeded the cut-off of 10 for strong instrumental variables.
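The palindromic check mentioned above is simple to state in code. A SNP is palindromic when its two alleles are complementary bases (A/T or C/G), so the allele pair reads the same on both DNA strands and the strand cannot be inferred from the alleles alone. The sketch below is illustrative; the function name is an assumption, not part of any package.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele, other_allele):
    """Return True when the two alleles are complementary bases (A/T or C/G)."""
    a, b = effect_allele.upper(), other_allele.upper()
    return COMPLEMENT.get(a) == b


# Illustrative allele pairs, not taken from Table 1.
examples = [("A", "T"), ("C", "G"), ("A", "G"), ("T", "C")]
flags = [is_palindromic(a, b) for a, b in examples]
```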

Causal effect estimates indicate no causal association

Because the heterogeneity test indicated heterogeneity between SNPs, we took the random-effects inverse variance weighted (IVW) method as the primary MR analysis, with the other methods as auxiliary. The results of the different MR methods for the causal effect of COVID-19 on miscarriage are listed in Table 3. The IVW method showed no clear causal association between genetically predicted COVID-19 and miscarriage [OR 0.9981 (95% CI, 0.9872–1.0091), p = 0.7336]. The other methods gave similar results: the OR was close to 1 and the p value was not statistically significant. Thus, according to MR analysis of unstratified database data, COVID-19 did not appear to contribute to an increase in miscarriage rates.
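The reported OR, confidence interval, and p value are linked on the log-odds scale, which the following Python sketch illustrates. It back-calculates the log-OR and its standard error from the rounded figures above, so the derived values are approximate. It also assumes a normal approximation, which may differ slightly from the method used by the original software.

```python
import math


def log_scale_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the log-OR, its SE, the z score, and a two-sided normal p value."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under N(0, 1)
    return beta, se, z, p


# IVW result as reported in the text (rounded values).
beta, se, z, p = log_scale_summary(0.9981, 0.9872, 1.0091)
print(f"log-OR = {beta:.5f}, SE = {se:.5f}, p = {p:.3f}")
```

The p value recovered this way is about 0.73, consistent with the reported 0.7336 once rounding of the inputs is taken into account.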

Table 3 Causal effect between COVID-19 and miscarriage by different MR analysis methods

Sensitivity analysis reveals robustness

Heterogeneity was detected among the IVs chosen for COVID-19 (MR-Egger: Q = 20.71, df = 5, p = 0.00091; IVW: Q = 20.73, df = 6, p = 0.00204). This was probably because the SNPs were obtained from summary data, which cannot be stratified by factors such as age and gender. The MR-Egger method can also detect horizontal pleiotropy through its intercept on the Y-axis: an intercept that differs significantly from zero indicates horizontal pleiotropy. To meet the exclusivity assumption, horizontal pleiotropy must be absent, and none was observed in our MR analysis (intercept = 0.0001592; SE = 0.0023; p = 0.9480). Owing to the small number of SNPs, it was difficult to assess sensitivity by funnel plot; the specific results are shown in Supplemental Fig. 1.
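The heterogeneity p values follow from comparing Cochran's Q with a chi-square distribution on the stated degrees of freedom. The Python sketch below uses the closed-form upper tail for integer degrees of freedom, so it needs only the standard library. The function name is illustrative, and the Q inputs are the rounded values reported above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i<m} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1: erfc(sqrt(x/2)) plus a finite correction sum
    # over x^(j - 1/2) / (2j - 1)!! for j = 1..m.
    m = (df - 1) // 2
    term, total = math.sqrt(x), 0.0
    for j in range(1, m + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


p_ivw = chi2_sf(20.73, 6)     # IVW: Q = 20.73, df = 6
p_egger = chi2_sf(20.71, 5)   # MR-Egger: Q = 20.71, df = 5
```

Both values agree with the reported p values (0.00204 and 0.00091) to within the rounding of Q.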

We also validated the robustness of our results with the “MR-PRESSO” method. Although there was an outlier SNP (rs2269899), the outlier-corrected MR estimate remained similar to the IVW result, and neither had a statistically significant p value (Table 4). In the leave-one-out analysis, removing each SNP in turn and repeating the MR analysis produced no substantial differences in the estimated causal effects (Fig. 2). These results indicated that our findings were robust and that no single IV drove the overall causal estimate.
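The leave-one-out procedure can be sketched in a few lines of Python. For brevity this uses the fixed-effect IVW point estimate, and the SNP identifiers and effect sizes are hypothetical toy values rather than the data in Table 1.

```python
def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW point estimate from summary statistics."""
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out))
    return num / den


def leave_one_out(snp_ids, beta_exp, beta_out, se_out):
    """Re-estimate the IVW effect with each SNP removed in turn."""
    results = {}
    for i, rsid in enumerate(snp_ids):
        keep = [j for j in range(len(snp_ids)) if j != i]
        results[rsid] = ivw_estimate(
            [beta_exp[j] for j in keep],
            [beta_out[j] for j in keep],
            [se_out[j] for j in keep],
        )
    return results


# Hypothetical toy data: rs3 implies a larger effect than rs1 and rs2.
toy_ids = ["rs1", "rs2", "rs3"]
loo = leave_one_out(toy_ids, [0.1, 0.2, 0.4], [0.2, 0.4, 1.2], [0.05, 0.05, 0.05])
```

In this toy example, dropping rs3 moves the estimate to 2.0 while dropping rs1 gives 2.8. An influential SNP would produce this kind of shift, which the analysis in Fig. 2 did not show.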

Table 4 MR-PRESSO test for horizontal pleiotropy

Fig. 2 Leave-one-out sensitivity analysis plot of COVID-19 on the risk of miscarriage

The causal effect of each single SNP on the risk of miscarriage was estimated by the Wald ratio method and visualized in a forest plot (Fig. 3). In Fig. 3, statistical significance was defined as p < 0.05. Unlike in the leave-one-out analysis, the threshold for statistical significance in the forest plot is debated: it can also be defined as p < 0.05/n (a conservative Bonferroni correction, where n is the number of SNPs) or p < 5 × 10⁻⁸ (the genome-wide significance level). Based on the p values of each SNP for the outcome (Table 1), the leave-one-out analysis (Fig. 2), and the combined result of all SNPs (Fig. 3), COVID-19 infection was not found to increase the risk of miscarriage, although rs2166172 and rs2269899 seemed to have a direct impact on the outcome. Since there was no clear causal association, the scatter plot of the MR results was placed in the supplementary material (Supplemental Fig. 2).

Fig. 3 Forest plot showing the causal effect of each single SNP on the risk of miscarriage

Discussion

This work explored the genetic association between COVID-19 and miscarriage. The evidence from MR does not support COVID-19 as a causal risk factor for miscarriage in European populations. These results were generally robust in the sensitivity analyses.

SARS-CoV-2 is an enveloped positive-stranded RNA virus with a homogeneous distribution of S proteins on its surface, and angiotensin-converting enzyme 2 (ACE2) is the host receptor for SARS-CoV-2 cell entry. When SARS-CoV-2 binds to the host cell, the cellular transmembrane serine protease 2 (TMPRSS2) facilitates virus entry by activating the S protein [24]. The genetic susceptibility locus rs13050728 for COVID-19 is located within the gene for interferon alpha and beta receptor subunit 2 (IFNAR2), which plays an essential antiviral role as a receptor for type I interferon. Loss-of-function mutations in IFNAR2 can lead to severe COVID-19 infection [25]. Another genetic variant, rs2269899, is related to the interferon-stimulated 2′,5′-oligoadenylate synthase (OAS) and shows increased expression with age [26, 27]. In addition, the gene related to rs35081325, leucine zipper transcription factor like 1 (LZTFL1), primarily affects epithelial-mesenchymal transition in the lung and exacerbates inflammation [28]. Another SNP (rs2277732), located in an intron of dipeptidyl peptidase 9 (DPP9), has been reported to increase the risk of idiopathic pulmonary fibrosis along with the activation of some inflammasomes [29].

However, these genetically predicted loci associated with infection do not appear to explain an association of COVID-19 with miscarriage. Although these SNPs represent some processes in COVID-19 that may be associated with miscarriage, combining them with SNPs reflecting lung infection or other unrelated factors may reduce the ability to detect genetic correlations between COVID-19 and miscarriage [30,31,32].

Furthermore, while these genetic susceptibility loci highlight viral infection and inflammation, the likelihood of direct viral infection of the uterine cavity or placenta is relatively small. Typically, direct viral infection of the reproductive tract and uterine cavity often results in miscarriage [33]. For example, TORCH pathogens (Toxoplasma gondii, rubella virus, cytomegalovirus, and herpes simplex virus) can infect the placenta and cause embryonic abortion, and Zika virus can affect trophoblast cells, leading to poor birth outcomes [34, 35]. SARS-CoV-2, however, does not appear to behave in this way. According to a study published in JAMA, the highest rate of virus positivity was found in bronchoalveolar lavage fluid, followed by sputum and nasal swabs, while the rate in blood was only 1% [36]. Thus, although ACE2 and TMPRSS2 (host factors for SARS-CoV-2 cell entry) are expressed in the placenta [37], given the extremely low levels of the virus in the blood, we suppose that the likelihood of SARS-CoV-2 directly infecting trophoblast cells via the blood is relatively low.

Indirect alterations in blood pressure and coagulation status mediated by ACE2 are speculated to be possible influences. ACE2 is a key regulator of the renin-angiotensin system (RAS) and plays an important role in the cardiovascular system; hence, we should consider whether COVID-19 may indirectly affect pregnancy outcomes through certain cardiovascular indicators [38]. Activation of platelets and damage to vascular endothelial cells by SARS-CoV-2 and its S protein worsen the coagulation status of patients and make them more susceptible to the development of a prothrombotic state (PTS) [39, 40]. PTS can lead to micro-thrombosis of the spiral or chorionic vessels of the uterus during pregnancy, resulting in uteroplacental malperfusion and an increased risk of miscarriage [41]. In addition, angiotensin II (Ang II) is degraded by ACE2 to angiotensin 1–7 (Ang-(1–7)), which negatively regulates the RAS and lowers blood pressure [42]. Owing to downregulation by the SARS-CoV-2-ACE2 complex, ACE2 is significantly reduced in COVID-19 patients, which weakens the inhibitory effect of Ang-(1–7) on the RAS and manifests as an increase in blood pressure. This vasoconstriction-induced change in blood pressure may contribute in part to miscarriage [43]. In a recent MR study, COVID-19 was shown to increase the risk of hypertension in pregnancy [44].

The above evidence suggests that indirect alterations in blood pressure and coagulation status mediated by ACE2, rather than direct infection of the placenta, may increase the risk of miscarriage. This might partly explain the negative results of our MR analysis. Age and type of miscarriage may also be risk factors. Miscarriage occurs predominantly in women of reproductive age, but the summary data include women of all ages and all types of miscarriage. This could be another possible explanation for our negative MR results.

Although the virus is now being studied in greater depth, there is still no consensus on COVID-19 and miscarriage outcomes. These inconsistent results may have several explanations. First, the severity of infection is closely related to the pregnancy outcome. Although the populations included in different studies all tested positive for SARS-CoV-2 nucleic acid, there is a marked difference in the degree of symptoms between people who self-report a positive test and hospitalized patients. Second, SARS-CoV-2 has evolved and mutated during the pandemic, producing multiple variants including alpha, gamma, delta, and omicron. The predominant variant may differ between studies conducted at different stages of the epidemic, and differences in transmissibility and virulence among the variants may lead to different pregnancy outcomes. For example, the delta variant, which emerged later, may have higher virulence and transmissibility owing to specific mutations in the spike protein [45]. Third, the level of medical services in different countries, as well as the extent of vaccination, may be another possible reason. The COVID-19 data in this study were from 2020, at the early stage of the pandemic, when the transmissibility and virulence of the corresponding variant were weaker than those of the later variants. This may be an alternative explanation for the negative result.

The main strength of our study is that it is the first to use MR to explore the genetic association between COVID-19 and miscarriage. Another advantage over traditional observational studies is that causal estimates obtained by MR are less affected by confounding bias and reverse causation and hence have stronger evidential power. In addition, we assessed the validity of the instrumental variables by various statistical methods, such as F statistics and MR-PRESSO, to improve the accuracy of the estimated effects. However, our study has several unavoidable limitations. Since the biological mechanism remains unclear, we cannot rule out pathophysiological processes through which COVID-19 may contribute to miscarriage. Because subgroups could not be analyzed, our MR study cannot provide causal estimates as robust and reliable as those from MR based on individual-level data. In addition, we cannot ignore the impact of developmental canalization and epigenetic effects on outcomes through gene-environment interactions. Owing to the lack of GWAS data from other populations, only populations of European origin were included. In view of the severe global situation of COVID-19, relevant genetic data from Asian, American, and African populations are urgently needed to carry out such causality studies and to explore genetic differences between populations.

Overall, the small probability of direct infection in the placenta, as well as the inability to stratify the data, may explain the results of MR, while indirect alterations in blood pressure and coagulation status mediated by ACE2 are speculated to be possible influences.

Conclusions

Our MR analysis showed no clear causal association between genetically predicted COVID-19 and miscarriage. We hope that high-quality, multi-center, prospective RCTs will provide more conclusive evidence on this controversial topic. The findings of our study can be used as a reference for maternal management and for reproductive physicians deciding on embryo transfer during a pandemic.