Key Points

The masking effect is a statistical issue associated with commonly applied signal detection methodologies in which signals for a product of interest are hidden by the presence of other reported products.

Due to vaccine novelty, and an unprecedented dynamic of reporting, statistical signals of adverse events related to coronavirus disease 2019 (COVID-19) vaccines are more prone to masking and, therefore, to being undetected or delayed.

A more advanced class of signal detection methodologies, based on regression, can address masking and expose strong statistical associations that would otherwise be deemed uninteresting.

The extent, direction, impact, and root causes of masking change in accordance with the changing nature of data.

1 Introduction

As the world contends with ending the coronavirus disease 2019 (COVID-19) pandemic, understanding the risks associated with COVID-19 vaccines is critically urgent. The Vaccine Adverse Event Reporting System (VAERS), co‐administered by the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC), is one of several systems used to monitor adverse events (AEs) that occur after vaccination, including the COVID-19 vaccines. Like other safety surveillance systems, VAERS offers the opportunity to rapidly identify potential risks associated with vaccines—a process usually known as signal detection.

According to the World Health Organization (WHO), a safety signal is reported information on a possible causal relationship between an AE and a product, where the relationship is unknown or incompletely documented [1]. At a very high level, signal detection is the active pursuit of safety signals. The process is multifaceted and interdisciplinary: it can take many forms, be performed at different levels of evidence and data, and be accomplished in different ways. The specific application considered in this study has previously been termed data mining, screening, disproportionality analysis, and quantitative signal detection. It involves the use of statistical techniques that cast a wide net to rapidly explore large databases of reported AEs for statistical patterns or anomalies that may indicate new risks warranting further attention. This approach to signal detection has been routinely applied to safety surveillance systems for over 20 years and has become a de facto standard [2]. To distinguish it from other approaches and activities related to signal detection, we will simply refer to it as statistical signal detection, highlighting its statistical foundation. It is important to emphasize that, because statistical signal detection is ultimately based on reporting patterns that are influenced by reporting dynamics, it is hypothesis generating. A strong statistical signal does not automatically imply a causal relationship and must always be evaluated by other methods, including the clinical review of case-level reports, scientific literature, and relevant studies [2,3,4]. Likewise, the absence of a strong statistical signal does not automatically rule out the existence of a safety issue. Statistical signal detection can also be repurposed to inform suspicions originating from other sources, but that is not the focus of our investigation.

Methodologies for statistical signal detection are based on computing surrogate measures of statistical association between specific pharmaceutical products and AEs that are reported into safety surveillance systems [5]. The measures are typically interpreted as signal scores, with larger values representing stronger statistical associations, which may be more likely to represent true causal associations. In practice, a signal score threshold is often used to screen associations that warrant further attention.

Methodologies for statistical signal detection currently deployed by safety surveillance organizations are largely based on disproportionality statistics. These methodologies use frequency analysis of 2 × 2 contingency tables to quantify the degree to which a product–AE combination co-occurs disproportionately as compared with that expected if there were no statistical association. To illustrate, we use the relative reporting ratio (RRR), which is a disproportionality statistic underlying several methodologies. The RRR is defined as the ratio of the number of reports mentioning a specific (target) product–event combination to an expected number of reports for the same combination under the assumption that the product and AE occur independently. Based on the values displayed in Table 1, the RRR is formally given by,

$$ {\text{RRR}} = \frac{{\left( {a + b + c + d} \right) \cdot a}}{{\left( {a + b} \right) \cdot \left( {a + c} \right)}} \tag{1} $$

and a number of enhancements, such as Bayesian smoothing and stratification, lead to several signal detection methodologies currently utilized by safety surveillance organizations [5].

Table 1 2 × 2 contingency table used to compute disproportionality statistics for signal detection

Given its impact on public health, signal detection is still an active area of research, and since its inception, multiple guidance documents [3, 6,7,8] have been published with practice recommendations as well as admonitions concerning data and methodological limitations.

Undetected or delayed signals and false alerts are the two primary concerns with signal detection and two objective measures with which the reliability of signal detection can be evaluated. Undetected or delayed signals are especially disconcerting given their direct impact on public health. This study is concerned with signals that go undetected by statistical signal detection, which we will refer to as undetected statistical signals. Fortunately, multiple other surveillance and signaling efforts are deployed to reduce the chance of undetected signals.

Undetected statistical signals can stem from several sources. Incomplete data and the voluntary nature of reporting to surveillance systems are the primary sources of undetected signals. However, undetected statistical signals can also stem from methodological limitations and, in particular, a widely acknowledged problem called ‘masking’ [3, 9, 10].

Masking is an artifact of commonly applied disproportionality statistics that rely on the analysis of 2 × 2 contingency tables, in which signals of disproportionate reporting may be hidden (hence, masked) by the presence of other non-target products frequently reported with the target AE. As described above, disproportionality statistics based on 2 × 2 contingency tables are defined as the ratio of the target AE rate for the target product to the background rate for the target AE. However, defining the background rate can be problematic. We are prone to think of the background as being scattered randomly across all the non-target products, but this may not be the case. What if a single non-target product accounts for half of the target AE reports among all non-target products? In that case, under certain conditions, eliminating that particular non-target product from the reports database would roughly double our target disproportionality. It would seem reasonable to do so, because otherwise the non-target product would be masking the target product’s true disproportionality by cutting its value in half. Therefore, a possible solution to address masking is to first identify the ‘offending’ products and then remove reports containing those products from the calculation of disproportionality statistics. This solution may work in a limited set of scenarios, but is practically infeasible in the general case, as it may require examining a combinatorially prohibitive set of product–AE pairs. A more direct and computationally feasible approach necessitates the use of a more advanced class of methodologies, such as regression, which go beyond the analysis of 2 × 2 contingency tables and can compute statistical associations adjusted for the presence of other products. This investigation makes use of one such methodology called Regression-Adjusted Gamma Poisson Shrinker (RGPS) [11].

To illustrate masking with a simple numerical example, consider the values displayed in Tables 2 and 3, which build on the example provided in Table 1 and Eq. (1). Tables 2 and 3 display values used for disproportionality analysis of 2 × 2 contingency tables capturing a hypothetical target AE and a hypothetical target product labeled ‘A.’ Table 2 introduces a product labeled ‘B,’ which serves as the ‘offending’ product that masks the true relationship between the target product ‘A’ and the target AE. To simplify our example, we assume that products ‘A’ and ‘B’ are not co-reported with other products and stress that what is being counted is the number of reports mentioning products/AEs and not co-occurrences. Table 2 shows that most of the reports (80/93) mentioning the target AE are associated with product ‘B,’ which leads to masking. Applying the RRR (Eq. 1) yields a masked \(\mathrm{RRR}=(393\times 3)/(93\times 13)=0.98\), indicating that there is no statistical association. However, removing the reports that mention product ‘B’ yields the counts displayed in Table 3, and an unmasked \(\mathrm{RRR}=(233\times 3)/(13\times 13)=4.14\) that indicates a strong statistical association between the target AE and target product ‘A.’

Table 2 Contingency table used to compute disproportionality statistics with the inclusion of reports containing product ‘B’ that masks the association of product ‘A’ with the target AE
Table 3 Contingency table used to compute disproportionality statistics with the exclusion of reports containing product ‘B’ that would mask the association of product ‘A’ with the target AE
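
The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch, not the implementation used in the study: the function name `rrr` and its argument names are illustrative, and the count of 160 reports mentioning product ‘B’ is inferred from the totals quoted in the text (393 − 233). Because Eq. (1) is symmetric in its two marginal totals, the sketch takes the product total and the AE total as separate arguments rather than the individual cells.

```python
def rrr(n_total, n_product, n_event, n_both):
    """Relative reporting ratio: observed count over the count expected
    if the product and the AE were reported independently (Eq. 1)."""
    expected = n_product * n_event / n_total
    return n_both / expected

# Counts quoted in the text for Table 2 (all reports, product 'B' included).
masked = rrr(n_total=393, n_product=13, n_event=93, n_both=3)

# Reports mentioning product 'B': 393 - 233 = 160 in total, 80 with the target AE.
# Dropping them reproduces the counts quoted for Table 3.
unmasked = rrr(n_total=393 - 160, n_product=13, n_event=93 - 80, n_both=3)

print(f"masked RRR   = {masked:.2f}")    # 0.98
print(f"unmasked RRR = {unmasked:.2f}")  # 4.14
```

Only the background changes between the two calls: the count for product ‘A’ with the target AE stays at 3 in both, which is why the shift in the score is attributable to product ‘B’ alone.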

Conditions that make signal detection especially vulnerable to masking effects include smaller safety databases such as VAERS that may lack diversity, relationships involving rare events, and relationships involving newer products. As such, the novelty of COVID-19 vaccines, coupled with ongoing vaccination programs, and the relatively early stages of COVID-19 vaccine surveillance make signal detection especially susceptible to masking.

The aim of this study is to investigate the problem of masking in relation to signal detection of COVID-19 vaccines and to assess its impact, extent, and root causes. To this end, we evaluate the evolution of signals corresponding to seven distinct AEs with various degrees of evidence linking them to the vaccines, and which demonstrate relatively strong masking effects. Five of these seven AEs are part of a list of AEs deemed to be of special interest for COVID-19 vaccine surveillance by the CDC and the FDA [12, 13]. The remaining two AEs, herpes zoster and tinnitus, are yet to be fully recognized but have accumulated thousands of reports in VAERS and are supported by published studies and case reports. We supplement this temporal investigation of seven AEs with a wider evaluation of masking at the database level. In addition, we center the evaluation on the messenger RNA (mRNA) vaccines from Pfizer-BioNTech (BNT162b2) and Moderna (mRNA-1273), which account for the vast majority of COVID-19 vaccine reports in VAERS.

2 Materials and Methods

2.1 Data

The investigation was performed using all VAERS reports available at the time of writing this article (1990 to October 1, 2021). These data represent a total of 1,599,958 reports, including 39 weeks of COVID-19 vaccine reports, which are publicly released on a biweekly (every 2 weeks) cadence, from January 1, 2021 to October 1, 2021. Of those, 778,681 reports include a COVID-19 vaccine from one of three manufacturers: Pfizer-BioNTech (53%), Moderna (39%), and Janssen (8%). The investigation was based on AEs in VAERS coded at the MedDRA Preferred Term (PT) level and products at the ‘manufacturer’ level, e.g., ‘COVID19_PFIZER/BIONTECH.’

2.2 Adverse Events of Interest

The seven AEs investigated in this study and their associated MedDRA PTs are listed below. The MedDRA PTs associated with each of the seven AEs were used to identify VAERS reports mentioning a given AE.

  1. Bell's palsy (PT = ‘Facial paralysis’ or ‘Bell's palsy’)

  2. Myocarditis (PT = ‘Myocarditis’)

  3. Pericarditis (PT = ‘Pericarditis’)

  4. Appendicitis (PT = ‘Appendicitis’ or ‘Appendicitis perforated’ or ‘Complicated appendicitis’)

  5. Pulmonary embolism (PT = ‘Pulmonary embolism’)

  6. Herpes zoster (PT = ‘Herpes zoster’)

  7. Tinnitus (PT = ‘Tinnitus’)

These AEs were selected for our investigation because they demonstrated strong masking effects and are supported by other sources. They were identified using an approach to screen and rank masked associations, which is described in Sect. 2.5 below. As noted in the Introduction, five of these AEs are partially recognized and are part of a list of AEs deemed to be of special interest for COVID-19 vaccine surveillance by the CDC and the FDA [12, 13]. The last two AEs (herpes zoster and tinnitus) were discovered through this investigation but are yet to be characterized as fully as the other five. Nonetheless, they are accompanied by strong statistical as well as published support, which is why they are included. Although we discovered other associations that exhibit masking effects, such as injection site pain, they did not appear strong or serious enough for inclusion in our evaluation.

2.3 Signal Detection Methodologies

We evaluated disproportionality statistics produced by four signal detection methodologies. A summary and short description of these methodologies, as well as the statistics they compute, is provided in Table 4 and further described in the following. Three of these methodologies, Multi-item Gamma Poisson Shrinker (MGPS) [14], Bayesian Confidence Propagation Neural Network (BCPNN) [15], and proportional reporting ratio (PRR) [16], are well-established and are currently deployed by various organizations worldwide for routine safety surveillance. However, because these three methodologies are based on 2 × 2 disproportionality analysis, they are unable to, and were not designed to, control masking and certain confounding effects. We use these three methodologies as our baseline to investigate and verify masking effects. The fourth methodology, RGPS [11], is a signal detection methodology based on logistic regression that is designed to produce disproportionality statistics with adjusted background rates that can control masking and more extensive confounding effects. It operates by fitting a separate Bayesian logistic regression model to each target AE and by automatically selecting the predictors to be included in each model. The automatically selected predictors are products (vaccines in this case) that are statistically associated (based on unadjusted disproportionality statistics) with the target event and are represented as indicator variables. In addition, stratification categories are grouped by target AE rates and are represented as multiple regression intercepts. To address masking, RGPS adjusts a given target disproportionality statistic for the presence of other products that also have large unadjusted disproportionalities (the regression predictors). This adjustment can be either positive or negative.
When a non-target product with a large disproportionality never appears in the same report as the target product, the adjusted background AE rate will be lower and the target AE rate will be higher, in which case the association has been unmasked. Conversely, if that high-disproportionality non-target product is often co-reported with the target product, the AE rates of the two products will be confounded, and the adjusted target event rate for each of the two products will be shrunk to express the uncertainty about which is the true causal factor when all three items (the two products and the target event) occur in the same report.
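
The unmasking direction of this adjustment can be illustrated with a deliberately simplified stand-in for RGPS. The sketch below is not RGPS: it assumes a plain (non-Bayesian) logistic regression with one indicator per product, no shrinkage, no stratification, and no automatic predictor selection, and it reports odds ratios rather than the ERAM statistic. The counts are those implied by the toy example of Tables 2 and 3 in the Introduction (product ‘A’: 3 of 13 reports mention the target AE; product ‘B’: 80 of 160; all remaining reports: 10 of 220). All names in the code are hypothetical.

```python
import math

def log_odds(events, reports):
    """Log-odds of the target AE within a group of reports."""
    return math.log(events / (reports - events))

# Toy counts: (reports mentioning the target AE, all reports in the group)
groups = {
    "A": (3, 13),        # target product
    "B": (80, 160),      # masking product
    "other": (10, 220),  # reports mentioning neither 'A' nor 'B'
}

# Crude comparison: product 'A' versus every other report, 'B' included.
rest_events = groups["B"][0] + groups["other"][0]
rest_reports = groups["B"][1] + groups["other"][1]
crude_or = math.exp(log_odds(*groups["A"]) - log_odds(rest_events, rest_reports))

# Model: logit P(AE) = b0 + bA*[A] + bB*[B]. Because the indicators are mutually
# exclusive in this toy data, the model is saturated and its maximum-likelihood
# fit equals the observed log-odds of each group, so no iterative solver is needed.
b0 = log_odds(*groups["other"])
b_a = log_odds(*groups["A"]) - b0
b_b = log_odds(*groups["B"]) - b0

print(f"crude odds ratio for A:    {crude_or:.2f}")      # 0.97
print(f"adjusted odds ratio for A: {math.exp(b_a):.2f}")  # 6.30
print(f"adjusted odds ratio for B: {math.exp(b_b):.2f}")  # 21.00
```

Giving product ‘B’ its own coefficient removes its reports from the background against which product ‘A’ is compared, so the association for ‘A’ moves from roughly 1 to well above it. When products are co-reported, the indicators overlap and the fit no longer has this closed form, which is the confounding case described above.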

Table 4 Signal detection methodologies and disproportionality statistics used to investigate signals of coronavirus disease 2019 (COVID-19) vaccine adverse events

Additional details on the RGPS methodology are provided in the Supporting Information (SI1) (see the electronic supplementary material), and complete details are given in Ref. [11].

The stratification categories used for RGPS, MGPS, and BCPNN were age and gender. Stratification by ‘report year’ was not applied because the vast majority of COVID-19 VAERS reports represent a single year of reporting (2021). We applied the canonical version of PRR, which does not require stratification. For RGPS and MGPS, we generated both the point estimates, labeled Empirical-Bayes Regression-adjusted Arithmetic Mean (ERAM) and Empirical Bayes Geometric Mean (EBGM), respectively, and their associated credible intervals labeled ER05–ER95 and EB05–EB95, respectively. Unless specified otherwise, signal scores are represented by the point estimates. The generation of signal scores for the four methodologies considered in this study and analysis thereof was done using Oracle Empirica Signal 9.1 [17].

2.4 Capturing the Evolution of Signals

The evolution of signal scores for each AE was captured by a time series of signal statistics. The time series runs from a period at which initial reports for an AE were available to the latest batch of reports available at the time of writing this article. Each time point corresponds to a biweekly public release of VAERS reports, starting from week 3 (W3), January 22, 2021, and ending in week 39 (W39), October 1, 2021, for a total of 19 time points. The signal statistics computed for each time point include the signal score point estimate and its credible interval, e.g., ER05-ERAM-ER95 for RGPS and EB05-EBGM-EB95 for MGPS. These were computed based on all data available in VAERS and not only the COVID-19 reports or data within the range of dates underlying the time series.
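
As a consistency check on the time axis, the labels and dates of the time points can be generated from the stated start date and the 2-week spacing. This is a small sketch with illustrative constant names; the only inputs taken from the text are the W3 date (January 22, 2021), the W39 endpoint, and the 2-week interval.

```python
from datetime import date, timedelta

FIRST_WEEK, LAST_WEEK, STEP_WEEKS = 3, 39, 2
FIRST_RELEASE = date(2021, 1, 22)  # W3, as stated in the text

# One (label, release date) pair per time point, spaced 2 weeks apart.
time_points = [
    (f"W{week}", FIRST_RELEASE + timedelta(weeks=week - FIRST_WEEK))
    for week in range(FIRST_WEEK, LAST_WEEK + 1, STEP_WEEKS)
]

print(len(time_points))   # 19
print(time_points[0])     # ('W3', datetime.date(2021, 1, 22))
print(time_points[-1])    # ('W39', datetime.date(2021, 10, 1))
```

The generated series has 19 points and ends on October 1, 2021, matching the description above.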

2.5 Analysis and Evaluation

The comparison of signal detection methodologies for the time series centers on the RGPS and MGPS methodologies. These were chosen as representatives of the two classes of methodologies described in the ‘Introduction’ and above: MGPS represents the class of methodologies based on 2 × 2 disproportionality analysis that are unable to address masking, and RGPS represents the more advanced class of methodologies based on regression that can address masking. The information component (IC) statistic [15] computed by the BCPNN methodology produces signal scores that are almost identical to those produced by MGPS and is therefore redundant in many parts of our evaluation. The PRR signal statistic in its canonical application does not include smoothing or signal score adjustments for small counts, as the other methodologies do, and therefore does not protect against false alarms as well as they do. For this reason, a direct comparison against PRR (in its canonical form) would not have allowed us to isolate and explain sources of undetected signals. Nonetheless, both PRR and the IC statistic are used to confirm masking effects using the approach discussed in the following and presented in the ‘Results’ section.

Table 5 defines several concepts and conditions that we use to evaluate signals and to describe our findings in the ‘Results’ section. These include the concept of a signaling threshold, criteria to decide if a signal is detected or not (signal present/absent), a condition we use to decide if the difference between signal scores produced by different methodologies is statistically significant, a condition we use to screen candidate associations for masking, and the calculation we use to quantify the size of a masking effect.

Table 5 Concepts and conditions used to evaluate signals

Having generated the time series of signal scores for each AE of interest, we investigate and attempt to validate masking sources based on the following:

  (1) We select two time periods: an earlier point in the evolution of signals when masking starts to take effect, and the end period (W39). Doing so allows us to examine the origin of the masking sources and whether the sources change over time. The earlier time point corresponds to the earliest point in the time series (for both the Pfizer-BioNTech and Moderna vaccines) for which the RGPS and MGPS signal scores were significantly different, and RGPS’s signal score exceeded the signaling threshold as defined above.

  (2) For each time point, we evaluate the predictors that are automatically selected by RGPS to be included in the regression model for the target AE. Based on the regression coefficients, we then identify the strongest predictors (vaccines) as potential sources of masking.

  (3) As mentioned in the ‘Introduction,’ once masking sources have been identified, the conventional approach to control masking is to remove all reports containing the maskers and re-compute signal scores. We use this conventional approach to confirm our findings. That is, we remove reports containing the potential maskers (vaccines) identified by RGPS and re-compute signal scores for the signaling methodologies based on 2 × 2 disproportionality analysis (MGPS, PRR, BCPNN). Substantial increases in these signal scores, together with their convergence toward the original RGPS signal score, are a strong indication that the sources of masking have been correctly identified and offer a likely explanation for undetected or delayed statistical signals.
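
The confirmation step in item (3) can be sketched on report-level data. The snippet below is an illustration only, not the Oracle Empirica Signal implementation used in this study. It assumes the standard 2 × 2 definition of the PRR, synthetic single-product reports whose counts follow the toy example of Tables 2 and 3, and hypothetical names (`prr`, `make_reports`, `maskers`).

```python
def prr(reports, product, event):
    """Proportional reporting ratio from report-level (products, events) pairs."""
    a = sum(1 for p, e in reports if product in p and event in e)
    b = sum(1 for p, e in reports if product in p and event not in e)
    c = sum(1 for p, e in reports if product not in p and event in e)
    d = sum(1 for p, e in reports if product not in p and event not in e)
    return (a / (a + b)) / (c / (c + d))

def make_reports(product, n_with_event, n_total):
    """Synthetic single-product reports; 'AE' marks the target event."""
    with_event = [({product}, {"AE"})] * n_with_event
    without_event = [({product}, set())] * (n_total - n_with_event)
    return with_event + without_event

reports = (make_reports("A", 3, 13)
           + make_reports("B", 80, 160)
           + make_reports("other", 10, 220))

# Hypothetical masker list, standing in for the strongest regression predictors.
maskers = {"B"}
filtered = [(p, e) for p, e in reports if not (p & maskers)]

before = prr(reports, "A", "AE")
after = prr(filtered, "A", "AE")
print(f"PRR before removal: {before:.2f}")  # 0.97
print(f"PRR after removal:  {after:.2f}")   # 5.08
print(f"increase: {100 * (after - before) / before:.0f}%")
```

A large increase after removal is the pattern that the procedure treats as confirmation. The sketch does not guard against an empty comparison group (a zero count in `c`), which a real implementation would need to handle.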

3 Results

Figures 1 and 2 and Table 6 depict our findings for each of the seven AEs investigated in this study. The figures display the evolution of signal scores for each AE captured as a time series of signal scores, whereas Table 6 provides signal scores for each AE averaged across the time series. As described in the ‘Materials and Methods’ (Sect. 2.4), the time series ranges from W3 to W39 of COVID-19 reports, for a total of 19 time points in 2-week intervals corresponding to the biweekly public release of VAERS reports.

Fig. 1 The evolution of signal scores for Bell's palsy, myocarditis, pericarditis, and appendicitis. MGPS Multi-item Gamma Poisson Shrinker, RGPS Regression-Adjusted Gamma Poisson Shrinker, W week

Fig. 2 The evolution of signal scores for pulmonary embolism, herpes zoster, and tinnitus. MGPS Multi-item Gamma Poisson Shrinker, RGPS Regression-Adjusted Gamma Poisson Shrinker, W week

Table 6 Average signal score and average masking effect for Bell's palsy, myocarditis, pericarditis, appendicitis, pulmonary embolism, herpes zoster, and tinnitus

Rows in the figures correspond to AEs, and columns to vaccines (Pfizer/BioNTech vs Moderna). Figure 1 covers the AEs Bell's palsy, myocarditis, pericarditis, and appendicitis, whereas Fig. 2 covers the AEs pulmonary embolism, herpes zoster, and tinnitus. Each figure displays a time series of signal scores for the RGPS and MGPS methodologies. Each point corresponds to the signal score point estimate and its credible interval (shaded region), i.e., ER05-ERAM-ER95 for RGPS and EB05-EBGM-EB95 for MGPS. Table 6 summarizes and supplements the figures by providing average signal scores for RGPS and MGPS (ERAM and EBGM, respectively) across each time series, as well as the average masking effect size defined in Sect. 2.5/Table 5. Finally, supporting information (SI2) (see the electronic supplementary material) provides signal statistics for all combinations of AE/vaccine/signaling methodology, including signal statistics for the PRR and BCPNN methodologies.

The figures clearly show several trends:

  (1) The time series curves of signal scores produced by RGPS are always above those of MGPS, i.e., the RGPS signal scores are always larger than those of MGPS. This is not an expected pattern and is indicative of masking effects for the AEs of interest. It also suggests that the RGPS methodology would have been able to detect signals missed by MGPS or identify signals at an earlier time point than MGPS. According to Table 6, the average masking effect size ranges from around 40% for Bell’s palsy to around 230% for herpes zoster, and the average signal score corrected for masking (RGPS) exceeds the signaling threshold.

  (2) For most AEs, RGPS and MGPS initially agree on their signal scores (statistically insignificant differences) and then diverge. The divergence is likely due to the influence of masking effects, the evolution of VAERS data, and possibly changes in reporting practices.

  (3) For several AEs, the time series exhibits an acute increase in signal score values at certain time points. These acute increases are likely explained by, or coincide with, external events, such as the availability of a vaccine to certain age groups and the influence of publications.

  (4) For certain AEs at certain time points, the signal scores fall below the signaling threshold. This indicates that at those time points statistical signals would have been undetected and that statistical signaling may be time sensitive.

  (5) As more data accumulate, signal scores stabilize, as expected. Larger fluctuations are seen for RGPS, indicating that it is sensitive to masking and confounding effects and that the data may still be evolving.

The following describes our findings for each AE of interest.

3.1 Bell’s Palsy

Bell's palsy is a form of acute facial paralysis with a weakening and a drooping appearance of the facial muscles usually on just one side of the face. In most cases, the paralysis resolves spontaneously within several weeks. Bell's palsy is due to swelling of the facial nerve, and type I interferons have been proposed as the potential mechanism [18]. Incidents of Bell’s palsy were reported in clinical trials for both the Pfizer-BioNTech and Moderna vaccines, and it has also been documented with the influenza vaccine [19, 20]. The FDA currently recommends its surveillance with larger populations globally. In addition, there have been multiple case reports of Bell's palsy associated with the mRNA vaccines [19, 21,22,23], and several studies that investigated the association [24,25,26].

As of W39, there are 7795 reports of Bell's palsy for the mRNA vaccines (5684 Pfizer-BioNTech, 2111 Moderna). The time series in Fig. 1 shows that the signal scores produced by each methodology differ by a small amount, with RGPS and MGPS diverging (non-overlapping credible intervals) around W7–9. The figure also shows that a mild masking effect is present (40% averaged across the time series). Regardless of masking, all methods agree early on that the reported co-occurrence of the mRNA vaccines with Bell’s palsy is unlikely to be due to chance (signal scores exceeding the signaling threshold). However, toward the end of the period (W33), the MGPS signal scores fall below the signaling threshold for the Moderna vaccine.

3.2 Myocarditis and Pericarditis

Myocarditis and pericarditis refer to inflammation of the heart muscle and of the outermost layer of the heart, respectively. Both are thought to be caused by viral infections, and symptoms include chest pain, shortness of breath, and irregular heartbeat appearing within several days after the second dose of the mRNA vaccines. Several case reports of myocarditis and pericarditis developing rapidly after the first and second doses of the mRNA vaccines have been published [27,28,29,30,31], as well as several retrospective studies [13, 32,33,34,35] identifying them as rare complications of the vaccines. One study in mice suggests that inadvertent intravenous injection of COVID-19 mRNA vaccines may induce myopericarditis [36].

The risk of myocarditis following vaccination has been observed to be highest among young males. The CDC has recognized the association with the COVID-19 mRNA vaccines [2], and both myocarditis and pericarditis now appear on the product labels (warning section) of the vaccines [37, 38].

As of W39, there are 4690 reports of myocarditis for the mRNA vaccines (3515 Pfizer-BioNTech, 1175 Moderna) and 3079 reports of pericarditis for the mRNA vaccines (2408 Pfizer-BioNTech, 671 Moderna) in the VAERS system. Relative to the total number of cases for these AEs, 87% of myocarditis cases and 83% of pericarditis cases are associated with the mRNA COVID-19 vaccines.

The changing age distribution of COVID-19 vaccine recipients can be observed in the progression of the time series. Figure 1 shows that both the RGPS and MGPS signal scores for myocarditis were initially not indicative of a safety signal, but around W19–21 (week ending May 30, 2021), as the COVID-19 vaccines were made available in the US to people under 65 years, a substantial increase in both signal scores can be observed. At this point RGPS and MGPS start diverging, with MGPS remaining on point and RGPS showing a gradual increase from a signal score of 2.3 to above 9.0 (Pfizer-BioNTech) and 1.5 to above 5.0 (Moderna). Similar trends of signal score progression are observed for pericarditis, with a slight decrease in RGPS signal scores around W31–33 onwards.

The size of the masking effect for myocarditis is ranked second for the AEs of interest, with an average value around 190%. For pericarditis, the effect size is 70%. The sources of masking for myocarditis were evaluated based on the process described in Sect. 2.5. The two time periods examined were W19 and W39. RGPS automatically selected 20 (W19) and 39 (W39) vaccine predictors for the myocarditis regression model. The strongest predictors for both time points were a set of three smallpox vaccines (at the manufacturer level), which is consistent with published reports recognizing myocarditis as a rare AE of the smallpox vaccine [39,40,41].

Upon removal of all reports containing the smallpox vaccines on W19, the PRR, EBGM, and IC signal scores indeed reverted to larger signal scores close in magnitude to RGPS’s original signal score. The PRR signal score for the Pfizer-BioNTech vaccine increased from 1.44 to 2.48 (72%), and for the Moderna vaccine, from 0.8 to 1.34 (67%). Similarly, the EBGM signal score for the Pfizer-BioNTech vaccine increased from 1.44 to 2.17 (51%), and from 0.94 to 1.42 (51%) for the Moderna vaccine. As more data accumulated in VAERS, the Pfizer-BioNTech and Moderna COVID-19 vaccines were also identified by RGPS as potential maskers. In this case, they masked each other for the myocarditis AE. On W39, the Pfizer-BioNTech vaccine was identified by RGPS as the strongest masker. Removing all reports containing the Pfizer-BioNTech vaccine led to a substantial increase in signal scores for the Moderna–myocarditis association. The PRR signal score increased from 1.2 to 4.98 (315%), and the EBGM score increased from 1.32 to 2.13 (61%). This demonstrates how the Pfizer-BioNTech vaccine is masking the Moderna vaccine, and how masking sources may evolve over time. In addition to the COVID-19 vaccines, the smallpox vaccines were still identified by RGPS as strong sources of masking on W39. Removing both smallpox and Pfizer-BioNTech vaccines led to the following additional increases for the Moderna association: PRR increased from 4.98 to 8.14 (63%) and EBGM increased from 2.13 to 2.4 (13%). Similarly, removing the smallpox and Moderna vaccines led to the following increases for the Pfizer-BioNTech-myocarditis association: PRR increased from 5.42 to 10.96 to 17.94 (230%) and EBGM increased from 1.94 to 2.02 to 2.12 (9%).
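
The percentage increases quoted in this paragraph are relative changes of the form (after − before) / before. The short sketch below recomputes a few of them from the scores given in the text; the helper name `pct_increase` and the row labels are illustrative, and the inputs are the rounded scores as printed, so small differences from the quoted percentages are to be expected.

```python
def pct_increase(before, after):
    """Relative increase of a signal score, in percent."""
    return 100.0 * (after - before) / before

# (label, score before removal, score after removal, percentage quoted in the text)
quoted = [
    ("W19 PRR, Pfizer-BioNTech, smallpox removed", 1.44, 2.48, 72),
    ("W19 EBGM, Pfizer-BioNTech, smallpox removed", 1.44, 2.17, 51),
    ("W39 PRR, Moderna, Pfizer-BioNTech removed", 1.20, 4.98, 315),
    ("W39 EBGM, Moderna, Pfizer-BioNTech removed", 1.32, 2.13, 61),
    ("W39 PRR, Pfizer-BioNTech, smallpox and Moderna removed", 5.42, 17.94, 230),
]

for label, before, after, stated in quoted:
    print(f"{label}: {pct_increase(before, after):.1f}% (text: {stated}%)")
```

The recomputed values agree with the quoted percentages to within about one percentage point, which is consistent with rounding of the displayed scores.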

3.3 Appendicitis

Appendicitis is an inflammation of the appendix usually caused by an obstruction of the appendiceal lumen; however, the exact etiology of acute appendicitis is often unknown. Appendicitis is the most common cause of acute abdominal pain requiring surgery. If left untreated, acute appendicitis can result in serious complications, such as peritonitis or abscess formation [42, 43]. According to the Pfizer-BioNTech COVID-19 Vaccine Fact Sheet for Healthcare Providers, appendicitis was reported as a serious AE in a clinical trial for eight vaccine participants and four placebo participants (Pfizer-BioNTech COVID-19 vaccine = 10,841; placebo = 10,851), but not during post-authorization experience [37]. The Moderna COVID-19 Vaccine Fact Sheet for Healthcare Providers does not mention appendicitis as an AE in clinical trials or in post-authorization experience [38]. However, both the Pfizer-BioNTech and Moderna Fact Sheets for Healthcare Providers mention lymphadenopathy as a reported AE during clinical trials. Barda et al. demonstrated an elevated risk ratio for appendicitis (risk ratio 1.40; 95% confidence interval [CI] 1.02–2.01) with the Pfizer-BioNTech COVID-19 vaccine in a mass nationwide vaccination setting [44].

As of W39, there are 725 reports of appendicitis for the mRNA vaccines (537 Pfizer-BioNTech, 188 Moderna) in VAERS. As shown in Fig. 1, both MGPS and RGPS showed extremely large signal scores early on that attenuated over time but remained high for RGPS, with values above 3.7 for Pfizer-BioNTech and above 1.7 for Moderna. This early signaling by W3 appeared even when the number of reports was small (15 Pfizer-BioNTech, 6 Moderna). RGPS and MGPS started diverging around W11, likely due to masking, and the figure shows a relatively large masking effect. Averaged across the time series, the size of the masking effect was high, at around 100% for both vaccines.

3.4 Pulmonary Embolism

Pulmonary embolism is a sudden blockage in a lung artery. It usually happens when a blood clot breaks loose and travels through the bloodstream to the lungs. Pulmonary embolism is a serious condition that can cause permanent damage to the lungs, low oxygen levels in the blood, and damage to other organs in the body from not getting enough oxygen. Pulmonary embolism can be life-threatening, especially if a clot is large, or if there are many clots [45].

Systematic reviews and meta-analyses showed high incidences of pulmonary embolism in COVID-19 patients [46, 47]. Barda et al. reported an elevated risk ratio for pulmonary embolism (risk ratio 12.14; 95% CI 6.89–29.20) for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-infected compared to uninfected persons [44].

Besides COVID-19 itself, it appears that COVID-19 vaccines increase the risk for pulmonary embolism; several authors reported the occurrence of pulmonary embolism, often in combination with vaccine-induced thrombotic thrombocytopenia (VITT), following COVID-19 vaccination, mainly for adenovirus-based COVID-19 vaccines [48,49,50,51,52,53,54]. Although no increased risk for pulmonary embolism was found by Klein et al. for mRNA vaccines [12] and by Barda et al. for Pfizer-BioNTech [44], some case reports described the occurrence of pulmonary embolism following vaccination with Pfizer-BioNTech [55,56,57]. As of this writing, pulmonary embolism is not mentioned in the vaccine labels of the Pfizer-BioNTech and the Moderna COVID-19 vaccines.

As of W39, there are 5869 reports of pulmonary embolism for the mRNA vaccines (4394 Pfizer-BioNTech, 1475 Moderna) in VAERS. Figure 2 shows that both MGPS and RGPS already exceed the signal threshold for pulmonary embolism in W3 for both vaccines. Starting on W9, RGPS departs from MGPS and remains at about three times the MGPS value. Averaged across the time series, the size of the masking effect was high, at around 170% for both vaccines. The MGPS time series for Moderna decreases to below the signaling threshold in W39, whereas RGPS remains well above the threshold. For Pfizer-BioNTech, MGPS and RGPS remain above the signaling threshold, with RGPS at about three times the value of MGPS.
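One way to read the size of the masking effect is as the percentage by which the RGPS score exceeds the MGPS score in a given week, averaged over the weeks of the time series. That reading is an assumption made here for illustration only; the working definition is the one given in Sect. 2.5 and may differ. Under this reading, a sustained threefold ratio corresponds to a 200% effect in those weeks, and early weeks with little or no divergence pull the average down. The weekly scores in this minimal Python sketch are hypothetical and are not values read from Fig. 2; the name `masking_effect_pct` is likewise ours.

```python
def masking_effect_pct(rgps: float, mgps: float) -> float:
    """Percent by which the RGPS score exceeds the MGPS score.

    ASSUMPTION: this relative-difference form is an illustrative reading of
    'size of the masking effect', not a definition taken from the article.
    """
    return (rgps - mgps) / mgps * 100.0


# HYPOTHETICAL weekly (RGPS, MGPS) score pairs, for illustration only.
weekly_scores = [
    (4.0, 4.0),  # early week: the two methods agree, no apparent masking
    (3.0, 1.5),  # the methods start to diverge
    (3.0, 1.0),  # RGPS is three times MGPS
    (3.3, 1.1),  # RGPS is still three times MGPS
]

effects = [masking_effect_pct(r, m) for r, m in weekly_scores]
mean_effect = sum(effects) / len(effects)

for (r, m), e in zip(weekly_scores, effects):
    print(f"RGPS={r:.1f}  MGPS={m:.1f}  masking effect={e:.0f}%")
print(f"Average over the series: {mean_effect:.0f}%")
```

Averaging per-week percentages is only one possible summary; a ratio of averaged scores would generally give a different number.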

3.5 Herpes Zoster

Herpes zoster (shingles) is a painful rash that develops on one side of the face or body. The rash consists of blisters that typically clear within 2–4 weeks [58]. Multiple reports of patients who developed herpes zoster shortly after COVID-19 vaccination have been recently published [59,60,61,62,63,64], as well as observational studies and systematic reviews [44, 65,66,67], which suggest a potential link with the mRNA COVID-19 vaccines. Possible mechanisms that explain the pathogenic link are related to the stimulation of innate immunity through toll-like receptors 3 and 7 by mRNA-based vaccines [65].

As of W39, there are 8228 reports of herpes zoster for the mRNA vaccines (5637 Pfizer-BioNTech, 2591 Moderna). Figure 2 shows a substantial difference between RGPS and MGPS, with MGPS indicating that there is no statistical association between herpes zoster and the vaccines (signal scores below the signaling threshold), versus RGPS indicating the contrary (signal scores exceeding the signaling threshold) from W13 (Pfizer-BioNTech) and W17 (Moderna) through the remaining time periods. Although the value of the RGPS signal score is not large relative to the other AEs, it indicates that the association is unlikely to be due to chance.

Interestingly, the size of the masking effect for herpes zoster was the largest among the AEs of interest. Averaged across the time series, the size of the masking effect was 230% for both mRNA vaccines. The sources of masking were evaluated and validated based on the process described in Sect. 2.5. The two time periods examined were W17 and W39. RGPS automatically selected 67 (W17) and 44 (W39) vaccine predictors for the herpes zoster regression model. The strongest predictors were the varicella (chickenpox) and the VARZOS (a combination varicella and zoster) vaccines, for a total of six vaccine predictors at the manufacturer level. Although the risk is low, there are documented cases and studies of herpes zoster following varicella and VARZOS vaccination [68,69,70]. Upon removal of all reports containing the varicella and VARZOS vaccines, we found that the PRR, EBGM, and IC signal scores indeed reverted to larger signal scores close in magnitude to RGPS’s original signal score. For example, the PRR signal score for the Pfizer-BioNTech vaccine increased from 0.37 to 1.47 (297%) on W17 and from 0.76 to 2.3 (202%) on W39. Similarly, the EBGM signal score increased from 0.35 to 1.47 (320%) on W17 and from 0.66 to 1.48 (124%) on W39. In addition, we found that these masking sources (i.e., the varicella and VARZOS vaccines) did not change over time and remained consistent at both time periods that were evaluated.
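The percentage increases quoted above follow from the before and after scores by simple relative change. The following minimal Python sketch recomputes them from the rounded Pfizer-BioNTech herpes zoster scores given in the text. The helper name `pct_increase` is ours, and because the inputs are rounded, a recomputed percentage can differ from the published one by about a point.

```python
def pct_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Rounded Pfizer-BioNTech herpes zoster scores quoted in the text:
# (label, score on the full data, score after removing all reports
#  containing the varicella and VARZOS vaccines)
quoted_scores = [
    ("PRR,  W17", 0.37, 1.47),
    ("PRR,  W39", 0.76, 2.30),
    ("EBGM, W17", 0.35, 1.47),
    ("EBGM, W39", 0.66, 1.48),
]

for label, before, after in quoted_scores:
    change = pct_increase(before, after)
    print(f"{label}: {before:.2f} -> {after:.2f} (+{change:.0f}%)")
```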

3.6 Tinnitus

Tinnitus is described as the sensation of hearing ringing, hissing, or other noises in one or both ears that is not caused by an external sound. Tinnitus can be intermittent or continuous and can vary in pitch and intensity. Prolonged exposure to loud sounds and a variety of other conditions can lead to tinnitus; however, the mechanism responsible for tinnitus is unclear.

Tinnitus has been linked to other vaccines such as hepatitis, rabies, measles, and H1N1 vaccines [71]. In COVID-19 vaccine trials prior to the release of the Pfizer-BioNTech and Moderna vaccines, no mention was made of the onset of tinnitus or worsening tinnitus for either vaccine. As early as March 2021, in a report from the United Kingdom Medicines and Healthcare products Regulatory Agency (MHRA), 196 tinnitus cases among 33,207 vaccinated persons were recorded for the Pfizer-BioNTech vaccine [72], and since then, several case reports linking tinnitus to the mRNA vaccines as well as to the Janssen and AstraZeneca vaccines have been published [72,73,74,75]. In addition, due to an apparently increased number of individuals experiencing tinnitus during the pandemic period, the connection between the vaccines and tinnitus received special attention in various media outlets and professional associations dedicated to tinnitus [76, 77]. As of this writing, tinnitus is not mentioned in the vaccine labels. As mentioned in the ‘Introduction,’ tinnitus is not contained in the set of AEs of interest recognized by various health organizations.

As of W39, there are 12,296 reports of tinnitus for the mRNA vaccines (7649 Pfizer-BioNTech, 4647 Moderna) in VAERS. Interestingly, the number of reports for tinnitus is substantially larger than for any of the other AEs covered in this article. Figure 2 shows that both MGPS and RGPS exceed the signal threshold early on for both vaccines and remain above the signaling threshold through the remaining time periods (excluding a brief crossing for MGPS and Moderna on W9–15). RGPS and MGPS start diverging on W15–17, with RGPS rapidly increasing to signal score values twice as large in a short amount of time. This appears correlated with the increase in the number of reports available throughout the period and likely reflects the dynamics of masking effects.

Averaged across the time series, the size of the masking effect was high, at around 80% for both vaccines. Based on the process described in Sect. 2.5, we evaluated the sources of masking for tinnitus. The two time periods examined were W17 and W39. RGPS automatically selected 21 (W17) and 25 (W39) vaccine predictors for the tinnitus regression model. For W17, the strongest predictors and potential maskers identified by RGPS were the HPV4 (human papillomavirus) vaccine and the Janssen and Pfizer-BioNTech COVID-19 vaccines. Hence, on W17, two COVID-19 vaccines were already masking other associations: Janssen was masking the Pfizer-BioNTech and Moderna COVID-19 vaccines, and the Janssen and Pfizer-BioNTech vaccines were masking the Moderna vaccine. Removing all reports containing these three vaccines (HPV4, Janssen, and Pfizer-BioNTech) resulted in the expected signal score increases for the Moderna-tinnitus association, with PRR increasing from 1.79 to 2.5 (40%) and EBGM increasing from 1.13 to 1.43 (27%). As expected, on W39, as more data accumulated in VAERS, the Pfizer-BioNTech and Moderna vaccines were identified by RGPS as the strongest maskers (masking each other) in addition to the Janssen vaccine. On W39, the HPV4 vaccine was no longer identified as a strong masker. Removing reports containing the Janssen and Pfizer-BioNTech vaccines led to the following signal score changes for the Moderna-tinnitus association: PRR increasing from 1.8 to 5.5 (205%) and EBGM increasing from 1.18 to 1.69 (43%). Similarly, removing reports containing the Janssen and Moderna vaccines led to the following signal score changes for the Pfizer-BioNTech-tinnitus association: PRR increasing from 2.75 to 6.67 (143%) and EBGM modestly increasing from 1.57 to 1.71 (9%). This demonstrates that the Pfizer-BioNTech and Moderna vaccines may mask each other to varying degrees, with Pfizer-BioNTech in this case having a larger effect on Moderna than vice versa.

3.7 Masking Statistics at the Database Level

Table 7 displays counts for the number of potentially masked associations in VAERS categorized by vaccine type. The conditions that define a potentially masked association are provided in the ‘Materials and Methods’ (Sect. 2.5; Table 5, candidate association for masking). The table shows that the likelihood of a masked association for the COVID-19 vaccines is 2.3%, which is roughly eight times larger than for non-COVID-19 vaccines (0.3%). This result clearly demonstrates the increased potential and susceptibility of VAERS COVID-19 vaccine surveillance to the problem of masking effects.

Table 7 VAERS counts of masked associations

4 Discussion

The unprecedented dynamic and extent of reporting into VAERS for the novel class of COVID-19 vaccines may have created conditions that predispose commonly applied signal detection methodologies to the statistical issue known as masking. This in turn may limit our understanding of the risks associated with COVID-19 vaccines, as well as other vaccines, and delay their identification.

Signal detection can be approached and accomplished in many ways. In this article, we consider a specific approach and application that is routinely applied by pharmacovigilance organizations, and whose purpose is to computationally explore large databases of reported AEs for statistical patterns that are indicative of new safety issues that warrant further attention. We term this application statistical signal detection and further distinguish two classes of methodologies, one based on 2 × 2 disproportionality analysis that is prone to masking, and a more advanced class of methods that can cope with masking. Methodologies currently deployed by pharmacovigilance organizations are to a large extent based on the former class of methods and, thus, prone to masking, a motivating reason for this investigation. To abbreviate our discussion, we will refer to the former class of methods as the ‘standard’ methods.

To demonstrate such masking effects, trace their origins, and assess their impact, we center our investigation on seven AEs with various degrees of reported and statistical evidence that link them to the Pfizer-BioNTech and Moderna vaccines. Five of the AEs are largely recognized by various health authorities. The investigation enabled us to discover two potentially new AEs (herpes zoster and tinnitus), which are yet to be recognized by health authorities, but which have overwhelming statistical support in VAERS and are supported by published case reports and studies. These seven AEs were identified and selected for this investigation based on criteria to screen and rank masked associations described in the ‘Methods.’ We do not extrapolate and claim that masking is often prevalent because it was identified for these seven AEs; neither do we suggest that masking is limited to just these AEs. Rather, we argue that masking is an issue that is important and addressable, and an issue that can be impactful in situations such as COVID-19 vaccine safety surveillance and other emergency use authorization products.

In the investigation, we traced the evolution of signals related to the seven AEs during the course of the initial year of COVID-19 vaccination and the accompanying availability of COVID-19 vaccine AE reports made public in VAERS. This temporal evaluation led to several findings. We surmise that these findings are important not only for the COVID-19 vaccines currently approved and investigated in this article, but also for any new COVID-19 vaccines that might be approved in the future and, likewise, for any new vaccine (or drug) approved for use in the future.

The results show that statistical signals for AEs related to COVID-19, and possibly other vaccines, may go undetected or be delayed due to masking when generated by standard methodologies. The results also suggest that properly identifying and addressing the masking effect exposes strong statistical associations that would otherwise be deemed uninteresting. For example, the tinnitus and herpes zoster signals may have been overlooked partly due to the low signal scores produced for them by standard methodologies. Similarly, signals for the other five AEs may have been delayed by the same standard methodologies. As mentioned, safety surveillance and signal detection are not limited to statistical approaches, and fortunately, these other five AEs had already been well characterized by the FDA, CDC, and other sources.

We found that although the masking effect is rare relative to the entire set of possible associations between vaccines and AEs (representing 0.5% of the total number of unique associations), it is roughly eight times more likely to occur with COVID-19 vaccines than with other vaccines. As mentioned, this may be explained by the unique dynamic and extent of reporting into VAERS for the class of COVID-19 vaccines. Furthermore, the volume of reporting for COVID-19 vaccines is likely to influence future statistical associations with other new vaccines. This suggests that masking may become more frequent and should be carefully considered.

The results also demonstrate that masking is not a static effect but rather a dynamically changing and evolving effect in terms of its origins, direction, and strength. Naturally, this is due to the evolving nature of data. For example, we found that in earlier time periods, non-COVID-19 vaccines could mask signals associated with COVID-19 vaccines, whereas in later time periods, as more COVID-19 reports accumulate, the Pfizer-BioNTech and Moderna vaccines can mask each other and likely other vaccines. This suggests that the assessment of masking should be done on a continuum rather than be a point-in-time exercise and, more generally, that statistical signal detection is time sensitive. Relatedly, it appears that the VAERS data for COVID-19 vaccine surveillance are still evolving and susceptible to external influences, such as vaccination policies, publication influences, reporting practices, and updates to the MedDRA terminology. This in turn could contribute to signal score fluctuations, resulting in time-dependent signaling uncertainty.

Masking effects have been traditionally addressed by removing cases containing the ‘offending’ product, by using stratification, or by employing regression techniques. However, each of these approaches requires to some extent identifying masking sources prior to signaling, which may limit the utility of signal detection in scenarios where masking is present and where the goal is unconstrained hypothesis generation. This investigation was made possible by using a methodology that automatically identifies and adjusts for masking effects. Its ability to correctly identify maskers was verified for three of the seven AEs we investigated (e.g., the smallpox vaccines masking COVID-19 for myocarditis) by using the traditional approach to address masking, that is, by re-applying standard signaling methodologies on data that exclude the maskers.
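The traditional check described above, recomputing a 2 × 2 disproportionality score after dropping every report that contains a suspected masker, can be sketched in a few lines of Python. The PRR formula used is the standard one. Everything else is an assumption for illustration: the report set, the product labels (`X`, `M`, `other`), and the counts are hypothetical and chosen only to make the effect visible, and the helper names `prr`, `make_reports`, and `drop_product` are ours rather than part of any signaling tool.

```python
from typing import FrozenSet, List, Tuple

# A report is modeled as (set of products, set of events); a simplification.
Report = Tuple[FrozenSet[str], FrozenSet[str]]


def prr(reports: List[Report], product: str, event: str) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    a: reports with product and event    b: product without event
    c: event without product             d: neither
    No zero-count handling; the toy data below keeps all margins positive.
    """
    a = b = c = d = 0
    for products, events in reports:
        has_product, has_event = product in products, event in events
        if has_product and has_event:
            a += 1
        elif has_product:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return (a / (a + b)) / (c / (c + d))


def make_reports(product: str, n: int, n_event: int) -> List[Report]:
    """Build n single-product reports, the first n_event of which list event E."""
    return [
        (frozenset({product}), frozenset({"E"}) if i < n_event else frozenset())
        for i in range(n)
    ]


def drop_product(reports: List[Report], masker: str) -> List[Report]:
    """Remove every report that contains the suspected masker."""
    return [r for r in reports if masker not in r[0]]


# HYPOTHETICAL data: product X reports E in 2% of its reports, the masker M
# in 20%, and all remaining products in 1%.
all_reports = (
    make_reports("X", 1000, 20)
    + make_reports("M", 500, 100)
    + make_reports("other", 8500, 85)
)
unmasked = drop_product(all_reports, "M")

print(f"PRR for X, full data:       {prr(all_reports, 'X', 'E'):.2f}")
print(f"PRR for X, masker removed:  {prr(unmasked, 'X', 'E'):.2f}")
```

In this toy setting the masker inflates the background reporting rate of the event, so the score for product X roughly doubles once the masker's reports are excluded. As noted in the text, this check presupposes that the masker has already been identified.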

At a higher level, the results suggest that different signaling approaches may lead to drastically different results, a conclusion that is especially disconcerting in the context of COVID-19 surveillance. Unfortunately, in the absence of an ultimate benchmark, the question of which methodology to rely on is still under debate. Nonetheless, the findings highlight the utility of a more advanced class of signal detection methodologies based on regression. Given present-day computational power and recognized analytic approaches such as regression, there are few reasons to avoid the utilization of these approaches, at the very least to address acknowledged problems such as masking.

The mRNA Pfizer-BioNTech and Moderna vaccines have been demonstrated to be highly effective in preventing infection and severe illness from COVID-19. They also appear to have acceptable safety profiles, suggesting that the benefits of COVID-19 vaccination outweigh the potential risk of AEs. Consequently, AEs such as those highlighted in this article, which are also rare as far as we know, cannot be used to argue against vaccination. Moreover, statistical signal detection is inherently an exploratory hypothesis-generating process. Therefore, associations flagged by signaling approaches do not imply causal relationships and always warrant further scrutiny, including those named in this article. Notwithstanding, the strength of statistical signal detection (as an unconstrained hypothesis-generating process) lies in being fast and performed in near ‘real time.’ Analyses can be easily ‘tailored’ to a specific age group or gender, time frame, and product type. The method also has the advantage of casting a much wider net for AE reporting from millions or hundreds of millions of people and may identify rare AEs not seen in clinical trials. These advantages are critical in the ‘real time’ and the ‘real world’ environment of COVID-19 vaccine surveillance.