Arshad et al. [1] assessed whether using a sensitive method for signal generation followed by a specific method for signal evaluation in real-world studies improves the performance of vaccine safety surveillance. They did so by comparing the type I and type II errors of three approaches applied to vaccine exposures paired with negative control outcomes and simulated positive control outcomes in the same data: the historical comparator method alone, the self-controlled case series (SCCS) method alone, and the two methods combined serially. The authors conclude that serially combining epidemiological designs does not improve overall signal detection in vaccine safety surveillance.
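
To make the comparison concrete, the minimal simulation below is our own illustration, not the authors' code: the flag probabilities, sample sizes, and the treatment of the two methods' errors as independent are all assumptions chosen only to show how type I and type II errors are tallied for each strategy.

```python
import numpy as np

rng = np.random.default_rng(42)
n_negative = 1000  # negative control outcomes: no true association with the vaccine
n_positive = 1000  # simulated positive control outcomes: true association injected

# Assumed per-outcome probabilities that each method flags a signal.
# The sensitive method trades a higher false-alarm rate for fewer missed signals.
P_FLAG = {
    "historical": {"null": 0.10, "effect": 0.95},
    "sccs":       {"null": 0.03, "effect": 0.80},
}

def simulate_flags(method: str, true_effect: bool, n: int) -> np.ndarray:
    """Draw Bernoulli signal flags for n outcomes of a given truth status."""
    p = P_FLAG[method]["effect" if true_effect else "null"]
    return rng.random(n) < p

hist_neg = simulate_flags("historical", False, n_negative)
hist_pos = simulate_flags("historical", True, n_positive)
sccs_neg = simulate_flags("sccs", False, n_negative)
sccs_pos = simulate_flags("sccs", True, n_positive)

strategies = {
    "historical alone": (hist_neg, hist_pos),
    "SCCS alone": (sccs_neg, sccs_pos),
    # Serial rule: a signal survives only if the sensitive method flags it AND the
    # specific method then confirms it. Treating the errors as independent is a
    # simplification; on the same database they are correlated, which is central
    # to the authors' finding.
    "serial (historical then SCCS)": (hist_neg & sccs_neg, hist_pos & sccs_pos),
}

for name, (neg_flags, pos_flags) in strategies.items():
    type_i = neg_flags.mean()         # share of negative controls falsely flagged
    type_ii = 1.0 - pos_flags.mean()  # share of positive controls missed
    print(f"{name:>30}: type I = {type_i:.3f}, type II = {type_ii:.3f}")
```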

We support their conclusion that vaccine monitoring systems should consider the relative gains that could come from using one design alone, and should deliberate on what an appropriate surveillance methodology would be. For example, the SCCS method is not suitable for outcomes with insidious onset and long latency. In prospective surveillance, SCCS analysis is also delayed because it requires additional data accrual after the postvaccination risk window closes.
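
As a concrete illustration of this delay, the sketch below is ours, not from the paper: the 28-day risk window, the Case fields, and the toy dates are assumptions, and the crude pooled within-person rate ratio stands in for the conditional Poisson regression with age adjustment that a real SCCS analysis would use.

```python
from dataclasses import dataclass

RISK_WINDOW_DAYS = 28  # assumed postvaccination risk window

@dataclass
class Case:
    vaccination_day: int   # day of vaccination within the observation period
    event_days: list[int]  # days on which the outcome occurred
    observation_end: int   # last day of follow-up available at analysis time

def crude_within_person_ratio(cases: list[Case]) -> float:
    """Crude incidence rate ratio: outcome rate inside the risk window divided by
    the rate in the remaining (control) observation time, pooled across cases."""
    risk_events = control_events = 0
    risk_time = control_time = 0
    for c in cases:
        window_end = min(c.vaccination_day + RISK_WINDOW_DAYS, c.observation_end)
        rt = max(window_end - c.vaccination_day, 0)
        risk_time += rt
        control_time += c.observation_end - rt
        for day in c.event_days:
            if c.vaccination_day <= day < window_end:
                risk_events += 1
            else:
                control_events += 1
    if risk_time == 0 or control_time == 0 or control_events == 0:
        return float("inf")  # ratio not yet estimable with this much follow-up
    return (risk_events / risk_time) / (control_events / control_time)

# Analyzing the same toy vaccination campaign at two time points: at day 60 only
# one case has occurred and no control events have accrued after the risk window,
# so the ratio is not estimable; by day 365 a second case with an event outside
# the risk window has accrued and the estimate becomes finite.
early_snapshot = [Case(vaccination_day=30, event_days=[40], observation_end=60)]
late_snapshot = [
    Case(vaccination_day=30, event_days=[40], observation_end=365),
    Case(vaccination_day=30, event_days=[200], observation_end=365),
]
print(crude_within_person_ratio(early_snapshot))  # inf
print(crude_within_person_ratio(late_snapshot))   # finite rate ratio
```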

We would also like to provide additional context on why a serial approach can still be beneficial. A serial design can be useful in reducing systematic error, but the authors applied it in a way that differs from how it is typically used in vaccine safety surveillance. Rather than simply retesting signals flagged during initial hypothesis generation with a different analytic design on the same database, the second stage of signal evaluation (‘hypothesis testing’) in vaccine safety studies typically involves chart review validation as a gold standard [2]. We recognize that such validation is resource intensive and time consuming, but given the importance of vaccines to public health, most researchers have deemed it worthwhile in order to minimize the risk of both type I and type II errors. To refine signals and make hypothesis testing timelier, we suggest the following actions: check data quality; examine diagnostic code frequencies and check their reliability against laboratory and/or diagnostic imaging datasets; control properly for confounding; and replicate studies using different source data [2].
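
As a sketch of one of these steps, checking the reliability of a diagnostic code against a laboratory dataset, the code below is our illustration and is not drawn from reference [2]: the table layout, the column names, and the example code G61.0 (Guillain-Barré syndrome) are assumptions chosen only to show how a positive predictive value could be estimated before committing to full chart review.

```python
import pandas as pd

def code_ppv(diagnoses: pd.DataFrame, labs: pd.DataFrame, code: str) -> float:
    """Positive predictive value of a diagnostic code: the share of patients
    carrying the code who also have laboratory confirmation of the outcome."""
    coded_patients = diagnoses.loc[diagnoses["code"] == code, "person_id"].unique()
    confirmed_patients = set(labs.loc[labs["lab_confirmed"], "person_id"])
    if len(coded_patients) == 0:
        return float("nan")
    return sum(pid in confirmed_patients for pid in coded_patients) / len(coded_patients)

# Toy example: 4 patients carry the code, 3 of them are lab confirmed -> PPV = 0.75.
diagnoses = pd.DataFrame({"person_id": [1, 2, 3, 4], "code": ["G61.0"] * 4})
labs = pd.DataFrame({"person_id": [1, 2, 3, 5], "lab_confirmed": [True, True, True, True]})
print(code_ppv(diagnoses, labs, "G61.0"))
```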