Introduction

Measles is a highly infectious, acute systemic viral infection, estimated to cause over 100,000 deaths annually, despite widespread use of a safe and effective vaccine [1]. Between 2000 and 2020, an estimated 31.7 million deaths were averted because of measles vaccination and estimated global measles deaths declined by 94% [2]. In 2020, global coverage of the first dose of measles-containing vaccine (MCV1) was estimated at 84% [3]. Coverage of a second measles-containing dose (MCV2) has accelerated in the last decade: as of 2020, 179 countries had introduced MCV2 and global coverage was 70% [3, 4]. However, this level of coverage is inadequate to control measles, and progress has been stymied by persistent gaps in measles vaccination coverage, with wide variations within and across populations. Global cases have resurged since 2016, with lapses in coverage contributing to high numbers of cases and deaths in 2018 and 2019 [5, 6]. In 2019, there were almost 870,000 cases and over 200,000 deaths, the greatest number of cases since 1996 [7, 8]. Since the onset of the COVID-19 pandemic, measles vaccination coverage has declined and, as of 2021, 40 million children have missed a measles vaccine dose [9].

High quality vaccination programs routinely rely on two sources of data to identify measles outbreaks and populations at highest risk: 1) vaccination coverage monitoring; and 2) measles case surveillance. However, many countries lack high-quality vaccine coverage and/or disease incidence data. Serosurveillance for immunoglobulin G (IgG) antibodies to measles virus can account for waning vaccine-induced immunity, inaccurate recordkeeping, and immunity from natural infection, and is therefore potentially a more direct tool to identify susceptible populations and intervene prior to an outbreak [10]. Between 1996 and 2004, 17 European countries and Australia used serosurveillance to classify progress towards elimination status, including gaps in coverage and risk of localized outbreaks and epidemics [11]. In principle, serosurveillance, which allows the assessment of vaccine failure as well as infection, can also be used to assess the impact of vaccination programs, vaccine effectiveness, transmission dynamics, and predict risk of future epidemics [12].

A challenge of serosurveillance is finding a feasible, accurate, and high-throughput assay to measure measles antibody levels and estimate susceptibility in a population. The plaque reduction neutralization test (PRNT) is a functional antibody assay that measures the neutralization activity of measles antibodies regardless of isotype. A neutralizing antibody (NAb) is an antibody that defends a cell from a pathogen or infectious particle by neutralizing any effect it has biologically. Neutralization renders the particle no longer infectious or pathogenic [13]. Neutralization assays are considered the “gold standard” for determining protective immunity [12, 14,15,16]. A measles neutralizing antibody level of 120 mIU/mL is often considered the correlate of protection, although other thresholds, such as 200 mIU/mL, are used depending on which international reference serum was used to calibrate the assay and the objective of the test [17,18,19]. Quantitative values from PRNT show good correlation with immune status and predict protection against infection and disease [20]. However, using PRNT in large serological studies is impractical because it is technically demanding, expensive, labor-intensive, time-consuming, and conducted in a limited number of laboratories around the world, and because the procedures and interpretation of PRNTs are difficult to standardize between laboratories [20, 21]. Enzyme immunoassays (EIAs) are rapid, relatively inexpensive, higher-throughput assays that can be performed in most laboratories with basic equipment using commercially available kits [22]. However, EIAs are not functional assays and measure IgG isotype-specific epitopes regardless of neutralization capacity [23]. Multiple studies have reported that EIAs are less sensitive than PRNT, especially in the context of low antibody levels [14, 21, 24,25,26,27]. This may lead to individuals being misclassified by EIA as susceptible to measles in populations with low antibody levels after vaccination as a result of immunological immaturity or interference by passively acquired maternal antibodies, or with waning antibody levels after prolonged periods since vaccination, especially in the absence of boosting from exposure to wild-type virus [22]. Uniquely for measles, minor reductions in EIA sensitivity can have substantial consequences for estimating population immunity due to its high herd immunity threshold, which could result in a misallocation of resources to increase vaccination coverage.

As the use of serosurveillance to evaluate population susceptibility to and seroprotection against measles increases, understanding the diagnostic accuracy of EIAs compared to the gold standard is critical to select an appropriate assay for the target population that can achieve the research or programmatic goals [28, 29]. Although direct comparisons of measles IgG EIA results with PRNTs have been periodically reported in the literature, such comparisons are often not the main objective of the analyses [30] and lack sufficient information about assays and procedures to assess the EIA validity. This systematic review was conducted to assess, characterize, and – to the extent possible – quantify the performance of measles IgG EIAs compared to PRNT.

Methods

We followed the PRISMA statement for a systematic literature search (Supplementary Table 1) [31, 32] and followed methods for conducting and reporting systematic reviews and meta-analyses recommended by the Cochrane Screening and Diagnostic Tests Methods Group (SDTM) [33].

Registration and protocol

We documented methods of the analysis and inclusion criteria in a protocol registered with PROSPERO (registration ID: CRD42020170464).

Eligibility criteria

We included serologic studies with participants of any age from the same source population that reported an index and reference test of measles antibodies using sera, whole blood, or plasma. The index test was any EIA (in-house or commercial, including single or multiple bead-based assays [MBA]) detecting measles virus IgG antibodies. The reference test for the primary analysis was PRNT. Studies that included neutralizing tests (NT) only as the reference test were included in the review but excluded from the primary analysis.

Information sources

We identified studies through the PubMed and Embase electronic databases. The original search was conducted on 28 January 2020 and updated twice, on 8 June 2020 and 25 August 2021. After full-text screening, we attempted to acquire missing information on results from the primary investigators of studies of potential relevance.

Search strategy and selection criteria

The search strategies used terms such as “measles”, “measles vaccine”, “enzyme immunoassay”, “EIA”, “viral plaque assay”, and “PRNT”. Full PubMed and Embase search strategies are detailed in supplemental materials, S2. We included studies if the subjects were human, measured measles IgG antibodies using both an EIA and PRNT, and were published from 1946 to the most recent search (25 August 2021). The literature search was not limited by language and non-English studies were included if an English translation could be obtained. We excluded duplicate studies, basic science literature (e.g., vaccine development), conference abstracts, studies with no abstracts, reviews, and meta-analyses. In addition, we conducted snowball search strategies to identify relevant studies that may have been missed by our database searches, including reviewing the reference lists of included studies.

Study selection

We used Covidence Review Software [34] to maintain search results and conduct all screening processes. Two investigators independently assessed titles and abstracts for eligibility based on the PICOS criteria (Population = participants with and without previous measles infection from all settings, tested for measles virus IgG; Index test = EIA; Comparator = PRNT; Outcomes = EIA vs. PRNT performance, measured by sensitivity, specificity, positive predictive value, negative predictive value, c-statistic, R2, kappa, and/or percent agreement; Study design = immunologic studies). Two investigators then screened full-text studies for inclusion using the same criteria. We analyzed outcomes from the remaining relevant research studies. Disagreements between reviewers at all stages were resolved by consensus or involving a third investigator when consensus could not be reached.

Data abstraction

We developed a data abstraction tool using the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 guidelines [35] and guidance from similarly-focused reviews [36]. We pilot tested the abstraction tool on studies representative of different study designs and data quality and refined it accordingly. All authors commented on the abstraction tool and approved the final version. Four investigators abstracted data from included studies.

We abstracted the following information from each study: 1) study design and setting, e.g., country in which the study was conducted, age of the population, specimen type; 2) EIA results including qualitative and quantitative IgG antibody results, assay type (in-house, commercial); 3) PRNT results including qualitative result, antibody levels, methods for conversion to international units; 4) EIA performance compared to PRNT, e.g., sensitivity, specificity, positive predictive value, negative predictive value. We used thresholds as reported in the papers. Each comparison from papers reporting more than one EIA vs. PRNT comparison (e.g., multiple EIA or PRNT thresholds, multiple EIA kits, multiple age groups or populations, etc.) was reported as a separate result. After the data were abstracted, measles elimination status at the time of the study and time since elimination in elimination settings were determined using peer-reviewed and grey literature, based on country and year of specimen collection (or publication year if date of specimen collection was not reported). Elimination status included endemic (the existence of continuous indigenous or imported measles virus transmission that persists for ≥ 12 months in any defined geographical area), interruption (absence of endemic measles virus transmission in a defined geographical area for < 12 months), or elimination (the interruption of endemic measles transmission in a defined geographical area for ≥ 12 months in the presence of a well-performing surveillance system).

Assessment of methodological quality and data quality classifications

We classified studies as high, medium, or low quality in terms of the metrics reported and the reproducibility of study findings (Table 1).

Table 1 Data quality classification definitions for publication abstracted and included in analysis

Medium and low quality studies are described in Supplemental Tables 2A and 2B, but are not included in the main analysis. Papers were excluded from analysis if they did not report data relevant to the study objectives or did not classify the quality of these data. We also assessed the risk of bias for individual studies using a modified version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) revised tool for Cochrane reviews [37].

Table 2 Descriptive characteristics of studies evaluating EIA assays compared to PRNT (high quality)

Data analysis

Measures of diagnostic accuracy with 95% confidence intervals (CI) were abstracted for each study result, where reported. Data were also abstracted to generate the four cell values of a two-by-two table, where available, and used to recalculate the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with 95% CIs for each index-reference test comparison. Recalculated metrics were used in the main analysis. If recalculated metrics were not available (e.g., medium quality studies in the supplementary materials), the measures as reported by the authors were used. Indeterminate or equivocal EIA results were handled in the same way as reported by authors in the study (i.e., excluded or treated as positive or negative) in the primary analysis. If the data reported by authors or methods for treating equivocals were unclear, authors were contacted for additional information or to verify calculations. If no information could be obtained from the authors, investigators came to a consensus regarding whether to include the study (n = 1). We conducted sensitivity analyses by reclassifying equivocal or indeterminate EIA results (e.g., treating as negative, as positive, or excluding from analysis).
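The recalculation from a two-by-two table can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the Wilson score interval is an assumed choice of CI method (the text does not state which method was used), the function names are hypothetical, and the counts are invented for the example.

```python
from math import sqrt

def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def accuracy_from_two_by_two(tp, fp, fn, tn):
    """Return each metric as (estimate, (lower, upper)), or None if not estimable."""
    def metric(numerator, denominator):
        if denominator == 0:
            return None  # e.g., no PRNT-positive samples: sensitivity not estimable
        return (numerator / denominator, wilson_ci(numerator, denominator))
    return {
        "sensitivity": metric(tp, tp + fn),  # EIA positive among PRNT positive
        "specificity": metric(tn, tn + fp),  # EIA negative among PRNT negative
        "ppv": metric(tp, tp + fp),          # PRNT positive among EIA positive
        "npv": metric(tn, tn + fn),          # PRNT negative among EIA negative
    }

# Hypothetical counts, not taken from any included study.
result = accuracy_from_two_by_two(tp=90, fp=2, fn=10, tn=48)
for name, (estimate, (lower, upper)) in result.items():
    print(f"{name}: {estimate:.1%} (95% CI {lower:.1%}, {upper:.1%})")
```

Returning `None` when a denominator is zero mirrors the "NE, not estimable" entries used in the figures.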

Diagnostic accuracy measures were presented for high (main text) and medium (supplementary materials) quality studies only. Differences between studies were assessed by visual examination of forest plots generated using Stata/IC (version 16.1) [38]. The diagnostic accuracy measures for high quality comparisons that used the Enzygnost kits were also presented in a hierarchical summary receiver operating characteristic (HSROC) curve, indicating pooled sensitivity and specificity with 95% confidence regions around the summary estimates. This was used to explain observed differences in accuracy between EIA kits.

We generated a QUADAS figure for all studies using R (version 3.6.1) (Supplementary Fig. 1). For studies with multiple groups (e.g., multiple age groups or multiple EIA kits), we reassigned QUADAS-2 assessments so that a single result was presented per domain for each study. This was done by following an algorithm that compared multiple results within each QUADAS-2 domain and assigned the worst rating as the final, overall assessment per study.
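The worst-rating rule described above can be sketched as below. This is an illustrative reconstruction rather than the R code used for the review: the rating labels and their ordering (low < unclear < high) are assumptions, and the domain names and example assessments are hypothetical.

```python
# Assumed ordering of QUADAS-2 judgements from best to worst.
SEVERITY = {"low": 0, "unclear": 1, "high": 2}

def worst_rating_per_domain(group_assessments):
    """Collapse several group-level assessments into one rating per domain.

    group_assessments: list of dicts mapping domain name -> rating,
    one dict per group (e.g., per age group or per EIA kit) within a study.
    """
    combined = {}
    for assessment in group_assessments:
        for domain, rating in assessment.items():
            current = combined.get(domain)
            if current is None or SEVERITY[rating] > SEVERITY[current]:
                combined[domain] = rating  # keep the worst rating seen so far
    return combined

# Hypothetical study with two EIA kits assessed separately.
study = [
    {"patient_selection": "low", "index_test": "unclear", "reference_standard": "low"},
    {"patient_selection": "high", "index_test": "low", "reference_standard": "low"},
]
print(worst_rating_per_domain(study))
```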

Results

Search results

A total of 549 results were identified through the literature searches after removing duplicates, of which 463 studies were excluded at title and abstract screening, and 41 were excluded at full-text review (Fig. 1). Of the 45 studies included for abstraction, ten were excluded after detailed assessment because a PRNT or comparable test was not used (n = 8) or relevant results were not reported (n = 2). One additional study was included through a snowball search. Thirty-six studies were included for review and 26 for analysis [19, 21, 25,26,27, 30, 39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68]. Thirteen were classified as high quality, 13 as medium quality, and 10 as low quality.
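The study counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are hypothetical, and the reading that the 26 studies included for analysis are the high and medium quality studies is an inference from the Methods rather than something stated here.

```python
# Counts as reported in the text.
identified = 549
excluded_title_abstract = 463
excluded_full_text = 41
excluded_after_assessment = 8 + 2   # no PRNT or comparable test; no relevant results
added_by_snowball = 1
by_quality = {"high": 13, "medium": 13, "low": 10}

for_abstraction = identified - excluded_title_abstract - excluded_full_text
included_for_review = for_abstraction - excluded_after_assessment + added_by_snowball
# Inferred: studies included for analysis are the high and medium quality studies.
included_for_analysis = by_quality["high"] + by_quality["medium"]

print(for_abstraction, included_for_review, included_for_analysis)
```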

Fig. 1

PRISMA flow diagram for database searches and study inclusion, Studies evaluating EIA assays compared to PRNT (high quality)

Characteristics of reviewed studies

For the following sections, characteristics described are not mutually exclusive (i.e., studies may have used more than one age group, specimen source, or EIA kit).

Study populations

Nine of thirteen high quality studies were conducted in high- or upper-middle-income countries (Brazil, Canada, China, England, Portugal, The Netherlands, United States, and United Kingdom), three in lower-middle- or low-income countries (Kenya, Malawi, and Uganda; Table 2), and one study analyzed specimens from both high and lower-middle income countries (United States, Tajikistan, and Bangladesh). Ten of 13 high quality studies used data from measles endemic settings, two from measles elimination settings, and one from a mix of endemic and elimination settings. Nearly all medium and low quality papers were conducted in high- or upper-middle-income countries.

The number of specimens ranged substantially from 43 to 2344 specimens per study (Table 2). Across all high quality studies, one study used specimens from adults, five from children (< 18 years), two from a mix of adults and children, and five did not report the age range. Five of the seven studies with pediatric specimens included children younger than 12 months of age. The original purpose of the analysis varied by study (e.g., diagnostic accuracy evaluation, serosurveillance, Table 2).

Types of EIA kits used

In high quality studies, ten commercial EIAs and two in-house EIAs were compared to PRNT (Table 3). Siemens Enzygnost/Dade Behring (“Enzygnost”) EIA was used most often (n = 14 results in 6 studies), followed by the VIDAS® (bioMerieux; “VIDAS”) assay (n = 3), and other in-house EIAs (n = 2) (Tables 3 and 4). MBAs were used in three studies: one commercial MBA and two studies using in-house MBAs. Medium- and low-quality studies used a wider variety of commercial EIAs as well as in-house EIAs and MBAs (Supplementary Table 2A and 2B).

Table 3 High quality studies diagnostic accuracy measures (Siemens Enzygnost/Dade Behring only)
Table 4 High quality studies diagnostic accuracy measures (non-Siemens Enzygnost/Dade Behring only)

Methodological quality assessment

Based on the QUADAS-2 tool assessment, we concluded there was no bias evident in any of the included studies to justify exclusion (Supplementary Fig. 1). Overall, the intent of the QUADAS-2 tool did not suit the objective of the present review [37] and we used a modified version for methodological quality assessments. However, challenges with applicability of the tool’s domains remained, including inability of reviewers to assess domains when study authors did not report needed information in the text.

Diagnostic accuracy of EIA assays compared to PRNT

The original intent of this review was to provide a quantitative pooled summary of sensitivity and specificity of EIA results compared to PRNT and evaluate hypothesized risk factors for variability in diagnostic accuracy such as assay type, thresholds used, age of study population, and measles elimination. However, there was an insufficient number of studies per category to identify generalizable patterns.

Since most high quality studies used Enzygnost, we assessed the sensitivity and specificity of this assay separately and generated pooled diagnostic accuracy estimates. The sensitivity of the Enzygnost EIA ranged from 66.3% to 100.0% with median (IQR) = 92.1 [82.3, 95.7] (Fig. 2A, Table 3A, Supplementary Table 3). Specificity ranged from 68.8% to 100.0% with median (IQR) = 96.9 [93.0, 100.0]. Confidence intervals on specificity were much wider than those on the sensitivity estimates. Seven comparisons reported sensitivities ≥ 90.0%, ten reported specificities ≥ 90.0%, and six reported both sensitivity and specificity of ≥ 90.0% (Fig. 2A and Supplementary Table 3). When high quality studies using the Enzygnost kit were combined in an HSROC curve, the pooled sensitivity and specificity were 91.6% (95% CI: 80.7, 96.6) and 96.0% (95% CI: 90.9, 98.3), respectively (Supplementary Fig. 2).

Fig. 2

A Diagnostic accuracy of Siemens Enzygnost EIA kit compared to PRNT reported in high quality studies. % sensitivity and specificity presented. CI, confidence interval. EIA, enzyme immunoassay. FN, false negatives. FP, false positives. NE, not estimable. NR, not reported. TN, true negatives. TP, true positives. *Study classified EIA equivocals as EIA negative. **Study excluded EIA equivocals. All other studies classified EIA equivocals as EIA positive. All studies used the Enzygnost EIA kit with a threshold of < 0.1 O.D. (except Warrener, where the threshold was not reported). All PRNT tests were reported to use a threshold of ≥ 120 mIU/mL except Cohen 2008, which used a batch-specific threshold, and Tischer et al., which reported using “40 ± 20 mIU/mL”. We do not report any comparisons that used EIA thresholds from, e.g., 8 mIU/mL. Cohen 2008 authors reported weighted estimates; unweighted estimates are displayed. All papers tested samples by both index and reference tests except Cohen 2008 (tested all samples by PRNT and selected a subset of those samples for EIA testing).

B Diagnostic accuracy of non-Siemens Enzygnost EIA kits compared to PRNT reported in high quality studies. % sensitivity and specificity presented. CI, confidence interval. EIA, enzyme immunoassay. FN, false negatives. FP, false positives. MBA, multiplex bead assay. MIA, multiplex immunoassay. MeV N, recombinant measles virus nucleoprotein. MeV WVAL, laboratory-produced purified measles whole-virus antigen. MeV WVAc, commercially produced whole-virus antigen. NE, not estimable. NR, not reported. TN, true negatives. TP, true positives. *Study classified EIA equivocals as EIA negative. **Study excluded EIA equivocals. ***How EIA equivocals were treated was not reported. All other studies classified EIA equivocals as EIA positive. 1) EIA threshold of < 40 mIU/mL and PRNT threshold of ≥ 40 mIU/mL. 2) EIA threshold of < 100 mIU/mL and PRNT threshold of ≥ 100 mIU/mL. All samples were tested by both index and reference tests but the small number of PRNT positive samples by EIA threshold of < 100 mIU/mL and PRNT threshold of ≥ 100 mIU/mL limited our ability to estimate sensitivity. All PRNT tests were reported to use a threshold of ≥ 120 mIU/mL except Goncalves et al. 1999 (at birth age group), which used a threshold of 40 mIU/mL; Mao et al. 2009, which used a threshold of 1:4 titer; Cohen et al. 2006, which used a batch-specific threshold; Lee et al. 1999, which used a threshold of > 200 mIU/mL; Hatchette et al. 2017, which used a threshold of > 192 mIU/mL; and deSouza et al. 1991, which did not report a threshold. EIA equivocals were grouped differently depending on the study. All papers tested samples by index and reference tests except Fowlkes 2011 (tested a random subset of EIA-tested samples by PRNT) and Lee 1999 (tested all samples by PRNT and selected a subset of those samples for EIA testing). BioPlex 2200 MMRV IgG is reported as “BioPlex 2200”.

The sensitivity of all other EIA kits across high quality studies ranged from 0% to 98.9% with median (IQR) = 90.6 [86.6, 95.2] (Fig. 2B, Table 4, Supplementary Table 3). The specificity of all other EIA kits across high quality studies ranged from 58.8% to 100.0% with median (IQR) = 100.0 [88.7, 100.0]. When studies with fewer than five PRNT seropositive individuals [48] were excluded (n = 1), the sensitivity of all other EIA kits ranged from 58.8% to 98.9% (Fig. 2B). Ten comparisons reported sensitivities ≥ 90.0%, fourteen reported specificities ≥ 90.0%, and six reported both sensitivity and specificity ≥ 90.0% (Fig. 2B and Supplementary Table 3). There were no observed differences in median sensitivity or specificity by study quality (Supplementary Table 3).

In addition to Enzygnost, the VIDAS, DiaSorin LIAISON®, and commercial MBA EIA kits were used in at least three comparisons, allowing general assessments of within-kit performance across studies without stratifying by quality classification (Supplementary Fig. 4 and Supplementary Table 3). The three sensitivity estimates for the DiaSorin assay were overall slightly lower than those for Enzygnost, ranging from 87.2% to 90.2%, with variable specificity (75.0% to 100%). Sensitivity estimates from three of the four studies using the VIDAS assay were comparable to those for Enzygnost (87.2% to 90.6%) with less variability in specificity (86.4% to 100%). Although in-house EIAs are all different, each study except one reported high sensitivity (86.8% to 100%) and specificity (80% for one study, 100% for all others). Calculated sensitivity was more variable for MBAs than for Enzygnost, VIDAS, and DiaSorin. Insufficient information on MBA assays was available to assess reasons for variability.

Sensitivity analysis when reclassifying equivocal results

For studies that provided sufficient information regarding how equivocal EIA results were classified, we regrouped equivocal results to assess how this classification affected the sensitivity and specificity. Five high quality studies and one medium quality study included sufficient information for reclassification (Supplementary Fig. 5), in which 2% to 30% of samples tested had equivocal results. Equivocal EIA results were analyzed in the primary analyses as they were reported in the original publication: three studies grouped equivocal results as positive, one grouped equivocal results as negative, one reported results treating equivocal results as both positive and negative, and one excluded equivocal results from the analysis. As expected, sensitivity metrics increased when EIA equivocal results were grouped with positives. Compared to when equivocal results were excluded or grouped with negatives, sensitivity increased by up to 35.3 percentage points when grouped with positives. Conversely, specificity metrics increased when EIA equivocal results were grouped with negatives. Compared to when equivocal results were excluded or grouped with positives, specificity increased by up to 36.5 percentage points when grouped with negatives.
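The direction of these shifts follows directly from how equivocal results enter the two-by-two table, as the following sketch shows. It is a minimal illustration under assumed inputs: the function name, the data layout, and all counts are hypothetical and are not drawn from any included study.

```python
def sens_spec(counts, equivocal_as):
    """Sensitivity and specificity under one rule for EIA equivocal results.

    counts: {"prnt_pos": {"pos": n, "equiv": n, "neg": n},
             "prnt_neg": {"pos": n, "equiv": n, "neg": n}}
    equivocal_as: "positive", "negative", or "exclude".
    """
    def split(row):
        pos, neg = row["pos"], row["neg"]
        if equivocal_as == "positive":
            pos += row["equiv"]
        elif equivocal_as == "negative":
            neg += row["equiv"]
        elif equivocal_as != "exclude":
            raise ValueError(f"unknown rule: {equivocal_as}")
        return pos, neg

    tp, fn = split(counts["prnt_pos"])  # EIA results among PRNT-positive samples
    fp, tn = split(counts["prnt_neg"])  # EIA results among PRNT-negative samples
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for illustration only.
example = {
    "prnt_pos": {"pos": 80, "equiv": 10, "neg": 10},
    "prnt_neg": {"pos": 2, "equiv": 5, "neg": 43},
}
for rule in ("positive", "negative", "exclude"):
    sensitivity, specificity = sens_spec(example, rule)
    print(f"{rule}: sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```

In this invented example, grouping equivocals with positives gives the highest sensitivity and the lowest specificity, and grouping them with negatives gives the reverse.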

Discussion

Serosurveillance has the potential to be a powerful tool for informing vaccination program design and monitoring. Historically, cross-sectional seroprevalence studies have contributed to epidemiological understanding of poliomyelitis, rubella, and hepatitis A and B virus infections [69]. More recently, serosurveillance has been used globally as a method for monitoring population immunity against HIV [70, 71] and SARS-CoV-2 [72,73,74,75], and biomarkers are increasingly included in country-level household surveys such as the Demographic and Health Surveys (DHS) and the National Health and Nutrition Examination Survey (NHANES) [76,77,78]. As countries continue to work towards measles elimination, if antibody prevalence can be accurately measured and subsequently correlates with immunity, then serosurveillance can contribute to monitoring progress, identifying gaps in population immunity and susceptible segments of the population, understanding reasons for apparent increases in incidence and resurgence of disease, and evaluating vaccine impact [12, 69, 79, 80]. This is particularly important after declines in routine immunization rates globally during the COVID-19 pandemic. EIA assays are less resource-intensive and require less technical expertise than the current gold standard PRNTs and are widely used in laboratories around the world, including in low-income countries, and as such can be deployed on a larger scale. Given the relative ease of these assays, establishing their diagnostic accuracy is important for broader use in research and surveillance.

This analysis summarized and, to the extent possible, compared the diagnostic accuracy of measles IgG EIA assays to gold standard PRNT. Overall, the sensitivity of measles IgG antibody EIAs was moderate to excellent, but highly variable. Specificity tended to be lower and estimates were often imprecise due to the small number of seronegative individuals. With the exception of studies evaluating the Enzygnost EIA, there was an insufficient number of comparable studies to generalize on the diagnostic accuracy of other EIAs compared to PRNT. Studies were too diverse in terms of age groups, population characteristics (e.g., vaccination status, measles endemicity), EIA kits used, EIA and PRNT threshold(s) used, treatment of EIA equivocal results, and inclusion/exclusion of samples for testing to allow a meta-analysis or a more systematic analysis of factors associated with diagnostic accuracy. Measles vaccination status of the mother was not available for studies with young children. Furthermore, the lack of standardization of methods and reporting of results, even among studies that explicitly sought to assess diagnostic accuracy, limited our ability to make meaningful inferences regarding the performance of EIA kits.

Optimal diagnostic accuracy characteristics depend on the objective of the activity, risk of misclassification, and consequences and cost of the subsequent intervention. For example, diagnostic testing at the individual level (e.g., HIV testing) or case detection early in outbreak settings aims to minimize false negatives by maximizing sensitivity. For seroprevalence studies, misclassification in either direction could result in important public health consequences and should be considered. The general priority for measles serosurveillance is to identify susceptible populations to assess progress toward elimination or trigger supplemental activities to fill immunity gaps. Assays that are inadequately specific could result in overestimates of population immunity, leaving susceptible individuals at risk, and may result in large, unexpected measles outbreaks that could have been prevented. On the other hand, assays that are inadequately sensitive would underestimate population immunity, which could lead to unnecessary and costly supplementary immunization activities [30]. Hence, an important limitation to consider is that EIAs are designed for determining individual immunity. As such, they err on the side of high specificity (i.e., minimizing false positives, the lesser risk to the individual being to classify someone who is immune as susceptible, rather than classifying someone who is susceptible as immune) and may not be fit for purposes related to population-level serosurveillance. It would therefore be useful to better characterize acceptable diagnostic accuracy thresholds for EIAs when adapted for use in such contexts.
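The effect of imperfect sensitivity and specificity on a measured seroprevalence can be made concrete with the standard relationship between true and apparent prevalence. The sketch below is illustrative only and was not part of the review's analysis: it borrows the pooled Enzygnost estimates reported in the Results (91.6% sensitivity, 96.0% specificity), while the true seroprevalence of 95% and the function name are assumptions chosen for the example.

```python
def apparent_seroprevalence(true_prev, sensitivity, specificity):
    """Expected proportion testing positive given the true seroprevalence.

    True positives contribute sensitivity * true_prev; false positives
    contribute (1 - specificity) * (1 - true_prev).
    """
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)

# Pooled Enzygnost estimates from the Results; the true seroprevalence
# of 95% is a hypothetical value.
measured = apparent_seroprevalence(true_prev=0.95, sensitivity=0.916, specificity=0.96)
print(f"apparent seroprevalence: {measured:.1%}")
```

Under these assumptions a population with 95% true seroprevalence would appear to have roughly 87% seroprevalence, because the loss from imperfect sensitivity outweighs the gain from false positives when most of the population is seropositive.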

Our results revealed substantial variability in test performance of measles EIAs and may help to contextualize the results of recent large-scale measles serosurveys in Laos [81], Bhutan [30], Zambia [82], Madagascar [83], Canada [41], and the Democratic Republic of Congo (DRC) [76, 84]. Results from some of these studies were unexpected and speak to the importance of validating the diagnostic accuracy of measles IgG antibody EIA kits. The serosurveys from Canada (2019) and Bhutan (2017) conducted validation testing via PRNT on a subset of EIA seronegative and equivocal results, which demonstrated that a non-negligible proportion were positive by PRNT (33.3% and 10%, respectively). A serosurvey in Laos reported relatively low seroprevalence among children aged one to two years (48.6%), which was substantially lower than expected based on estimated vaccination coverage of 69% to 72% and was also lower than in persons aged 5–21 years (86.8%). Validation of a random sample of results, or of a subset of seronegative results, using PRNT would have helped to understand the discrepancies.

Evaluation and interpretation of measles EIA results can be complex, particularly in measles elimination settings where antibody levels are not boosted by exposure to wild-type measles virus. False negative results and over-estimation of population susceptibility are risks with EIAs in elimination settings. Average antibody concentrations are likely to have waned, but individuals could still mount an anamnestic antibody response if exposed to measles, which has important consequences for the working definition of “susceptible”. Twelve studies included in this analysis, most of which were population-based seroprevalence studies, attempted to better characterize EIA accuracy near the threshold by testing all or a sample of EIA negative, equivocal, or low-positive samples by PRNT.

An in-depth head-to-head analysis of commercially available measles IgG assays would help to build confidence in large-scale measles serosurveys and surveillance programs. A recent study conducted in the United States performed head-to-head comparisons of five commercial EIAs with PRNT and found discrepant results for samples in the low-positive ranges of even the most sensitive EIAs [53]. This study was included in this review but was classified as medium-quality because the information needed to generate two-by-two tables was not included. False negative EIA test results occurred in approximately 11% of sera, which generally had low levels of neutralizing antibody. The study demonstrated that lowering the PRNT threshold (i.e., rather than the EIA threshold) from 120 to 40 mIU/mL increased specificity of EIA assays at the expense of sensitivity. Although there is debate on the 120 mIU/mL PRNT correlate of protection, lowering the threshold to 40 mIU/mL is unlikely to be clinically meaningful [85].

In addition, systematic analyses of diagnostic accuracy among vaccinated populations, of varying ages, in elimination settings, where average antibody levels are generally low, would help to fill evidence gaps identified in this review. Alternatives to traditional EIAs, such as MBAs, have demonstrated excellent diagnostic accuracy and analytic sensitivity for other disease-specific antibodies and are promising for measles serosurveillance [86, 87]. However, limited access to multiplex machines, limited availability of commercial reagents with measles antigens, and cost restrict their use in low- and lower-middle-income settings. Promising microneutralization assays may also overcome challenges of evaluating functional responses in surveillance settings [88, 89].

Strengths and limitations

This review included studies conducted between 1984 and 2020, over which time diagnostic and analytic methods have changed, limiting the conclusions we can draw about EIAs in contemporary use. For example, the Enzygnost EIA was the most common assay used in included studies, but was recently discontinued [90]. The EUROIMMUN Anti-Measles Virus IgG ELISA is used frequently in seroprevalence studies at present [91, 92], but was not assessed in any studies returned from the searches and was therefore not included in this review.

This review contributes to the existing literature on EIA and PRNT diagnostic accuracy for the identification of measles IgG and is the first to systematically review their comparative test performance. It identified critical gaps regarding systematic reporting and use of standardized methodologies. The literature search was not limited by language, but translated full texts were not available for three publications, which may have limited the analysis.

Conclusions

To expand the utility of measles serological surveillance, robust, feasible, high-throughput, and accurate assays are needed to identify susceptible and protected populations. Evidence on the diagnostic accuracy of currently available measles IgG EIAs is variable and insufficient, and these assays may not be fit for purpose for serosurveillance goals. Additional studies evaluating the diagnostic accuracy of measles EIAs, including MBAs, should be conducted among diverse populations and settings (e.g., vaccination status, elimination/endemic status, age groups). Analyses of serosurveys would be strengthened if PRNT validation were conducted on a random subsample or on samples near the EIA threshold.