Background

Hepatitis C is a liver disease caused by the hepatitis C virus (HCV), which can cause both acute and chronic infection [1, 2]. An estimated 71 million people had chronic hepatitis C infection worldwide in 2015 [3]. Viral hepatitis caused 1.34 million deaths in 2015, a number comparable to deaths caused by tuberculosis and higher than those caused by HIV [3]. Direct-acting antivirals (DAAs) achieve a sustained virological response (SVR) in more than 90% of treated individuals [4, 5] and are now recommended by the World Health Organization (WHO) and many other HCV treatment guidelines [1]. Beyond improving SVR rates, DAAs may simplify HCV management algorithms and allow smaller health facilities to manage HCV-infected individuals [6]. Despite the availability of effective treatment, most HCV-infected individuals remain undiagnosed and untreated [7]. Left untreated, approximately 15–30% of individuals with chronic HCV infection progress to cirrhosis, which can lead to end-stage liver disease and hepatocellular carcinoma [1, 2].

In February 2016 the WHO updated the guidelines for the screening, care, and treatment of persons with chronic hepatitis C infection [1]. These guidelines included recommendations on whom to screen for HCV and how to confirm HCV infection, but did not address which tests are optimal for initial screening. Advances in HCV detection technology create new opportunities for enhancing screening, referral, and treatment. Previous systematic reviews on HCV infection have focused on treatment response [8, 9], clinical complications [10], and epidemiology [11, 12]. Two previous systematic reviews of hepatitis C testing evaluated point-of-care tests against enzyme immunoassays (EIAs) and other reference tests [13, 14]. We undertook a further systematic review and meta-analysis to generate pooled estimates of the sensitivity and specificity of rapid diagnostic tests used to detect HCV antibody (HCV Ab), and to inform the development of recommendations on serological testing in the 2017 WHO testing guidelines [15].

Methods

Research question

The main purpose of the review was to assess the diagnostic accuracy of available assays for detecting HCV Ab in persons identified for hepatitis C testing. The research question was structured in PICO format (i.e., population, intervention, comparison, and outcome).

P: Persons identified for HCV testing.
I: Rapid diagnostic tests and enzyme immunoassays for HCV Ab detection.
C: (1) EIA (with a subanalysis restricted to studies conducted in the last 10 years); (2) nucleic acid testing (NAT); (3) immunoblot or a similar assay; (4) a combination of (1), (2), and (3).
O: Diagnostic accuracy [sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP)].

Search strategy and identification of studies

Search strategies were developed by a medical librarian with expertise in designing systematic review searches. The search algorithm combined three components: hepatitis C, diagnostic tests, and diagnostic accuracy. We searched MEDLINE (OVID interface, 1946 onwards), EMBASE (OVID interface, 1947 onwards), the Cochrane Central Register of Controlled Trials (Wiley interface, current issue), Science Citation Index Expanded (Web of Science interface, 1970 onwards), Conference Proceedings Citation Index-Science (Web of Science interface, 1990 onwards), SCOPUS (1960 onwards), Literatura Latino-Americana e do Caribe em Ciências da Saúde (LILACS) (BIREME interface) and WHO Global Index Medicus. The search was supplemented by searching for ongoing studies in WHO’s International Clinical Trials Registry. The literature search was limited to English-language studies in human subjects published up to April 30, 2015. In addition to searching databases, we contacted individual researchers and authors of major trials to ask whether any relevant manuscripts were in preparation or in press. The reference lists of articles retrieved from these databases were searched for additional relevant material.

Study selection proceeded in three stages: 1) titles and abstracts were screened by a single reviewer against standard inclusion and exclusion criteria; 2) full texts were obtained and assessed for eligibility by two independent reviewers; 3) two independent reviewers extracted all data. Disagreements were resolved by a third independent reviewer.

Selection criteria

Studies were included if their primary purpose was to evaluate an HCV Ab test, if they reported both the sensitivity and the specificity of HCV Ab test kits, and if they were published before May 2015. We included observational studies and randomized controlled trials (RCTs) that provided original data from patient specimens. We excluded studies that reported only sensitivity or only specificity, conference abstracts, comments, review papers, panel studies, and studies that applied reference assays only to positive samples. In this manuscript, a hepatitis panel refers to a laboratory evaluation in which a series of blood specimens with confirmed hepatitis C serostatus is used to assess the accuracy of a test kit.

Data extraction

Information on the following variables was extracted from each study: first author, total sample size, country (and city) of sampling, sample type (oral fluid, finger prick, venous blood), point-of-care status (POC, defined as being able to give a result within 60 min so that the result can guide clinical management in the same encounter), eligibility criteria, reference standard, manufacturer, raw cell counts (true positives, false negatives, false positives, true negatives), antibody-antigen combination test (yes or no), sources of funding, reported conflicts of interest, and study population (general population, high-risk population, or hospitalized population). The high-risk population groups included men who have sex with men, sex workers and their clients, transgender people, people who inject drugs, and prisoners and other incarcerated people [16]. The hospitalized population was defined as persons admitted to a hospital for medical care or observation. We also verified whether the assays evaluated in the studies were on the market as of June 1, 2017, and, if so, reported the currently available version of the test kit (Table 1).

Table 1 Characteristics of studies focused on evaluating diagnostic accuracy of HCV antibody tests

Assessment of methodological quality

Study quality was evaluated using the QUADAS-2 tool [17] and the STARD checklist [18]. QUADAS-2 assesses risk of bias in four domains (patient selection, index test, reference standard, and flow and timing) and applicability concerns in three domains (patient selection, index test, and reference standard). The STARD checklist consists of 25 items and a flow diagram that authors can use to ensure that all relevant information is reported.

Data analysis and synthesis

Data synthesis

Data were extracted to construct 2 × 2 tables. Index test results were compared with reference standard results and classified as true positive, false positive, false negative, or true negative. Indeterminate test results were excluded from the pooled analyses.
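As a minimal illustration of this step, the Python sketch below computes the accuracy measures named in the PICO outcome from a single 2 × 2 table. The table values are hypothetical, and the Wilson score interval is an assumed choice of confidence-interval method; the review does not state which interval it used.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative
    # Indeterminate index results are simply not counted in any cell.


def accuracy(t: TwoByTwo) -> dict[str, float]:
    """Point estimates of the accuracy measures named in the PICO outcome."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical study, for illustration only.
study = TwoByTwo(tp=95, fp=2, fn=5, tn=398)
measures = accuracy(study)
sens_ci = wilson_ci(study.tp, study.tp + study.fn)
spec_ci = wilson_ci(study.tn, study.tn + study.fp)
```

For this hypothetical table, sensitivity is 95/100 = 0.95 and specificity is 398/400 = 0.995. The Wilson interval is used here because, unlike the simple Wald interval, it stays within [0, 1] when a proportion is at or near 100%, which is common for HCV Ab specificity.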

Statistical analysis

To estimate test accuracy, we calculated sensitivity and specificity, with 95% confidence intervals, for each study and for pooled estimates [19]. We pooled test estimates using the DerSimonian-Laird random-effects method. We performed further subgroup analyses by reference standard (EIA alone; NAT or immunoblot; EIA, NAT, or immunoblot), brand, sample type, and use of antibody-antigen combination tests. All statistical analyses, including assessment of heterogeneity with Cochran's Q test, were performed in R and RevMan 5.3.
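The DerSimonian-Laird estimator pools one measure at a time. The sketch below applies it to sensitivity (TP out of TP + FN) or, identically, to specificity (TN out of TN + FP) on the logit scale, and reports Cochran's Q and I² as heterogeneity statistics. Pooling on the logit scale and the 0.5 continuity correction for zero cells are assumptions for illustration; this is not the review's R or RevMan code, and the input counts are hypothetical.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate (delta-method) variance.

    Adds 0.5 to both cells when either is zero; this continuity correction
    is an assumption for illustration.
    """
    a, b = float(events), float(total - events)
    if a == 0 or b == 0:
        a, b = a + 0.5, b + 0.5
    return math.log(a / b), 1 / a + 1 / b


def dersimonian_laird(y: list[float], v: list[float]) -> tuple[float, float, float, float, float]:
    """Random-effects pooling; returns (pooled, se, tau2, Q, I2)."""
    k = len(y)
    w = [1 / vi for vi in v]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, se, tau2, q, i2


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportion(counts: list[tuple[int, int]], z: float = 1.96) -> dict:
    """Pool (events, total) pairs, e.g. (TP, TP + FN) for sensitivity."""
    pairs = [logit_and_variance(e, n) for e, n in counts]
    pooled, se, tau2, q, i2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
    return {
        "estimate": inv_logit(pooled),
        "ci": (inv_logit(pooled - z * se), inv_logit(pooled + z * se)),
        "tau2": tau2,
        "Q": q,
        "I2": i2,
    }


# Hypothetical sensitivity data from three studies: (TP, TP + FN).
result = pool_proportion([(95, 100), (180, 200), (48, 50)])
```

To obtain a P value for heterogeneity, Q is compared with a chi-squared distribution with k − 1 degrees of freedom (for example, with `scipy.stats.chi2.sf(q, k - 1)`); that step is left out so the sketch needs only the standard library.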

Results

Study selection

A total of 11,163 citations were identified, and 6163 duplicates were removed. All 5000 remaining unique citations were screened. A total of 52 research studies were included in the final analysis (Fig. 1) [8, 16, 19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68]. Of the 52 studies, 32 evaluated the accuracy of 30 different rapid diagnostic tests (RDTs) [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]; of these, 5 compared RDTs with EIA alone [25, 26, 31, 34, 49], 13 compared RDTs with NAT or immunoblot [19,20,21,22, 27, 29, 32, 37, 42, 43, 45, 47, 50], and 14 compared RDTs with a reference standard combining EIA, immunoblot, and/or NAT [23,24,25,26, 30, 34, 35, 38, 39, 41, 44, 48, 49, 51]. Eleven studies evaluated the diagnostic accuracy of oral fluid RDTs [22, 24, 27, 29, 33, 34, 43,44,45, 47, 52].

Fig. 1

PRISMA flow diagram of study selection for the review of the diagnostic accuracy of HCV antibody tests

There were insufficient data to undertake a subanalysis based on HIV co-infection or other co-infections.

Study characteristics

Of the 52 included studies, nine were published before 2000 [37, 38, 42, 53,54,55,56,57,58], 12 evaluated tests using oral fluid samples, and 34 evaluated POC tests. Across the 52 studies, 41 different brands of test kits were evaluated (Table 1).

Assessment of the quality of the studies

All studies used a cross-sectional or case–control design. The risk of bias in patient selection, the index test, the reference standard, and flow and timing was assessed using QUADAS-2 (Table 2). Among the included studies, 25 had at least one domain rated as high risk [19, 22, 25,26,27,28, 30, 31, 34, 36,37,38,39, 41, 45,46,47,48,49,50, 53, 55, 56, 58,59,60,61,62]. Risk of bias in patient selection usually arose from poor descriptions of patient selection and the clinical scenario. Bias in the index test was primarily due to a lack of reported blinding when reading test results. Bias in the reference standard was due to the use of multiple reference standards (EIA, NAT, and/or immunoblot). Bias in flow and timing was primarily due to a lack of reported details.

Table 2 Quality assessment by QUADAS-2 of the included studies

Diagnostic accuracy

Overall clinical performance of assays

The 52 included studies contributed 127 data points from 52,273 unique test measurements. Some studies contributed more than one data point because they compared the accuracy of two or more tests, reported data from multiple study sites, or reported the accuracy of a test in more than one specimen type. Sample sizes ranged from 37 to 17,894. Study-level sensitivities ranged from 22 to 100%, and specificities from 77 to 100%. The overall pooled sensitivity and specificity for all tests were 97% (95% CI: 97%–98%) and 99% (95% CI: 98%-99%), respectively. Figure 2 shows estimates of sensitivity and specificity from each study.

Fig. 2

Sensitivity and specificity of HCV Ab tests included in the review (n = 52)

Manufacturers and accuracy of RDTs among included studies

Overall, 32 studies evaluated the accuracy of 30 different RDTs (Table 3). The most commonly evaluated test kit was the OraQuick ADVANCE® from OraSure Technologies.

Table 3 Manufacturers and accuracy of RDTs among included studies

Pooled test accuracy for RDT versus EIA alone

Overall, five studies evaluated RDTs against EIA alone, with a total sample of 15,943. Across these studies, sample sizes ranged from 197 to 2754, sensitivities from 83 to 100%, and specificities from 99 to 100%. The pooled sensitivity and specificity were 98% (95% CI 98%-100%) and 100% (95% CI 100%-100%), respectively, and significant heterogeneity was observed among the included studies (P < 0.001) (Table 3, Additional file 1).

For the three studies conducted within the last 10 years [25, 49, 51], the total sample size was 12,992, with pooled sensitivity and specificity of 99% (95% CI 99%-100%) and 100% (95% CI 100%-100%), respectively.

RDT accuracy compared to NAT or immunoblot

Overall, 13 studies evaluated RDTs against NAT or immunoblot [19,20,21,22, 27, 29, 32, 37, 42, 43, 45, 47, 50], with a total sample of 7083. Sample sizes ranged from 36 to 549, sensitivities from 76 to 100%, and specificities from 77% to 100%. The pooled sensitivity and specificity were 93% (95% CI 91%-95%) and 98% (95% CI 98%-99%), respectively, and significant heterogeneity was observed among the included studies (P < 0.001) (Table 3, Additional file 2).

RDT accuracy compared to EIA, NAT, or immunoblot

Overall, 14 studies evaluated RDTs against a reference standard combining EIA with NAT and/or immunoblot [25, 26, 31, 33,34,35, 38, 39, 41, 45, 48, 49], with a total sample of 42,212. Sample sizes ranged from 168 to 2754, sensitivities from 29 to 100%, and specificities from 90 to 100%. The pooled sensitivity and specificity were 97% (95% CI 96%-98%) and 100% (95% CI 100%-100%), respectively, and significant heterogeneity was observed among the included studies (P < 0.001) (Table 3, Additional file 3).

Pooled test accuracy for oral versus blood samples

Oral fluid samples

Overall, 11 studies compared the accuracy of RDTs using oral fluid samples with a blood-based reference, with a total sample size of 12,370 [22, 24, 27, 29, 33, 34, 43,44,45, 47, 52]. Across these studies, sample sizes ranged from 37 to 2176, sensitivities from 72 to 100%, and specificities from 91 to 100%. The pooled sensitivity and specificity were 94% (95% CI 93%-96%) and 100% (95% CI 99%-100%), respectively. Heterogeneity was observed among the included studies (P < 0.001) (Table 3, Additional file 4).

Blood samples

Overall, 47 studies used blood samples for evaluation, with a total sample of 90,008. Sample sizes ranged from 37 to 17,894, sensitivities from 29 to 100%, and specificities from 18 to 100%. The pooled sensitivity and specificity were 98% (95% CI 97%-98%) and 98% (95% CI 98%-98%), respectively. Heterogeneity was observed among the included studies (P < 0.001) (Table 3, Fig. 3).

Fig. 3

Pooled HCV Ab test accuracy for blood samples (n = 47 studies)

Pooled test accuracy of OraQuick versus other brands of oral fluid test kits

OraQuick

Overall, eight studies reported sensitivity and specificity of OraQuick (OraSure Technologies, PA, USA), with a total sample of 9024 [22, 24, 27, 33,34,35, 43, 45]. The sample size of these studies ranged from 172 to 2183, sensitivities ranged from 90% to 100%, and specificities ranged from 95% to 100%. The pooled sensitivity and specificity were 98% (95% CI 97%-99%) and 100% (95% CI 90%-100%), respectively. Heterogeneity was observed in the included studies (P < 0.001) (Table 3, Additional file 5).

Overall, six studies reported the sensitivity and specificity of three other brands of oral fluid test kits [29, 43,44,45, 47, 52], with a total sample of 6652. Sample sizes ranged from 37 to 1081, sensitivities from 72 to 100%, and specificities from 91 to 100%. The pooled sensitivity and specificity were 88% (95% CI 84%-92%) and 99% (95% CI 99%-100%), respectively, and significant heterogeneity was observed among the included studies (P < 0.001) (Table 3, Additional file 6).

Other findings

We also found that the pooled sensitivity and specificity were 95% (95% CI 94%-96%) and 99% (95% CI 98%-99%) among general populations, 97% (95% CI 96%-98%) and 94% (95% CI 94%-95%) among high-risk populations, and 97% (95% CI 96%-98%) and 100% (95% CI 100%-100%) among hospital patients, respectively. The pooled sensitivity and specificity of antibody-antigen combination tests were 86% (95% CI 79%-99%) and 99% (95% CI 98%-100%), respectively.

GRADE (Grading of Recommendations, Assessment, Development and Evaluation) approach to assessing the overall quality of evidence

GRADE for RDT versus EIA

HCV Ab RDTs showed sensitivity and specificity comparable to those of EIAs. The five studies that evaluated RDTs against EIA included 15,943 samples; a moderate risk of bias was observed (Table 4), but specificity was consistently high. Because the unit of analysis varied among studies (Table 4), indirectness was observed. The overall strength of the pooled evaluation was moderate, with pooled sensitivity and specificity of 99% (95% CI 98%-100%) and 100% (95% CI 100%-100%), respectively. Assuming a pre-test probability of 5%, the post-test probability of HCV Ab positivity after a positive test result is 97%, and after a negative test result it is close to 0% (a negative predictive value of approximately 100%).
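Post-test probabilities of this kind follow from Bayes' theorem applied to sensitivity, specificity, and the pre-test probability. The minimal Python sketch below assumes an illustrative, unrounded specificity (hypothetical), because a specificity of exactly 100% would force the post-test probability after a positive result to 100%.

```python
def post_test_probabilities(sensitivity: float, specificity: float, pre_test: float) -> tuple[float, float]:
    """Return (P(infected | positive), P(infected | negative)) by Bayes' theorem."""
    tp = pre_test * sensitivity              # infected, test positive
    fp = (1 - pre_test) * (1 - specificity)  # uninfected, test positive
    fn = pre_test * (1 - sensitivity)        # infected, test negative
    tn = (1 - pre_test) * specificity        # uninfected, test negative
    return tp / (tp + fp), fn / (fn + tn)


# Sensitivity of 0.99 as pooled above; the unrounded specificities are hypothetical.
after_pos, after_neg = post_test_probabilities(0.99, 0.998, 0.05)  # about 0.963 and 0.0005
after_pos_999, _ = post_test_probabilities(0.99, 0.999, 0.05)       # about 0.981
```

Moving specificity from 0.998 to 0.999 shifts the positive post-test probability from about 96% to about 98%, which illustrates why the third decimal of a pooled specificity matters at a 5% pre-test probability.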

Table 4 Pooled test accuracy for different testing strategies (n = 52 studies)*

GRADE for oral RDT versus blood reference

Oral HCV Ab RDTs showed sensitivity and specificity comparable to those of blood-based reference standards (Additional file 7). The 12 studies that evaluated oral RDTs against a blood reference included 14,547 samples. A moderate risk of bias was observed. Inconsistency was present for sensitivity, which varied across the included studies, but specificity was consistently high. Because the unit of analysis varied among the included studies (Table 4), indirectness was observed. The overall strength of the pooled evaluation was moderate, with pooled sensitivity and specificity of 94% (95% CI 93%-96%) and 100% (95% CI 100%-100%), respectively. Assuming a pre-test probability of 5%, the post-test probability of HCV Ab positivity after a positive test result was 94%, and after a negative test result it was close to 0% (a negative predictive value of approximately 100%).

Discussion

There is a global need to expand HCV diagnostic testing. In this meta-analysis, we found that HCV Ab RDTs, including those using oral fluid, had high overall sensitivity and specificity compared to laboratory-based EIAs. This review extends the literature by including several new studies not covered by prior reviews, as well as a subanalysis focused on the use of RDTs with oral fluid. In addition, the evidence collected in this review was used to inform recommendations in the 2017 WHO guidelines on testing for hepatitis B and C [15]. The generally high diagnostic accuracy across most brands found in this systematic review and meta-analysis supported a strong recommendation for the use of HCV RDTs in the WHO testing guidelines [15].

Our data suggest that RDTs can be used for HCV Ab detection in a wide range of clinical settings. Of the included studies, 17 were conducted among general populations, 20 among high-risk populations, and 17 among hospitalized patients (two studies included two kinds of populations). High sensitivity and specificity of HCV Ab RDTs were observed across these different populations, which is consistent with previous systematic reviews [13, 14, 63]. The use of an EIA to detect HCV Ab, followed by NAT to confirm active infection, is standard practice for the diagnosis of HCV infection and is recommended by the US Centers for Disease Control and Prevention and the WHO [64, 65]. However, despite these recommendations, HCV Ab EIAs have not been widely used because of the complexity of laboratory-based assays, long turnaround times, high cost, and requirements for specialized equipment and trained technicians [13]. RDTs for HCV Ab screening were developed to overcome these barriers [66]. They obviate the need for multiple follow-up appointments, shorten wait times, and allow testing to be simplified and decentralized (Additional file 8). However, policymakers, government officials, and health care practitioners engaged in HCV screening, care, and treatment should be aware that the performance of individual RDTs for HCV Ab detection varies widely. The diagnostic accuracy of specific brands should therefore be examined to ensure acceptable performance.

Our data suggest that oral fluid RDTs have high sensitivity and specificity, which is consistent with other literature [67]. Tests that use non-invasive samples allow testing to be further decentralized and can be used in outreach settings [68]. Oral tests had a slightly lower pooled sensitivity (94%, 95% CI: 93%-96%) than blood-based tests (98%, 95% CI: 97%-98%) but comparable specificity. Oral HCV Ab RDTs may be particularly useful where venepuncture is difficult, for example among people who inject drugs who have poor venous access.

With the increasing availability of DAAs, countries are seeking test kits with high sensitivity and specificity so that they can scale up HCV Ab screening, especially among at-risk populations. The advantages and disadvantages of EIAs and RDTs are well established [15]. Performance, cost, and accessibility all need to be considered. Determining which tests to deploy at which level of the health care system, and in which settings, requires policymakers to weigh the different attributes of laboratory-based EIAs versus blood-based or oral RDTs. Potential trade-offs include accepting slightly lower accuracy in exchange for greater uptake and acceptability of testing, better provision of test results, and improved linkage to care. Each country needs to decide which trade-offs are acceptable, based not only on disease prevalence and health care infrastructure but also on technical, socioeconomic, cultural, and behavioral considerations. For example, a program must decide whether it is acceptable to buy a test that is 10% less accurate than another but considerably cheaper, so that many more people can be tested. Similarly, although oral RDTs are slightly less sensitive than blood-based RDTs, they may be more acceptable for outreach testing and for reaching at-risk populations, and may therefore allow control programs to identify more HCV cases. In a low-prevalence setting, even a test with 98% specificity can yield more false-positive than true-positive results. All of these trade-offs can be modeled to estimate the cost-effectiveness and potential impact of different strategies for HCV Ab screening.
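The false-positive point above can be made concrete with expected counts. The sketch below assumes a hypothetical 1% prevalence and 98% sensitivity alongside the 98% specificity mentioned in the text, computes the expected 2 × 2 cells per 10,000 people tested, and solves prevalence × sensitivity = (1 − prevalence) × (1 − specificity) for the prevalence below which false positives outnumber true positives.

```python
def expected_counts(n: int, prevalence: float, sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected 2 x 2 cell counts when n people are tested."""
    infected = n * prevalence
    uninfected = n - infected
    return {
        "tp": infected * sensitivity,
        "fn": infected * (1 - sensitivity),
        "fp": uninfected * (1 - specificity),
        "tn": uninfected * specificity,
    }


def break_even_prevalence(sensitivity: float, specificity: float) -> float:
    """Prevalence at which expected false positives equal true positives.

    Below this prevalence, false positives outnumber true positives.
    """
    fpr = 1 - specificity  # false-positive rate
    return fpr / (sensitivity + fpr)


# Hypothetical screening scenario: 1% prevalence, 98% sensitivity, 98% specificity.
counts = expected_counts(10_000, 0.01, 0.98, 0.98)  # tp about 98, fp about 198
threshold = break_even_prevalence(0.98, 0.98)       # about 0.02
```

Under these assumptions, about 98 true positives and 198 false positives are expected per 10,000 people tested, and false positives outnumber true positives at any prevalence below 2%. This is one reason why a reactive antibody result is followed by confirmatory NAT.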

Our review also highlights some common methodological problems encountered in evaluating diagnostic accuracy. All 52 included studies used a cross-sectional or case–control design, introducing a potential risk of bias. These studies used a broad range of reference standards, which makes the pooled performance data less meaningful. In diagnostic accuracy research, cross-sectional studies that enroll patients with diagnostic uncertainty and directly compare index test results with an appropriate reference standard can be considered high quality [69]. However, the majority of the included studies used convenience sampling. In this review, we excluded panel studies because they are not based on clinical settings, and our purpose was to generate data relevant to the detection of HCV Ab in clinical settings.

Most studies that reported HIV or HBV co-infection reported test performance only for all samples combined, rather than disaggregated by co-infection status. The data available from two studies were insufficient to undertake a subanalysis based on HIV co-infection. Policymakers may need to know the diagnostic accuracy of HCV Ab tests among individuals with co-infections, particularly HIV co-infection [70], and this requires further research among co-infected individuals.

Our study is subject to several limitations. First, we included studies conducted among the general population, hospital patients, and high-risk populations; diagnostic performance can be influenced by disease prevalence, and HCV prevalence varies among these populations [71, 72]. Second, we detected substantial heterogeneity that could reduce confidence in the review findings [73], which we attempted to address through a series of stratified subgroup analyses. Third, about 20 brands of RDT kits were used in the included studies, and their performance varies considerably. This limited our ability to summarize the accuracy of individual brands, except for the comparison of OraQuick with other brands. Another concern is publication bias, as studies showing poor test performance may be less likely to be published, leading to distorted estimates of accuracy [74]. Fourth, not all HCV RDTs can be performed on oral fluid or capillary whole blood (some require plasma or serum), and some require a cold chain for storage and transport, so the direct comparison between EIAs and RDTs in this meta-analysis may be less meaningful. Fifth, not all of the evaluated test kits are still on the market, and the versions of the tests included in this meta-analysis may have since changed. Finally, statistical heterogeneity was present, although this is common in meta-analyses of diagnostic accuracy studies. Further research is needed to understand why the tests perform more poorly in certain populations or settings.

Conclusion

RDTs, including oral tests, have excellent sensitivity and specificity compared to laboratory-based methods for HCV antibody detection across a wide range of settings. National policymakers should take the performance, cost, and accessibility of RDTs into consideration when selecting assays for use in their national testing algorithms.