Serological diagnostics of Lyme borreliosis: comparison of assays in twelve clinical laboratories in Northern Europe

Lyme borreliosis (LB), caused by spirochetes belonging to the Borrelia burgdorferi sensu lato complex, is the most common tick-borne infection in Europe. Diagnosis of LB is mainly based on the patient's medical history and clinical signs and symptoms, in combination with detection of Borrelia-specific antibodies, for which indirect enzyme-linked immunosorbent assay (ELISA) is the most widely used technique. The objective of the study was to evaluate and compare the diagnostic accuracy (sensitivities and specificities) of serological tests that are currently in use for diagnosis of LB in clinical laboratories in Northern Europe, by use of a large serum panel. The panel consisted of 195 serum samples from well-characterized and classified patients under investigation for clinically suspected LB, including patients with Lyme neuroborreliosis, Lyme arthritis, acrodermatitis chronica atrophicans or erythema migrans (n = 59) and patients with other diseases (n = 112). A total of 201 serum samples from healthy blood donors were also included. The panel (396 serum samples altogether) was sent, blinded for group affiliation, to 12 clinical laboratories (using five different ELISA methods), and the laboratories were asked to perform serological analysis according to their routine procedure. The results demonstrated high diagnostic concordance between laboratories using the same diagnostic assay and lower diagnostic concordance between laboratories using different diagnostic assays. For IgG, the results were in general rather homogeneous, with an average sensitivity of 88% (range 85–91%), whereas IgM showed a lower average sensitivity of 59% (range 50–67%) and more heterogeneous results between assays and laboratories.


Introduction
Lyme borreliosis (LB) is the most common tick-transmitted disease in Europe and is caused by spirochetes belonging to the Borrelia burgdorferi sensu lato (s.l.) complex [1]. The annual incidence varies from 1/100,000 to > 100/100,000 inhabitants in different countries in Europe [2,3]. Clinical manifestations of LB include erythema migrans (EM), Lyme neuroborreliosis (LNB), acrodermatitis chronica atrophicans (ACA) and Lyme arthritis (LA) [4]. Diagnosis of LB, except for EM which is considered as a clinical diagnosis, is based on the presence of typical symptoms and signs, the patients' medical history in combination with laboratory evidence of borrelia infection. In clinical practice, serological detection of Borrelia-specific antibodies by enzyme-linked immunosorbent assay (ELISA) is widely used, sometimes supplemented by immunoblot in order to increase the specificity and the positive predictive value [5]. However, this two-tiered testing approach is expensive, time-consuming and laborious and may not be necessary with modern ELISAs that are based on synthetic or recombinant antigens [6]. Modern ELISA methods have high analytical sensitivity and specificity, besides being inexpensive and easy to perform [5]. However, there are some limitations in clinical interpretation due to biological aspects that need to be taken into consideration. For instance, the natural delay in antibody response in relation to onset of symptoms in LB may influence the diagnostic sensitivity in early LB [7], and possible IgM cross-reactivity between antigens of pathogens within the same genus, but also in different genera, may lead to false-positive results [8,9]. The long-term persistence of antibodies after a Borrelia infection and the high seroprevalence in the healthy populations in endemic areas may also have impact on the clinical diagnostic specificity, since it can be complicated to distinguish an active from a previous Borrelia infection [10-12].
An investigation from 2011, based on a survey alone, summarized the different methods used at 43 laboratories in Sweden, Norway, Denmark and Finland [13]. The survey showed differences between laboratories and countries regarding methods or combinations of methods, strategies (one-step or two-step), choice of assays and cut-off values. This survey, together with many other studies evaluating and surveying the diagnostic assays for serological testing, illustrates the lack of uniform methods for laboratory diagnosis of LB and the need for further development of recommendations for interpretation and reporting in order to achieve more consistent laboratory diagnostics of LB in Europe. Data to support the two-step strategy in a European clinical setting are ambiguous [6,13,14]. The objective of the present study was to evaluate and compare the diagnostic accuracy (sensitivities and specificities) of several serological ELISA methods that are currently in use for LB diagnosis in clinical laboratories in Northern Europe (including Sweden, Norway, Denmark and the Åland Islands, Finland), by using a large and well-characterized panel of sera from patients and controls.

Study design
A cross-sectional study design was used to create a panel of serum samples representative of patients referred to specialist clinics for suspected LB. The study panel contained 396 serum samples: 195 from clinically and laboratory well-characterized patients (> 18 years of age) under investigation for clinically suspected LB in Jönköping County, in the municipality of Kristiansand and on the Åland Islands, and 201 from blood donors. All patient samples were prospectively included in the study and then retrospectively classified based on the patients' medical records.

Participants
Serum and cerebrospinal fluid (CSF) were collected when the patients were referred to the specialist clinics for investigation of suspected LB manifestations and the samples were analysed according to the local standard procedure, used at the respective laboratory at the hospitals recruiting the patients, including both CSF cell count, detection of Borrelia-specific antibodies and calculation of intrathecal antibody index (AI). The serological assays used at the three recruiting hospitals were IDEIA Lyme Neuroborreliosis test (Oxoid, Hampshire, UK) and Enzygnost Borrelia Lyme IgM/IgG (Siemens/DADE Behring, Marburg, Germany) in Sweden, Enzygnost Borrelia Lyme IgM/IgG (Siemens/DADE Behring) in Norway and Immunogenics® C6 LYME ELISA™ kit, IgM/IgG (Immunetics, Inc., Boston, MA) and RecomWell Borrelia IgM/IgG (Mikrogen, Neuried, Germany) on the Åland Islands. Manifestations like LA and ACA were confirmed by Borrelia-specific PCR in addition to the serological testing, while the diagnosis of EM was solely based on the physician's clinical assessment [2]. All serum samples were taken before treatment and only one sample per patient was included. The blood donors had stated that they were healthy and a health declaration was completed before the blood donation.
Based on both laboratory results and by review of medical charts, the patients were retrospectively classified into four groups, (1) LB patients with manifestations including definite LNB, LA, EM or ACA (n = 59), (2) patients with other diseases (n = 112), (3) blood donors (n = 201) and (4) suspected LB (n = 24) (Fig. 1). The latter group presented with symptoms and signs that did not fulfill the criteria for any of the LB groups and they were referred for evaluation at the specialist clinics because of their seropositivity. However, they were not included in the statistical analysis since the patients in this group were difficult to evaluate and classification was uncertain. A flow chart demonstrating the inclusion and classification process is shown in Fig. 1. The criteria for classification are shown in Table 1 and age at time for inclusion and sex for all four groups together with the major clinical symptoms and signs from patients with other diseases, not classified as LB patients, are shown in Table 2.

Test methods
The study involved 12 clinical laboratories (referred to as laboratories 1-12) located in Sweden (n = 6), Norway (n = 4), Denmark (n = 1) and the Åland Islands, Finland (n = 1), using commercial borrelia serology assays that are fairly representative of clinical laboratories in these countries. The study panel, consisting of frozen serum samples, was sent on dry ice to the laboratories. It was blinded for group affiliation both to the participating laboratories and to the coordinating laboratory, which was the Laboratory of Clinical Microbiology, Laboratory Medicine, Region Jönköping County, Sweden (CMLJ). The participating laboratories were asked to analyse the samples according to their routine procedure, and the results together with the method descriptions were reported to the CMLJ for compilation. All laboratories based their primary diagnostics on ELISA, and assays from five different manufacturers were used. Participating laboratories, diagnostic assays, abbreviations for the different diagnostic assays used in this manuscript, manufacturers, reference intervals and cut-offs together with units are presented in Table 3. In the qualitative comparison, the cut-off value from each laboratory was used to establish results as positive or negative. In the quantitative comparison, the cut-off values were not taken into consideration. The serological results were reported as positive, borderline or negative. Borderline results were regarded as positive in the statistical analyses.
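The handling of borderline results in the qualitative comparison can be sketched in a few lines of Python. This is only an illustration of the rule stated above (borderline counted as positive), not the analysis code of the study, which was written in R; the function names and the example calls are hypothetical and are not study data.

```python
from typing import List


def is_positive(call: str) -> bool:
    """Map a laboratory call to a binary result; borderline counts as positive."""
    call = call.strip().lower()
    if call not in {"positive", "borderline", "negative"}:
        raise ValueError(f"unexpected call: {call!r}")
    return call != "negative"


def sensitivity(calls_in_cases: List[str]) -> float:
    """Fraction of samples from LB patients that are called positive or borderline."""
    return sum(is_positive(c) for c in calls_in_cases) / len(calls_in_cases)


def specificity(calls_in_controls: List[str]) -> float:
    """Fraction of control samples that are called negative."""
    negatives = sum(not is_positive(c) for c in calls_in_controls)
    return negatives / len(calls_in_controls)


# Hypothetical calls for illustration only (not study data).
lb_calls = ["positive", "borderline", "negative", "positive"]
donor_calls = ["negative", "negative", "borderline", "negative", "negative"]

print("sensitivity:", sensitivity(lb_calls))
print("specificity:", specificity(donor_calls))
```

Counting borderline results as positive raises the apparent sensitivity and lowers the apparent specificity, which matters most for laboratories that report many borderline results.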

Data analysis
The R statistical software [16] was used for statistical analysis and graphics. The receiver operating characteristic (ROC) curve analyses were performed using the R packages pROC and mada. The R package mada is a tool implementing the so-called "Reitsma" method for the meta-analysis of bivariate diagnostic accuracy [17]. For bivariate comparison of sensitivities and specificities, the sROC approach was used. Results outside the 95% confidence regions for fits were considered statistically significant. The statistical comparison of area under curve (AUC) used the command roc.test with the default "Delong" algorithm [18]. The assessed results in the qualitative comparison (positive, borderline or negative) have been established by the participating laboratories, while the quantitative comparison is based on the numerical values reported for each sample from each laboratory.

Table 1 (partial; rows recovered for the group with suspected LB)

Possible disseminated LB e (n = 9)
Patients with symptoms not explained by any other disease and with significantly elevated and rising IgG antibody titer in serum. In some cases also intrathecal antibody production in CSF, but no pleocytosis nor increased levels of CXCL13 (< 20 pg/mL) in CSF. Probable (visited an endemic area) or observed tick-bite. Good response to antibiotic treatment (amoxicillin or doxycycline).

Previous infection (n = 8)
Patients with intrathecal antibody production in CSF but without signs of pleocytosis or increased levels of CXCL13 in CSF (> 20 pg/mL). The patients received no antibiotic treatment.

Not LP (n = 7)
Patients not classified due to lack of lumbar puncture and CSF analysis.

a Total cell count ≥ 5 × 10^6/L in CSF
b Classified in accordance with European guidelines [2,15]
c Classified in accordance with European guidelines [2]
d All but one of the samples, recruited in Jönköping, was from patients located on the Åland Islands
e Patients not classified as LNB, ACA, LA, EM, lymphocytoma or carditis
LB = Lyme borreliosis, LNB = Lyme neuroborreliosis, CSF = cerebrospinal fluid, LA = Lyme arthritis, ACA = acrodermatitis chronica atrophicans, EM = erythema migrans, CNS = central nervous system, TBE = tick-borne encephalitis, Dissem. LB = disseminated Lyme borreliosis, LP = lumbar punctured
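To make the AUC used in the ROC analyses concrete, the following Python sketch computes the empirical AUC as the probability that a randomly chosen patient sample has a higher numerical value than a randomly chosen control sample, with ties counted as one half. It only illustrates the quantity being compared; it does not reproduce the pROC roc.test call or the DeLong test used in the study, and the function name and example values are hypothetical.

```python
from typing import Sequence


def empirical_auc(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Empirical AUC: share of (case, control) pairs in which the case value is higher.

    Tied pairs contribute 0.5, which makes this equal to the area under the
    empirical ROC curve (Mann-Whitney formulation).
    """
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for x in case_values:
        for y in control_values:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical assay values for illustration only (not study data).
cases = [3.1, 0.8, 5.4, 2.2]
controls = [0.3, 0.9, 0.5, 2.2, 0.1]

print(empirical_auc(cases, controls))  # 0.875
```

An AUC of 0.5 corresponds to no discrimination between the groups, and a value below 0.5 means that controls tend to have higher values than patients.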

Qualitative comparison
Qualitative comparison within the diagnostic assays
The twelve laboratories used five different assays, three of which were used at more than one laboratory and are compared in this section (Table 3). The results in this section are presented as range of positive results (rpr), representing the range between the highest and the lowest number of positive results within a diagnostic assay. The IgM assays showed a heterogeneous picture with low correlation both within and between assays, while the IgG assays showed a more homogeneous picture with high correlation. The rpr in the IgG assays are as follows: (1) [...] (Fig. 2b) was reported from laboratory 6 (in both blood donors and patients with other diseases). This laboratory had adjusted its cut-off value from < 1.0 U/mL, given by the manufacturer, to < 2.0 U/mL in order to decrease the number of false-positive results. This resulted in higher specificity but also lower sensitivity compared to laboratories 7 and 8. Finally, the C6 ELISA showed high correlation between laboratories 11 and 12, and few borderline results (7 and 6, respectively) were reported.
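The effect of a raised cut-off on the number of positive results, and the resulting range of positive results (rpr), can be illustrated with a short Python sketch. The assumptions are labeled in the code: values at or above the cut-off are counted as positive, the sample values are invented, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def count_positive(values: List[float], cutoff: float) -> int:
    """Number of samples counted as positive (assumed rule: value >= cut-off)."""
    return sum(1 for v in values if v >= cutoff)


def range_of_positive_results(counts_by_lab: Dict[str, int]) -> Tuple[int, int]:
    """rpr: lowest and highest number of positive results among laboratories using one assay."""
    counts = list(counts_by_lab.values())
    return min(counts), max(counts)


# Hypothetical values in U/mL for the same six samples (not study data).
values = [0.4, 1.3, 1.8, 2.5, 6.0, 0.9]

counts = {
    "manufacturer cut-off 1.0 U/mL": count_positive(values, 1.0),
    "raised cut-off 2.0 U/mL": count_positive(values, 2.0),
}
print(counts)
print(range_of_positive_results(counts))  # (2, 4)
```

Samples between the two cut-offs change from positive to negative, which lowers the positive rate in controls and in patients alike.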

Qualitative comparison between the diagnostic assays for IgG and IgM
Using bivariate analysis of the sensitivity and specificity, the rate of positive results among patients with LB was low in all five IgM assays used at the 12 laboratories, with an average sensitivity of 59% (range 50-67%) and a large heterogeneity, from 40 to 80% (Fig. 3a, y-axis). The positive rate for IgM (Fig. 3a, x-axis) was on average 9.5% (range 7-14%) among blood donors and 28.6% (range 23-35%) among patients with other diseases.
Four out of five IgG assays (Fig. 3b) showed high average sensitivity of 88% (range 85-91%) among patients with LB (Fig. 3b, y-axis). The positive rate for IgG (Fig. 3b, x-axis) was on average 22% (range 20-24%) among blood donors and 68% (range 35-71%) among patients with other diseases. The laboratory using EuroImmun IgG had a lower positive rate compared to the other laboratories, 76% (range 64-85%) among patients with LB (Fig. 3b, y-axis), and a slightly higher positive rate for blood donors and patients with other diseases, 20% (range 14-26%) and 60% (range 50-68%), respectively (Fig. 3b, x-axis). The choice of control group has a large influence on the apparent clinical diagnostic specificity, with a high average positive rate of 60% (range 50-68%) in patients with other diseases (Fig. 3b, x-axis). The C6 ELISA showed a slightly higher positive rate both among patients with other diseases, on average 75% (range 66-82%), and among blood donors, on average 26% (range 21-33%) (Fig. 3b, x-axis). However, the positive rate among patients with LB was comparable to the rest of the diagnostic assays (average [...]; Fig. 3b, y-axis). The seroprevalences among blood donors in the study ranged from 20 to 27% (Fig. 3b, x-axis) between the five diagnostic assays, with a slightly higher number for the C6 ELISA and a slightly lower number for laboratories 7 and 8, using the Enzygnost IgG assay, and for laboratory 1, using the Liaison IgG assay (data not shown). Overall, the homogeneity among the different assays is higher for IgG than for IgM, and by using the test for separate comparison of the sensitivities and specificities, the p values for differences regarding IgG are not significant (p > 0.20) but highly significant for IgM (p < 0.001).
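The dependence of the apparent specificity on the control group follows directly from the positive rates, since specificity equals one minus the positive rate among controls. The short Python sketch below applies this to the average IgM positive rates reported in this section (9.5% among blood donors and 28.6% among patients with other diseases); the function name is hypothetical and the calculation is a worked illustration, not part of the study's analysis.

```python
def specificity_from_positive_rate(positive_rate_in_controls: float) -> float:
    """Apparent specificity = 1 - positive rate in the chosen control group."""
    if not 0.0 <= positive_rate_in_controls <= 1.0:
        raise ValueError("positive rate must be between 0 and 1")
    return 1.0 - positive_rate_in_controls


# Average IgM positive rates reported in the text for the two control groups.
igm_positive_rates = {
    "blood donors": 0.095,
    "patients with other diseases": 0.286,
}

for group, rate in igm_positive_rates.items():
    spec = specificity_from_positive_rate(rate)
    print(f"IgM, {group}: apparent specificity {spec:.1%}")
```

With the same assays and the same cut-offs, the apparent IgM specificity is thus roughly 19 percentage points lower when patients with other diseases are used as controls instead of blood donors.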

Quantitative comparison
Quantitative comparison within the diagnostic assays
In order to assess the analytical technical performance of the laboratories using the same assay, a quantitative comparison was performed using the numerical values of each sample obtained at each laboratory. Laboratories using the same diagnostic test were compared pairwise. Figure 4a illustrates a representative example, in this case laboratories 1 and 2 for the Liaison IgG assay. The intra-assay correlation displayed good agreement along the diagonal of equal values (Fig. 4a), except for some of the very high values, where laboratory 1 tended to report lower results. In this case, no samples around the cut-off were reclassified as positive or negative. The correlation curves for all three assays used at more than one laboratory were comparable to the one shown in Fig. 4a (data not shown). In the Bland-Altman plot (Fig. 4b), the very high and low values are excluded and the horizontal broken line corresponds to the diagonal line in Fig. 4a. The results in Fig. 4b show that the measurement error within the quantitative range of the instrument is lower for laboratory 1, within a 95% range around the equality line (a ratio of one) from 0.74 to 1.16. This inter-assay variation is highly acceptable, with a coefficient of variation of 12% and 95% of the values within ± 20% of the mean. The remaining IgG assays and the IgM assays show similarly high correlation within the assays (data not shown).
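A 95% range around a line of equality of one suggests a Bland-Altman analysis on the ratio scale. The Python sketch below shows one common way to obtain such limits, by taking the mean and standard deviation of the log ratios of paired values and back-transforming. Whether the study computed its limits in exactly this way is an assumption; the function name and the paired values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def ratio_limits_of_agreement(
    lab_a: Sequence[float], lab_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (lower limit, geometric mean ratio, upper limit) for lab_a / lab_b.

    Limits are mean +/- 1.96 SD of the log ratios, back-transformed to ratios.
    All values must be positive and the two sequences must be paired.
    """
    if len(lab_a) != len(lab_b) or len(lab_a) < 2:
        raise ValueError("need at least two paired values")
    log_ratios = [math.log(a / b) for a, b in zip(lab_a, lab_b)]
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return math.exp(mean - 1.96 * sd), math.exp(mean), math.exp(mean + 1.96 * sd)


# Hypothetical paired results for the same samples at two laboratories (not study data).
lab_1 = [12.0, 35.0, 60.0, 110.0, 18.0]
lab_2 = [13.0, 33.0, 66.0, 125.0, 17.0]

lower, centre, upper = ratio_limits_of_agreement(lab_1, lab_2)
print(f"ratio lab 1 / lab 2: {centre:.2f} (95% limits {lower:.2f} to {upper:.2f})")
```

A geometric mean ratio below one indicates that the first laboratory tends to report lower values than the second, and narrow limits indicate good agreement between the two.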

Quantitative comparison between the diagnostic assays
A ROC curve analysis was performed for all laboratories, comparing patients with LB to patients with other diseases and to blood donors. The ROC curve analysis showed that the EuroImmun assay has a relatively low specificity compared to the other assays and that its AUCs for both IgM and IgG are significantly lower when the blood donors are used as negative controls, compared to the Liaison IgG assay (p = 0.006) (Fig. 5a, b). For IgM, the Enzygnost IgM assay has the highest AUC, significantly higher than that of the Liaison IgM assay (p = 0.001) (Fig. 5a, c). When comparing the patients with LB to the patients with other diseases, the AUCs are lower and more similar, except for the Enzygnost IgM assay, which still performs better (p = 0.008) (Fig. 5c). The EuroImmun IgG assay has a low AUC of 0.65 (range 0.56-0.74), which is significantly lower than that of the Liaison IgG assay, with an AUC of 0.71 (range 0.63-0.79) (p < 0.01) (Fig. 5b). The RecomWell IgM/IgG assay is in line with the Liaison IgM/IgG assay when using blood donors as controls (Fig. 5a, b), as is the RecomWell IgM assay when using patients with other diseases as controls (Fig. 5c). However, the RecomWell IgG assay has a lower AUC of 0.42 (range 0.33-0.50) compared to 0.71 (range 0.63-0.79) for the Liaison IgG assay (p < 0.0001) when using patients with other diseases as controls (Fig. 5d). The low [...] (Fig. 5d). These figures also support that the same assays gave the same results in different laboratories.

Discussion
This study is an attempt to determine if there is any significant variability between and within the diagnostic assays currently in use at clinical laboratories in Northern Europe by using a large and well-characterized panel of serum samples from patients and controls. The results show high intra-assay correlation between the laboratories using the same diagnostic assay (especially for IgG) and lower correlation between laboratories using different diagnostic assays. Both the intra- and inter-assay comparisons showed more specific results with high concordance for IgG, and lower for IgM. Interestingly, we found an increased seroprevalence among blood donors compared to previous studies from the same geographical areas [11]. The results within the IgM assays showed more heterogeneity compared to the IgG assays, not only between the different diagnostic assays, but also between laboratories using the same diagnostic assays. This may suggest reproducibility problems in the IgM assays, but is more likely a result of the different cut-off values used in the different diagnostic assays or the fact that some laboratories (1 and 7-10) have numerous samples (> 20 samples) with borderline results, which in this study were classified as positive in the statistical analyses. The low specificity for the IgM assays was expected since it is well known that IgM antibodies are less mature and specific than IgG antibodies, and false-positive IgM reactions due to cross-reactivity are difficult to overcome. In everyday practice of clinical microbiology, IgM interpretation may indeed be challenging and should be performed cautiously.
Figure legend (fragment): [...] with other diseases (group 2) for IgM and IgG, respectively. Black = Liaison assays, red = Enzygnost assays, green = EuroImmun assay, blue = RecomWell assay and light blue = C6 ELISA assays.
The IgG assays showed high concordance and more homogeneous results both between and within assays. The EuroImmun IgG assay showed a slightly lower sensitivity with no major gain in specificity compared to the other assays. This is in line with previous studies showing a higher sensitivity and specificity for assays based on recombinant antigens compared to whole cell lysate [19]. However, a previous study showed the opposite results, with superior sensitivity and negative predictive value in combination with low specificity and positive predictive value [20]. Both this study and the one by Kodym et al. (2018) include a low number of laboratories, which makes it hard to draw any firm conclusions. However, Kodym et al. [21] suggest that the EuroImmun IgM/IgG assay may serve as a screening test to be used together with a confirming immunoblot. Overall, the serological methods showed high concordance and comparable sensitivity and specificity regarding IgG both within and between assays, while the IgM assays showed more heterogeneous and less sensitive results. This implies that if laboratories were to analyse only Borrelia-specific IgG in serum, patients and clinicians would receive more or less the same test result irrespective of which laboratory performed the analysis. However, IgM results differ considerably more between laboratories and methods, and our data suggest that IgM testing in serum does not add any diagnostic value to IgG testing in suspected LB cases, since the sensitivity for IgM is lower (with the possible exception of the Enzygnost Borrelia IgM assay) and IgM testing results in loss of specificity. Also, the positive rate of IgM in sera from patients with other diseases is higher than among blood donors, illustrating the well-known risk of false-positive reactivities [14].
Taken together, IgM testing in serum samples is a diagnostic tool that is difficult to handle correctly and its value in clinical diagnostics of LB may be questioned. It is important to keep in mind, though, that in this study, we have not included CSF samples or samples from children, and the clinical value of Borrelia-specific intrathecal IgM index or IgM testing in pediatric sera cannot be assessed here.
Commercial assays are marketed using different antigens or combinations of antigens. Comparison of diagnostic assays with different antigens will show less analytical correlation. This is consistent with biology since the reactivities to different antigens or antigen combinations are, statistically, conditionally independent. Antibodies develop differently in different individuals and different assays detect different antibodies, which may result in both strong and weak correlation, for instance when no antibody development is measured in one assay and a high reactivity is measured in another assay. Reports from external quality assurance schemes (round robins), in which a smaller number of samples are tested in many laboratories, have also shown some variation between the laboratories [20]. The Enzygnost IgG assay uses a mix of whole cell detergent extract and recombinant VlsE from the three main B. burgdorferi s.l. species pathogenic to humans, whereas the Liaison IgG assay, according to the kit insert, is based solely on recombinant VlsE from B. garinii (PBi). If this is correct, the sensitivity for the Liaison IgG assay may be lower in samples from Northern Europe where B. afzelii is the most prevalent infecting genospecies. However, a previous study [22] evaluating a recombinant Borrelia line immunoblot assay displayed the highest sensitivity for the recombinant VlsE of B. garinii (PBi) for both IgM and IgG detection. The study also showed that the most sensitive antigen for IgG in all LB stages, especially in early manifestations like EM and acute LNB, is VlsE followed by DbpA and p58, while VlsE of B. afzelii (PKo) reacted poorly with samples from patients with ACA and LA (late manifestations). The poor reactivity for LA might be explained by the rare observation of B. burgdorferi sensu stricto (s.s.) in ticks and patients from Northern Europe [22].
However, cross reactivity of VlsE between different species may occur and different species are more likely to cause certain clinical signs and symptoms, e.g., B. garinii has been associated with more distinct symptoms and more pronounced intrathecal inflammation in LNB while B. afzelii, in Europe, is often associated with skin manifestations like EM and ACA [23,24]. In Europe, there are at least five different species that are known to be pathogenic to humans [25]. A previous study has shown a higher specificity for the Enzygnost assay in both IgM and IgG compared to the Liaison assay [26], which is in line with our findings, especially for IgM indicating that recombinant VlsE antigens obtained from all three B. burgdorferi genospecies pathogenic to humans improved the diagnostic sensitivity with sustained specificity of LB. Most of the diagnostic assays in this study include VlsE as antigen. VlsE epitopes provoke an early antibody response, which is not detectable in ELISAs prepared from whole-cell sonicates of cultured B. burgdorferi bacteria, since the VlsE antigen is not expressed by the bacteria in vitro [27]. This present study shows that there is no gain in sensitivity, except for the Enzygnost IgM assay, analyzing the samples with both IgM and IgG if VlsE is used as antigen in the IgG test. The use of IgM may instead result in specificity problems. However, if excluding IgM testing is considered, the serodiagnostic IgG assay should include either shared antigens or antigens from the different pathogenic species.
The RecomWell IgG assay follows a principle of using a panel of several recombinant antigens in the same ELISA (p100, OspC, VlsE, p18), with the purpose of increasing the sensitivity. However, in this study, there was a lower specificity without noticeable gain in sensitivity for this assay, and the RecomWell IgG assay has low screening value (AUC < 0.50) in consecutive patients with other diseases. The low specificity of the RecomWell IgG assay implies that, if it is used, it would be advisable to add a second confirmatory assay such as another ELISA or an immunoblot.
When using blood donors as controls, awareness of seropositivity rate in the local population is crucial as this may be used as a pointer in clinical interpretation of the results, especially in patients with typical symptoms. However, it is of less importance if the specificity of the assay is high [13]. The seroprevalence in blood donors in this study is in agreement with previous studies done in Kalmar [11,28], a region closely located to Jönköping County, indicating an increase in seroprevalence in the healthy population over the years. It is known that a high seroprevalence for both IgM and IgG can be found in a healthy population in Borrelia endemic areas which is in line with the results in this study.
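Why the background seropositivity matters for interpretation can be illustrated with Bayes' rule for the positive predictive value (PPV). The Python sketch below uses the average IgG sensitivity reported in this study (88%) and a specificity of 78%, derived here as one minus the average IgG positive rate among blood donors (22%); the pre-test probabilities and the function name are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(sensitivity: float, specificity: float, pretest: float) -> float:
    """PPV by Bayes' rule: true positives divided by all positives."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity), ("pretest", pretest)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * pretest
    false_pos = (1.0 - specificity) * (1.0 - pretest)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results expected with these inputs")
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.88  # average IgG sensitivity reported in the study
SPECIFICITY = 0.78  # assumption: 1 - average IgG positive rate among blood donors (22%)

# Hypothetical pre-test probabilities of active LB, for illustration only.
for pretest in (0.05, 0.20, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, pretest)
    print(f"pre-test probability {pretest:.0%}: PPV {ppv:.0%}")
```

With a high background seropositivity, a positive IgG result carries little weight in a patient with atypical symptoms (low pre-test probability), whereas it is more informative in a patient with typical symptoms.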
This study included two control groups, blood donors and patients with other diseases. The results show that healthy blood donors consistently lead to higher specificity than controls with other diseases. The high seropositivity among patients with other diseases is caused by the fact that patients under investigation for symptoms that could be attributed to a tick-borne infection were referred to the specialized centers for further investigation (e.g., a lumbar puncture) partly due to their seropositivity and that the presence of antibodies in this case does not prove the occurrence of an active infection or disease, since antibodies, especially IgG, may persist for 10-20 years at least [29]. However, it cannot be excluded that some of the positive results reflect on-going LB, but in the referral center, seropositivity in serum is of little diagnostic value. In a systematic review by Leeflang et al. [6], it is recommended that "Future diagnostic accuracy studies should be prospectively planned cross-sectional studies, done in settings where the test will be used in practice". This study follows these recommendations using patients referred for suspected LB, later classified as patients with other diseases. Thus, a future prospectively planned cross-sectional study should collect samples in the flow of patients at the time of the first suspicion of LB, not after referral of the patient. This is, however, hardly feasible to carry out for practical reasons, as recruitment should be done in primary care involving a large number of clinics.
We are aware that exclusion of the patient group with suspected LB (n = 24) from the statistical analyses may have changed the estimated test performance. However, it was not possible to determine whether these patients should be included in the patient group or in the control group. Another drawback of the study population is that it did not include children. Examining the diagnostic performance of the assays in paediatric patients would have been of interest, since the seroprevalence may differ from that in adults and children often present with neurological symptoms early in the course of LNB, when antibody production is low and hard to detect and laboratory diagnosis therefore remains uncertain [15,30].

Conclusions
The IgG detection kits showed comparable results with small variations both within and between assays and an average sensitivity of 88% (range 85-91%), whereas IgM showed a lower average sensitivity of 59% (range 50-67%) and more heterogeneous intra- and inter-assay results. Our findings support that separate IgM testing in serum from adult patients gives no added diagnostic value in LB diagnostics, especially in highly endemic areas, when modern IgG assays containing VlsE antigens from several of the main pathogenic species are used. However, the study showed that the Enzygnost IgM assay had a higher sensitivity and specificity, which is of interest particularly in the diagnosis of young children. The more suitable control group in this study consisted of samples from blood donors, since the high seropositivity rate found in patients assessed not to have LB indicated that many of the patients investigated for suspected LNB were referred for further investigation partly due to the seropositivity found in serum, thus resulting in selection bias.