Background

The incidence of nosocomial infections (NI) amongst intensive care unit (ICU) patients is 2–5 times that of general admissions [1]. Amongst the most prevalent and threatening ICU NIs is ventilator-associated pneumonia (VAP), which may develop in patients receiving invasive mechanical ventilation (MV) for ≥ 48 h [2,3,4,5,6]. VAP has a cumulative incidence of 10–45% and an attributable risk of 5–27% [7,8,9,10,11,12]. Adverse consequences of VAP include prolonged duration of MV, delayed MV weaning, increased antibiotic consumption, prolonged ICU and hospital length-of-stay (LOS), increased treatment-related expenditures, and increased crude and attributable mortality, with recent studies reappraising the impact of VAP on mortality at 10% [2,3,4,5,6,13,14,15,16,17]. Accordingly, VAP prevention has emerged as a high priority. As such, one component of the Institute for Healthcare Improvement’s recommended ventilator bundle is the accurate diagnosis and determination of VAP incidence [18,19,20]. However, the optimal VAP diagnostic strategy remains contentious. Research in this field is limited by the lack of a consensus ‘gold standard’ definition against which to test the diagnostic accuracy of new diagnostic algorithms or methods of detection. VAP diagnosis remains challenging because clinical signs and symptoms may be non-specific: clinical diagnosis is overly sensitive (leading to increased antibiotic use), and histopathology (ante- or post-mortem within 96 h of death) is limited in availability, consistency, standardization and reliability [21,22,23]. Moreover, quantitative respiratory cultures have been found to correlate poorly with histopathology [22, 24].

As none of the available diagnostic tests, performed alone, can provide an accurate diagnosis of VAP, a diagnostic strategy incorporating several criteria has been viewed by many as a good compromise. To this end, great effort has been expended to generate standardized diagnostic algorithms that incorporate clinical, radiographic and microbiological data. Some examples (Table 1) include: Centers for Disease Control and Prevention’s National Healthcare Safety Network (CDC/NHSN) [25], Clinical Pulmonary Infection Score (CPIS) [26], Hospital in Europe Link for Infection Control through Surveillance (HELICS) [27], Johanson criteria [28], and others [29, 30]. As compared to immediate post-mortem lung biopsies, clinical criteria have reasonable diagnostic performance but may be strongly affected by the diagnostic thresholds used, and the lack of a uniform reference diagnostic standard has contributed to variable diagnostic performance (Table 2) and made inter-study comparisons difficult [31]. A high-performing VAP diagnostic method is greatly needed, but international guidelines disagree on the use of clinical algorithms for risk stratification to determine treatment [32, 33]. Data comparing algorithm performance head-to-head are lacking, and most such data stem from high-income countries. There is a great need for head-to-head comparisons, as well as for data from low-to-middle income countries to supplement the international data pool. To this end, a prospective non-randomized study was conducted to determine which of the CDC/NHSN, CPIS, or Johanson criteria provides the best diagnostic performance in patients with VAP, using HELICS as the reference standard.

Table 1 Ventilator-associated pneumonia diagnostic algorithms utilized in this study
Table 2 Performance characteristics of ventilator-associated pneumonia diagnostic algorithms

Methods

A prospective observational cohort study was performed in three mixed medical-surgical ICUs from one academic medical center from 1 October 2016 to 30 April 2018. The study was approved by the Investigational Review Board at Hamadan University of Medical Sciences, Hamadan, Iran (IR.UMSAHA.REC.1395.23). All study parts were reviewed according to the Strengthening the Reporting of Observational Studies in Epidemiology ‘STROBE’ guideline [34]. Written consent was required and covered both study participation and publication of de-identified aggregate findings. Surrogate consent from the patient’s legal guardian or designated health proxy was permitted in cases where the subject lacked decision-making capacity. All patients that survived and regained their faculties were informed of the project. All data generated or analyzed during this study are included in this article. De-identified individual subject data may be available from the corresponding author on reasonable request.

Patients were eligible for study participation if: (1) age ≥ 18 years, (2) admitted to the ICU > 48 h, (3) receiving invasive MV > 48 h (any mode except high frequency percussive ventilation or high frequency oscillatory ventilation), (4) full-code status, and (5) informed consent obtained from the patient, legal guardian or healthcare surrogate upon ICU admission (prior to intubation). Patients with any limitation of code status including (but not limited to) No Code, Do Not Resuscitate, or Do Not Intubate were excluded (Fig. 1). Patients with known pregnancy were excluded.

Fig. 1 Patient flow diagram

Patient selection was performed by an enrollment team of two physicians (1 critical care, 1 infectious disease) not directly involved in the study. All consecutive patients identified at the participating ICUs with VAP according to the HELICS criteria were eligible. Each case patient was matched by the enrollment team, which was blinded to the outcome, with another ICU patient who did not have VAP. Matching was based on: (1) admission indication; (2) ICU LOS ≥ 48 h; (3) receiving invasive MV > 48 h (any mode except high frequency percussive ventilation or high frequency oscillatory ventilation, as these preclude proper calculation of the CDC/NHSN criteria); (4) severity of illness at ICU admission as quantified by an Acute Physiology and Chronic Health Evaluation (APACHE) II score > 15; (5) full code status; and (6) age ≥ 18 years.

VAP diagnosis was made independently by the treating clinical team. Diagnosis followed the HELICS criteria [27], in accordance with the institutional standard and other published studies [2, 35,36,37,38], as this is the definition currently used in much of Europe, Australia, and the Near and Middle East (including Iran). Chest radiograph interpretation was undertaken “off-line” by a team of 3 physicians (1 radiology, 1 critical care, 1 pulmonology) who were independent of the treating team. The Kendall agreement coefficient among these physicians for chest radiograph interpretation was 0.99. Procalcitonin was measured at the time of initial VAP suspicion. A single value was used, and thresholds were in accordance with prior published studies [39].

Specimen collection and processing

Protected tracheal aspirate (TA) samples were obtained through a sterile 12 French catheter (SUPA Medical Devices, Tehran, Iran). The catheter was advanced through the endotracheal tube until resistance was encountered (level of the carina) and then retracted approximately 2 cm. To obtain TA samples, 5–10 mL of sterile saline was instilled, followed by aspiration into a sterile syringe. This generally yielded an aspirate of 2–3 cc. The samples were then transferred to the microbiology laboratory for processing and examination within 30 min. The samples were evaluated by Gram stain and quantitative culture. Light microscopy was used to assess Gram stains for bacteria and white blood cells. The samples were vortexed for one minute at 3,000 rpm, diluted 1:10 with saline, and 0.01 cc was inoculated onto blood agar, chocolate agar, and MacConkey agar plates. Cultures were incubated at 35 ± 1 °C for 24 h, followed by quantitative bacterial evaluation. The cut-off value for bacterial colony counts was ≥ 10⁵ colony forming units (CFU)/cc. When more than one type of bacteria was identified, a separate colony count was performed for each. Microbial identification and antimicrobial susceptibility testing were performed using the automated Vitek® 2 Advanced Expert System (bioMérieux, Marcy-l'Étoile, France).
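The arithmetic linking a plate count to the quantitative culture threshold can be sketched as follows. This is a minimal illustration, not the laboratory's actual procedure: it assumes the reported 1:10 dilution, a plated volume of 0.01 cc (10 µL), and a threshold of 10^5 CFU/cc, and the function names are hypothetical.

```python
# Hypothetical helper names; the dilution, plated volume and threshold follow
# the values reported in the text (1:10 dilution, 0.01 cc = 10 uL, 10^5 CFU/cc).
THRESHOLD_CFU_PER_CC = 100_000


def cfu_per_cc(colonies: int, dilution_factor: int = 10, plated_volume_ul: int = 10) -> float:
    """Back-calculate CFU per cc of the original aspirate from a plate count.

    1 cc = 1000 uL, so the count is scaled by the dilution factor and by
    1000 / plated volume (in uL).
    """
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("counts, dilution factor and volume must be positive")
    return colonies * dilution_factor * 1000 / plated_volume_ul


def meets_threshold(colonies: int) -> bool:
    """True when the back-calculated concentration reaches the cut-off."""
    return cfu_per_cc(colonies) >= THRESHOLD_CFU_PER_CC


if __name__ == "__main__":
    for count in (25, 99, 100, 250):
        print(count, cfu_per_cc(count), meets_threshold(count))
```

Under these assumptions each colony on the plate represents 1,000 CFU/cc of the original sample, so a plate count of 100 colonies corresponds to the cut-off.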

The criteria for sample rejection were: (1) improperly labeled specimens, (2) specimens with transport times exceeding study standards, (3) clotted specimens, (4) specimens not submitted in an appropriate transport container, (5) insufficient volume, or (6) external contamination. If an unacceptable specimen was received, the treatment team was notified, and another specimen was requested.

Data collection

Screening, data collection and reporting were undertaken by a trained, dedicated full-time nurse. The data collection tool was a two-part checklist comprising demographic variables and clinical and microbiological variables. The tool was developed during two 90-min meetings by a consensus multidisciplinary panel consisting of 17 physicians representing critical care (n = 5), anesthesia (n = 3), pulmonology (n = 5), internal medicine (n = 3), and forensic medicine (n = 1), and 10 critical care nurses. Quantitative face validity was determined using the Impact Score (2.5–4.5), and quantitative content validity was determined by the 27 panelists. The measured content validity ratio and content validity index were 0.51 and 0.89, respectively. The internal consistency of the questionnaire, as measured by Cronbach's alpha coefficient, was 0.91.

Statistics

Statistical analyses were performed using IBM® SPSS version 22.0 (IBM Corp, Armonk, USA). Data were summarized using mean ± standard deviation (SD) for quantitative variables and frequency (%) for qualitative variables. Study size was determined by an a priori sample size calculation. Considering a VAP prevalence of 0.5, a 95% confidence level, 80% power, and an absolute error of 10%, the necessary sample size was calculated to be 85 patients.

Normally distributed variables were compared using Student’s t-test. Categorical variables were compared using the Chi-square (χ²) test or Fisher's exact test when appropriate. Trends in the distribution of relative frequencies across ordinal data were compared using the χ² test for trend. The Youden index (or Youden’s J statistic) was calculated as: J = sensitivity + specificity − 1.
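As an illustration of the Youden index, the following minimal sketch computes sensitivity, specificity and J from a 2 × 2 table of an index algorithm against the reference standard. The class and function names are hypothetical, and the counts in the usage example are invented for illustration; they are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2 x 2 table of an index test against the reference standard."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of reference-positive cases the index test detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of reference-negative cases the index test rules out."""
    return c.tn / (c.tn + c.fp)


def youden_j(c: ConfusionCounts) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity(c) + specificity(c) - 1.0


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = ConfusionCounts(tp=30, fp=5, fn=10, tn=45)
    print(f"sensitivity={sensitivity(example):.2f}")
    print(f"specificity={specificity(example):.2f}")
    print(f"J={youden_j(example):.2f}")
```

J ranges from −1 to 1; a value of 0 indicates a test that performs no better than chance, and a value of 1 indicates perfect sensitivity and specificity.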

Results

One hundred twenty-nine patients were screened, and 85 were included in the final analysis (Fig. 1). The mean age was 46.94 ± 18.90 years with a male predominance (72.9%). Measures of illness severity and hospital course metrics are listed in Table 3. A positive tracheal aspirate culture (TAC) was seen in 81.2%, with cultures yielding Acinetobacter (37.6%), Staphylococcus aureus (22.4%), Escherichia coli (14.1%), Pseudomonas (10.6%), Klebsiella (10.6%), and Proteus (3.5%). Multidrug-resistant (MDR) organisms were identified in 36.5% of isolates. The sensitivity and specificity of the tested algorithms are presented in Table 4. Of note, the sensitivity for positive TAC with a serum procalcitonin level > 0.5 ng/ml was 51.8%, lower than that of each of the algorithms assessed. The highest Youden index, a measure of diagnostic accuracy, was seen with CPIS (Table 4).

Table 3 Patient demographic and clinical information
Table 4 Sensitivity, specificity, and Youden index for assessed methods of ventilator-associated pneumonia diagnosis compared to the HELICS criteria as the reference standard

The Kappa agreement coefficients between each diagnostic algorithm and either serum procalcitonin level or positive TAC are shown in Table 5. The greatest correlation between positive VAP assessment and serum procalcitonin levels > 0.5 ng/ml was observed with the Johanson method and CPIS (both roughly 70%).

Table 5 Correlation of serum procalcitonin and tracheal aspirate results with ventilator-associated pneumonia diagnostic algorithms

As stated previously, CPIS correlated most closely with the HELICS standard. However, when comparing the three tested algorithms, CPIS displayed near-perfect agreement with the much simpler, historical Johanson criteria, whereas CDC/NHSN showed only slight agreement with either of the other algorithms (Table 6). Moreover, CPIS correlated most closely with traditional clinical markers for pneumonia (Table 7).
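For readers less familiar with the Kappa agreement coefficient, the following minimal sketch computes Cohen's kappa for two binary classifications and maps it to a descriptive label. The labels assume the widely used Landis and Koch bands, which the descriptors used here (slight, moderate, substantial, near perfect) resemble; the text does not state which scale was applied. The function names are hypothetical and the counts are invented for illustration.

```python
def cohen_kappa(both_pos: int, a_only: int, b_only: int, both_neg: int) -> float:
    """Cohen's kappa for two binary raters (or algorithms) A and B.

    both_pos: A positive, B positive    a_only:   A positive, B negative
    b_only:   A negative, B positive    both_neg: A negative, B negative
    """
    n = both_pos + a_only + b_only + both_neg
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal totals of each rater.
    expected = (
        (both_pos + a_only) * (both_pos + b_only)
        + (b_only + both_neg) * (a_only + both_neg)
    ) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa: float) -> str:
    """Descriptive label, assuming the Landis and Koch bands."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Invented counts, for illustration only.
    k = cohen_kappa(both_pos=40, a_only=5, b_only=5, both_neg=50)
    print(f"kappa={k:.3f} ({agreement_label(k)})")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why two algorithms that label most patients the same way can still show only slight agreement when one class dominates.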

Table 6 Kappa agreement coefficient among ventilator-associated pneumonia diagnostic methods
Table 7 Correlation of individual variables with ventilator-associated pneumonia diagnostic methods

Discussion

Suspicion and clinical criteria continue to serve as the foundation for VAP diagnosis; however, the criteria used to diagnose VAP vary widely, impacting reports of incidence and outcomes. Historically, VAP diagnosis has been based on 2 or 3 components: (1) systemic signs of infection, (2) new or worsening infiltrates seen on chest imaging, and (3) microbiologic evidence of pulmonary parenchymal infection when available [40]. However, the false positive rate is high for clinical symptoms (e.g. fever [42%]), purulent airway secretions (67%), and chest roentgenograms [41, 42]. Moreover, combining these criteria does little to improve diagnostic performance [43], and the use of histopathology and microbiology alone carries considerable limitations [21,22,23,24, 40].

Numerous diagnostic algorithms have been proposed to standardize the diagnosis, allow for easier identification, and improve inter-study comparability. Patient characteristics in our cohort were largely similar to those of other published cohorts, including age [9, 17, 44,45,46,47,48], male gender predominance [9, 45, 49,50,51,52,53], APACHE II score [45,46,47,48, 51, 52, 54], MV duration [49, 52, 54,55,56], re-intubation rates [9, 52, 57], ICU LOS [47,48,49,50, 52, 53, 55], and hospital LOS [47, 50, 52, 55]. In particular, the ICU LOS and mortality were similar to other published VAP cohorts in Iran [53, 58, 59]. Moreover, the array of cultured and MDR pathogens was consistent with prior studies [51].

A direct comparison of the correlation and diagnostic performance of the VAP algorithms is important for individual patient care, epidemiology, cross-study comparisons, and meta-analyses. If algorithms have suboptimal sensitivity or specificity, or do not correlate well, subsequent meta-analyses and epidemiologic investigations will be flawed from inception. Direct comparisons of the performance characteristics of the CDC/NHSN, CPIS, HELICS, and the historical Johanson criteria have not previously been reported. Moreover, only two studies were identified that compared VAP diagnostic algorithms [31, 60]. HELICS was chosen as the reference standard due to its wide international and regional use (Europe, Australia, Near and Middle East [including Iran]), and because it has been used as the reference standard for numerous other studies [2, 35,36,37,38, 61]. CDC/NHSN and CPIS criteria were chosen as the other two most widely recognized and used criteria (especially in North America). The Johanson criteria were selected as the third comparator for their historical significance. The sensitivity of the CPIS and Johanson methods was moderate, whereas that of CDC/NHSN was poor. Moreover, the diagnostic agreement was substantial for CPIS, moderate for Johanson, and only slight for CDC/NHSN (Table 5). Algorithm accuracy was improved by adding serum procalcitonin > 0.5 ng/ml; however, similar to prior reports, the addition of microbiological data to the clinical definitions did not significantly improve the sensitivity or specificity [40].

These findings suggest that combining cohorts based on HELICS and CPIS may be reasonable for meta-analysis or population studies, but the same may not be true for studies based on CDC/NHSN criteria as the diagnostic agreement is poor. Moreover, it is recommended that studies report serum procalcitonin values, both because procalcitonin may correlate with mortality [62] and because doing so keeps data sets useful for future meta-analyses as diagnostic algorithms evolve. Lastly, these data highlight how little progress these complicated VAP diagnostic algorithms have made beyond the historical and simple Johanson criteria. These algorithms will almost certainly undergo modification, and it is important that investigators clearly define their patient populations and present their data in a way that can inform future decisions as diagnostic techniques evolve.

Limitations

The non-randomized methodology and the absence of histopathologic confirmation of VAP diagnosis are limitations of this study. This study was performed in a resource-limited setting in a low-to-middle income country (LMIC), and limiting the study cohort to those with ante- or post-mortem histology would have introduced selection bias and served as a barrier to subject recruitment.

The use of TAC specimens is a minor limitation, as positive quantitative TACs have been reported to have a high degree of correlation with broncho-alveolar lavage in VAP patients and are a useful minimally invasive diagnostic tool [63,64,65].

Finally, the serum procalcitonin values were not significantly elevated in the VAP vs. no-VAP group. Procalcitonin is not specific to infection location (i.e. VAP). It may rise with bacterial infections in other locations as well. The no-VAP group did not equate to “no infection anywhere.” Indeed, infections are common in ICU patients, ranging from catheter-associated urinary tract infections and other device infections to soft-tissue infections or even peritonitis from a perforated viscus. Some patients in the no-VAP group had non-pulmonary infections with elevated procalcitonin values that raised the mean. It would not be appropriate to remove these patients from the analysis for the following reasons: (1) it would reduce the real-world applicability of the data, and (2) the study would fall below the necessary sample size. It is also worth noting that procalcitonin values were not a study endpoint and the study was not powered for this purpose.

Conclusion

Ventilator-associated pneumonia remains a considerable source of morbidity and mortality in modern ICUs. The optimal diagnostic method remains unclear. Using HELICS criteria as the reference standard, CPIS displayed substantial diagnostic agreement whereas CDC/NHSN and Johanson criteria displayed slight and moderate agreement respectively. Accuracy was improved with the addition of serum procalcitonin > 0.5 ng/ml, but not positive quantitative endotracheal aspirate culture. These findings suggest that combining cohorts based on HELICS and CPIS may be reasonable for meta-analysis or population studies, but the same may not be true for studies based on CDC/NHSN criteria.