Introduction

Pediatric acute respiratory distress syndrome (PARDS) is a rapid-onset and potentially life-threatening lung injury [1]. Per the Pediatric Acute Lung Injury Consensus Conference (PALICC), the diagnosis of PARDS requires an assessment of the disease's etiology and onset, a measure of oxygenation defect, and acute pulmonary parenchymal infiltrates on chest radiography [2]. The PALICC chest radiograph criteria for PARDS differ from the pre-existing Berlin Acute Respiratory Distress Syndrome (ARDS) criteria in allowing for the presence of unilateral disease. The PALICC authors justified this distinction based on three factors: the lack of evidence on the sensitivity of chest radiography for detecting pulmonary parenchymal infiltrates, the heterogeneity present within the diagnosis, and the question of whether bilateral infiltrates impart additional risk [2]. Nevertheless, interpreting clinicians and researchers have notably poor agreement when evaluating whether a chest radiograph meets ARDS or PARDS criteria [3,4,5].

ARDS is underrecognized in both adults and children [6, 7]. Disease onset and measures of oxygenation defect are readily obtainable from electronic medical records; however, clinician screening of chest radiographs is time-consuming and highly variable. Computer-aided detection of findings consistent with PARDS on chest radiographs therefore has the potential to improve both the clinical care and the research of pediatric acute respiratory failure. At the bedside, adherence to lung-protective principles in PARDS is associated with improved patient outcomes, and recognition of the disease is a key step in applying consensus guidelines [8,9,10]. Automated screening for inclusion in PARDS research studies could also remove a barrier to enrolling more patients and research sites in multicenter studies and minimize center-level variation in patient inclusion.

A deep convolutional neural network (CNN) developed to identify findings of ARDS on chest radiographs was trained using 600,000 chest radiographs from adult patients and demonstrated excellent performance in an external cohort [11, 12]. However, when this model incorrectly assigned a high probability of ARDS to radiographs that clinicians had not labeled as ARDS, unilateral airspace disease was observed on some of those radiographs. Because unilateral disease is included in the definition of PARDS, it is possible that this CNN could be applied to children to detect PARDS. In this study, we sought to test the performance of this model, trained and validated on adult patients, in a cohort of children.

Materials and methods

This study was approved by the University of Michigan Institutional Review Board (HUM00129801), and all ethical principles for research involving human subjects were followed. Consent for study inclusion was waived. Chest radiographs from eligible patients were analyzed from 3 study weeks coinciding with the Pediatric Acute Respiratory Distress Syndrome Incidence and Epidemiology (PARDIE) study, conducted from May 2016 to January 2017. Patients aged 7 days to 18 years were eligible for inclusion if they were admitted to the pediatric intensive care unit and mechanically ventilated through a tracheostomy, an endotracheal tube, or a full-face non-invasive positive pressure mask. The sample was enriched with a convenience sample of 12 children who received extracorporeal membrane oxygenation (ECMO), but only radiographs obtained prior to ECMO cannulation were included in the current analysis.

All chest radiographs were assessed for the presence of infiltrates consistent with the PALICC definition of PARDS (unilateral or bilateral airspace disease) and the Berlin definition of ARDS (bilateral airspace disease only) by a pediatric intensivist (JGK) and a pediatric radiologist (MGM). Chest radiographs from the same patient were each interpreted independently. If the two readers agreed on the presence or absence of infiltrates consistent with the PALICC PARDS definition or the Berlin ARDS definition, the radiograph was classified accordingly. If there was disagreement, a third pediatric intensivist (RPB) independently reviewed the radiograph. Chest radiographs were classified as clinician-identified PARDS or ARDS if 2/2 or 2/3 reading clinicians determined that the image met the specified criteria.
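To make this adjudication rule concrete, the following is a minimal sketch in Python; the function and variable names are illustrative assumptions, not the study's actual tooling.

```python
from typing import Optional

def adjudicate(reader1: bool, reader2: bool,
               tiebreaker: Optional[bool] = None) -> bool:
    """Classify a radiograph as positive when 2/2 initial readers agree,
    or when a third reader produces 2/3 agreement after a disagreement."""
    if reader1 == reader2:
        return reader1  # 2/2 agreement decides the label directly
    if tiebreaker is None:
        raise ValueError("Disagreement requires review by a third reader")
    return tiebreaker   # the third reading forms the 2/3 majority
```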

We calculated pairwise inter-rater reliability between the initial reading clinician (JGK), the pediatric radiologist (MGM), and the CNN model using Cohen's kappa. We determined the model's ability to identify Berlin ARDS and PALICC PARDS separately by calculating the area under the receiver operating characteristic curve (AUROC) and the F1 score. To calculate the F1 score, we selected the model probability threshold at which the model's sensitivity matched the average sensitivity of the reading intensivist and radiologist, and fixed this threshold for classifying chest radiographs as positive. Based on this approach, probability thresholds of 0.195 and 0.362 were applied to the CNN output score to identify PALICC PARDS and Berlin ARDS, respectively. We used gradient-weighted class activation mapping (Grad-CAM) to qualitatively understand which areas of the chest radiograph the CNN model focused on when it correctly and incorrectly classified images. Representative images are presented as demonstrative examples of CNN focus.
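As an illustration of this evaluation approach, the following is a minimal scikit-learn sketch of the sensitivity-matched threshold selection and the AUROC, F1, and Cohen's kappa calculations; the variable names and the rule for choosing among candidate thresholds are our assumptions, not the study's code.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, f1_score, roc_auc_score, roc_curve

def sensitivity_matched_threshold(y_true, y_prob, target_sensitivity):
    """Return the probability cutoff whose sensitivity (true-positive rate)
    is closest to the clinicians' average sensitivity."""
    _, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmin(np.abs(tpr - target_sensitivity))]

def evaluate_model(y_true, y_prob, reader_labels, target_sensitivity):
    """Compute AUROC, F1 at the sensitivity-matched cutoff, and
    model-versus-reader Cohen's kappa."""
    cutoff = sensitivity_matched_threshold(y_true, y_prob, target_sensitivity)
    y_pred = (y_prob >= cutoff).astype(int)
    return {
        "threshold": float(cutoff),
        "auroc": roc_auc_score(y_true, y_prob),       # threshold-free discrimination
        "f1": f1_score(y_true, y_pred),               # at the fixed cutoff
        "kappa": cohen_kappa_score(reader_labels, y_pred),
    }
```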

Results

A total of 328 chest radiographs from 66 patients were analyzed (Table 1). Fifty-nine of 66 patients (89%) had at least one chest radiograph with findings consistent with PARDS. The number of chest radiographs per patient ranged from 1 to 22. Of the 328 chest radiographs, 84% (276) had findings consistent with PARDS and 48% (158) had findings consistent with Berlin ARDS. Twenty-eight of 66 patients (42%) were classified as having PARDS by the PARDIE investigators because they had both a consistent chest radiograph and met all other clinical criteria without exclusions.

Table 1 Characteristics of children with chest imaging findings consistent with PALICC PARDS or Berlin ARDS

When applied to the pediatric cohort, the CNN previously trained to detect ARDS in adults had an AUROC of 0.882 (95% CI 0.84–0.92) for the identification of PALICC PARDS (any airspace disease) on chest radiographs (Table 2). The same model had an AUROC of 0.84 (95% CI 0.80–0.883) for the identification of Berlin ARDS (bilateral airspace disease). Among chest radiographs with bilateral airspace disease, the CNN model generated a higher output probability score (median 0.470, IQR 0.338–0.716) than for chest radiographs with unilateral disease (median 0.276, IQR 0.194–0.393) (Fig. 1). Chest radiographs without any airspace disease had the lowest probability outputs (median 0.170, IQR 0.108–0.238).

Table 2 Model performance in the identification of PALICC PARDS or Berlin ARDS radiographic findings on chest radiographs
Fig. 1 CNN output scores stratified by the presence of pulmonary infiltrates in neither, one, or both lungs on chest radiographs

Inter-rater reliability was similar between the clinicians and the CNN model. Cohen's kappa for detecting PARDS was 0.48 (95% CI 0.372–0.585) between the two initial reading clinicians, 0.46 (95% CI 0.35–0.57) between the CNN model and the pediatric intensivist, and 0.47 (95% CI 0.36–0.57) between the CNN model and the pediatric radiologist (Table 3).

Table 3 Agreement between model and individual clinicians in the identification of PALICC PARDS or Berlin ARDS radiographic findings on chest radiographs

When reviewing Grad-CAM maps of chest radiographs both correctly and incorrectly classified by the CNN as PARDS (Fig. 2), we found that the model generally focused on areas of lung airspace disease when it correctly classified chest radiographs as PARDS (Fig. 2a). When the model incorrectly classified a chest radiograph as PARDS, it often focused on areas outside the lung fields (Fig. 2b). When clinicians identified a radiograph as PARDS but the CNN model did not, the radiograph tended to show either very subtle infiltrates or larger, homogeneous-appearing infiltrates that may have been difficult for the model to recognize as lung field (Fig. 2c).

Fig. 2 Example Grad-CAM images demonstrating areas of CNN model focus in correctly and incorrectly classified images. Each Grad-CAM image shows a heat map of where the model focuses, with green/yellow/red areas indicating the greatest focus. The CNN probability output score is shown on each image. a Chest radiograph and Grad-CAM image of a patient whom both the clinicians and the CNN model identified as having PALICC PARDS. b Chest radiograph and Grad-CAM image of a patient whom the clinicians identified as not having PARDS but the model classified as PARDS (false positive). c Chest radiograph of a patient whom the model did not identify as PARDS but the clinicians did (false negative)
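For readers unfamiliar with the technique, the following is a minimal PyTorch sketch of how a Grad-CAM heat map like those in Fig. 2 can be generated for a single-logit classifier; the model architecture and the choice of target layer are illustrative assumptions and do not reflect the published model's internals.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Generate a Grad-CAM heat map for a CNN with a single ARDS logit.
    `image` is a (C, H, W) tensor; `target_layer` is typically the last
    convolutional layer of the network."""
    activations, gradients = {}, {}

    # Capture the target layer's feature maps on the forward pass...
    fwd = target_layer.register_forward_hook(
        lambda m, i, o: activations.update(a=o.detach()))
    # ...and the gradient of the score with respect to them on the backward pass.
    bwd = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.update(g=go[0].detach()))
    try:
        model.zero_grad()
        score = model(image.unsqueeze(0)).squeeze()  # scalar ARDS logit
        score.backward()
        # Weight each feature map by its spatially averaged gradient,
        # sum across channels, and keep only positive contributions.
        weights = gradients["g"].mean(dim=(2, 3), keepdim=True)
        cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
        # Upsample to the input resolution and normalize to [0, 1].
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                            align_corners=False)
        return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).squeeze()
    finally:
        fwd.remove()
        bwd.remove()
```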

Discussion

In this analysis, we applied a CNN model trained and validated to detect findings of ARDS in adult patients to a cohort of children with respiratory failure. The model performed as well as clinicians in the identification of chest imaging findings consistent with pediatric ARDS. Despite being trained and validated in adults, the deep learning model performed better in the identification of PALICC PARDS than Berlin ARDS in a pediatric cohort.

During development, the machine learning model learned from features in adult images that were labeled as ARDS through multiple clinician evaluations. This approach means that the factors clinicians weigh when determining whether a chest radiograph is consistent with ARDS (bilateral vs. unilateral disease, presence of atelectasis, etc.) may not be the features the model weighs when assigning a probability of ARDS. We speculate on several possible reasons the model performed better in the identification of PARDS despite never being trained to identify PARDS. The first possible explanation is the misclassification of unilateral disease as ARDS that was anecdotally observed during validation of the adult ARDS model [12]. What is a misclassification issue in the adult model may be a beneficial feature in the effort to identify pediatric ARDS, which allows for unilateral disease. Second, the CNN model is a continuous function of the input image, so it is possible that the larger the infiltrate area in the image, the higher the output score. Since the overall infiltrate area is most likely larger, on average, in bilateral ARDS than in unilateral disease, the model may misclassify radiographs with large infiltrate areas confined to one lung field. Finally, in cases where the model “missed” clinician-identified PARDS, the chest imaging findings were either subtle or more homogeneous, as in Fig. 2c. It is possible that the model relies on textural image features, so “patchy” airspace disease is more readily identified.

Machine learning has great potential in a data-rich environment like the pediatric intensive care unit [13, 14]. The algorithm in this study was developed using transfer learning, a machine learning methodology in which the model was pre-trained to identify common chest radiograph findings in a large dataset before being trained to label ARDS [12, 15]. Transfer learning can support model development when only a small dataset is available, a common issue in pediatric critical care. Although the model was not trained on pediatric data, this study demonstrates the potential of directly applying adult-trained machine learning models to children or of using them to support the development of pediatric models. Transfer learning may thus allow for more rapid development and deployment of machine learning models in pediatric critical care.
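A minimal sketch of this transfer-learning pattern follows, assuming a torchvision DenseNet backbone purely for illustration; the actual architecture, pre-training corpus, and number of finding labels are described in refs [12, 15], not here.

```python
import torch.nn as nn
import torchvision.models as models

# Stage 1 (sketch): start from a backbone pre-trained on a large image corpus
# and fine-tune it to label common chest radiograph findings (multi-label head).
backbone = models.densenet121(weights="DEFAULT")
n_findings = 14  # hypothetical number of common radiographic finding labels
backbone.classifier = nn.Linear(backbone.classifier.in_features, n_findings)
# ... train on the large chest radiograph dataset ...

# Stage 2 (sketch): swap in a single ARDS logit and fine-tune on the smaller
# ARDS-labeled dataset, optionally freezing the feature extractor so a small
# dataset only updates the task-specific head.
backbone.classifier = nn.Linear(backbone.classifier.in_features, 1)
for name, param in backbone.named_parameters():
    if not name.startswith("classifier"):
        param.requires_grad = False
```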

This study has several important limitations. The study population included only children who were invasively or non-invasively mechanically ventilated, with a high incidence of PARDS, so it is possible that the model's performance would change when applied to a more general critical care population. The positive and negative predictive values presented should be interpreted in this context. However, the primary measures of model performance, including the AUROC, should be independent of the event rate. Importantly, this single-center study requires validation in a larger multicenter cohort with more clinicians interpreting chest radiographs.

While identification of features of PARDS on a chest radiograph is an important advance, there are other potential applications of this technology. The allowance for unilateral disease in the definition of PARDS opens the door to segmenting the chest radiograph into the left and right lung fields and determining disease laterality. Furthermore, we have previously shown that there are important clinical and pathophysiological differences associated with chest radiograph findings, so the ability of a machine learning model to detect findings on a chest radiograph that are associated with clinical outcomes would be meaningful for clinicians [16].

An algorithm trained and validated to detect ARDS on adult chest radiographs performs well in children for the detection of findings consistent with PARDS. The application of this model could support the clinical diagnosis of PARDS and the determination of eligibility for research enrollment. This study also highlights the potential for applying models developed in adult critical care to pediatric critical care populations.