Background

Neonatal respiratory distress syndrome (NRDS) is a condition in which newborns develop respiratory distress shortly after birth, primarily because a deficiency of pulmonary surfactant, which is produced by type II alveolar cells, leads to progressive alveolar collapse [1]. Traditionally, NRDS is diagnosed on the basis of clinical manifestations and radiographic examinations [2]. Lung ultrasonography was previously considered unsuitable for diagnosing lung disease [3, 4]. However, recent studies have shown that lung ultrasound has good diagnostic sensitivity and specificity for various lung diseases in neonates and children [5,6,7]. Accurate ultrasound diagnosis of pulmonary diseases nevertheless requires systematic operator training in both image acquisition and interpretation. Moreover, it often relies on the operator's subjective judgment, and an incorrect judgment may delay diagnosis and treatment. Therefore, establishing a more objective and reliable diagnostic method for NRDS is important for clinical ultrasonography.

Radiomics is a novel, non-invasive technique that extracts large numbers of quantitative features from medical images, including features that are difficult to discern by eye, and represents image characteristics such as grayscale, texture, and morphology quantitatively [8]. Machine learning algorithms can then analyze these data, build predictive models, reduce subjective judgment, and provide objective quantitative predictions to support physicians' decisions [9, 10]. In an analysis of lung ultrasound images from patients with coronavirus disease 2019 (COVID-19), researchers [11] found that a support vector machine (SVM) model assessed the severity of pleural line changes with better accuracy, which is important for accurately evaluating disease severity. A recent study [12] showed that an ensemble of seven machine learning models built on preoperative 2-deoxy-2-[fluorine-18]fluoro-D-glucose ([18F]FDG) positron emission tomography/computed tomography (PET/CT) radiomic features achieved the highest diagnostic efficacy and good stability in predicting the pathological aggressiveness of lung cancer. These studies demonstrate the applicability of radiomics to lung diseases.

Radiomics based on lung ultrasound images has rarely been applied to pulmonary diseases. This study aimed to investigate the diagnostic efficacy of lung ultrasound radiomics combined with machine learning for NRDS.

Methods

Patients and data collection

A total of 150 inpatients who underwent lung ultrasound examination in the neonatal intensive care unit of Quanzhou Maternity and Children's Hospital between September 2021 and November 2022 were included in this study. The patients were divided into a training cohort (n = 120) and a validation cohort (n = 30) at an 8:2 ratio according to the time of admission. NRDS was diagnosed according to the European Consensus Guidelines on the Management of Respiratory Distress Syndrome: 2022 Update, using the following criteria: (1) clinical manifestations, including shallow breathing, dyspnea, and expiratory grunting appearing immediately after birth or within 4–6 h and gradually worsening, accompanied by a grayish-blue complexion, the three-concave sign (suprasternal, supraclavicular, and intercostal retractions), nasal flaring, and progressive cyanosis that does not improve with oxygen; in addition, breath sounds are diminished and crackles can be heard at the end of deep inspiration; (2) blood gas analysis showing hypoxemia, hypercapnia, metabolic acidosis, and respiratory acidosis; and (3) chest X-ray findings of diffusely decreased lucency of both lungs, with ground-glass opacities in mild cases, air bronchograms as the disease progresses, and "white lung" in severe cases.
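The time-based 8:2 split can be sketched as follows. This is a minimal illustration, assuming that the earliest-admitted 80% of patients formed the training cohort and the most recently admitted 20% the validation cohort; the record layout, the `split_by_admission` helper, and the admission dates are hypothetical.

```python
from datetime import date, timedelta

def split_by_admission(patients, train_ratio=0.8):
    """Split (patient_id, admission_date) records chronologically.

    Assumption: the earliest admissions form the training cohort and the
    latest admissions form the validation cohort.
    """
    ordered = sorted(patients, key=lambda record: record[1])
    n_train = round(len(ordered) * train_ratio)
    return ordered[:n_train], ordered[n_train:]

# Hypothetical records: 150 patients admitted every other day, listed out of order.
start = date(2021, 9, 1)
patients = [(f"P{i:03d}", start + timedelta(days=2 * i)) for i in reversed(range(150))]
training, validation = split_by_admission(patients)
print(len(training), len(validation))  # 120 30
```

Sorting by admission date before cutting makes the split independent of the order in which records were exported.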

The exclusion criteria were as follows: (1) congenital developmental abnormalities, such as congenital pulmonary dysplasia, thoracic malformations, choanal atresia, congenital diaphragmatic hernia, and severe congenital heart disease; (2) conditions that restrict lung ventilation, including severe pneumothorax and severe abdominal distension; (3) prophylactic administration of pulmonary surfactant after birth; and (4) congenital pulmonary cystic adenomatoid malformation diagnosed during the fetal or postnatal period. The inclusion and exclusion process is shown in Fig. 1.

Fig. 1

The flowchart outlining the inclusion and exclusion criteria for study subjects. NRDS = neonatal respiratory distress syndrome, MAS = meconium aspiration syndrome, TTN = transient tachypnea of the newborn

Ultrasound image acquisition

A GE LOGIQ P6 color Doppler ultrasound system with a 9–12 MHz linear-array probe was used. A single focal point was placed at the level of the pleural line, harmonic imaging was turned off, and the scanning depth was set to 3–4 cm. Two physicians with extensive experience in lung ultrasound diagnosis adjusted and optimized the image settings to capture the best possible images. Two ultrasound images were obtained for each patient and saved in Digital Imaging and Communications in Medicine (DICOM) format.

Lesion segmentation and radiomic feature extraction

The radiomics workflow comprises region of interest (ROI) segmentation, feature extraction, feature selection, and model construction. All ultrasound images of patients meeting the inclusion criteria were acquired with the ultrasound system described above. Two senior physicians specializing in neonatal lung ultrasound manually delineated the ROI of the lesion area using ITK-SNAP 3.8.0 software (http://www.itksnap.org). When pleural effusion was present in an image, it was excluded from the ROI to avoid interference.
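Radiomic features are computed only over pixels inside the delineated ROI, which is why excluding pleural effusion from the mask matters. The sketch below is a simplified illustration of four first-order statistics (mean, variance, energy, entropy) over a binary mask. The helper `first_order_features` and the toy image are hypothetical; the bin width of 25 matches PyRadiomics' default `binWidth`, but this is not PyRadiomics' implementation (no resampling, intensity shifting, or normalization).

```python
import math
from collections import Counter

def first_order_features(image, mask, bin_width=25):
    """Simplified first-order statistics over pixels where mask == 1 (illustrative only)."""
    pixels = [value for img_row, mask_row in zip(image, mask)
              for value, inside in zip(img_row, mask_row) if inside]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((v - mean) ** 2 for v in pixels) / n
    energy = sum(v * v for v in pixels)
    # Discretize grey levels into fixed-width bins before computing entropy.
    counts = Counter(math.floor(v / bin_width) for v in pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}

# Hypothetical 2x2 "image"; the bottom-right pixel (e.g. effusion) lies outside the ROI.
image = [[10, 20], [30, 200]]
mask = [[1, 1], [1, 0]]
print(first_order_features(image, mask))
```

Because the excluded pixel never enters `pixels`, the very bright value 200 has no effect on any of the statistics.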

Feature selection and radiomics model construction

The PyRadiomics package in Python was used to extract radiomic features from the ROIs. A total of 107 initial features were extracted, and 83 features were retained after applying a significance threshold of P < 0.05. For any two features with a Spearman rank correlation coefficient greater than 0.9, only one was kept to eliminate highly redundant features. A greedy recursive strategy was then used to filter out irrelevant features, and the remaining features were entered into a least absolute shrinkage and selection operator (LASSO) regression model. The optimal λ was determined by ten-fold cross-validation, and features with nonzero coefficients were retained for prediction model construction (Fig. 2).
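The correlation-based redundancy step can be illustrated with a small sketch. It assumes that the absolute value |ρ| is compared with the 0.9 threshold and that, within each highly correlated pair, the feature encountered first is kept (the text does not state which one was retained). The helpers `rank`, `spearman`, and `drop_redundant` and the toy feature values are hypothetical.

```python
def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # constant features would divide by zero; drop them first

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |rho| with every already-kept feature is <= threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(spearman(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)

# Hypothetical features: "b" is a monotone copy of "a", "c" is only moderately correlated.
features = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 3, 4, 1, 2]}
print(drop_redundant(features))  # ['a', 'c']
```

Because the filter is greedy, which member of a correlated pair survives depends on feature order; ranking features by univariate P value first is one common way to make that choice deliberate.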

Fig. 2

Workflow of radiomics model construction. Segmentation: regions of interest (ROI) delineated on lung ultrasound images of patients with neonatal respiratory distress syndrome (NRDS), neonatal pneumonia, meconium aspiration syndrome (MAS), and transient tachypnea of the newborn (TTN). Feature extraction: the extracted features and the proportion of each feature class. Feature selection: the P-value distribution of the extracted features, cluster analysis, the coefficient profiles of least absolute shrinkage and selection operator (LASSO) regression, and the ten-fold cross-validation used to obtain the optimal λ. Model construction: the features with nonzero coefficients retained after feature selection, receiver operating characteristic (ROC) curves for the training and validation sets, and decision curve analysis (DCA) for the different models

Statistical analysis

All statistical analyses were performed using SPSS software (version 26.0). Normally distributed measurements are presented as mean ± standard deviation and were compared between two groups using the independent-samples t-test. Non-normally distributed measurements are presented as median and interquartile range, M (P25, P75), and were compared between groups using the Mann-Whitney U test. Categorical data are presented as counts and percentages (%), and differences between groups were assessed using the chi-square test, the continuity-corrected chi-square test, or Fisher's exact test. Differences in the performance of the models in internal validation were assessed using the McNemar test. Calibration curves were constructed for each model to assess calibration. P < 0.05 was considered statistically significant.
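For two models evaluated on the same cases, the McNemar test uses only the discordant cases: those classified correctly by one model but not the other. A minimal sketch, assuming the continuity-corrected chi-square form with one degree of freedom (statistical software may switch to an exact binomial version when discordant counts are small); the `mcnemar` helper and the toy predictions are hypothetical.

```python
import math

def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same cases."""
    a_only = sum(pa == t and pb != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    b_only = sum(pb == t and pa != t for t, pa, pb in zip(y_true, pred_a, pred_b))
    discordant = a_only + b_only
    if discordant == 0:
        return 0.0, 1.0  # the models agree on every case
    statistic = max(abs(a_only - b_only) - 1, 0) ** 2 / discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical: model A alone is correct on 10 cases, model B alone on 2.
y = [1] * 12
pred_a = [1] * 10 + [0] * 2
pred_b = [0] * 10 + [1] * 2
print(mcnemar(y, pred_a, pred_b))
```

Cases on which both models are right or both are wrong do not contribute, which is why the test suits paired comparisons of sensitivity or specificity.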

Results

Clinical characteristics of patients

The baseline clinical characteristics of the children in the training and validation cohorts were comparable (Table 1).

Table 1 Baseline characteristics of patients in the training and test cohorts

Feature extraction and selection for ultrasound images

A total of 107 initial radiomic features were extracted. After feature screening with the LASSO algorithm, 22 features with nonzero coefficients were retained to build the prediction model (Fig. 3).
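How LASSO drives some coefficients exactly to zero can be seen in a small coordinate-descent sketch. It minimizes (1/2n)·||y − Xb||² + λ·||b||₁ on standardized features, the same objective scikit-learn's `Lasso` uses. Assumptions: a squared-error LASSO on a 0/1 NRDS label (a logistic LASSO is an equally common choice, and the text does not say which was used), a fixed λ, and hypothetical toy data and helper names. In practice λ would be chosen by ten-fold cross-validation, for example with scikit-learn's `LassoCV(cv=10)`.

```python
def standardize(column):
    """Center and scale to unit population variance (constant columns must be removed first)."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small effects are set exactly to zero

def lasso(columns, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized X."""
    X = [standardize(col) for col in columns]
    n = len(y)
    y_mean = sum(y) / n
    r = [v - y_mean for v in y]  # residuals while every coefficient is zero
    b = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(x * ri for x, ri in zip(xj, r)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            if new_bj != b[j]:
                r = [ri - x * (new_bj - b[j]) for x, ri in zip(xj, r)]
                b[j] = new_bj
    return b

columns = [[1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 2, 1]]  # hypothetical feature values
y = [0, 0, 0, 1, 1, 1]  # hypothetical labels, 1 = NRDS
coef = lasso(columns, y, lam=0.3)
selected = [j for j, c in enumerate(coef) if c != 0]
print(coef, selected)
```

Only the features left in `selected` would be carried forward, mirroring how the features with nonzero coefficients were passed to the classifiers.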

Fig. 3

Histogram of the coefficients of the selected features. The 22 features with nonzero coefficients were retained, and the radiomics signature was built from their coefficient values

Comparison of diagnostic effectiveness of different models

The 22 features with nonzero coefficients were used to build five models: random forest (RF), support vector machine (SVM), multilayer perceptron (MLP), k-nearest neighbor (KNN), and logistic regression (LR). All models showed high diagnostic efficacy, and pairwise comparisons of sensitivity, specificity, positive predictive value, and negative predictive value revealed no statistically significant differences between the models (P > 0.05). In the training cohort, the RF and SVM models had higher Youden indices, whereas the KNN and LR models had lower ones (Table 2). In the validation cohort, the RF and SVM models again had higher Youden indices, whereas the KNN and MLP models had lower ones. Overall, the RF model exhibited the highest diagnostic efficacy (Table 3).
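The metrics compared across models all derive from a 2×2 confusion matrix, with the Youden index defined as sensitivity + specificity − 1. A minimal sketch using a hypothetical `diagnostic_metrics` helper and hypothetical labels (1 = NRDS, 0 = other lung disease):

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and Youden index for binary labels (1 = NRDS)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "youden": sensitivity + specificity - 1,
    }

# Hypothetical cohort: 10 NRDS and 20 non-NRDS cases.
y_true = [1] * 10 + [0] * 20
y_pred = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 18
print(diagnostic_metrics(y_true, y_pred))
```

Unlike PPV and NPV, the Youden index does not depend on disease prevalence, which makes it a convenient single summary for comparing models across cohorts.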

Table 2 Comparison of the diagnostic performance of different models in the training cohort
Table 3 Comparison of the diagnostic performance of different models in the validation cohort

Calibration curve comparison

The Hosmer–Lemeshow test indicated poor calibration for the KNN model (P = 0.004), whereas the RF (P = 0.982), MLP (P = 0.599), SVM (P = 0.462), and LR (P = 0.340) models were well calibrated (Fig. 4).
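The Hosmer–Lemeshow statistic compares observed and expected event counts within groups of sorted predicted risk and refers their sum to a chi-square distribution with g − 2 degrees of freedom. The sketch below assumes ten equal-size groups (deciles of risk) and uses a closed-form chi-square tail that is valid only for even degrees of freedom; the function names and the perfectly calibrated toy data are hypothetical, and grouping rules differ between software packages.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic and P value with equal-size groups of sorted risk."""
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)  # expected events in this group
        observed = sum(t for _, t in chunk)  # observed events in this group
        size = len(chunk)
        # Event and non-event cells; probabilities of exactly 0 or 1 would divide by zero.
        statistic += (observed - expected) ** 2 / expected
        statistic += (observed - expected) ** 2 / (size - expected)
    return statistic, chi2_sf_even_df(statistic, groups - 2)

# Hypothetical, perfectly calibrated data: in each group of 20, 20 * p events occur.
probs, labels = [], []
for k in range(10):
    p = (2 * k + 1) / 20
    events = 2 * k + 1
    probs += [p] * 20
    labels += [1] * events + [0] * (20 - events)
print(hosmer_lemeshow(labels, probs))
```

A large statistic (small P value), as reported for the KNN model, means observed event rates depart systematically from the predicted risks in at least some groups.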

Fig. 4

Calibration curves of the different models. (a–e) RF, MLP, SVM, LR, and KNN models, respectively. The x-axis represents the mean predicted probability, and the y-axis represents the observed proportion of positive cases. The diagonal dashed line represents perfect calibration; the closer a model's calibration curve lies to this line, the better its calibration. A Hosmer–Lemeshow test P value > 0.05 indicates good model fit

Comparison of decision curves for different models

Decision curve analysis showed that the models provided a net clinical benefit across most threshold probabilities (Fig. 5).
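At a threshold probability p_t, net benefit is TP/n − (FP/n)·p_t/(1 − p_t); a model is useful where its curve lies above both the treat-all strategy and the treat-none strategy (net benefit 0). A minimal sketch, assuming cases are called positive when predicted risk ≥ p_t; the helper names and data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating cases whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(t == 1 and p >= threshold for t, p in zip(y_true, y_prob))
    fp = sum(t == 0 and p >= threshold for t, p in zip(y_true, y_prob))
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every case as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical labels (1 = NRDS) and predicted probabilities.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5, 0.8):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

The weight p_t/(1 − p_t) encodes how many false positives a clinician would accept per true positive at that threshold.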

Fig. 5

Decision curves of the different models. (a–e) RF, SVM, LR, KNN, and MLP models, respectively. The horizontal axis represents the threshold probability, and the vertical axis represents the net benefit. A larger red shaded area indicates a wider range of thresholds over which the model provides benefit and thus greater clinical value

Diagnostic accuracy of the RF model and of physicians with different levels of expertise on randomly selected lung ultrasound images

The RF model and the senior physicians (more than 5 years of dedicated experience in neonatal lung ultrasound) did not differ significantly in AUC, sensitivity, specificity, positive predictive value, or negative predictive value, and both performed significantly better than the junior physicians (3–5 years of experience in neonatal lung ultrasound) (Table 4). In addition, the subjective diagnoses of physicians at different levels were inconsistent, and mild NRDS and transient tachypnea of the newborn (TTN) were frequently misdiagnosed.
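AUC can be computed for a model's predicted probabilities and for a physician's binary diagnoses alike through its Mann–Whitney interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half. For binary calls this reduces to (sensitivity + specificity)/2. A minimal sketch with a hypothetical `auc` helper and hypothetical data:

```python
def auc(y_true, scores):
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cases: model probabilities vs. a physician's binary diagnosis (1 = NRDS).
y_true = [1, 1, 0, 0]
model_scores = [0.8, 0.4, 0.3, 0.5]
physician_calls = [1, 0, 0, 0]
print(auc(y_true, model_scores), auc(y_true, physician_calls))
```

Here the physician's sensitivity is 0.5 and specificity is 1.0, so the binary AUC equals their mean.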

Table 4 Comparison of diagnostic efficacy between physicians of different levels and the RF model

Discussion

Radiomics, a branch of artificial intelligence, has recently gained increasing attention in clinical medicine because it can extract meaningful feature data from images such as ultrasound scans and represent image features such as grayscale, texture, and morphology quantitatively. Combined with machine learning algorithms, these data can be analyzed objectively, minimizing subjective judgment and providing physicians with quantitative information to inform their decisions [9, 13, 14]. Although research on machine learning for lung ultrasound remains scarce both in China and internationally, Baloescu et al. [15] used convolutional neural networks to develop an automated B-line detection model for lung ultrasound that was effective in assessing the severity of alveolar-interstitial syndrome. Another recent study [16] built a deep learning model that distinguished seven key lung ultrasound features in neonates with good average accuracy; however, this model cannot distinguish NRDS from other lung diseases. Research on lung ultrasound radiomics for differentiating NRDS from other neonatal lung diseases remains limited.

In this study, lung ultrasound images of 60 children with NRDS and 90 children with other lung diseases were analyzed using ultrasound radiomics. The models performed well in both the training and validation cohorts. In the training cohort, no statistically significant differences in sensitivity, specificity, positive predictive value, or negative predictive value were observed between the models. In the validation cohort, the Youden indices were higher for the RF and SVM models and lower for the KNN and MLP models. The KNN model was poorly calibrated, whereas the other models were well calibrated, with the RF model calibrated best and the MLP and SVM models second best. When its diagnostic efficacy on randomly selected lung ultrasound images was compared with that of physicians at different levels, the RF model performed comparably to senior physicians, with no statistically significant differences in sensitivity, specificity, positive predictive value, or negative predictive value. In contrast, it differed significantly from junior physicians: its Youden index was slightly lower than that of senior physicians but significantly higher than that of junior physicians. These results suggest that the RF model has good clinical application value. Overall, the RF model performed consistently well and was the best-performing model for radiomics-assisted diagnosis of NRDS in this study.

Notably, the RF model achieves high accuracy and strong generalization because it is an ensemble of decision trees, each built from a random sample of cases and a random subset of features. RF models have been widely applied in various settings. For instance, Ren et al. [17] showed that RF-based screening of endometriosis-related genes and the resulting predictive model performed well clinically. Moreover, studies by Kwak et al. [18] and C. Venkata Narasimhulu [19] showed that RF models outperformed experienced sonographers in differentiating benign from malignant thyroid nodules and in classifying benign and malignant renal tumors after image denoising. These studies indicate that the RF model is a useful tool for clinical practice and medical image classification, which is consistent with our selection of the RF model for differentiating NRDS from non-NRDS lung diseases on ultrasound.
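The two sources of randomness in a random forest, bootstrap sampling of cases and random selection of candidate features, can be illustrated with a deliberately tiny forest whose trees are single-split decision stumps. This is a teaching sketch with hypothetical data and helper names (`fit_stump`, `fit_forest`, `predict`), not the implementation used in the study; practical work would use a library such as scikit-learn's `RandomForestClassifier` with deeper trees.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature: (errors, feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(label != majority for label in y), feature, float("inf"), majority, majority)
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Each tree sees a bootstrap sample of cases and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(n_features), max_features)  # random feature subset
        best = min((fit_stump(Xb, yb, f) for f in candidates), key=lambda s: s[0])
        forest.append(best[1:])  # (feature, threshold, left_label, right_label)
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data in which both features separate the classes.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [0.05, 0.1]), predict(forest, [0.95, 0.9]))
```

Averaging many trees that each see slightly different data and features reduces variance, which is the usual explanation for the generalization ability attributed to RF models.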

This study had several limitations. First, images were acquired with a single ultrasound system, and whether the findings can be reproduced with other systems requires further investigation. Second, this was a single-center study; multicenter studies are needed to verify whether the diagnostic efficacy of the RF model is consistent across settings. Finally, the sample size was small; a larger sample would help verify the baseline clinical characteristics of the children [20] and the stability of the model. Further research is needed to address these issues and improve the robustness of our findings.

Conclusions

In conclusion, radiomics analysis of lung ultrasound images using the RF model achieved better diagnostic efficacy than the other models, with consistently high performance in both the training and validation cohorts and in calibration curve analysis. These findings suggest that the RF model is a promising approach for diagnosing neonatal respiratory distress syndrome with lung ultrasound.