1 Introduction

The incidence of bronchopulmonary dysplasia (BPD) is variable among centers, ranging from 12 to 32% in infants < 32 weeks of gestation at birth (Bhandari and Bhandari 2013). BPD is a disorder arising from genetic-environment interactions. As not all premature babies exposed to the same environmental risk factors (infections, hyperoxia, ventilator-induced lung injury) develop BPD, there is certainly a genetic predisposition explaining the development of BPD (Bhandari et al. 2006). While there is no specific or effective treatment for BPD, it is still important to identify those babies who are predisposed to develop BPD to target testing of preventive strategies or early therapies.

As BPD is multifactorial and involves different molecular pathways, many biomarkers have been proposed to identify babies at risk for developing this disease (Bhandari and Bhandari 2013). A biomarker is a measurable indicator of a biological state or condition. An effective or useful biomarker must be reliable and should have the capacity to be detected in the initial phase of the disease or even before the disease has developed, to allow early interventions to avoid or minimize the negative effects of the disease.

Some biomarkers have been identified in maternal, cord or peripheral blood, while others in the urine (Bhandari and Bhandari 2013). A limitation of these biomarkers is that the concentration of these molecules can be influenced by an inflammatory state in other parts of the body; hence, these are not lung specific. In fact, the currently available biomarkers are neither universally accepted nor used routinely in the clinical setting to identify those babies predisposed to develop the disease (BPD). Tracheal aspirates (TA) and/or bronchoalveolar lavage fluid (BALF) have been analysed to find some specific biomarkers of BPD, as it reflects more accurately, what is going on in the lung.

Biomarker studies can either select predefined biomarkers measured by specific methods or apply an ‘‘unbiased’’ approach involving use of indiscriminate detection platforms. Metabolomics is a promising approach to identify novel specific biomarkers for BPD; it consists of the quantitative analysis of many low molecular mass metabolites (< 2000 Da) found in a specific cell, organ or organism, involving substrates or products of a defined metabolic pathway. This analysis allows identifying changes in the composition of metabolites caused by the interaction between specific pathophysiological states, gene expression, and environment. Metabolites analyzed can be peptides, lipids, organic acids, vitamins, minerals, drugs, amino acids, nucleic acids, carbohydrates, fatty acids, hormones, drugs and any other chemical with a molecular weight < 2000 Da.

Metabolomics provides a unique fingerprint of an organism. As endogenous metabolites are linked to biochemical pathways such as glycolysis, Krebs’ cycle, lipid or amino acid metabolism and specific biochemical processes, changes in a specific metabolite pattern will reflect changes in its pathway or process.

The study of specific biochemical profiles of the organism’s response in temporal sequence of the clinical evolution of BPD can improve timing, sensitivity and specificity of diagnosis and potentially improve treatment strategies. The “omic” approach can help define the profile of babies (a set of biomarkers that can help identify a specific disease pattern, and potentially, response to treatment) with and without BPD and identify some compounds that are specific to the host response during the pathogenesis of the disease and subsequently, these can be used as biomarkers (Lal et al. 2018a). While urine (Fanos et al. 2014; Pintus et al. 2018), amniotic fluid (Baraldi et al. 2016), and cord blood (La Frano et al. 2018) have been used to detect a metabolomic signature early in the course of BPD, we are aware of only 1 recent study that has evaluated TA (Lal et al. 2018b), in a relatively small sample size of infants. We therefore performed a metabolomics study on the TA fluid of a larger number of preterm babies with gestational age (GA) less than 30 weeks, to find new biomarkers of BPD.

2 Materials and methods

We performed a study on the metabolomics analysis of the TA specimens collected in the first week of life of ventilated preterm neonates with a GA < 30 weeks. We recruited all neonates with a GA < 30 weeks hospitalized at Yale-New Haven Children’s Hospital and Cooper University neonatal intensive care units (NICUs). We excluded neonates with major malformations. TA samples from 91 patients were utilized. We collected TA samples from preterm neonates on mechanical ventilation during the first week of life (day 1, day 3, day 5 and day 7, when possible). The technique consisted of the introduction of two aliquots of 1 mL/kg normal saline at room temperature in the lower airways through the in-dwelling endotracheal tube followed by immediate removal and transport in ice (de Blic et al. 2000). The samples were stored at − 80 °C till analysis. All infants were followed till 36 weeks post-menstrual age, allowing us to divide the patients in two groups (neonates who developed BPD vs neonates who did not) according to the physiologic definition of BPD (Walsh et al. 2004). These infants had the final outcomes of having the diagnoses of with or without BPD and/or death. Collection and processing of the human TA samples was approved by the institutional review board of Yale University and Cooper University Hospital and informed, written consent was obtained from the parents of the participants of the study. Clinical characteristics collected included mode of delivery, GA at birth, birth weight (BW), sex, and development of BPD.

2.1 Extraction method

TA samples were prepared according to the modified AbsoluteIDQ p180 (Biocrates Life Sciences AG) MS-based assay kit. Briefly, 500 µL of TA was used for extraction to ensure good enrichment of targeted metabolites. Extraction was carried out utilizing a Waters Oasis HLB 1 mL cartridge. Captured compounds on the cartridge were eluted with methanol (50–100 µL) and dried down using the nitrogen evaporator. The dried pellet (TA extract) was re-suspended into 30 µL methanol just prior to the AbsoluteIDQ p180 kit preparation (Romisch-Margl et al. 2012). 10 µL of ISTD (internal standards of the p180 kit) was added to each TA extract sample, prior to assaying. The prepared samples were randomized using the List Randomizer (https://www.random.org/lists/) prior to being analyzed via both a flow injection analysis (FIA), as previously described (Goffredo et al. 2017), for 40 acylcarnitines, hexoses, 90 glycerophospholipids, and 15 sphingolipids; and a liquid chromatography (LC) method for amino acids and biogenic amines. Each sample was injected once for the LC method and twice (one in positive mode and one in negative mode) for the FIA method. The LC system used in the later part was a Perkin Elmer Flexar ultra high pressure liquid chromatography system coupled in-line to a 4000 Q-Trap LC MS/MS system. The FIA method was carried out on the same MS system. Additionally, an Agilent Technologies ZORBAX Eclipse XDB-C18 (3.0 × 100 mm, 3.5 µm pore size) column (p.n. 961967-302), coupled to an analytical Phenomenex SecurityGuard trap (C18, 4 × 3.0 mm) was utilized for the LC assay portion. Data was collected utilizing Analyst software (v1.5.2) and analyzed with Biocrates MetIDQ software (v4.6.1). Internal calibration and quality control (QC) samples of the p180 kit was utilized to benchmark the quality of the assay and robustness of the data, as well as to ensure robustness of the chromatography, mass spectrometer, extraction method, and reagents utilized in the experiment. Calculated concentrations of all the metabolites were normalized back to the predetermined protein weight of the starting TA sample.

2.2 Statistical analysis

The approach chosen for the present study was based on Partial Least Squares-Discriminant analysis (PLS-DA) (Barker and Rayens 2003), due to its versatility and ability to deal with highly correlated predictors. Indeed, the PLS-DA algorithm, by projecting the data onto a low-dimensional space of orthogonal latent variables, allows to cope with classification problems, involving a high number of intercorrelated variables (at least higher than the number of observations), for which other methods, such as linear discriminant analysis, result inapplicable (Wold et al. 1982). Once the PLS-DA model was calculated, information about the experimental variables or predictor important in the discrimination was gained by inspecting the so-called variable importance in projection (VIP) indices (Marzetti et al. 2014).

2.2.1 Validation of the PLS-DA model

For any statistical model to be reliable, the calibration stage should always be accompanied by a careful validation of its results. The choice of proper validation becomes even more relevant when the phase of model building involves the selection of the optimal value of some adjustable parameters (as it occurs in the case of PLS-DA, where the number of LVs must be set). Since the optimal values of the parameters are typically chosen as the ones that minimize error measurement in a dataset (possibly treated as unknowns by the model), this same set of data cannot be used also to evaluate the final model performance (as it would lead to over optimistic results). When many samples are available, this translates into the necessity of having three sets of data: the calibration set for model building, the internal validation set for the selection of model parameters, and the external validation set (or test set) for the final validation of model performance.

However, in cases such as the one of the present study, where the number of participants would not allow extracting three representative sets, re-sampling strategies maybe used. Specifically, the so-called double cross-validation strategy was adopted (Westerhuis et al. 2008). Double cross-validation operates by randomly extracting from the available number of samples a small subset to constitute an external validation set. The remaining samples are then divided into a certain number of cancelation groups, like in standard cross-validation, to estimate the optimal model complexity.

The optimal model on this subset of data is subsequently validated on the external validation subset. The whole procedure is repeated a suitable number of iterations to obtain a good representativeness and the final results are averaged.

To rule out any possibility of chance correlation, the average results obtained from the double cross-validation procedure were further compared with the results of permutation tests (Szymanska et al. 2012). These tests are used to obtain an empirical distribution of the classification figures of merit under the null hypothesis (i.e., under the assumption that no discrimination exists between the two classes) and are carried out by repeating the whole modeling stage on datasets for which the class labels are randomly permuted. In the present study, permutation tests involved 1000 randomizations. Two figures of merit were considered: the number of misclassifications (NMC) and the area under the receiver operating characteristic (ROC) curve (AUROC). NMC is the most intuitive of all diagnostic statistics as it simply indicates the number of samples (or participants, as in the present investigation), which are wrongly classified by the model. AUROC is a figure of merit borrowed from signal processing and is particularly useful to characterize binary classifiers. Its values range between 1 (perfect classification) and 0 (no discrimination).

As the average of the squared VIP scores equals 1, “greater than one rule” is usually used as a criterion for selection of variables (Chong and Jun 2005). This is not a statistically justified limit and it can be shown that it is very sensitive to the presence of non-relevant information pertaining to predictor variables (Tran et al. 2014).

2.2.2 Correlation analysis

Pearsons’ correlation analysis was performed through corrr package in R software (Team 2017). Correlation with r values higher than |.6| where maintained to draw networks’ edges through igraph package (Csardi and Nepusz 2006). Blue lines indicate positive correlation among metabolites, while red lines indicate negative correlations. Moreover, to assess specific metabolites that could potentially serve as a milestone within the respective network, nodes degree and betweenness centrality were estimated.

The identification of difference in metabolites abundance among groups was assigned using Wilcoxon Mann–Whitney test (corrected for false discovery rate or FDR). Mann–Whitney Wilcoxon analyses was performed in order to assess the difference between the groups for each metabolite detected.

The heatmap was produced using the mean values for each significant metabolites among groups. Moreover, to draw dendrograms both for subjects than for metabolites, euclidean distance was calculated and “ward.D2” was used for clustering.

3 Results

TA samples from 91 patients were utilized as part of on-going studies (Aghai et al. 2012, 2013; Mody et al. 2012). We excluded all the neonates that died before 28 days of life. Sixty-four neonates developed BPD whereas 27 did not develop the disease. We further eliminated 26 neonates because of low quality of the sample. We finally evaluated 68 neonates, 44 with BPD and 24 without BPD. We collected 160 TA samples: 48 were collected in the first day of life, 42 in the third day of life, 39 in the fifth day of life and 31 in the seventh day of life. TA could only be collected if an endotracheal tube was in situ in the patients; since infants were extubated over the first few days of life based on improvements in their clinical status, the number of samples collected decreased with increasing postnatal age. The distribution of the samples divided by presence of BPD is shown in eTable 1.

Demographic characteristics of the neonates recruited are shown in Table 1. Babies who developed BPD had a significant lower GA and lower birth weight. We performed a partial least squares-discriminant analysis (PLS-DA) of BPD categories versus metabolites identified in the TA. The relationship between presence of BPD and patterns of circulating biomarkers was explored by constructing and validating a predictive classification model.

Table 1 Demographic characteristics of the neonates

3.1 PLS-DA on individual data matrices

The classification model was firstly built on the data coming from the entire metabolomic dataset used to characterize the TA samples; the model was built, optimized and validated on the data collected on the 71 training samples according to a double cross-validation (DCV) procedure, where the inner resampling loop was used for model selection and the outer loop for testing (i.e., to evaluate the predictive accuracy on the left out samples that were treated as unknown).

The PLS-DA performed on all metabolites is shown in eFigure 1. The optimal model built on autoscaled data was found to include 4 significant latent variables (LVs). The PLS-DA model, in this case, did not allow a clear separation of samples as confirmed by scarce statistically significant values of diagnostic statistics; particularly, BPD patients group results appears to be split in two parts, one of which appears to be superimposed with the control subjects (eFigure 1). The validated correct classification rate was found to be 61.2% ± 2.9 for babies with BPD and 74.7% ± 8.3 for babies without BPD when considering the internal DCV loop and 56.3% ± 3.6 and 70.3% ± 6.3 for babies with and without BPD, respectively. The statistical significance of the PLS-DA model was characterized by testing the average values of NMC and AUROC with respect to the empirical distributions for their null hypotheses obtained by permutation tests. Accordingly, p values of 0.090 and 0.043 were obtained for NMC and AUROC, respectively; the latter figure of merit, thus, more strongly supporting the significance of the observed classification. Successively, to identify which were the predictors contributing the most to the classification model, the VIP scores were calculated for each metabolite (Wold et al. 1993). Indeed, the VIP scores are commonly used to identify which variables contribute the most to a PLS-DA model, as the normalization used to define them, allows one to introduce a “greater than one” criterion for significance. Accordingly, based on the VIP criterion, 53 variables belonging to different chemical classes were identified as significant for the discrimination of the two groups of subjects (Table 2). To reduce the system complexity and increase the robustness/selectivity of the models, a second PLS-DA model, including only the 53 metabolites selected according to their VIP scores, was then built and validated (Fig. 1).

Table 2 Metabolites that resulted significant for the discrimination of the group of neonates who developed BPD versus the group of neonates who did not develop BPD
Fig. 1
figure 1

The results of the PLS-DA analysis performed only on the 53 significant variables

The model built on autoscaled data was represented by five significant LVs. The validated correct classification rate was found to be 68.4% ± 3.6 for babies with BPD and 84.0% ± 5.6 for babies without BPD when considering the internal DCV loop and 62.1% ± 4.3 and 84.4% ± 4.5 for babies with and without BPD, respectively. In this case, the statistical significance of the PLS-DA model was characterized by testing the average values of NMC and AUROC with respect to the empirical distributions for their null hypotheses obtained by permutation tests. The corresponding p values were 0.008 and 0.001 for NMC and AUROC, respectively; accordingly, both figures of merit strongly support the significance of the observed classification. As already done with the full model, in order to identify the variables contributing the most, the VIP scores were calculated and 19 metabolites, belonging to acylcarnitines, amino acids, biogenic amines, lysophosphatidylcholine, phosphatidylcholine and sphingomyelin appeared to be statistically significant according to the “greater than one” criterion (Table 3).

Table 3 Metabolites that resulted highly significant for the discrimination of the group of neonates who developed BPD versus the group of neonates who did not develop BPD

3.2 Correlation analysis

Given the significant differences in the GAs between the groups, to control for the developmental differences we further analyzed our data by additional grouping, as follows: BPD from 23 to 26 weeks (group 1), BPD from 27 to 30 weeks (group 2), NO BPD from 23 to 26 weeks (group 3) and NO BPD from 27 to 30 weeks (group 4). Dividing the neonates in these subgroups allowed to obtain two groups of neonates that showed no significant difference between GA and BW (Table 1).

Therefore, to understand if there could be different biomarkers, we correlated each group with the 53 significant variables (metabolites).

Specifically, to identify potential master regulators in the metabolomics profile, we generated a co-expression network, between the 53 significant metabolites and groups, by using Pearson correlation as a distance metric (eFigures 2–5, eTable 2A). Nodes in the co-expression network represent metabolites, and the presence of an edge linking two metabolites means that the correlation between their expression metabolic profiles exceeds a given threshold in terms of absolute value. We estimated the number of strongly connected components for different correlation values and predicted that the Pearson correlation threshold of 0.6 would largely maintain network integrity.

The correlations were exclusively positive in each correlation network. In eFigure 2, Panel A the Pearson correlation distribution of all 53 metabolites in group 1, revealed a clear formation of three separated groupings. To identify which metabolites affect these connections the Degree (D) and Betweenness (B) analysis were estimated. Acylcarnitine CH18.1.OH: (D, 50; B, 98.5449) and the lysophosphatidylcholine (LysoPC) analogue lysoPCaC18.0 (D, 56; B, 124.18) showed high values of both D and B parameters, highlighting the ability of these metabolites to develop interaction within the groups and outside the groups (eFigure 2, Panel B).

The Pearson correlation in group 2 evidenced the creation of a two-linked grouping and one without connection (eFigure 3, Panel A). This statistical analysis revealed that only one metabolite showed high value of D as CH18.1.OH (D, 62), and one for B as aminoacid Ile (B, 281.9776) (eFigure 3, Panel B).

On the contrary, the network built on metabolites of group 3 highlight a unique dense cluster with also four linked outliers (eFigure 4, Panel A). Acylcarnitine C18.1.OH (D, 74) showed high values of D while the acylalkyl-phosphatidylcholines PCaeC40.1 (D, 52; B, 43.78485) and PCaeC42.3 (D, 52; B, 45.84746) had high values of both D and B parameters (eFigure 4, Panel B). Finally, when the networks were built on metabolites of group 4: one main cluster with some linked outliers (eFigure 5, Panel A), and one small cluster with two metabolites were seen. In eFigure 5, Panel B, as in group1, the CH18.1.OH: (D, 54; B, 80.88051) and lysoPCaC18.0 (D, 58; B, 67.9516) showed high values of both D and B parameters.

Moreover, by the comparison through Wilcoxon–Mann–Whitney test, of BPD versus NO BPD at the same GA, we observed two different set of statistical significant variables (metabolites) (Fig. 2, eTable 2B). The comparison between group 1 versus 3 evidenced 18 statistically significant metabolites, all more expressed in group 1 (Fig. 2A; eTable 2B). On the contrary, those between group 2 versus 4 highlighted 12 statistically significant metabolites, 11 higher in group 2 and 1 (SDMA.4v2) higher in group 4 (Fig. 2B; eTable 2B).

Fig. 2
figure 2

Manhattan plots representing –log10 of p values obtained from Wilcoxon-Mann-Whitney test, trough the comparison of 53 statistically significant metabolites (dots) between children of Group 1 versus Group 3 (A), and between Group 2 versus Group 4 (B). Significant threshold levels are reported with red line (FDR ≤ 0.05) and with blue lines (FDR ≤ 0.01)

In eTable 3 are the results reported using the Mann–Whitney–Wilcoxon test to analyze the difference between groups for all metabolites detected by considering each time point. We divided the subjects into 4 groups: 24–26 weeks GA at 1st day and at 7 days of life, both for BPD patients and controls (NO BPD), and 27–30 weeks GA at 1st day and at 7 days of life, both for BPD patients and controls (NO BPD). The comparison, carried out by intra- and inter-groups, revealed 41 statistically significant metabolites (eTable 3).

In order to highlight the potential cluster formation, in the Heatmap (eFigure 6), we have reported the statistically significant (p < 0.05) correlations between subject’s groups. Hierarchical cluster analysis revealed two main clusters for metabolites, highlighted by red and green colors. The red cluster regrouped metabolites especially belonging to acylcarnitines, while the green cluster regrouped metabolites belonging in particular to aminoacids and glycerophospholipids.

The subjects were also divided in two main clusters, and the same trend was noted for controls (eFigure 6). In particular, in the control group 24–26 of GA after 7 days of life (24_26_7_c) all metabolites were under produced; on the contrary, in the control group 27–30 weeks GA after 7 days of life (27_30_7_c) the aminoacids and glycerophospholipids were overproduced (eFigure 6). Moreover, a different trend was evidenced also for patients at 24–26 and 27–30 weeks GA after 1 and 7 days of life, showing an overproduction of metabolites on day 1 of life with respect to 7 days of life (eFigure 6).

4 Discussion

Metabolomic studies can be performed using two techniques: the targeted and the non-targeted approach. The first approach consists of investigating some pre-identified and expected metabolites. This technique can be only used when the pathophysiological state of a disease is already well known or at least partially understood. The non-targeted approach conversely consists of investigating all metabolites detectable in a sample to capture as much information as possible, providing a functional fingerprint of the pathological state that is being investigated. The metabolic pattern should be investigated in different experimental groups to define a specific fingerprint related to a pathological process (Griffiths et al. 2010).

The alveoli are covered by a thin layer of epithelial lining fluid that has the role to facilitate gas exchange and to protect the lung from external stimuli; it is rich with cells and soluble components (Hsia et al. 2016). The analysis of TA/BALF has been used for the detection of biomarkers related to various pulmonary diseases that are characterized by up or down-regulation of normally expressed proteins, or changes in the balance of pro- and anti-inflammatory proteins, including idiopathic pulmonary fibrosis (Lenz et al. 1993), cystic fibrosis (von Bredow et al. 2001), and acute respiratory distress syndrome (Bhargava et al. 2014). TA/BALF has also been used to perform metabolomic analyses in children. It has been used to better define patients with asthma and cystic fibrosis (Wolak et al. 2009).

Fabiano et al. conducted a preliminary study to analyze metabolic profiles on the BALF of preterm infants complicated by respiratory distress syndrome (Fabiano et al. 2011). In this study 12 BALF samples were collected at birth prior to surfactant administration, post-surfactant administration during mechanical ventilation and at extubation time-points, and were analyzed by nuclear magnetic resonance (NMR) and gas chromatography (GC-MS). The analysis identified 25 metabolites of whom 10 had a known molecular structure and were overexpressed in BALF collected during mechanical ventilation after surfactant administration compared to BALF collected prior to surfactant administration. These were: undecane, decanoic acid, dodecanoic acid, hexadecanoic acid, octadecanoic acid, hexadecanoic acid methyl ester, 9-octadecanoic acid, tetracosanoic acid, myristic acid, phosphate.

Urine samples have been used for metabolomics analysis because they have the advantage of being non-invasive and easy to collect. In a recent paper Fanos et al. compared the urinary metabolic profile at birth in 36 newborns with a GA below 29 weeks and BW below 1500 g (Fanos et al. 2014). Neonates were divided in two groups: the first group (18 cases) consisted of newborns who later developed BPD and the second group (18 controls) consisting of newborns without BPD. They found five discriminant metabolites in urine: lactate, taurine, trimethylamine-N-oxide (TMAO), myoinositol (which increased in BPD patients) and gluconate (which was decreased).

In our study we were able to find a metabolomic signature of BPD in the TA of preterm infants in the first week of life. The analysis performed on all the metabolites allowed us to select a cluster of 53 metabolites characteristic of BPD.

The PLS-DA analysis evidenced a differentiation between the babies who developed BPD and the ones that did not develop BPD (eFigure 1). In fact, neonates who did not develop BPD were homogenously located in the left square of the plot (green spots), while neonates who developed BPD (red spots) were more diffusely distributed. Fifty-three variables were identified allowing us to separate the two groups. A subsequent analysis performed only on these 53 metabolites allowed us to select 18 metabolites that were highly significant in the separation of the two groups.

A recent study has reported on the metabolomic signature of neonates being classified as BPD “susceptible” or “resistant” (Lal et al. 2018b). However, the different demographic characteristics, timing of collection of the TA (at or within 6 h of birth), and the methodology used to evaluate the metabolites preclude our ability to do an appropriate comparative analysis given the discordant results (Lal et al. 2018b). In our study, the different metabolic pattern contributing to the separation of the neonates with and without BPD included several metabolites that are possibly involved in the pathogenesis of the disease.

The further analysis we performed on the four subgroups GA (BPDs 23–26 weeks vs NO BPD 23 vs 26 weeks and BPD 27–30 weeks vs NO BPD 27 vs 30 weeks) allowed us to establish which metabolites are statistically significant between babies with and without BPD, independently of GA and BW. The most significant metabolites were the amino acids histidine (His), glutamic acid (Glu), citrulline (Cit), glycine (Gly) and isoleucine (Ile), their level being higher in neonates with BPD respect to neonates without BPD (eTable 2B). All the babies participating in the study were provided the same composition of parenteral nutrition, using the NICU standardized protocol.

Asparagine was increased in neonates developing BPD, as well as serine. Of particular interest is the increase in taurine in neonates developing BPD. In their paper Fanos et al. (2014) found an increse in taurine in neonates with BPD. The authors state that taurine has a fundamental biological role for osmoregulation and membrane stabilization and also plays important roles in calcium homeostasis, renal cell cycle and apoptosis, nerve cell activity and detoxification. De Boo et al. performed an animal study on asphyxiated sheep (de Boo and Harding 2007). Taurine was more elevated in the sheep with the worst outcome in terms of morbidity and mortality. Taurine seems therefore to be increased in relation to hypoxia.

Citrulline is another metabolite that has a higher concentration in patients with BPD, with respect to controls. Recently Montgomery et al. demonstrated that neonates who develop BPD and pulmonary hypertension (PH) have a lower plasma concentration level of citrulline than neonates who develop BPD without PH (Montgomery et al. 2016). Citrulline, a non-essential amino acid, is the precursor of arginine, which is a substrate to produce nitric oxide. Citrulline should therefore have a protective role towards the development of PH. In fact, the administration of citrulline in the animal model reduces the development of PH (Fike et al. 2015). Similarly, children who underwent cardiac surgery had lesser PH when citrulline was administered (Fike et al. 2014; Smith et al. 2006). The higher content of citrulline in our cohort of BPD patients could be interpreted as an attempt by the infant to raise the level of citrulline to produce a higher level of NO to fight the onset of PH.

Biogenic amines are nitrogen compounds that are formed by microbic decarboxylation of aminoacids or by amination or transamination of aldehydes and ketones. Among these, spermidine resulted increased in the group of neonates who developed BPD. As expected the symmetric dimethylarginine (SDMA), the inhibitor of the endogenous production of NO, has a role in the development of PH in patients with BPD, as demonstrated by Trittmann et al. (2015). In our study SDMA was lower in neonates who developed BPD, with respect to controls.

Among acylcarnitines, acylcarnitines C16-OH and C18:1-OH were higher in neonates who developed BPD, but especially in the most preterm infants (neonates with GA < 27 weeks). Acylcarnitines are released during β-oxidation of fatty acids. Recently an interesting relationship between an alteration of the β-oxidation of fatty acids and development of preeclampsia was discovered. Illsinger et al. demonstrated an increase in the concentration of carnitine and acylcarnitines in women with preeclampsia (Illsinger et al. 2010). The authors demonstrated in the placenta of preeclamptic mothers a reduced expression of cytochrome C oxidase, responsible for an altered energetic production in the mitochondria. They studied 17 mothers with preeclampsia and 35 mothers with non-complicated pregnancy. The levels of carnitine and acylcarnitines were higher in mothers with preeclampsia and in cord blood. The authors also studied 46 neonates born from mothers with normal pregnancy and 25 neonates born from mothers with preeclampsia. The blood levels of carnitine and acylcarnitines were significantly higher in neonates born from mothers with preeclampsia. Neither the mothers nor the neonates had any congenital metabolic defect. It seems, therefore, that in preeclampsia the alteration of the mitochondrial function is not due to a congenital defect of β-oxidation. During pregnancy, there is probably an increased energetic requirement and a slight mitochondrial dysfunction that can lead to an altered concentration of acylcarnitines both in the mother and in the neonate.

In our study, the increased level of acylcarnitines in the group of neonates who develop BPD could potentially be ascribed to the fact that neonates born to mothers with preeclampsia have a higher rate of BPD, as demonstrated in a recent paper by Torchin et al. (2016). However, since we do not have detailed data regarding the incidence of preeclampsia in our study cohort, any relationship of acylcarnitines and preeclampsia in our study cohort is speculative.

Moreover, by the correlation analysis it appears that GA may affect the metabolomics profile as only a few number of metabolites create and interact to the metabolic network formation. In fact, in the group of neonates with GA between 23 and 26 weeks C18.1.OH and LysoPCaC18.0 were the most significant metabolites for the metabolic network in neonates with BPD, whereas PCaeC40.1 and PCaeC42.3 were the most significant metabolites in neonates without BPD. In the group of neonates with GA between 27 and 30 weeks, C18.1.OH and Ile were the most significant metabolites for the metabolic network in neonates with BPD, whereas C18.1.OH and LysoPCaC18.0 were the most significant metabolites in neonates without BPD.

5 Conclusions

It is becoming more and more evident that the identification of a single protein biomarker for a complex disease as BPD is unlikely to be sufficient. The most successful strategy would be to define a panel of biomarkers that could come from a combination of genomics, proteomics and metabolomics. The most promising fluid to analyze seems to be TA/BALF as it reflects directly what is going on in the lung. Based on the VIP scores, we identified 18 metabolites belonging to acylcarnitines, amino acids, biogenic amines, lysophosphatidylcholine, phosphatidylcholine and sphingomyelin that discriminated between BPD and NO BPD groups. To control for the potential confounding variables of developmental GA and postnatal age, we further analysed the data by grouping the samples into 2 groups of 23–36 weeks and 27–30 weeks GA categories and compared BPD versus NO BPD as well as between day 1 and 7.

The developmental profile provided us with unique metabolic signatures. The 23–26 weeks GA with BPD had statistically significantly increased levels of 18 metabolites, compared to NO BPD in the same GA group. In the 27–30 weeks GA group, among the 12 statistically significantly different metabolites, 11 were higher in those with BPD, while 1 was higher in in the NO BPD group, in the same GA category. Specifically, we highlighted that the amino acids histidine, glutamic acid, citrulline, glycine and isoleucine levels were higher in neonates with BPD. In addition, acylcarnitines C16-OH and C18:1-OH were also higher in neonates who developed BPD, but especially in the most preterm infants (neonates with GA < 27 weeks).

Regarding the temporal profile of the metabolites, carried out by intra- and inter-group comparisons, we noted 41 statistically different metabolites between day 1 and day 7 mostly consisted of acylcarnitines, aminoacids and glycerophospholipids.

We believe that the identification of these clusters of substances could lead to the development of a rapid test, allowing its use in the clinical setting. The identification of a predisposition to BPD can allow the clinician to treat precociously the babies that are prone to develop BPD. Thus, the development of therapeutics and diagnostics integration, called “theranostics” that tailor optimized therapies based on a biomarker profile of each individual patient, thus improving response to treatment and reducing side-effects, would become a viable option for BPD. This field promises to increase the quality of clinical care by identifying the right drug for the right patient at the right time.