Background

Neonatal jaundice

Neonatal jaundice is the most common clinical condition in newborns [1–3]. Hyperbilirubinemia, the cause of jaundice, occurs in approximately 60% of term newborns and in almost all preterm neonates, in whom its prevalence exceeds 80% [4, 5].

In the vast majority of newborns, jaundice is a benign condition. However, an incorrect or delayed diagnosis may put newborns at risk of developing kernicterus [6, 7].

Kernicterus is the chronic form of bilirubin encephalopathy and occurs when the deposition of bilirubin in the brain causes irreversible damage [7, 8].

Correctly identifying newborns at risk of developing severe hyperbilirubinemia and kernicterus is essential for early treatment. Protecting newborns, and especially their immature central nervous system, from toxic bilirubin levels has therefore become a major concern for pediatricians [8, 9].

The risk of neonatal jaundice is currently assessed with the support of specific nomograms that take into account the newborn's age, serum or transcutaneous bilirubin levels and associated risk factors [10]. Bhutani's nomogram is the most widely used and is also suggested in the guidelines published by the AAP and NICE [4, 11].
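To make the nomogram idea concrete, the following is a minimal Python sketch of how an hour-specific nomogram places a single bilirubin measurement in a risk zone. It assumes each percentile curve is stored as (age in hours, bilirubin in mg/dL) points and linearly interpolated between them. The curve values below are hypothetical placeholders, not the published Bhutani nomogram, and the function names are illustrative only.

```python
from bisect import bisect_right

# HYPOTHETICAL percentile curves as (age in hours, bilirubin in mg/dL) points.
# Illustrative placeholders only: NOT the published Bhutani nomogram values.
PERCENTILE_CURVES = {
    40: [(24, 5.0), (48, 8.5), (72, 11.0), (96, 12.5)],
    75: [(24, 6.0), (48, 10.0), (72, 13.0), (96, 15.0)],
    95: [(24, 8.0), (48, 13.0), (72, 16.0), (96, 17.5)],
}


def interpolate(curve, age_h):
    """Linearly interpolate a percentile curve at age_h, clamping at both ends."""
    ages = [age for age, _ in curve]
    if age_h <= ages[0]:
        return curve[0][1]
    if age_h >= ages[-1]:
        return curve[-1][1]
    i = bisect_right(ages, age_h)
    (a0, b0), (a1, b1) = curve[i - 1], curve[i]
    return b0 + (b1 - b0) * (age_h - a0) / (a1 - a0)


def risk_zone(age_h, bilirubin):
    """Place an (age, bilirubin) measurement in a nomogram risk zone."""
    if bilirubin >= interpolate(PERCENTILE_CURVES[95], age_h):
        return "high"
    if bilirubin >= interpolate(PERCENTILE_CURVES[75], age_h):
        return "high-intermediate"
    if bilirubin >= interpolate(PERCENTILE_CURVES[40], age_h):
        return "low-intermediate"
    return "low"
```

With these placeholder curves, `risk_zone(36, 9.0)` returns `"high-intermediate"`, because 9.0 mg/dL lies between the interpolated 75th (8.0) and 95th (10.5) percentile values at 36 hours.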

Despite the use of different methodologies to assess the risk of developing neonatal hyperbilirubinemia, several studies have reported a resurgence of bilirubin encephalopathy and kernicterus, highlighting the need to improve diagnosis [12, 13].

When predicting hyperbilirubinemia, the isolated use of risk factors has shown the poorest predictive ability [14]. In contrast, serum and transcutaneous bilirubin measurements taken on the first day of life have shown a significant correlation with the subsequent development of hyperbilirubinemia [15, 16]. This correlation becomes even stronger when serum or transcutaneous bilirubin measurements are combined with risk factors, especially when bilirubin levels are high [1, 3, 16].

Table 1 compares the different predictive methods in terms of outcome and predictive accuracy.

Table 1 Comparison of the accuracy of traditional risk assessment strategies (adapted from Keren & Bhutani, 2007)

The predictive outcome, severe hyperbilirubinemia, was defined differently across the studies of the different risk assessment strategies. This definition can affect which factors the different models identify as important, as well as their predictive accuracy [17].

Data mining

Data mining is a relatively recent area of computer science that draws on statistical techniques, databases, artificial intelligence and pattern recognition (one of the areas of machine learning). Data mining methodologies rest on the ability to find patterns and relationships in large quantities of data, combining statistical methods and artificial intelligence with database management. These patterns enable the construction of models that assign class labels to unlabeled cases [18, 19].

Data mining techniques have been successfully applied to a variety of forecasting tasks [20]. By identifying hidden patterns, data mining can extract information that offers new perspectives on certain diseases and uncover knowledge that can foster further research in several areas of medicine. The high accuracy of the models developed is a good example of data mining's contribution to medicine [21].

In many areas of medicine, data mining has proven to add considerable value, contributing new discoveries and improving on the results obtained with other methodologies [20].

The application of data mining techniques may therefore be an excellent way to improve the diagnosis of neonatal jaundice, reducing the number of newborns who are put in danger because their risk of developing hyperbilirubinemia was misjudged. To our knowledge, no previous study has used data mining techniques to improve the diagnosis of neonatal jaundice.

Hence, the purpose of this study is to improve the diagnosis of neonatal jaundice by applying data mining techniques.

Methods

The study methodology followed the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) model [22].

Business understanding

Several recent studies point out the need to improve the diagnosis of neonatal jaundice in order to prevent severe hyperbilirubinemia and kernicterus. It is therefore important to explore new methodologies, such as data mining, that may provide better results than traditional methods.

After examining different data mining tools, we chose WEKA version 3.6, mainly because it is user-friendly for health professionals and, being free, represents no additional cost [23].

Based on the studies identified in the literature, we expected data mining techniques to produce more accurate predictions than the known traditional methods.

Data understanding

The study was performed at the Obstetrics Department of the Centro Hospitalar Tâmega e Sousa, E.P.E., in northern Portugal, from February to March 2011.

Healthy newborn infants with a gestational age of 35 weeks or more were included in the study. Accordingly, 4 of the 231 cases in the initial sample were excluded for not meeting this criterion.

All data in the newborns' original paper-based records, collected by doctors and nurses, were transcribed into a Microsoft Access database built for this purpose.

The collected data included information on the mother, the father and siblings, the pregnancy, the delivery, the newborn's physical examination and the entire hospital stay. In total, 72 variables were collected and analyzed. The complete list of variables is presented in Additional file 1.

Transcutaneous bilirubin (TcB) levels were also measured from birth to hospital discharge, at intervals of no more than 8 hours, using a noninvasive bilirubinometer (JM-103 Jaundice Meter, Konica Minolta) according to the manufacturer's instructions. Once hyperbilirubinemia was diagnosed and phototherapy was started, no further bilirubinometer measurements were taken.
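The Discussion refers to TcB values grouped into 8-hour intervals of life (0 to 8, 8 to 16 and 16 to 24 hours). The following minimal Python sketch shows one way such a series could be organized, assuming readings are stored as (age in hours, TcB) pairs and that each interval keeps its highest reading. The aggregation rule and the `tcb_*` feature names are assumptions for illustration, not the study's documented procedure.

```python
from collections import defaultdict


def bin_tcb(readings, width_h=8, horizon_h=24, phototherapy_start_h=None):
    """Group (age_hours, tcb) readings into [0, 8), [8, 16), [16, 24) windows.

    Assumption: each window keeps its maximum reading, and readings at or after
    phototherapy_start_h are ignored (no measurements follow treatment start).
    """
    bins = defaultdict(list)
    for age_h, tcb in readings:
        if age_h < 0 or age_h >= horizon_h:
            continue  # outside the first-day horizon
        if phototherapy_start_h is not None and age_h >= phototherapy_start_h:
            continue
        bins[int(age_h // width_h)].append(tcb)
    features = {}
    for k in range(horizon_h // width_h):
        label = f"tcb_{k * width_h}_{(k + 1) * width_h}h"
        values = bins.get(k)
        features[label] = max(values) if values else None  # None = not recorded
    return features
```

A window with no recorded reading is returned as `None`, which would become a missing value in the modeling dataset.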

Data preparation

A preliminary statistical analysis was carried out to better understand the dataset.

During this analysis we also prepared the data, which involved eliminating, integrating, recoding and calculating variables. All these transformations are detailed in Additional file 1.

Eliminated variables – only variables with no recorded values were eliminated, that is, variables whose information had not been collected by doctors and nurses.

Integrated variables – in the newborn paper record, several variables recorded the same information, so we merged them into new variables.

Recoded variables – to facilitate the statistical analysis, some variables were also recoded (transformed).

Calculated variables – some variables, such as the dates of admission and discharge, were used to calculate new variables (e.g., length of hospital stay).
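Two of the transformations listed above, eliminating variables that are missing in every record and calculating the length of hospital stay from admission and discharge dates, can be sketched in Python as follows. Records are assumed to be dictionaries with `None` for missing values; the field names `admission`, `discharge` and `length_of_stay_h` and the date format are hypothetical.

```python
from datetime import datetime


def drop_all_missing(records):
    """Eliminate every variable whose value is missing (None) in all records."""
    keys = {k for r in records for k in r}
    kept = {k for k in keys if any(r.get(k) is not None for r in records)}
    return [{k: v for k, v in r.items() if k in kept} for r in records]


def add_length_of_stay(record, fmt="%Y-%m-%d %H:%M"):
    """Calculated variable: hours between (hypothetical) admission and discharge fields."""
    admitted = datetime.strptime(record["admission"], fmt)
    discharged = datetime.strptime(record["discharge"], fmt)
    record["length_of_stay_h"] = (discharged - admitted).total_seconds() / 3600
    return record


records = [
    {"admission": "2011-02-01 08:00", "discharge": "2011-02-03 10:30", "sibling_phototherapy": None},
    {"admission": "2011-02-02 09:00", "discharge": "2011-02-04 09:00", "sibling_phototherapy": None},
]
cleaned = [add_length_of_stay(r) for r in drop_all_missing(records)]
```

In this toy input, `sibling_phototherapy` is never recorded, so it is dropped, and the first record gets a stay of 50.5 hours.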

After data preparation, 60 of the 72 variables remained, in addition to the TcB levels. The final dataset was then converted into a format suitable for modeling in WEKA.
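WEKA's native input format is ARFF, a plain-text file with a `@relation` line, one `@attribute` line per variable and a `@data` section in which `?` marks a missing value. The paper does not state which conversion route was used. The following is a minimal Python writer for this format, assuming numeric or nominal attributes only and values that never contain quote characters. The attribute names in the usage example are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Serialize rows to ARFF text.

    attributes: list of (name, spec), where spec is "numeric" or a list of
    nominal values. Missing values (None) are written as "?".
    """
    def quote(token):
        text = str(token)
        # Assumption: tokens never contain quote characters themselves.
        return f"'{text}'" if any(c in text for c in " ,{}%") else text

    lines = [f"@relation {quote(relation)}", ""]
    for name, spec in attributes:
        kind = spec if spec == "numeric" else "{" + ",".join(quote(v) for v in spec) + "}"
        lines.append(f"@attribute {quote(name)} {kind}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else quote(v) for v in row))
    return "\n".join(lines) + "\n"


arff = to_arff(
    "jaundice",
    [("gestational_age", "numeric"),
     ("blood_group_abo", ["A", "B", "AB", "O"]),
     ("phototherapy", ["yes", "no"])],
    [[38, "O", "no"], [36, None, "yes"]],
)
```

The class attribute (here `phototherapy`) is written last, which is where WEKA looks for it by default.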

Modeling

To model the data, we chose several classification algorithms that are often applied to medical datasets and are implemented in WEKA: J48 (an implementation of the C4.5 algorithm for generating pruned or unpruned decision trees), simple CART (a decision tree learner implementing minimal cost-complexity pruning), naïve Bayes (a naïve Bayes classifier using estimator classes), multilayer perceptron (a neural network classifier that uses backpropagation to classify instances), SMO (an implementation of John Platt's sequential minimal optimization algorithm for training a support vector classifier) and simple logistic (a classifier for building linear logistic regression models). Other similar methods were also tested but did not yield better results and are therefore not reported in this study.
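As an illustration of the principle behind one of these learners, the following is a minimal Python naïve Bayes classifier for nominal attributes with add-one (Laplace) smoothing. It is not WEKA's NaiveBayes implementation, which also handles numeric attributes through estimator classes. The toy data in the usage example are invented.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Naive Bayes for nominal attributes with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n = len(y)
        n_attrs = len(X[0])
        # counts[j][(value, cls)]: how often attribute j took `value` in class `cls`
        self.counts = [Counter() for _ in range(n_attrs)]
        self.values = [set() for _ in range(n_attrs)]
        for row, cls in zip(X, y):
            for j, value in enumerate(row):
                self.counts[j][(value, cls)] += 1
                self.values[j].add(value)
        return self

    def predict_proba(self, row):
        """Posterior P(class | row), computed in log space for numerical stability."""
        log_scores = {}
        for cls in self.classes:
            score = math.log(self.class_counts[cls] / self.n)  # log prior
            for j, value in enumerate(row):
                # Smoothed likelihood P(value | cls); unseen values still get mass.
                num = self.counts[j][(value, cls)] + 1
                den = self.class_counts[cls] + len(self.values[j])
                score += math.log(num / den)
            log_scores[cls] = score
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Invented toy data: (gestation category, blood group) -> phototherapy
X = [["preterm", "A"], ["preterm", "O"], ["term", "O"],
     ["term", "A"], ["term", "O"], ["preterm", "A"]]
y = ["yes", "yes", "no", "no", "no", "yes"]
model = CategoricalNaiveBayes().fit(X, y)
```

The "naïve" assumption is that attributes are independent given the class, so each class score is simply the prior multiplied by one smoothed likelihood per attribute.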

All tests used 10-fold cross-validation. Cross-validation estimates how well a learning algorithm performs on data not used to train it: the dataset is split into ten folds, and each fold is used once as the test set while the remaining nine are used for training. The average performance across the test folds provides an estimate of the performance of the classifier built from the entire dataset [20, 24, 25].
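The procedure described above can be sketched in Python as follows. This sketch also stratifies the folds so that each fold keeps roughly the dataset's class proportions. Stratification is an assumption about how the splits were made. `train_and_score` is a hypothetical callback that trains on the given indices and returns a score on the held-out ones.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split case indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal cases round-robin so every fold gets its share of each class.
        for i in indices:
            folds[slot % k].append(i)
            slot += 1
    return folds


def cross_validate(labels, train_and_score, k=10, seed=1):
    """Average train_and_score(train_idx, test_idx) over the k held-out folds."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k


# Class sizes matching this study: 35 treated, 192 not treated.
labels = ["yes"] * 35 + ["no"] * 192
folds = stratified_folds(labels)
```

With 35 positive cases, each fold receives only 3 or 4 of them. Stratification keeps any fold from ending up with no positives.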

All classification algorithms were tested on different subsets of variables and compared in terms of accuracy, measured as the area under the ROC curve (AUC), sensitivity and specificity. Because high sensitivity is important in medical decision-making, we fixed the sensitivity at 90% for all subsets and calculated the corresponding specificity. The standard error of each AUC was estimated using the method proposed by Hanley and McNeil [26].
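The three evaluation quantities above can be written down compactly. The following Python sketch computes the AUC as the probability that a treated newborn receives a higher score than an untreated one. It also computes the Hanley and McNeil standard error, with Q1 = A/(2 − A) and Q2 = 2A²/(1 + A), and the best specificity among thresholds that reach 90% sensitivity. The function names are illustrative and are not WEKA's API.

```python
import math


def auc(scores_pos, scores_neg):
    """AUC as P(positive scores higher than negative), ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC `a` (Hanley and McNeil)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    variance = (a * (1 - a)
                + (n_pos - 1) * (q1 - a * a)
                + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(variance)


def specificity_at_sensitivity(scores_pos, scores_neg, target=0.90):
    """Scan thresholds from high to low and return the first (threshold,
    specificity) whose sensitivity reaches `target`; since specificity only
    falls as the threshold drops, this is the best specificity at that target."""
    for t in sorted(set(scores_pos) | set(scores_neg), reverse=True):
        sensitivity = sum(s >= t for s in scores_pos) / len(scores_pos)
        if sensitivity >= target:
            specificity = sum(s < t for s in scores_neg) / len(scores_neg)
            return t, specificity
    raise ValueError("no scores given")
```

For illustration, with this study's class sizes (35 positive, 192 negative cases), `hanley_mcneil_se(0.89, 35, 192)` is about 0.037, which corresponds to a normal-approximation 95% interval of roughly 0.89 ± 0.07.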

The subsets corresponded to three different time points. First, we used only the risk factors available immediately after birth: mother's age; father's age; head circumference; maternal pathologies; mother's usual medication; gestational age; physical examination report; type of delivery; newborn blood group (Rh); newborn blood group (ABO); and mother's blood group (ABO).

Second, we tested the algorithms using only the TcB levels obtained during the first 24 hours of life, without any other risk factors.

Finally, we tested the risk factors combined with the TcB levels obtained during the first 24 hours of life.
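The three subsets can be expressed as plain lists of variable names, with the third subset being the union of the first two. In the following Python sketch, all identifiers are hypothetical. The first-day TcB variables are assumed to be the 8-hour intervals mentioned in the Discussion, and `phototherapy` is assumed to be the class label.

```python
# Hypothetical variable names for the eleven birth-time risk factors.
RISK_FACTORS = [
    "mother_age", "father_age", "head_circumference", "mother_pathologies",
    "mother_usual_medication", "gestational_age", "physical_exam_report",
    "delivery_type", "newborn_blood_group_rh", "newborn_blood_group_abo",
    "mother_blood_group_abo",
]
# Assumption: first-day TcB stored as one value per 8-hour interval.
TCB_FIRST_DAY = ["tcb_0_8h", "tcb_8_16h", "tcb_16_24h"]

SUBSETS = {
    "risk_factors": RISK_FACTORS,
    "tcb_24h": TCB_FIRST_DAY,
    "risk_factors_plus_tcb_24h": RISK_FACTORS + TCB_FIRST_DAY,
}


def project(record, subset, label="phototherapy"):
    """Keep only one subset's variables plus the class label (missing -> None)."""
    return {k: record.get(k) for k in SUBSETS[subset] + [label]}
```

Each algorithm is then run once per subset on the projected records, so the subsets differ only in the information available to the model and not in the cases or the outcome.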

The study was approved by the Ethics Committee of the Centro Hospitalar Tâmega e Sousa, EPE (reference number 0568/2011).

Results

Of the 227 newborn infants included in the study, 35 (15.4%) were diagnosed with hyperbilirubinemia and treated with phototherapy, which was the predictive outcome of the study.

These 35 infants started phototherapy at a median age of 45.5 hours. Early jaundice, detected before 24 hours of life, was present in 4 of them (11.4%).
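The following short Python check reproduces the sample arithmetic reported above. It also makes the class imbalance explicit. With only 35 positive cases, a trivial rule that always predicts "no phototherapy" would be correct for about 84.6% of newborns, so plain percent-correct accuracy would be a weak yardstick here.

```python
n_initial, n_excluded = 231, 4
n_total = n_initial - n_excluded          # newborns included in the study
n_phototherapy, n_early = 35, 4

treated_share = n_phototherapy / n_total                # outcome prevalence
early_share = n_early / n_phototherapy                  # early jaundice among treated
majority_share = (n_total - n_phototherapy) / n_total   # "always predict no" rule

print(f"included: {n_total}")
print(f"treated with phototherapy: {treated_share:.1%}")
print(f"early jaundice among treated: {early_share:.1%}")
print(f"always predicting no phototherapy is right for: {majority_share:.1%}")
```

The script prints 227, 15.4%, 11.4% and 84.6%.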

In the first step, with the algorithms applied to the clinical risk factors alone, the highest accuracy was obtained with the Bayes net algorithm (AUC = 0.74), followed by naïve Bayes and simple logistic (AUC = 0.72 each).

Using only the TcB levels obtained before 24 hours of life, the highest accuracy was obtained with the multilayer perceptron, WEKA's artificial neural network algorithm (AUC = 0.84), followed by naïve Bayes (AUC = 0.82) and simple logistic (AUC = 0.80).

When clinical risk factors were combined with TcB levels at 24 hours of life, the highest accuracy was obtained with the simple logistic algorithm (AUC = 0.89), followed by naïve Bayes (AUC = 0.88) and Bayes net (AUC = 0.87).

For all algorithms except the multilayer perceptron, combining clinical risk factors with TcB levels improved prediction accuracy compared with TcB levels or clinical risk factors alone.

Table 2 presents the results from the comparison of the different algorithms applied to data subsets.

Table 2 Comparison of the application of different algorithms to data subsets in terms of accuracy and specificity (for sensitivity of 90%)

Discussion

Compared with traditional methods, prediction using data mining techniques produced encouraging results.

Turning to the literature, and specifically to the study by Chou et al. [14], which also sought to inform the indication for phototherapy, our risk factor model achieved a higher AUC (0.74 versus 0.69), although the difference is not statistically significant (the confidence intervals overlap). However, compared with other studies, particularly that of Newman et al. [16], which sought to predict bilirubin levels above 25 mg/dL, and allowing for the differences between the studies, our result falls short of the 0.83 they reported.

Although they did not perform as well, decision tree models, such as those generated by J48 or simple CART, have the advantage of being easier to interpret, especially compared with opaque models, usually called black box models, such as artificial neural networks. This advantage makes decision trees more readily accepted by the medical community [24, 27].

Regarding bilirubin assessment, the studies identified predict the risk of subsequent hyperbilirubinemia from predischarge total serum bilirubin (TSB) values. In the present study, we used first-day TcB levels to predict the need for phototherapy.

With the multilayer perceptron algorithm, we obtained a slightly higher accuracy than Keren and Bhutani [17] (AUC of 0.84 versus 0.83). However, this difference is not statistically significant, as our result falls within the confidence interval reported in their study.

In practice, however, pediatricians base their assessment on the combination of clinical risk factors and the newborn's bilirubin levels, because this approach yields better accuracy. This is also the approach supported by the international AAP and NICE guidelines.

Applied to our dataset, the simple logistic algorithm returned better results than those reported by Newman et al. [16]: we obtained an AUC of 0.89, compared with 0.86 in their study. Once again, this difference is not statistically significant, since the confidence intervals overlap.

Beyond comparing accuracy, it is also important to interpret the generated models and compare them with the clinical rules of thumb that actually prevail in practice.

Taking the simple logistic algorithm as an example, since it was one of the best-performing models across all feature subsets, we found that, on the subset combining risk factors and transcutaneous bilirubin levels, the most influential variables were, in descending order: TcB between 8 and 16 hours of life, TcB between 16 and 24 hours, gestational age and newborn blood group (ABO).
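To show how a linear logistic model turns such variables into a risk estimate, the following Python sketch computes P(phototherapy) = 1 / (1 + exp(−(b0 + Σ bᵢxᵢ))). It then ranks variables by |bᵢ| × sd(xᵢ), a common heuristic for influence. The paper does not state how influence was ranked, so this heuristic is an assumption. All coefficients and standard deviations below are made up for illustration and are not the fitted model.

```python
import math


def predict_probability(intercept, coefs, case):
    """P(phototherapy) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * case[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_influence(coefs, std_devs):
    """Order variables by |b_i| * sd(x_i), one common influence heuristic."""
    impact = {name: abs(b) * std_devs[name] for name, b in coefs.items()}
    return sorted(impact, key=impact.get, reverse=True)


# HYPOTHETICAL, made-up values that only illustrate the mechanics.
coefs = {"tcb_8_16h": 0.9, "tcb_16_24h": 0.5, "gestational_age": -0.6, "abo_group_o": 0.7}
std_devs = {"tcb_8_16h": 1.5, "tcb_16_24h": 1.6, "gestational_age": 1.2, "abo_group_o": 0.5}
print(rank_by_influence(coefs, std_devs))
```

Scaling each coefficient by the spread of its variable matters here: a large coefficient on a variable that barely varies, such as a binary blood group indicator, can contribute less than a moderate coefficient on a widely varying TcB value.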

Interestingly, with regard to TcB levels, the interval between 8 and 16 hours has greater influence than the subsequent interval, between 16 and 24 hours. It should also be noted that the first interval, between 0 and 8 hours of life, is not part of the generated model. This may be due to the small number of values recorded during the first 8 hours. Nevertheless, it also reflects the importance of assessing and recording TcB as early as possible, as supported by several studies.

Concerning risk factors, the algorithm used only gestational age and newborn blood group (ABO) to build the model. In daily practice, by contrast, the presence of any risk factor described in the guidelines, such as cephalhematoma or a previous sibling treated with phototherapy, is considered to increase the risk of subsequent hyperbilirubinemia equally.

These results agree with studies that identify gestational age as the most decisive variable in the prognosis of neonatal jaundice [28]. However, newborn blood group (ABO) also gains a prominent position in the generated model, possibly because it is related to cases of jaundice caused by blood group incompatibility.

In summary, allowing for the differences between studies, the application of data mining techniques produced highly accurate models, with results no worse than those of the traditional methods found in the literature.

As mentioned, the median age of newborns at the start of treatment was 45.5 hours of life, very close to the time at which newborns may be discharged from hospital. We therefore believe that an early and correct assessment, such as the one enabled by the proposed data mining methods, could effectively reduce the length of hospital stay, prevent incorrect diagnoses for the same reason and reduce readmissions after hospital discharge.

Limitations

The predictive outcome, hyperbilirubinemia, was defined differently across the compared studies, which may be an important source of bias.

The use of data mining software other than WEKA, with different implementations of the data mining algorithms, could lead to different results.

A larger sample could also improve the results obtained.

Conclusion

Preventing neonatal hyperbilirubinemia and kernicterus remains one of the most challenging problems facing pediatricians today, even with the widespread adoption of the AAP and NICE guidelines.

The main findings of this study show that data mining techniques are valid and useful approaches for predicting neonatal hyperbilirubinemia.

We therefore recommend that new technologies, such as data mining, be explored and used to support medical decision-making and to improve the diagnosis of neonatal jaundice.