Introduction

The first successful primary repair of oesophageal atresia (OA) was performed in 1941 [1]. Although the fundamental principles of surgery, i.e., division of the tracheo-oesophageal fistula (TOF) and meticulous oesophageal anastomosis, have changed little, survival of infants with this condition has improved markedly. Recent national studies have shown survival rates of just over 90% [2, 3].

Several classification systems exist which predict mortality based on pre-operative factors. Over 50 years ago, Waterston pioneered these prognostic classifications, identifying low birth weight, associated congenital anomalies and pneumonia as the main risk factors [4]. Thirty years later, Poenaru found that birth weight was not an independent predictor in their series and proposed a simpler, two-part Montreal classification based upon the presence of co-existing anomalies and ventilator dependence [5]. One year later, Spitz published what is perhaps the most widely quoted classification system, proposing low birth weight and/or the presence of cardiac anomalies as predictors of mortality [6]. Having very low birth weight (VLBW) and cardiac anomalies predicted 22% survival in their series (group 3). Twelve years later, Spitz provided an update on the mortality rates in each of the groups, with those in his group 3 now having an equal chance of survival or death [7, 8].

These two more recent classifications are now over 20 years old and, although they were published in consecutive years, they do not agree on whether low birth weight is a risk factor for mortality. Since their publication, there have been large improvements in neonatal critical care and in the treatment of babies born with congenital cardiac anomalies. In the United States in 2012, the neonatal mortality rate was 4.01 per 1000 live births. This mortality rate had fallen by approximately 75% from 1960 and by almost 25% since 1993 [9, 10]. Over a 15-year period up to 2005, the mortality of infants born with congenital heart disease fell by 77% [11].

The classification systems can be used to counsel parents on the likelihood of their child surviving if they request this information. They are also useful for individual centres wishing to compare their outcomes with those of others. A centre that has a higher number of OA/TOF patients in Spitz group 3, for example, would expect a higher mortality rate than a centre where patients are mainly within groups 1 and 2.

Currently there is no pre-operative classification system to predict morbidity nor, to our knowledge, have existing mortality classification systems been tested to see if they also predict post-operative morbidity.

The aims of this study were several. We sought to determine whether the existing mortality classification systems in OA/TOF were still applicable and to find out which was the most accurate at predicting mortality. We investigated which independent risk factors predict mortality. We also sought to determine which factors were significant in predicting post-operative morbidity and to determine which of the current classification systems could be used to predict morbidity.

Methods

All neonates with pure OA or OA/TOF treated at our institution over a 20-year period from January 1990 until December 2010 were identified. Those with case notes available were included in this study. Data collected included gestation, birth weight, presence of co-existing congenital anomalies and pre-operative ventilator dependence or pneumonia. Each patient’s outcome, including mortality, length of hospital stay and number of days of ventilator dependence, was recorded.

Patients were then scored under each of the three classification systems (Waterston, Montreal and Spitz). Logistic regression was used to check that each classification system predicted mortality. Cramér’s V and Pearson’s contingency coefficient were then applied to these nominal data to test the strength of association between score within a classification and death. Univariable logistic regression analysis was applied to identify pre-operative variables that predicted mortality, and multivariable logistic regression was then performed to identify independent pre-operative risk factors for mortality.
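
As a minimal sketch of the two association measures named above, the following Python computes Cramér’s V and Pearson’s contingency coefficient from a table of counts. It is not the code used in this study (the analyses were run in NCSS and R); the function names and the example counts are hypothetical and are included only to show how the statistics are derived from the chi-square value. The sketch assumes every row and column total is non-zero.

```python
import math

def chi_square(table):
    """Return (chi-square statistic, total count) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical counts, NOT the study data:
# rows = risk group within a classification, columns = (survived, died)
example = [[90, 2], [60, 8], [10, 10]]
print(cramers_v(example), contingency_coefficient(example))
```

Both statistics rise with the chi-square value, so for classifications scored on the same patients a larger value indicates a stronger association between group and death.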

We defined morbidity as the length of neonatal unit stay and the duration in days of post-operative ventilatory support (days of intermittent positive pressure ventilation (IPPV) plus continuous positive airways pressure (CPAP) support) for survivors. To test whether the mortality classifications also predicted morbidity, we performed robust regression and used Cuzick’s test to check whether a significant trend existed across the ordered groups (with each group containing ordinal data). Kendall’s tau-b coefficient was calculated to measure the correlation between score and either number of days of ventilatory support or length of stay. Multivariable regression analysis was performed to identify pre-operative variables that predicted a long stay, prolonged ventilatory support or both.
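
The tau-b coefficient mentioned above can be sketched in a few lines of Python. This is an illustrative pair-counting implementation, not the routine used in this study; the function name and example values are hypothetical. Tau-b is suited to classification scores because it corrects for ties, which are unavoidable when many patients share the same group. The sketch assumes that neither variable is constant, otherwise the denominator is zero.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, with tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pairs tied on both variables do not contribute
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator

# Hypothetical values, NOT the study data:
# classification score (1-3) against days of ventilatory support
scores = [1, 1, 2, 2, 3, 3]
days = [2, 4, 3, 6, 8, 5]
print(kendall_tau_b(scores, days))
```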

The data were analysed using LibreOffice 5.0.3.2 (The Document Foundation), Number Cruncher Statistical System 10 (NCSS Inc) and R 3.2.2 (R Foundation for Statistical Computing). Statistical significance was defined as a two-sided P < 0.05.

Results

248 patients had case notes available over the study period. The mean birth weight was 2470 g (range 600–4350 g), suggesting that approximately half of the patients had low birth weight. 107 patients were born prematurely with a gestation of less than 37 weeks (median 37 weeks, range 24–42 weeks). There were 130 males and 118 females. 15 patients had pure OA in the study period and 220 patients survived to discharge (11.3% mortality).

Case notes were not available for ten neonates. Eight did not undergo operation and died. Of the remaining patients, 207 underwent primary anastomosis of the oesophagus and 33 underwent an initial gastrostomy placement with or without ligation of fistula, subsequently undergoing either delayed primary anastomosis or oesophageal replacement.

Comorbidities

Two-thirds of the patients had other congenital anomalies (166/248). 95 patients had congenital cardiac disease; all cardiovascular conditions were recorded and scored according to severity for the purpose of multivariable analysis. The scoring system and the number of patients assigned each score are given in Table 1. 26 patients had 3 or more features comprising the VACTERL association.

Table 1 Cardiac severity score

Mortality

The distribution of patients within the Waterston, Montreal and Spitz classification systems based upon pre-operative factors is shown in Table 2. All three classification systems were found to be highly significant predictors of mortality (Table 3).

Table 2 Mortality data by classification system
Table 3 Relative performances of the scoring systems

The relative performance of the classification systems was assessed using Cramér’s V, Pearson contingency and Chi-square tests (Table 3). Larger values indicate better discrimination and for all these tests, the Montreal classification was found to be the best predictor of mortality, followed by the Spitz classification.

Univariable analysis of pre-operative variables and mortality revealed that weight, gestation, prematurity, cardiac score and the presence of pneumonia were all significant predictors of mortality. Multivariable logistic regression analysis was then performed, with weight entered in place of gestation and prematurity to minimise collinearity between these closely related variables. Independent risk factors for mortality were found to be co-existing congenital cardiac conditions, the presence of pre-operative pneumonia and low birth weight. Each 100 g increase in birth weight was associated with a reduction in mortality risk (odds ratio 0.84) (Table 4).
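
To illustrate how a per-100 g odds ratio can be read over larger weight differences, the short Python sketch below raises the odds ratio to the number of 100 g steps. This is an interpretation aid rather than part of the original analysis: it assumes the log-odds of death are linear in birth weight, as a logistic regression model does, and the function name is hypothetical.

```python
def scaled_odds_ratio(or_per_step, n_steps):
    """Odds ratio for a difference of n_steps units, given the per-unit odds ratio.

    Assumes a logistic model in which log-odds are linear in the predictor,
    so odds ratios multiply across successive steps.
    """
    return or_per_step ** n_steps

# 0.84 per 100 g is the figure reported in the text; a 500 g difference
# corresponds to five 100 g steps.
print(round(scaled_odds_ratio(0.84, 5), 3))  # 0.418
```

Under that assumption, a neonate 500 g heavier would have roughly 0.42 times the odds of death, all other factors in the model being equal.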

Table 4 Multivariable pre-operative variables and mortality

Morbidity

220 patients (88.7%) survived to discharge. The median duration of post-operative ventilatory support (IPPV and/or CPAP) was 4 days (IQR 2–8 days). The median length of neonatal surgical unit stay was 17 days (IQR 12–32 days).

Waterston was the only classification to correlate significantly with duration of ventilatory support (Table 3). Each step up in Waterston class was associated with a 1.3 day increase in ventilatory time. Of interest, when we originally ran this analysis on the first 10 years of data, this figure was 3 days. Both Waterston and Spitz correlated significantly with length of stay, with Waterston correlating more closely than Spitz. Each step up in Waterston class was associated with an increase in length of stay of 3.3 days.

Multivariable robust regression analysis was used to identify independent risk factors (pre-, peri- and post-operative) which had a significant effect on length of post-operative respiratory support (IPPV + CPAP) and on length of neonatal surgical unit stay. Complete data were available for 195 neonates with regard to respiratory support and for 201 neonates with regard to length of stay, and the analysis was based on these patients. A longer duration of post-operative ventilation was required for neonates with low gestation or pre-operative pneumonia, or in whom a primary closure was not achieved at first operation. A longer neonatal unit stay was predicted by low gestation, congenital cardiac disease, failure to achieve primary closure at initial operation and a longer duration of initial surgery.

Discussion

Over a 20-year period, our results have shown an overall mortality rate of 11.3% which is in keeping with published national studies [2, 3]. During the study period, we had a smaller than expected number of neonates with pure OA (6%).

All of the classification systems were found to be highly significant predictors of mortality, with the Montreal system having the strongest association between score within the classification and death. The next best performing system was Spitz. Classification into the high risk Montreal category, in our series, equated to a 35-fold increase in the risk of death (95% confidence interval 12.7–107) compared with the low risk category. Montreal has only two categories compared with the three categories of both Waterston and Spitz, which may be why it appears here as the best performer. Of the systems with three categories, Spitz was the best performer.
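
For readers unfamiliar with how an interval such as the one above arises, the following Python sketch computes an odds ratio and its 95% confidence interval from a 2 × 2 table using the standard log-scale (Woolf) method. It is offered as an illustration only: the estimate reported here came from logistic regression, we assume the 35-fold figure is an odds ratio, and the counts below are hypothetical rather than taken from Table 2. The function name is also hypothetical, and all four cells must be non-zero.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval for a 2 x 2 table.

    a = deaths in high risk group,  b = survivors in high risk group
    c = deaths in low risk group,   d = survivors in low risk group
    The interval is built on the log scale and then exponentiated.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only, NOT the study data
print(odds_ratio_with_ci(20, 20, 8, 200))
```

Because the interval is symmetric on the log scale, it is asymmetric on the odds ratio scale, with the upper limit lying much further from the point estimate than the lower limit, as seen in the interval quoted above.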

Independent risk factors for mortality

Two large recent national population studies from the United States (US) have reported on risk factors for mortality in OA. Sulkowski et al. searched the Pediatric Health Information System (PHIS) from 1999 to 2012 and found 3479 patients from 43 PHIS hospitals. In-hospital mortality in these 43 hospitals in the first 2 years of life was 5.4% [12]. Amongst other factors, they found birth weight, congenital heart disease, other genetic anomalies and pre-operative ventilation to be independent predictors of mortality by multivariable logistic regression. Of note, almost 50% were ventilated pre-operatively (the figure in our series was 17.8%). To determine the accuracy of their data, they reviewed all the charts from two of the 43 hospitals and found that 13.6% of the birth weights were recorded wrongly (defined as a difference of more than 100 g). The effect of birth weight on mortality that they found was an odds ratio for death of 0.88 per 100 g increase in birth weight. This was similar to our odds ratio of 0.84 per 100 g of birth weight.

Published in the same month, Wang et al. queried the US-based Kids’ Inpatient Database (KID) covering the period 1997–2009 [2]. They identified 4168 cases of OA with overall in-hospital mortality of 9%. They identified birth weight <1500 g (odds ratio 4.5), operation within 24 h (an indicator of the need for urgent surgery) and the presence of ventricular septal defect as independent predictors of in-hospital mortality. They recognise the limitations of KID and state that birth weight information was not collected prior to 2003. They urged that, if urgent surgery is not required, fistula repair should be delayed to beyond 24 h after birth where possible.

Previous studies looking at mortality in OA have reported conflicting findings as to whether birth weight is a significant factor [12, 13, 14, 15] or not [16, 17, 18, 19]. Intuitively, low birth weight ought to predict mortality in OA. The larger series tend to show that weight is an independent predictor, as does our series of 248 patients. Of the studies that did not find an effect of weight, most had small numbers (fewer than one hundred patients). The Waterston and Spitz classification systems, with 218 and 372 infants, respectively, propose weight as a predictor of mortality, but the Montreal classification, which does not, was based on only 95 patients. It is possible that some of these studies had too small a sample size to detect an effect of weight and therefore made a type II error.

A similar criticism of size could be made of this series of 248 patients. To achieve these numbers, we have had to go back 20 years, during which time there have been further advancements in the neonatal care of VLBW infants. The studies by Sulkowski and Wang [2, 12] query large US nationwide hospital databases. The problem with studies of this nature is that they are retrospective and data accuracy becomes a problem, as highlighted above. For rare conditions, such as OA, larger studies with several hospitals collaborating [20] or prospective studies such as those organised by the British Association of Paediatric Surgeons Congenital Anomalies Surveillance System (BAPS-CASS) [3] allow for more accurate data collection. This study should help to inform which parameters will be important to collect prospectively in such surveillance systems. At present, however, these still rely on individuals at each centre manually collecting and returning the data. Automating this process, for example in collaboration with the information technology systems widely used in UK neonatal units, would increase the returns and accuracy of such data.

Morbidity

Waterston was the only classification useful for predicting duration of ventilatory support. Unsurprisingly, independent risk factors for a longer period of ventilation were low gestation, pre-operative pneumonia and failure to achieve a primary anastomosis at first operation. The first two of these factors are represented in the Waterston classification, albeit with birth weight in lieu of gestation.

Both Waterston and Spitz can predict post-operative length of stay, with Waterston being the more accurate predictor. Low gestation and congenital cardiac disease are two of the independent risk factors for a long stay; both are represented in Waterston and Spitz, albeit with birth weight in lieu of gestation. The other two independent risk factors are intra-operative: failure to achieve primary closure and a longer duration of surgery.

Conclusion

Low birth weight, co-existing congenital cardiac disease and pre-operative pneumonia remain independent risk factors for increased mortality in oesophageal atresia. Montreal, Spitz and Waterston all remain able to predict mortality, and Waterston is able to predict post-operative morbidity in terms of length of stay and duration of ventilatory support.