Recent advances in therapeutic protocols and medical facilities highlight the need for accurate prediction systems [1]. Such risk prediction models can be used for benchmarking, that is, evaluating the effectiveness and efficiency of pediatric intensive care units (PICUs), for the early detection of critically ill patients, and for optimizing resource allocation, which may improve the quality of care and patient safety, especially in low- and middle-income countries [2]. These countries, including Iran, have scarce resources, particularly when a surge of critically ill pediatric patients creates a disproportionate imbalance between needs and available resources.

In this circumstance, an accurate, well-validated, and easy-to-calculate risk assessment instrument can help prioritize patients and optimize resource use in PICUs [3, 4]. Such instruments can be based on scoring systems, which use the worst physiological and laboratory values during the first 12–24 h of admission to indicate the severity of illness. A higher score represents greater severity [5].

Multiple scoring systems have been introduced. Some of them are widely used to predict the risk of death in children, such as the Pediatric Risk of Mortality (PRISM) and the Pediatric Index of Mortality (PIM). The PRISM was developed using data collected from 11,165 patients admitted to PICUs in the USA [6], whereas the PIM was developed from data of ICUs located in the UK, Australia, Ireland, and New Zealand [7]. The third versions of these scoring systems (PRISM-3 and PIM-3) have been commonly used in intensive care units (ICUs) for years since their introduction [8]. Over the last decade, there have been significant advances in pediatric intensive care in developing countries. However, low- and middle-income countries, especially those with a large pediatric population, still need more PICUs, a greater number of competent health care professionals, and timely access to essential medicines and equipment in order to reduce pediatric mortality. The predictive performance of models based on the PRISM-3 and PIM-3 scores for PICU mortality and in-hospital mortality is not well understood, especially in developing countries such as ours. Hence, this study aimed to evaluate and compare the predictive performance of prediction models based on the PRISM-3 and PIM-3 scores in a sample of patients admitted to the PICUs of a developing country [9].


Study design and setting

We designed a multicenter, retrospective cohort study of severely ill children admitted to six tertiary PICUs at two university hospitals in Mashhad, northeast Iran. The study covered a 12-month period, from December 2017 to November 2018. Each hospital had an average of 1500 admissions per year, and each PICU was equipped with an average of 7.2 beds.

Both centers are general pediatric hospitals and admit all types of cases (medical and surgical). However, because of the subspecialty focus of each hospital, most oncology, nephrology, and hematology cases were treated in “hospital A”. Many surgical, rheumatology, pulmonary, infectious, gastrointestinal, and neurological cases were treated in “hospital B”. The PICUs within each hospital do not differ in the type of patients referred. The only difference relates to age, since smaller beds are intended for younger patients.

Study population

All children (i.e., aged younger than 18 years) admitted to the PICUs were eligible for the study. We excluded patients with brain death at the time of admission. We also excluded patients who stayed in the PICU for less than 2 h and were discharged or died within 24 h of admission. Patients who were referred to subspecialty hospitals were also excluded. Missing values of the main variables required to calculate the scores were imputed using the chained equations approach implemented in the mice package in R. Readmissions with a different diagnosis were considered new admissions.

Study variables

The following variables were collected: age, gender, diagnosis, the other main variables required to calculate the scores, length of stay (LOS) in the PICU and in the hospital, and the two outcome variables, PICU mortality and in-hospital mortality.

The following key variables were collected to calculate the PRISM-3 score: arterial blood gas, glucose, creatinine, Glasgow Coma Score (GCS), respiratory rate, systolic and diastolic blood pressure, heart rate, pupillary reactions to bright light, blood urea, potassium, platelet count, white blood cell count, temperature, Prothrombin Time (PT), and Partial Thromboplastin Time (PTT) [5].

The first variable used to calculate the PIM-3 score was the diagnosis risk category (low, high, or very high risk):
- Low-risk diagnoses: asthma, bronchiolitis, croup, obstructive sleep apnea, diabetic ketoacidosis, and seizure.
- High-risk diagnoses: spontaneous cerebral hemorrhage, cardiomyopathy or myocarditis, hypoplastic left heart syndrome, neurodegenerative disorder, and necrotizing enterocolitis.
- Very high-risk diagnoses: cardiac arrest, severe combined immune deficiency, leukemia or lymphoma, bone marrow transplant recipient, and liver failure.

The remaining PIM-3 variables were:
- systolic blood pressure;
- base excess;
- type of admission (emergency, referral, or elective);
- FiO2 and PaO2;
- mechanical ventilation support;
- recovery from surgery as the main reason for PICU admission;
- admission due to cardiac bypass [10].

Because both models categorize age in months, we also used months as the unit of age. Table 1 summarizes the two models in terms of variables, units, calculation formulas, and point assignment schemes. The units of all variables are provided in Tables 2 and 3. LOS was measured in days.

Table 1 The point assignment scheme of each scoring system
Table 2 Baseline demographic and clinical characteristics of the patients admitted to the PICU with available and missing data
Table 3 Baseline demographic and clinical characteristics of the patients admitted to the PICU after imputation

Statistical analyses

Normality of continuous variables was assessed with the Shapiro-Wilk test. Because the data did not follow a normal distribution, groups were compared using nonparametric techniques. The Mann-Whitney U test was used to compare continuous variables between survivors and non-survivors. The Chi-square test or Fisher’s exact test was used to compare categorical data. Data are presented as median (IQR) for continuous variables and as frequency (%) for categorical variables.

The PIM-3 and PRISM-3 scores were calculated retrospectively by the researchers for each patient, based on measurements taken at the time of PICU admission. The formulas for calculating the PRISM-3 and PIM-3 scores are presented in Table 1.

After calculating the scores, we fitted logistic regression models with PICU mortality or in-hospital mortality as the response variable. The PRISM-3 and PIM-3 scores were used separately as the explanatory variable. The probability of mortality was calculated with the logistic function as follows:

$$\mathrm{P}=\frac{1}{1+\exp \left[-\left({\beta}_0+{\beta}_1X\right)\right]}$$

where $\beta_0$ is the intercept, $\beta_1$ is the coefficient of the score, and $X$ is the score.
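Rearranging the model above onto the log-odds scale shows how $\beta_1$ can be read as the change in log-odds of death per score point. This is standard logistic-regression algebra, not an additional model from this study:

$$\log \frac{P}{1-P}={\beta}_0+{\beta}_1X, \qquad \frac{\mathrm{odds}\left(X+1\right)}{\mathrm{odds}(X)}=\exp \left({\beta}_1\right)$$

For example, the PRISM-3 model for PICU mortality reported in the Results has $\beta_1 = 0.174$. Each additional score point therefore multiplies the odds of death by $\exp(0.174) \approx 1.19$.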

The predictive performance of the models was then assessed in terms of overall accuracy, discrimination, and calibration. The discrimination ability of the probabilistic models, as measured by the AUC, is exactly the same as that of the original scores they are based on. This is because, with a positive coefficient $\beta_1$, the model preserves the ordering of the scores in the probabilities (i.e., sorting patients by predicted probability gives the same order as sorting them by score). The probabilistic model, however, allows us to investigate the additional performance measures of calibration and the Brier score.
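The ranking argument can be checked numerically. The sketch below is illustrative only. The scores, outcomes, and coefficients are hypothetical, and the AUC is computed directly as the Mann-Whitney probability that a non-survivor ranks above a survivor (ties counted as 1/2), not with pROC as in the authors' analysis.

```python
import math

def auc(values, outcomes):
    """AUC in Mann-Whitney form: probability that a random non-survivor (1)
    has a higher value than a random survivor (0); ties count as 1/2."""
    pos = [v for v, y in zip(values, outcomes) if y == 1]
    neg = [v for v, y in zip(values, outcomes) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def logistic(x, beta0, beta1):
    """Predicted probability from a single-score logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Hypothetical scores and outcomes (1 = died), for illustration only.
scores = [2, 5, 7, 3, 12, 9, 1, 15, 6, 4]
outcomes = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]

# Hypothetical coefficients; any beta1 > 0 preserves the ranking of the scores.
probs = [logistic(s, -3.0, 0.17) for s in scores]

print(auc(scores, outcomes), auc(probs, outcomes))  # both values equal 22/24
```

With a negative $\beta_1$ the ordering would be reversed, so the AUC computed from the probabilities would become one minus the score-based AUC.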

The accuracy of the predicted probabilities was assessed with two measures:
- the Brier Score (BS), the mean squared difference between the observed and predicted outcomes;
- the standardized mortality ratio (SMR), the ratio of the observed mortality to the expected mortality derived from the development set in which the score was developed.

Discrimination between survivors and non-survivors was quantified by the Area Under the Receiver Operating Characteristic Curve (AUC). Calibration, the agreement between predicted and observed probabilities, was assessed with calibration curves. Lack of agreement was tested with the Hosmer-Lemeshow test. Moreover, the Negative Predictive Value (NPV), Positive Predictive Value (PPV), specificity, and sensitivity were calculated at the threshold given by the Youden Index [11].

We used bootstrapping with 1000 samples to internally validate the models and to calculate the bias-corrected estimate of the AUC and its confidence intervals (CI). DeLong’s method was used to compare the two AUCs. Statistical significance was set at p < 0.05. All analyses were performed in the R statistical environment (packages rms, Hmisc, pROC, and mice).
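The accuracy and threshold measures described above can be written out directly. The sketch below is a plain-Python illustration on hypothetical data, not the authors' R code. Expected deaths for the SMR are taken as the sum of whatever predicted risks are supplied. A patient is classified as high risk when the predicted probability is at or above the candidate cut-off. The sketch assumes at least one death and one survivor.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and observed outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def smr(probs, outcomes):
    """Standardized mortality ratio: observed deaths / expected deaths (sum of predicted risks)."""
    return sum(outcomes) / sum(probs)

def youden_metrics(probs, outcomes):
    """Choose the cut-off maximizing Youden's J = sensitivity + specificity - 1.
    Returns (threshold, sensitivity, specificity, PPV, NPV)."""
    best = None
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, outcomes) if p >= t and y == 1)
        fn = sum(1 for p, y in zip(probs, outcomes) if p < t and y == 1)
        tn = sum(1 for p, y in zip(probs, outcomes) if p < t and y == 0)
        fp = sum(1 for p, y in zip(probs, outcomes) if p >= t and y == 0)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:  # keep the first cut-off reaching the maximum
            ppv = tp / (tp + fp) if tp + fp else float("nan")
            npv = tn / (tn + fn) if tn + fn else float("nan")
            best = (j, t, sens, spec, ppv, npv)
    return best[1:]

# Hypothetical predicted risks and outcomes (1 = died).
probs = [0.05, 0.10, 0.40, 0.20, 0.70, 0.45, 0.02, 0.85, 0.35, 0.08]
outcomes = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
print(brier_score(probs, outcomes), smr(probs, outcomes), youden_metrics(probs, outcomes))
```

In this toy example, 4 observed deaths against 3.2 expected give an SMR of 1.25. An SMR above 1 indicates that the model underestimates mortality.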


In total, 3000 patients were eligible and met the inclusion criteria. After applying the exclusion criteria, 2784 patients remained for analysis (Fig. 1). About 11.3% of the data values were missing. These were imputed as described in the Study population section.

Fig. 1

The flowchart diagram of the patient inclusion process

The PICU and in-hospital mortality rates were 12.14% and 15.58%, respectively. Tables 2 and 3 present the baseline characteristics of the study population before and after imputation. The median (IQR) lengths of PICU and hospital stay were 7 (3–13) and 8 (4.7–16) days, respectively.

A total of 1379 (56.4%) patients were male, and the median age was 4.2 months (IQR: 0.66–24). Most patients were younger than 12 months (65.43%). Among the demographic characteristics, age was associated with outcome (p < 0.001), whereas gender showed no significant association (p = 0.13).

Congenital malformations, digestive system diseases, and respiratory diseases accounted for 24.5%, 13.8%, and 10.6% of admissions, respectively. The causes of mortality according to the ICD-10 coding system were as follows:
- congenital malformations: 88 (22.3%);
- digestive system diseases: 54 (13.7%);
- respiratory diseases: 51 (12.9%);
- blood diseases and neoplasms: 43 (10.9%);
- kidney diseases: 23 (5.82%);
- infectious diseases: 22 (5.6%);
- metabolic diseases: 18 (4.6%);
- cardiac diseases: 14 (3.5%);
- neurological disorders: 8 (2%);
- perinatal diseases: 7 (1.8%);
- other diagnostic groups: 67 (16.9%).

The mean PRISM-3 and PIM-3 scores were 6.9 ± 6.5 and 3 ± 2.8, respectively. About 48% of patients were referral cases, 30% were brought in by emergency medical services, and 9.4% required mechanical ventilation support.

As shown in Tables 2, 3, and 4, patients aged 12–144 months had the worst outcomes. The pattern was similar for PICU mortality (19%) and in-hospital mortality (38.5%). These patients were mainly assigned diagnoses in the neoplasm, circulatory, respiratory, and digestive system categories.

Table 4 Predictive characteristics of PRISM-3 and PIM-3 to predict PICU and in-hospital mortality

The linear predictors of the fitted logistic regression models for each outcome are as follows.

For predicting PICU mortality:

$$\text{PRISM-3: } -3.056+0.174\times \text{PRISM-3 score},\quad \text{and}\quad \text{PIM-3: } -3.075+0.297\times \text{PIM-3 score}.$$

For predicting in-hospital mortality:

$$\text{PRISM-3: } -3.094+0.166\times \text{PRISM-3 score},\quad \text{and}\quad \text{PIM-3: } -2.772+0.312\times \text{PIM-3 score}.$$
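To show how the linear predictors above turn into individual risks, the sketch below plugs the reported intercepts and slopes into the logistic formula. The score values used are hypothetical. The dictionary layout and function names are illustrative choices, not part of the study's software. The coefficients describe this cohort and would need recalibration before use elsewhere.

```python
import math

# (intercept, slope) of each fitted model, as reported above.
MODELS = {
    ("PRISM-3", "PICU"): (-3.056, 0.174),
    ("PIM-3", "PICU"): (-3.075, 0.297),
    ("PRISM-3", "hospital"): (-3.094, 0.166),
    ("PIM-3", "hospital"): (-2.772, 0.312),
}

def predicted_risk(score, system, outcome):
    """Predicted mortality probability for a given score, scoring system, and outcome."""
    beta0, beta1 = MODELS[(system, outcome)]
    linear_predictor = beta0 + beta1 * score  # log-odds of death
    return 1.0 / (1.0 + math.exp(-linear_predictor))  # inverse logit

def odds_ratio_per_point(system, outcome):
    """Multiplicative change in the odds of death per one-point increase in the score."""
    return math.exp(MODELS[(system, outcome)][1])

# Hypothetical patient with a PRISM-3 score of 10.
print(round(predicted_risk(10, "PRISM-3", "PICU"), 3))
```

For a hypothetical PRISM-3 score of 10, the linear predictor is -3.056 + 1.74 = -1.316. This corresponds to a predicted PICU mortality risk of about 21%.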

Table 4 presents the BS, SMR, AUC, HL test, and other characteristics of both models for PICU and in-hospital mortality prediction, overall and by age group.
- BS: PRISM-3 and PIM-3 scored 0.088 and 0.093 for PICU mortality, and 0.108 and 0.113 for in-hospital mortality, respectively.
- SMR: PRISM-3 and PIM-3 had 1.34 (95% CI: 1.19–1.49) and 1.37 (95% CI: 1.21–1.52) for PICU mortality, and 1.73 (95% CI: 1.56–1.90) and 1.78 (95% CI: 1.6–1.95) for in-hospital mortality, respectively.
- AUC: PRISM-3 showed significantly higher discrimination than PIM-3 for PICU mortality (AUC = 0.831 vs 0.745) and for in-hospital mortality (AUC = 0.781 vs 0.737).

The HL test indicated poor calibration for both models for both outcomes. The difference between the AUCs of the PRISM-3 and PIM-3 models was statistically significant (P = 0.001) (see Fig. 2 and Table 4). The calibration curves of both models are shown in Fig. 3.
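The AUC comparison above relies on DeLong's method, which accounts for both scores being computed on the same patients. The following is an independent plain-Python sketch of the two-sided DeLong test on hypothetical data, not the authors' R code. It uses the structural-component (placement value) variance estimate and a normal approximation for the p-value.

```python
import math

def _psi(x, y):
    """Kernel: 1 if the non-survivor's value is higher, 1/2 for ties, else 0."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def _cov(a, b):
    """Sample covariance with denominator len - 1."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def delong_test(score1, score2, outcomes):
    """Two-sided DeLong test for two correlated AUCs on the same patients.
    Returns (auc1, auc2, z, p)."""
    pos = [i for i, y in enumerate(outcomes) if y == 1]
    neg = [i for i, y in enumerate(outcomes) if y == 0]
    m, n = len(pos), len(neg)
    v10, v01 = [], []
    for s in (score1, score2):
        # Structural components: one placement value per non-survivor (v10)
        # and one per survivor (v01).
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
    auc1, auc2 = sum(v10[0]) / m, sum(v10[1]) / m
    var = ((_cov(v10[0], v10[0]) + _cov(v10[1], v10[1]) - 2 * _cov(v10[0], v10[1])) / m
           + (_cov(v01[0], v01[0]) + _cov(v01[1], v01[1]) - 2 * _cov(v01[0], v01[1])) / n)
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc1, auc2, z, p

# Hypothetical scores from two systems on the same 12 patients (1 = died).
outcomes = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
score1 = [2, 5, 7, 3, 12, 9, 1, 15, 6, 4, 11, 8]
score2 = [3, 2, 4, 5, 6, 1, 2, 5, 3, 4, 2, 3]
print(delong_test(score1, score2, outcomes))
```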

Fig. 2

Receiver operating characteristic curve of the PRISM-3 and PIM-3 in hospitals

Fig. 3

Calibration curves for the observed mortality against predicted risk of death for PIM-3 and PRISM-3 models in hospitals


Main findings

This multicenter study aimed to evaluate the performance of models based on the PRISM-3 and PIM-3 scores in predicting both PICU and in-hospital mortality. We found that the overall performance of PRISM-3 and PIM-3 for in-hospital mortality was comparable in terms of the Brier score. The discrimination of PRISM-3, however, was significantly higher than that of PIM-3 for both PICU and in-hospital mortality. Interestingly, PRISM-3 appears to be considerably more discriminative for PICU mortality than for in-hospital mortality (AUC: 0.83 vs 0.78). A possible explanation is that a short-term outcome is easier to predict than a longer-term one. Except in the adolescent age group, PRISM-3 was far superior to PIM-3 in predicting PICU and hospital mortality.

The models were not well calibrated for either PICU mortality or in-hospital mortality. One possible explanation is that the original models were developed for western populations and are here applied to an Asian population in a developing country. With respect to discrimination, PRISM-3 performed significantly better than PIM-3, possibly because it includes more important prognostic factors. However, PRISM-3 requires the collection of 17 variables, whereas PIM-3 requires only 12, which makes the former the more demanding model [11].

Practicality, like clinical sensibility, may play an important role in clinical application. The purpose of designing a prediction model is generally to offer a reliable model that can be transported and used in clinical practice. Hence, it is critical to choose a model that is reasonably simple without sacrificing substantial predictive performance. The need for a succinct decision method may be even greater in PICUs in developing countries, where clinicians frequently deal with complicated, severely ill children and limited resources. An objective method can help them prioritize and manage complex patients and improve benchmarking indices. This could mean the more complex model where it is applicable, or the simpler model where that is more opportune.

Our findings show that PICU and in-hospital mortality were 12.4% and 16.4%, respectively. PICU mortality in our study is much higher than in European and US PICUs (12.4% vs 2.5%) [12, 13]. However, it lies in the middle of the range reported for developing countries (from 8.4% in Korea to 40% in Egypt). There are several reasons for the discrepancy between the mortality rate in our study and those in western countries.

To begin with, the two centers in our study are referral hospitals, so they frequently deal with the most critically ill patients. In addition, the disease profile in the present study differs from those of studies in western countries. For instance, the majority of patients suffered from congenital malformations or digestive, respiratory, or oncological diseases, which made treatment more challenging. Furthermore, the higher mortality in our study compared with developed countries may reflect differences in the quality and standards of care, the equipment used, and the less developed level of medical care. These differences call for significant efforts at improvement.

Compared with previous investigations conducted in Iran, the Middle East, and Asia, this is one of the largest studies to examine the prognostic performance of PRISM-3 and PIM-3 in predicting pediatric outcomes (both PICU and in-hospital mortality). All previous studies performed in Iran were single-center, with a median sample size of only 221 (min-max: 90–365). Their samples tended to include more male patients, and most included patients were younger than 40 months. These investigations found that the discriminative power of PRISM-3 ranged from fair (AUC: 0.70–0.80) to adequate (AUC: 0.80–0.90).

In general, as summarized in Table 5, most related studies have been performed on small samples. The median AUCs for PIM-3 and PRISM-3 in similar studies were 0.82 [min-max: 0.72–0.89] and 0.82 [min-max: 0.56–0.93], respectively [2, 12, 16, 17, 21, 29, 31, 33,34,35]. In most studies, the AUC of PRISM-3 was higher than that of PIM-3. One study found that the AUC of PRISM-3 was significantly higher than that of PIM-3 (P = 0.04) [21], which is in line with our findings. Moreover, two studies reported poor discrimination for the PRISM-3 scoring system (AUC = 0.667 [12] and 0.56 [33]). This might be due to specific characteristics of their study samples (e.g., children receiving extracorporeal support for respiratory failure).

Table 5 Published evaluation studies of different versions of the two models in PICU

In some studies, the Hosmer-Lemeshow test was used to evaluate the (lack of) agreement between observed and predicted outcomes of the PIM-3 scoring system, yielding significant p-values (P = 0.003 and P < 0.001) [12, 41]. PIM-3 performance was also evaluated in 49 PICUs in Argentina with 6602 patients aged between 1 month and 16 years. The observed mortality rate was 8% (531/6602), whereas the mortality predicted by PIM-3 was 6.16% (407 deaths). The Hosmer-Lemeshow test showed disagreement between the predicted and observed mortality rates (χ2 = 135.63; P < 0.001) [36], supporting our results and those of Sankar and Wolfler [14, 41].

In our study, the PRISM-3 model was well calibrated, which is in line with the findings of similar studies [12, 18, 19, 23, 27, 29, 32, 34, 35, 42]. However, other studies reported poor calibration of these scoring systems. Aside from differences in the populations and selected subpopulations, this may also be due to characteristics of the Hosmer-Lemeshow test. The test is sensitive to sample size: with larger samples, it tends to reject the null hypothesis of agreement between the predicted and observed probabilities of the event. It is also sensitive to the cut-off points used for grouping.
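The sample-size sensitivity of the Hosmer-Lemeshow test can be illustrated with a small sketch. It uses the common decile-of-risk grouping with g - 2 degrees of freedom. The data are hypothetical and the implementation is plain Python rather than the authors' R code. The chi-square tail probability uses the closed form that holds for even degrees of freedom only. Replicating the same data tenfold leaves the miscalibration unchanged, yet the statistic grows tenfold.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only:
    P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    assert df % 2 == 0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Sort by predicted risk, split into `groups` roughly equal groups, and sum
    (observed - expected)^2 / (n_g * mean_p * (1 - mean_p)). Returns (stat, df, p)."""
    pairs = sorted(zip(probs, outcomes))
    size = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * size // groups:(g + 1) * size // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical, moderately miscalibrated predictions (1 = died).
probs = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.26,
         0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.80]
outcomes = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

small = hosmer_lemeshow(probs, outcomes)
big = hosmer_lemeshow(probs * 10, outcomes * 10)  # same pattern, 10x the patients
print(small, big)
```

Here the p-value falls from about 0.19 for 20 patients to far below 0.05 for 200 patients with identical calibration.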

Several studies have reported that mechanical ventilation is associated with a higher risk of mortality [2, 10, 23, 34]. In the multivariable analysis of Balkin et al., ventilation support had the highest odds ratio among all covariates (OR: 2.1, 95% CI: 1.7–2.6), which is in line with our findings (P < 0.001) [11]. Other studies likewise reported higher mortality among patients admitted to the ICU with a greater number of organ failures [32, 35]. In addition, a prospective study in a pediatric oncology intensive care unit demonstrated a significant relationship between mortality and three factors: diagnosis, number of organ failures, and ventilation support (P = 0.03, P < 0.001, and P < 0.001, respectively) [23]. High urea and creatinine levels often reflect low cardiac output or shock, which suggests that renal function is an important prognostic indicator of mortality [2, 32].

Strengths and limitations

We conducted the analysis in a large, heterogeneous, multicenter cohort. In addition, we used a comprehensive battery of performance measures and conducted rigorous internal validation using bootstrapping [43,44,45]. The present study also has some limitations.
- First, the original scoring systems were based on the worst value of each variable in the first 24 h, whereas in the current investigation, measurements were obtained during the first hour of admission. However, fitting logistic regression models to the scores ameliorates this limitation.
- Second, because of the retrospective design, not all key variables required to calculate the scores were available for some patients. We used imputation to handle these missing values.
- Third, although we considered all types of disease, many patients with heart disease are directed to heart hospitals and are therefore not in our cohort. As a result, the cohort contains only a limited proportion of cardiac patients and is not representative of that subgroup of critically ill patients.

Future studies are needed to develop these models in other populations and to validate them externally.


The prediction model based on PRISM-3 had superior predictive performance to that based on PIM-3 in discrimination, calibration, and accuracy of predicted probabilities. Further large validation studies are needed to consolidate these findings.