In the study by Haniffa et al., the authors have brilliantly proposed “TropICS” as the first multinational prognostic model for critically ill patients in resource-limited settings. The study used data from four South East Asian nations, involving a large number of unselected ICU patients, for the development and validation of three prognostic models. Of the three proposed models, TropICS performed better than APACHE II and SAPS II in terms of discrimination and calibration. It is interesting that the three models tested varied in their inclusion of clinical, laboratory and treatment variables, which were selected using multivariate logistic regression. In this way, the models attempt to address the variable availability and affordability of the parameters measured [1].

However, it would be prudent to note that the complete case availability for APACHE II was only 15% [1]. Moreover, as diagnostic, therapeutic, and prognostic techniques in the ICU evolve over time, the scoring systems need to be updated [2]. Newer APACHE versions, such as APACHE IV, which is based on a larger database and in which the selection of variables and their weights is based on multiple logistic regression, perform better than older versions [3]. These factors together might have caused the lower performance of APACHE II compared to the other proposed models.

Disease-specific scoring systems are increasingly being used. The SOFA score has been proposed to quantify organ dysfunction, as per Sepsis-3 [4]. Some of the parameters of the SOFA score, such as PaO2, serum creatinine, and bilirubin level, have poor availability (less than 50%) in the databases of the new models, raising questions about the feasibility and validity of using the SOFA score to quantify organ dysfunction in resource-poor settings [1]. There may be a need to develop a simpler and more feasible scoring system to recognize sepsis in resource-limited settings [5]. The laboratory variables of TropICS, blood urea and hemoglobin level, were available for only 50% of the patients [1]. Because the availability of resources varies widely between low- and middle-income countries (LMICs), and even within the same nation, TropICS needs to be validated across these settings before global applicability in places with limited resources can be assumed.

Rather than depending on a single score at admission to the ICU, the change in score over time (like the delta SOFA score) may reflect the progression of organ dysfunction and can be helpful for better prognostication [2]. Future studies may attempt to explore the utility of the delta score for TropICS as well.
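As a rough illustration of the delta score idea, the sketch below takes a series of scores recorded for one patient and returns the change from the first to the last available value. The function name, the sample values, and the choice of "last minus first" are assumptions made here for illustration; published delta SOFA definitions differ (for example, a fixed time point or the maximum score is sometimes used), and nothing below is taken from the TropICS study.

```python
from typing import Optional, Sequence


def delta_score(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Return the last available score minus the first available score.

    Illustrative definition only: missing days (None) are skipped, and
    None is returned when fewer than two scores were recorded.
    """
    observed = [s for s in scores if s is not None]
    if len(observed) < 2:
        return None
    return observed[-1] - observed[0]


# Hypothetical daily scores for one patient; None marks a day with no score.
daily_scores = [8, 9, None, 6]
print(delta_score(daily_scores))  # negative value: score fell since admission
```

Under this definition a negative delta means the score has fallen since admission, and a positive delta means it has risen.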

Authors’ response

We would like to thank Dr. Shrestha for his insightful comments. Our systematic review attributes the variable performance of prognostic models in LMICs to three main factors: missing data and subsequent normal imputation; differing endpoints due to unknown hospital outcomes; and the inadvertent inclusion of paediatric patients [6]. Other reasons more commonly given for the poor performance of prognostic models in such settings include a differing case-mix compared to high-income countries (HICs) and a predominantly healthy and younger population. These are, however, often difficult to evaluate given the limited investment in infrastructure (for example, medical registries) that enables the systematic collection of data. We agree that the poor performance of APACHE II in our dataset may be due to missing data [7]. However, for the same reason, newer and more complicated versions such as APACHE III or IV are unlikely to fare any better given their greater data collection burden. It is also important to note that such missingness is not limited to LMICs or indeed to the ICU [8]. The development of simplified models such as TropICS, R-MPM and qSOFA can be seen as attempts to overcome this impediment.

As variability between and within countries is likely to persist, and as all models require regular validation and refinement (for example by recalibration), prognostic model selection and assessment are perhaps better guided, at least initially, by data availability, that is, by matching models and their variants to the degree of missingness [7]. Subsequent validation using the classic statistical tests of discrimination, calibration and accuracy can then follow. Our group is conducting further research in multiple LMIC settings to investigate the impact of the extent of missingness on prognostic models, and strategies to mitigate this seemingly ubiquitous problem beyond traditional statistical methods.
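One way to picture matching models to the degree of missingness is to compute, for each candidate model, the fraction of patients with every required variable recorded, and to shortlist only the models above a chosen threshold. The sketch below is a minimal illustration under stated assumptions: the model names, variable lists, records, and the 50% threshold are all hypothetical and do not reproduce the variables of TropICS, APACHE II, or any other published score.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def complete_case_fraction(records: List[Record], variables: List[str]) -> float:
    """Fraction of records in which every listed variable is present."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(v) is not None for v in variables) for r in records
    )
    return complete / len(records)


def feasible_models(
    records: List[Record],
    model_variables: Dict[str, List[str]],
    min_fraction: float,
) -> Dict[str, float]:
    """Keep models whose complete-case fraction meets the threshold."""
    fractions = {
        name: complete_case_fraction(records, variables)
        for name, variables in model_variables.items()
    }
    return {name: f for name, f in fractions.items() if f >= min_fraction}


# Hypothetical patient records; None marks a value that was not measured.
records = [
    {"age": 54, "heart_rate": 110, "urea": 7.1, "pao2": None},
    {"age": 61, "heart_rate": 95, "urea": None, "pao2": None},
    {"age": 37, "heart_rate": 120, "urea": 5.4, "pao2": 80.0},
    {"age": 70, "heart_rate": 88, "urea": 9.0, "pao2": None},
]

# Hypothetical model variants with increasing data requirements.
models = {
    "clinical_only": ["age", "heart_rate"],
    "clinical_plus_lab": ["age", "heart_rate", "urea"],
    "full": ["age", "heart_rate", "urea", "pao2"],
}

# Threshold of 0.5 is an arbitrary choice for this illustration.
print(feasible_models(records, models, min_fraction=0.5))
```

In this toy dataset the variant that requires PaO2 drops out because only one of the four records is complete for it. Discrimination and calibration would then be assessed only for the shortlisted variants.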

In addition to the quest to refine statistical methodology to improve model performance, we recommend that the validation of such tools should be seen in the context of ‘real world performance and impact on patient outcomes’ [9]. To have greater applicability to improving the quality and equity of care provided, the priorities of clinicians, researchers and administrators in the target setting need to be better understood. Could the biggest barrier to the validity and applicability of such tools be the disconnect of the necessary stakeholders from processes established as indispensable in HICs: benchmarking, audit and quality improvement, and risk stratification for clinical trials?