When a patient is admitted to the intensive care unit, the initial focus of clinicians and families is on survival. However, many patients survive their critical illness, and the quality of that survival quickly becomes an important consideration. In an ideal world, we would be able to predict which patients are likely to have a good recovery after a period in the intensive care unit (ICU) and which will suffer ongoing functional impairment. To this end, previous research has developed prediction models for disability and quality of life after critical illness. These prediction models may assist with shared decisions about admission to the ICU, levels of organ support, duration of care and discharge planning [1,2,3].

It is with these important considerations in mind that we consider a paper published in this issue of Intensive Care Medicine. Ohbe and colleagues describe novel risk prediction models for new functional impairment at hospital discharge in survivors of critical illness, using routinely collected predictors from the first 48 h of admission [4]. The primary outcome of the study was functional impairment, measured with the Barthel Index, a valid and reliable outcome measure that evaluates functional independence in ten activities of daily living [5]. In Japan, it is used to measure independence in all patients in acute care hospitals at the time of hospital discharge. The strengths of the study include the multicentre design (350 centres) and the large number of patients included in the models (19,846 eligible ICU patients). However, the risk prediction models have some important limitations regarding patient selection that need to be considered before use in clinical practice. The study excluded patients with pre-existing functional impairment, patients admitted after elective surgery and patients who died within 2 days of admission, which limits the generalisability of the models.

While it is recognised that prediction models may contribute to clinical decision-making in the ICU, as well as to improving care after ICU discharge, the vast majority of published ICU prediction models never find their way into clinical practice [6]. Many prediction models do not get beyond the development stage and are therefore never shown to be generalisable to other settings. A crucial question remains unanswered: how well does a model fit your clinical practice when it has not been validated? In this regard, the study of Ohbe and colleagues is to be commended for both developing and validating models for early prediction of new-onset functional impairment.

To further support the general applicability of prediction models, it is important to choose the best form of external validation, including data from other centres and/or from a different period. The authors performed a temporal validation, splitting their data into two periods (2014–2019 and 2019–2020), thereby including similar patient groups from different time periods rather than patients from different hospitals. Not surprisingly, most patient characteristics did not differ between the development and temporal validation sets, resulting in little or no change in discriminative power and calibration. Their temporal validation was therefore closer to an internal validation than an external validation. Another way of performing internal validation would have been bootstrapping, assessing calibration performance (e.g. the calibration slope) to detect over-fitting; a minimal sketch of such a check is given below. Splitting the dataset into two groups from different centres (with likely significant differences in several patient variables) would have strengthened the external validation, as patients differ across hospitals, e.g. university or non-university, rural or urban. In that case, high discriminative power and good calibration would have indicated the robustness of the prediction model and would have further increased its generalisability. Further research will be required to validate the reported prediction models in an additional external dataset [7].
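By way of illustration, the sketch below shows how a bootstrap optimism correction of discrimination (area under the curve, AUC) and of the calibration slope might be computed. The data, model and sample sizes are simulated and purely illustrative; they are not those of Ohbe and colleagues.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Simulated stand-in for routinely collected ICU predictors and a
# binary outcome (new functional impairment); illustrative only.
n, p = 2000, 10
X = rng.normal(size=(n, p))
true_beta = rng.normal(scale=0.5, size=p)
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ true_beta - 1.0))))

def calibration_slope(y_true, linear_predictor):
    """Slope from a logistic recalibration of the linear predictor;
    values below 1 suggest over-fitting (predictions too extreme)."""
    recal = LogisticRegression(max_iter=1000).fit(
        linear_predictor.reshape(-1, 1), y_true)
    return recal.coef_[0, 0]

model = LogisticRegression(max_iter=1000).fit(X, y)
lp_apparent = model.decision_function(X)
apparent_auc = roc_auc_score(y, lp_apparent)
apparent_slope = calibration_slope(y, lp_apparent)

# Harrell-style bootstrap: refit the model on each resample, then
# compare its performance on the resample with its performance on
# the original data; the mean difference estimates the optimism.
optimism_auc, optimism_slope = [], []
for _ in range(200):
    idx = rng.integers(0, n, n)
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    lp_boot, lp_orig = m.decision_function(X[idx]), m.decision_function(X)
    optimism_auc.append(roc_auc_score(y[idx], lp_boot)
                        - roc_auc_score(y, lp_orig))
    optimism_slope.append(calibration_slope(y[idx], lp_boot)
                          - calibration_slope(y, lp_orig))

print(f"optimism-corrected AUC:   {apparent_auc - np.mean(optimism_auc):.3f}")
print(f"optimism-corrected slope: {apparent_slope - np.mean(optimism_slope):.3f}")
```

An optimism-corrected calibration slope well below 1 would flag the need for shrinkage of the coefficients before the model is applied in a new setting.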

Beyond the issue of generalisability, other barriers mentioned by Ohbe and colleagues may impede the applicability of their prediction models to clinical practice. For instance, while prediction models derived by classical regression techniques are easily reported as simple, ready-to-use equations, this format is not suitable for models with numerous predictors or for machine learning algorithms. Ohbe and colleagues reported the equations of a logistic regression model and an elastic net model (for which the unreported intercept will need to be re-estimated before implementation), but the real-life use of more complex models requires dedicated software systems, which must be compatible with those of the hospitals where implementation of the models is intended. In practice, the predictor data of a given patient need to be collected in real time and entered into the models to return predicted outcome values for that particular patient. In realistic situations, when one or more predictor values are missing for a specific patient, data from other comparable patients may be used to impute plausible substitutes for the missing values (a minimal sketch of these two steps is given at the end of this comment). Such real-time imputation of missing predictors may be essential to the implementation of clinical prediction models in daily practice, because exhaustive collection of all predictors is unlikely for every patient [8].

These requirements raise ethical issues, such as questions of data recording, extraction, storage and ownership. Data need to be collected to inform prediction, yet who may own, store and keep them confidential? From the user's perspective, these layers of complexity, both ethical and practical, may limit the usability of the models. Clinicians with substantial workloads may be reluctant to invest time in implementing and using prediction tools that demand considerable changes to their practice and habits. These issues are not specific to the study by Ohbe and colleagues: the inherent complexity of advanced data-driven technologies poses ongoing challenges for the current digital healthcare revolution, and initiatives have been undertaken to facilitate the transparency of complex prediction models [9]. Further steps are needed before the prediction models developed by Ohbe and colleagues can be implemented in clinical practice. Despite these challenges, we believe Ohbe and colleagues have reported a sound and transparent study, and it is with these considerations that we commend their important contribution to developing novel models for predicting functional independence at hospital discharge in survivors of critical illness.
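As a closing illustration of these practical barriers, the sketch below applies a published-style logistic equation with a locally re-estimated intercept and falls back on simple imputation when a predictor is missing. The predictor names, coefficients and imputation values are entirely hypothetical and are not those of Ohbe and colleagues.

```python
import math

# Hypothetical published coefficients of a logistic model for new
# functional impairment; illustrative only, not from the study.
COEFS = {"age": 0.03, "mech_vent": 0.9, "severity_score": 0.05}
INTERCEPT = -4.2  # would need re-estimation on local data if unreported

# Fallback values for missing predictors, e.g. medians from comparable
# local patients (a crude stand-in for proper real-time imputation).
IMPUTE = {"age": 65, "mech_vent": 0, "severity_score": 18}

def predict_risk(patient: dict) -> float:
    """Return the predicted probability, imputing missing predictors."""
    lp = INTERCEPT
    for name, beta in COEFS.items():
        value = patient.get(name)
        if value is None:          # predictor not collected in time
            value = IMPUTE[name]   # substitute a plausible value
        lp += beta * value
    return 1 / (1 + math.exp(-lp))

# Example: severity score missing at the time of prediction
print(predict_risk({"age": 72, "mech_vent": 1, "severity_score": None}))
```

In a real deployment, imputed values would come from a model fitted to comparable local patients rather than fixed fallbacks, but the sketch shows why missing-data handling and intercept recalibration must be designed in from the start.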