Sir, we are grateful to Drs. Li, Xia and Li for their interest in our article [1] and their knowledgeable insights [2]. We address each of the points they raise in turn. The aim of our retrospective study was to evaluate whether radiomic features of anal squamous cell carcinoma (ASCC) extracted from baseline FDG PET-CT could be predictive of tumour progression. There is currently no single, widely accepted definition of key outcome measures in anal cancer, such as time to progression (TTP) and progression-free survival (PFS) [3]. Under the definitions provided by Drs. Li, Xia and Li, they are correct that, strictly, we were interested in TTP rather than PFS, as we regarded patients who died of a cause other than ASCC, with no evidence of tumour progression at the time of death, as ‘censored’ rather than as an ‘event’. We believe that our paper makes the outcome definition clear to the reader, and that there should therefore be no ambiguity regarding the model predictions, irrespective of the terminology used.
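To make the distinction concrete, the following is a minimal sketch of how the event indicator differs between the two outcomes. It assumes the common convention that PFS counts progression or death from any cause as an event, whereas TTP counts progression only; the `Patient` class, its fields and the example patient are hypothetical and are not taken from our study data.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record used only for illustration.
    progressed: bool  # tumour progression observed during follow-up
    died: bool        # death from any cause during follow-up


def ttp_event(p: Patient) -> int:
    """Time to progression: only progression is an event (1); otherwise censored (0)."""
    return 1 if p.progressed else 0


def pfs_event(p: Patient) -> int:
    """Progression-free survival (assumed convention): progression or any death is an event."""
    return 1 if (p.progressed or p.died) else 0


# A patient who died of an unrelated cause without evidence of progression.
patient = Patient(progressed=False, died=True)
print(ttp_event(patient), pfs_event(patient))
```

For such a patient the two indicators disagree: the patient is censored under TTP but counted as an event under PFS, which is the case discussed above.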

Splitting data into independent cohorts for training and testing purposes is widely performed to validate machine-learning models. As noted by Drs. Li, Xia and Li, our randomised splitting procedure resulted in training and validation cohorts which were homogeneous with respect to a number of standard clinical features. Importantly, we did not split the cohorts according to radiomic feature values, and consequently, results for models B and C are not impacted by this approach. If anything, it may have made the finding regarding the independent value of radiomic features (relative to clinical features) more robust. We fully recognise that our approach does not ensure generalisability of our model, and we were careful not to claim this in the paper. To demonstrate generalisability, we would need the potential heterogeneity offered by an external site. As we stated in the discussion, we accept this as a study limitation and encourage future external validation and multi-centre collaborative studies, which could greatly improve predictive models in ASCC.

The area under the receiver operating characteristic curve (AUC) was more consistent between the training and validation cohorts when clinical features were included. However, the stability was similar in model A (clinical features alone) and model C (combined radiomic and clinical features). Model B (radiomic features alone) was less stable than the other models, and so the radiomic features are evidently less stable than the established clinical features. Nonetheless, we have shown the potential of radiomic features to help improve prediction of progression in ASCC. Future studies performed across multiple centres with larger combined populations may help to improve on this and allow the identification of stable radiomic features that can be incorporated into routine clinical use.
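As an illustration of what is meant by stability here, the sketch below computes the AUC in two cohorts and reports the difference between them. The labels and scores are invented toy values, not results from our study, and the helper names `auc` and `auc_gap` are hypothetical; the AUC is computed as the proportion of event and non-event pairs that the score ranks correctly, with ties counted as one half.

```python
def auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_gap(train, valid):
    """Difference in AUC between a training and a validation cohort."""
    return auc(*train) - auc(*valid)


# Toy cohorts (hypothetical): (event labels, model scores).
train = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
valid = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])

print(auc(*train), auc(*valid), auc_gap(train, valid))
```

A smaller gap between cohorts indicates a model whose discrimination is more stable; in practice, the gap would be reported with a confidence interval, which this sketch omits.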