We thank Prof. Molinski and collaborators for their interest in our research and their comments.

We agree with some of their remarks, especially regarding the risk of overfitting for a data set of 54 cases, and therefore we clearly acknowledge that limitation of our article [1]. Although we have attempted to reduce the risk of overfitting in the methodology adopted, it indeed remains a possibility, particularly when considering the strong performance of our pilot data. Since the publication of our article, we have started recruiting patients for a prospective validation of our algorithm.

Regarding the use of a larger network such as ResNet50 or DenseNet201, we believe that larger models increase the likelihood of overfitting (more parameters in an underdetermined system). We used the pretrained VGG19 (a smaller, “outdated” model) only to extract features from the images, and trained only a support vector machine classifier. Although the ImageNet images have different characteristics from medical images, as the commentary states, additional training of the VGG19 directly with our data set would only have increased the risk of overfitting, and thus we chose transfer learning. Furthermore, we could have used a larger model as a feature extractor, but we would have needed to extract a larger number of features to represent both shallow and deep convolutional neural network features. This larger number of features would also increase the risk of overfitting. Note that other medical imaging data sets tangential to our classification task could potentially be used for additional training, but this is outside the scope of our preliminary investigation.

The commentary seems to misunderstand that we used the VGG19 only to extract image feature information, not as a full deep learning classification model. The only model training performed was the development of the support vector machine (SVM) classifier, which used the features extracted from the ImageNet-pretrained Visual Geometry Group 19 (VGG19) architecture. The SVM is, in general, a more robust classification approach than a deep learning classifier when the data set is limited; this choice was also made to reduce the risk of overfitting. We acknowledge that we should probably have been clearer in the description of the model training in the original article. Bayesian optimization with a maximum of 30 objective evaluations was used for SVM training. We agree that nonlinear dimension reduction techniques such as kernel principal component analysis may have better conserved the data structure and improved results, but we felt this was unnecessary given the strong performance of the current method. We will reevaluate this should the current model not maintain strong performance in our prospective validation cohort.
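For illustration, the general workflow described above (fixed features, dimension reduction, and a tuned SVM) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the article [1]: the feature matrix is random synthetic data standing in for features from a frozen pretrained network, the feature count (512), the linear PCA step with 10 components, and the search ranges are hypothetical, and a 30-draw randomized search is substituted for Bayesian optimization because scikit-learn does not provide the latter.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for features from a frozen, pretrained network.
# 54 cases matches the data set size; 512 features is an assumption.
n_cases, n_features = 54, 512
labels = np.array([0, 1] * (n_cases // 2))
features = rng.normal(size=(n_cases, n_features))
features[labels == 1, :10] += 1.0  # inject an artificial class signal

# Only this pipeline is trained; the feature extractor itself is never updated.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),  # assumed linear dimension reduction
    SVC(kernel="rbf"),
)

# Randomized search with 30 candidate settings, used here as a simple
# substitute for Bayesian optimization with 30 objective evaluations.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "svc__C": loguniform(1e-3, 1e3),
        "svc__gamma": loguniform(1e-4, 1e0),
    },
    n_iter=30,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(features, labels)
print(search.best_params_, round(search.best_score_, 3))
```

Because scaling and dimension reduction sit inside the pipeline, they are refit on each training fold, so the cross-validated score is not contaminated by the held-out cases.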

Lastly, we also agree that explainable artificial intelligence tools such as heatmaps may be useful to identify the specific image features that the model focuses on and to understand its classification performance. As noted in our article [1], we suspect the early signs of hypoxic-ischemic brain injury may be due to subtle changes in gray–white matter differentiation or brain edema that evade the human eye. Therefore, the output from explainable artificial intelligence tools may not necessarily be easy for a human reader to interpret. Our future plans, once we have collected a larger prospective data set, do include investigating visualization methods, such as Gradient-Weighted Class Activation Mapping, to observe when the deep learning model indicates subtle brain changes.
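For readers unfamiliar with Gradient-Weighted Class Activation Mapping, its core combination step can be sketched as follows. This is a minimal sketch, not part of the published method: the activation and gradient arrays are random placeholders for values that would come from the last convolutional layer of a network, the array shape (512, 14, 14) is an assumption, and the function name `grad_cam` is hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps into a heatmap.

    Both inputs have shape (channels, height, width): the feature maps of
    one convolutional layer and the gradient of the class score with
    respect to those feature maps.
    """
    # One importance weight per channel: the spatial mean of its gradient.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of the feature maps over channels gives a (height, width) map.
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
acts = rng.random((512, 14, 14))         # placeholder activations
grads = rng.normal(size=(512, 14, 14))   # placeholder gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape)
```

In practice the resulting map would be upsampled to the image size and overlaid on the scan; whether the highlighted regions are interpretable to a human reader is the open question raised above.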