Clinical application of machine learning and computer vision to indocyanine green quantification for dynamic intraoperative tissue characterisation: how to do it

Introduction Indocyanine green (ICG) quantification and assessment by machine learning (ML) could discriminate tissue types through perfusion characterisation, including delineation of malignancy. Here, we detail the important challenges overcome before effective clinical validation of such capability in a prospective patient series of quantitative fluorescence angiograms regarding primary and secondary colorectal neoplasia. Methods ICG perfusion videos from 50 patients (37 with benign (13) and malignant (24) rectal tumours and 13 with colorectal liver metastases) of between 2- and 15-min duration following intravenously administered ICG were formally studied (clinicaltrials.gov: NCT04220242). Video quality with respect to interpretative ML reliability was studied observing practical, technical and technological aspects of fluorescence signal acquisition. Investigated parameters included ICG dosing and administration, distance–intensity fluorescent signal variation, tissue and camera movement (including real-time camera tracking) as well as sampling issues with user-selected digital tissue biopsy. Attenuating strategies for the identified problems were developed, applied and evaluated. ML methods to classify extracted data, including datasets with interrupted time-series lengths with inference simulated data were also evaluated. Results Definable, remediable challenges arose across both rectal and liver cohorts. Varying ICG dose by tissue type was identified as an important feature of real-time fluorescence quantification. Multi-region sampling within a lesion mitigated representation issues whilst distance–intensity relationships, as well as movement-instability issues, were demonstrated and ameliorated with post-processing techniques including normalisation and smoothing of extracted time–fluorescence curves. ML methods (automated feature extraction and classification) enabled ML algorithms glean excellent pathological categorisation results (AUC-ROC > 0.9, 37 rectal lesions) with imputation proving a robust method of compensation for interrupted time-series data with duration discrepancies. Conclusion Purposeful clinical and data-processing protocols enable powerful pathological characterisation with existing clinical systems. Video analysis as shown can inform iterative and definitive clinical validation studies on how to close the translation gap between research applications and real-world, real-time clinical utility. Supplementary Information The online version contains supplementary material available at 10.1007/s00464-023-09963-2.

Indocyanine green (ICG) fluorescence angiography can be used to assist in the delineation of surgical anatomy and assessment of tissue physiology as well as to identify malignant pathology with varying levels of accuracy [1][2][3][4]. Moving beyond surgeon visual interpretation, computer vision and machine learning (ML) methods can extract real-time dynamic time-fluorescence profiles of ICG inflow and outflow in tissues enabling mathematical examination and extrapolation. Such profiles can, thereby, be compared and categorised into clinically relevant determinants providing artificial intelligence (AI)-based insights into tissue nature. These and other approaches for ICG signal interpretation and quantification have been described in research applications for the assessment of colonic perfusion [5][6][7][8][9] and even the characterisation of rectal cancer vs benign neoplastic tumours [10][11][12][13].
However, concerns have been raised regarding the translation of such computerised quantitative ability into current NIR systems as such systems have been primarily designed as clinical tools for image creation and display for human interpretation and not precise computational measurement methods [11,14,15]. Previous work has demonstrated discrepancies in detected fluorescence intensities with ICG circulation (related to both dosing amount and methods as well as patient and anaesthesia-related variables) and further vary by on-screen tissue location (e.g. near vs far, centre vs periphery) [16][17][18]. Other potential pitfalls for interpretation by both humans and machines include the clinically observed phenomenon of ICG tissue diffusion vs true perfusion and the fact that rectal tumours can contain areas of admixed malignant and benign disease. [17,19] Camera and tissue movement as well as any other impingement upon the field of view (e.g. by instrumentations) can also undermine accurate extrapolation of insights from dynamic imagery [17].
Given the growing interest and potential clinical utility of ML in the field of ICG fluorescence signal quantification, we sought to identify the relative importance of such considerations in a clinical series of fluorescence angiograms from patients with neoplasia and determine potential mitigating strategies on how useful these might be. Pertinent findings can then inform further endeavours to clinically characterise tissues by their ICG perfusion profiles with and without ML.

Patients and methods
Multispectral (white light and NIR) videos from 50 patients (including 37 with rectal tumours, 13 of which were benign and 24 malignant, and 13 colorectal liver metastases (CRLM)) capturing ICG inflow and early outflow over between 2 and 15 min of direct observation immediately following intravenously administered ICG were studied. All patients were undergoing diagnostic and therapeutic surgical interventions and were specifically consented for inclusion in prospective study (the 'Future of Colorectal Cancer Surgery' project, ClinicalTrials.gov NCT04220242, ethical approval ref: 1/378/2092). Videos were created and recorded using a commercial near-infrared (NIR) imager (Pinpoint Endoscopic Imaging System, Novadaq/Stryker Corp, Kalamazoo, MI, USA) with each patient receiving systemic ICG whilst the neoplastic area of interest was being observed continually as part of a defined protocol. Patients with rectal pathology (whether benign or malignant) had all been referred for surgical consideration by a medical gastroenterologist after initial colonoscopy and were examined endoscopically in lithotomy position under general anaesthesia as described previously [10]. All those with liver metastases were undergoing therapeutic resection either via a primary laparoscopic or open approach with the NIR camera again being deployed intraoperatively.

Video analysis
Video recordings from these patients, as well as time-fluorescence curves previously generated using open-source fluorescence tracking software through tracking of expert annotated regions of interest (demarcating both healthy and unhealthy areas within the same screen view) within these videos [11,20], were scrutinised to identify examples of potential discrepancies and pitfalls in their interpretation based on previous observations from the published literature in the field of fluorescence-guided surgery. These included examination for any impact of variation in ICG dosing on the detected fluorescence intensity values from all generated time-fluorescence curves. All studied rectal lesion interrogations had been performed using 0.25 mg/kg fixed dosing. The liver lesions included for study had been imaged with both 0.1 and 0.05 mg/kg of ICG with the lower dose being trialled due to signal saturation (pixel greyscale unit > 255) at the higher dose. Signal attenuation resulting from varying distance between camera and target as well as the impact of movement (both in the tissue and by the camera) were also assessed. Sampling bias, both in the post hoc user-selected region of interest (ROI) and in the observation period (video length), was also studied.
Once identified, methods to ameliorate such issues were developed and applied via clinical protocol standardisation (e.g. ICG dosing alteration and administration standardisation) and accepted ML mathematical methods (e.g. normalisation of time-intensity curves obtained by dividing every ROI intensity value by the curve's peak intensity reading) and their mitigation impact studied. Computer generated curves, as well as clinically obtained curves, were created to demonstrate both the underlying principle and the relative impact of any methodological attempt at correction as follows where not otherwise explained above.

Video acquisition challenges
Distance-intensity relationships and the degree of camera movement occurring were demonstrated clinically in real time in a patient with rectal cancer (post-neoadjuvant treatment) undergoing tumour assessment under general anaesthesia. This patient was placed in the lithotomy position with a tabletop electromagnetic (EM) tracking system (Aurora NDI, Canada) positioned under padding beneath their sacrum to include the anatomical ROI within the EM field. A motion sensor (measuring movement in the x, y and z axis along with rotation around the x and y axis) was secured to the tip of the camera and the camera itself used both freehand by the surgeon and in stabilised positions using a robotic laparoscopic controller (Freehand Laparoscopic Controller, Freehand, Surrey, UK) during ICG examination. Relative camera-to-target positions were calculated by moving the camera tip to recognisable locations in the rectum (e.g. distal and proximal healthy tissue regions as well as the tumour site as origin) after introduction through a standard transanal minimally invasive surgery (TAMIS) setup. Temporospatial data from the Aurora field generator, in combination with extracted time-fluorescence curves from three geographically distinct landmarks within the resulting NIR fluorescence video, were used to demonstrate distance-intensity relationships as well as reflect camera movement. Indiscriminate, rapid movement of the camera and the impact such use has on fluorescence intensity appearances was demonstrated by comparing fluorescence intensity tracking at the tumour and camera movements recorded by the EM tracking system. Normalisation as a technique to negate distance-intensity difficulties was demonstrated using this setup by comparing the time-intensity curve upslopes (i.e. ICG inflow) of two healthy regions of tissue with EM-calculated distance discrepancies. Potential mathematical solutions to movement issues were examined including smoothing of time-fluorescence curves using a Savitzky-Golay filter (to facilitate peak detection within fluorescence curves).

Image presentation challenges
One video of a rectal cancer was chosen as a demonstrator of the potential for sample error in user-dependent ROI selection, akin to that seen in traditional biopsy, by comparing fluorescence curve outflow milestones of multiple tracked ROIs within a malignant lesion to each other, as well as to healthy control tissue within the same patient video.

ML methods challenges
Factors pertinent to the data collection and analysis of ICG time-fluorescence curves utilising ML methods were assessed including optimal methods of automated data extraction from time-fluorescence curves as well as using methods to navigate missing data points whilst maintaining discriminatory classification performance (here, accurate healthy vs unhealthy tissue categorisation). For this, at least one healthy control and one lesion tracing (either benign or malignant) were generated for each video to permit subsequent fluorescence curve milestone extraction [8,11]. ROIs with missing downslope milestones (e.g. shorter duration videos) were calculated by imputing missing data points. K-nearest neighbour was used to impute within training sets with missing computed values compared to other known values within the set. Fitting to an exponential curve was used to impute missing values in testing curves to simulate real clinical practice where only a single video will be tested at any one time and therefore cannot be compared to 'neighbours'. Following identification of a robust automated feature extraction methodology, AutoML (auto-sklearn) was used to identify an optimised classifier by searching over all possible hyper-parameters for a large range of known classifiers to choose the best candidate based on area under the receiver operating characteristic curve (AUC-ROC). Training-testing was performed using fivefold cross-validation with all available ROIs with an 80:20 training: testing split without holdout. 'Time to first peak', 'Time ratio' and 'Downslope 10 s after peak' were included in all classification combinations. Increasing durations of downslopes were incrementally added to ascertain the benefit of increased duration of tracking on performance. Classification was performed using two-way (cancer vs not cancer) as well as three-way (cancer vs benign tumour vs healthy control) splits. Finally, to investigate imputation as a method to replace missing values, all recorded results for downslope values beyond 10 s were removed (i.e. simulating a scenario where only data 10 s from the peak of inflow were available for all 37 patients) and replaced via imputation with the classification then re-run for comparison to the original results.

Results
Clinical, technical and technological challenges for appropriate application of ML (see Fig. 1) were commonly seen across the clinical series for both rectal and liver lesions and prompted evolutions in both acquisition protocols and postacquisition data processing.

Video acquisition
Fixed dose ICG signal acquisition of rectal cancer utilised a wide range of the available greyscale spectrum on postacquisition fluorescence curve analysis (pixel intensity range: 0-254.38 greyscale units across all 37 videos) without saturation. On the other hand, dosing issues related to signal saturation in healthy liver tissue, early in the series, were observed in two videos assessing CRLMs at 0.1 mg. kg ICG concentration. Subsequent reduced dose (0.05 mg/ kg) imagery avoided such plateaus occurring (Fig. 2a). In the rectal tumour cohort, one "double peak" (Fig. 2b) occurred attributable to ICG dosing in two pulses.

Distance-intensity relationships and movement
Time-intensity curves created from a rectal polyp video with a visible discrepancy in NIR camera-lesion distance (near) and healthy control (far away) are shown in Fig. 2. The raw (non-normalised) curves in this case show large differences in peak intensity readings (209.42 greyscale units vs 82.82 greyscale units) addressable by curve normalisation. The fallibility of comparing slope-based curve milestones between tissue fluorescence curves of raw/nonnormalised data is shown in Fig. 2f using two artificially generated, identically shaped curves with different absolute intensity values (simulating the distance discrepancy seen in 2c and 2d).
Camera stability over a prolonged period (11 min of ICG inflow/outflow) is demonstrated with a maximum camera movement of 2.15 mm over this time (Fig. 3b). Time-fluorescence curves of two highlighted healthy ROIs are shown in Fig. 3 with the impact of normalisation to allow comparison between the regions highlighted. Indiscriminate camera movement resulted in smooth tracing disruption previously seen with camera stabilisation (Fig. 3e) with a late tissue outflow increase in tracked fluorescence intensity being seen with camera movement toward the origin time-fluorescence curve oscillations (noise) successfully smoothed via a Savitzky-Golay filter (available from the open-source Scipy Python Package [21]) enabling peak detection (Fig. 4a).

Machine learning data classification challenges
To develop and assess the tissue classifier and its features, curve milestones extraction from 251 ROIs across Fluorescence time-series data were populated and output in comma separated values (CSV) files and curve milestones including upslope, time to peak and downslope were successfully extracted using open-source signal processing algorithms from the Scipy Python Package. All included ROIs had peak detection successfully performed with complete datasets then obtained up to 100 s of downslope. Missing downslope values were imputed for 11, 72 and 106 ROIs at downslopes of 200, 300 and 400 s, respectively. An ensemble composed of multiple K-nearest neighbours was identified as the best classifier based on AUC-ROC results (see Table 1). An excellent AUC-ROC score (0.92) was achieved on 2-way classification for all 251 ROIs when all downslope values to 400 s obtained during tracking were included (with imputed values where videos were of insufficient length). This result was

Discussion
Given the fallibility in human interpretation of dynamic ICG perfusion angiograms [22], quantification and computerassisted interpretations, including ML, have been explored for the purpose of providing more objective, and potentially more detailed, fluorescence signal interpretation with determination of confidence levels. Application of such methods to real-world data and ideally, of course, in real-time, however, will require a variety of considerations to be addressed as shown in this work. Here, we have shown how such concerns can be factored into a capable CV-ML method with consistent precision. ICG dosing remains a topic of considerable discussion with wide variations in both weight-based and fixed dosages being reported for colorectal and hepatobiliary assessment. The 0.25 mg/kg rectal lesion dosing described in this paper utilised the entire available intensity scale without saturation (avoiding a false-intensity plateau and compromised data analysis). Conversely, a smaller dose (0.1 mg/kg) in the liver lesion cohort resulted in early saturation of pixel intensity due to liver hepatocyte concentration and excretion of ICG and highlights the importance of attention to dose and tissue type when quantifying ICG administered intraoperatively. Antecedent dosing hours-days preoperatively as described by others for CRLM localisation depends on complete washout from healthy tissue and retention in/near malignant tissue over the prolonged preoperative period of time providing high tumour-to-background ratio at the time of interrogation, meaning high ICG doses can be used successfully. Whilst intravenous ICG administration method (especially central vs peripheral line) is discussed in the literature, the need for rapid, single bolus administration when using quantitative ICG fluorescence angiography is demonstrated here and needs to be clearly understood by the administrator. Deviations may result in false flow peaks that possibly derail automated peak detection software and ML algorithms.
The incorporation of absolute intensity as a feature for analysis also needs address. Whilst normalisation of data to a maximum peak brightness across all ROIs results in the loss of absolute intensity values as a potential feature in any subsequent computation, this study demonstrates the impact of using raw data to compute slope-based parameters. Furthermore, this permits the comparison of tissue fluorescence curves without requiring both tissues to be in the centre of the screen and at similar distances which is not always clinically possible. Although it may be possible to adjust for such distance-intensity relationships automatically within future NIR systems that incorporate distance data (like that we describe using EM tracking fields system for instance), normalisation of raw data currently provides a useful alternative yielding excellent and reproducible results.
Movement either at tissue (e.g. peristalsis, respiration, patient heaving, etc.) and/or at operator-held camera level can in theory be problematic. Use of a robotic camera holder as described here results in minimal camera movement although similar results can be achieved by a human operator steadying the view by careful attention to the video display especially for focussed time periods. Large amounts of movement (> 5 cm of movement over 5 s) can result in false signal detection (e.g. an apparent increase in intensity during outflow despite no further ICG administration); however; all videos utilised in the classifier described were obtained with a handheld camera approximately 3-8cms from the target (i.e. standard TAMIS working distance) with any resulting small deviations being adequately managed through curve smoothing.
Tissue biopsy sampling errors, due to tumour heterogeneity, is a significant clinical problem with high false-negative rates occurring in larger polyps [23,24]. Digital biopsy in the fashion described may also suffer from the same phenomenon; however, it more easily affords the opportunity to sample large areas of lesions as demonstrated. Care must be taken to analyse as large a tissue area as possible; however; limited user-selected ROI tracking remains prone to underrepresentation. Next step evolution of this work includes real-time image stabilisation with synchronous white light surface feature tracking and whole screen equivalent pixel intensity extraction providing full screen characterisation with heat map display. Such an approach would also permit tumour margination via such automated classification.
Time-series feature extraction and analysis, including imputation to deal with missing data as may arise due interruption of the continuous field of view observation by movement or indeed surgical instrumentation intrusion, is demonstrated to yield consistent, positive results across a relatively large cohort of patients. The methods of data management (e.g. curve milestone extraction, curve smoothing, etc.) detailed within this manuscript, whilst applied to ROI data extracted from the described open-source fluorescence intensity tracker, can be applied similarly to other software programmes (such as the Fluorescence Tracker App by MATLAB available at www. mathw orks. com) as well as to other curve milestones which may be more relevant in other applications (such as colonic intestinal perfusion assessment) [20]. We chose AUC-ROC (rather than 'straight' accuracy) to report the results of the classifier as AUC-ROC is the probability that the model ranks a random positive example more highly than a random negative example [25]. The resulting performance figure is, therefore, an indication of how much better a classifier is at correctly classifying than misclassifying which is not so clear from just accuracy rate alone. High AUC-ROC scores are obtained at 200 s with only small improvements achieved with tracking beyond this timeframe. In the case of missing values, however, imputation of downslope values up to 400 s improved the AUC-ROC result and suggests that should imputation be required for missing data, it should be done for longer periods of time if possible. It should be noted that although the results of imputation, when performed on all ROIs, dropped below 0.9, these results were obtained by simulating a scenario where tissues were tracked for only 10 s beyond their peak intensity. This was to simulate an extreme scenario (less than a minute of actual tracking per video) and was done to demonstrate feasibility of this approach in such circumstances. It is likely, however, that some data will be lost intermittently during surgery (camera movement, patient movement, etc.) and this method of imputation demonstrates a robust method for dealing with such occurrences. Whilst this technique is demonstrated here for cancer characterisation, the methodology can be applied too to other timecurve applications. Finally, the classifier's ability to maintain "excellent" results during three-way classification "benign dysplasia vs healthy vs cancer", which is inherently more difficult than two-way "cancer vs benign" discrimination, is encouraging for future further sub-classification work and suggests that with increasing datasets, discrimination by dysplasia type (low vs high grade) and T stage will likely be possible as well as in other important clinical scenarios such as rectal lesion interrogation post-neoadjuvant therapy.
Limitations in this study relate to the specific nature of the dataset although similar considerations likely also apply to tissue perfusion signal capture for other dynamic-based indications (e.g. intestinal or flap assessment). However, open assessments (whether extracorporealised viscera or plastic surgery operations) are most often done by stabilised cameras and/or systems with bigger camera heads enabling more intense illumination and signal sensing. Also, the study only involved one type of NIR system and other systems are known to differ in their signal capture and display. However, the work identifies the important considerations to be considered and all systems will need specific user guidance in similar respect. Such study of alternative commercial systems especially with regard to distance-intensity is already ongoing. The retrospective nature of the video analysis is also a potential limitation although consistently high results have been demonstrated with several diverse methods of time-fluorescence curve analysis both based on feature extraction and classification as well as profile fitting based on biophysical modelling and subsequent classification [10,11].
In conclusion, whilst ICG quantification is not readily available in most clinical NIR systems currently, timeseries quantification, and subsequent analysis, can be performed with excellent results with adherence to important intraoperative physical and technical recommendations and purposeful post-processing of data. Best presentation of the dynamic imagery for interpretation by machines is as important from this perspective as image display is for human observer interpretation. Whilst some technological advancement is needed to enable this seamlessly in real time, the computational trajectory seen over the last few decades should give confidence that evolved processing capability can encompass all such aspects.
Funding Open Access funding provided by the IReL Consortium. This work was supported by Disruptive Technologies Innovation Fund, Enterprise Ireland, Ireland. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.