Introduction

In comparison to conventional radiation therapy, proton therapy is able to target the tumor thanks to the maximum dose deposition at the end of the proton trajectory (Bragg peak) and the finite penetration of protons in matter. As the dose deposited beyond this distal edge is very low, proton therapy minimizes the damage to neighbouring tissues compared to photon therapy and is hence especially well-suited for tumors close to sensitive organs and for pediatric cases, because the lower dose received by healthy tissues reduces the long-term secondary effects1. However, inherent range uncertainties, associated with anatomical changes, patient setup errors, range errors from uncertainties in particle stopping power, and imaging reconstruction artifacts2, lead to the application of conservative safety margins. Indeed, up to 1 cm of margin is considered nowadays for a prescribed range of 30 cm3, significantly limiting the potential benefits of protons over photons.

In this context, several experimental methods to verify the proton beam range have been developed in recent years, mainly based on the monitoring of the secondary \(\gamma \)-rays, neutrons or positron emitters produced in nuclear reactions along the proton trajectory. Prompt gamma (PG) monitoring allows a proper assessment of the distal dose falloff of the proton beam during the treatment4. Unlike the produced positron emitters5 or neutrons6, the spatial distribution of the emitted prompt gamma rays shows a very close correlation with the proton dose distribution at the end of the beam7. Moreover, these gamma rays are prompt, i.e. emitted within 1 ns after the collision, which is key for the online verification of the proton range.

In the last decade, several research groups have designed and tested PG monitoring systems based on prompt gamma timing (PGT)8,9, \(\gamma \)-ray spectroscopy10,11, or prompt gamma imaging (PGI) methods. The latter rely either on passive collimation, such as knife-edge-slit cameras12,13,14, or on active collimation, where most efforts are focused on the development of Compton cameras15. Compton cameras are electronically collimated imagers, which represent a promising solution for the imaging of gamma-rays of a few MeV. In contrast to passively collimated cameras, these systems present a higher efficiency and an expanded field of view, which allow reconstructing two- (2D) or even three-dimensional (3D) images instead of one-dimensional (1D) profiles4,16. However, several challenges need to be addressed for a reliable implementation of this methodology in the clinical case17. Indeed, in-vivo range monitoring still remains an issue for most of the Compton cameras under development16,18,19,20,21,22,23,24,25,26,27,28,29. These limitations are related to the limited coincidence efficiency of some of the detectors20,21,25, the high counting rates in clinical conditions16,24,30, the spatial resolution22, the signal-to-background ratio, which is challenged by contaminant reactions18,31, and the CPU processing time required by the corresponding image-reconstruction algorithm26.

There are remarkable similarities between most of these challenging effects of PGI in proton therapy and those encountered in other nuclear physics fields, such as neutron-capture cross-section measurements employing the time-of-flight (TOF) technique32. These similarities are discussed in the following in some detail. In a TOF neutron-capture experiment a pulsed beam of neutrons is shot against a small sample, which typically has a mass of only grams or even milligrams. The reaction of interest, radiative capture, leads to the formation of a compound nucleus, which de-excites by emitting a prompt cascade of \(\gamma \)-rays. Thus, common \(\gamma \)-ray energies typically span from a few keV up to 5–6 MeV, which is similar to the range of \(\gamma \)-ray energies from proton-induced inelastic reactions in the carbon, oxygen and nitrogen atoms of human tissues. On the other hand, in neutron-capture TOF experiments the elastic scattering channel dominates over the radiative channel of interest. These stray neutrons can be captured in the detector itself or in the surrounding materials, thereby enhancing the \(\gamma \)-ray background level and further obscuring the observation of the channel of interest. For the latter reason, an effort is made to design detection systems of high intrinsic \(\gamma \)-ray efficiency that are as transparent as possible to neutrons33,34,35. This reduces the intrinsic background level and enhances the signal-to-background ratio. Similarly, PGI is also challenged by the background induced by neutrons originating from nuclear reactions of the primary proton beam4,6,36. In the case of carbon-ion beams, neutron production is even more pronounced and their discrimination against prompt \(\gamma \)-rays is an issue4. Therefore, optimization of the detection system in terms of neutron sensitivity also becomes an aspect of interest for hadron therapy, although not much attention has been paid to this aspect in recent optimization studies31,37.

In order to overcome the experimental difficulties discussed above for TOF nuclear experiments, we have developed a total-energy detector with gamma-ray imaging capability, called i-TED38,39. i-TED is an array of four individual Compton imaging modules, each of them consisting of two position-sensitive detection layers based on large monolithic LaCl\(_{3}\)(Ce) crystals. A detailed description of the first such i-TED module can be found in Ref.40. i-TED features an excellent time resolution, enabling the TOF technique39, and it has been especially designed to attain a high detection efficiency, a low sensitivity to scattered-neutron backgrounds and a high image resolving power. Its performance can be further enhanced by using innovative methods based on machine learning (ML) techniques34,39, which are very powerful thanks to the multi-dimensional structure of the data acquired with i-TED operated in scatter-absorber (S&A) coincidence, including the energy deposition, the position of interaction and the time difference of the \(\gamma \)-ray interactions in the two detection planes.

While the work presented here is entirely based on Monte-Carlo (MC) simulations, the i-TED detector has already been developed34,38,40,41,42 and employed for TOF experiments at CERN n_TOF39. Therefore, intrinsic performance parameters such as energy resolution41, 3D intrinsic position resolution34,42 and efficiency40 have been experimentally validated and are realistically included in the present study. Based on our previous extensive experience in MC simulations of detectors38,39 and of neutron production and transport43, in the present simulation work we have been able to account for both geometrical and physical effects to a great level of detail.

Although most of the work carried out so far with i-TED in terms of neutron-capture experiments has been based on an adaptation of the back-projection (BP) method44, for the present study we have implemented two additional imaging algorithms, namely the stochastic origin ensemble (SOE) method45 and the analytical algorithm (AA) of Tomitani et al.46. This has allowed us to better illustrate the benefits of the aforementioned aspects.

Finally, two other aspects are worth mentioning. On one hand, we have implemented a novel event-classification algorithm that allows one to significantly improve the quality of the results by filtering out high-energy \(\gamma \)-ray events with incomplete energy deposition. To the best of our knowledge, this is the first time that such a machine learning (ML) technique is successfully applied to this aim, thereby enhancing the signal-to-background ratio by up to a factor of two. On the other hand, we have been able to boost the time performance of the image reconstruction algorithms by means of an advanced graphical-processing unit (GPU) implementation, which led to reconstruction times of the order of ten seconds for the most complex of the implemented algorithms. As discussed below, the latter two aspects, in conjunction with the high intrinsic efficiency of the i-TED design, turn out to be of great interest when aiming at real-time ion-range monitoring.

Results

To demonstrate the performance of a detector like i-TED for PGI, a series of MC calculations were carried out. In the simulation, a pencil-beam of 120 MeV protons with a spatial spread of \(\sigma =3\) mm and a total intensity of \(2\times 10^{10}\) protons impinges on water and PMMA phantoms with a size of \(10\times 10 \times 20\) cm\(^{3}\). The proton energy, similar to that of previous works4,37,47,48, was chosen so that the proton range in water and PMMA (10.7 cm and 9.3 cm, respectively) matches the center of the phantom, to which the detectors point.
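As a rough consistency check of the quoted ranges, one can use the empirical Bragg-Kleeman rule \(R \approx \alpha E^{p}\); the sketch below uses commonly quoted water parameters (\(\alpha \approx 0.0022\) cm MeV\(^{-p}\), \(p \approx 1.77\)), which are approximations and are not extracted from the present simulations.

```python
# Approximate proton range in water from the empirical Bragg-Kleeman rule,
# R = alpha * E^p, with commonly quoted water parameters (illustrative only,
# not the stopping-power tables used by Geant4 in this work).
ALPHA = 0.0022  # cm / MeV^p
P = 1.77        # dimensionless exponent for protons in water

def bragg_kleeman_range_cm(energy_mev: float) -> float:
    """Approximate CSDA range (cm) of a proton of the given energy in water."""
    return ALPHA * energy_mev ** P

print(f"R(120 MeV) ~ {bragg_kleeman_range_cm(120.0):.1f} cm")  # ~10.5 cm, close to the 10.7 cm quoted above
```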

Figure 1

Schematic diagram of the simulated geometry, as implemented in the Geant4 code. See text for details.

Figure 1 shows a schematic view of the simulated geometry. The phantom is surrounded by four i-TED detectors at 50 mm distance from each lateral surface. Each module consists of two layers of LaCl\(_{3}\) crystals. The separation between both layers (F\(_{d}\) in Fig. 1) can be adjusted for a trade-off between efficiency and resolution40. For the present work we use a constant distance of 15 mm. The scatter (S) detector is made from a monolithic block of LaCl\(_{3}\) with a size of \(50\times 50 \times 15\) mm\(^3\). The absorber detector (A) consists of an array of four LaCl\(_{3}\) crystals, each one with a size of \(50 \times 50 \times 25\) mm\(^3\). Detector housing, photosensors and other necessary elements have been modeled according to the existing i-TED detector and more details can be found below and in Ref.40.

Apart from the geometry itself, an effort was made to implement the most suitable physics libraries for a realistic description of the nuclear reactions both for charged particles and neutrons, and the delivered prompt \(\gamma \)-ray distributions from each isotope. The simulations were carried out with the Geant4 toolkit49 (v10.6) and the officially released QGSP_INCLXX_HP Physics List (PL) was chosen. More details on the impact of the choice of PL are given below.

All the secondary particles generated in the phantom, mainly \(\gamma \)-rays and neutrons, were registered. The largest production yield is found for prompt \(\gamma \)-rays generated in nuclear reactions induced by the 120 MeV proton beam. According to our MC simulations, on average, 0.081 and 0.065 prompt \(\gamma \)-rays (\(E>1\) MeV) are produced in water and PMMA, respectively, per incident proton. These results are in agreement with the values given in previous works8,31 but might overestimate the actual production48.

The PG spectra generated by the 120 MeV protons in water and PMMA phantoms, presented in Fig. 2, are dominated by four \(\gamma \)-ray transitions at 2.3 MeV (\(^{14}\)N), 4.4 MeV (\(^{12}\)C), 5.25 MeV (\(^{15}\)O) and 6.1 MeV (\(^{16}\)O). More details on the reactions originating these lines can be found elsewhere10. The depth distribution of the \(\gamma \)-rays emitted in the irradiation of a water phantom is shown in the right panel of Fig. 2. Different emission patterns along the proton track can be identified for each of the lines in the spectrum, associated with the different energy dependence of the underlying nuclear cross sections47. Some of the PG lines (for instance 4.4 MeV and 6.1 MeV) show a sharp production maximum in the vicinity of the Bragg peak, which makes them especially suited for a direct assessment of the proton range. Considering the similarity in terms of PG yield, energy spectra, and proton range in water and PMMA, only the simulations of the water phantom are considered for the results presented hereafter.

To simulate the response of our detection system, all the secondary particles generated inside the phantom are tracked through the geometry model of Fig. 1. For those \(\gamma \)-rays and neutrons interacting with the i-TED detectors, the deposited energy, interaction position and time of all the hits in the S- and A-layers are registered for the subsequent image reconstruction and background assessment. The particle type and energy of the incoming particle were also stored for the analysis described in the following. Experimental effects, such as the low energy threshold of 100 keV in each crystal and the resolutions in \(\gamma \)-ray hit position and deposited energy, were included in the simulations to account for their impact on the imaging resolution (see Ref.40). These experimental resolutions include a 4.5% fwhm energy spread at 500 keV41 and a 1.5 mm fwhm spatial uncertainty in all three space coordinates for the reconstruction of the \(\gamma \)-ray hit location in each scintillation block34,42.
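As an illustration, the following minimal sketch shows how such experimental effects could be folded into the simulated hits; the Gaussian smearing and the \(1/\sqrt{E}\) scaling of the relative energy resolution are assumptions for this example, while the 100 keV threshold, the 4.5% fwhm at 500 keV and the 1.5 mm fwhm position spread are the values quoted above.

```python
import numpy as np

rng = np.random.default_rng()
FWHM_TO_SIGMA = 1.0 / 2.355  # conversion between FWHM and Gaussian sigma

def smear_hit(e_dep_kev, pos_mm):
    """Apply illustrative detector effects to one simulated hit: Gaussian energy
    spread (4.5% fwhm at 500 keV, assumed to scale as 1/sqrt(E)), a 1.5 mm fwhm
    blur on each spatial coordinate, and the 100 keV threshold per crystal."""
    fwhm_rel = 0.045 * np.sqrt(500.0 / e_dep_kev)           # assumed statistical scaling
    e_meas = rng.normal(e_dep_kev, fwhm_rel * e_dep_kev * FWHM_TO_SIGMA)
    pos_meas = rng.normal(pos_mm, 1.5 * FWHM_TO_SIGMA, size=3)
    if e_meas < 100.0:                                      # hit below threshold is lost
        return None
    return e_meas, pos_meas
```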

Figure 2

Left: Energy spectrum of prompt \(\gamma \)-rays generated by 120 MeV protons along the beam track in water and PMMA. Right: Correlation between the \(\gamma \)-ray energy and the emission depth for a water phantom.

ML-aided prompt \(\gamma \)-ray imaging

The four main PG lines of Fig. 2, corresponding to de-excitation transitions in \(^{14}\)N, \(^{12}\)C and \(^{15,16}\)O, were selected for image reconstruction. For each \(\gamma \)-ray line, scatter and absorber (S&A) coincidence events were selected and the energy deposited by the \(\gamma \)-ray in the two planes was summed to create the so-called add-back spectrum. An energy window of 150–300 keV around the peak energies was selected. In order to reduce the delayed gamma and neutron background associated with the moderation and partial capture of neutrons in the phantom, only events firing the S-layer within 10 ns after the proton pulse were selected. This choice of time-of-flight (TOF) selection and its relation with the neutron sensitivity are discussed in more detail in the following. The applicability of such a TOF selection in clinical conditions would depend on the specific time structure of the proton accelerator4.
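A minimal sketch of this event selection is given below, assuming a hypothetical structured array of S&A coincidences containing the deposited energies in both planes and the arrival time of the scatter hit relative to the proton pulse; the field names and the 200 keV window width are illustrative.

```python
import numpy as np

def select_pg_events(events, peak_mev, window_mev=0.2, tof_ns=10.0):
    """Keep S&A coincidences whose add-back energy lies in a window around a
    prompt-gamma line and whose scatter hit arrives within the prompt TOF cut."""
    addback = events["e_scatter"] + events["e_absorber"]      # add-back energy (MeV)
    in_peak = np.abs(addback - peak_mev) < 0.5 * window_mev   # energy window around the line
    prompt = events["t_scatter"] < tof_ns                     # prompt-gamma TOF selection
    return events[in_peak & prompt]

# Example (hypothetical input file): select the 4.4 MeV line of 12C.
# events = np.load("ited_coincidences.npy")
# carbon = select_pg_events(events, peak_mev=4.44)
```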

A crucial aspect for the application of PGI to proton range verification is the attainable spatial resolution in the distal edge of the PG depth distribution (see Fig. 2). Aiming at achieving a good balance between imaging resolution and reconstruction time, three different algorithms have been implemented and tested for the reconstruction of 2D PG images: BP, SOE and AA. More details on the implementation of these algorithms are given in Methods. Figure 3 shows the Compton images obtained with the three algorithms after selecting in the add-back spectra the four main peaks corresponding to the most intense PG lines. These images have been obtained by combining the statistics of the four i-TED detectors and using the full simulated statistics (\(2\times 10^{10}\) protons) to highlight the attainable spatial resolution and the differences between imaging algorithms.

Figure 3

From left to right, the imaging algorithms used correspond to BP, SOE and AA, respectively. The top-row figures show the Compton images obtained when only events with full-energy deposition in the S and A detectors are selected. For the middle-row images, no condition on full-energy deposition has been applied and neutron interactions are also included. The bottom-row images also consider all the events, but in this case the ML classifier was used to filter out events with incomplete energy deposition.

Comparing the three images in the upper row of Fig. 3, obtained using only gamma-ray events with full-energy deposition, the limited resolution and signal-to-background achieved with the simple BP algorithm are only slightly improved with the SOE algorithm (1000 iterations). On the contrary, the analytical approach leads to a clear upgrade in terms of imaging resolution (see Fig. 3). For a quantitative comparison of the resolution provided by the different algorithms, the projections of the 2D image along the proton beam (Y) axis are presented for different PG energy selections in Fig. 4. For the case of the \(^{12}\)C peak, the three algorithms reproduce the position of the PG emission maximum within 5 mm. In both cases (\(^{12}\)C and the 4 main PG lines), the profiles extracted from the BP and SOE algorithms start to deviate from the actual fall-off profile below 80% of the maximum. On the contrary, the AA method reproduces the sharp PG fall-off distribution remarkably well. Moreover, the AA also yields the best reproduction of the PG depth distribution at shallow depths. The precision in the reconstructed PG emission profiles for the number of protons of clinical interest is discussed later in this work.

Figure 4

1D projection of the Compton image along the proton beam axis obtained using only the 4.4 MeV transition in \(^{12}\)C (left) and including the four main transitions in \(^{14}\)N, \(^{12}\)C and \(^{15,16}\)O (right). The different solid lines correspond to the BP, SOE and AA reconstruction methods. The true depth distribution (MC) is shown as the black dashed line.

Besides the finite resolution of the detector and the performance of the imaging algorithm, the high energies of the PG lines of interest represent an additional challenge for the satisfactory reconstruction of the PG depth distributions. In 2-plane Compton cameras (CC) like i-TED, the large fraction of \(\gamma \)-ray events with an incomplete energy deposition leads to a deteriorated image reconstruction, as displayed in the central row of Fig. 3. This limitation has led to the development of 3-plane and multistage CC for applications dealing with high-energy gamma-ray transitions20,22,25,26. These systems enable a more reliable determination of the \(\gamma \)-ray energy by means of threefold time coincidences. Nevertheless, such an approach has a significant cost in detection efficiency, which hinders the feasibility of real-time range verification.

Figure 5

1D projection of the Compton images reconstructed with the analytical algorithm using the 4.4 MeV transition in \(^{12}\)C (left) and combining the four main transitions in \(^{14}\)N, \(^{12}\)C and \(^{15,16}\)O (right). The three solid curves correspond to the ideal image with only full-energy \(\gamma \)-ray events (blue), the image with no selection, including neutron events (red), and the corrected result after the ML-classifier is applied (green). The true depth distribution (MC) is shown as the black dashed line.

Aiming at keeping the advantage of the high efficiency of i-TED while selecting only full-energy events, an innovative approach based on machine learning (ML) identification of full-energy events has been developed in this work. This ML solution was coded in the TensorFlow deep-learning API50 and it has allowed us to enhance the fraction of full-energy events by a factor ranging from 1.5 to 2 in the energy range of interest for PGI (see Fig. 13 in Methods for the details). As a consequence, one can recover to a large extent the resolution and signal-to-background ratio of an ideal camera with only full-energy events (see Fig. 3).

The impact of the ML-aided image reconstruction is shown for the 1D profiles reconstructed with the analytical algorithm in Fig. 5. This figure shows that the reproduction of the position, width and fall-off of the PG emission profile is improved thanks to the ML solution (green) with respect to the original reconstruction (red), getting closer to the ideal reconstruction with only full-energy Compton events (blue). In particular, the deviation between the reconstructed and the true (MC) position of the 4.4 MeV PG fall-off curve (left panel in Fig. 5), quantified at 80% of the maximum, \(F_{80\%}\), is reduced from 3.2 mm in the original reconstruction to only 1.3 mm after the ML selection is applied. As for the depth profile of the 4 main lines combined (right panel), the accuracy improvement in the reconstruction of the PG fall-off (\(F_{80\%}\)) is more subtle, changing from 3 mm with all the events to 2.5 mm with the ML method.

The impact of the ML selection on the Compton images is more sizable for the images reconstructed with the faster SOE and BP algorithms, which do not reach the resolution of the AA discussed above. The 1D profiles obtained from the BP and SOE images of Fig. 3 are compared in Fig. 6 for the same three scenarios. In this figure one can appreciate the noticeable enhancement in peak-to-background ratio, especially for the BP algorithm, obtained with the ML-based full-energy selection (green) when compared to the initially reconstructed profile (red).

Figure 6

1D projection of the Compton images of Fig. 3 along the proton beam axis obtained with the BP (left) and SOE (right) algorithms. The three solid curves correspond to the ideal image with only full-energy \(\gamma \)-ray events (blue), the image with no selection including also neutron events (red) and the corrected result after the ML-classifier is applied (green). The true depth distribution (MC) is shown as the black dashed line.

Figure 7

2D Compton images obtained with the analytical reconstruction method and the ML-aided event selection (right) compared to the spatial distributions from the MC simulations (left). The top and bottom images correspond to the \(^{14}\)N and \(^{12}\)C peaks, respectively.

Figure 7 shows some examples of the final results achieved for the imaging of single PG lines using the combination of the AA reconstruction algorithm and the ML-aided selection of full-energy events. The reconstructed 2D images for the 2.3 MeV (\(^{14}\)N) and the 4.4 MeV (\(^{12}\)C) lines are compared with the true distributions obtained from the MC simulation, showing a remarkable reproduction of the PG emission distribution and the distal fall-off in both cases.

Impact of neutron-induced backgrounds

Neutrons are produced along with prompt gamma-rays in nuclear reactions during the proton therapy treatment. According to our MC simulations, the prompt neutron yield generated by 120 MeV protons in water is 0.064 per incident proton, just 20% smaller than that of \(\gamma \)-rays in the energy region of interest (1–7 MeV). These neutrons are partially moderated before escaping the water phantom and may be captured in the detector sensitive volume or surrounding structural materials, leading to an important source of background.

The impact of the detector neutron sensitivity was studied by means of simulations of the same set-up with only 10\(^{8}\) protons, in which the LaCl\(_{3}\) crystals of i-TED were replaced by other inorganic scintillators or semiconductor detectors commonly used in existing or foreseen CC designs. These active detection materials are LYSO16,31, BGO16,28,30, CdZnTe19,20,26,37 and LaBr\(_{3}\)18,23,25,51 crystals. Ce:GAGG22,24 has not been included in the study, but its neutron sensitivity could be even higher due to the presence of Gd, which features one of the largest known thermal neutron-capture cross sections52.

Figure 8 shows the time distribution of neutron events in the detector for the different active materials compared to the distribution of \(\gamma \)-ray events. LaCl\(_{3}\) shows the smallest efficiency for the detection of prompt neutrons reaching the detector within a few ns after the proton bunch, simultaneously with the prompt gamma-rays arising from proton interactions in the water phantom. On the other hand, BGO is the least sensitive material to slow neutrons reaching the detector 1–100 \(\mu \)s after the proton pulse.

The figure of merit to be studied is the fraction of neutron- and gamma-induced coincidences, displayed in the right panel of Figure 8. LYSO, a very promising crystal in terms of efficiency and time resolution16, shows the highest sensitivity to neutrons, which represent 42% of the total counting rate. On the other hand, apart from its limited energy resolution for Compton imaging, BGO seems the best solution due to its low sensitivity to neutrons. If the clinical accelerator pulse structure allows setting TOF selections of a few ns, as proposed in recent works28,53, the contribution of the neutron background would be clearly suppressed for all the studied crystals. Our results indicate that after a TOF selection of e.g. 10 ns, LaCl\(_{3}\) and BGO would have the lowest fraction of neutron coincidences (14%), i.e. the lowest neutron sensitivity. A complete TOF separation of prompt gammas and neutrons is not feasible if one aims to maximize the \(\gamma \)-ray detection efficiency with a small detector-to-phantom distance. Hence, detectors with a low intrinsic sensitivity to neutrons, such as the LaCl\(_{3}\)-based i-TED modules, may represent a valuable solution.
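This figure of merit can be extracted directly from the simulated, particle-tagged coincidences; a minimal sketch is shown below, where the array layout and field names are hypothetical.

```python
import numpy as np

def neutron_fraction(primary, t_scatter_ns, tof_cut_ns=None):
    """Fraction of S&A coincidences induced by neutrons, optionally after keeping
    only events within a prompt TOF window (e.g. the 10 ns cut discussed above)."""
    keep = np.ones(t_scatter_ns.size, dtype=bool) if tof_cut_ns is None \
        else (t_scatter_ns < tof_cut_ns)
    kept = primary[keep]
    return np.count_nonzero(kept == "neutron") / kept.size

# Hypothetical usage for one crystal material:
# f_all = neutron_fraction(events["primary"], events["t_scatter"])
# f_tof = neutron_fraction(events["primary"], events["t_scatter"], tof_cut_ns=10.0)
```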

Figure 8

Left: Time distribution of the neutron-induced counts registered in a Compton camera (i-TED geometry) for different detection materials. The time distribution of \(\gamma \)-ray counts in LaCl\(_{3}\) is shown as a dashed line for reference. Right: Fraction of neutron- and gamma-induced S&A coincidences for each detector material in the full time window (solid) and after selecting the first 10 ns (dashed).

Efficiency and in-vivo real-time PG imaging

The spatial resolution and low sensitivity to backgrounds attainable with the ML-aided i-TED detector are crucial for its applicability to PGI. Aiming at in-vivo real-time range verification, several additional aspects are of special relevance. The first key feature is the detection efficiency, which has to be high enough to reconstruct PG images with sufficient resolution in clinical conditions. Average clinical beam intensities are of the order of 1–2 nA (i.e. 6–12\(\times \)10\(^{9}\) p/s)4 and the relevant clinical scenarios for a single pencil beam in beam-scanning proton RT correspond to the delivery of 10\(^{8}\) to 10\(^{9}\) protons12,16,26.

Table 1 summarizes the detection efficiency for the proposed setup composed of four i-TED modules at 100 mm from the proton beam axis, displayed in Fig. 1. The resulting efficiency, defined as the number of S&A coincidences per number of incident protons (as in Ref.37), is compared for several add-back energy ranges comprising single or multiple prompt gamma lines. The impact of the distance between the S- and A-planes, which can be remotely adjusted for an optimum trade-off between efficiency and resolution, has also been studied. The latter corresponds to the dynamic electronic collimation implemented in i-TED, which is described in detail in Ref.40. The proposed setup of four detectors at 10 cm may not be feasible in some clinical scenarios. In a general case, the efficiency of our imaging system would scale proportionally to the number of Compton modules and inversely with the square of the distance.
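As a simple illustration of this scaling, the sketch below rescales a reference efficiency (four modules at 100 mm from the beam axis) to another hypothetical geometry; the function and its arguments are illustrative only.

```python
def scaled_efficiency(eff_ref, n_modules, distance_mm, n_ref=4, d_ref_mm=100.0):
    """Rescale a reference S&A coincidence efficiency (n_ref modules at d_ref_mm
    from the beam axis) assuming proportionality with the number of modules and
    an inverse-square dependence on the detector-to-beam distance."""
    return eff_ref * (n_modules / n_ref) * (d_ref_mm / distance_mm) ** 2

# e.g. two modules at 150 mm retain (2/4) * (100/150)**2 ~ 22% of the reference efficiency
```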

Table 1 Detection efficiency, defined as the number of S&A time coincidences per incident proton, combining the four i-TED modules of Fig. 1.

According to the results shown in Table 1, only 6% of the total coincident events are selected with an energy window around the 4.4 MeV peak. If the four main PG lines are combined, as in Fig. 3, this fraction increases to 16% of the total coincidences. Hence, in order to achieve sufficient statistics (\(\sim \)10\(^{4}\) events per image) for in-vivo range verification (i.e. <10\(^{9}\) protons), the combination of several PG lines becomes mandatory. Figure 9 compares the 1D profiles of the images obtained with the AA algorithm and the ML-based full-energy selection for different numbers of protons and two different energy windows. On one hand, selecting the whole energy range from 1 to 7 MeV improves the statistics and thus reduces the fluctuations in the reconstructed profiles. On the other hand, limiting the energy selection to the four main peaks reduces the background and enhances the fraction of full-energy events, hence improving the resolution in the determination of the proton range. The achieved precision in the reproduction of the PG emission fall-off is quantitatively analyzed in Table 2. The values in this table correspond to depth in water, taking into account that the reference frame of the Compton images in this work is centered at a depth of 100 mm. The position of the reconstructed maxima (Max) has been determined from the center of the bin, and the positions along the fall-off curve (F\(_{90\%}\), F\(_{80\%}\) and F\(_{50\%}\)) have been linearly interpolated from the histograms in Fig. 9. The final reconstructed positions in Table 2 have been calculated as the average of two independent sets of simulated data and the uncertainties correspond to the standard deviation between the two independent results.
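For illustration, a minimal sketch of the linear interpolation of the fall-off positions from a binned 1D profile is given below; the function name and input arrays are hypothetical.

```python
import numpy as np

def falloff_position(depth_mm, counts, level):
    """Depth at which the distal fall-off of a 1D prompt-gamma profile crosses
    `level` (e.g. 0.8 for F80%) of its maximum, by linear interpolation."""
    i_max = int(np.argmax(counts))
    target = level * counts[i_max]
    below = np.nonzero(counts[i_max:] < target)[0]
    if below.size == 0:
        raise ValueError("profile never falls below the requested level")
    j = i_max + below[0]                     # first bin below the target level
    x0, x1 = depth_mm[j - 1], depth_mm[j]
    y0, y1 = counts[j - 1], counts[j]
    return x0 + (target - y0) * (x1 - x0) / (y1 - y0)

# e.g. F80% of a reconstructed profile: falloff_position(depth, profile, 0.80)
```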

Figure 9

1D projection of the Compton images along the proton beam axis obtained with the analytical reconstruction algorithm using all the events between 1 and 7 MeV (left) and the 4 main PG lines (right). The solid curves correspond to different numbers of protons. The true depth distribution (MC) is shown as the black dashed line.

Table 2 PG maximum (Max) and 90% (F\(_{90\%}\)), 80% (F\(_{80\%}\)) and 50% (F\(_{50\%}\)) fall-off positions corresponding to the profiles of Fig. 9, compared to the real PG emission profile in the MC simulations. The values are given in mm. The values in parentheses correspond to the standard deviation of independent data sets.

According to the results of Table 2, deviations between the reconstructed positions and the actual PG distribution are in the range 0.5 mm to 5 mm for the profile selecting the energy window from 1 to 7 MeV. For the case of the 4 main peaks, the agreement is in most cases better than 3 mm. As the number of protons decreases, the uncertainties tend to be slightly enhanced. The best precision is obtained for the reconstructed position of the 50% fall-off (F\(_{50\%}\)), which agrees with the real (MC) distribution within less than 1 mm in most cases (see Table 2). On the other hand, the positions of the PG maxima are systematically underestimated by at least 3 mm and show a lower accuracy, related to the 2.5 mm bin size.

The computing-time performance of the relatively complex Compton imaging algorithms is also critical for in-vivo range verification via PGI. In this work, three different 2D image reconstruction algorithms have been tested. The reconstruction times with the BP and SOE methods are perfectly compatible with real-time imaging (see Table 3). For the latter, good quality images containing a minimum of 2\(\times \)10\(^{4}\) coincident events are reconstructed in a few seconds using a single-thread CPU calculation. As for the AA method, which yields the best image resolution (see Figs. 3 and 4), we used a GPU-accelerated CUDA54 implementation of the code, which allows reconstructing an image in about fifteen seconds, 120 times faster than with a conventional single-thread CPU based approach. A similar acceleration has been reported for 3D position reconstruction algorithms in our previous work34.

Discussion and conclusion

i-TED is a Compton camera array that has been specifically designed for neutron-capture nuclear physics experiments. Several design aspects of i-TED, such as its high time resolution, high efficiency and relatively low neutron sensitivity, may become of interest in order to address some of the current challenges in prompt gamma imaging for range verification in proton therapy treatments. The results presented in this work show the prospects of i-TED concerning detection efficiency, sensitivity to the neutron background and the spatial resolution attainable with a combination of the ML-based full-energy selection method and state-of-the-art reconstruction algorithms.

This work has presented an MC study on the applicability of i-TED to range verification, where the attainable image resolution is one of the critical issues. The BP and SOE algorithms yielded a good reproduction of the PG maximum emission point in spite of their limited resolution. Much higher resolution images can be obtained with i-TED using the analytical approach of Ref.46, which provided the best reproduction of the distal fall-off for several selections of PG lines, with an accuracy ranging from 1 to 3 mm. To achieve these results, an ML classifier, which improved the fraction of correct Compton events by up to a factor of 2, has been developed to partially compensate for the deterioration of the images due to partial-energy events.

In comparison, the recent MC study by Yao et al.37 obtained accuracies within 2 mm for the same F\(_{80\%}\) and F\(_{50\%}\) magnitudes, also using a combination of \(\gamma \)-ray lines. Moreover, our results are also at the level of previous works on MC simulations of Compton cameras applied to PGI18,26,51, and better than the 5 mm obtained in Ref.20 or the 7 mm reported in Ref.22.

The i-TED detector was developed for neutron-capture time-of-flight experiments, where a low sensitivity to neutrons is a crucial aspect. Neutrons are also among the main contributors to the secondary patient dose and to the background in PGI systems. Indeed, previous works have discussed the possible rejection of the neutron background in PGI systems through the application of time-of-flight selections28,36,53. However, aiming at a maximum detection efficiency, the detector-to-phantom distance has to be minimized and a satisfactory TOF separation of the prompt gammas and neutrons is then not feasible (see Fig. 8). In this scenario, the use of scintillators with reduced neutron sensitivity, such as LaCl\(_{3}\) or BGO, is the best solution to minimize the neutron-induced background. Still, LaCl\(_3\) has the advantage of a higher energy resolution for Compton imaging.

The maximum attainable efficiency of the full i-TED setup (see Table 1) in the energy range of interest for PGI (1–7 MeV) is at the level of the most efficient existing or designed Compton cameras for PGI37, which report an efficiency of 4.1\(\times \)10\(^{-4}\) per proton for deposited energies above 1 MeV. The absolute efficiency per emitted prompt \(\gamma \)-ray for the i-TED array, calculated as the average of the four main PG transitions, is in the range 0.9–1.6\(\times \)10\(^{-3}\) for the focal distances of Table 1. This value outperforms most of the CC for PGI developed to date, whose efficiencies range between 10\(^{-4}\) and 10\(^{-8}\) per emitted PG4. The high efficiency of the proposed setup over the whole energy range of interest (1–7 MeV) allows determining the position of the PG distribution fall-off with an accuracy better than 3 mm for numbers of protons as low as 10\(^{8}\), in the range of clinical interest12,16,26.

Following the promising prospects for range verification in proton therapy with i-TED presented in this work, we aim at testing this detection system in proton beam facilities, such as the 18 MeV cyclotron at CNA (Sevilla), and in clinical beams. These tests will provide an experimental validation of the methods and results presented in this work and will allow us to explore additional aspects that can be optimized for the present ion-range monitoring application.

Methods

The i-TED Compton imager

This work has presented an MC study based on the i-TED array, which consists of four high-efficiency Compton cameras38,39. This novel detection system is under development at Instituto de Física Corpuscular (IFIC) within the HYMNS-ERC project55. The first demonstrator has already been assembled, fully characterized40 and applied to high-resolution neutron TOF experiments39.

Each of the i-TED Compton modules uses 5 position-sensitive detectors (PSDs) distributed in two parallel detection planes, Scatter (S) and Absorber (A), as shown in Fig. 10. Each PSD contains a LaCl\(_{3}\)(Ce) monolithic crystal with a square-cuboid shape and a base surface of 50\(\times \)50 mm\(^2\). LaCl\(_{3}\) is hygroscopic and thus it is encapsulated in an aluminum housing. Each crystal base is coupled to a 2 mm thick quartz window, which is optically joined to a silicon photomultiplier (SiPM) from SensL (ArrayJ-60035-64P-PCB). The photosensor features 8\(\times \)8 pixels over a surface of 50\(\times \)50 mm\(^2\). A 15 mm thick crystal is used for the PSD in the S-plane. Four 25 mm thick crystals are utilized for the PSDs placed in the A-plane (see Fig. 10). In total, 320 SiPM channels are biased and read out by means of front-end and processing PETsys TOFPET2 ASIC electronics56. In order to minimize gain shifts due to changes in the temperature of the experimental hall, every ASIC is thermally coupled to a refrigeration system composed of a Peltier cell, a heat sink and a small-size fan (see Ref.40 for further details).

Figure 10

Left: i-TED detector consisting of one scatter and four absorber detectors in movable and parallel detection planes. Right: Schematic view of the same i-TED detector as implemented in Geant4, indicating the dimensions of the LaCl\(_{3}\)(Ce) crystals of the scatter and absorber planes.

The i-TED Compton modules embed the so-called dynamic electronic collimation technique40. This is accomplished by means of a linear positioning stage that allows one to remotely vary the distance between the A- and S-planes, thereby optimizing performance for each specific application. Finally, the successful implementation of ASIC-based TOF-PET readout electronics for Compton imaging has led to a rather compact and cost-effective system when compared to other Compton imagers57,58.

MC simulations of the proton beam and i-TED

The applicability of i-TED for range verification has been studied by means of MC simulations using the Geant4 toolkit. The detailed geometry model of each of the i-TED detectors as implemented in Geant4 is shown in Figure 10. More details can be found in our previous work39.

The modelling in Geant4 of the physics processes occurring during the irradiation of a phantom with protons can be carried out with different models, so-called Physics Lists (PL)59. In this work we have tested several officially released PLs, which combine the Quark-Gluon-String model (QGSP) for the inelastic scattering of protons above \(\sim \)10 GeV, not relevant for the present study, with three different cascade models covering energies below 10 GeV: the Liège Intranuclear Cascade model INCL++, the Bertini cascade (BERT) and the Binary Cascade model (BIC). The resulting prompt gamma yields for the different PLs, presented in Fig. 11, indicate that INCL++ and BIC agree within 2% in the absolute PG yield, while BERT leads to a 2.3 times higher production, in agreement with the clear overestimation reported in Ref.48. Moreover, the BERT model generates a continuum of \(\gamma \)-ray energies instead of the expected discrete spectra (see Fig. 11), as has been reported in previous works47. Lastly, the QGSP_BIC_AllHP Physics List was tested. This PL includes the G4ParticleHP package, still under development, which uses nuclear data libraries instead of the default models for the transport of light charged particles60. In particular, the inelastic interactions of protons with \(^{12}\)C and \(^{16}\)O up to 150 MeV are simulated using the ENDF/B-VII.161 cross sections. This PL leads to a 32% smaller PG yield, a result which is in line with the reduction suggested by the benchmark of Pinto et al.48. However, this PL was discarded since it failed to produce individual \(\gamma \)-ray lines in the proton-induced inelastic reactions (see Fig. 11).

Figure 11

Left: Energy spectrum of proton-induced prompt \(\gamma \)-rays obtained in the Geant4 simulations with different Physics Lists. Right: Distribution of \(\gamma \)-rays (solid black) and neutrons (red) produced by 120 MeV protons as a function of the depth in water simulated with QGSP_INCLXX_HP.

As for the neutron yields in the irradiated phantoms, all the studied PLs agree within 24%. For an accurate transport of neutrons below 20 MeV, neutron-induced reactions within Geant4 were simulated by means of the G4NeutronHP package62, using the G4NDL-4.6 data library (based on the JEFF-3.363 evaluated data file). This high-accuracy neutron transport package, not included in some of the previous works20,37, has a significant impact on the slowing-down and partial capture of neutrons within the phantoms, and on the response of the detectors to neutrons. Indeed, the number of slow neutron events in the detectors would have been underestimated by up to a factor of 5 if this package had not been included.

The final choice of PL was QGSP_INCLXX_HP, since it includes an accurate modelling of neutron interactions and leads to the smallest PG yield among the PLs that correctly generate discrete \(\gamma \)-ray transitions. In addition, the INCL++ model has shown the best reproduction of the neutron and \(\gamma \)-ray yields in the simulation of proton-induced reactions in other applications43,64. The right panel of Figure 11 shows the depth profile for the secondary emission of \(\gamma \)-rays and neutrons in water resulting from our MC simulations of the 120 MeV proton beam. In the case of the \(\gamma \)-rays, the largest production is due to proton-induced reactions (dotted line), which show a clear maximum at the end of the proton range. The much smaller neutron-induced production of \(\gamma \)-rays is shown with a dash-dotted line in the same figure. For the PG images obtained in this work, several energy cuts have been applied to select either the whole spectrum of Fig. 2 between 1 and 7 MeV or the main single transitions.

A particular emphasis has been placed on the physical origin and time distributions of the \(\gamma \)-rays and neutrons escaping the phantom and reaching the i-TED detectors. This is of particular relevance for our study of the neutron sensitivity and the applicability of time-of-flight cuts. The left panel of Fig. 12 shows the time of arrival of \(\gamma \)-rays and neutrons at i-TED after being produced in a water phantom by 120 MeV protons. Three time structures can be identified in this figure, whose limits are indicated with dotted lines. The prompt particles, mostly \(\gamma \)-rays, reach i-TED within the first 10 ns. A second component, which extends up to 1 ms after the proton bunch, contains both neutrons moderated in the phantom and \(\gamma \)-rays originating from neutron capture reactions, dominated by the 2.2 MeV \(\gamma \)-rays produced in the \(^{1}\)H(n,\(\gamma \)) reaction. Beyond 1 ms, the third \(\gamma \)-ray emission component is related to the decay of the unstable nuclei produced in the phantom, simulated with the G4RadioactiveDecay model included by default. To avoid the full simulation of the decays, the transport of particles was limited to 1 ms. The add-back spectrum of i-TED operated in coincidence mode is shown in the right panel of Fig. 12. This figure shows the strong reduction of the neutron sensitivity when a time cut of 10 ns is applied. The integrals of the gamma- and neutron-induced spectra of Fig. 12 correspond to the results of i-TED (LaCl\(_{3}\)) presented in Fig. 8.

Figure 12

Left: time distribution (log scale) of the \(\gamma \)-ray and neutron events registered in i-TED, showing three distinct components separated by vertical dashed lines (see text for details). Right: add-back energy spectrum for all events and for prompt (t < 10 ns) events.

Compton imaging algorithms

Effective prompt gamma-ray monitoring requires images with spatial resolutions down to a few mm and reconstruction times of the order of a few seconds, at most. Compton imaging relies on the Compton scattering law as its fundamental ingredient and thus most Compton cameras employ two or more position-sensitive detection planes. The direction of the scattered \(\gamma \)-ray is determined from the interaction positions in both detection planes. If the scattered \(\gamma \)-ray deposits all its energy in the second detection plane, the incident \(\gamma \)-ray energy can be determined by adding up the deposited energies in the first and second detection planes. The Compton imaging algorithms used in this work are based on this assumption, whose validity is extended by means of a supplementary machine learning classifier, which will be discussed in the following section.
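A minimal sketch of this kinematic reconstruction is given below: for a full-energy S&A coincidence, the add-back sum gives the incident energy and the Compton formula yields the half-opening angle of the cone whose apex is the scatter hit and whose axis points from the absorber hit back through the scatter hit (the variable names are illustrative).

```python
import numpy as np

M_E_C2 = 0.511  # electron rest energy (MeV)

def compton_cone(e_scatter, e_absorber, pos_scatter, pos_absorber):
    """Cone axis (unit vector) and half-opening angle for a full-energy S&A
    coincidence: cos(theta) = 1 - m_e c^2 * (1/E_scattered - 1/E_0), with E_0
    the add-back energy and E_scattered the energy left after the scattering."""
    e0 = e_scatter + e_absorber          # incident (add-back) energy
    e_scattered = e_absorber             # scattered-photon energy (full absorption in A)
    cos_theta = 1.0 - M_E_C2 * (1.0 / e_scattered - 1.0 / e0)
    axis = np.asarray(pos_scatter, float) - np.asarray(pos_absorber, float)
    axis /= np.linalg.norm(axis)         # axis points back towards the possible source positions
    return axis, np.arccos(np.clip(cos_theta, -1.0, 1.0))
```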

There exist many different image reconstruction algorithms, each of them with advantages and drawbacks. For instance, algorithms based on Maximum Likelihood Expectation Maximization65,66 have shown excellent position reconstruction accuracy, but they require a prior computation of the system response matrix. Since the aim of this work is to study the applicability of an imager like i-TED for clinical purposes, we have chosen algorithms which do not require any previous system response calculation. Those algorithms are listed as follows:

  • Fast back-projection (BP)67: This is the simplest approach, with rather limited spatial resolution, but also with the fastest reconstruction performance. Developed by Wilderman in 1998, the image of each individual \(\gamma \)-ray is obtained from the intersection of the back-projected Compton cone with the image plane, where the \(\gamma \)-ray source is located. The general quadratic curve resulting from this intersection defines a set of possible positions for the \(\gamma \)-ray source origin in the image plane. The final Compton image is formed by the superposition of all the individual \(\gamma \)-ray images. While the algorithm is very fast, the spatial resolution is quite poor when compared to other algorithms (see for example Fig. 3).

  • Stochastic origin ensemble (SOE)45: The SOE algorithm for Compton imaging was developed originally by Andreyev in 2016. It is a Markov chain Monte Carlo method based on the Metropolis-Hastings algorithm and does not require any forward or backward projections. The drawback of this algorithm is related to its iterative nature. The initial image, obtained from all the available statistics, is generated by random sampling of the possible positions of the \(\gamma \)-ray source within the intersection of the Compton conical surfaces and the image plane. Once this first image is created, the iterative process begins by randomly choosing a \(\gamma \)-ray event. A new position for this event is sampled in the image space, constrained to its Compton conical surface. Then, this new position is either accepted or rejected with an acceptance A based on the ratio between the local density of \(\gamma \)-ray events in the new (\(\lambda ^{\prime }_{i}\)) and old (\(\lambda _{i}\)) positions:

    $$\begin{aligned} A=min\left( 1,\frac{\lambda ^{\prime }_{i}+1}{\lambda _{i}}\right) \end{aligned}$$
    (1)

    This is repeated as many times as there are events in the data set, which constitutes one iteration, and iterations are repeated until a stationary situation is reached. The number of iterations required depends on the specific problem, such as the available statistics and the binning of the Compton image.

  • Analytical algorithm (AA)46: An analytical inversion of the Compton imaging problem based on spherical harmonics was developed by Tomitani and Hirasawa in 2002. The approximate solution at a position \(\vec{s}\) in the image space is given by:

    $$\begin{aligned} f(\vec{s})\approx \int ^{cos\omega _{max}}_{cos\omega _{min}}{dcos\omega }\int _{S}{d\vec{t}\, k^{-1}(\vec{t},\vec{s};cos\omega )g(\vec{t};cos\omega )} \end{aligned}$$
    (2)

    where \(\vec{t}\) is a unit vector in the projection space, \(\omega _{min}\) and \(\omega _{max}\) are the minimum and maximum Compton scattering angles included in the calculation, \(g(\vec{t};cos\omega )\) is the projection data and \(k^{-1}(\vec{t},\vec{s};cos\omega )\) is the inversion kernel, given by

    $$\begin{aligned} k^{-1}(\vec{t},\vec{s};cos\omega )=\sum ^{N_{max}}_{n=0}{\frac{2n+1}{4\pi H_{n}}P_{n}(cos\omega )P_{n}(\vec{s}\cdot \vec{t})} \end{aligned}$$

    with \(H_n\) given by

    $$\begin{aligned} H_{n}=\int ^{cos\omega _{max}}_{cos\omega _{min}}{\sigma (cos\omega )P^{2}_{n}(cos\omega )dcos\omega } \end{aligned}$$
    (3)

    \(N_{max}\) is the maximum number of terms involved in the calculation, \(P_{n}\) is the Legendre polynomial of order n and \(\sigma (cos\omega )\) is the Klein-Nishina Compton differential cross-section68. The drawback of this algorithm is the large computational cost required to reconstruct a Compton image. For this reason, \(H_{n}\) was pre-computed for a wide range of \(\gamma \)-ray energies and for the minimum and maximum Compton scattering angles that can be experimentally registered by the detection system. The \(H_{n}\) values were then saved in table format for later use, saving crucial time for quasi-real-time imaging (see the sketch after this list).
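A minimal sketch of this \(H_{n}\) pre-computation, using numerical quadrature of Eq. (3) with the Klein-Nishina cross section, could look as follows; the quadrature scheme, normalization and function names are illustrative assumptions.

```python
import numpy as np
from scipy.special import eval_legendre

M_E_C2 = 0.511  # electron rest energy (MeV)

def klein_nishina(cos_w, e0_mev):
    """Klein-Nishina differential cross section dsigma/dOmega (in units of r_e^2)
    for incident energy e0_mev and scattering angle with cosine cos_w."""
    ratio = 1.0 / (1.0 + (e0_mev / M_E_C2) * (1.0 - cos_w))  # E'/E0
    return 0.5 * ratio**2 * (ratio + 1.0 / ratio - (1.0 - cos_w**2))

def h_table(e0_mev, cos_w_min, cos_w_max, n_max=20, n_points=2001):
    """Pre-compute H_n of Eq. (3) for one incident energy by trapezoidal quadrature."""
    cos_w = np.linspace(cos_w_min, cos_w_max, n_points)
    sigma = klein_nishina(cos_w, e0_mev)
    return np.array([np.trapz(sigma * eval_legendre(n, cos_w) ** 2, cos_w)
                     for n in range(n_max + 1)])

# Example: table for the 4.4 MeV line and scattering angles between ~10 and ~120 degrees.
# h_n = h_table(4.4, np.cos(np.radians(120.0)), np.cos(np.radians(10.0)), n_max=30)
```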

The reconstruction time depends strongly on the complexity of the algorithm involved and on the optimization of its parameters, such as the number of iterations for the SOE algorithm or the number of terms involved in the AA reconstruction. Additionally, those parameters impact the quality of the Compton image, introducing a mismatch between the reconstructed and the experimental spatial resolution of the Compton imager; a limited number of iterations or terms in these imaging algorithms would lead to a poor spatial resolution, while an excessive number would introduce artifacts in the final reconstructed image.

In order to evaluate the most suitable algorithm for PG monitoring, a comparison benchmark using 20000 \(\gamma \)-ray events leading to interactions in both layers was carried out. The image plane of 200\(\times \)200 mm\(^2\) was pixelated into 200\(\times \)200 pixels for all three algorithms. The computing times obtained in this benchmark study are displayed in Table 3.

Table 3 Reconstruction time for a Compton image of 20000 events. First and second columns show the type of algorithm and its implementation in CPU or GPU.

The critical parameters for each algorithm used in the benchmark are indicated in the third column of the same table. Their values correspond to the best compromise between resolution and computation speed, found by trial and error. As the results indicate, the fastest algorithm is BP, followed by SOE and AA, the latter being a factor of 100 slower than BP. However, given the superior spatial resolution of AA, both multi-threading and CUDA54 implementations were used to make this algorithm time-competitive with the others. Compared to the single-thread version, the multi-thread implementation, with a total of eight threads, achieves a speed-up factor of about 7.6. Despite the improvement, the time required to obtain the image is still not competitive for PG imaging, being a factor of 17 to 30 slower than the other algorithms. On the contrary, the CUDA version, with a speed-up factor of about 121 with respect to the single-threaded version, reaches reconstruction times of 15 s, comparable to the SOE algorithm (14 s). It is worth mentioning that the CUDA implementation was executed on modern NVIDIA GPUs: an NVIDIA GeForce RTX 2060 and an NVIDIA GTX 1080 Ti, with comparable execution times.

Machine-learning aided full-energy event selection

The i-TED detector, like any two-plane Compton camera, bases its working principle on a \(\gamma \)-ray totally depositing its energy in the absorber (A) plane after a Compton interaction in the scatter (S) layer. However, only a fraction of the coincidences, ranging from 47% for 1 MeV \(\gamma \)-rays to just 5% at energies around 7 MeV, satisfies this condition. The remaining fraction leads to a wrong reconstruction of the Compton angle and, as a consequence, to an increase in the image background and a degradation of the spatial resolution. Aiming at improving the fraction of true Compton events, we have implemented a machine-learning algorithm for the identification of full-energy events.

To train and test the ML algorithms in the discrimination between good Compton events, where the \(\gamma \)-ray deposits all its energy between the S and A planes (full-energy), and those with partial energy escape (no-full-energy), we performed dedicated simulations of the response of the i-TED detectors to 5\(\times \)10\(^{10}\) \(\gamma \)-rays with energies homogeneously distributed between 200 keV and 7 MeV and originating at random positions within a 20\(\times \)20\(\times \)20 cm\(^{3}\) air cube placed 50 mm from the detector face, replicating the position of the phantom. For these ancillary simulations, we used the same physics models and distance between the S- and A-planes of i-TED as before. Each MC event (S&A coincidence) contains the same eight features determined with the detector in a real measurement: the 3D coordinates of the \(\gamma \)-ray interactions in the two PSDs (6) and the energies deposited in the S- and A-planes (E\(_{s}\), E\(_{a}\)) (2). Additionally, the Compton angle, calculated from the deposited energies, and its probability according to the Klein-Nishina formula68 for a \(\gamma \)-ray energy E\(_{\gamma }\)=E\(_{a}\)+E\(_{s}\), were also included in the training to improve performance. The MC output was split into 14 intervals of add-back deposited energy between 200 keV and 7 MeV and the same number (\(\sim \)10\(^{6}\)) of events of each kind was selected from the MC output for each energy interval. For each energy range we trained the same algorithm independently, aiming for the best accuracy and stability along the entire deposited energy range detected in add-back.
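As an illustration, the sketch below assembles the ten inputs per coincidence (six coordinates, the two deposited energies, the derived Compton angle and its Klein-Nishina probability) and assigns each event to an add-back energy interval; the equal-width interval edges and all names are illustrative assumptions.

```python
import numpy as np

M_E_C2 = 0.511  # electron rest energy (MeV)

def build_features(xyz_s, xyz_a, e_s, e_a):
    """Ten inputs for one S&A coincidence: 3D hit coordinates in both planes,
    deposited energies, the Compton angle computed from them (assuming
    full-energy deposition) and its unnormalised Klein-Nishina probability."""
    e0 = e_s + e_a
    cos_t = np.clip(1.0 - M_E_C2 * (1.0 / e_a - 1.0 / e0), -1.0, 1.0)
    ratio = 1.0 / (1.0 + (e0 / M_E_C2) * (1.0 - cos_t))      # E'/E0
    kn = 0.5 * ratio**2 * (ratio + 1.0 / ratio - (1.0 - cos_t**2))
    return np.concatenate([xyz_s, xyz_a, [e_s, e_a, np.arccos(cos_t), kn]])

# Assign each event to one of 14 add-back energy intervals (illustrative, equal-width):
edges = np.linspace(0.2, 7.0, 15)
# interval_index = np.digitize(e_addback, edges) for an array of add-back energies
```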

In this work, the performance of several state-of-the-art ML algorithms included in the Scikit-learn Python module69 was evaluated. The Scikit-learn algorithms evaluated were k-nearest neighbors, logistic regression, support vector classifier, Gaussian naive Bayes, random forest, AdaBoost and quadratic discriminant analysis. However, the best classification results in terms of accuracy were obtained for two other ML algorithms: boosted decision trees implemented in XGBoost70 and artificial neural networks implemented in TensorFlow50,71.

Figure 13

Left: Fraction of correctly identified full-energy events (dashed) and wrongly classified no-full-energy events (solid) as a function of the add-back deposited energy. The curve with squares corresponds to the fraction of neutron events which are identified by the ML-classifier as good (full-energy) events. Right: gain factor of full- over no-full-energy events as a function of the add-back deposited energy. In both panels, the black and red lines represent, respectively, the results with only \(\gamma \)-ray events and after including the neutron events.

The XGBoost classifier was optimized using the Scikit-learn GridSearchCV method69. The best results were obtained using 140 trees with a depth of 4 and a gamma parameter of 0.1. The remaining parameters were left at their default values because of their minimal impact on the accuracy. The classifier output is a probability between 0 (no-full-energy event) and 1 (full-energy event), and the discrimination threshold between classes was set to 0.5. The TensorFlow classifier model consists of a stack of four fully connected layers. The first three layers have 256 units each with rectified linear activation functions. The last layer is a single neuron with a sigmoid activation function, chosen to provide a probability between 0 and 1. The architecture and activation functions were chosen based on the best performance in terms of accuracy and stability along the entire range of deposited energy in add-back. The loss function used for the minimization was the binary cross-entropy, commonly used in this type of classification problem. As in the case of XGBoost, the identification threshold was set to 0.5.
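A minimal sketch of the described TensorFlow model is given below; the optimizer, batch size and number of training epochs are not specified in the text and are therefore assumptions.

```python
import tensorflow as tf

def build_classifier(n_features=10):
    """Fully connected classifier as described above: three hidden layers of 256
    ReLU units and a single sigmoid output giving the probability that an S&A
    coincidence corresponds to a full-energy event."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Optimizer and training schedule are assumptions (not given in the text).
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One model is trained per add-back energy interval; predictions above the 0.5
# threshold are kept as full-energy candidates.
```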

The XGBoost and TensorFlow models show almost the same performance, the latter being slightly more stable over the different deposited-energy ranges where the training and testing were performed. For this reason, only the model based on TensorFlow was used for the final results of this work. The accuracy of this algorithm was quantified on i-TED events from the simulations of the proton beam prior to its application to the image reconstruction. As shown in Fig. 13, the ML algorithm is able to correctly recognize 65 to 73% of the full-energy events among the prompt gamma events registered in i-TED. On the other hand, 35 to 40% of the no-full-energy \(\gamma \)-ray events are wrongly predicted as full-energy. Combining both results, this algorithm enhances the peak-to-background ratio by a factor of 1.5–2.1 (see right panel of Fig. 13). This classifier is also able to reject about 50% of the neutron events. The final peak-to-total gain factor, after the neutron events are included, ranges from 1.4 to 1.9 (see Fig. 13).