Machine Learning Based Compton Suppression for Nuclear Fusion Plasma Diagnostics

Diagnostics are critical on the path to commercial fusion reactors, since measurements and characterisation of the plasma is important for sustaining fusion reactions. Gamma spectroscopy is commonly used to provide information about the neutron energy spectrum from activation analysis, which can be used to calculate the neutron flux and fusion power. The detection limits for measuring nuclear dosimetry reactions used in such diagnostics are fundamentally related to Compton scattering events making up a background continuum in measured spectra. This background lies in the same energy region as peaks from low-energy gamma rays, leading to detection and characterisation limitations. This paper presents a digital machine learning Compton suppression algorithm (MLCSA), that uses state-of-the-art machine learning techniques to perform pulse shape discrimination (PSD) for high purity germanium (HPGe) detectors. The MLCSA identifies key features of individual pulses to differentiate between those that are generated from photopeaks and Compton scatter events. Compton events are then rejected, reducing the low energy background. This novel suppression algorithm improves gamma spectroscopy results by lowering minimum detectable activity (MDA) limits and thus reducing the measurement time required to reach the desired detection limit. In this paper, the performance of the MLCSA is demonstrated using an HPGe detector, with a gamma spectrum containing americium-241 (Am-241) and cobalt-60 (Co-60). The MDA of Am-241 improved by 51% and the signal to noise ratio (SNR) improved by 49%, while the Co-60 peaks were partially preserved (reduced by 78%). The MLCSA requires no modelling of the specific detector and so has the potential to be detector agnostic, meaning the technique could be applied to a variety of detector types and applications.


Introduction
Gamma spectroscopy is a common method used in the nuclear industry to identify and quantify the presence of radiation.High purity germanium (HPGe) detectors are often selected to measure low intensity or complex gamma-ray signatures due to their excellent ∼ keV resolution and will be the focus of this paper.In the nuclear fusion field, gamma spectroscopy is used in waste characterisation, materials research for future fusion machines, neutron flux quantification via activation foils [1], and many other areas.The nuclear structure of radionuclides common to fusion research mean it is often the case that low activity nuclides emitting low energy γ rays need to be identified in the presence of higher energy γ emitters.For example, the neutron activation of composite materials, such as stainless steel, readily produces complex activation networks, resulting in the emission of a vast spectrum of γ energies [2].This poses a problem, as when higher energy photons interact within the detector they can Compton scatter [3,4] out of the crystal and only deposit a fraction of their energy.This contributes to a continuum of detected energies, which extends to the lower energy part of the spectrum (figure 1).For example, in irradiated steel, the presence of Co-57 is indicated by a low-energy 122 keV gamma ray, which will be masked by the Compton scattering contribution of high energy Co-60 γ rays of 1173 and 1332 keV.This elevated background increases the low energy minimum detectable activities (MDA), which negatively impacts the characterisation work.There are some existing solutions to reduce the Compton scatter influence in gamma-ray spectra.One example is a Compton veto ring, where a ring of gamma spectroscopy detectors, typically sodium iodide (NaI), surround a HPGe crystal.An example of such a system is shown in figure 2, located at the radiological assay and detection lab (RADLab), at the United Kingdom atomic energy authority (UKAEA) in Oxfordshire.If a photon is detected in the HPGe crystal and NaI crystal within a set time window, the signal is rejected from the spectrum as a Compton scatter event.This method is effective at reducing the Compton continuum, but it often falsely reduces the photopeaks of interest [5].A physical Compton suppression system is also bulky, expensive, and is not easily re-configurable for multiple detector types.
In this work, the physical Compton suppression system used at UKAEA has been demonstrated to reduce the Compton background on an Am-241 (59 keV photopeak) and Co-60 spectrum by ∼ 88%, however it reduced the Co-60 photopeaks by ∼ 77% and reduced the Am-241 photopeak by 18% (based on equation 5 in section 3.2.1,peak height reduction as a percentage for energy regions of interest, discussed in sections 3.2.1 to 3.2.2).
Fig. 2 The Mirion broad energy germanium (BEGe) detector used in this work, with the physical Compton suppression system surrounding the main HPGe crystal An alternative technique is to use digital algorithms that do not rely on additional hardware like a Compton veto ring.Past attempts [6] have utilised a pulse rise time cut off to preserve low energy pulses corresponding to photoelectric absorption, while removing pulses contributing to the Compton continuum.Low energy photons are more likely to undergo photoelectric absorption close to the surface of the detector, resulting in a long charge collection time in the HPGe as the electrons drift further through the crystal to the collection electrodes.In contrast, high energy photons more readily pass through the detector, and are more likely to Compton scatter and deposit their energy further inside the crystal.This results in a shorter rise time of the generated electrical pulses.Previous solutions have rejected all pulses below a rise time threshold [6] (therefore removing pulses from high energy photons) and required extensive modelling of the specific detector to determine the optimal rise time cutoff.While effective, it relies on classification based on a single parameter, is not practical for preserving a broad range of energies, and is difficult to deploy to multiple types of detectors as the models would require modification.
Another digital technique is the use of machine learning algorithms for pulse shape discrimination (PSD).Examples of this include gamma particle tracking [7] and α-γ PSD [8].However γ-γ PSD for Compton suppression on HPGe detectors has not been undertaken.This work presents a novel digital solution to Compton suppression, the machine learning Compton suppression algorithm (MLCSA), which performs Compton suppression (via γ-γ PSD) on pre-amplified pulses to reduce the Compton continuum, while preserving higher energy photopeaks and not relying on any detector modelling or additional hardware.The MLCSA comprises a convolutional neural network (CNN) as a supervised, classification machine learning model.Incoming pulses are processed and the CNN classifies each pulse as either a pulse from a photoelectric absorption (photopeak) or from a Compton scatter event (background).The scattered pulses are rejected and the output is a spectrum containing only pulses classified as photopeaks.

Methods
The MLCSA is an algorithm that performs Compton suppression on pulses from a HPGe detector and is part of a process that comprises four components, as shown in figure 3: detector (data acquisition), digitiser (signal/pulse processing), MLCSA (classifying), and Compton-suppressed spectrum (results).The methodology to these components are described in the following sections.

Detector and data collection
The HPGe detector used to gather training data (which are later split into training and testing All sources except for the Mn-54 source were positioned 10 cm from the end cap for measuring to reduce coincidence summing effects [3].The Mn-54 source was placed at 0 cm due to its lower activity.For collection of the training data, each source was placed on the detector individually.Based on the energies of the detected pulses, these training data were split into Compton and photopeak categories.For the live data scenario, both the Am-241 and Co-60 sources were placed near the detector simultaneously (10 cm from the end cap) to create a realistic multi-nuclide spectrum.
The measured raw pulses were subsequently modified so that the CNN could be trained purely on the shape of the pulses.Raw pulses could not be used directly for training the CNN as the algorithm was found to use features such as pulse height and noise level to characterise pulses in a way that lead to overfitting [11].Therefore, a pulse processing pipeline, shown in figure 4, was created to process the pulses for the training algorithm.This process included filtering, labelling, transformations, and regularisation techniques [11] to reduce overfitting.
Fig. 4 Pulse processing pipeline which is described in detail in the main text.The greyed out steps are not used when applying this process to the test and live data set The pipeline starts by importing raw data, and removing pulses that exceed a maximum voltage and those below a pre-determined noise level.Then the data for each individual source are split into photopeak and scatter categories and labelled accordingly (based on their pulse heights).Pulses are then normalised to remove pulse height information and are randomly translated along the time axis.Gaussian noise is then added so that every pulse has the same noise level, irrespective of its energy.Finally the number of events in the photopeak and scatter categories are standardised.Example pulses for Am-241 and Co-60 photopeaks, and Co-60 Compton scatter, are shown in figure 5.The processing pipeline for live data was similar to the training pipeline, except for the removal of the greyed out steps in figure 4. The split & label step was removed because that information cannot be known for a realistic mixed γ source.The time translation step was removed because it is unnecessary due to the way the CNN was trained.The trim data step was also not possible because the data originate from a mixed source hence the number of each type of event present is unknown.

Convolutional Neural Networks
The MLCSA is a supervised classification machine learning model, and specifically uses a convolutional neural network (CNN).The CNNs are a class of artificial neural networks and are tailored for handling structured grid-like data.They are typically used in fields such as object recognition, image classification, and image segmentation.They comprise multiple layers, including convolutional and pooling layers, and excel at feature extraction and pattern recognition within images [12].The convolutional layers use learnable filters to scan and identify local features, while pooling layers reduce spatial dimensions, enhancing computational efficiency and maintaining feature robustness [11].
The CNN architecture used in this work is a basic one-dimensional (1D) CNN [13], specifically a sequential model with convolutional, max pooling, flatten, and dense layers (convolutional and pooling layers for feature extraction and dense layers for classification).This architecture is suitable for processing 1D sequences, such as time series data or any other 1D data [11].The architecture was implemented through standard sklearn and keras Python libraries.

Training and Testing process
The parameters (number of hidden layers, batches, and epochs) for the CNN in the MLCSA were selected and tuned using a process of random search hyper-parameter optimisation to find the best values for each [11].Each set of parameters were evaluated by checking the effect on the model performance, using common methods such as a confusion matrix, accuracy, precision, recall, and cross-validation (CV) score [11,14].These evaluation methods all compare the known labels to the model predicted labels and all except the confusion matrix are recorded as a percentage.This process was repeated until there were no further improvements to the evaluation results (results didn't increase in value), and at that point the CNN was concluded to have reached peak performance.The final parameters that provided the best results (shown in section 3.1), and therefore the ones used in the final CNN model were: 4 layers (3 hidden), 128 batches, and 30 epochs.The confusion matrix for the final model is shown in figure 6, where a perfect model would have nonezero values only on its main diagonal (top left to bottom right) [11].Once the CNN parameters were tuned, the model was trained using the 80% split part of the full training data set, which had been processed as described in the steps in figure 4. The final labelled training pulses were passed to the CNN model, which mapped and learnt features in the pulses based on their label.Then the performance of the CNN was evaluated (accuracy, precision, recall, and CV score), where the trained model was passed only the pulses and not the labels, so that the predictions could be made by the CNN and then compared to the actual classifications.Then, the model was passed the pulses without labels from the 20% test set, and its predictions were evaluated in the same way.The model had never seen the pulses of the test set, and so if the performance evaluations in classifying the two sets (training and testing) are similar, then the model can be seen as generalisable to new data and not likely to overfit.

Live data process
Once the CNN model was confirmed to be performing well, it was incorporated into the digital Compton suppression flow process (figure 3) as the MLCSA step.The processed pulses for a new, live data set, containing pulses from Am-241 and Co-60, were passed to the MLCSA, where the output was pulses with predicted labels of photopeaks (1) or scatter (0).

Spectrum
The final step in the digital Compton suppression procedure, shown in figure 3, is the production of a spectrum using the MLCSA's predictions, where pulses predicted and labelled as photopeaks (1) were extracted and plotted as a final 'peak only' spectrum and the scatter pulses (labelled 0) were discarded.

MLCSA performance on the test data set
The CNN model, which was trained on the 80% data set, was evaluated with the 20% subset of data.The evaluation results for the test data are shown in table 1, which shows that the CNN performed strongly, with all evaluation metrics greater than 70%.The high CV score indicated that the model is generalisable and not likely to over-fit on new data.

MLCSA performance on the live data set
Once the CNN model was trained and evaluated on the small 20% data set, it was used in the MLCSA with the live data set, comprising an Am-241 and Co-60 source simultaneously positioned 10 cm away from the detector.The data were collected and processed through the digital Compton suppression flow from figure 3, which included the second pulse processing pipeline without the greyed out processes from figure 4. The result of the MLCSA is shown in figure 7, which shows the before and after spectra.

Spectrum evaluation metrics
Three metrics were used to quantify and evaluate the Compton suppression performance of the MLCSA on the final spectrum.
The first was MDA improvement of the Am-241 photopeak.The MDA is a calculation of the lowest measurable activity for a specific set up, and was calculated using the Currie method [15,16] as where B is the background sum including a region of 10 keV either side of the photopeak, br is the branching ratio, ε is efficiency, and t is count time.
The second metric was signal to noise ratio (SNR).The signal is defined as the net area of the photopeak above the background level and the noise is defined as the area below the peak [17]: where an = (af − bg) (3) and where an is net peak area, bg is the background area underneath the photopeak, af is the full area, C is the number of channels in the photopeak, B1 and B2 are the numbers of counts at the start and end of the photopeak, respectively [17].
The final metric was the percentage difference between before and after the MLCSA was applied, including: decrease in MDA of the Am-241 photopeak, reduction in photopeak counts, reduction in max Compton counts on the spectrum for a range of energies, and SNR improvement of Am-241.These were calculated as where b and a refer to the relevant value before and after the MLCSA, respectively.

Spectrum evaluation performance
The MLCSA was able to significantly reduce the Compton continuum throughout the whole spectrum, with the largest reductions at the lower energy region (the 200 keV region was reduced by ∼ 90%), while almost fully preserving the Am-241 photopeak (the 59 keV photopeak was only reduced by 15%).The Co-60 photopeaks were preserved, but were significantly reduced in counts (∼ 78%).However, it should be noted that the lower energy (1172 keV) Co-60 photopeak was not provided in the training data set, and so the presence of the photopeak after the MLCSA suggests the CNN model is able to generalise to unseen photopeak pulses.
The percentage reduction in photopeaks in the before and after spectrum was calculated from the data shown in figure 7. The reductions for the 59 keV Am-241 photopeak, 1173 keV / 1332 keV Co-60 photopeaks, and the overall counts reduction in two scatter areas, defined as a low scatter region at 200 keV and a high scatter region at 400 keV, are shown in table 2. The percentage difference was significant on the scatter regions after the MLCSA, while remaining desirably low for the low energy Am-241 photopeak.The Co-60 peaks were reduced significantly, however this is similar to the physical Compton veto system as discussed in Section 1 [5] and is an improvement on the output of other digital methods [6] which completely remove the higher energy photopeaks.The SNR was calculated on the Am-241 photopeak using Equations 2 -4 for the before and after spectrum, producing values of 1.23 and 1.83 respectively.This produced a significant improvement in the SNR of the Am-24 photopeak, with a percent increase of 49% (calculated using equation 5), which indicates a reduction in the Compton background at the 59 keV photopeak.
The MDA of the Am-241 59 keV photopeak was calculated using Equation 1 for the before and after MLCSA spectra, and the improvement was calculated using Equation 5.The MDA before was 156 Bq and after was 76 Bq, this produced a reduction of 51%.Compared to other digital methods that achieved an MDA reduction of 37% [6], this is a significant improvement.

Discussion
The MLCSA reduced the Compton continuum and improved the SNR ratio of the low energy Am-241 photopeak in the presence of higher energy γ emitters.The resulting spectrum from the MLCSA on the Am-241 and Co-60 spectrum show that the Compton continuum from Co-60 scatter can be reduced by ∼ 90% using machine learning (specifically a basic 1D sequential CNN), while retaining the Am-241 and Co-60 peaks.The SNR ratio of the Am-241 peak was improved by 49%, and the MDA improved by 51% -this will lead to better detection and more accurate activity quantification, especially in the case where the lower energy photopeaks are present in lower quantities.This shows the potential for improving Compton suppression systems as this method is similar in performance to the physical Compton veto ring, but with a digital advantage that it could be readily adapted to other detectors.This work did not involve complex modelling of a detector, but instead relied on measuring a set of reference sources.Therefore, this could be applied to other detectors as the CNN model would only need re-training with pulses from that detector.This means the MLCSA is potentially detector agnostic, which could be demonstrated with further work and development.Other future developments of this algorithm could include: making the MLCSA run in live time, including more nuclides to the training and testing sets to make it more widely applicable, and improving the machine learning model performance.
With the significant improvement to low energy nuclide detection in the form of Compton suppression, the MLCSA has the potential to improve fusion diagnostics by improving the detection of key low energy nuclides.As an example, in foil activation experiments, information about the plasma can be calculated from key lines in the gamma spectrum following irradiation of activation foils in a fusion environment [1,5,[18][19][20][21][22]. The lines of interest are often low energy, can be low abundance, and are often obscured by the Compton scattering of higher energy activation products.The MLCSA could be applied to such spectra, which would reduce the Compton continuum and improve the analysis and therefore could reduce the errors on the fusion power calculations.
The MLCSA developed in this work has the potential to have a broader impact on the nuclear industry.Gamma spectroscopy is a method used in radioactive waste management and disposal (fission and fusion [23]), where disposal costs are calculated per unit of activity [24].Reduced Compton backgrounds from the MLCSA could make gamma spectroscopy systems more accurate, which could have financial benefits relating to disposal costs.

Fig. 1 a
Fig. 1 a) Gamma-ray interactions in a HPGe crystal: one is fully photoelectrically absorbed and produces a full energy photopeak (blue/dark) and the other Compton scatters and contributes to the Compton continuum (green/light).b) Example pulses from each interaction type.c) A typical energy spectrum showing Compton and photoelectric events

Fig. 3
Fig. 3 Digital Compton suppression flow diagram showing the four components, including the MLCSA

Fig. 5
Fig. 5 Examples of the final training pulses, with a Co-60 photopeak pulse in blue, a Co-60 Compton scatter pulse in green, and an Am-241 pulse in red

Fig. 6
Fig. 6 Confusion matrix for the pulses, corresponding to the optimal model parameters during training, where S is scatter and P is photopeak.The rows represent the true label and the columns represent the predicted labels

Fig. 7
Fig. 7 Mixed Am-241 and Co-60 spectrum before (line) and after (filled) the MLCSA.Top shows the full spectrum and bottom is zoomed along the vertical axis

Table 2
Evaluation