1 Introduction

Machine learning has proven to be a valuable tool in reconstruction and analysis tasks for high-energy physics (HEP) [1]. In particular, the classification of signal and background using machine learning algorithms has sparked significant interest in recent times. The majority of these algorithms are trained in a supervised manner, and therefore rely on a prior definition of signal provided by a theoretical framework and simulations. However, detector signatures corresponding to elusive processes of physics beyond the standard model (BSM) might be missed owing to a narrow signal definition or a mis-modelling of either signal or background.

Unsupervised and semi-supervised methods aim at the identification of signal features while minimizing assumptions about signal or background. A data-driven approach is adopted, allowing for a model-agnostic analysis that is independent of theoretical assumptions and therefore not confined to specific signal hypotheses and background modelling. In this paper, we present two unsupervised machine learning methods with the objective of identifying anomalous detector patterns that are potential indicators of unaccounted physics scenarios. The anomaly detection can be considered as a sophisticated filter mechanism that defines a potential signal region for further statistical analysis. Here, we focus on the filter itself.

The inner region of the Belle II detector is considered to demonstrate this filtering approach. The input data consists of pixel hits coming from the Belle II pixel detector, presently featuring one layer of DEPFET (Depleted P-channel FET) [2] silicon sensors. The unsupervised data-driven machine learning algorithms are trained on beam-background data recorded by the pixel detector when a single beam was circulating in the collider. Simulated pixel hits from hypothetical long-lived magnetic monopoles serve as anomalous events to evaluate the performance of the presented algorithms. The unsupervised techniques are compared to a neural network that is trained in a supervised manner.

The paper is structured as follows: in Sect. 2, the Belle II experiment and the pixel detector are introduced. Subsequently, the dataset and data preprocessing are described in Sect. 3. The different machine learning algorithms are introduced in Sect. 4 and their performance is presented in Sect. 5.

Fig. 1 Example clusters of single-beam background particles from the inner region of the PXD (pixel dimensions: \(50\,\upmu {\hbox {m}}\times {55\,\upmu {\hbox {m}}}\)). The seed pixel is located at the centre of the 9 \(\times \) 9 pixel matrix. The colour scale represents the single-pixel charge in ADC values

Fig. 2 Example clusters of simulated magnetic monopoles from the inner region of the PXD (pixel dimensions: \(50\,\upmu {\hbox {m}}\times {55\,\upmu {\hbox {m}}}\)). The seed pixel is located at the centre of the 9 \(\times \) 9 pixel matrix. The colour scale represents the single-pixel charge in ADC values

2 The Belle II experiment

The Belle II experiment, located at the SuperKEKB accelerator, started operation in spring 2019. SuperKEKB provides electron-positron collisions with a nominal centre-of-mass energy of 10.58 GeV and a design instantaneous luminosity of \(>\,{5\times {10}^{35}}\,{\hbox {cm}^{-2}\,\hbox {s}^{-1}}\) [3].

The Belle II detector is composed of several sub-detectors that are arranged cylindrically around the interaction point [4]. This paper only treats the innermost detector: an all-silicon pixel detector that is part of the Belle II tracking system, as detailed below. The silicon vertex detector (SVD), based on double-sided silicon strip sensors, and the central drift chamber (CDC), a cylindrical wire chamber, are the other two detectors making up the Belle II tracking system.

2.1 The Belle II pixel detector

The Belle II pixel detector (PXD) consists of pixelated DEPFET sensors that are arranged in two layers at radii of 14 mm and 22 mm from the beam pipe [5, 6]. Presently, only the first layer is installed. The PXD features nearly 4 million pixels with pixel sizes between \(50\,\upmu {\hbox {m}}\times {55\,\upmu {\hbox {m}}}\) and \(50\,\upmu {\hbox {m}}\times {85\,\upmu {\hbox {m}}}\) and a thickness down to \({75}\,{\upmu {\hbox {m}}}\).

The data rate coming from the PXD is foreseen to reach about 20 GB/s, which necessitates an online reduction scheme [7]. The FPGA-based data-reduction system, the online selection node (ONSEN), is able to reduce the data rate by a factor of 30 by using reconstructed tracks provided by the online event reconstruction, which are extrapolated to the PXD layers. A region-of-interest (ROI) is defined around the intercept of these tracks with the detector layers and only pixel hits within ROIs are considered [8]. However, this filtering mechanism relies on reconstructable particle tracks, as PXD hits outside an ROI are discarded by the ONSEN. In particular, particles with a low transverse momentum or high energy loss can escape tracking. As a consequence, no ROI is generated and the PXD data associated with these particles is lost.
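The ROI selection described above can be pictured with a short sketch. It assumes rectangular ROIs of fixed half-width in local (u, v) pixel coordinates; the function name `select_hits_in_rois`, the ROI shape and all numbers are illustrative assumptions and do not describe the actual ONSEN implementation.

```python
import numpy as np


def select_hits_in_rois(hits_uv, roi_centres_uv, half_width_uv):
    """Return a boolean mask that is True for hits inside at least one ROI.

    Assumption: each ROI is a rectangle centred on a track intercept,
    with the same half-width (in u and v) for all ROIs.
    """
    hits = np.asarray(hits_uv, dtype=float).reshape(-1, 2)
    centres = np.asarray(roi_centres_uv, dtype=float).reshape(-1, 2)
    if len(centres) == 0:
        # No reconstructed track, hence no ROI: every hit is discarded.
        return np.zeros(len(hits), dtype=bool)
    # Absolute distance per coordinate, shape (n_hits, n_rois, 2).
    delta = np.abs(hits[:, None, :] - centres[None, :, :])
    inside = np.all(delta <= np.asarray(half_width_uv, dtype=float), axis=2)
    return inside.any(axis=1)


# Toy example: three hits and a single track intercept (made-up numbers).
hits = np.array([[10.0, 20.0], [100.0, 200.0], [12.0, 25.0]])
rois = np.array([[11.0, 22.0]])
mask = select_hits_in_rois(hits, rois, half_width_uv=(5.0, 5.0))

# A particle without a reconstructable track produces no ROI at all.
no_track_mask = select_hits_in_rois(hits, np.empty((0, 2)), (5.0, 5.0))
```

In this toy case the second hit lies outside the ROI and is dropped, and with an empty ROI list all hits are dropped, which is the loss mechanism for non-reconstructable particles discussed in the text.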

To guarantee a high signal efficiency for particles with non-reconstructable tracks, a new veto system based on machine learning is proposed. A proof-of-principle for a veto system dedicated to the identification of slow pions has already been presented in the past [9]. In this paper, the cluster rescue veto system is extended to exotic or anomalous particle signatures that do not generate a reconstructable particle track. To assess the efficiency of the cluster rescue mechanism, we simulate the creation of long-lived hypothetical magnetic monopoles in the particle collision. As a consequence of their high energy loss, magnetic monopoles are stopped in the inner layers of the Belle II detector [10]. The lack of hits in the outer sub-detectors inhibits the reconstruction of tracks, which also leads to the deletion of PXD data associated with a monopole by the ONSEN once the ROI selection is switched on. The aim of the proposed veto system is to identify the relevant PXD data based on anomalous event signatures and to tag it to prevent deletion. We consider unsupervised machine learning algorithms to generate the veto, which could potentially run online during data-taking on FPGA-based systems.

Moreover, the tracking algorithms currently rely on information from the CDC to form a track. However, with the rescued PXD data, a novel tracking approach using only the silicon-based PXD and SVD detectors could be envisioned.

3 Data generation

PXD background data was recorded in dedicated beam-background runs taken in 2020. For these runs, only a single particle beam circulated in the Belle II detector. Background generation mechanisms such as the interaction of the circulating beam with residual gas particles in the beam pipe are responsible for the background hits detected in these runs. The PXD hits generated by background are characterised by a small charge signal in each pixel, as shown by the example clusters displayed in Fig. 1. The hits are nevertheless detected by the PXD due to the high signal-to-noise ratio of the DEPFET pixel sensors. In the cluster displays, the v-coordinate is along the beam direction and the u-coordinate is perpendicular to it. In view of future online applications of the investigated algorithms, the raw PXD hit information is used without applying an offline calibration.

Dedicated background samples are regularly produced in Belle II for background monitoring and as background-overlay files for simulations. It is thus possible to obtain updated background samples in the future, which allows for an adaptation of the classification algorithm to changing beam conditions.

The signal events are simulated using the official Belle II software framework basf2 [11]. The creation of monopole-antimonopole pairs from electron-positron collisions is considered. The magnetic charge of the monopole is set to 68.5 e in accordance with the Dirac theory [12, 13] and the mass to 3 GeV. A full detector simulation with basf2 is performed, including the interaction of particles with the PXD layers. The simulated PXD hits associated with magnetic monopoles are shown in Fig. 2.
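For orientation, the quoted charge of 68.5 e can be traced back to the Dirac quantisation condition. The relation below is the standard textbook form in Gaussian units and is not spelled out in the text; the value of the fine-structure constant, \(\alpha = e^{2}/(\hbar c) \approx 1/137.036\), is inserted here for illustration, and \(n = 1\) is assumed for the elementary Dirac charge \(g_D\).

\[
e\,g = \frac{n\,\hbar c}{2},\quad n \in \mathbb {Z}
\qquad \Rightarrow \qquad
g_D = \frac{\hbar c}{2e} = \frac{e}{2\alpha } \approx \frac{137.036}{2}\,e \approx 68.5\,e
\]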

The PXD information used as input for the machine learning algorithms is extracted as follows: for each simulated event and for the background events, the charge values of a 9 \(\times \) 9 pixel matrix are considered around the PXD hit with the highest charge value (seed pixel). The matrix size is sufficiently large to capture the entire cluster for the majority of events and small enough to guarantee a fast convergence of the investigated algorithms. In addition, the global position of the seed pixel within the PXD is extracted. In total, 84 features are considered, which are normalized to the range [0, 1] to avoid dominance of a single parameter.
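The extraction step can be illustrated with a minimal sketch. Several details are assumptions made for the example and are not taken from the text: the three features beyond the 81 pixel charges are taken to be three global coordinates of the seed pixel, charges are scaled by an assumed 8-bit ADC maximum (`adc_max=255`), position bounds are passed in by hand, and matrices near a sensor edge are zero-padded. The function name `extract_features` is hypothetical.

```python
import numpy as np


def extract_features(charge_map, seed_position, pos_min, pos_max,
                     adc_max=255.0, half=4):
    """Build a vector of (2*half+1)^2 pixel charges plus the seed position.

    All inputs other than the charge map are illustrative assumptions.
    """
    charge_map = np.asarray(charge_map, dtype=float)
    # Seed pixel: the hit with the highest charge value.
    seed_row, seed_col = np.unravel_index(np.argmax(charge_map),
                                          charge_map.shape)
    # Zero-pad so that a seed close to the sensor edge still yields
    # a full matrix (assumed edge treatment).
    padded = np.pad(charge_map, half, mode="constant")
    size = 2 * half + 1
    window = padded[seed_row:seed_row + size, seed_col:seed_col + size]
    # Normalise charges and position to the range [0, 1].
    charges = np.clip(window / adc_max, 0.0, 1.0).ravel()
    pos_min = np.asarray(pos_min, dtype=float)
    pos_max = np.asarray(pos_max, dtype=float)
    position = (np.asarray(seed_position, dtype=float) - pos_min) / (pos_max - pos_min)
    return np.concatenate([charges, np.clip(position, 0.0, 1.0)])


# Toy sensor with a two-pixel cluster at the edge (made-up numbers).
sensor = np.zeros((20, 30))
sensor[0, 5] = 200.0
sensor[1, 5] = 50.0
features = extract_features(sensor, seed_position=(1.0, -0.5, 2.0),
                            pos_min=(-2.0, -2.0, -5.0),
                            pos_max=(2.0, 2.0, 5.0))
matrix = features[:81].reshape(9, 9)  # seed pixel sits at matrix[4, 4]
```

With a 9 \(\times \) 9 matrix and three position values, the vector has 81 + 3 = 84 entries, matching the feature count given in the text.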

4 Machine learning techniques

We propose a sophisticated filter based on unsupervised machine learning algorithms to identify anomalous signatures in the PXD. The filter operates on a matrix-by-matrix basis and labels each 9 \(\times \) 9 matrix as anomalous or normal based on an anomaly score. While the range of the anomaly score depends on the selected algorithm, we adopt the convention that low values represent normal events and high values anomalous events.

4.1 Performance metrics

For all algorithms, the receiver operating characteristic (ROC) curve is obtained by scanning the signal efficiency \(\epsilon _S\) and recording the corresponding background efficiency \(\epsilon _B\), i.e. the fraction of background events that pass the selection, so that a small \(\epsilon _B\) corresponds to a high background rejection. The area under the curve (AUC) is commonly used as a figure of merit for the performance of classifiers [14]. For anomaly detection, a high background rejection is particularly desirable. Therefore, the signal efficiency is also studied at three operating points featuring a high background-rejection level, i.e. the signal efficiencies \(\epsilon _S(\epsilon _B = 10^{-2})\), \(\epsilon _S(\epsilon _B = 10^{-3})\) and \(\epsilon _S(\epsilon _B = 10^{-4})\) are extracted.
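These metrics can be computed from the anomaly scores alone, as the following sketch shows. It takes \(\epsilon _B\) to be the fraction of background events passing a threshold on the score, integrates the ROC curve with the trapezoidal rule, and picks the best signal efficiency whose \(\epsilon _B\) does not exceed the target. The function names and the Gaussian toy scores are assumptions for illustration and are unrelated to the actual PXD data.

```python
import numpy as np


def roc_curve(scores_sig, scores_bkg):
    """Scan thresholds on the score; return (eps_b, eps_s), eps_b ascending."""
    scores_sig = np.sort(np.asarray(scores_sig, dtype=float))
    scores_bkg = np.sort(np.asarray(scores_bkg, dtype=float))
    thresholds = np.unique(np.concatenate([scores_sig, scores_bkg]))[::-1]
    # Fraction of events with score >= threshold.
    eps_s = 1.0 - np.searchsorted(scores_sig, thresholds, side="left") / len(scores_sig)
    eps_b = 1.0 - np.searchsorted(scores_bkg, thresholds, side="left") / len(scores_bkg)
    # Prepend the (0, 0) point of a threshold above all scores.
    return np.concatenate([[0.0], eps_b]), np.concatenate([[0.0], eps_s])


def area_under_curve(eps_b, eps_s):
    """Trapezoidal integral of eps_s over eps_b."""
    return float(np.sum(np.diff(eps_b) * 0.5 * (eps_s[1:] + eps_s[:-1])))


def signal_efficiency_at(eps_b, eps_s, target_eps_b):
    """Largest eps_s among thresholds with eps_b <= target_eps_b."""
    return float(eps_s[eps_b <= target_eps_b].max())


# Toy scores (assumption): background ~ N(0, 1), signal ~ N(3, 1).
rng = np.random.default_rng(0)
toy_bkg = rng.normal(0.0, 1.0, 100_000)
toy_sig = rng.normal(3.0, 1.0, 100_000)
eps_b, eps_s = roc_curve(toy_sig, toy_bkg)
auc = area_under_curve(eps_b, eps_s)
working_points = {t: signal_efficiency_at(eps_b, eps_s, t)
                  for t in (1e-2, 1e-3, 1e-4)}
```

Note that an operating point at \(\epsilon _B = 10^{-4}\) is only meaningful if the evaluation set contains well over \(10^{4}\) background events, since otherwise only a handful of events determine the threshold.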

The uncertainty is extracted by repeating the training and evaluation five times with random shuffling of the input vectors, i.e. a vector in the training set in the first iteration can be assigned to the evaluation set in the second one. In each iteration, the performance metrics are determined. Their mean represents the nominal value and the quadratic sum of the deviations from the mean represents the uncertainty.
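One possible reading of this prescription is sketched below. The split into separate upward and downward uncertainties is an assumption, motivated only by the asymmetric uncertainties quoted in the results; the text does not state how the two directions are treated. The function name and the input values are purely illustrative.

```python
import numpy as np


def nominal_and_uncertainty(values):
    """Mean of repeated measurements and quadratic sums of the deviations.

    Assumption: deviations above and below the mean are summed in
    quadrature separately to give an asymmetric uncertainty.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    deviations = values - mean
    up = np.sqrt(np.sum(deviations[deviations > 0] ** 2))
    down = np.sqrt(np.sum(deviations[deviations < 0] ** 2))
    return float(mean), float(up), float(down)


# Five made-up metric values standing in for the five iterations.
nominal, up, down = nominal_and_uncertainty([1.0, 2.0, 3.0, 4.0, 5.0])
```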

4.2 Multilayer perceptron (MLP)

First, a supervised multilayer perceptron (MLP) is considered, to which the unsupervised learning approaches are compared.

The supervised training is performed using 350 k background and signal events each. After each training epoch, a dedicated testing set containing an additional 150 k events for each class is presented to the MLP. The training is stopped automatically once the reduction of the predicted error (loss) on the testing set is only marginal. An evaluation set comprising 500 k events for each class is considered to assess the performance of the algorithm.
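The stopping rule can be expressed as a simple check on the sequence of testing-set losses. The sketch below is a generic early-stopping criterion; the threshold `min_delta` and the number of tolerated epochs `patience` are hypothetical parameters, since the text does not quantify what counts as a marginal reduction.

```python
def stopping_epoch(test_losses, min_delta=1e-4, patience=3):
    """Return the epoch at which training would stop, or None.

    Training stops once the testing loss has failed to improve on the
    best value by more than min_delta for `patience` consecutive epochs.
    Both parameters are illustrative assumptions.
    """
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(test_losses):
        if best - loss > min_delta:
            best = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch
    return None


# Made-up loss history: the improvement becomes marginal after epoch 2.
losses = [1.0, 0.5, 0.3, 0.2999, 0.29995, 0.29991, 0.1]
stop = stopping_epoch(losses, min_delta=1e-3, patience=3)
```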

4.3 Self-organizing maps (SOM)

A self-organizing map (SOM) is an unsupervised machine learning technique enabling the transformation of a high-dimensional dataset to a low-dimensional discrete grid, while keeping the topological structure [15, 16]. After training, vectors that are close in the high-dimensional input space are represented by adjacent grid points in the low-dimensional space. The same training, testing and evaluation sets as for the MLP are used.
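A minimal one-dimensional SOM can be written in a few lines, which may help to make the training rule concrete. The sketch uses the common online update with a Gaussian neighbourhood and linearly decaying learning rate and neighbourhood width; these schedules, the number of nodes and the uniform toy data are assumptions and do not reproduce the hyperparameters used in this study.

```python
import numpy as np


def train_som_1d(data, n_nodes=10, n_epochs=10, lr0=0.5, seed=0):
    """Train a SOM whose nodes are arranged on a line (1-D grid)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise node weights with randomly chosen input vectors.
    weights = data[rng.choice(len(data), n_nodes, replace=False)].copy()
    grid = np.arange(n_nodes)
    sigma0 = n_nodes / 2.0
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Nodes close to the best-matching unit on the grid move most.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_unit(weights, x):
    """Index of the grid node whose weight vector is closest to x."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))


# Toy data (assumption): 200 vectors with three features in [0, 1].
rng = np.random.default_rng(42)
toy = rng.uniform(0.0, 1.0, size=(200, 3))
som = train_som_1d(toy)
grid_positions = np.array([best_matching_unit(som, x) for x in toy])
```

For a one-dimensional grid, the index of the best-matching unit gives each input vector a position along a single line, which is what allows the grid to be used as a classification axis.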

4.4 Autoencoders (AE)

An autoencoder (AE) is a feed-forward multilayer neural network that aims to reproduce the input vector without using an identity mapping. It consists of two parts: an encoder and a decoder. While the encoder compresses the input to a lower-dimensional vector, the decoder reconstructs the original input from the reduced representation. The latent space in the centre of the AE is an information bottleneck that enforces the selection of relevant patterns from the input data.

During training, only background events are presented to the AE and their reconstruction error is minimized, making the AE specialized in the reproduction of background events. In the evaluation phase, the AE is able to recognize background events by a low reconstruction error. Signal events appear anomalous to the AE and are characterized by a high error, which can therefore serve as an anomaly/classification score [17].
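The principle of using the reconstruction error as an anomaly score can be demonstrated with a deliberately simplified model. The sketch below trains a linear autoencoder with a two-dimensional bottleneck by plain gradient descent; it is a stand-in for the multilayer AE of this study, not its architecture. The toy background lives close to a two-dimensional subspace of an eight-dimensional feature space, while the toy signal fills all eight dimensions; both datasets and all parameter values are assumptions.

```python
import numpy as np


def train_linear_autoencoder(x_bkg, n_latent=2, lr=0.2, n_steps=2000, seed=0):
    """Fit encoder/decoder matrices on background events only."""
    rng = np.random.default_rng(seed)
    n_events, n_features = x_bkg.shape
    w_enc = 0.1 * rng.standard_normal((n_features, n_latent))
    w_dec = 0.1 * rng.standard_normal((n_latent, n_features))
    scale = 2.0 / (n_events * n_features)
    for _ in range(n_steps):
        latent = x_bkg @ w_enc                 # compress to the bottleneck
        residual = latent @ w_dec - x_bkg      # reconstruction minus input
        # Gradients of the mean squared reconstruction error.
        grad_dec = scale * latent.T @ residual
        grad_enc = scale * x_bkg.T @ (residual @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


def anomaly_score(x, w_enc, w_dec):
    """Per-event mean squared reconstruction error."""
    return np.mean((x @ w_enc @ w_dec - x) ** 2, axis=1)


# Toy data (assumption): background near a 2-D subspace, signal isotropic.
rng = np.random.default_rng(1)
mixing = rng.standard_normal((2, 8)) / np.sqrt(8)


def toy_background(n):
    return rng.standard_normal((n, 2)) @ mixing + 0.01 * rng.standard_normal((n, 8))


train_bkg = toy_background(2000)
eval_bkg = toy_background(2000)
eval_sig = 0.5 * rng.standard_normal((2000, 8))

w_enc, w_dec = train_linear_autoencoder(train_bkg)
bkg_scores = anomaly_score(eval_bkg, w_enc, w_dec)
sig_scores = anomaly_score(eval_sig, w_enc, w_dec)
threshold = np.quantile(bkg_scores, 0.99)      # keeps 1% of the background
signal_efficiency = float(np.mean(sig_scores > threshold))
```

Because the model never sees signal during training, the separation relies entirely on the signal not fitting the structure learned from background, which is the property exploited by the filter.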

5 Results

In the following, the hyperparameters of the three algorithms are presented and the identification performance of hypothetical magnetic monopoles against beam background is studied. The evaluation for other example signals is presented in Appendix E.

Table 1 Hyperparameters for the multilayer perceptron
Table 2 Hyperparameters for the self-organizing maps
Table 3 Hyperparameters for the autoencoder

5.1 Hyperparameters

The hyperparameters of the three algorithms are listed in Tables 1, 2 and 3, and the architectures of the MLP and the AE are shown in Appendix A. All algorithms are optimised by performing a grid search over possible hyperparameters and selecting the ones yielding the highest signal efficiency at a low background level of \(10^{-4}\).

For the SOM, the dimension of the low-dimensional grid space is set to one to allow for a performance evaluation comparable to that of the other two machine learning techniques. The low-dimensional representation will therefore span only a single line, with the aim of having grid points that respond to background clusters at one end and to signal at the other end.

Fig. 3 MLP classification distribution for signal and background

Fig. 4 SOM classification distribution for signal and background

Fig. 5 AE classification distribution for signal and background

Table 4 Performance metrics of the investigated machine learning techniques for the evaluation set

5.2 Classification performance

The one-dimensional classification distribution is presented in Figs. 3, 4 and 5 for the three algorithms.

For the MLP, signal/background is suppressed by approximately three orders of magnitude for low/high classification values. In the case of the SOM, the trained one-dimensional grid serves as the classification axis and exhibits similar suppression factors to the MLP. For the AE, a significant overlap region is visible at low classification scores, but a high signal purity is achieved at high values. The reconstruction of example matrices for both signal and background is shown in Appendix B.

Fig. 6 ROC curves for the three machine learning algorithms: a supervised multilayer perceptron (MLP), an unsupervised self-organizing map (SOM) and an unsupervised autoencoder (AE)

The ROC curves for the three algorithms are presented in Fig. 6 and the performance results are listed in Table 4.

The AUC of the MLP reaches \(99.69^{+0.01}_{-0.01}\)%, which indicates an excellent classification performance. The AUC of the AE is about 1% lower and the one for the SOM about 3% lower. At a high background-rejection level of \(\epsilon _B = 10^{-4}\), the signal efficiency of the MLP deteriorates to \(39.5^{+3.6}_{-2.3}\)%. The efficiency for the AE still reaches \(60.1^{+3.3}_{-2.7}\)%, showing that the AE outperforms the other two algorithms if a high signal purity is demanded. The origin of the different classification performance of the MLP and the AE is investigated in Appendix D and the robustness of the AE against changing signal and background sets is shown in Appendix C.

Fig. 7 (a) Input and (b) reconstructed pixel matrices associated with background particles

6 Summary and outlook

The unsupervised identification of anomalous pixel-detector data using self-organizing maps and autoencoders was presented. To exemplify the approach, hypothetical magnetic monopoles at the Belle II pixel detector were simulated and identified against beam-background data. The two unsupervised algorithms have shown a performance similar to that of a supervised multilayer perceptron. The autoencoder outperforms the multilayer perceptron if high background-rejection levels are required.

This study is an essential cornerstone for future online applications of anomaly detection at the Belle II pixel detector in order to further improve its sensitivity to undiscovered physics. Moreover, the identification of anomalous data could be considered for experiment protection and data-quality monitoring as well, since anomalies in the data could hint at unsatisfactory beam stability or malfunctioning detector components.