Introduction

Axions are hypothetical bosons predicted by the Peccei–Quinn (PQ) mechanism, which explains the absence of CP violation in quantum chromodynamics (QCD) [1,2,3,4]. Axions could also constitute the cold dark matter (DM) of the universe [5,6,7]; their existence would therefore resolve two major problems in physics. Astrophysical constraints [8] require the axion mass to be very small, \(m_a < {20}~\hbox {meV}\), and various cosmological models constrain the mass range even further [9,10,11,12,13].

Microwave cavity haloscopes [14, 15] such as ADMX [16, 17], ORGAN [18], HAYSTAC [19] and CULTASK [20] are one class of experiments searching for low-mass DM axions. These experiments are based on axion–photon conversion in the presence of a strong magnetic field. A newer experimental design is the dielectric haloscope, which will be able to probe higher masses and frequencies than the experiments mentioned above. The MAgnetized Disk and Mirror Axion eXperiment (MADMAX) [21] will be the first realisation of a dielectric haloscope, with the goal of reaching sensitivity to the QCD axion.

Fig. 1: Preliminary baseline design of the MADMAX approach (figure reproduced from Ref. [21]). The experiment can be divided into three parts: (1) the magnet (red racetracks); (2) the booster, consisting of the mirror (copper disk at the far left), the 80 dielectric disks (green) and the system to adjust the disk spacing (not shown); (3) the receiver, consisting of the horn antenna (yellow) and the cold preamplifier inside a separate cryostat. The focusing mirror is shown as an orange disk on the right. Not to scale.

In a dielectric haloscope, microwave radiation is emitted from the surfaces of dielectric disks as a result of axion–photon conversion at the interface between the disk and the vacuum [22]. The signal from a single interface would be too weak to detect; it is therefore amplified (boosted) through resonance and constructive interference from multiple disks. The enhancement relative to a single metal surface is the boost factor \(\beta\). The frequency and bandwidth at which \(\beta\) is maximal can be adjusted through disk placement: even for fixed disk thicknesses, varying the distances between disks allows one to scan a large frequency range. MADMAX aims to probe the frequency range of 10–100 GHz. A prototype of the MADMAX experiment will consist of 20 disks, while the final experiment will likely use around 100 disks. Figure 1 shows a sketch of the planned experiment.

Simulations of the experiment can be used for sensitivity estimates as well as to constrain experimental boundary conditions, such as the required mechanical precision of the disk placement system. Various approaches can be used to simulate the experiment at different levels of detail. So-called analytical 1D methods [22] are used in an idealised context, where the disk size is taken to be infinite and some design aspects are neglected. Full 3D simulations [23] are more realistic but place high demands on computing resources and quickly become prohibitive, especially when optimising with a large number of disks and hence a large number of degrees of freedom. One example where alternative simulation approaches may be helpful is the optimisation of the disk positions for a desired frequency and bandwidth of the boost factor. This optimisation must be done iteratively, simulating the electromagnetic response of the experiment for a fixed set of disks many times; the disks are readjusted in each iteration until the desired boost factor curve is reached.

Applications of gradient descent to find optimal disk configurations have already been explored [24]. In this work, however, we demonstrate a proof of principle that machine learning methods are capable of reproducing the actual electromagnetic response of the experiment. Specifically, we train a neural network that predicts a boost factor curve from a set of disk positions.

Simulation and Dataset

To generate the training data, the analytical 1D simulation is used to calculate the boost factor curves corresponding to various configurations of the disk positions. The 1D simulation assumes a perfect idealised geometry with 20 planar disks of infinite diameter and \({1}~\hbox {mm}\) thickness and a mirror on one side of the cavity. The total length of the apparatus is variable but on the order of a few metres. The calculations are based on the modified axion–Maxwell's equations, from which the axion-induced emission of electromagnetic radiation at an interface between dielectric and vacuum is derived. Each interface has an incoming and an outgoing wave that satisfy the overall boundary conditions of the system. To calculate the resulting signal that emerges from the system, a transfer matrix formalism [22] is applied.
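
For illustration, the following minimal Python sketch implements the transfer-matrix idea for a mirror followed by alternating vacuum gaps and dielectric disks. It is not the collaboration's code: the loss-free 1D treatment, the disk permittivity \(\epsilon \approx 24\) (LaAlO\(_3\)) and the normalisation \(E_0 = 1\) are assumptions made here for brevity.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def power_boost(gaps_m, freq_hz, eps_disk=24.0, d_disk=1e-3):
    """Power boost factor beta^2 of a mirror + N-disk stack at one frequency.

    gaps_m: vacuum gaps [m] between mirror/disks, ordered left to right.
    """
    # Regions to the right of the mirror: gap, disk, gap, disk, ...,
    # ending in a semi-infinite vacuum whose thickness is irrelevant.
    regions = []
    for g in gaps_m:
        regions.append((1.0, g))                      # vacuum gap
        regions.append((np.sqrt(eps_disk), d_disk))   # dielectric disk
    regions.append((1.0, 0.0))                        # semi-infinite vacuum

    def march(a, b, with_source):
        """Carry the right-/left-moving wave amplitudes through the stack."""
        for (n1, t), (n2, _) in zip(regions[:-1], regions[1:]):
            k = 2 * np.pi * freq_hz * n1 / C
            a, b = a * np.exp(1j * k * t), b * np.exp(-1j * k * t)  # propagate
            # Match E and H across the interface; the axion-induced field
            # E_a = -E0 / n^2 enters as a constant source offset (E0 = 1).
            dA = (-1.0 / n1**2 + 1.0 / n2**2) if with_source else 0.0
            s, d = a + b + dA, (n1 / n2) * (a - b)
            a, b = (s + d) / 2, (s - d) / 2
        return a, b

    a_p, b_p = march(0.0, 1.0, True)    # particular solution (E = 0 on mirror)
    a_h, b_h = march(1.0, -1.0, False)  # homogeneous solution, same mirror BC
    t = -b_p / b_h                      # no wave incoming from the right
    return abs(a_p + t * a_h) ** 2      # emitted amplitude over E0, squared
```

By construction, a bare mirror (no disks) returns 1, reproducing the single-surface reference against which the boost is defined.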

The reference boost factor curves and their corresponding disk configurations for frequencies covering a range from 19 to 25 GHz with a bandwidth of \({50}~\hbox {MHz}\) are found through iterative optimisation of the disk placement using the Nelder–Mead algorithm [25]. The bandwidth of the curve is considered during the optimisation procedure, in which the target of the objective function is defined by the lowest value of the boost factor in the desired frequency window.
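
A hedged sketch of such an optimisation loop, built on the `power_boost` sketch above and using `scipy.optimize.minimize` with the Nelder–Mead method; the 22 GHz window centre, the starting gaps and the tolerances are illustrative placeholders, not the values used for the actual reference configurations.

```python
import numpy as np
from scipy.optimize import minimize

def objective(gaps_mm, f_lo=21.975e9, f_hi=22.025e9, n_pts=25):
    """Negative of the smallest boost factor inside the 50 MHz target
    window: maximising this minimum flattens the curve into a box shape."""
    freqs = np.linspace(f_lo, f_hi, n_pts)
    return -min(power_boost(gaps_mm * 1e-3, f) for f in freqs)

x0 = np.full(20, 7.0)  # rough half-wavelength gaps (~7 mm near 22 GHz)
result = minimize(objective, x0, method="Nelder-Mead",
                  options={"maxiter": 20000, "xatol": 1e-4, "fatol": 1.0})
optimised_gaps_mm = result.x
```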

Generating a truly random dataset by varying the disk spacings without any constraints would currently be prohibitive, as the fraction of "experimentally interesting" boost factor curves would be low. Experimentally interesting curves are box-like curves of non-negligible bandwidth in the desired frequency range. Instead, the pre-optimised reference configurations are used as seeds. For each seed configuration, the disk positions are randomly varied with a uniform distribution within a range corresponding to 5% of the disk thickness (\({0.05}~\hbox {mm}\)). These pseudorandom curves are then filtered to have bandwidths (FWHM) between \({40}~\hbox {MHz}\) and \({200}~\hbox {MHz}\) and boost factor maxima between 15,000 and 25,000. This requirement rejects about 90% of the generated configurations.
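
In code, the sampling and filtering step could look roughly as follows; the sampling grid, the crude FWHM estimate from the half-maximum crossings and the helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
freqs = np.linspace(21.8e9, 22.2e9, 300)  # illustrative 300-point grid

def perturb(seed_gaps_m, scale=0.05e-3):
    """Vary each disk position uniformly by +-5% of the disk thickness."""
    gaps = seed_gaps_m + rng.uniform(-scale, scale, size=len(seed_gaps_m))
    return gaps, np.array([power_boost(gaps, f) for f in freqs])

def is_interesting(curve):
    """Keep box-like curves: FWHM in [40, 200] MHz, peak in [15k, 25k]."""
    peak = curve.max()
    above = freqs[curve >= peak / 2]   # crude FWHM from half-max crossings
    fwhm = above.max() - above.min() if above.size else 0.0
    return 15_000 <= peak <= 25_000 and 40e6 <= fwhm <= 200e6
```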

In addition, by the area law the integral of the boost factor over the entire frequency space is constant, regardless of the disk positions [22]. This property can be used to check whether a configuration is sampled in the correct frequency window. Configurations whose sampled integral amounts to less than 95% of the area-law expectation are discarded. Figure 2 shows an optimised reference boost factor curve and several random curves generated according to the above prescription.
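
The area-law cut then reduces to comparing the sampled integral against the expected value, e.g. taken from the seed curve; a minimal sketch:

```python
def area_ok(curve, freqs, expected_area, threshold=0.95):
    """Reject configurations whose peak has drifted out of the sampled
    window, i.e. whose integral falls below 95% of the area-law value."""
    area = np.sum(curve * np.gradient(freqs))  # trapezoid-like quadrature
    return area >= threshold * expected_area
```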

Fig. 2: Nominal pre-optimised boost factor curve (light blue line) with a bandwidth of \({50}~\hbox {MHz}\), together with several boost factor curves whose disk positions were randomly varied according to a uniform distribution with a maximum variation of 5% of the disk thickness.

As can be seen, even these small variations of the disk positions result in significant deformations of the boost factor curve.

Fig. 3: Model architecture. All layers except the last use PReLU as the activation function, with the alpha initializer set to 0.25.

The final dataset contains 169,260 configurations and corresponding boost factor curves, sampled at 300 points each along the frequency space. The dataset is further divided into training, validation and testing sets with 122,290, 21,581 and 25,389 configurations respectively.

Network Architecture and Training

Inputs

In order to build a model capable of predictions at arbitrary frequency values, the input data is reshaped such that each entry consists of an array of distances between the disks and a frequency value at which the boost factor is sampled. The corresponding truth target is then the boost factor value at that frequency. A single configuration, consisting of 21 positions (20 disks, 1 mirror) with corresponding boost factor values at 300 frequency values, thus becomes 300 individual training examples of 22 input values each (\(22 \cdot 300 = 6600\) values in total). To facilitate training, the input data and the truth target are each transformed into a uniform distribution using a Quantile Transformer [26].
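
As a sketch, assuming arrays `configs` of shape (N, 21), `curves` of shape (N, 300) and a common frequency grid `freqs` have already been loaded, the reshaping and preprocessing could be written as:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

def flatten_dataset(configs, curves, freqs):
    """Expand each configuration into 300 rows of 21 distances + 1 frequency,
    with the boost factor at that frequency as the regression target."""
    n, m = curves.shape
    x = np.hstack([np.repeat(configs, m, axis=0),   # (N*300, 21)
                   np.tile(freqs, n)[:, None]])     # (N*300, 1)
    return x, curves.reshape(-1, 1)

X, Y = flatten_dataset(configs, curves, freqs)
qt_x = QuantileTransformer(output_distribution="uniform").fit(X)
qt_y = QuantileTransformer(output_distribution="uniform").fit(Y)
X, Y = qt_x.transform(X), qt_y.transform(Y)
```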

Model and Training

The model consists of a 1D convolutional layer [27] with 128 filters, followed by 1D max pooling and flattening, then two dense layers with 512 nodes each and a single output node. Each layer, except the last, uses batch normalisation [28], dropout [29] at a rate of 0.2 and PReLU [30] as the activation function with an alpha initializer of 0.25, following the recommended initialisation of the weights with the He uniform initializer. The structure of the network is shown in Fig. 3.
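
A Keras sketch of this architecture is given below. The convolution kernel size, the pooling width and the treatment of the 22 inputs as a length-22, single-channel sequence are assumptions, since the text does not fix them; the \(L^2\) penalty mentioned in the next paragraph is realised here as kernel regularisation, which is one possible reading.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def prelu():
    # PReLU with alpha initialised to 0.25, as in the text
    return layers.PReLU(alpha_initializer=tf.keras.initializers.Constant(0.25))

inputs = layers.Input(shape=(22, 1))  # 21 distances + 1 frequency, 1 channel
x = layers.Conv1D(128, kernel_size=3, kernel_initializer="he_uniform")(inputs)
x = layers.BatchNormalization()(x)
x = prelu()(x)
x = layers.Dropout(0.2)(x)
x = layers.MaxPooling1D()(x)
x = layers.Flatten()(x)
for _ in range(2):  # two dense blocks of 512 nodes
    x = layers.Dense(512, kernel_initializer="he_uniform",
                     kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.BatchNormalization()(x)
    x = prelu()(x)
    x = layers.Dropout(0.2)(x)
outputs = layers.Dense(1)(x)  # single boost factor value
model = tf.keras.Model(inputs, outputs)
```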

The Adam optimiser [31] is used for training in its default configuration with a learning rate of \(10^{-5}\). Mean absolute error (MAE) serves as the loss function, supplemented by an L\(^2\) penalty with a weight of \(10^{-4}\). Early stopping in combination with model checkpoints is used to prevent overtraining. The batch size is 1024.
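
The corresponding training setup might look as follows; the patience, epoch count and checkpoint path are placeholders, and `X_train`, `Y_train`, `X_val`, `Y_val` are assumed to come from the split described above.

```python
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="mae")  # L2 penalty enters via the kernel regularisers

callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
]
history = model.fit(X_train.reshape(-1, 22, 1), Y_train,
                    validation_data=(X_val.reshape(-1, 22, 1), Y_val),
                    batch_size=1024, epochs=500, callbacks=callbacks)
```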

The presented model is chosen based on a cursory search and evaluation of various architectures, including layer type, layer number, number of nodes, activation functions, cost function, weight initialisation, and preprocessing.

Generalization Across Frequency Phase Space

In addition to the usual measures for controlling overtraining, such as the aforementioned regularisation and early stopping, it is of interest to see whether the model can generalise reasonably well within the target phase space. The expectation is that the network should be able to predict the boost factor values at arbitrary frequency values within the target range, because the inputs are effectively smooth. To that end, a new training is conducted with a modified training set in which certain frequency ranges are blinded. Specifically, we remove frequency bands of \({500}~\hbox {MHz}\) around each half-integer frequency between \({19}~\hbox {GHz}\) and \({25}~\hbox {GHz}\), keeping about half of the original data for training. The validation and test sets are kept the same as for the baseline training.
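
The blinding itself amounts to a simple mask on the training frequencies, sketched below; `X_train_freqs` is a hypothetical array holding the frequency column of the training inputs.

```python
import numpy as np

def blinded(freqs_hz, width=500e6):
    """True for frequencies within a 500 MHz band around each half-integer
    GHz value (19.5, 20.5, ..., 24.5 GHz), i.e. about half of the range."""
    centers = np.arange(19.5e9, 25.0e9, 1.0e9)
    return np.any(np.abs(freqs_hz[:, None] - centers) < width / 2, axis=1)

train_mask = ~blinded(X_train_freqs)  # keep only rows outside the bands
```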

Fig. 4: Example predictions with a "good" boost score of \(s \approx 0.15\) (light blue) and a "poor" boost score of \(s \approx 0.61\) (dark blue), together with the truth (orange).

Performance Evaluation

Evaluating the Prediction

The prediction task is somewhat atypical, so common performance metrics such as the mean absolute error (MAE),

$$\begin{aligned} \text {MAE} = \frac{1}{N} \sum _i \left| y_{\mathrm{true}}^i - y_{\mathrm{prediction}}^i \right| , \end{aligned}$$

are not ideally suited to describing how well a predicted boost factor curve matches the truth. Important features of interest are the peak position and the bandwidth. However, the peak position often cannot be determined unambiguously due to the typical double-peak structure, which can be seen in Fig. 4. Furthermore, we are interested in the overall shape rather than the accuracy at any specific point. We therefore define a new metric suitable for this task by comparing the overlapping areas of the curves. This metric is named the boost score in the following and is defined as

$$\begin{aligned} s = 1 - \frac{2 \cdot A_o}{A_t + A_p}, \end{aligned}$$
(1)

where the score s is given by the area of the truth curve \(A_t\), the area of the prediction curve \(A_p\), and the overlapping area \(A_o\). Thus, \(s \in [0,1]\), where 0 represents a perfect prediction and 1 represents the largest possible deviation from the truth.

Figure 4 illustrates the behaviour of the boost score metric and shows that the boost score responds both to the peak position and width intuitively, making it a suitable measure of performance for this problem.
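
For curves sampled on a common frequency grid, the boost score of Eq. (1) can be computed directly, taking the overlap as the integral of the pointwise minimum of the two curves; a minimal sketch:

```python
import numpy as np

def boost_score(freqs, truth, pred):
    """Boost score s = 1 - 2*A_o / (A_t + A_p); 0 is a perfect overlap."""
    df = np.gradient(freqs)                     # local bin widths
    a_t = np.sum(truth * df)                    # area under the truth curve
    a_p = np.sum(pred * df)                     # area under the prediction
    a_o = np.sum(np.minimum(truth, pred) * df)  # overlapping area
    return 1.0 - 2.0 * a_o / (a_t + a_p)
```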

Results

To investigate the stability of the selected architecture, the trainings were each repeated five times. These five runs differ only in the random initialisation of the network weights. The consistent behaviour across runs indicates good robustness of the chosen architecture; small differences are to be expected due to the stochastic nature of the training.

The overall performance of the trained model is shown in Fig. 5, where the bin-wise mean score of the five runs is shown with its standard error. Empirically, a score below \(\approx ~0.3\), i.e. covering \(70\%\) of the expected boost factor area, is sufficiently accurate to be of experimental value for MADMAX boost factor predictions. For the baseline training, \(89\%\) of the predictions reach this target. For the blinded training, \(74\%\) of the predictions in the blinded frequency range and \(84\%\) of the predictions in the visible range meet this target.

Fig. 5: Boost score distribution averaged over five trainings of the baseline configuration (dark blue), together with the distributions for the blinded training, split between the blinded and visible frequency ranges.

To provide a sense of the prediction quality, randomly sampled curves from the test set are shown in Fig. 6 at several boost score percentiles for both the baseline (left) and blinded (right) trainings.

Fig. 6: Several randomly selected predictions from the test set at the 20th, 50th and 80th boost score percentiles for the baseline (left) and blinded (right) trainings. The solid line represents the simulated "truth" boost factor curve, the dashed line the prediction of the network; the corresponding boost score is given. The blinded region is shown in green.

In general, the model reproduces the peak position and the bandwidth sufficiently well, although the detailed peak structure is typically not captured. Comparing the baseline with the blinded training shows that the model is able to interpolate across masked phase space, albeit with some loss of performance, indicating good generalization capabilities.

A major promise of using deep learning to estimate the boost factor is computational efficiency. The analytical 1D simulation takes about \({80}~\hbox {s}\) to generate the curves of the test set, corresponding to about \(320~\text {curves/s}\). The corresponding values for a much more realistic three-dimensional simulation range from \({1400}~\hbox {s}\) (\(18~\text {curves/s}\)) to \(80\times 10^{6}~\hbox {s}\) (\(0.0003~\text {curves/s}\)), depending on the desired accuracy. Compared to the analytical 1D code, the network prediction is faster by about a factor of 5. When compared to the more realistic 3D simulations, the speedup ranges from roughly a factor of a hundred up to several million, depending on the selected precision. Notably, the prediction speed of the network is independent of the simulation used to generate the training data.

Conclusion

In this paper we present a first application of deep learning methods to model the boost factor response of an axion haloscope experiment. Although the current implementation is limited in the phase space it covers, it is capable of predictions accurate enough to be of experimental value, while also being very robust. Furthermore, we show that the model is able to predict into fully blinded frequency regions, demonstrating its ability to generalize, albeit with somewhat reduced prediction quality.