Abstract
Dielectric axion haloscopes, such as the Madmax experiment, are promising concepts for the direct search for dark matter axions. A reliable simulation is a fundamental requirement for the successful realisation of the experiments. Due to the complexity of the simulations, the demands on computing resources can quickly become prohibitive. In this paper, we show for the first time that modern deep learning techniques can be applied to aid the simulation and optimisation of dielectric haloscopes.
Introduction
Axions are hypothetical bosons predicted by the Peccei–Quinn (PQ) mechanism. The PQ mechanism explains the absence of CP-violation in quantum chromodynamics (QCD) [1,2,3,4]. Axions could also be responsible for the cold dark matter (DM) of the universe [5,6,7]. Therefore, the existence of axions would resolve two major problems in physics. Due to astrophysical constraints [8] the mass of the axion would have to be very small \(m_a < {20}~\hbox {meV}\) and various cosmological models constrain the mass range even further [9,10,11,12,13].
Microwave cavity haloscopes [14, 15] such as Admx [16, 17], Organ [18], Haystac [19] and Cultask [20] are one class of experiments searching for low-mass DM axions. These experiments rely on axion–photon conversion in the presence of a strong magnetic field. The dielectric haloscope is a newer experimental design that will be able to probe higher axion masses, and hence higher frequencies, than the experiments mentioned above. The MAgnetized Disk and Mirror Axion eXperiment (Madmax) [21] will be the first realisation of a dielectric haloscope, with the goal of reaching the sensitivity required for the QCD axion.
In a dielectric haloscope, microwave radiation is emitted from the surfaces of dielectric disks as a result of axion–photon conversion at the interface between the disk and the vacuum [22]. The signal from a single interface would be too weak to detect, therefore the signal is amplified (boosted) through resonance and constructive interference from multiple disks. The enhancement compared to a single metal surface is the boost factor \(\beta\). The frequency and bandwidth at which \(\beta\) is maximal can be adjusted through disk placement. Even for fixed disk thicknesses, the variation of distances between disks allows one to scan a large frequency range. MADMAX aims at probing the frequency range of 10–100 GHz. A prototype of the MADMAX experiment will consist of 20 disks, while the final experiment will likely use around 100 disks. Figure 1 shows a sketch of the planned experiment.
Simulations of the experiment can be used for sensitivity estimates as well as to constrain experimental boundary conditions such as the required mechanical precision of the disk placement system. Various approaches can be used for the simulation of the experiment at different levels of detail. So-called analytical 1D methods [22] are used in an idealised context, where the disk size is taken to be infinite and some design aspects are neglected. Full 3D simulations [23] are more realistic but demand far more computing resources and quickly become prohibitive, especially when optimising a configuration with a large number of disks and therefore a large number of degrees of freedom. One example where alternative simulation approaches may be helpful is the optimisation of the disk positioning for a desired frequency and bandwidth of the boost factor. This optimisation is iterative: the electromagnetic response of the experiment is simulated for a fixed set of disks many times, and the disks are readjusted in each iteration until the desired boost factor curve is reached.
Applications of gradient descent to find optimal disk configurations have already been explored [24]. In this work, however, we demonstrate as a proof-of-principle that machine learning methods are capable of reproducing the actual electromagnetic response of the experiment. Specifically, we train a neural network that predicts a boost factor curve from a set of disk positions.
Simulation and Dataset
To generate the training data, the analytical 1D simulation is used to calculate the boost factor curves corresponding to various configurations of the disk positions. The 1D simulation assumes a perfect idealised geometry with 20 plane disks of infinite diameter and of \({1}~\hbox {mm}\) thickness and a mirror on one side of the cavity. The total length of the apparatus is variable but on the order of a few metres. The calculations are based on the modified axion–Maxwell’s equations from which the axion-induced emission of electromagnetic radiation from an interface between the dielectric and vacuum is derived. Each interface has an incoming and an outgoing wave that satisfy the overall boundary conditions of the system. To calculate the resulting signal that emerges from the system a transfer matrix formalism [22] is applied.
The reference boost factor curves and their corresponding disk configurations for frequencies covering a range from 19 to 25 GHz with a bandwidth of \({50}~\hbox {MHz}\) are found through iterative optimisation of the disk placement using the Nelder–Mead algorithm [25]. The bandwidth of the curve is considered during the optimisation procedure, in which the target of the objective function is defined by the lowest value of the boost factor in the desired frequency window.
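The objective described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used to produce the reference curves: `toy_boost` is a hypothetical stand-in for the 1D simulation with no physical meaning, and the function names and the number of sample points are invented for the example.

```python
import math

def toy_boost(spacings_mm, freq_ghz):
    """Hypothetical stand-in for the 1D simulation (NOT the real physics)."""
    mean_spacing = sum(spacings_mm) / len(spacings_mm)
    # Toy resonance: the peak frequency moves inversely with the mean spacing.
    peak_ghz = 150.0 / mean_spacing
    return 2.0e4 * math.exp(-((freq_ghz - peak_ghz) / 0.05) ** 2)

def window_objective(spacings_mm, f_lo_ghz, f_hi_ghz, n_points=26, boost=toy_boost):
    """Negative of the lowest boost factor inside the window (to be minimised)."""
    step = (f_hi_ghz - f_lo_ghz) / (n_points - 1)
    freqs = [f_lo_ghz + i * step for i in range(n_points)]
    return -min(boost(spacings_mm, f) for f in freqs)
```

A simplex minimiser such as `scipy.optimize.minimize(..., method="Nelder-Mead")` could then be applied to `window_objective`. Maximising the minimum over the window, rather than the peak value, is what favours box-like curves that stay high across the full \({50}~\hbox {MHz}\) band.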
Generating a truly random dataset by varying the disk spacings without any constraints would currently be prohibitive, as the fraction of “experimentally interesting” boost factor curves would be low. Experimentally interesting curves are box-like curves of non-negligible bandwidth in the desired frequency range. Instead, the pre-optimised reference configurations are used as seeds. For each seed configuration, the disk positions are randomly varied with a uniform distribution within a range corresponding to 5% of the disk thickness (\({0.05}~\hbox {mm}\)). These pseudorandom curves are then filtered to have bandwidths (FWHM) between \({40}~\hbox {MHz}\) and \({200}~\hbox {MHz}\) and maxima of the boost factor between 15,000 and 25,000. This requirement rejects about 90% of the generated configurations.
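The seeding and filtering step can be illustrated with the following sketch. It assumes that the \({0.05}~\hbox {mm}\) range is applied symmetrically around each seed position, which the text does not state, and it uses a simple sample-level FWHM estimate; all function names are hypothetical.

```python
import random

def perturb(seed_positions_mm, half_range_mm=0.05, rng=random):
    """Shift every disk position by an independent uniform offset (assumed symmetric)."""
    return [p + rng.uniform(-half_range_mm, half_range_mm) for p in seed_positions_mm]

def fwhm(freqs, boost):
    """Sample-level FWHM: span between the outermost samples at or above half maximum."""
    half = max(boost) / 2.0
    above = [f for f, b in zip(freqs, boost) if b >= half]
    return above[-1] - above[0]

def is_interesting(freqs_mhz, boost):
    """Apply the bandwidth and peak-height cuts quoted in the text."""
    width_ok = 40.0 <= fwhm(freqs_mhz, boost) <= 200.0
    height_ok = 15000.0 <= max(boost) <= 25000.0
    return width_ok and height_ok
```

Measuring the width between the outermost half-maximum samples treats a double-peaked curve as a single band, which seems appropriate for box-like targets but is a design choice of this sketch.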
In addition, the integral of the boost factor over the entire frequency space is constant by the area law, regardless of the disk positions [22]. This property can be used to check whether a configuration has been sampled over the correct frequency range. Configurations whose integral is less than 95% of the expected value are discarded. Figure 2 shows an optimised reference boost factor curve and several random curves generated according to the above prescription.
It is visible that even these small variations of the disk positions result in significant deformations of the boost factor curve.
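The area-law check can be sketched as below. The sketch assumes that the 95% threshold is taken relative to the constant area expected from the area law, passed in here as the hypothetical parameter `expected_area`, and that the integral is approximated with the trapezoidal rule.

```python
def trapezoid_area(freqs, boost):
    """Integrate a sampled curve with the trapezoidal rule."""
    return sum(
        0.5 * (boost[i] + boost[i + 1]) * (freqs[i + 1] - freqs[i])
        for i in range(len(freqs) - 1)
    )

def passes_area_check(freqs, boost, expected_area, min_fraction=0.95):
    """Keep a curve only if its area reaches min_fraction of the expected area."""
    return trapezoid_area(freqs, boost) >= min_fraction * expected_area
```

A curve whose peak has drifted partly outside the sampled window loses area and is rejected by this test.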
The final dataset contains 169,260 configurations and corresponding boost factor curves, sampled at 300 points each along the frequency space. The dataset is further divided into training, validation and testing sets with 122,290, 21,581 and 25,389 configurations respectively.
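The quoted set sizes are numerically consistent with holding out 15% of all configurations for testing and then 15% of the remainder for validation. This is an observation about the numbers only; the text does not describe how the split was made, and the helper below is hypothetical.

```python
def hold_out(n_total, percent):
    """Return (kept, held_out), with held_out rounded to the nearest integer."""
    held = (n_total * percent + 50) // 100
    return n_total - held, held

remaining, n_test = hold_out(169260, 15)  # 25,389 test configurations
n_train, n_val = hold_out(remaining, 15)  # 21,581 validation, 122,290 training
```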
Network Architecture and Training
Inputs
In order to build a model capable of predictions at arbitrary frequency values the input data is reshaped such that each entry consists of an array of distances between the disks and a frequency value at which the boost factor is sampled. The corresponding truth target is then the boost factor value at that frequency. A single configuration consisting of 21 (20 disks, 1 mirror) positions with corresponding boost factor values at 300 frequency values becomes \(22 \cdot 300 = 6600\) individual training data points. To facilitate training, the input data and the truth target are transformed, respectively, into a uniform distribution using a Quantile Transformer [26].
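The reshaping can be sketched as follows. The sketch reads the arithmetic above as 300 rows per configuration, each holding 21 geometry values plus one frequency, so that one configuration contributes \(22 \cdot 300 = 6600\) input values; this reading, the function name and the argument names are assumptions. The quantile transformation is not shown.

```python
def expand_configuration(geometry, freqs, boost):
    """Turn one configuration into one (features, target) pair per frequency."""
    rows = []
    for f, b in zip(freqs, boost):
        features = list(geometry) + [f]  # 21 geometry values + 1 frequency
        rows.append((features, b))
    return rows
```

Because the frequency is an ordinary input feature, the trained model can be queried at any frequency, not only at the 300 sampled points.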
Model and Training
The model consists of a 1D convolutional layer [27] with 128 filters with subsequent 1D max pooling, as well as flattening, followed by two dense layers with 512 nodes each and an output node. Each layer, except the last, uses batch normalisation [28], dropout [29] at a rate of 0.2 and PReLU [30] as an activation function with an alpha initializer of 0.25, following the recommended initialisation of the weights using the He uniform initializer. The structure of the network is shown in Fig. 3.
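Two of the ingredients named above can be written out explicitly. The sketch assumes the common definition of He uniform initialisation, with weights drawn from \(U(-l, l)\) and \(l = \sqrt{6/\text{fan\_in}}\), and the usual PReLU form; it is a plain-Python illustration, not the framework code used for the model.

```python
import math
import random

def he_uniform(fan_in, fan_out, rng=random):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit), limit = sqrt(6 / fan_in)."""
    limit = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

def prelu(x, alpha=0.25):
    """PReLU: identity for positive inputs, slope alpha otherwise.

    alpha starts at 0.25 here and would be learned during training.
    """
    return x if x > 0.0 else alpha * x
```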
The Adam optimiser [31] is used for training in its default configuration, apart from a learning rate of \(10^{-5}\). Mean absolute error (MAE) serves as the loss function, combined with an L\(^2\) regularisation term with a weight of \(10^{-4}\). Early stopping in combination with model checkpoints is used to prevent overtraining. The batch size is 1024.
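The training objective can be sketched as MAE plus a weighted penalty. The sketch assumes that the L\(^2\) term is the sum of squared network weights, as in the usual weight-decay convention; the text does not spell this out, and the function names are hypothetical.

```python
def mae(targets, predictions):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def regularised_loss(targets, predictions, weights, l2_weight=1e-4):
    """MAE plus an L2 penalty (assumed: sum of squared weights) scaled by l2_weight."""
    penalty = l2_weight * sum(w * w for w in weights)
    return mae(targets, predictions) + penalty
```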
The presented model is chosen based on a cursory search and evaluation of various architectures, including layer type, layer number, number of nodes, activation functions, cost function, weight initialisation, and preprocessing.
Generalization Across Frequency Phase Space
In addition to the usual measures of controlling overtraining, such as the aforementioned regularisation and early stopping, it is of interest to see if the model can generalise reasonably well within the target phase space. The expectation is that the network should be able to predict the boost factor values at arbitrary frequency values within the target range, because the inputs are effectively smooth. To that end, a new training is conducted with a modified training set, where certain frequency ranges are blinded. Specifically, we remove frequency bands of \({500}~\hbox {MHz}\), around each half integer between \({19}~\hbox {GHz}\) and \({25}~\hbox {GHz}\), keeping about half of the original data for the training. The validation and test sets are kept the same as for the baseline training.
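The blinding can be sketched as a simple mask. The sketch assumes that each \({500}~\hbox {MHz}\) band is centred on a half-integer frequency in GHz (19.5, 20.5, ..., 24.5), which gives six bands covering 3 of the 6 GHz and matches the statement that about half of the data is kept. The function names and the row layout, with the frequency as the last feature, are hypothetical.

```python
def is_blinded(freq_ghz, band_ghz=0.5, f_min=19.0, f_max=25.0):
    """True if freq_ghz lies within band_ghz / 2 of a half-integer in [f_min, f_max]."""
    if not f_min <= freq_ghz <= f_max:
        return False
    distance_to_half_integer = abs((freq_ghz % 1.0) - 0.5)
    return distance_to_half_integer <= band_ghz / 2.0

def blind_training_rows(rows):
    """Drop rows whose last feature (the frequency in GHz) falls in a blinded band."""
    return [(x, y) for x, y in rows if not is_blinded(x[-1])]
```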
Performance Evaluation
Evaluating the Prediction
The prediction task is somewhat atypical, so common performance metrics such as mean absolute error (MAE) are not ideally suited to describing how well a predicted boost factor curve matches the truth. Important features of interest are peak positions and bandwidth. However, the peak position often cannot be unambiguously determined due to the typical double-peak structure, which can be seen in Fig. 4. Furthermore, we are interested in the overall shape, rather than the accuracy at any specific point. We therefore define a new metric suitable for this task, by comparing the overlapping areas of the curves. This metric is named boost score in the following and is defined as
where the score s is given by the area of the truth curve \(A_t\), the area of the prediction curve \(A_p\), and the overlapping area \(A_o\). Thus, \(s \in [0,1]\), where 0 represents a perfect prediction and 1 represents the largest possible deviation from the truth.
Figure 4 illustrates the behaviour of the boost score metric and shows that the boost score responds both to the peak position and width intuitively, making it a suitable measure of performance for this problem.
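An overlap-based score of this kind can be sketched as follows. The defining equation is not reproduced in this text, so the form used here, \(1 - A_o / (A_t + A_p - A_o)\), is an assumed stand-in. It was chosen only because it uses the same three areas and satisfies the stated properties, namely 0 for a perfect prediction and 1 for no overlap, and it should not be read as the definition of the boost score.

```python
def _area(freqs, values):
    """Trapezoidal area under a sampled curve."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (freqs[i + 1] - freqs[i])
        for i in range(len(freqs) - 1)
    )

def overlap_score(freqs, truth, prediction):
    """Assumed stand-in score: 1 - A_o / (A_t + A_p - A_o)."""
    a_t = _area(freqs, truth)
    a_p = _area(freqs, prediction)
    # Overlap approximated by the area under the pointwise minimum of both curves.
    a_o = _area(freqs, [min(t, p) for t, p in zip(truth, prediction)])
    union = a_t + a_p - a_o
    return 0.0 if union == 0.0 else 1.0 - a_o / union
```

Because the score depends only on areas, a prediction that is shifted in frequency or has the wrong width is penalised even when its pointwise error is small over most of the range.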
Results
To investigate the stability of the selected architecture, the trainings were repeated five times each. These five runs differ only in the random initialisation of the weights in the network. The constant behaviour of the runs indicates a good robustness of the chosen architecture. Small differences are to be expected due to the stochastic nature of the neural network.
The overall performance of the trained model is shown in Fig. 5, where the bin-wise mean score of the five runs with its respective standard error is shown. Empirically, a score below \(\approx ~0.3\), that is, covering \(70\%\) of the expected boost factor area, is sufficiently accurate to be of experimental value for Madmax boost factor predictions. For the baseline training, \(89\%\) of the predictions reach this target. For the blinded training \(74\%\) of the predictions in blinded frequency range and \(84\%\) of the predictions in the visible range meet this target.
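The summary statistics used here are straightforward to compute. The sketch below shows the mean and standard error over repeated runs and the share of predictions below a score threshold. It assumes the standard error is the sample standard deviation divided by \(\sqrt{n}\), and the function names are hypothetical.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Mean and standard error (sample std / sqrt(n)) of repeated runs."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

def fraction_below(scores, threshold=0.3):
    """Share of predictions whose score is below the threshold."""
    return sum(1 for s in scores if s < threshold) / len(scores)
```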
In addition, to provide a sense of the prediction quality, randomly sampled curves from the test set are shown in Fig. 6 at several boost score percentiles for both the baseline (left) and blinded (right) trainings; the Notes section states how the percentiles are defined.
In general the model can reproduce the peak position and the bandwidth sufficiently well. However, the actual peak structure is typically not captured. By comparing the baseline with the blinded training we show the model is able to interpolate across masked phase space, albeit at a certain loss to performance, indicating good generalization capabilities.
A major promise of using deep learning to estimate the boost factor is computational efficiency. The analytical 1D simulation takes about \({80}~\hbox {s}\) to generate the curves from the test set, which translates to \(\approx 320~\text {curves/s}\). Corresponding values for a much more realistic three-dimensional simulation range from \({1400}~\hbox {s}\) and \(18~\text {curves/s}\) to \(80\times 10^{6}~\hbox {s}\) and \(0.0003~\text {curves/s}\), depending on the desired accuracy. Compared to the analytical 1D code, the network prediction is faster by about a factor of 5. Compared to the more realistic 3D simulations, the speedup is on the order of millions to tens of millions, depending on the selected precision. Meanwhile, the prediction speed of the network should be independent of the training data used.
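The quoted throughputs follow directly from the test set size. The short check below assumes that all timings refer to generating the full test set of 25,389 curves; the text states this for the 1D simulation, and the 3D figures are consistent with the same assumption.

```python
N_TEST_CURVES = 25389  # test set size quoted in the dataset description

def curves_per_second(total_seconds, n_curves=N_TEST_CURVES):
    """Throughput for generating n_curves in total_seconds."""
    return n_curves / total_seconds

rate_1d = curves_per_second(80.0)         # about 317 curves/s
rate_3d_fast = curves_per_second(1400.0)  # about 18 curves/s
rate_3d_slow = curves_per_second(80e6)    # about 0.0003 curves/s
```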
Conclusion
In this paper we present a first application of deep learning methods to model the boost factor response of an axion haloscope experiment. Although the current implementation is limited in the phase space it covers, it is capable of predictions accurate enough to be of experimental value, while also being robust against the random initialisation of the network. Furthermore, we show that the model is able to predict into fully blinded frequency regions, demonstrating the ability to generalize, although the quality of the prediction degrades somewhat.
Data Availability
This manuscript has associated data in a data repository. [Authors’ comment: The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.]
Notes
Percentiles are obtained from the sorted score distribution, with its entries divided into 100 equal parts.
References
Peccei RD, Quinn HR (1977) CP conservation in the presence of pseudoparticles. Phys Rev Lett 38:1440–1443. https://doi.org/10.1103/PhysRevLett.38.1440
Peccei RD, Quinn HR (1977) Constraints imposed by CP conservation in the presence of pseudoparticles. Phys Rev D 16:1791. https://doi.org/10.1103/PhysRevD.16.1791
Weinberg S (1978) A new light boson? Phys Rev Lett 40:223. https://doi.org/10.1103/PhysRevLett.40.223
Wilczek F (1978) Problem of strong \(p\) and \(t\) invariance in the presence of instantons. Phys Rev Lett 40:279. https://doi.org/10.1103/PhysRevLett.40.279
Preskill J, Wise MB, Wilczek F (1983) Cosmology of the invisible axion. Phys Lett B 120:127. https://doi.org/10.1016/0370-2693(83)90637-8
Abbott L, Sikivie P (1983) A cosmological bound on the invisible axion. Phys Lett B 120:133. https://doi.org/10.1016/0370-2693(83)90638-X
Dine M, Fischler W (1983) The not-so-harmless axion. Phys Lett B 120:137. https://doi.org/10.1016/0370-2693(83)90639-1
Raffelt GG (2008) Astrophysical axion bounds. Lect Notes Phys 741:51–71. https://doi.org/10.1007/978-3-540-73518-2_3 arXiv:hep-ph/0611350
Kawasaki M, Saikawa K, Sekiguchi T (2015) Axion dark matter from topological defects. Phys Rev D 91(6):065014. https://doi.org/10.1103/PhysRevD.91.065014 arXiv:1412.0789
Hiramatsu T, Kawasaki M, Saikawa K, Sekiguchi T (2012) Production of dark matter axions from collapse of string-wall systems. Phys Rev D 85:105020. https://doi.org/10.1103/PhysRevD.85.105020
Kolb EW, Tkachev II (1994) Nonlinear axion dynamics and the formation of cosmological pseudosolitons. Phys Rev D 49:5040–5051. https://doi.org/10.1103/PhysRevD.49.5040
Zurek KM, Hogan CJ, Quinn TR (2007) Astrophysical effects of scalar dark matter miniclusters. Phys Rev D 75:043511. https://doi.org/10.1103/PhysRevD.75.043511 arXiv:astro-ph/0607341
Ballesteros G, Redondo J, Ringwald A, Tamarit C (2017) Standard model—axion—seesaw—Higgs portal inflation. Five problems of particle physics and cosmology solved in one stroke. JCAP 08:001. https://doi.org/10.1088/1475-7516/2017/08/001 arXiv:1610.01639
Sikivie P (1983) Experimental tests of the “invisible” axion. Phys Rev Lett 51:1415–1417. https://doi.org/10.1103/PhysRevLett.51.1415. Erratum-ibid. [15]
Sikivie P (1984) Experimental tests of the “invisible” axion—erratum. Phys Rev Lett 52:695. https://doi.org/10.1103/PhysRevLett.52.695.2
Asztalos SJ et al (2004) Improved rf cavity search for halo axions. Phys Rev D 69:011101. https://doi.org/10.1103/PhysRevD.69.011101
ADMX Collaboration (2018) Search for invisible axion dark matter with the axion dark matter experiment. Phys Rev Lett 120:151301. https://doi.org/10.1103/PhysRevLett.120.151301
McAllister BT et al (2017) The organ experiment: an axion haloscope above 15 GHz. Phys Dark Universe 18:67–72. https://doi.org/10.1016/j.dark.2017.09.010 arXiv:1706.00209
Al Kenany S et al (2017) Design and operational experience of a microwave cavity axion detector for the 20–100 \(\mu\)eV range. Nucl Instrum Methods Phys Res Sect A Accel Spectrom Detect Assoc Equip 854:11–24. https://doi.org/10.1016/j.nima.2017.02.012
Chung W (2016) CULTASK the coldest axion experiment at CAPP/IBS in Korea. PoS 12:047. https://doi.org/10.22323/1.263.0047
MADMAX Collaboration (2019) A new experimental approach to probe QCD axion dark matter in the mass range above 40 \(\mu\)eV. Eur Phys J C 79:186. https://doi.org/10.1140/epjc/s10052-019-6683-x arXiv:1901.07401
Millar AJ, Raffelt GG, Redondo J, Steffen FD (2017) Dielectric haloscopes to search for axion dark matter: theoretical foundations. JCAP 01:061. https://doi.org/10.1088/1475-7516/2017/01/061 arXiv:1612.07057
Knirck S et al (2019) A first look on 3d effects in open axion haloscopes. JCAP 08:026. https://doi.org/10.1088/1475-7516/2019/08/026 arXiv:1906.02677
McDonald J (2022) Scanning the landscape of axion dark matter detectors: applying gradient descent to experimental design. Phys Rev D. https://doi.org/10.1103/physrevd.105.083010
Gao F, Han L (2012) Implementing the Nelder–Mead simplex algorithm with adaptive parameters. Comput Optim Appl 51(1):259–277. https://doi.org/10.1007/s10589-010-9329-3
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830. https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436. https://doi.org/10.1038/nature14539
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. International conference on machine learning. PMLR. arXiv:1502.03167
Srivastava N et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(56):1929–1958. http://jmlr.org/papers/v15/srivastava14a.html
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. Proceedings of the IEEE international conference on computer vision. arXiv:1502.01852
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. International Conference for Learning Representations. arXiv:1412.6980
Acknowledgements
Simulations were performed with computing resources granted by RWTH Aachen University under project rwth0583. We thank the Bundesministerium für Bildung und Forschung (BMBF) for the support under project numbers 05H20PARDA and 05H21PARD1.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jung, P.A., Santos, B.A.d., Bergermann, D. et al. Simulation of Dielectric Axion Haloscopes with Deep Neural Networks: A Proof-of-Principle. Comput Softw Big Sci 6, 18 (2022). https://doi.org/10.1007/s41781-022-00091-5
DOI: https://doi.org/10.1007/s41781-022-00091-5