Abstract
Gravitational-wave detection strategies are based on a signal analysis technique known as matched filtering. Despite the success of matched filtering, its computational cost has prompted recent interest in developing deep convolutional neural networks (CNNs) for signal detection. Designing these networks remains a challenge, as most procedures adopt a trial-and-error strategy to set the hyperparameter values. We propose a new method for hyperparameter optimization based on genetic algorithms (GAs). We compare six different GA variants and explore different choices for the GA-optimized fitness score. We show that the GA can discover high-quality architectures when the initial hyperparameter seed values are far from a good solution, and can also refine already good networks. For example, when starting from the architecture proposed by George and Huerta, the network optimized over the 20-dimensional hyperparameter space has 78% fewer trainable parameters while obtaining an 11% increase in accuracy for our test problem. Using genetic algorithm optimization to refine an existing network should be especially useful if the problem context (e.g., statistical properties of the noise, signal model, etc.) changes and one needs to rebuild a network. In all of our experiments, we find the GA discovers significantly less complicated networks than the seed network, suggesting it can be used to prune wasteful network structures. While we have restricted our attention to CNN classifiers, our GA hyperparameter optimization strategy can be applied within other machine learning settings.
Notes
For simplicity, we assume here that N is even. This can always be made to be the case, since the observation time and sampling rate are free parameters in an analysis.
We use the same convention for the Fourier transform as in Ref. [83].
References
Aasi J et al (2015) Advanced LIGO. Class Quantum Gravity 32:074001
Accadia T et al (2012) Virgo: a laser interferometer to detect gravitational waves. JINST 7:P03012
Abbott BP et al (2016) Observation of gravitational waves from a binary black hole merger. Phys Rev Lett 116(6):061102
Abbott BP, Abbott R, Abbott T, Abernathy M, Acernese F, Ackley K, Adams C, Adams T, Addesso P, Adhikari R et al (2016) Binary black hole mergers in the first advanced LIGO observing run. Phys Rev X 6(4):041015
Abbott BP et al (2018) GW170104: observation of a 50-solar-mass binary black hole coalescence at redshift 0.2. Phys Rev Lett 118(22):221101 (Erratum: Phys Rev Lett 121(12):129901)
Abbott BP et al (2017) GW170814: a three-detector observation of gravitational waves from a binary black hole coalescence. Phys Rev Lett 119(14):141101
Abbott BP et al (2017) GW170608: observation of a 19-solar-mass binary black hole coalescence. Astrophys J 851(2):L35
Abbott BP et al (2017) GW170817: observation of gravitational waves from a binary neutron star inspiral. Phys Rev Lett 119(16):161101
Abbott BP et al (2019) GWTC-1: a gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs. Phys Rev X 9(3):031040
Abbott BP et al (2016) GW151226: observation of gravitational waves from a 22-solar-mass binary black hole coalescence. Phys Rev Lett 116(24):241103
Abbott B, Abbott R, Abbott T, Abraham S, Acernese F, Ackley K, Adams C, Adhikari R, Adya V, Affeldt C et al (2019) GWTC-1: a gravitational-wave transient catalog of compact binary mergers observed by LIGO and Virgo during the first and second observing runs. Phys Rev X 9(3):031040
Abbott BP et al (2018) Prospects for observing and localizing gravitational-wave transients with advanced LIGO, advanced Virgo and KAGRA. Living Rev Relat 21(1):3
Abbott B, Abbott R, Abbott T, Abraham S, Acernese F, Ackley K, Adams C, Adhikari RX, Adya V, Affeldt C et al (2019) Binary black hole population properties inferred from the first and second observing runs of advanced LIGO and advanced Virgo. Astrophys J Lett 882(2):L24
LIGO/Virgo public alerts. https://gracedb.ligo.org/superevents/public/O3/
Jaranowski P, Krolak A (2012) Gravitational-wave data analysis. Formalism and sample applications: the Gaussian case. Living Rev Relat 15:4
Turin G (1960) An introduction to matched filters. IRE Trans Inf Theory 6(3):311–329
Harry I, Privitera S, Bohé A, Buonanno A (2016) Searching for gravitational waves from compact binaries with precessing spins. Phys Rev D 94(2):024012
Messick C et al (2017) Analysis framework for the prompt discovery of compact binary mergers in gravitational-wave data. Phys Rev D 95:042001
Chu Q (2017) Low-latency detection and localization of gravitational waves from compact binary coalescences. PhD thesis, University of Western Australia
Klimenko S et al (2016) Method for detection and reconstruction of gravitational wave transients with networks of advanced detectors. Phys Rev D 93:042004
Adams T et al (2016) Low-latency analysis pipeline for compact binary coalescences in the advanced gravitational wave detector era. Class Quantum Gravity 33:175012
Nitz A et al (2018) Rapid detection of gravitational waves from compact binary mergers with PyCBC Live. Phys Rev D 98:024050
George D, Huerta EA (2018) Deep neural networks to enable real-time multimessenger astrophysics. Phys Rev D 97:044039
Shen H, Huerta E, Zhao Z (2019) Deep learning at scale for gravitational wave parameter estimation of binary black hole mergers. arXiv preprint arXiv:1903.01998
Hezaveh YD, Levasseur LP, Marshall PJ (2017) Fast automated analysis of strong gravitational lenses with convolutional neural networks. Nature 548(7669):555
Levasseur LP, Hezaveh YD, Wechsler RH (2017) Uncertainties in parameters estimated with neural networks: application to strong gravitational lensing. Astrophys J Lett 850(1):L7
Ciuca R, Hernández OF, Wolman M (2019) A convolutional neural network for cosmic string detection in CMB temperature maps. Mon Not R Astron Soc 485(1):1377–1383
Gabbard H, Williams M, Hayes F, Messenger C (2018) Matching matched filtering with deep networks for gravitational-wave astronomy. Phys Rev Lett 120(14):141103
Shen H, George D, Huerta E, Zhao Z (2017) Denoising gravitational waves using deep learning with recurrent denoising autoencoders. arXiv preprint arXiv:1711.09919
George D, Shen H, Huerta E (2017) Glitch classification and clustering for LIGO with deep transfer learning. arXiv preprint arXiv:1711.07468
George D, Huerta E (2018) Deep learning for real-time gravitational wave detection and parameter estimation: results with advanced LIGO data. Phys Lett B 778:64–70
Fort S (2017) Towards understanding feedback from supermassive black holes using convolutional neural networks. arXiv preprint arXiv:1712.00523
Gebhard TD, Kilbertus N, Harry I, Schölkopf B (2019) Convolutional neural networks: a magic bullet for gravitational-wave detection? Phys Rev D 100(6)
Shen H, George D, Huerta E, and Zhao Z (2017) Denoising gravitational waves using deep learning with recurrent denoising autoencoders. arXiv preprint arXiv, vol 1711
George D, Shen H, Huerta E (2018) Classification and unsupervised clustering of LIGO data with deep transfer learning. Phys Rev D 97(10):101501
Bresten C, Jung J-H (2019) Detection of gravitational waves using topological data analysis and convolutional neural network: an improved approach. arXiv preprint arXiv:1910.08245
Lin Y-C, Wu J-HP (2020) Detection of gravitational waves using Bayesian neural networks. arXiv preprint arXiv:2007.04176
Krastev PG (2020) Real-time detection of gravitational waves from binary neutron stars using artificial neural networks. Phys Lett B 803:135330
Schäfer MB, Ohme F, Nitz AH (2020) Detection of gravitational-wave signals from binary neutron star mergers using machine learning. arXiv preprint arXiv:2006.01509
Lin B-J, Li X-R, Yu W-L (2020) Binary neutron stars gravitational wave detection based on wavelet packet analysis and convolutional neural networks. Front Phys 15(2):24602
Fan X, Li J, Li X, Zhong Y, Cao J (2019) Applying deep neural networks to the detection and space parameter estimation of compact binary coalescence with a network of gravitational wave detectors. Sci China Phys Mech Astron 62(6):1–8
Chua AJ, Vallisneri M (2020) Learning Bayesian posteriors with neural networks for gravitational-wave inference. Phys Rev Lett 124(4):041102
Gabbard H, Messenger C, Heng IS, Tonolini F, Murray-Smith R (2019) Bayesian parameter estimation using conditional variational autoencoders for gravitational-wave astronomy. arXiv preprint arXiv:1909.06296
Green SR, Simpson C, Gair J (2020) Gravitational-wave parameter estimation with autoregressive neural network flows. arXiv preprint arXiv:2002.07656
Wei W, Huerta E (2020) Gravitational wave denoising of binary black hole mergers with deep learning. Phys Lett B 800:135081
Khan A, Huerta E, Das A (2020) Physics-inspired deep learning to characterize the signal manifold of quasi-circular, spinning, non-precessing binary black hole mergers. Phys Lett B 808:135628
ul Islam B, Baharudin Z, Raza MQ, Nallagownden P (2014) Optimization of neural network architecture using genetic algorithm for load forecasting. In: 2014 5th international conference on intelligent and advanced systems (ICIAS). IEEE, pp 1–6
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press
SageMaker. https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
Hamdia KM, Zhuang X, Rabczuk T (2020) An efficient optimization approach for designing machine learning models based on genetic algorithm. Neural Comput Appl 33:1–11
Normandin ME, Mohanty S, Weerathunga TS (2018) Particle swarm optimization based search for gravitational waves from compact binary coalescences: performance improvements. Phys Rev D 98:044029
Abbott BP et al (2016) Astrophysical implications of the binary black-hole merger GW150914. Astrophys J 818(2):L22
Maggiore M (2008) Gravitational waves, vol 1, 1st edn. Oxford University Press, New York
Owen BJ (1996) Search templates for gravitational waves from inspiraling binaries: choice of template spacing. Phys Rev D 53:6749–6761
Brown D (2004) Searching for gravitational radiation from binary black hole MACHOs in the galactic halo. PhD thesis, University of Wisconsin–Milwaukee
Cutler C, Flanagan EE (1994) Gravitational waves from merging compact binaries: how accurately can one extract the binary’s parameters from the inspiral wave form? Phys Rev D 49:2658
Romano JD, Cornish NJ (2017) Detection methods for stochastic gravitational-wave backgrounds: a unified treatment. Living Rev Relat 20(1):2
Wainstein LA, Zubakov VD (1962) Extraction of signals from noise. Prentice-Hall, Englewood Cliffs
Allen B, Anderson WG, Brady PR, Brown DA, Creighton JD (2012) FINDCHIRP: an algorithm for detection of gravitational waves from inspiraling compact binaries. Phys Rev D 85:122006
Newman ET, Penrose R (1966) Note on the Bondi–Metzner–Sachs group. J Math Phys 7:863–870
Goldberg JN, Macfarlane AJ, Newman ET, Rohrlich F, Sudarshan ECG (1967) Spin-\(s\) spherical harmonics and \(\eth\). J Math Phys 8(11):2155–2161
Blackman J, Field SE, Galley CR, Szilágyi B, Scheel MA, Tiglio M, Hemberger DA (2015) Fast and accurate prediction of numerical relativity waveforms from binary black hole coalescences using surrogate models. Phys Rev Lett 115:121102
Gwsurrogate. https://pypi.python.org/pypi/gwsurrogate/
Field SE, Galley CR, Hesthaven JS, Kaye J, Tiglio M (2014) Fast prediction and evaluation of gravitational waveforms using surrogate models. Phys Rev X 4:031006
Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond A 231(694–706):289–337
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization
Hoffer E, Hubara I, Soudry D (2017) Train longer, generalize better: closing the generalization gap in large batch training of neural networks. arXiv preprint arXiv:1705.08741
Smith SL, Kindermans P-J, Ying C, Le QV (2017) Don’t decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489
Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K (2017) Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677
Bäck T, Fogel DB, Michalewicz Z (2018) Evolutionary computation 1: basic algorithms and operators. CRC Press
Yin D, Kannan R, Bartlett P (2019) Rademacher complexity for adversarially robust generalization. In: International conference on machine learning, pp 7085–7094
Fortin F-A, De Rainville F-M, Gardner M-A, Parizeau M, Gagné C (2012) DEAP: evolutionary algorithms made easy. J Mach Learn Res 13:2171–2175
Thangiah SR, Osman IH, Sun T (1994) Hybrid genetic algorithm, simulated annealing and tabu search methods for vehicle routing problems with time windows. Computer Science Department, Slippery Rock University, Technical report SRU CpSc-TR-94-27, vol 69
Gandomkar M, Vakilian M, Ehsan M (2005) A combination of genetic algorithm and simulated annealing for optimal DG allocation in distribution networks. In: Canadian conference on electrical and computer engineering, 2005. IEEE, pp 645–648
Park T, Ryu KR (2010) A dual-population genetic algorithm for adaptive diversity control. IEEE Trans Evol Comput 14(6):865–884
Sharapov R, Lapshin A (2006) Convergence of genetic algorithms. Pattern Recognit Image Anal 16(3):392–397
Eiben AE, Aarts EH, Van Hee KM (1990) Global convergence of genetic algorithms: a Markov chain analysis. In: International conference on parallel problem solving from nature. Springer, pp 3–12
Cerf R (1998) Asymptotic convergence of genetic algorithms. Adv Appl Probab 30(2):521–550
Finn LS (1992) Detection, measurement, and gravitational radiation. Phys Rev D 46:5236
Gray RM (2006) Toeplitz and circulant matrices: a review. Found Trends Commun Inf Theory 2(3):155–239
Allen B (2005) A \(\chi ^2\) time-frequency discriminator for gravitational wave detection. Phys Rev D 71:062001
Acknowledgements
We would like to thank Prayush Kumar, Jun Li, Caroline Mallary, Eamonn O’Shea, and Matthew Wise for helpful discussions, and Vishal Tiwari for writing scripts used to compute efficiency curves. S. E. F. and D. S. D. are partially supported by NSF Grant PHY-1806665 and DMS-1912716. G.K. acknowledges research support from NSF Grants Nos. PHY-1701284, PHY-2010685 and DMS-1912716. All authors acknowledge research support from ONR/DURIP Grant No. N00014181255, which funds the computational resources used in our work. D. S. D. is partially supported by the Massachusetts Space Grant Consortium.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Fourier transform and inner product conventions
We summarize our conventions, which vary somewhat in the literature. Given a time-domain vector, \({\mathbf {a}}\), the discrete version of the Fourier transform of \({\mathbf {a}}\) evaluated at frequency \(f_p = p/T\) is given by
where \(0 \le p \le N-1\). Notice that the zero frequency (\(f_p=0\)) corresponds to \(p = 0\), positive frequencies (\(0< f_p < f_s / 2\)) to values in the range \(0 < p < N/2\), and negative frequencies (\(- f_s / 2< f_p < 0\)) to values in the range \(N/2< p < N\). This follows from the usual assumptions that the signal is both periodic in the observation duration, \({a}(t) = {a}(t \pm T)\), and compactly supported in frequency, \({\tilde{a}}(f) = 0\) for \(|f| \ge f_s / 2\), where \(f_s = 1 / \Delta t\) is the sampling rate and \(f_s / 2\) is the Nyquist frequency. Consequently, the Fourier transformed signal is periodic in p with a period of N, \({\tilde{a}}(f_p) = {\tilde{a}}(f_p \pm N \Delta f)\). The value \(p = N/2\) corresponds to the Fourier transform at the maximum resolvable frequencies, \(-f_s/2\) and \(f_s/2\), for a given choice of \(\Delta t\).
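As an illustration of this index convention, the sketch below maps each index \(p\) to its frequency and evaluates a transform of the assumed form \({\tilde{a}}[p] = \Delta t \sum _{k=0}^{N-1} a[k]\, e^{-2\pi i k p/N}\). The \(\Delta t\) prefactor and the sign of the exponent are assumptions here (they match NumPy's `fft` up to the factor \(\Delta t\)), and the helper name `dft_with_frequencies` is hypothetical.

```python
import numpy as np


def dft_with_frequencies(a, dt):
    """Return (f_p, a_tilde[p]) for p = 0..N-1.

    Assumed convention: a_tilde[p] = dt * sum_k a[k] * exp(-2j*pi*k*p/N).
    """
    a = np.asarray(a, dtype=float)
    n = a.size
    a_tilde = dt * np.fft.fft(a)
    # fftfreq places p = 0 at f = 0, 0 < p < N/2 at positive frequencies,
    # and p > N/2 at negative frequencies; p = N/2 is reported as -f_s/2.
    freqs = np.fft.fftfreq(n, d=dt)
    return freqs, a_tilde


N, dt = 8, 0.125                               # T = N * dt = 1 s, f_s = 8 Hz
a = np.cos(2 * np.pi * 2.0 * np.arange(N) * dt)  # 2 Hz cosine
freqs, a_tilde = dft_with_frequencies(a, dt)
```

For a real input, the bins \(p\) and \(N-p\) are complex conjugates of each other, which is the property used when the inner product is folded onto positive frequencies.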
Given the Fourier transformed data, \({\tilde{a}}\) and \({\tilde{b}}\), the noise-weighted inner product \(\langle \cdot , \cdot \rangle\) between \({\tilde{a}}\) and \({\tilde{b}}\) is defined as
Notice that by convention the inner product is defined with an overall factor of 2 but, unlike Eq. (6), the full set of positive and negative frequencies is used. The continuum limit (\(\Delta f \rightarrow 0\)) of the summation makes clear that this is a (discretized) inner product between \({\tilde{a}}(f)\) and \({\tilde{b}}(f)\) over the domain \(|f| \le f_s /2\). Note that because the time-domain signal is real, the Fourier transformed signal satisfies \({\tilde{a}}^*(f) = {\tilde{a}}(-f)\). As a result, the inner product expression can be “folded-over”
which now features an integral over the positive frequencies and shows the inner product to be manifestly real. We then arrive at Eq. (6). This motivates the use of the term “inner product” when discussing Eq. (6), despite the fact that, taken at face value, it does not satisfy the usual properties of an inner product, whereas Eq. (22) does. Finally, some authors set the noise at the Nyquist frequency to 0 (see, for example, the discussion after Eq. 7.1 of Ref. [59]).
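The folding step can be checked numerically. The sketch below assumes the two-sided sum has the form \(2\Delta f \sum _{p=0}^{N-1} {\tilde{a}}^*[p]\,{\tilde{b}}[p]/S_n[p]\) with a PSD that is symmetric about \(N/2\); both are assumptions consistent with the description above, not a transcription of Eq. (22). The DC and Nyquist bins have no mirror partner, so they are kept as a separate term. The function names and the PSD values are hypothetical.

```python
import numpy as np


def inner_two_sided(a_t, b_t, psd, df):
    """Factor-of-2 sum over all N bins (positive and negative frequencies)."""
    return 2.0 * df * np.sum(np.conj(a_t) * b_t / psd)


def inner_folded(a_t, b_t, psd, df):
    """Same quantity folded onto the bins 0 < p < N/2, plus DC and Nyquist."""
    half = a_t.size // 2
    pos = slice(1, half)
    body = 4.0 * df * np.real(np.sum(np.conj(a_t[pos]) * b_t[pos] / psd[pos]))
    edge = 2.0 * df * np.real(
        np.conj(a_t[0]) * b_t[0] / psd[0]
        + np.conj(a_t[half]) * b_t[half] / psd[half]
    )
    return body + edge


rng = np.random.default_rng(0)
N, dt = 64, 1.0 / 64
df = 1.0 / (N * dt)
a = rng.standard_normal(N)
b = rng.standard_normal(N)
a_t = dt * np.fft.fft(a)
b_t = dt * np.fft.fft(b)

p = np.arange(N)
psd = 1.0 + np.minimum(p, N - p) / N   # illustrative, psd[p] == psd[N - p]

two_sided = inner_two_sided(a_t, b_t, psd, df)
folded = inner_folded(a_t, b_t, psd, df)
```

The imaginary part of `two_sided` vanishes to rounding error, and its real part agrees with `folded`. If the PSD is treated as infinite at DC and Nyquist, the edge term drops out and only the factor-of-4 sum over positive frequencies remains.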
Appendix 2: Derivation of conditional probabilities used in likelihood-ratio test
A derivation of the standard inner product used in gravitational-wave analyses can be found in Ref. [81], which makes use of methods laid out in Ref. [58]. Here, we provide a brief derivation to highlight some of the assumptions that go into the classical filter.
In the absence of a signal, we assume that the detector is a stochastic process that outputs Gaussian noise with zero mean. The likelihood that some observed output \({\mathbf {s}}\) is purely noise is therefore given by an \(N\)-dimensional multivariate normal distribution
where \(\varvec{\Sigma }\) is the covariance matrix of the noise, and \(\det \varvec{\Sigma }\) is its determinant.
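For reference, the standard zero-mean \(N\)-dimensional multivariate normal density, written in the notation of this appendix, is shown below. The symbol \(p({\mathbf {s}}\,|\,n)\) for the noise-only likelihood is a labeling assumption.

\[
p({\mathbf {s}}\,|\,n) = \frac{\exp \left[ -\frac{1}{2}\, {\mathbf {s}}^{T} \varvec{\Sigma }^{-1} {\mathbf {s}} \right] }{\sqrt{(2\pi )^{N} \det \varvec{\Sigma }}}
\]

The quadratic form \({\mathbf {s}}^{T} \varvec{\Sigma }^{-1} {\mathbf {s}}\) in the exponent is the quantity that the rest of this appendix rewrites in the frequency domain.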
It is also common to assume that the noise is wide-sense stationary and ergodic. This is generally true on the time scales over which a gravitational wave from a compact binary merger passes through the sensitive band of the detector (\(\sim \max {\mathcal {O}}(100\,\mathrm {s})\)). In that case, \(\varvec{\Sigma }\) is a real symmetric Toeplitz matrix with elements
where
is the autocorrelation function of the data.
There is no general, analytic solution for \(\varvec{\Sigma }^{-1}\). However, if \(R_{ss}\rightarrow 0\) in finite time \(\tau _{\max }\) and the observation time \(T > 2\tau _{\max }\) (i.e., \(\lceil N/2 \rceil > \lceil \tau _{\max }/\Delta t \rceil\)), then \(\varvec{\Sigma }\) is nearly a circulant matrix; it only differs in the upper-right and lower-left corners. All circulant matrices, regardless of the values of their elements, have the same eigenvectors [82]
We make the approximation that \(\varvec{\Sigma }\) is circulant and use these eigenvectors to solve the eigenvalue equation, yielding
(The \({\mathfrak {R}}\) arises because the covariance is real and symmetric.) The error in this approximation decreases with increasing observation time; indeed, the eigenvalues of \(\varvec{\Sigma }\) asymptote to Eq. (27) as \(N \rightarrow \infty\) [82]. The autocorrelation function of ground-based gravitational-wave detectors is \(\approx 0\) for \(\tau > {\mathcal {O}}(10\,\mathrm {ms})\). Since the observation time for a gravitational wave is \(>{\mathcal {O}}(\mathrm {s})\), this approximation is valid in practice.
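The circulant approximation can be illustrated numerically. The sketch below builds a symmetric Toeplitz covariance from a short, made-up autocorrelation sequence (the values in `r_short` are illustrative, not detector data), forms its circulant counterpart, and checks two things: that the matrices differ only in the corners, and that the circulant eigenvalues equal the real part of the unnormalized DFT of the wrapped autocorrelation. The indexing \(\Sigma _{jk} = R_{ss}[|j-k|]\) is an assumption consistent with a real symmetric Toeplitz matrix.

```python
import numpy as np

N = 16
r_short = np.array([1.0, 0.5, 0.2])  # illustrative R_ss[0], R_ss[1], R_ss[2]
lag_max = r_short.size - 1

idx = np.arange(N)
lags = np.abs(idx[:, None] - idx[None, :])

# Symmetric Toeplitz covariance: Sigma[j, k] = R_ss[|j - k|], zero beyond lag_max.
r = np.zeros(N)
r[: r_short.size] = r_short
toeplitz = r[lags]

# Circulant counterpart: wrap the autocorrelation so that R[N - k] = R[k].
r_wrapped = r.copy()
r_wrapped[N - lag_max:] = r_short[:0:-1]
circulant = r_wrapped[(idx[:, None] - idx[None, :]) % N]

# The two matrices agree except in the upper-right and lower-left corners.
diff = circulant - toeplitz
corner = lags >= N - lag_max

# Eigenvectors shared by all circulant matrices (columns of U), and the
# eigenvalues obtained as the real part of a DFT of the first column.
U = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
eigvals = np.real(np.fft.fft(r_wrapped))
```

Here `circulant @ U` equals `U * eigvals` column by column, so the columns of `U` are eigenvectors regardless of the values in `r_short`. Because the wrapped sequence is symmetric, the sign chosen in the exponent of `U` does not affect the eigenvalues.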
We recognize Eq. (27) as \(1/\Delta t\) times the real part of the discrete Fourier transform of \(R_{ss}[p]\) (see Note 2). Therefore, via the Wiener–Khinchin theorem,
where \(S_n[p]\) is the discrete approximation of the power spectral density (PSD) of the noise at frequency \(p/T \equiv p \Delta f\). Since the matrix of eigenvectors \({\mathbf {U}}\) is unitary, we have
To go from the second to the third line, we have substituted \(1/N = \Delta f \Delta t\) and have made use of the fact that \(S_n[p]\) is symmetric about N/2; \(c_{jk}\) depends only on the \(p=0\) and \(p=N/2\) terms, which correspond to the DC and Nyquist frequencies, respectively.
Gravitational-wave detectors have peak sensitivity within a particular frequency band \([f_0, f_{\max }]\) (for current generation detectors, this is \(f \sim [20, 2000]\,\)Hz). Outside of this range we can effectively treat the PSD as being infinite, making all terms in Eq. (29) with \(p < \lfloor f_0 / \Delta f \rfloor \equiv p_0\) zero. Likewise, if we choose a sample rate \(1/\Delta t > 2 f_{\max }\), then the Nyquist term is also effectively zero. The exponential term in the likelihood is therefore
In going from the first to the second line, we have again recognized the sums over j, k as the discrete Fourier transforms over the real time-series data. We can further simplify this by defining the inner product Eq. (6), yielding Eq. (5) for the likelihood.
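The chain from the circulant covariance to the frequency-domain sum can be verified numerically. The sketch below assumes eigenvalues \(\lambda _p = S_n[p]/(2\Delta t)\) (a one-sided PSD, as suggested by the Wiener–Khinchin step) and the transform \({\tilde{s}}[p] = \Delta t \sum _k s[k]\, e^{-2\pi i k p/N}\); the PSD values are invented for illustration. It checks that \({\mathbf {s}}^{T} \varvec{\Sigma }^{-1} {\mathbf {s}}\) equals \(4\Delta f \sum _{p=1}^{N/2-1} |{\tilde{s}}[p]|^2/S_n[p]\) plus the DC and Nyquist contributions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt = 32, 1.0 / 32
df = 1.0 / (N * dt)
p = np.arange(N)

# Illustrative PSD, symmetric about N/2, and the assumed eigenvalues.
psd = 0.5 + np.minimum(p, N - p) / N
lam = psd / (2.0 * dt)

# Circulant covariance whose eigenvalues are lam: its first column is the
# inverse DFT of lam (real and symmetric because lam is real and symmetric).
first_col = np.real(np.fft.ifft(lam))
sigma = first_col[(p[:, None] - p[None, :]) % N]

# Time-domain quadratic form s^T Sigma^{-1} s.
s = rng.standard_normal(N)
quad_time = s @ np.linalg.solve(sigma, s)

# Frequency-domain form: factor-of-4 sum over 0 < p < N/2, plus the
# DC and Nyquist terms that play the role of the c_jk contribution.
s_t = dt * np.fft.fft(s)
half = N // 2
band = 4.0 * df * np.sum(np.abs(s_t[1:half]) ** 2 / psd[1:half])
edges = 2.0 * df * (
    np.abs(s_t[0]) ** 2 / psd[0] + np.abs(s_t[half]) ** 2 / psd[half]
)
quad_freq = band + edges
```

When the PSD is effectively infinite below \(p_0\) and at the Nyquist bin, `edges` and the low-frequency part of `band` vanish, leaving the band-limited sum described in the text.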
Appendix 3: How to generate Gaussian noise
Somewhat surprisingly, we are unaware of a resource that describes how to implement Eq. (4) to generate time-domain noise realizations. When implementing this expression one encounters sufficiently many subtleties that we will summarize our recipe here.
Eq. (4) specifies the statistical properties satisfied by the Fourier coefficients of the noise. Note that similar expressions for the discrete Fourier transform coefficients are sometimes given in the literature, and these can differ from ours.
Since the frequency-domain noise, \({\tilde{n}}(f_i)\), is complex, we need to be careful when sampling the real and imaginary parts. For example, if the desired property is \(\langle {\tilde{n}}^*(f_i) {\tilde{n}}(f_j) \rangle =\delta _{ij}\), then
which gives
Furthermore, for real time-domain functions we have \({\tilde{n}}^*(f) = {\tilde{n}}(-f)\) and so only the non-negative frequencies are independently sampled. When \(f=0\), this condition implies that \({\tilde{n}}(0)\) is real, whence \({\tilde{n}}(0) \sim {\mathcal N}(0,1)\). A similar property holds at the Nyquist frequency.
The neural networks considered in this paper use time-domain data. Synthetic time-domain noise realizations are constructed by taking an inverse Fourier transform of our frequency-domain noise. In the time domain, Eq. (4) becomes
which follows directly from Eq. (4) and properties of the Fourier transform. We found Eq. (32) to be an indispensable sanity test of our time-domain noise realizations.
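A minimal sketch of this recipe follows. Eq. (4) is not reproduced in this section, so the normalization used here, \(\langle |{\tilde{n}}(f_p)|^2 \rangle = T\, S_n(f_p)/2\) for a one-sided PSD together with the transform \({\tilde{n}}[p] = \Delta t \sum _k n[k]\, e^{-2\pi i k p/N}\), is an assumption and may differ from the paper's. The function name `gaussian_noise` and the flat PSD in the usage lines are hypothetical. Under this assumption, white noise has time-domain variance \(S_n/(2\Delta t)\), which serves as the sanity check.

```python
import numpy as np


def gaussian_noise(psd_one_sided, dt, rng):
    """Draw one real time-domain noise realization of even length N.

    psd_one_sided holds S_n at f_p = p/T for p = 0..N/2 (N/2 + 1 values).
    Assumed normalization: <|n_tilde[p]|^2> = T * S_n[p] / 2.
    """
    psd = np.asarray(psd_one_sided, dtype=float)
    n_pos = psd.size
    n = 2 * (n_pos - 1)
    total_time = n * dt
    sigma = np.sqrt(total_time * psd / 2.0)

    # Complex bins: real and imaginary parts each carry half the variance.
    re = rng.standard_normal(n_pos)
    im = rng.standard_normal(n_pos)
    n_tilde = sigma * (re + 1j * im) / np.sqrt(2.0)

    # DC and Nyquist bins must be real for a real time series.
    n_tilde[0] = sigma[0] * re[0]
    n_tilde[-1] = sigma[-1] * re[-1]

    # irfft fills in the negative frequencies by conjugate symmetry and
    # divides by N; dividing by dt undoes the dt factor of the forward
    # transform assumed above.
    return np.fft.irfft(n_tilde, n=n) / dt


rng = np.random.default_rng(42)
N, dt, S0 = 4096, 1.0 / 4096, 2.0
psd = np.full(N // 2 + 1, S0)          # flat (white) PSD, illustrative
noise = gaussian_noise(psd, dt, rng)
expected_var = S0 / (2.0 * dt)
```

Only the non-negative frequencies are sampled independently; comparing `noise.var()` with `expected_var` plays the role of the time-domain sanity test mentioned above.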
Cite this article
Deighan, D.S., Field, S.E., Capano, C.D. et al. Genetic-algorithm-optimized neural networks for gravitational wave classification. Neural Comput & Applic 33, 13859–13883 (2021). https://doi.org/10.1007/s00521-021-06024-4
DOI: https://doi.org/10.1007/s00521-021-06024-4