1 Introduction

Observations of gravitational waves (GWs) from compact binary coalescences by the LIGO-Virgo-KAGRA interferometric network [1,2,3] are revolutionising our understanding of the Universe, from black hole (BH) physics to astrophysics, cosmology and fundamental physics [4,5,6,7]. While GWs from the inspiral and plunge phases carry information about the long-range dynamics of two coalescing BHs, the merger and ringdown (RD) regimes provide unique access to extreme curvatures and strong fields, and are arguably the best route to investigate the nature of BHs and challenge our current understanding of gravity [5, 8,9,10,11,12,13].

The linear theory of BH perturbations [14,15,16,17,18,19,20] predicts that the main RD contribution consists of a sum of spacetime vibrations known as quasinormal modes (QNMs), which are exponentially damped harmonic oscillations [21,22,23,24,25]. The excitation of different modes depends on the specific process that perturbs the BH, and for remnants of quasicircular binary coalescences a single long-lived mode typically dominates the RD emission. Asymmetries in the system are responsible for the excitation of subdominant higher angular modes (HMs) [26,27,28,29,30,31,32,33,34,35,36]. Measuring the frequencies and damping times of these HMs allows one to directly test the BH paradigm through the Kerr “final state conjecture”Footnote 1 [37,38,39,40,41,42,43,44,45,46,47]. However, observing HMs in ringdown signals is extremely challenging at current detector sensitivities, and their characterisation is further complicated by several delicate issues affecting RD analyses.

In fact, current studies based on QNM superpositions rely on the assumption that, at late enough times during the postmerger relaxation process, the spacetime is well described by linear perturbation theory. This assumption manifests itself, e.g., in a constant final mass and spin and in (asymptotically) constant complex ringdown amplitudes.Footnote 2 However, the merger and early postmerger phases are known to contain nonlinear features [52], and a sufficient amount of time needs to pass before the linear regime starts to hold and the above assumptions are justified [29, 30, 53,54,55]. Such complications are among the motivations behind the large amount of recent progress in modelling higher-order effects, which might soon help extend ringdown models for comparable-mass binaries closer to the signal peak [49, 56,57,58,59,60,61]. Although techniques to estimate the (signal-to-noise-ratio dependent) onset of the regime of validity of linear theory have been developed for both numerical relativity (NR) simulations [55] and real data [62, 63], the time window over which QNM superpositions apply remains the most relevant source of systematic uncertainty in most RD analyses.

To address this problem, shortly after the first numerical evolutions of binary BHs [64,65,66], RD models going beyond QNM superpositions were developed, e.g. [67]. In these models, the time dependence of the BH parameters and amplitudes is included phenomenologically through a flexible function with several free parameters, thus relaxing the requirement to describe the process from first principles. In particular, such models are required to complete effective-one-body (EOB) [68,69,70] waveforms beyond merger, and accurate templates capable of modelling the entire post-merger emission were presented in Refs. [71,72,73,74,75,76,77]. Such models allow one to include larger portions of the GW signal than a superposition of QNMs with constant amplitudes, increasing the available signal-to-noise ratio (SNR). Although they are less sensitive to certain exotic deviations from a Kerr relaxation, by including more information and being less agnostic about the physics involved, they allow one to extract more information from the data. As extensively discussed in Ref. [78], accounting for such nonlinear contributions is of paramount importance. Failing to do so leads to overfitting and to misinterpreting the physical QNM content of the signal, and in turn to spurious QNM “detections” plagued by systematic uncertainties, caused by inappropriately assuming constant-amplitude models close to the signal peak.

In this work, we rely on an accurate phenomenological model covering the entire post-peak emission, TEOBPM [71, 72, 75, 76], and use it to perform a comprehensive Bayesian analysis of the third catalog of GW events, GWTC-3 [4], using pyRing [79], a python software package tailored to time-domain Bayesian analyses of BH ringdown signals. Our analysis allows us to systematically assess the detectability of HMs in the RD at current sensitivities, extending previous LVK ringdown-only studies, which relied on models based on pure QNM superpositions [5, 8, 10]. Unlike such RD models, TEOBPM includes information from the progenitors, allowing inference on parameters such as the initial masses and spins, and incorporates NR-tuned time-dependent models for the mode amplitudes, thus providing a bridge between inspiral-merger-ringdown (IMR) and RD-only analyses. This is similar in spirit to the strategy employed in Ref. [80], on which we improve by: (i) using the complete time-domain likelihood, overcoming the circularity approximation; (ii) using a time-domain model with an explicit parameterisation of all QNM-related quantities, allowing for a more direct physical interpretation of the constraints obtained; (iii) incorporating the HM content; (iv) crucially, including the uncertainty in the ringdown validity regime.

We report small positive Bayes factors (BFs) in favour of the \((l,m)=(3,3)\) [\((l,m)=(2,1)\)] mode in the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 [GW190521_074359, GW191109_010717], and study in detail the robustness of this result by varying the starting time of the analysis. We also introduce fractional shifts in the QNM spectrum to study potential deviations from general relativity (GR) in the RD, discussing in detail tests of the final state conjecture under the assumption that two angular modes are present. Our results support encouraging prospects for the positive observation of HMs in GW signals observed in the ongoing LVK observing run (O4) [3, 81,82,83].

In Sect. 2, we discuss the basics of multimodal RD waveforms from an observational perspective. In Sect. 3, we describe our implementation of the TEOBPM RD model. In Sect. 4, we introduce a semi-analytical procedure to predict the detectability of different HMs. In Sect. 5 we discuss the details of our Bayesian RD analysis and collect the parameter estimation (PE) results on GWTC-3, comparing them with the results obtained by the LVK Collaboration, both in the IMR and RD regimes. We also describe our procedure to robustly search for HMs, including the systematic uncertainty in the ringdown starting time. In Sect. 6, we introduce fractional shifts in the QNMs spectrum to study potential deviations from GR in the RD, discussing in detail tests of the final state conjecture under the assumption of the presence of two angular modes. We conclude and discuss future prospects in Sect. 7.

2 Ringdown models

The linear gravitational perturbations of a Kerr BH are described by the Teukolsky equation [84, 85], with the main GW emission component captured by

$$\begin{aligned} \begin{aligned} h_{+} -i h_{\times }&= \frac{M}{d_L} \sum _{lm} h_{lm} Y_{lm}(\iota , \varphi ),\\ h_{lm}&\equiv A_{0,lm}\,e^{-\sigma _{lm}(t-t_0)+i\phi _{0,lm}}, \end{aligned} \end{aligned}$$
(1)

where \(M \equiv m_1 + m_2\) is the total initial mass of the binary, \(d_L\) is the luminosity distance, and \(Y_{lm}\) are spin-weighted spherical harmonics (SWSHs). The plus and cross (gauge-invariant) polarizations of the GW waveform are denoted by \(h_{+}\) and \(h_{\times }\). We define the complex QNM frequencies as

$$\begin{aligned} \sigma _{lm}\equiv \alpha _{lm}+i\omega _{lm}, \end{aligned}$$
(2)

where \(\textrm{Re}\{\sigma _{lm}\} = \alpha _{lm} \equiv 1/\tau _{lm}\) is the inverse of the damping time \(\tau _{lm}\), while \(\textrm{Im}\{\sigma _{lm}\} = \omega _{lm}\) is the oscillation frequency. \(A_{0,lm}\) and \(\phi _{0,lm}\) are the constant amplitudes and phases of the different modes. Note that in Eq. (1) we neglect the overtone expansion of each angular mode, since in the linear regime (where they can be confidently identified) the contribution of overtones is negligible compared to that of the longest-lived angular modes [78, 86], on which we focus.
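As a minimal illustration of Eqs. (1) and (2), the Python sketch below evaluates the constant-amplitude QNM superposition for a user-supplied set of modes. It is not pyRing code: the function name, the dictionary layout, and the choice of passing the harmonic \(Y_{lm}(\iota , \varphi )\) as a precomputed complex number are assumptions made for illustration, and all times are in the same arbitrary units.

```python
import numpy as np

def qnm_superposition(t, modes, t0=0.0, mass_over_distance=1.0):
    """Evaluate h_+ and h_x from Eqs. (1)-(2) for t >= t0 (zero before t0).

    modes: dict mapping (l, m) -> {"A": A_0lm, "phi": phi_0lm,
           "omega": omega_lm, "tau": tau_lm, "Y": Y_lm(iota, varphi)},
    where "Y" is a precomputed (hypothetical) complex harmonic value.
    """
    t = np.asarray(t, dtype=float)
    dt = t - t0
    active = dt >= 0.0                          # the QNM model only applies after t0
    h = np.zeros(t.shape, dtype=complex)
    for p in modes.values():
        sigma = 1.0 / p["tau"] + 1j * p["omega"]    # Eq. (2): alpha + i omega
        h[active] += p["A"] * p["Y"] * np.exp(-sigma * dt[active] + 1j * p["phi"])
    h *= mass_over_distance                     # overall M / d_L prefactor
    return h.real, -h.imag                      # since h = h_+ - i h_x
```

For a single mode with unit amplitude and zero phase, the sketch returns \(h_+ = e^{-t/\tau }\cos \omega t\) and \(h_\times = e^{-t/\tau }\sin \omega t\), i.e. a damped sinusoid whose envelope falls by a factor \(e\) every damping time.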

We will refer to the mode \((l,m)=(2,2)\) as the fundamental mode, and to those with \((l,m)\ne (2,2)\) as higher modes (HMs). The values of \(\omega _{lm}\) and \(\tau _{lm}\) are known semi-analytically and can be computed numerically given the mass and dimensionless angular momentum \(\left\{ M_f, a_f\right\} \) of the final BH [86,87,88]. On the other hand, the amplitudes \(A_{0,lm}\) depend on the specific process generating the perturbation, and are not known analytically for the merger of two BHs of comparable mass. For example, for quasicircular comparable-mass binaries, the dominant contribution in the RD is given by the fundamental mode (2, 2), with the leading subdominant contributions coming from the modes \(\left\{ (3,3),(2,1),(3,2)\right\} \), see Refs. [27,28,29,30,31,32, 34, 35, 58]. HMs are excited by asymmetries in the system [26], e.g. by increasing the mass ratio \(q\equiv m_1/m_2\) (\(m_1\ge m_2\)) or the dimensionless spins \(\chi _{1,2}\equiv J_{1,2}/m_{1,2}^2\) of the two orbiting BHs. In addition to the properties of the source, the mode content observed in the RD also depends on the SWSHs, which further modulate the different modes according to the geometry of the system. In particular, the amplitude of each mode strongly depends on the inclination angle \(\iota \), defined as the angle between the direction of the orbital angular momentumFootnote 3 and the line of sight of the observer.
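To make the inclination dependence concrete, the sketch below evaluates closed-form spin-weight \(-2\) spherical harmonics for the (2, 2), (2, 1) and (3, 3) modes. We assume the standard spin-weight \(-2\) convention; overall signs differ between references, which does not affect the magnitudes compared here. The helper name `swsh_m2` is hypothetical.

```python
import numpy as np

def swsh_m2(l, m, iota, varphi=0.0):
    """Spin-weight -2 spherical harmonic _{-2}Y_lm(iota, varphi) for a few modes."""
    c, s = np.cos(iota / 2.0), np.sin(iota / 2.0)
    if (l, m) == (2, 2):
        amp = np.sqrt(5.0 / (64.0 * np.pi)) * (1.0 + np.cos(iota)) ** 2
    elif (l, m) == (2, 1):
        amp = np.sqrt(5.0 / (16.0 * np.pi)) * np.sin(iota) * (1.0 + np.cos(iota))
    elif (l, m) == (3, 3):
        amp = -np.sqrt(21.0 / (2.0 * np.pi)) * c**5 * s
    else:
        raise NotImplementedError(f"mode {(l, m)} is not included in this sketch")
    return amp * np.exp(1j * m * varphi)    # azimuthal dependence e^{i m varphi}
```

For a face-on system (\(\iota =0\)) both the (2, 1) and (3, 3) harmonics vanish, while their magnitudes relative to the (2, 2) one grow with \(\iota \) (as \(\tan (\iota /2)\) and \(\sin \iota \), respectively, up to constants), which is why HMs are more visible in inclined systems.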

The extraction of the QNM content from a GW signal, also known as “black hole spectroscopy” [12, 13, 25, 89,90,91], is performed using various sets of models similar to Eq. (1), which can be classified by the amount of information (i.e. of assumptions) they include. The models with the fewest assumptions are the most generic, capable of detecting even large deviations from GR predictions. On the other hand, because they contain little information, such models provide less precise constraints on the deviation parameters and are the least sensitive to small deviations from GR. These classes of RD models range from pure superpositions of damped sinusoids with unknown complex frequencies and amplitudes (i.e. with \(\sigma _{lm}\) in Eq. (1) an unknown parameter inferred from the data) [5, 8, 10, 62, 78], to templates including only perturbative predictions for the QNM spectrum [31, 32, 34, 62, 92,93,94,95,96,97] but no information on the QNM amplitudes (i.e. \(\sigma _{lm}\) constrained by GR predictions, but \(A_{0,lm}\) a free parameter in Eq. (1)), to templates that incorporate numerical predictions of the quasinormal amplitudes (i.e. both \(\sigma _{lm}\) and \(A_{0,lm}\) constrained by GR predictions). The latter assume that the remnant BH forms from a binary merger, and either incorporate the explicit dependence of the amplitudes on the binary parameters [27,28,29,30, 34, 54, 80, 98,99,100,101] or only partial information on the relative excitation of the amplitudes [96]. Furthermore, one can include numerical predictions for early-time nonlinearities by constructing phenomenological models for the full postmerger signal, promoting \(A_{0,lm}\) to a time-dependent quantity. This is the approach of TEOBPM, which still does not include pre-merger data in the analysis. 
The class of models with the largest amount of information (hence the most accurate, but least agnostic) are pSEOB-like templates [102,103,104,105,106], where even the pre-merger signal is included (currently under the hypothesis that pre-merger data are correctly described by GR), and deviations are allowed only in the QNM spectrum. All these models should be regarded as complementary, each answering different questions. For example, models that impose QNM spectral predictions might miss deviations induced by a non-Kerr nature of the remnant. Models that impose amplitude predictions from binary inspirals may be biased if the orbital dynamics are not captured by quasicircular models (due to orbital eccentricity, precession, environmental effects, etc.). In particular, the pSEOB class of templates, in its current implementation, is expected to be the most sensitive to deviations that abruptly appear around the merger phase, e.g. triggered by high-curvature dynamical couplings, similar to dynamical scalarisation [107]. Such scenarios would predict an inspiral signal close to the GR prediction, but a rather different merger-ringdown phase. Instead, searches based on ringdown-only templates that exclude pre-merger data, such as TEOBPM used in this study, will produce less biased results for scenarios where deviations are also present in the inspiral. Clearly, the most sensitive and accurate test would be performed using coherent IMR models in specific alternative scenarios. Recent progress has been made in constructing such models, both analytically [108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135] and numerically [136,137,138,139,140]. 
However, the large number of possible deviations from GR makes this a daunting task, and naturally calls for generic models and parameterisations [141,142,143,144,145,146,147,148] capable of capturing deviations from several classes of modifications.

3 The TEOBPM ringdown model

In this work we focus on a specific RD model, which constitutes the postmerger part of state-of-the-art EOB [73, 74] and phenomenological time-domain models [149,150,151]. The tidal effective one-body post-merger (TEOBPM) model was first introduced in Ref. [71], as an NR-informed analytical representation of the postmerger waveform from BH coalescences. The model has been further improved in [72, 75, 76, 152] as the postmerger part of the TEOBResumS waveforms family. Here, we briefly review the model construction, and highlight its characteristics relevant to RD analyses.

We work with the expansion coefficients \(h_{lm}\) in Eq. (1), which are the output of NR simulations [153]. We want our model to be defined from the peak of the full IMR waveform, which we denote by \(t_0 \equiv t_{22}^{peak}\). If the entire postmerger phase were linear, we could directly apply Eq. (1) to describe the entire RD with an appropriate mode combination and constant amplitudes [78]. Since early times contain nonlinearities, while Eq. (1) is obtained from linear perturbation theory, additional contributions are required to extend the template to the peak of the waveform. The strategy adopted in TEOBPM is thus to factor out the linear QNM contribution from each mode \(h_{lm}\), and define the QNM-rescaled ringdown waveform \({\bar{h}}_{lm}\)

$$\begin{aligned} h_{lm} \equiv e^{-\sigma _{lm}(t-t_0)-i\phi _{0,lm}}\,{\bar{h}}_{lm}, \end{aligned}$$
(3)

which contains all the nonlinear contributions. To model these nonlinearities, we write the QNM-rescaled waveform \({\bar{h}}_{22}\) as a generic complex function with time-dependent amplitude and phase,

$$\begin{aligned} {\bar{h}}_{22}(\tau ) \equiv A_{{\bar{h}}_{22}}(\tau )e^{+i\phi _{{\bar{h}}_{22}}(\tau )} . \end{aligned}$$
(4)

The functions \(A_{{\bar{h}}_{22}}\) and \(\phi _{{\bar{h}}_{22}}\) are informed by NR simulations. We do not describe the fitting procedure further here, and refer the interested reader to Refs. [75, 76].

The above procedure is applied to each mode. However, different modes do not necessarily peak at the same time: in general, \(t_{lm}^{\text {peak}}\) does not coincide with the merger time, defined as the peak of the fundamental mode, \(t_{0}=t_{22}^{\text {peak}}\). The time delay \(\Delta t_{lm}\) is then defined as the difference between the peak of each mode and the peak of the fundamental mode (2, 2)

$$\begin{aligned} \Delta t_{lm} \equiv t_{lm}^{\text {peak}}-t_{22}^{\text {peak}} \equiv t_{lm}^{\text {peak}}-t_{0} . \end{aligned}$$
(5)

Consequently, for each HM included in the model, one additional parameter \(\Delta t_{lm}\) needs to be fitted to NR simulations. Combining Eqs. (3)–(5), the expansion coefficients for each mode are then

$$\begin{aligned} h_{lm}(t+\Delta t_{lm})&= e^{-\sigma _{lm}t-i\phi _{0,lm}}\,{\bar{h}}_{lm}\\ &= e^{-\sigma _{lm}t-i\phi _{0,lm}}\,A_{{\bar{h}}_{lm}}(t+\Delta t_{lm})\,e^{+i\phi _{{\bar{h}}_{lm}}(t+\Delta t_{lm})} . \end{aligned}$$
(6)

Further details on our implementation of TEOBPM can be found in Appendix A.
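The factorisation of Eqs. (3) to (6) can be sketched for a single mode as follows. The NR-informed fits \(A_{{\bar{h}}_{lm}}\) and \(\phi _{{\bar{h}}_{lm}}\) are not reproduced here (see Refs. [75, 76]); they enter as placeholder callables supplied by the caller. We also assume that these callables take as argument the time elapsed since the peak of the mode itself, \(t - t_0 - \Delta t_{lm}\); this is our reading of Eqs. (3) to (6), not a description of the TEOBPM or pyRing source code, and the function name is hypothetical.

```python
import numpy as np

def teobpm_like_mode(t, sigma_lm, phi0_lm, amp_fn, phase_fn, t0=0.0, delta_t_lm=0.0):
    """Single-mode sketch of Eqs. (3)-(6).

    sigma_lm : complex QNM frequency alpha_lm + i omega_lm, Eq. (2)
    amp_fn, phase_fn : placeholders for the NR-informed fits A_hbar_lm and
        phi_hbar_lm, evaluated at the time since the mode's own peak (assumption)
    t0 : peak time of the (2,2) mode; delta_t_lm : peak delay of Eq. (5)
    Samples before t0 + delta_t_lm are left at zero.
    """
    t = np.asarray(t, dtype=float)
    tau = t - (t0 + delta_t_lm)                  # time since the peak of mode (l, m)
    on = tau >= 0.0
    h = np.zeros(t.shape, dtype=complex)
    qnm_factor = np.exp(-sigma_lm * tau[on] - 1j * phi0_lm)     # linear part, Eq. (3)
    hbar = amp_fn(tau[on]) * np.exp(1j * phase_fn(tau[on]))     # Eq. (4): A e^{+i phi}
    h[on] = qnm_factor * hbar
    return h
```

With a constant \({\bar{h}}_{lm}\) (e.g. `amp_fn=np.ones_like`, `phase_fn=np.zeros_like`) the sketch reduces to a single damped sinusoid starting at \(t_0+\Delta t_{lm}\); the zeroed samples before that time mirror the \(\Delta t_{lm}\) limitation discussed in Sect. 3.1.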

3.1 Characteristics of TEOBPM

Having discussed how TEOBPM is constructed and how some of its internal degrees of freedom are fixed from NR, we now list the remaining free parameters of the model. The parameter space \(\Theta _{lm}\) on which TEOBPM is defined is parametrized by \({\varvec{\theta }}=\left\{ {\varvec{\theta }}_{I},{\varvec{\theta }}_{E}\right\} \in \Theta _{lm}\), where

$$\begin{aligned} \begin{aligned}&\textit{intrinsic}:\quad {\varvec{\theta }}_{I} \;\equiv \left\{ m_1, m_2, \chi _1, \chi _2\right\} ,\\&\textit{extrinsic}:\quad {\varvec{\theta }}_{E}\equiv \left\{ \alpha , \delta , \psi , \iota , d_L, t_0, \phi _{0,22}, \phi _{0,lm}\right\} . \end{aligned} \end{aligned}$$
(7)

Thus, TEOBPM is a ringdown model with 11 parameters when only the fundamental mode is used, plus one additional parameter \(\phi _{0,lm}\) for each HM included. Note, however, that in ringdown-only analyses some of the extrinsic parameters are typically fixed from IMR results, either because of degeneracies or to fix the analysis segment, so that the actual number of free parameters is smaller. Similarly, the orbital phase \(\varphi \) entering the spherical harmonics is not included, because it is completely degenerate with the single-mode phases. This parametrization differs from that of usual RD models, which are typically parametrized in terms of remnant properties, such as the final mass and dimensionless spin \(\left\{ M_f, a_f\right\} \) or the QNM complex frequencies \(\left\{ \omega _{lm}, \tau _{lm}\right\} \), and allows TEOBPM to describe ringdown characteristics in terms of the progenitor properties. These choices make the model particularly sensitive to GR deviations, as discussed below, at the cost of being less agnostic about the nature of the RD process, a fact that is particularly relevant for tests of general relativity (TGR).
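The parameter bookkeeping of Eq. (7) can be summarised by a short helper that lists the sampled parameters for a chosen set of HMs. The parameter labels and the function name are hypothetical; which parameters are fixed in practice is an analysis choice (Sect. 5.1 fixes \(t_0\) and \(\left\{ \alpha , \delta \right\} \) from IMR results).

```python
# Hypothetical labels for the parameters of Eq. (7) with only the (2,2) mode.
BASE_PARAMS = (
    "m1", "m2", "chi1", "chi2",                                # intrinsic
    "alpha", "delta", "psi", "iota", "d_L", "t0", "phi0_22",   # extrinsic
)

def free_parameters(higher_modes=(), fixed=()):
    """Eq. (7) plus one initial phase per HM, minus parameters fixed from IMR results."""
    params = list(BASE_PARAMS) + [f"phi0_{l}{m}" for (l, m) in higher_modes]
    fixed = set(fixed)
    return [p for p in params if p not in fixed]
```

For example, `free_parameters([(3, 3)], fixed=("t0", "alpha", "delta"))` leaves 9 sampled parameters: the 11 of the fundamental-mode model, plus `phi0_33`, minus the 3 fixed ones.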

Below, we highlight the advantages and the limits of TEOBPM.

Advantages:

  • Starting time The model is defined from the peak of the fundamental mode (2, 2), which can be estimated from IMR analyses, either modelled [9] or unmodelled [154]. Consequently, the uncertainty in the starting time of our RD analysis depends only on the width of the IMR peak-time posterior distribution. Instead, more agnostic analyses based on superpositions of damped sinusoids, which do not include near-peak contributions, additionally require identifying a reference time at which the pure-QNM description of the data starts to be valid [54, 55]. Since the system is evolving, the GW frequency changes with time (as do the remnant mass and spin, see e.g. [78]), so a pure-QNM description will never be exact. Hence, for pure damped sinusoids, an adequate reference time must be chosen such that the systematic uncertainties due to non-stationary QNM contributions are much smaller than the statistical uncertainties due to the finite SNR. At present detector sensitivities, depending on the accuracy of the model, this happens in the range \([10,15] \, M\) after the peak [5, 10, 54].

  • Amplitudes Since the initial amplitudes \(A_{0,lm}\) of the different modes are set by a fit to NR, the model includes the information on the excitation of the various modes as a function of the progenitor parameters (similar to e.g. the Kerr_HMs model [29, 30] used in Refs. [5, 10, 155, 156], also labeled “MMRDNP” in pyRing).

  • Non-linearities As the model includes the entire postmerger part of the signal, it uses more of the high-SNR data than linear RD models do, providing more sensitive results (compared to e.g. the “MMRDNP” model [29, 30]). This makes TEOBPM particularly suitable for characterising HMs.
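The \([10,15] \, M\) window quoted in the starting-time item is expressed in geometric units (\(G=c=1\)). The small helper below converts it to seconds using \(GM_\odot /c^3 \simeq 4.925 \times 10^{-6}\) s; which mass sets the scale (e.g. the detector-frame remnant mass) is an assumption left to the caller, and the function name is hypothetical.

```python
# G * M_sun / c^3 in seconds (nominal solar mass parameter divided by c^3).
G_MSUN_OVER_C3 = 4.925490947e-6

def geometric_time_to_seconds(n_masses, mass_msun):
    """Convert a time of n_masses * M (G = c = 1) into seconds, for M in solar masses."""
    return n_masses * mass_msun * G_MSUN_OVER_C3
```

For a mass of \(60\,{\textrm{M}_{\odot }}\), for instance, the window corresponds to roughly 3.0 to 4.4 ms after the peak.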

Limitations:

  • Precession and eccentricity   The model is calibrated on NR simulations of quasicircular binary BHs with spins aligned or anti-aligned with the orbital angular momentum. As a result, it is only valid for events in which both precession and eccentricity are negligible.

  • Mode-mixing When the gravitational radiation is decomposed into SWSHs instead of spin-weighted spheroidal harmonics, modes with the same m and different l are affected by mode mixing [53, 157, 158]. Among the most relevant modes, the (3, 2) mixes with the fundamental mode, and this effect is not currently included in TEOBPM. Although we expect the contribution of the mode (3, 2) to be negligible at current sensitivities, see Appendix D, this issue should be addressed in the future. This contribution has recently been included in the v5 generation of the SEOB model [77, 159,160,161,162], which was applied to spectroscopic analyses in the context of LISA in Ref. [106].

  • Accuracy The accuracy of the model has been extensively studied on NR simulations, with some of the results reported in Appendix B. We observe single-mode mismatches of \(O(10^{-3})\) for the fundamental mode (2, 2), and \(O(10^{-2})\) for the \(\left\{ (3,3),(2,1)\right\} \) modes, in agreement with the literature [76]. Other subdominant modes, \(\left\{ (3,2),(4,4),(4,3),(4,2)\right\} \), show a worse performance, but their overall contribution to the waveform can be safely ignored at current sensitivity. Hence, we do not include these modes in the analysis and leave a systematic study of their inclusion to future work. Note that since the (2, 2) is by far the dominant mode in the systems under consideration, the mismatch of the total waveform (summing all the modes) receives only small contributions from subdominant modes, so the total mismatch is typically close to that of the fundamental mode.

  • \(\Delta t_{lm}\)   Since the HMs peak at different times than the fundamental mode, a ringdown-only implementation of the model systematically misses the contribution of each HM during the time interval \(\Delta t_{lm}\), i.e. until that mode has started. Note that in the corresponding IMR version this part of the signal comes from the inspiral dynamics and is therefore present. This is an intrinsic limitation of the current implementation of TEOBPM in pyRing. Although, based on the analysis of simulated signals, we have found this effect to be negligible at current sensitivities, it may have an impact on future studies at higher sensitivities. It could be addressed, for example, by constructing time-dependent amplitude fits starting from a common peak time for all HMs.

  • \(\phi _{0,lm}\) The additional parameter for the initial phase of each HM impacts the BF by increasing the prior volume, systematically penalising hypotheses with a higher number of modes (Ockham’s razor). This problem could be solved in future refinements of the model by fixing the initial phases from NR. Consequently, our current implementation yields conservative results when computing BFs for the presence of HMs.

4 Higher modes detectability

We now turn to the observability of HMs in the RD of current GW events. To answer this question, we develop a semi-analytical procedure to predict the detectability of the different HMs over the fundamental mode (2, 2). These techniques were first introduced in the context of TGR to study the detection of GR deviations in the high-SNR limit [144, 163, 164].

Fig. 1

Colormap of the optimal signal-to-noise ratio needed to measure a logarithmic Bayes factor \(\ln {{\mathcal {B}}_{lm,22}}=3\) in favour of the mode (3, 3) (left) and (2, 1) (right) over the fundamental mode. The results are for two nonspinning initial black holes, \(\chi _1=\chi _2=0\), and are shown as a function of the mass ratio q and inclination \(\iota \). The dashed regions correspond to \(\text {SNR}>1000\).

In general, we define the time-domain scalar product weighted with the inverse autocovariance matrix as

$$\begin{aligned} \langle {\textbf{a}}\vert {}{\textbf{b}}\rangle \equiv {\textbf{a}}^T {\mathbb {C}}^{-1}_{{\textbf{n}}{\textbf{n}}} \bar{{\textbf{b}}}, \end{aligned}$$
(8)

where \({\textbf{a}}\) and \({\textbf{b}}\) are two time series, and \({\mathbb {C}}_{{\textbf{n}}{\textbf{n}}}\) is the autocovariance matrix of the detector noise. Specifically, we estimate the autocovariance matrix through an inverse Fourier transform of the simulated power spectral density (PSD) of the advanced LIGO design configuration [165]. The fitting factor (FF) is then defined as

$$\begin{aligned} \text {FF} \equiv \max _{{\varvec{\theta }}\in \Theta _{lm}} \left\{ \frac{\langle {\textbf{h}}_{lm}\vert {\textbf{h}}_{22}\rangle ^2}{\langle {\textbf{h}}_{lm}\vert {\textbf{h}}_{lm}\rangle \langle {\textbf{h}}_{22}\vert {\textbf{h}}_{22}\rangle }\right\} , \end{aligned}$$
(9)

where \({\varvec{\theta }}\) are the parameters of the model, and \(\Theta _{lm}\) is the parameter space over which the model including HMs is defined. Note that the FF only depends on the profile of the PSD, as Eq. (9) is invariant under any rescaling of the autocovariance matrix. As proven in Appendix C, the FF is closely related to the optimal signal-to-noise ratio \(\text {SNR}_{\text {opt}}\), whose definition is also given there.
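A minimal NumPy sketch of the time-domain scalar product of Eq. (8) and of the normalised overlap maximised in Eq. (9) is given below. It assumes real detector time series sampled at rate `fs`, a one-sided PSD sampled on the `np.fft.rfftfreq` grid of a stretch at least as long as the analysis segment, and the Wiener-Khinchin relation to obtain the autocovariance; the maximisation over \(\Theta _{lm}\) is replaced by a brute-force maximum over a finite list of (2, 2)-only templates. All function names are hypothetical, and this is not the pyRing implementation.

```python
import numpy as np

def autocovariance_from_psd(psd_one_sided, fs, n_fft):
    """Noise autocovariance at lags k / fs, k = 0..n_fft-1 (Wiener-Khinchin).

    psd_one_sided must be sampled on np.fft.rfftfreq(n_fft, 1 / fs);
    the factor 0.5 converts the one-sided PSD into a two-sided one.
    """
    return 0.5 * fs * np.fft.irfft(psd_one_sided, n=n_fft)

def inner_product(a, b, acf):
    """Eq. (8): <a|b> = a^T C^{-1} conj(b), with Toeplitz C_ij = acf[|i - j|]."""
    n = len(a)
    if len(acf) < n:
        raise ValueError("autocovariance shorter than the analysis segment")
    idx = np.arange(n)
    cov = acf[np.abs(idx[:, None] - idx[None, :])]
    return a @ np.linalg.solve(cov, np.conj(b))     # solve instead of inverting C

def overlap_sq(h_a, h_b, acf):
    """Normalised squared overlap appearing inside the max of Eq. (9)."""
    num = inner_product(h_a, h_b, acf) ** 2
    den = inner_product(h_a, h_a, acf) * inner_product(h_b, h_b, acf)
    return float(np.real(num / den))

def fitting_factor(h_lm, templates_22, acf):
    """Eq. (9) approximated by a maximum over a finite list of (2,2)-only templates."""
    return max(overlap_sq(h_lm, h22, acf) for h22 in templates_22)
```

Computing the autocovariance on a stretch longer than the segment (e.g. twice as long) keeps the Toeplitz matrix a principal submatrix of a positive-definite circulant one. Rescaling the PSD by a constant leaves `overlap_sq` unchanged, which is the invariance of Eq. (9) noted in the text.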

Suppose we have measured some data \({\textbf{d}}\), and want to understand whether a given HM present in the signal can be distinguished from the fundamental mode. This is a model selection problem between the following hypotheses:

  • \({\mathcal {H}}_{22}\): in the signal only the fundamental mode (2, 2) is present;

  • \({\mathcal {H}}_{lm}\): besides the fundamental mode (2, 2), the signal contains a given HM (lm).

As shown in Appendix D, we can derive an equation that relates the Bayes factor \({\mathcal {B}}_{lm,22}\) between the two hypotheses, the SNR, and the FF

$$\begin{aligned} \ln {{\mathcal {B}}_{lm,22}} = \frac{1}{2}\left( 1-\text {FF}^2\right) \, \text {SNR}_{\text {opt}}^2, \end{aligned}$$
(10)

which is valid for large SNRs. Equation (10) can be used to analytically predict the detectability of different HMs as a function of the parameters of the waveform model. In fact, given a set of parameters for the system, we can use the waveforms \({\textbf{h}}_{22}\) and \({\textbf{h}}_{lm}\) to calculate the FF through Eq. (9), and then Eq. (10) provides the value of \(\ln {{\mathcal {B}}_{lm,22}}\) as a function of \(\text {SNR}_{\text {opt}}\). This quantifies how much the hypothesis \({\mathcal {H}}_{lm}\) is favoured over \({\mathcal {H}}_{22}\), depending on the power of the signal \({\textbf{h}}_{lm}\) over noise. In other words, Eq. (10) answers the following question:

what is the signal-to-noise ratio needed to reach some threshold value of the Bayes factor in favour of higher modes?

In practice, we set the threshold value for a detection to be \(\ln {{\mathcal {B}}_{lm,22}}=3\), and invert Eq. (10) to find \(\text {SNR}_{\text {opt}}\) as a function of the \(\text {FF}\). The FF varies across the binary parameter space, because the excitation of the HMs depends on the progenitor parameters. Hence, we can study the SNR needed for the detection as a function of the parameters of the model. Figure 1 shows, for nonspinning initial black holes \(\chi _1=\chi _2=0\), the \(\text {SNR}_{\text {opt}}\) required to measure the modes (3, 3) and (2, 1), using TEOBPM as waveform model. The SNR is given as a function of the mass ratio \(q\in [1,5]\) and inclination \(\iota \in [0,\pi /2]\), the latter interval being sufficient for spin-aligned systems due to equatorial symmetry.

For equal-mass binaries, \(q\sim 1\), such as those typically observed in current events [4], we do not expect to observe HMs in the RD. Indeed, with a threshold \(\ln {{\mathcal {B}}_{lm,22}}=3\), we would need an event with \(\text {SNR}_{\text {opt}} \gtrsim 50\) to detect the mode (3, 3), whereas current events have \(\text {SNR}_{\text {opt}} \lesssim 15\) in the RD. However, HMs are excited by asymmetries in the system, and increasing the mass ratio dramatically reduces the threshold SNR. For example, with \(q\in [2,3]\) we could confidently measure the mode (3, 3) for loud events at current sensitivities, \(\text {SNR}_{\text {opt}}\sim [10,15]\), for inclinations \(\iota > \pi /4\). Note that the mode (2, 1) generally requires larger SNRs than the mode (3, 3), especially for low mass ratios \(q \lesssim 2\). This implies that the (3, 3) mode is the most easily observable HM in the RD. Similar plots for the modes \(\left\{ (3,2),(4,4)\right\} \) can be found in Appendix D, showing that the SNR required to observe their contribution is larger than for the modes \(\left\{ (3,3),(2,1)\right\} \). For instance, we need \(\text {SNR}_{\text {opt}} \gtrsim 25\) to observe the mode (4, 4) at mass ratios \(q \lesssim 2\) and inclinations \(\iota > \pi /4\). Note that the above numbers depend strongly on the choice of the threshold BF: lowering it would suggest a potential for detection already at low mass ratios. Going beyond our conservative threshold would require simulation studies in real interferometric noise, where the threshold would be chosen to keep the probability of noise-induced detections below a certain value. Importantly, these simulation studies would also need to account for modelling systematics due to the imperfect representation of the NR solutions, and for the uncertain determination of the starting time from IMR signals.

To ensure numerical stability when inverting Eq. (10) to evaluate the SNR, we set a numerical cutoff around \(\text {FF}=1\) at the level of \(10^{-5}\). Therefore, regions where \(\text {FF}=1\) at that level of precision are displayed in white in Fig. 1. The sharp edges in the level curves are induced by the profile of the PSD used in the scalar product of Eq. (9). Finally, we stress that the detectability results just discussed are derived under the approximation of high SNR, \({\textbf{h}} \gg {\textbf{n}}\), and should therefore be used as a rough estimate and complemented by parameter estimation studies for cases of specific interest.
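Equation (10) and its inversion, including the numerical cutoff just described, can be sketched as follows. We interpret the cutoff as \(1-\text {FF} < 10^{-5}\), which is an assumption about the exact criterion used for Fig. 1; the function names are hypothetical.

```python
import numpy as np

FF_CUTOFF = 1e-5  # treat 1 - FF below this value as FF = 1 (our reading of the cutoff)

def ln_bayes_factor(ff, snr_opt):
    """Eq. (10): high-SNR estimate of ln B_{lm,22} from the FF and the optimal SNR."""
    return 0.5 * (1.0 - ff**2) * snr_opt**2

def snr_for_threshold(ff, ln_b_threshold=3.0, cutoff=FF_CUTOFF):
    """Invert Eq. (10): optimal SNR needed to reach ln B_{lm,22} = ln_b_threshold.

    Returns np.inf when the FF is numerically indistinguishable from 1,
    i.e. when the HM cannot be told apart from the (2,2)-only model.
    """
    if 1.0 - ff < cutoff:
        return np.inf
    return float(np.sqrt(2.0 * ln_b_threshold / (1.0 - ff**2)))
```

For an illustrative value \(\text {FF}=0.99\), reaching \(\ln {{\mathcal {B}}_{lm,22}}=3\) requires \(\text {SNR}_{\text {opt}} = \sqrt{6/0.0199} \approx 17.4\); as FF approaches 1 the required SNR diverges, which is what the white regions of Fig. 1 encode.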

5 GWTC-3 analysis

We now describe the RD analysis with TEOBPM on real data from the latest catalog of GWs, GWTC-3. The study was performed with pyRing [79], a python package for time-domain RD analyses of BH coalescences, based on the formalism described in [62, 72, 93] and Sec. VII.A.1 of [10]. For a detailed and pedagogical introduction to time-domain analyses, see Ref. [94]. To compute Bayesian evidences and posterior probability distributions, we use the raynest [166] sampler, running each analysis with 4 parallel chains, 5000 live points and a maximum of 5000 Markov Chain Monte Carlo (MCMC) steps.

5.1 Input parameters and events selection

In a first study, we select the subset of events that present an informative RD with the TEOBPM model. To select such events, we compute \(\ln {{\mathcal {B}}_{s,n}}\) between the hypotheses:

  • \({\mathcal {H}}_{s}\): the data contain a GW signal \({\textbf{h}}\) modeled with only the fundamental mode, \({\textbf{d}}={\mathbf {h_{22}}}+{\textbf{n}}\);

  • \({\mathcal {H}}_{n}\): the data only consist of noise \({\textbf{n}}\), \({\textbf{d}}={\textbf{n}}\);

and then select the events with \(\ln {{\mathcal {B}}_{s,n}}>3\). Following standard procedures in time-domain RD analyses, we fix some of the free parameters of TEOBPM to values from existing PE analyses in the full IMR regime, in order to determine the analysis segment. This procedure prevents the template from latching onto pre-peak portions of the data, and has been verified to introduce negligible biases at current sensitivities [5]. In particular, the peak time \(t_0\) and the sky position \(\left\{ \alpha , \delta \right\} \) are taken from the LVK TGR papers [5, 9, 10]. For reproducibility, the input values used in this work are reported in Appendix E. The remaining free parameters are listed in Table 1, together with the prior ranges used, which are uniform in all the quoted variables. The prior ranges on masses and spins are taken from [167].
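The selection criterion amounts to a simple cut on the log Bayes factor. In the sketch below, the evidences are hypothetical inputs (e.g. the log-evidences returned by a nested sampler), and the event labels in the example are placeholders, not GWTC-3 results. Note that \(\ln {{\mathcal {B}}_{s,n}}>3\) corresponds to odds larger than \(e^3 \approx 20\) in favour of a signal.

```python
def select_informative(log_evidences, threshold=3.0):
    """Keep events with ln B_{s,n} = ln Z_s - ln Z_n > threshold.

    log_evidences: dict mapping an event label to (ln Z_signal, ln Z_noise).
    Returns a dict mapping the selected labels to their ln B_{s,n}.
    """
    selected = {}
    for event, (ln_z_signal, ln_z_noise) in log_evidences.items():
        ln_bf = ln_z_signal - ln_z_noise
        if ln_bf > threshold:  # strict inequality, as in the text
            selected[event] = ln_bf
    return selected

# Placeholder evidences, for illustration only.
toy = {"event_A": (-10.0, -15.0), "event_B": (-10.0, -12.0)}
print(select_informative(toy))
```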

Table 1 Free parameters used in the analysis, with their prior ranges. The masses are expressed in solar masses \({\textrm{M}_{\odot }}\) and the luminosity distance in megaparsecs (Mpc)

We apply this procedure to the 48 GWTC-3 events selected on the basis of their false alarm rate, as described in [5, 9, 10]. This yields a subset of 18 events that are informative in the RD, listed in Table 2, which are used in the rest of this work.

Fig. 2

Ridgeline plot with the posterior distributions of the following parameters: primary mass \(m_1\), secondary mass \(m_2\), primary dimensionless spin \(\chi _1\), secondary dimensionless spin \(\chi _2\), luminosity distance \(d_L\), final mass \(M_f\), final spin \(a_f\), frequency and damping time of the fundamental mode, \(f_{22}\) and \(\tau _{22}\), inclination \(\iota \). Each row corresponds to one of the events analysed. The blue distributions have been obtained with TEOBPM, the red distributions are taken from IMR analyses by the LVK Collaboration, and the green distributions are from RD analyses by the LVK Collaboration. In agreement with the Bayes factors of the multimodal analysis, for the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 we use the PE from the analysis with the modes \(\left\{ (2,2), (3,3)\right\} \), while for the other events we use the PE with only the fundamental mode

5.2 Parameter estimation and LVK comparison

We perform PE with TEOBPM on the selected events and compare the results with those of the LVK Collaboration, in both the IMR and RD regimes. As motivated in Sect. 4, for TEOBPM we consider only the modes \((l,m) = [(2,2), (3,3), (2,1)]\). Figure 2 shows the marginalized posterior distributions for a comprehensive list of parameters, and Table 2 summarises the median values of the posteriors obtained with TEOBPM for selected parameters. The LVK IMR samples are combined from different waveform models, including precessing templates.
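The summary statistics reported in Table 2 (median and \(90\%\) symmetric credible interval) can be computed from posterior samples as in the sketch below, which takes the 5th and 95th percentiles; the function name is hypothetical.

```python
import numpy as np

def median_with_90ci(samples):
    """Return (median, upper error, lower error) for the 90% symmetric credible interval.

    The interval spans the 5th to 95th percentiles, so 5% of the posterior
    probability lies in each tail.
    """
    lo, med, hi = np.percentile(np.asarray(samples, dtype=float), [5.0, 50.0, 95.0])
    return med, hi - med, med - lo
```

A parameter is then quoted as median\(^{+\text {upper}}_{-\text {lower}}\).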

The LVK RD samples are taken from the pyRing analysis using the Kerr_221 template, as presented in the LVK GWTC-3 TGR catalog [5], starting at \(t_0=t_{22}^{\text {peak}}\) like TEOBPM. The Kerr_221 model assumes a Kerr remnant BH and includes the fundamental mode with its first overtone. Overtones close to the peak [93, 168] do not correspond to physical vibrational frequencies of the underlying spacetime [78], but their use allows the entire post-peak signal to be fitted with sufficient accuracy for current (low-SNR) GW detections. Hence, the overtone was exploited as an effective term to push the analysis to earlier times, capturing more of the power present in the signal. This allows a direct comparison with the TEOBPM results, as both analyses include the same amount of data. Note, however, that the two models adopt different parameterisations: while TEOBPM depends on the progenitor parameters, Kerr_221 depends on remnant parameters, such as \(\left\{ M_f, a_f, A_{lmn}, \phi _{lmn}\right\} \). For TEOBPM and LVK IMR, the remnant parameters \(\left\{ M_{f}, a_{f}, f_{22}, \tau _{22}\right\} \) are obtained with fits on NR data [87, 169]. The posteriors shown are generated using kernel density estimation (KDE) as implemented in scipy.
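A minimal sketch of the KDE step, using scipy.stats.gaussian_kde with its default bandwidth (Scott's rule); the grid padding is an arbitrary choice of this example and the function name is hypothetical. A plain Gaussian KDE can leak probability beyond hard prior boundaries (e.g. for spins or inclination), so curves near such boundaries should be read with care.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_curve(samples, n_points=200, pad=0.1):
    """Evaluate a Gaussian KDE of 1D posterior samples on a padded uniform grid."""
    samples = np.asarray(samples, dtype=float)
    lo, hi = samples.min(), samples.max()
    span = hi - lo
    grid = np.linspace(lo - pad * span, hi + pad * span, n_points)
    kde = gaussian_kde(samples)  # bandwidth from Scott's rule (scipy default)
    return grid, kde(grid)
```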

Results are generally consistent across the different models. In general, TEOBPM yields distributions that are sharper than LVK RD and broader than LVK IMR. This behaviour is expected, as the IMR analysis uses more data and recovers higher SNRs. Nonetheless, in most cases TEOBPM’s posteriors are comparable to those of LVK IMR, and in certain cases sharper in some parameters. There are several possible reasons for this behaviour. A first possibility is that IMR models also include pre-peak data, so that noise features could contribute to produce different posteriors. This could be either because noisy features present in the inspiral data are not included in the RD analysis, or because IMR analyses are less sensitive to noise fluctuations due to the larger SNR. In addition, the LVK IMR samples are obtained with precessing models, and precession introduces strong correlations between the parameters, broadening the marginalised posteriors. For this reason, and since TEOBPM assumes aligned spins, the spin comparison with LVK IMR is not expected to yield identical results. Finally, neglecting the correlation between sky position and luminosity distance, as done in the RD analyses, will also artificially sharpen the posteriors. We leave a more detailed investigation of these features to future work. Compared with LVK RD, on the other hand, TEOBPM contains more information in the model, and will therefore be able to extract more SNR if the signal is well described by GR. Furthermore, sampling on the initial masses and spins \(\left\{ m_{1,2}, \chi _{1,2}\right\} \) instead of the remnant parameters \(\left\{ M_f, a_f, A_{lmn}, \phi _{lmn}\right\} \) further reduces correlations, thus sharpening TEOBPM’s posteriors compared to LVK RD.

Table 2 Median and \(90\%\) symmetric credible intervals of some relevant parameters: primary mass \(m_1\), secondary mass \(m_2\), dimensionless primary spin \(\chi _1\), inclination \(\iota \), luminosity distance \(d_L\)

The initial spins are the parameters that differ most from LVK IMR, with TEOBPM systematically recovering higher values of \(\chi _{1}\). However, we note that the sharply peaked IMR distributions in \(\chi _{1,2}\) are mostly dominated by the prior, which differs from ours. In fact, we use a flat prior in \(\chi _{1,2}\in [-0.8,0.95]\), while IMR analyses typically have three free components for each spin, with a prior uniform in spin magnitude and isotropic in direction (see [4]). Indeed, the agreement with our results improves dramatically in the few events where the IMR posteriors on \(\chi _{1,2}\) differ from the prior, such as GW170729, GW190519_153544 and GW190706_222641. To understand whether these high values of the primary spin could be induced by detector noise, we simulated several mock signals at different SNRs, and found no biases in the recovery of the injected values. We also investigated how our flat prior in the component masses affects the prior in the effective spin through correlations with the (non-flat) mass ratio, finding that such an effect is not sufficient to explain the results.
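The peaked shape of an isotropic spin prior projected on the orbital angular momentum can be checked with a short Monte Carlo, sketched below under the assumption of spin magnitudes uniform on [0, 1] (the upper bound used in actual IMR analyses may differ). For \(\chi _z = a\cos \theta \) with \(a\) and \(\cos \theta \) uniform, the density is \(p(\chi _z) = -\tfrac{1}{2}\ln |\chi _z|\) on \([-1,1]\), which diverges at \(\chi _z=0\), unlike the flat prior on \([-0.8,0.95]\) used here. Function names are hypothetical.

```python
import numpy as np

def isotropic_chi_z(n, a_max=1.0, seed=None):
    """Aligned spin component for spins uniform in magnitude and isotropic in direction."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.0, a_max, n)           # spin magnitude
    cos_tilt = rng.uniform(-1.0, 1.0, n)     # isotropic direction: cos(tilt) is uniform
    return a * cos_tilt

def prior_mass_near_zero(width):
    """Analytic P(|chi_z| < width) for a_max = 1, integrating p(x) = -0.5 ln|x|."""
    return width * (1.0 - np.log(width))

chi_z = isotropic_chi_z(200_000, seed=0)
frac_iso = np.mean(np.abs(chi_z) < 0.1)      # Monte Carlo estimate
frac_flat = 0.2 / (0.95 - (-0.8))            # same window under the flat prior on [-0.8, 0.95]
print(frac_iso, prior_mass_near_zero(0.1), frac_flat)
```

Roughly a third of the isotropic prior mass lies within \(|\chi _z|<0.1\), versus about \(11\%\) for the flat prior.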

Fig. 3

Posterior distributions of the initial masses \(m_{1,2}\) for the different events on the x-axis. The red posteriors are from the analysis with only the fundamental mode (2, 2), the blue ones using the fundamental mode and the mode (3, 3), the green ones using the fundamental mode and the mode (2, 1). In each of the two subplots, the bottom row shows the logarithmic Bayes factor \(\ln {{\mathcal {B}}_{lm,22}}\) between the HM considered and only the fundamental mode, colored according to which hypothesis is favoured

5.3 Multimodal analysis

In the previous section we compared ringdown-only estimates to full IMR analyses, when assuming the \((l,m) = [(2,2), (3,3), (2,1)]\) modes. These results can be interpreted as a time-domain version of the IMR consistency test [5, 170,171,172,173,174,175], similar to what was done in Ref. [176], although here we do not perform a corresponding inspiral analysis and only compare to full IMR results. For an application to astrophysical source properties using similar techniques, see Ref. [177].

Now, we turn our attention to the detection of modes beyond the fundamental one, with the goal of extracting multiple QNMs from the system, allowing for tests of GR predictions with a more immediate interpretation. To this end, we conduct a multimodal search to investigate the presence of HMs in the RD of the selected events. Based on the detectability predictions of Sect. 4, we only use the \(\left\{ (3,3),(2,1)\right\} \) HMs and consider the following hypotheses:

  • \({\mathcal {H}}_{22}\):   \(\left\{ (2,2)\right\} \);

  • \({\mathcal {H}}_{22,33}\):   \(\left\{ (2,2),(3,3)\right\} \);

  • \({\mathcal {H}}_{22,21}\):   \(\left\{ (2,2),(2,1)\right\} \);

  • \({\mathcal {H}}_{22,33,21}\):   \(\left\{ (2,2),(3,3),(2,1)\right\} \);

where each logical proposition represents the assumption of having a gravitational wave signal \({\textbf{h}}\) in the data \({\textbf{d}}\) with that specific set of modes. We then perform model selection between these hypotheses, expressing the results through the logarithmic BF \(\ln {{\mathcal {B}}_{lm,22}}\), where the model lm corresponds to one of the above hypotheses including at least one HM.

Results are summarized in Table 3, where we report the BFs between the templates with and without HMs. To gauge the impact of the priors, we also show the values of the network optimal signal-to-noise ratio \(\text {SNR}^{\text {net}}_{\text {opt}}\), the information H [178] and the maximum value of the logarithmic likelihood \(\text {lnL}_{\text {max}}\). None of the events shows strong preference for HMs. Nonetheless, the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 have a low but positive BF. To appreciate the consequences of this fact, in Fig. 3 we study how the PE on the initial masses \(m_{1,2}\) varies when HMs are included. In the case of the mode (2, 1), all the logarithmic BFs are very close to zero, and there are no appreciable differences in the posteriors with the additional mode. Instead, in the case of the mode (3, 3), we can clearly see how the distributions change for the events which show a positive BF for this HM. In particular, for these events the posteriors move towards higher mass ratios when mode (3, 3) is included, a behaviour that is qualitatively in agreement with the physical excitation of this mode.
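Given the log-evidences of the four hypotheses, the log BFs against \({\mathcal {H}}_{22}\) and, assuming equal prior odds, the posterior probability of each hypothesis follow directly, as in the sketch below. The hypothesis labels and log-evidence values in the example are placeholders, not the values of Table 3.

```python
import numpy as np
from scipy.special import logsumexp

def log_bfs_vs_reference(log_z, reference="22"):
    """ln B_{H,ref} = ln Z_H - ln Z_ref for every hypothesis H other than ref."""
    return {h: lz - log_z[reference] for h, lz in log_z.items() if h != reference}

def model_probabilities(log_z):
    """Posterior probabilities of the hypotheses, assuming equal prior odds."""
    labels = list(log_z)
    values = np.array([log_z[h] for h in labels])
    # Normalise in log space to avoid overflow/underflow of exp(ln Z)
    return dict(zip(labels, np.exp(values - logsumexp(values))))

# Placeholder log-evidences, for illustration only.
log_z = {"22": -100.0, "22,33": -99.0, "22,21": -100.0, "22,33,21": -100.5}
print(log_bfs_vs_reference(log_z))
print(model_probabilities(log_z))
```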

However, these results are not yet sufficient to claim a (3, 3) detection, as the search was only performed at one fixed starting time, given by the median value of the IMR peaktime distribution for each event. This procedure is reasonable to keep the computational cost of the search under control, but needs to be extended if aiming to claim a mode detection. In fact, since the peak time is an additional parameter, its uncertainty needs to be appropriately included (i.e. marginalised over) in the inference to obtain unbiased results. It is well known that RD analyses are highly sensitive to the choice of the starting time, and a naive treatment of this issue can lead to non-robust or even highly biased results [36, 58, 62, 78]. This problem typically manifests itself in the following forms:

  • Moving to earlier times with respect to the model’s prescription implies the use of data that are outside the regime of applicability of the model. In practice, for a linear RD model, this normally results in the use of data containing non-linearities from the postmerger, in an attempt to use more data and obtain a higher SNR. For TEOBPM, this means including in the data some premerger part of the signal before the peak of the waveform. In both cases, the first part of the data is affected by un-modelled low frequencies at high SNR, pushing the final mass to larger values. We also note that, in the case of TEOBPM, a bias arises when starting at later times after the peak as well. This is because we assume given values of the amplitudes and phases evaluated at the peak time, and applying them at a different reference time results in a bias. This situation does not occur for RD models consisting of a superposition of QNMs with free amplitudes, where starting at later times only implies a loss in SNR.

  • With a fixed starting time, it is not possible to evaluate the impact of short transient noise features in the detectors. Given the low SNR contained in the post-merger-only signal, gauging the impact of such effects becomes significantly more important. By repeating the analysis at multiple starting times, it is possible to trace certain features present in the BF or posterior to noise artifacts.

TEOBPM is defined to start at the peak of the IMR waveform. Thus, we should probe the entire region where the IMR peaktime distribution has support. This procedure still depends on the specific IMR model used, and results may vary for different IMR waveforms, but for events where the systematic uncertainties of the IMR modelling are under control, this typically has a small impact. In the top subplot of Figs. 4, 7, 10, 11 we show how the BFs and the \(m_{1,2}\) posteriors change as the starting time is varied across the \(95\%\) CI of the IMR peaktime distribution, for the subset of events that prefer the (3, 3) mode. All four events display similar features:

  • one or more peaks in the BF within the \(95\%\) CI of the IMR peaktime distribution;

  • a change in the posterior distributions favouring higher mass ratios in correspondence with a BF peak.
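The scan over starting times can be sketched as follows: offsets are laid out in steps of \(1\,M_f\) (in detector-frame geometric units, \(GM_f/c^3\)) around the median peak time, so as to cover the central \(95\%\) interval of the peak-time samples. The function names and the choice of centring the grid on the median are assumptions of this example.

```python
import numpy as np

T_SUN = 4.925490947e-6  # G * M_sun / c^3 in seconds

def mass_to_seconds(mass_msun):
    """Geometric time G M / c^3 (in seconds) of a mass given in solar masses."""
    return mass_msun * T_SUN

def start_time_grid(peak_time_samples, mf_msun, step_in_mf=1.0, level=0.95):
    """Start times covering the central `level` interval of the peak-time samples.

    Returns offsets from the median peak time in units of M_f, and the
    corresponding absolute start times (same time units as the samples, seconds).
    """
    samples = np.asarray(peak_time_samples, dtype=float)
    t_med = np.median(samples)
    tail = 50.0 * (1.0 - level)                                  # 2.5 for a 95% interval
    lo, hi = np.percentile(samples, [tail, 100.0 - tail])
    step = step_in_mf * mass_to_seconds(mf_msun)                 # grid spacing in seconds
    k = np.arange(np.floor((lo - t_med) / step), np.ceil((hi - t_med) / step) + 1.0)
    return k * step_in_mf, t_med + k * step

# Unit check with the GW190521 numbers quoted in the text (result in ms, about -7.6):
print(-6 * mass_to_seconds(258.8) * 1e3)
```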

We interpret this behaviour as resulting from the impact of the mode (3, 3) in these events, which improves the PE in the region of applicability of our model. Note that this effect is visible despite the low evidence for the HM, suggesting that the posterior distributions are more sensitive than BFs in targeting features in the data, although the latter constitute a more conservative and robust metric [179].

For the events GW200129_065458 and GW200224_222234, the BF continues to increase at late times outside the support of the peaktime distribution. This effect is difficult to interpret due to the expected increase of modelling systematics when starting at times much later than the peak, and should be investigated in the future. Likely causes could be the impact of systematics introduced by different IMR models in reconstructing the peak time, or precessional degrees of freedom not included in our analysis [4, 180,181,182,183,184].

Finally, we discuss our results for the event GW190521, for which several works have investigated the presence of the mode (3, 3) in its RD [10, 96, 155, 156, 185, 186]. In the upper subplot of Fig. 8, we observe a peak in the BF as a function of the starting time, which is however shifted from the median value of the peaktime distribution and lies outside the \(95\%\) CI. At our resolution of \(1M_f\), the BF reaches its maximum at \(t_0 = -6 M_f = {-7.6}\,{\textrm{ms}}\), with \(M_f \simeq {258.8}\,{\textrm{M}_{\odot }}\). Our input parameters, including the peaktime distribution, were obtained with the NRSur7dq4 IMR waveform. Our results are in qualitative agreement with [96, 186], since in those analyses, too, significant evidence for HMs could only be obtained well outside the validity regime of the employed models. Apart from the systematics induced by the start time, precession or eccentricity are known to have a large impact on this event [186,187,188,189,190], so further studies will be needed to properly characterise HM detections in this type of system.

Assessing the significance of the weak evidence we obtained for HMs, as quantified by BFs, would in principle require simulation studies [185]. In such a study, a large number of NR signals containing either only the (2,2) mode or the full HM content would be added to both Gaussian and real detector noise (the latter being especially required for low-significance detections), and recovered using our model. Then, setting a threshold that minimises the number of false-positive HM detections when simulating a signal containing only the (2,2) mode would define a “detection threshold” for the BF. Since here we do not claim significant evidence for HMs, we do not perform such a study, which would bear a large computational cost. We note, however, that based on the experience gathered from a selected number of these simulations, systematics due to modelling and start-time determination typically have a much stronger impact on false-positive detections.
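The threshold-setting step of such a simulation study can be sketched as an empirical quantile of the \(\ln {\mathcal {B}}\) values obtained from (2,2)-only injections. In the example below, the null values are random placeholders standing in for the outcome of an actual injection campaign, and the target false-alarm probability is an arbitrary choice; function names are hypothetical.

```python
import numpy as np

def ln_bf_threshold(ln_bf_null, false_alarm_prob=0.01):
    """ln BF threshold exceeded by at most `false_alarm_prob` of the (2,2)-only injections."""
    return np.quantile(np.asarray(ln_bf_null, dtype=float), 1.0 - false_alarm_prob)

def false_alarm_fraction(ln_bf_null, threshold):
    """Fraction of (2,2)-only injections that would be (falsely) claimed as HM detections."""
    return np.mean(np.asarray(ln_bf_null, dtype=float) > threshold)

# Placeholder null distribution, for illustration only.
rng = np.random.default_rng(3)
null = rng.normal(-1.0, 1.0, 5000)
thr = ln_bf_threshold(null, 0.01)
print(thr, false_alarm_fraction(null, thr))
```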

6 Tests of general relativity

The final-state conjecture states that astrophysical BHs are uniquely described by the Kerr metric [37,38,39,40,41,42,43,44,45,46,47, 62]. In our Bayesian framework, this is an assumption nested within our waveform model that can be verified against observations. We can use the RD to test this hypothesis (commonly referred to as the “no-hair” conjecture) by performing model selection between GR and non-GR models. Additional degrees of freedom (“hairs”) of the final BH system are modelled by deviation parameters with respect to the GR prediction, typically in the QNM spectrum. Different parameterizations of the deviations can be used, depending on the degree of assumptions built into the model. For example, for RD models composed of pure QNM superpositions, at least two different modes need to be resolved to remove the degeneracy between \(\left\{ f_{22}, \tau _{22}\right\} \) and \(\left\{ M_f, a_f\right\} \), as discussed e.g. in Refs. [91, 92, 191]. In modern analyses, this is typically done using overtones to phenomenologically model the early postmerger, see [5, 10, 93]. Instead, if the RD model used has amplitudes fixed from numerical relativity simulations (e.g. [27,28,29,30, 34, 36, 98,99,100]), the numerical calibration imposes a constraint on the parameter space which naturally decouples \(\left\{ f_{22}, \tau _{22}\right\} \) and \(\left\{ M_f, a_f\right\} \). In this way the no-hair conjecture can be tested with only a single mode, at the price of introducing additional GR inputs in the model, i.e. of a less agnostic search. This is the case for TEOBPM, and similar considerations hold for the LVK analysis with pSEOBNR [5]. Nonetheless, even though TEOBPM allows us to perform such tests with only the fundamental mode, we expect the test to improve when multiple modes are observed. We stress that the above considerations are only qualitative, and the actual situation on real data strongly depends on the SNR and characteristics of the system.

We introduce fractional deviations in the oscillation frequencies of the modes as

$$\begin{aligned} \omega _{lm} = \omega _{lm}^{\text {{GR}}} (1 + \delta \omega _{lm}), \end{aligned}$$
(11)

where \(\omega _{lm}^{\text {{GR}}}\) is the value predicted by GR. The same procedure is applied to the damping time \(\tau _{lm}\). Note that for TEOBPM the deviations are introduced directly in Eq. (6).
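The effect of the fractional deviations on a single mode can be illustrated with a plain damped sinusoid \(h(t) = A\, e^{-(t-t_0)/\tau _{lm}} \cos [\omega _{lm}(t-t_0) + \phi ]\) for \(t \ge t_0\), with \(\omega _{lm}\) and \(\tau _{lm}\) modified as in Eq. (11). This toy template is an assumption of the sketch and is not TEOBPM, where the deviations enter Eq. (6); the numerical values are illustrative only. It also shows the behaviour at the lower prior bound \(-1\) of the deviation parameters: \(\delta \omega _{lm}=-1\) removes the oscillation, while \(\delta \tau _{lm}\rightarrow -1\) makes the mode decay almost instantly.

```python
import numpy as np

def damped_sinusoid(t, amp, omega_gr, tau_gr, phase=0.0, t0=0.0, d_omega=0.0, d_tau=0.0):
    """Single damped sinusoid with fractional deviations from its GR values:
    omega = omega_gr * (1 + d_omega), tau = tau_gr * (1 + d_tau). Zero before t0."""
    omega = omega_gr * (1.0 + d_omega)   # angular frequency [rad/s]
    tau = tau_gr * (1.0 + d_tau)         # damping time [s]
    if tau <= 0.0:
        raise ValueError("the damping time must be positive (d_tau > -1)")
    dt = np.asarray(t, dtype=float) - t0
    h = np.zeros_like(dt)
    after = dt >= 0.0
    h[after] = amp * np.exp(-dt[after] / tau) * np.cos(omega * dt[after] + phase)
    return h

t = np.linspace(0.0, 0.05, 501)             # 0.1 ms sampling
omega_gr, tau_gr = 2 * np.pi * 250.0, 4e-3  # illustrative values, not fits to any event
h_gr = damped_sinusoid(t, 1.0, omega_gr, tau_gr)
h_no_osc = damped_sinusoid(t, 1.0, omega_gr, tau_gr, d_omega=-1.0)  # pure exponential decay
h_fast = damped_sinusoid(t, 1.0, omega_gr, tau_gr, d_tau=-0.999)    # tau = 4 microseconds
```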

For example, if only the fundamental mode is considered, the deviations can in general affect:

  • only the frequency \(\delta \omega _{22}\);

  • only the damping time \(\delta \tau _{22}\);

  • both the frequency and damping time.

In the presence of multiple modes, the number of possibilities quickly increases, as all combinations of deviations in the frequencies and damping times of the different selected modes have to be considered. Already with two modes, a systematic study of all these combinations is challenging, both for collecting and comparing the different results and because of the large computational cost.

6.1 TGR with two modes

We first test the set of events with positive (albeit weak) evidence for the mode (3, 3) and consider the following hypotheses:

$$\begin{aligned} {\mathcal {H}}_{\delta \omega _{22}},\quad {\mathcal {H}}_{\delta \omega _{33}},\quad {\mathcal {H}}_{\delta \tau _{22}},\quad {\mathcal {H}}_{\left\{ \delta \omega _{22}, \delta \tau _{22}, \delta \omega _{33}\right\} }. \end{aligned}$$
(12)
Fig. 4

PE and model selection on the event GW190521_074359, as a function of the starting time. In each subplot, each column corresponds to a separate analysis with a different starting time. The red posterior distributions are obtained with only the fundamental mode (2, 2), the blue ones with the modes \(\left\{ (2,2), (3,3)\right\} \). In each subplot, the first two rows display the initial masses \(m_{1,2}\), and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis. The shaded region in the bottom row outlines the \(95\%\) CI of the corresponding IMR peaktime distribution. The top subplot corresponds to the GR analysis, while the three below include deviations from GR in the form of fractional deviations in the QNMs. Specifically, the green distributions are obtained with the modes \(\left\{ (2,2), (3,3)\right\} \) and deviations only in the frequency of the fundamental mode \(\delta \omega _{22}\), the pink ones with deviations only in the frequency of the mode (3, 3), and the yellow ones with deviations only in the damping time of the fundamental mode \(\delta \tau _{22}\)

The baseline GR model consists of the modes \(\left\{ (2,2), (3,3)\right\} \), and each hypothesis is constructed from this basic model by adding the deviation parameters listed in its subscript. We neglect the deviations \(\delta \tau _{33}\), since damping times (and hence their deviations) are in general more difficult to measure than frequencies. As in Sect. 5.3, we repeat the analysis varying the starting time across the \(95\%\) CI of the IMR peaktime distribution. This amounts to “marginalising” a posteriori over the start-time uncertainty. The results are shown in Figs. 4, 7, 10, 11. The marginalized distributions for the deviations are consistent with GR, i.e. they include zero within the \(90\%\) CI, except in the cases discussed below. The BFs also show no strong preference for the non-GR hypotheses, with the largest BF in favour of GR deviations being \({\mathcal {B}} \simeq 3\) within the support of the peak time distribution. This happens for the subset of events with lower SNR and higher masses, which merge at frequencies where the noise variance rises sharply; such low-significance results are to be expected when analysing a large number of events at many different starting times, based on the statistical properties of the noise alone. Outside of the peak time support, the BFs tend to grow, sometimes to significantly large values. However, as discussed previously, the model assumes that the start time coincides with the waveform peak, hence spurious GR deviations are expected when this assumption is significantly violated. A procedure to combine results at multiple starting times into a single estimate would help to immediately assess the impact of this effect. Such a procedure would re-weight the obtained BFs according to the value of the peak time probability distribution,Footnote 4 combining the various BF\(_{t_{st}=t_i}\) into a single BF\(_w\), where i runs over the time samples overlapping with the \(t_p\) support.
A similar algorithm has been implemented in Ref. [63]. A crucial difference is that the latter analysis assumed a pre-merger model, allowing such marginalisation to be easily performed. Within time-domain methods, the analysis segment is selected by excising all data before \(t_{st}\), so analyses performed at different \(t_{st}\) include different portions of data. This implies that combining different BFs would produce a BF\(_w\) which is not evaluated on a fixed amount of data, and hence cannot be straightforwardly interpreted as a proper BF in the sense of standard probability rules. More refined procedures will be required to produce such a combined BF\(_w\). However, an upper bound on BF\(_w\) can always be constructed by taking \(\mathrm {BF}_{\max } {:}{=} \max _i \, \mathrm {BF}_{t_i}\), which is the approach we follow here. In the following, we briefly discuss our results, to illustrate the main features observed in the posteriors and BFs.
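The upper bound used here can be sketched as follows. The second function computes the naive peak-time-weighted average \(\sum _i w_i \mathrm {BF}_{t_i}\) with normalised weights which, as discussed above, is not a proper BF because each term is evaluated on a different data segment; being a convex combination, it can never exceed \(\mathrm {BF}_{\max }\). Function names, start times and ln BF values in the example are hypothetical placeholders.

```python
import numpy as np
from scipy.special import logsumexp

def ln_bf_max(ln_bf, start_times, t_lo, t_hi):
    """ln BF_max = max_i ln BF_{t_i} over the start times inside [t_lo, t_hi]."""
    ln_bf = np.asarray(ln_bf, dtype=float)
    start_times = np.asarray(start_times, dtype=float)
    inside = (start_times >= t_lo) & (start_times <= t_hi)
    if not np.any(inside):
        raise ValueError("no start time inside the peak-time support")
    return ln_bf[inside].max()

def ln_bf_weighted(ln_bf, weights):
    """ln of sum_i w_i BF_{t_i}, with weights normalised to sum to one (naive combination)."""
    w = np.asarray(weights, dtype=float)
    return logsumexp(np.asarray(ln_bf, dtype=float), b=w / w.sum())

# Placeholder values: start times in units of M_f and ln BF at each start time.
t_grid = np.arange(-5, 6)
ln_bf = np.array([4.0, 0.5, 1.1, -0.3, 0.2, 0.9, 0.4, -0.1, 0.0, 2.0, 6.0])
weights = np.where(np.abs(t_grid) <= 3, np.exp(-0.5 * (t_grid / 2.0) ** 2), 0.0)
print(ln_bf_max(ln_bf, t_grid, -3, 3), ln_bf_weighted(ln_bf, weights))
```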

  • GW170729   The GR value is mostly included in the support of the deviation posteriors, although in a few cases at the boundaries of the peak time posterior the GR value lies only in the tails of the distribution, especially when considering deviations in the damping time. This is consistent with \(t_{st}\)-induced systematics at times far from the most probable peak, as discussed above. Additionally, since the event has a relatively low \(\text {SNR}^{\text {net}}_{\text {opt}}\sim 5.7\), noise-induced deviations are expected. This is reflected in bimodalities in the posteriors, both in the deviation parameters and in \(\{m_{1,2}, \chi _1, d_L\}\), which vary inconsistently over time. The bimodalities manifest as a peak consistent with GR and a secondary peak at the lower prior bound (equal to -1) of the deviation parameters \(\delta \omega _{22}\) and \(\delta \tau _{22}\). This secondary peak correlates with high mass ratios and low distances. Such behaviour indicates that the signal can be fitted equally well by the higher mode (significantly enhanced at high mass ratios) with the fundamental mode suppressed (the lower bounds of the deviation parameters correspond to a very fast decaying signal, with a very low frequency outside the detector sensitive band), and is another manifestation of the low SNR. Concerning the upper bound, we prefer to keep the deviation bounds fixed to the (already significantly wide) interval \([-1,1]\), since railing is expected to persist even when the bounds are increased, as for all signals where little information is present, as verified e.g. in Ref. [103].

  • GW190521_074359   The posteriors on \(\delta \omega _{22}\) in the time interval \([-1.5,0.5]M_f\) are slightly shifted towards larger values, but the GR prediction always lies well within the \(90\%\) CI. This effect is compensated by an increase in the mass \(m_1\), which is positively correlated with \(\delta \omega _{22}\). As expected, the BF also increases slightly in the same time interval. \(\delta \tau _{22}\) always favours GR, and \(\delta \omega _{33}\) is poorly resolved.Footnote 5

  • GW200129_065458   The deviation posteriors are always centered on the GR value, with the BFs always favouring GR. Such an improved agreement is expected given the large value of \(\text {SNR}^{\text {net}}_{\text {opt}}\sim 12.5\), which also allows the deviation \(\delta \omega _{33}\) to be slightly resolved. The deviations observed in Ref. [104] are due to waveform systematics when including the pre-merger signal, which do not appear here, likely because of the reduced impact of precession on the post-merger signal.

  • GW200224_222234   The deviations \(\delta \omega _{22}\) and \(\delta \tau _{22}\) are centered on GR, with BFs favouring GR accordingly. Also for this event, the deviations \(\delta \omega _{33}\) are partially resolved, with a peak emerging, although the support of the posterior does not exclude any region of the prior, hence results are essentially uninformative.

Note how, in all the events, the deviations \(\delta \omega _{22}\) try to capture the premerger part of the signal at times far from \(t_p\), with large variations in the posteriors accompanied by high non-GR BFs, indicating a weak consistency of the model with the data. This interesting, and expected, behaviour is a reminder of how sensitive the results of RD analyses are to the underlying assumptions, and of how important the proper handling of \(t_{st}\) is. For completeness, Fig. 8 reports the deviations for GW190521, which are well measured and centered on the GR value for \(\left\{ \delta \omega _{22}, \delta \omega _{33}\right\} \). This is particularly apparent in correspondence with the BF peak that lies outside the IMR peaktime distribution, as discussed in Sect. 5.3. However, we observe that the posteriors in \(\delta \tau _{22}\) are systematically shifted toward higher values with respect to the GR expectation. This behaviour is accompanied by a positive BF for non-GR over the entire support of the peaktime distribution. Further studies on the GR deviations with only the fundamental mode will be necessary to draw conclusions.

Finally, Fig. 9 shows the results for the event GW191109_010717, for which previous LVK analyses have identified the presence of non-GR deviations induced by non-stationarities in the noise [5]. We reproduce this result, and regard this event as a showcase of a non-robust analysis, as manifested by the highly varying posteriors.

6.2 TGR with the fundamental mode

We summarise in Fig. 13 the results for the events with no evidence of HMs, for which we test the hypotheses \({\mathcal {H}}_{\delta \omega _{22}}\) and \({\mathcal {H}}_{\delta \tau _{22}}\). For simplicity, we perform the analysis at only one starting time, given by the median value of the IMR peaktime distribution. For the first detected GW signal, GW150914, among the loudest ringdowns observed to date, we instead provide the complete analysis at different starting times in Fig. 12. All the posteriors have confident support on the GR value, for all the events. The BFs also favour GR, except for GW190828_063405, which has a slightly positive BF for non-GR and large variations in the deviation posteriors, especially in \(\delta \tau _{22}\). This result has already been observed in the LVK analysis [10] with both the pyRing and pSEOBNR pipelines, where the overestimation of the damping time was traced (through injections close to the trigger) to artifacts of noise fluctuations caused by the low SNR (in our analysis \(\text {SNR}^{\text {net}}_{\text {opt}}\sim 5.0\)).

6.3 \(M_f\)-\(a_f\) “no-hair” test

In this last section, we discuss an interesting representation of “no-hair” tests that can be constructed in the presence of at least two free modes. This is inspired by GR tests on binary pulsars, see e.g. Figure 6 of Ref. [192]. The achievable precision of such tests with future detector upgrades has been studied, for example, in Ref. [102].

We first compute the samples of the effective frequency using Eq. (11), combining the GR and non-GR frequencies.Footnote 6 We then project these values onto the \(\left\{ M_f, a_f\right\} \) plane. Since several combinations of \(\left\{ M_f, a_f\right\} \) give the same frequency value, a sample of \(f_{lm}\) is mapped onto a curve in the \(\left\{ M_f, a_f\right\} \) plane. Similarly, the set of all the \(f_{22}\) samples, for example, generates a bundle of curves in \(\left\{ M_f, a_f\right\} \). Repeating the same procedure with \(f_{33}\) and \(\tau _{22}\), we end up with three bundles of curves in the \(\left\{ M_f, a_f\right\} \) plane, as shown in Fig. 5. At this point, two important considerations are in order (see e.g. [87, 91, 92]).

  • If two frequencies and one damping time are detectable, the system presents three observables fixed by two parameters, \(\left\{ M_f, a_f\right\} \). Thus, if the samples are consistent with GR, the three bundles should intersect in a single point in the high SNR limit, corresponding to the GR-predicted value of \(\left\{ M_f, a_f\right\} \). If the observed effective frequencies and damping times are not consistent with GR, the three bundles will not intersect, as the two quantities \(\left\{ M_f, a_f\right\} \) do not reproduce the three independent observables.

  • A subtle step in the process is that the projection of the effective QNM frequencies is done assuming GR, in the sense that we invert the GR fits to obtain \(\left\{ M_f, a_f\right\} \) from \(\left\{ f_{lm}, \tau _{lm}\right\} \). This is crucial to bring out inconsistencies of the measured QNM frequencies with GR.
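The projection of a single sample onto a curve in the \(\left\{ M_f, a_f\right\} \) plane can be sketched as follows. For illustration, we replace the fits used in this work with the Berti, Cardoso & Will (2006) phenomenological fits for the \(n=0\) Kerr QNMs; these coefficients are an assumption of the sketch and should be checked against that reference before quantitative use. Since \(M_f\,\omega _{lm}\) and \(\tau _{lm}/M_f\) depend only on \(a_f\), a measured frequency fixes \(M_f(a_f) = (M_f\omega _{lm})(a_f)\,c^3/(2\pi f_{lm} G)\), and similarly for \(\tau _{lm}\). With noise-free GR inputs, the \(f_{22}\) and \(f_{33}\) curves cross at the injected remnant. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq

T_SUN = 4.925490947e-6  # G M_sun / c^3 in seconds

# ASSUMED fit coefficients (Berti, Cardoso & Will 2006), n = 0 modes:
# M*omega = f1 + f2 (1 - a)^f3,  Q = q1 + q2 (1 - a)^q3,  with Q = omega * tau / 2
FITS = {
    (2, 2): (1.5251, -1.1568, 0.1292, 0.7000, 1.4187, -0.4990),
    (3, 3): (1.8956, -1.3043, 0.1818, 0.9000, 2.3430, -0.4810),
}

def dimensionless_qnm(lm, a):
    """Return (M omega, tau / M) for mode lm at dimensionless spin a (0 <= a < 1)."""
    f1, f2, f3, q1, q2, q3 = FITS[lm]
    m_omega = f1 + f2 * (1.0 - a) ** f3
    quality = q1 + q2 * (1.0 - a) ** q3
    return m_omega, 2.0 * quality / m_omega

def qnm_physical(lm, mass_msun, a):
    """Frequency f [Hz] and damping time tau [s] for a remnant (M_f, a_f)."""
    m_omega, tau_over_m = dimensionless_qnm(lm, a)
    unit = mass_msun * T_SUN
    return m_omega / (2.0 * np.pi * unit), tau_over_m * unit

def mass_curve_from_f(lm, f_hz, a):
    """M_f(a_f) [M_sun] reproducing a measured frequency f_hz, assuming GR."""
    m_omega, _ = dimensionless_qnm(lm, a)
    return m_omega / (2.0 * np.pi * f_hz * T_SUN)

def mass_curve_from_tau(lm, tau_s, a):
    """M_f(a_f) [M_sun] reproducing a measured damping time tau_s, assuming GR."""
    _, tau_over_m = dimensionless_qnm(lm, a)
    return tau_s / (tau_over_m * T_SUN)

def intersect_f22_f33(f22_hz, f33_hz, a_lo=0.0, a_hi=0.99):
    """(M_f, a_f) where the f22 and f33 curves cross; assumes one crossing in [a_lo, a_hi]."""
    def gap(a):
        return mass_curve_from_f((2, 2), f22_hz, a) - mass_curve_from_f((3, 3), f33_hz, a)
    a_star = brentq(gap, a_lo, a_hi)
    return mass_curve_from_f((2, 2), f22_hz, a_star), a_star

f22, tau22 = qnm_physical((2, 2), 60.0, 0.7)  # illustrative remnant: 60 M_sun, a_f = 0.7
f33, _ = qnm_physical((3, 3), 60.0, 0.7)
print(f22, tau22, intersect_f22_f33(f22, f33))
```

With posterior samples, evaluating mass_curve_from_f on a spin grid for every sample of \(f_{22}\) (and likewise for \(f_{33}\) and \(\tau _{22}\)) yields the three bundles.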

Fig. 5

Projection of the non-GR effective QNM complex frequencies \(\left\{ \omega _{lm} = \omega _{lm}^{\text {GR}} (1 + \delta \omega _{lm}), \tau _{lm} = \tau _{lm}^{\text {GR}} (1 + \delta \tau _{lm})\right\} \) onto the plane of the final mass and spin \(\left\{ M_f, a_f\right\} \). The green bundle displays the median, \(95\%\) and \(84\%\) CIs of the projected samples from the fundamental-mode effective frequency \(\omega _{22}\), the pink bundle those from the effective frequency of the mode (3, 3), and the yellow bundle those from the effective damping time of the fundamental mode \(\tau _{22}\). The solid black curves represent the \(50\%\) and \(90\%\) credible levels of the corresponding GR analysis, i.e. the analysis with the same starting time but without GR deviations, and the dotted grey curves the same credible levels for the non-GR analysis. Each subplot refers to one of the events that showed evidence for the mode (3, 3). The samples are taken from the analysis at the starting time corresponding to the maximum BF in favour of the mode (3, 3), as described in Sect. 6.3.

Figure 5 shows the outcome of this procedure for the subset of events with evidence for the mode (3, 3), together with the \(50\%\) and \(90\%\) CIs of the \(\left\{ M_f, a_f\right\} \) GR and non-GR samples. For each event, we use the posteriors from the analysis \({\mathcal {H}}_{\left\{ \delta \omega _{22}, \delta \tau _{22}, \delta \omega _{33}\right\} }\) at the starting time corresponding to the maximum BF for the mode (3, 3). Specifically, we consider the samples obtained at \(0M\) for GW170729, \(-6M\) for GW190521, \(-1M\) for GW190521_074359, \(-15M\) for GW191109_010717, \(3M\) for GW200129_065458, and \(-2M\) for GW200224_222234. These somewhat arbitrary choices are motivated by an attempt to use a starting time that enhances the presence of the mode (3, 3), and could bias the test if part of the premerger signal is included in the analysed data. However, given the extremely broad bundles in Fig. 5, we believe that a detailed investigation at different starting times is not warranted. Indeed, since the projected posteriors cover almost the entire \(\left\{ M_f, a_f\right\} \) plane, we conclude that this test is still largely uninformative at current sensitivities, as expected from SNR estimates [78, 87]. Nevertheless, we highlight a few points. First, GW200129_065458 is the loudest event, with \(\text {SNR}^{\text {net}}_{\text {opt}}\sim 12.5\), and correspondingly has tighter bundles, as expected. With a perfect overlap between the GR and non-GR samples, and the projected \(\omega _{33}\) not covering the entire plane, this event gives an idea of what a null result, i.e. no detected deviation, will look like at higher sensitivities. On the other hand, we point to GW191109_010717, whose GR deviations are noise-induced, to illustrate how an apparent deviation from GR manifests itself in this test. Here, we observe non-overlapping GR and non-GR posteriors, together with widely separated median curves of the \(\omega _{22}\), \(\tau _{22}\) and \(\omega _{33}\) bundles.
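The \(50\%\) and \(90\%\) credible levels of Fig. 5, and the notion of overlapping or non-overlapping GR and non-GR posteriors, can be made quantitative with a binned highest-density region. The Python/NumPy sketch below is illustrative only: the mock Gaussian posteriors in \(\left\{ M_f, a_f\right\} \), the number of bins, and the overlap measure (the fraction of one set of samples falling inside the other set's \(90\%\) region) are hypothetical choices and are not taken from the analysis.

```python
import numpy as np


def hpd_thresholds(x, y, levels=(0.5, 0.9), bins=50):
    """Bin 2D samples; for each level, return the smallest bin probability such that
    all bins at or above it enclose at least that fraction of the samples."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    prob = counts / counts.sum()
    sorted_p = np.sort(prob.ravel())[::-1]  # densest bins first
    cumulative = np.cumsum(sorted_p)
    thresholds = {}
    for level in levels:
        k = min(int(np.searchsorted(cumulative, level)), sorted_p.size - 1)
        thresholds[level] = sorted_p[k]
    return prob, xedges, yedges, thresholds


def fraction_inside(x, y, prob, xedges, yedges, threshold):
    """Fraction of samples (x, y) that fall in bins whose probability is >= threshold."""
    in_range = ((x >= xedges[0]) & (x <= xedges[-1]) &
                (y >= yedges[0]) & (y <= yedges[-1]))
    # Same binning convention as np.histogram2d (last bin closed on the right).
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, len(xedges) - 2)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, len(yedges) - 2)
    inside = in_range & (prob[ix, iy] >= threshold)
    return inside.mean()


rng = np.random.default_rng(1)
# Hypothetical mock posteriors in (M_f [M_sun], a_f); not samples from any event.
m_gr, a_gr = rng.normal(70.0, 3.0, 20000), rng.normal(0.70, 0.05, 20000)
m_shift, a_shift = rng.normal(85.0, 3.0, 20000), rng.normal(0.70, 0.05, 20000)

prob, xe, ye, thr = hpd_thresholds(m_gr, a_gr)
self_overlap = fraction_inside(m_gr, a_gr, prob, xe, ye, thr[0.9])
shifted_overlap = fraction_inside(m_shift, a_shift, prob, xe, ye, thr[0.9])
```

By construction the GR samples fill at least \(90\%\) of their own \(90\%\) region, while a mock posterior displaced by five standard deviations in \(M_f\) barely enters it. A kernel density estimate would give smoother contours than a histogram; the histogram keeps the sketch dependent on NumPy alone.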
The projected \(\omega _{22}\) bundle of GW170729 is much broader than in the other events because of its low \(\text {SNR}^{\text {net}}_{\text {opt}}\sim 5.7\). GW190521 shows a different pattern from the other events, with the \(\tau _{22}\) bundle lying outside the GR samples. This behaviour can be traced back to Fig. 8, where the posterior on \(\delta \tau _{22}\) is shifted away from zero at \(t_0 = -6 M_f\). This feature can be attributed to the systematics induced by the low-probability \(t_{st}\), which was adopted to showcase the effect of systematics in this type of test.
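The dependence of the bundle width on loudness (broad for GW170729, tighter for GW200129_065458) follows from the projection itself: at fixed spin the projected mass scales as \(1/f_{22}\), so the width of the bundle scales with the fractional uncertainty on the frequency. The minimal NumPy sketch below builds the median and a central \(90\%\) band of an \(\omega _{22}\) bundle. It assumes the Berti, Cardoso & Will (2006) fit for the (2, 2) fundamental mode as a stand-in for the GR fits of the analysis, and hypothetical Gaussian \(f_{22}\) posteriors with \(10\%\) and \(3\%\) fractional widths that are not the measured ones.

```python
import numpy as np

T_SUN = 4.925491025543576e-06  # G * M_sun / c^3 in seconds
F1, F2, F3 = 1.5251, -1.1568, 0.1292  # ASSUMED (2,2,0) fit: M*omega_R = F1 + F2 (1 - a)^F3


def omega22_bundle(f_samples, spins, level=90.0):
    """Project each f22 sample [Hz] onto a curve M_f(a_f); return median and band per spin."""
    m_omega = F1 + F2 * (1.0 - spins) ** F3
    curves = m_omega[None, :] / (2.0 * np.pi * f_samples[:, None] * T_SUN)  # (n_samples, n_spins)
    lo, med, hi = np.percentile(curves, [50.0 - level / 2, 50.0, 50.0 + level / 2], axis=0)
    return med, lo, hi


rng = np.random.default_rng(0)
spins = np.linspace(0.0, 0.99, 100)
# Illustrative remnant M_f = 70 M_sun, a_f = 0.7 (f22 of about 247 Hz with this fit).
f_true = (F1 + F2 * (1.0 - 0.7) ** F3) / (2.0 * np.pi * 70.0 * T_SUN)

# Hypothetical posteriors: 10% (quieter event) vs 3% (louder event) fractional width.
quiet_med, quiet_lo, quiet_hi = omega22_bundle(f_true * (1.0 + 0.10 * rng.standard_normal(5000)), spins)
loud_med, loud_lo, loud_hi = omega22_bundle(f_true * (1.0 + 0.03 * rng.standard_normal(5000)), spins)
```

Because every curve is the same spin-dependent shape rescaled by \(1/f_{22}\), the louder mock event yields a narrower band at every spin while both medians pass close to \(M_f = 70\,M_\odot\) at \(a_f = 0.7\); a bundle alone therefore constrains only a direction in the plane, not a point.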

7 Conclusions

We presented a comprehensive ringdown analysis of the events in the third Gravitational-Wave Transient Catalog (GWTC-3), employing a highly accurate template, TEOBPM, which increases the sensitivity of the search by modelling nonlinearities in the early postmerger phase. Our work provides a robust framework for characterising the detectability of higher modes and deviations from general relativity. With this method, we report a low-significance detection of the \((l,m) = (3,3)\) mode in four events, and tentative support for the Kerr hypothesis as predicted by general relativity (“no-hair” tests) in the presence of two modes.

The parameter estimation results with TEOBPM are in agreement with previous analyses by the LIGO-Virgo-KAGRA Collaboration. This agreement, together with the increased precision of the constraints compared to pure superpositions of QNMs, demonstrates the robustness and enhanced sensitivity of our framework compared to standard spectroscopic analyses. Our results highlight the benefits of ringdown models that include nonlinearities, suggesting that such models should play an important role in future ringdown analyses.

Although the \((l,m)=(3,3)\) mode is only weakly supported by Bayes factors, its addition still visibly affects the posterior distributions. That these effects are observed here for the first time, for many of the events considered, is due to a combination of the improved waveform model and the systematic inclusion of the peak-time uncertainty in our analysis. Beyond mode detection, such a robust assessment of start-time-induced systematics is even more crucial when conducting tests of general relativity. Assuming the presence of multiple modes, we allow for deviations in the QNM spectrum and find agreement with general-relativistic predictions.

Our method could be further improved by repeating the analysis with inputs from different inspiral-merger-ringdown waveforms, to gauge modelling systematics, and by characterising the statistical significance of the results on simulated signals, see e.g. [104]. Future improvements to the analysis methodology include the simultaneous marginalisation over sky position and start time together with the other analysis parameters. It would also be interesting to combine the posteriors on the deviations to draw information on their population distribution. This is the subject of ongoing studies.

Our investigation will serve as a solid foundation for spectroscopic ringdown analyses with numerically informed models, both in the ongoing fourth LIGO-Virgo-KAGRA observing run (O4) and with future, more sensitive gravitational-wave detectors, bringing robust multimodal tests of general relativity in the strong-field regime within reach.