Searching for ringdown higher modes with a numerical relativity-informed post-merger model

Robust measurements of multiple black hole vibrational modes provide a unique opportunity to characterise gravity in extreme curvature and dynamical regimes, to better investigate the nature of compact objects and search for signs of new physics. We use a numerically-tuned quasicircular non-precessing ringdown model, $\texttt{TEOBPM}$, and the $\texttt{pyRing}$ analysis infrastructure to perform a time-domain spectroscopic analysis of the third catalog of transient gravitational-wave signals, GWTC-3, searching for higher angular modes. The $\texttt{TEOBPM}$ model effectively includes non-linearities in the early post-merger signal portion, and carries information about the progenitors parameters through time-dependent excitation amplitudes of the black hole quasinormal modes. Such a strategy allows us to accurately model the full post-merger emission, recovering higher signal-to-noise ratios compared to templates based on more agnostic superpositions of damped-sinusoids. We find weak evidence for the presence of $(l,m)=(3,3)$ [$(l,m)=(2,1)$] mode in several events, with the largest Bayes factor in favour of this mode being $\mathcal{B}\simeq 2.6$ [$\mathcal{B}\simeq 1.2$] within the peak time distribution support. For GW190521, we observe $\mathcal{B}\simeq 5.1$, but only for times outside the peak time support reconstructed using the highly accurate $\texttt{NRSur7dq4}$ model, indicating significant systematics affecting such putative detection. Allowing for deviations from general relativity under the assumption of the presence of two modes, we find tentative support for the Kerr"final state conjecture". Our work showcases a systematic methodology to robustly identify and characterise higher angular modes in ringdown-only signals, highlighting the significant impact of modelling assumptions and peak time uncertainty on spectroscopic measurements, at current signal-to-noise ratios.

Observations of gravitational waves (GWs) from compact binary coalescences through the LIGO-Virgo-KAGRA interferometric network [1][2][3] are revolutionising our understanding of the Universe, from black hole (BH) physics to astrophysics, cosmology and fundamental physics [4][5][6][7].As GWs from the inspiral and plunge phases carry information about the long-range dynamics of two coalescing BHs, the merger and ringdown (RD) regimes provide unique access to extreme curvatures and strong fields, being arguably the best route to investigate the nature of BHs and challenge our current understanding of gravity [5,[8][9][10][11][12][13].
In fact, current studies based on QNM superpositions rely on the assumption that, at late enough times during the postmerger relaxation process, the spacetime is welldescribed by linear perturbation theory.This assumption manifests itself in, e.g., a constant final mass and spin, and of (asymptotically) constant ringdown complex amplitudes 2 .Indeed, the merger and early postmerger phases are known to contain nonlinear features [52], and a sufficient amount of time needs to pass before the linear regime starts to hold and the above assumptions are justified [29,30,[53][54][55].Such complications are among the motivations behind the large amount of recent progress in modelling higher-order effects, which might soon help to extend ringdown models for comparable-mass binaries closer to the signal peak [49,[56][57][58][59][60][61].Although techniques to estimate the (signal-to-noise ratio dependent) start of the linear regime validity have been developed on both numerical relativity (NR) simulations [55] and real data [62,63], the applicability time-window of QNM superpositions remains the most relevant systematic in most RD analyses.
To address this problem, immediately after the first binary numerical evolutions [64][65][66], RD models going beyond QNM superpositions have been developed, e.g.[67].The time-dependence of the BH parameters and amplitudes is now phenomenologically included through a flexible function with several free parameters, thus relaxing the requirement to describe the process from first principles.In particular, such models are required to complete effective-one-body (EOB) [68][69][70] waveforms beyond merger, and in Refs.[71][72][73][74][75][76][77] accurate templates capable of modelling the entire post-merger emission were presented.Such models allow to include larger portions of the GW signal compared to a superposition of QNMs with constant amplitudes, increasing the available signalto-noise ratio (SNR).Although they are less sensitive to certain exotic deviations from a Kerr relaxation, by including more information and being less agnostic about the physics involved, they allow to extract a larger amount of information from the data.As extensively discussed in Ref. [78], accounting for such nonlinear contributions is of paramount importance.Failing to do so leads to overfitting, misinterpreting the physical QNM content of the signal, and in turn to faulty QNM "detections" plagued by systematic uncertainties caused by the inappropriate assumption of constant amplitude models close to the signal peak.
In this work, we rely on an accurate phenomenological model covering the entire post-peak emission, TEOBPM [71,72,75,76], and use it to perform a comprehensive Bayesian analysis of the third catalog of GW events, GWTC-3 [4], using pyRing [79], a python software tailored to perform a time-domain Bayesian analysis on BH ringdown signals.Our analysis allows to systematically assess the detectability of HMs in the RD at current sensitivities, extending previous LVK ringdownonly studies which relied on models based on pure QNM superpositions [5,8,10].Unlike such RD models, TEOBPM includes information from the progenitors, allowing inference on parameters such as the initial masses and spins, and incorporates time-dependent models for the modes amplitudes tuned to NR, thus providing a bridge between inspiral-merger-ringdown (IMR) data and RD-only analyses.This is similar in spirit to the strategy employed in Ref. [80].Here, we improve on the latter by: i) using the complete time-domain likelihood, overcoming the circularity approximation; ii) using a time-domain model with an explicit parameterisation of all QNM-related quantities, allowing for a more direct physical interpretation of the constrains obtained; iii) incorporating the HMs content; iv) crucially including the uncertainty in the ringdown validity regime.
We report small positive Bayes factors (BF) in favour of the (l, m) = (3,3) [(l, m) = (2, 1)] mode in the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 [GW190521_074359, GW191109_010717], and study in details the robustness of the result by varying the starting time of the analysis.We also introduce fractional shifts in the QNMs spectrum to study potential deviations from general relativity (GR) in the RD, discussing in detail tests of the final state conjecture under the assumption of the presence of two angular modes.Our results support encouraging prospects for the positive observation of HMs in GW signals observed in the ongoing LVK observing run (O4) [3,[81][82][83].
In Sec.II, we discuss the basics of multimodal RD waveforms from an observational perspective.In Sec.III, we describe our implementation of the TEOBPM RD model.In Sec.IV, we introduce a semi-analytical procedure to predict the detectability of different HMs.In Sec.V we discuss the details of our Bayesian RD analysis and collect the parameter estimation (PE) results on GWTC-3, comparing them with the results obtained by the LVK Collaboration, both in the IMR and RD regimes.We also describe our procedure to robustly search for HMs, including the systematic uncertainty in the ringdown starting time.In Sec.VI, we introduce fractional shifts in the QNMs spectrum to study potential deviations from GR in the RD, discussing in detail tests of the final state conjecture under the assumption of the presence of two angular modes.We conclude and discuss future prospects in Sec.VII.

II. RINGDOWN MODELS
The linear gravitational perturbations of a Kerr BH are described by the Teukolsky equation [84,85], with the main GW emission component captured by h lm ≡ A 0,lm e −σ lm (t−t0)+iϕ 0,lm , ( where M ≡ m 1 + m 2 is the total initial mass of the binary, d L is the luminosity distance, Y lmn are spin-weighted spherical harmonics (SWSH).The plus and cross (gaugeinvariant) polarizations of the GW waveform are denoted by h + and h × .We define the QNMs complex frequencies as where Re{σ lm } = α lm ≡ 1/τ lm is the inverse of the damping time, τ lm , while Im{ω n } = ω lm is the oscillation frequency.A 0,lm and ϕ 0,lm are the constant amplitudes and phases of the different modes.Note that in Eq. (1) we are neglecting the overtone expansion of each angular mode, since the contribution of overtones in the linear regime (where they can be confidently identified) is negligible compared to the longest-lived angular modes [78,86], on which we will focus.We will refer to the mode (l, m) = (2, 2) as the fundamental mode, and to those with (l, m) ̸ = (2, 2) as higher modes (HMs).The values of ω lmn and τ lmn are known semi-analytically and can be computed numerically given the mass and (dimensionless) angular momentum {M f , a f } of the final BH [86][87][88].On the other hand, A 0,lmn depend on the specific process generating the perturbation, and are not known analytically for a binary merger of two BHs of comparable mass.For example, for quasicircular equal-masses binaries, the dominant contribution in the RD is given by the fundamental mode (2, 2), with leading subdominant contributions coming from the modes {(3, 3), (2, 1), (3, 2)}, see Refs.[27-32, 34, 35, 58].HMs are excited by asymmetries in the system [26], e.g. by increasing the mass ratio q ≡ m 1 /m 2 (m 1 ≥ m 2 ) or the adimensional spins χ 1,2 ≡ J 1,2 /m 2 1,2 of the two orbiting BHs.In addition to the properties of the source, the actual content of the modes present in the RD also depends on the SWSHs, which further modulate different modes based on the geometry of the system.In particular, the overall amplitude of the modes strongly depends on the inclination angle ι, defined as the angle between the direction of the orbital angular momentum3 and that of the observer.
The extraction of the QNM content from a GW signal, also known as "black hole spectroscopy" [12,13,25,[89][90][91], is performed using various sets of models similar to Eq. ( 2), which can be classified by the amount of information included (i.e. of assumptions).The models with the fewest assumptions will be the most generic, capable of detecting even large deviations from GR predictions.On the other hand, because they contain little information, such models provide less precise constraints on the deviation parameters and are the least sensitive to small GR deviations.These classes of RD models range from pure superposition of damped sinusoids with unknown complex frequencies and amplitudes (i.e. with σ lm in Eq. ( 2) an unknown parameter inferred from the data) [5,8,10,62,78], to templates including only perturbative predictions on the QNM spectrum [31,32,34,62,[92][93][94][95][96][97] but no information on the QNM amplitudes (i.e.σ lm constrained by GR predictions, but A 0,lmn a free parameter in Eq. ( 2)), to templates that incorporate numerical predictions of the quasinormal amplitudes (i.e. both σ lm and A 0,lmn constrained by GR predictions).The latter assume that the remnant black hole forms from a binary merger, and either incorporate the explicit amplitude dependence on binary parameters [27-30, 34, 54, 80, 98-101] or only partial information on the relative amplitudes excitation [96].Furthermore, one can include numerical predictions for early-times non-linearities by constructing phenomenological models for the full postmerger signal by promoting A 0,lmn to a time-dependent quantity, which is achieved by TEOBPM, but still without accounting for pre-merger data in the analysis.The class of models with the largest amount of information (hence the most accurate, but less agnostic ones) are pSEOB-like templates [102][103][104][105][106], where even the pre-merger signal is included (currently under the hypothesis that pre-merger data are correctly described by GR), and deviations are allowed only in the QNM spectrum.All these models should be regarded as complementary, and answering different questions.For example, models that impose QNM spectral predictions might miss deviations induced by the non-Kerr nature.Models that impose amplitudes predictions coming from binary inspirals may be biased if the orbital dynamics is not captured by quasicircular models (due to orbital eccentricity, precession, environmental effects etc.).In particular, the pSEOB class of templates, within its current implementation, is expected to be the most sensitive to deviations that abruptly appear around the merger phase, e.g.triggered by high-curvature dynamical couplings, similar to dynamical scalarisation [107].These classes of scenarios would predict an inspiral signal close to the GR prediction, but a merger-ringdown phase that is rather different.Instead, searches based on ringdownonly templates that exclude pre-merger data, such as TEOBPM that is used in this study, will produce less biased results for scenarios where deviations are also present in the inspiral.Clearly, the most sensitive and accurate test would be performed using coherent IMR models in specific alternative scenarios.Recent progress has been made in constructing such models, both analytically  and numerically [136][137][138][139][140].However, the large classes of possible deviations from GR makes this a daunting task, and naturally calls for generic models and parameterisations [141][142][143][144][145][146][147][148] capable of capturing deviations from several classes of modifications.

III. THE TEOBPM RINGDOWN MODEL
In this work we focus on a specific RD model, which constitutes the postmerger part of state-of-the-art EOB [73,74] and phenomenological time-domain models [149][150][151].The tidal effective one-body post-merger (TEOBPM) model was first introduced in Ref. [71], as an NR-informed analytical representation of the postmerger waveform from BH coalescences.The model has been further improved in [72,75,76,152] as the postmerger part of the TEOBResumS waveforms family.Here, we briefly review the model construction, and highlight its characteristics relevant to RD analyses.
We work with the expansion coefficients h lm in Eq. ( 1), which are the output of NR simulations [153].We want our model to be defined from the peak of the full IMR waveform, that we define as t 0 ≡ t peak 22 .If the entire postmerger phase would be linear, we could directly apply Eq. ( 1) to describe the entire RD with an appropriate mode combination, and constant amplitudes [78].Since the early times contain non-linearities, while Eq. ( 1) is obtained from linear perturbation theory, additional contributions are required to extend the template to the peak of the waveform.Thus, the strategy adopted in TEOBPM is to factorise the QNMs linear contribution from each mode h lm , and define the QNM-rescaled ringdown waveform hlm containing all the non-linear contributions.To model nonlinearities, we decompose the QNM-rescaled waveform h22 into a generic complex number, with time-dependent amplitude and phase, The functions Ah 22 and ϕh 22 are informed from NR simulations.Here, we will not further describe the details of the procedure to produce these fits, and we refer the interested reader to Refs.[75,76].
The above procedure is applied to each mode.Nevertheless, the time at which different modes peak will not necessarily be the same.That is, in general, t peak lm will not coincide with the merger time, defined as the peak of the fundamental mode t 0 = t peak 22 .The time delay ∆t lm is then defined as the difference between the peak of the fundamental mode (2, 2) and the peak of each mode Consequently, for each HM included in the model, one additional parameter ∆t lm needs to be fit on NR simulations.From Eq. ( 1), the expansion coefficients for each mode are then Further details on our implementation of TEOBPM can be found in App. 1.

A. Characteristics of TEOBPM
Having discussed how TEOBPM is constructed and how some of its internal degrees of freedom are fixed from NR, we list the remaining free parameters of the model.Being Θ lm the parameter space on which TEOBPM is defined, Θ lm is parametrized by θ = {θ I , θ E } ∈ Θ lm , where Thus, TEOBPM is a ringdown model with 11 parameters when only the fundamental mode is used, and one additional parameter ϕ 0,lm for each HM included.Note, however, that some of the extrinsic parameters are typically fixed from IMR results in ringdown-only analyses either because of degeneracies or to fix the analysis segment, so that the actual number of free parameters is smaller.Similarly, the orbital phase φ entering the spherical harmonics is not included due to the complete degeneracy with single-modes phases.This differs from usual RD models, which are typically parametrized in terms of remnant properties, such as the final mass and dimensionless spin {M f , a f } or the QNMs complex frequencies {ω lm , τ lm }.This parametrization allows TEOBPM to describe ringdown characteristics in terms of the progenitors properties.These choices make the model particularly sensitive to GR deviations, as discussed below, at the cost of being less agnostic about the nature of the RD process, a fact that is particularly relevant for tests of general relativity (TGR).Below, we highlight the advantages and the limits of TEOBPM. Advantages: • Starting time The model is defined from the peak of the fundamental mode (2, 2), which can be estimated from IMR analyses, either modelled [9] or unmodelled [154].Consequently, the uncertainty in determining the starting time of our RD analysis depends only on the width of the IMR peaktime posterior distribution.Instead, more agnostic analyses based on superposition of damped sinusoids that do not include near-peak contributions, additionally require to identify a reference time at which the pure-QNM description of the data starts to be valid [54,55].Since the system is evolving and thus the GW frequency is changing with time (as the remnant mass and spin, see e.g.[78]), a pure-QNM description will never be exact.Hence, for pure damped-sinusoids, an adequate reference time will be chosen in such a way that the systematic uncertainties due to non-stationary QNM contributions are much smaller than the statistical uncertainties due to the finite SNR.At present detectors sensitivity, depending on the accuracy of the model, this happens in the range [10,15] M [5,10,54].
• Ampitudes Since the initial amplitudes A 0,lm of the different modes are set by a fit to NR, the model includes the information on the excitation of the various modes as a function of the progenitor parameters (similar to e.g. the Kerr_HMs model [29,30] used in Refs.[5,10,155,156], also labeled "MMRDNP" in pyRing).
• Non-linearities As the model includes the entire postmerger part of the signal, it uses more data with high SNR with respect to linear RD models, providing more sensitive results (compared to e.g. the "MMRDNP" model [29,30]).This makes TEOBPM particularly suitable for characterising HMs. Limitations: • Precession and eccentricity The model is informed on NR simulations of quasicircular binary BHs with spins aligned or anti-aligned with the orbital angular momentum.As a result, it is only valid for events in which both precession and eccentricity are negligible.
• Mode-mixing When the gravitational radiation is decomposed into SWSHs instead of spin-weighted spheroidal harmonics, modes with the same m and different l are affected by mode mixing [53,157,158].Among the most relevant modes, the (3, 2) mixes with the fundamental mode, and this effect is not currently included in TEOBPM.Although we expected the contribution of the mode (3,2) to be negligible at current sensitivity, see App. 4, this issue should be addressed in the future.This contribution has recently been included in the v5 version of the SEOB model [77,[159][160][161][162], applied to spectroscopic analyses in the context of LISA in Ref. [106].
• Accuracy The accuracy of the model has been extensively studied on NR simulations, with some of the results reported in App. 2. We observe singlemode mismatches of O(10 −3 ) for the fundamental mode (2, 2), and O(10 −2 ) for the {(3, 3), (2, 1)} modes, in agreement with the literature [76].Others subdominant modes {(3, 2), (4, 4), (4, 3), (4, 2)} show a worse performance, but their overall contribution to the waveform can be safely ignored at current sensitivity.Hence, we decided not to include these modes in the analysis and leave a systematic study of their inclusion to future work.Note that since the (2, 2) is by far the dominant mode in the systems under consideration, the mismatch of the total waveform (summing all the modes) receives only small contributions from subdominant modes, so the total mismatch is typically close to that of the fundamental mode.
• ∆t lm As the HMs start at different times compared to the fundamental, a ringdown-only implementation of the model is systematically missing the contribution of HMs during the time interval ∆t lm , i.e. until the mode has started.Note that in the corresponding IMR version, this part of the signal comes from the inspiral dynamics and is therefore present.This results in an intrinsic limitation in the current implementation of TEOBPM in pyRing.
Although we have found this effect to be negligible at current sensitivities based on the analysis of simulated signals, it may have an impact for future studies at higher sensitivities.It could be addressed, for example, by constructing time-dependent amplitude fits starting from a common peak time for all HMs.
• ϕ 0,lm The presence of an additional parameter for the initial phase of each HM impacts the BF by increasing the prior volume, systematically penalising the hypotheses with higher number of modes (Ockham's razor).This problem could be solved by fixing the initial phases on NR in future refinements of the model.This implies that our current implementation leads to more conservative results when computing BFs for the presence of HMs.

IV. HIGHER MODES DETECTABILITY
We now turn to the investigation of HMs observability in the RD of current GW events.To answer this question, we develop a semi-analytical procedure to predict the detectability of the different HMs over the fundamental mode (2, 2).These techniques have been first introduced in the context of TGRs to study the detection of GR deviations in the high SNR limit [144,163,164].
In general, we define the time-domain scalar product weighted with the inverse autocovariance matrix as (3, 3) (2, 1) where a and b are two time series, and C nn is the autocovariance matrix of the detector noise.Specifically, we use an inverse Fourier Transform of the simulated power spectral density (PSD) from the advanced LIGO design configuration [165] to estimate the autocovariance matrix.Note that the FF only depends on the profile of the PSD, as Eq. ( 9) in invariant under any rescaling of the autocovariance matrix.The fitting factor (FF) is then defined as where θ are the parameters of the model, and Θ lm is the parameter space over which the model including HMs is defined.As proven in App. 3, the FF is closely related to the optimal signal-to-noise ratio (SNR opt ), see App. 3 for the definition of the latter quantity.Suppose we have measured some data d, and want to understand whether a given HM present in the signal can be distinguished from the fundamental mode.This is a model selection problem between the following hypotheses: • H 22 : in the signal only the fundamental mode (2,2) is present; • H lm : besides the fundamental mode (2, 2), the signal contains a given HM (l, m).
As shown in App. 4, we can derive an equation that relates the Bayes factor B lm,22 between the two hypotheses, the SNR, and the FF ln which is valid for large SNRs.Eq. ( 10) can be used to analytically predict the detectability of different HMs as a function of the parameters of the waveform model.In fact, given a set of parameters for the system, we can use the waveforms h 22 and h lm to calculate the FF through Eq. ( 9), and then Eq. (10)  what is the signal-to-noise ratio needed to reach some threshold value of the Bayes factor in favour of higher modes?
In practice, we set the threshold value for a detection to be ln B lm,22 = 3, and invert Eq. ( 10) to find SNR opt as a function of the FF.The FF varies across the binary parameter space, because the excitation of the HMs depends on the progenitor parameters.Hence, we can study the SNR needed for the detection as a function of the parameters of the model.Fig. 1 shows, for nonspinning initial black holes χ 1 = χ 2 = 0, the SNR opt required to measure the modes (3, 3) and (2, 1), using TEOBPM as waveform model.The SNR is given as a function of the mass ratio q ∈ [1, 5] and inclination ι ∈ [0, π/2], the latter interval being sufficient for spin-aligned systems due to equatorial symmetry.For equal mass binaries q ∼ 1, such as those typically observed in current events [4], we do not expect to observe HMs in the RD.Indeed, with a threshold ln B lm,22 = 3, we would need an event with SNR opt ≳ 50 to detect the mode (3,3), with current events having SNR opt ≲ 15 in the RD.However, HMs are excited by introducing asymmetries in the system, and increasing the mass ratio dramatically reduces the threshold SNR.For example, with q ∈ [2, 3] we could confidently measure the mode (3,3) for loud events at current sensitivities SNR opt ∼ [10,15], for inclinations ι > π/4.Note that the mode (2, 1) generally has SNRs larger than the mode (3,3), especially for low mass ratios q ≲ 2. This implies that the (3, 3) mode is the most easily observable HM in the RD.Similar plots for the modes {(3, 2), (4, 4)} can be found in App. 4, showing that the SNR required to observe their contribution is larger compared with the modes {(3, 3), (2, 1)}.For instance, we need SNR opt ≳ 25 to observe the mode (4, 4) at mass ratios q ≲ 2 and inclinations ι > π/4.Note that the above numbers depend strongly on the choice of the threshold BF which, if decreased, suggests a potential for detection already at low mass ratios.Going beyond our conservative threshold would require simulation studies in real interferometric noisy data, where the threshold would be chosen to keep the probability of noise-induced detection below a certain value.Importantly, these simulation studies would also need to account for modelling systematics due to the imperfect representation of the NR solutions, and for the uncertain determination of the starting time from IMR signals.
To ensure numerical stability when inverting Eq. ( 10) to evaluate the SNR, we set a numerical cutoff around FF = 1 at the level of 10 −5 .Therefore, regions where FF = 1 at that level of precision are displayed in white in Fig. 1.The sharp edges in the level curves are induced by the profile of the PSD used in the scalar product of Eq. ( 9).Finally, we stress that the detectability results just discussed are derived under the approximation of high SNR, h ≫ n, and should therefore be used as a rough estimate and complemented by parameter estimation studies for cases of specific interest.

V. GWTC-3 ANALYSIS
We now describe the RD analysis with TEOBPM on real data from the latest catalog of GWs, GWTC-3.The study was performed with pyRing [79], a python package for time domain RD analyses of BH coalescences, based on the formalism described in [62,72,93] and Sec.VII.A.1 of [10].For a detailed and pedagogical introduction to time-domain analyses, see Ref. [94].To compute Bayesian evidences and probability distributions, we use the raynest [166] sampler, running each analysis with 4 parallel chains, 5000 live points and 5000 maximum number of steps in the Markov Chain Monte Carlo (MCMC).

A. Input parameters and events selection
In a first study, we select the subset of events that present an informative RD with the TEOBPM model.To select such events, we compute ln B s,n between the hypotheses: • H s : the data contain a GW signal h modeled with only the fundamental mode, d = h 22 + n; • H n : the data only consist of noise n, d = n; and then select the events with ln B s,n > 3. Following standard procedures in time-domain RD analyses, we fix some of the free parameters of TEOBPM from existing PE analyses in the full IMR regime, to fix the analysis segment.Such procedure prevents the template from latching to pre-peak data portions, and and has been verified to introduce negligible biases at the current sensitivity [5].
In particular, the peak time t 0 and the sky position {α, δ} are taken from the LVK TGR papers [5,9,10].For repeatability, the input values used in this work are reported in App. 5.The remaining free parameters are listed in Tab.I, together with the prior ranges used, uniform in all the quoted variables.The prior ranges on masses and spins are taken from [167].
We apply this procedure to the 48 events included in the GWTC-3 catalog based on their false alarm rate, as described in [5,9,10].Applying the above procedure, we select a subset of 18 events that are informative in the RD, which are those that will be used in the rest of the work, listed in Tab.II.
Table I: Free parameters used in the analysis, with their prior range.
The masses are expressed in solar masses M ⊙ and the luminosity distance in mega-parsecs Mpc.

B. Parameter estimation and LVK comparison
We perform PE with the TEOBPM on the selected events and compare the results with those from the LVK Collaboration, both in the IMR and RD regime.As motivated in the previous section, for TEOBPM we consider only the modes (l, m) = [(2, 2), (3, 3), (2, 1)].Fig. 2 shows the marginalized posterior distributions for a comprehensive list of parameters, and Tab.II summarises the median values of the posteriors obtained with TEOBPM for selected parameters.The LVK IMR samples are combined from different waveform models, including precessing templates.The LVK RD samples are taken from the pyRing analysis  Each row correspond to one of the events analysed.The blue distributions have been obtained with TEOBPM, the red distributions are taken from IMR analyses by the LVK Collaboration, the green distributions are from RD analyses by the LVK Collaboration.In agreement with the Bayes Factors of the multimodal analysis, for the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 we use the PE from the analysis with the modes {(2, 2), (3, 3)}, while for the other events with only the fundamental mode.
using the Kerr_221 template, as presented in the LVK GWTC-3 TGR catalog [5], starting at t 0 = t peak 22 like TEOBPM.The Kerr_221 model assumes a Kerr remnant BH and the fundamental mode with its first overtone.Overtones close to the peak [93,168] do not correspond to physical vibrational frequencies of the underlying spacetime [78], but their use allows the entire post-peak signal to be fitted with sufficient accuracy for current (low-SNR) GW detections.Hence, the overtone was exploited as an effective term to push the analysis at earlier times, capturing more power present in the signal.This allows a direct comparison with the TEOBPM results, as both analyses include the same amount of data.Note, however, that the two models adopt two different parameterisations.While TEOBPM depends on the progenitor parameters, the Kerr_221 depend on remnant parameters, such as {M f , a f , A lmn , ϕ lmn }.For TEOBPM and LVK IMR, the remnant parameters {M f , a f , f 22 , τ 22 } are obtained with fits on NR data [87,169].The posteriors shown are generated using the kernel density estimation (KDE) method in scipy.
Results are generally consistent across the different models.We note that, in general, TEOBPM obtains distributions that are sharper than LVK RD and broader than LVK IMR.This behaviour is expected, as the IMR analysis uses more data and recovers higher SNRs.Nonetheless, in most cases TEOBPM's posteriors are comparable to those of LVK IMR, and in certain cases sharper in some parameters.There are several possible reasons for this behaviour.A first possibility is related to the fact that IMR models also include pre-peak data, so that noise features could contribute to produce different posteriors.This could be either because noisy features present in the inspiral data are not included in the RD analysis, or because IMR analyses are less sensitive to noise fluctuations due to the larger SNR.In addition, the LVK IMR samples are obtained with precessing models, and precession introduces a strong correlation between the parameters, broadening the marginalised posteriors.For this reason, since we assume aligned spins in TEOBPM, the spins comparison with LVK IMR should not be expected to yield identical results.Finally, neglecting the correlation between sky position and luminosity distance, as done in the RD analyses, will also artificially sharpen the posteriors.We leave a more detailed investigation of these features to future work.Instead, TEOBPM contains more information in the model compared to LVK RD, and will therefore be able to extract more SNR if the signal is well described by GR.Furthermore, sampling on the initial masses and spins {m 1,2 , χ 1,2 } instead of remnant parameters {M f , a f , A lmn , ϕ lmn } further reduces correlations, thus sharpening the TEOBPM's posteriors compared to LVK RD.
The initial spins are the parameters which vary the most compared to LVK IMR, with TEOBPM systematically recovering higher values of χ 1 .However, we note that the IMR pointy distributions in χ 1,2 are mostly dominated by the prior, which is different from ours.In fact, we 42.9 +9.2 −6.5  use a flat prior in χ 1,2 ∈ [−0.8, 0.95], while IMR analyses typically have three free components for each spin, and set a prior uniform in spin magnitude on the sphere.See [4].Indeed, we highlight how the agreement with our results improves dramatically in the few events where the IMR posteriors on χ 1,2 differ from the prior, such as GW170729, GW190519_153544 and GW190706_222641.To understand whether these high values of the primary spin could be induced by detector noise, we simulated several mock signals at different SNRs, and found no biases in the recovery of the injected value.We also investigated how our flat prior in the component masses affects the prior in the effective spin through correlations with the (non-flat) mass ratio, finding that such an effect is not sufficient to justify the results.

C. Multimodal analysis
In the previous section we compared ringdown-only estimates to full IMR analyses, when assuming the (l, m) = [(2, 2), (3, 3), (2, 1)] modes.These results can be interpreted as a time-domain version of the IMR consistency test [5,[170][171][172][173][174][175], similar to what was done in Ref. [176], although here we do not perform a corresponding inspiral analysis and only compare to full IMR results.For an application to astrophysical source properties using similar techniques, see Ref. [177].Now, we turn our attention to the detection of modes beyond the fundamental, with the goal of extracting multiple QNM modes from the system, allowing for tests of GR predictions with a more immediate interpretation.(2, 2), (2, 1) Figure 3: Posteriors distributions of the initial masses m 1,2 for the different events on the x-axis.The red posteriors are from the analysis with only the fundamental mode (2, 2), the blue ones using the fundamental mode and the mode (3, 3), the green ones using the fundamental mode and the mode (2, 1).For the two subplots, the bottom row shows the logarithmic Bayes factor ln B lm,22 between the HM considered and only the fundamental mode, colored according to which hypothesis is favoured.
To this end, we conduct a multimodal search to investigate the presence of HMs in the RD of the selected events.Based on the predictions on the detectability in Sec.IV, we only use the {(3, 3), (2, 1)} HMs and consider the following hypotheses: where each logical proposition represents the assumption of having a gravitational wave signal h in the data d with that specific set of modes.We then perform model selection between these hypotheses, expressing the results through the logarithmic BF ln B lm, 22 , where the model lm corresponds to one of the above hypotheses including at least one HM.Results are summarized in Tab.III, where we report the BFs between the templates with and without HMs.To gauge the impact of the priors, we also show the values of the network optimal signal-to-noise ratio SNR net opt , the information H [178] and the maximum value of the logarith-mic likelihood lnL max .None of the events shows strong preference for HMs.Nonetheless, the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 have a low but positive BF.To appreciate the consequences of this fact, in Fig. 3 we study how the PE on the initial masses m 1,2 varies when HMs are included.In the case of the mode (2, 1), all the logarithmic BFs are very close to zero, and there are no appreciable differences in the posteriors with the additional mode.Instead, in the case of the mode (3, 3), we can clearly see how the distributions change for the events which show a positive BF for this HM.In particular, for these events the posteriors move towards higher mass ratios when mode (3,3) is included, a behaviour that is qualitatively in agreement with the physical excitation of this mode.
However, these results are not yet sufficient to conclude a (3, 3) detection, as the search was only performed at one fixed starting time, given by the median value of the IMR peaktime distribution for each event.This procedure is reasonable to keep the computational cost of the search under control, but needs to be extended if aiming to claim a mode detection.In fact, since the peaktime distribution is an additional parameter, its uncertainty needs to be appropriately included (i.e.marginalised over) in the inference to obtain unbiased results.It is well known that RD analyses are highly sensitive to the choice of the starting time, and a naive treatment of this issue can lead to non-robust or even highly biased results [36,58,62,78].This problem typically manifests itself in the following forms: • Moving to earlier times with respect to the model's prescription implies the use of data that are outside the regime of applicability of the model.In practice, for a linear RD model, this normally results in the use of data containing non-linearities from the postmerger in an attempt to obtain more data with higher SNR.For TEOBPM, this means including in the data some premerger part of the signal before the peak of the waveform.In both cases, the first part of the data is affected by un-modelled low frequencies at high SNR, pushing the final mass to larger values.We also note that in the case of TEOBPM, there is also a bias when starting at later times after the peak.This is because we are assuming given values of the amplitudes and phases evaluated at the peak time.
Applying them at a different reference time will result in a bias.This situation does not occur for RD models consisting of a superposition of QNMs with free amplitudes, where starting at later times only implies a loss in SNR.
• With a fixed starting time, it is not possible to evaluate the impact of short-transient detectorsnoise realizations.Given the low-SNR contained in the post-merger only signal, gauging the impact of such effects becomes significantly more important.By repeating the analysis at multiple times, it is possible to trace certain features present in the BF or posterior to noise artifacts.
TEOBPM is defined to start at the peak of the IMR waveform.Thus, we should probe the entire region where the IMR peaktime distribution has support.This procedure is still dependent on the specific IMR model used, and results may vary for different IMR waveforms, but for events where the systematic uncertainties of the IMR modelling are under control, this typically has a small impact.In the top subplot of Fig.s 4,7,10,11 we show how the BFs and the m 1,2 posteriors change as the starting time is varied across the 95% CI of the IMR peaktime distribution, for the subset of events that prefer the (3, 3) mode.All the four events display similar features: • one or more peaks in the BF within the 95% CI of the IMR peaktime distribution; • a change in the posterior distributions favouring higher mass ratios in correspondence with a BF peak.
We interpret this behaviour as resulting from the impact of the mode (3, 3) in these events, which allows to improve the PE in the region of applicability of our model.Note that this effect is visible despite the low evidence for the HM, suggesting that the posterior distributions are more sensitive than BFs in targeting features in the data, although the latter constitute a more conservative and robust metric [179].
For the events GW200129_065458 and GW200224_222234, the BF continues to increase at late times outside the support of the peaktime distribution.This effect is difficult to interpret due to the expected increase of modelling systematics when starting at times much later than the peak, and should be investigated in the future.Likely cause scould to be the impact of systematics introduced by different IMR models in reconstructing the peak time, or precessional degrees of freedom not included in our analysis [4,[180][181][182][183][184].
Finally, we discuss our results for the event GW190521, in which several works have investigated the presence of the mode (3,3) in its RD [10,96,155,156,185,186].In the upper subplot of Fig. 8, we observe a peak in the BF as function of the starting time, which is however shifted from the median value of the peaktime distribution and lies outside the 95% CI.At our resolution of 1M f , the BF reaches its maximum at t 0 = −6M f = −7.6 ms, with M f ≃ 258.8 M ⊙ .Our input parameters, including the peaktime distribution, were obtained with the NRSur7dq4 IMR waveform.Our results are in qualitative agreement with [96,186], since also in those analysis significant evidence for HMs could only be obtained well outside the validity regime of the employed models.Apart from the systematics induced by the start time, precession or eccentricity are known to bear a large impact on this event [186][187][188][189][190], so further studies will be needed to properly characterise HMs detections in this type of systems.
Assessing the significance for the weak evidence we obtained for HMs, as quantified by BFs, would in principle require simulation studies [185].In such a study, a large number of NR signals containing only the (2,2) mode or the full HM content would be added on top of both gaussian and real detector noise (the latter would especially be required for low-significance detections), and recovered using our model.Then, setting a threshold minimising the number of false positives detection of HMs when simulating a signal containing only the (2,2) mode, would imply a "detection threshold" for the BF.Since here we do not claim significant evidence for HMs, we do not perform such a study, which would bear a large computational cost.We note however, that gathering experience from a selected number of these simulations, systematics due to modelling and start time determination typically bear a much stronger impact on false positive detections.

VI. TESTS OF GENERAL RELATIVITY
The final-state conjecture states that astrophysical BHs are uniquely described by the Kerr metric [37][38][39][40][41][42][43][44][45][46][47]62].In our Bayesian framework, this is an assumption nested within our waveform model that can to be verified against observations.We can use the RD to test this hypothesis (commonly referred to as "no-hair" conjecture), by performing model selection between GR and non-GR models.Additional degrees of freedom ("hairs") of the final BH system are modelled by deviation parameters with respect to the GR prediction, typically in the QNM spectrum.Different parameterizations of the deviations can be used, depending on the degrees of assumptions built into the model.For example, for RD models composed of pure QNM superpositions, at least two different modes need to be resolved to remove the degeneracy between {f 22 , τ 22 } and {M f , a f }, as discussed e.g. in Refs.[91,92,191].In modern analyses, this is typically done using overtones to phenomenologically model the early postmerger, see [5,10,93].Instead, if the RD model used has amplitudes fixed from numerical relativity simulations (e.g.[27-30, 34, 36, 98-100]), the numerical calibration will impose a constraint on the parameter space which naturally decouples {f 22 , τ 22 } and {M f , a f }.In this way the no-hair conjecture can be tested with only a single mode, at the price of introducing additional GR inputs in the model, i.e. at the price of a less agnostic search.This is the case of TEOBPM, and similar considerations hold for the LVK analysis with pSEOBNR [5].Nonetheless, even though TEOBPM allow us to perform such tests with only the fundamental mode, we expect the test to improve when multiple modes are observed.We stress that the above considerations are only qualitative, and the actual situation on real data strongly depends on the SNR and characteristics of the system.
We introduce fractional deviations in the oscillation frequencies of the modes as where ω GR lm is the value predicted by GR.The same procedure is applied to the damping time τ lm .Note that for TEOBPM the deviations are introduced directly in Eq. (6).
For example, if only the fundamental mode is considered, the deviations can in general affect: • only the frequency δω 22 ; • only the damping time δτ 22 ; • both the frequency and damping time.
In the presence of multiple modes, the number of possibilities quickly increases, as all combinations between the deviations in frequencies and damping times for the different selected modes have to be considered.The systematic study of all these combinations is already challenging with two modes, in order to collect and compare the different results, in addition to the large computational cost.

A. TGR with two modes
We first test the set of events with positive (albeit weak) evidence for the mode (3, 3) and consider the following hypotheses: The baseline GR model consists of the modes {(2, 2), (3, 3)}, and each hypothesis is constructed from this basic model with the deviation parameters contained in the subscript of each hypothesis.We neglect the deviations δτ 33 , since damping times (and so their deviations) are in general more difficult to be measured than the frequencies.As in Sec.V C, we repeat the analysis by varying the starting time across the 95% CI of the IMR peaktime distribution.This amounts to "marginalise" aposteriori over the start time uncertainty.The results are shown in Fig.s 4,7,10,11.The marginalized distributions for the deviations are consistent with GR, i.e. they have support on zero up to the 90% CI, except in the cases discussed below.The BFs also show no strong preference for the non-GR hypotheses, with the largest BF in favour of GR deviations being B ≃ 3 in the support of the peak time distribution.This happens for the subset of events with lower SNR and higher masses, which merge at frequencies where the noise variance rises sharply, so such low-significance results are to be expected when analysing a large number of events at many different starting times, just based on the statistical properties of the noise.Outside of the peak time support, the BFs tend to grow, sometimes to significantly large values.However, as discussed previously, the model assumes that the start time coincides with the waveform peak, hence spurious GR deviations are expected when this assumption is violated significantly.A procedure to combine results at multiple starting times into a single estimate would help to immediately assess the impact of this effect.Such a procedure would re-weight the obtained BFs according to the value Figure 4: PE and model selection on the event GW190521_074359, as a function of the starting time.For each subplot, each column correspond to a separate analysis with a different starting time.The red posterior distributions are obtained with only the fundamental mode (2, 2), the blue ones with the modes {(2, 2), (3, 3)}.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution.The subplot in the top correspond to the GR analysis, while the underline three include deviations from GR in the form of fractional deviations in the QNMs.Specifically, the green distributions are obtained with the modes {(2, 2), (3, 3)} and deviations only on the frequency of the fundamental mode δω 22 , the pink ones with the deviations only in the frequency of the mode (3, 3), and the yellow ones with deviations only in the damping time of the fundamental mode δτ 22 .
of the peak time probability distribution 4 , combining the various BF tst=ti , into a unique BF w , where i runs on the time samples overlapping with the t p support.A similar algorithm has been implemented in Ref. [63].A crucial difference is that the latter analysis assumed a premerger model, allowing such marginalisation to be easily performed.Within time-domain methods, the analysis segment is selected by excising all data before t st , so analyses performed at different t st will include different portions of data.This implies that combining different BFs would produce a BF w which is not evaluated on a fixed amount of data, hence cannot be straightforwardly interpreted as a proper BF, in the sense of standard probability rules.
More refined procedures will be required to produce such a combined BF w .However, an upper bound on BF w can always be constructed by taking BF max := max BF ti , which is the approach we follow here.In the following, we briefly discuss our results, to illustrate the main features observed in the posteriors and BFs.
• GW170729 The GR value is mostly included in the support of the deviation posteriors, although for a few cases at the boundaries of the peak time posterior, the GR value lies only in the tails of the distribution, especially when considering deviations on the decay time.This is consistent with t st induced systematics at times far away from the probable peak, as discussed above.Additionally, since the event has a relatively low SNR net opt ∼ 5.7, noise-induced deviations are expected.This is reflected in bimodalities in the posteriors, both in the deviation parameters and {m 1,2 , χ 1 , d L }, which vary inconsistently over time.The bimodalities manifest in a peak consistent with GR, and a secondary peak which appears at the lower prior bound (equal to -1) of the deviation parameters δω 22 and δτ 22 .This secondary peak correlates with high mass ratios and low distances.Such behaviour indicates that the signal can be fitted equally well by the higher mode (significantly enhanced for high mass ratio), with the fundamental mode suppressed (the lower bounds of the deviation parameters correspond to a very fast decaying signal, with very low frequency outside of the detector sensitive band), and is another manifestation of the low SNR.Concerning the upper bound, we prefer to keep the deviations bounds fixed to the (already significantly wide) interval [−1, 1], since railing is expected to hold even when the bounds are increased, as for all signals where little information is present, as verified e.g. in Ref. [103].
• GW190521_074359 The posteriors on δω 22 in the time interval [−1.5, 0.5]M f are slightly shifted to-wards larger values, but the GR prediction always lies well within the 90% CI.This effect is compensated by the increase in the mass m 1 , which is positively correlated with δω 22 .As expected, the BF also increases slightly in the same time interval.δτ 22 always favours GR, and δω 33 is poorly resolved5 .
• GW200129_065458 The deviation posteriors are always centered on the GR value, with the BFs always favouring GR.Such an improved agreement is expected given the large value of SNR net opt ∼ 12.5, which also allows for deviations on δω 33 to be slightly resolved.The deviation observed in Ref. [104] are due to waveform systematics when including the pre-merger signal, which do not appear here.This is likely because of the reduced impact of precession on the post-merger signal.
• GW200224_222234 The deviations δω 22 and δτ 22 are centered on GR, with BFs favouring GR accordingly.Also for this event, the deviations δω 33 are partially resolved, with a peak emerging, although the support of the posterior does not exclude any region of the prior, hence results are essentially uninformative.
Note how, in all the events, the deviations δω 22 try to capture the premerger part of the signal at times far away from t p , with large variations in the posteriors accompanied by high non-GR BFs, indicating a weak consistency of the model with the data.This interesting, and expected, behaviour serves to remind how sensitive the results of RD analyses are to the underlying assumptions, and how important is the proper handling of t st .For completeness, Fig. 8 reports the deviations in GW190521, that are well measured and centered on the GR value for {δω 22 , δω 33 }.This is particularly appreciable in correspondence with the BF peak that lies outside the IMR peaktime distribution, as discussed in Sec.V C.However, we observe that the posteriors in δτ 22 are systematically shifted toward higher values with respect to the GR expectation.This behaviour is accompanied by a positive BF for non-GR in the entire support of the peaktime distribution.Further studies on the GR deviations with only the fundamental mode will be necessary to draw conclusions.Finally, Fig. 9 shows the results for the event GW191109_010717, for which previous LVK analyses have identified the presence of non-GR deviations induced by non-stationarities in the noise [5].We reproduce this result, and label this event as the showcase of a non-robust analysis, as manifest by the highly varying posteriors.

B. TGR with the fundamental mode
We summarise in Fig. 13 the results on the events with no evidence of HMs, for which we test the hypotheses H δω22 , and H δτ22 .For simplicity, we perform the analysis at only one starting time, given by the median value of the IMR peaktime distribution.For the first gravitational signal GW150914, among the loudest ringdown observed to date, we instead provide the complete analysis at different starting times in Fig. 12.All the posteriors have confident support on the GR value, for all the events.The BFs also favours GR, except for GW190828_063405 which has a slightly positive BF for non-GR and large variations in the deviations posteriors, especially in δτ 22 .This result has already been observed in the LVK analysis [10] with both the pyRing and pSEOBNR pipelines, where the overestimation of the damping time was tracked (through injections close to the trigger) to artifacts of noise fluctuations caused by the low SNR (in our analysis SNR net opt ∼ 5.0).

C. M f -a f "no-hair" test
In this last section, we discuss an interesting representation of "no-hair" tests that can be constructed in the presence of at least two free modes.This is inspired by GR tests on binary pulsars, see e.g.Fig. 6 of Ref. [192].The achievable precision on such type of tests with future detectors upgrades has been studied for example in Ref. [102].
We first compute the samples of the effective frequency using Eq. ( 11), combining the GR and non-GR frequencies 6 .We then project these values in the {M f , a f } plane.Since there are several combinations of {M f , a f } which give the same frequency values, a sample of f lm is mapped onto a curve in the {M f , a f } plane.Similarly, the set of all the f 22 samples, for example, generates a bundle in {M f , a f }.Repeating the same procedure with f 33 and τ 22 , we end up with three bundles of curves in the plane {M f , a f }, as shown in Fig. 5.At this point, two important considerations are due (see e.g.[87,91,92]).
• If two frequencies and one damping time are detectable, the system presents three observables fixed by two parameters, {M f , a f }.Thus, if the samples are consistent with GR, the three bundles should intersect in a single point in the high SNR limit, corresponding to the GR-predicted value of {M f , a f }.
If the observed effective frequencies and damping times are not consistent with GR, the three bundles will not intersect, as the two quantities {M f , a f } do not reproduce the three independent observables. 6For TEOBPM, this is done by first using the fits in [169] to compute M f , a f from {m 1,2 , χ 1,2 }, then the fits in [87] to move from M f , a f to {f lm , τ lm }.Finally, the fractional deviations are summed through Eq. ( 11).
• A subtle step in the process is that the projection of the effective QNM frequencies is done assuming GR, in the sense that we invert the GR fits to obtain {M f , a f } from {f lm , τ lm }.This is crucial to bring up inconsistencies in the measured QNM frequencies assuming GR.
Fig. 5 shows the outcome of this procedure for the subset of events with evidence for the mode (3,3), together with the 50% and 90% CIs of the {M f , a f } GR and non-GR samples.For each event, we use the posteriors from the analysis H {δω22,δτ22,δω33} at the starting time corresponding to the maximum BF for the mode (3,3).Specifically, we consider the samples obtained at 0M for GW170729, −6M for GW190521, −1M for GW190521_074359, −15M for GW191109_010717, 3M for GW200129_065458, −2M for GW200224_222234.These arbitrary choices are motivated by an attempt to use a starting time that enhances the presence of the mode (3, 3), and could impact the test if some of the premerger signal is present in the data.However, given the extremely broad bundles in Fig. 5, we believe that a detailed investigation at different starting times is not of interest.Indeed, since the projected posteriors basically cover the entire {M f , a f } plane, we conclude that this test is still largely uninformative at current sensitivities, as expected from SNR estimates [78,87].Nevertheless, we highlight a few points.First, GW200129_065458 is the loudest event, with SNR net opt ∼ 12.5, and correspondingly has a tighter bundle, as expected.With a perfect overlap between the GR and non-GR samples, and the projected ω 33 not covering the entire plane, this event gives an idea of what a negative test will look like at higher sensitivities.On the other hand, we point to the event GW191109_010717 with noise-induced GR deviations, to show how a non-GR analysis might manifest itself.Here, we observe non-overlapping GR and non-GR posteriors, together with widely separated median curves in {ω 22 , τ 22 , ω 33 }.The projected ω 22 in GW170729 is much broader than in the other events because of its low SNR net opt ∼ 5.7.GW190521 shows a different pattern from the other events, with the τ 22 bundle outside the GR samples.This behaviour can be traced in Fig. 8, where we observe a posterior in δτ 22 that is shifted from zero at t 0 = −6M f .This feature can be attributed to the systematics induced by the low-probability t st , which was adopted to showcase the effect of systematics in this type of tests.

VII. CONCLUSIONS
We presented a comprehensive ringdown analysis of the third catalog of gravitational-wave events GWTC-3 employing a highly-accurate template, TEOBPM, which increases the sensitivity of the search by modelling nonlinearities in the early postmerger phase.Our work provides a robust framework for characterising the detectability of higher modes and deviations from general relativity.With this method, we report a low-significance detection Figure 5: Projection of the non-GR effective QNM complex frequencies ω lm = ω GR lm (1 + δω lm ), τ lm = τ GR lm (1 + δτ lm ) in the plane of the final mass and spin M f , a f .The green bundle displays the median, 95% and 84% CIs of the projected samples from the fundamental mode effective frequency ω 22 , the pink bundle from the effective frequency of the mode (3,3), the yellow bundle from the effective damping time of the fundamental mode τ 22 .The solid black curves represent the 50% and 90% credible levels of the corresponding GR analysis, i.e. the analysis with the same starting time but without GR deviations, and the dotted grey curves the same credible levels for the non-GR analysis.Each subplot refers to one of the events that showed evidence of the mode (3,3).The samples used are taken from the analysis at the starting time corresponding with the maximum BF in favour of the mode (3, 3), as described in Sec.VI C.
of the (l, m) = (3, 3) mode in four events, and tentative support for the Kerr hypothesis as predicted by general relativity ("no-hair" tests) in the presence of two modes.
The parameter estimation results with TEOBPM are in agreement with previous analyses from the LIGO-Virgo-KAGRA Collaboration.This, together with an increased precision in the constraints obtained compared to pure superposition of QNMs, serves to demonstrate the robustness and enhanced sensitivity of our framework compared to standard spectroscopic analyses.Our results highlight the benefits of using ringdown models that include nonlinearities, indicating how such models should play an important role in future ringdown analyses.
The addition of the (l, m) = (3, 3) mode, albeit such mode is flagged with low-significance by Bayes Factors, still visibly affects the posterior distributions.The fact that these results are observed for the first time in this work, for many events considered, is a combination of the improved waveform model used and of the systematic inclusion of the peak time uncertainty deployed in our analysis.Beyond modes detection, such a robust assessment of start time-induced systematics is even more crucial when conducting tests of general relativity.Assuming the presence of multiple modes, we allow for deviations in the QNM spectrum and find agreement with general relativistic predictions.
Our method could be further improved by repeating the analysis with inputs from different inspiral-mergerringdown waveforms, gauging modeling systematics, and by characterising the statistical significance of the results on simulated signals, see e.g.[104].Future enhancements in the analysis methodology include the simultaneous marginalisation of sky position and start time together with the other analysis parameters.It would also be interesting to combine the posteriors on the deviations to draw information on the population distribution.This is the subject of ongoing studies.
Our investigation will serve as a solid foundation for spectroscopic ringdown analyses with numerically informed models, both in the ongoing LIGO-Virgo-KAGRA observing run (O4) and with future more sensitive gravitational-waves detectors, nearing the possibility of robust multimodal tests of general relativity in the strong field regime.
(grant no.DNRF162) by the Danish National Research Foundation.V.G. acknowledges support form the French space agency CNES in the framework of LISA.G.C. acknowledges support by the Della Riccia Foundation under an Early Career Scientist Fellowshipm, funding from the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 847523 'INTERACTIONS'.This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 101131233.This research made use of data, software and/or web tools obtained from the Gravitational Wave Open Science Center [193,194], a service of the LIGO Scientific Collaboration, the KAGRA Collaboration and the Virgo Collaboration.The authors are grateful for computational resources provided by the LIGO Laboratory (LHO) and supported by National Science Foundation Grants PHY-0757058 and PHY-0823459.This material is based upon work supported by NSF's LIGO Laboratory which is a major facility fully funded by the National Science Foundation.LIGO Laboratory and Advanced LIGO are funded by the United States National Science Foundation (NSF) as well as the Science and Technology Facilities Council (STFC) of the United Kingdom, the Max-Planck Society (MPS), and the State of Niedersachsen/Germany for support of the construction of Advanced LIGO and construction and operation of the GEO600 detector.Additional support for Advanced LIGO was provided by the Australian Research Council.Virgo is funded, through the European Gravitational Observatory (EGO), by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale di Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by institutions from Belgium, Germany, Greece, Hungary, Ireland, Japan, Monaco, Poland, Portugal, Spain.KAGRA is supported by Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan Society for the Promotion of Science (JSPS) in Japan; National Research Foundation (NRF) and Ministry of Science and ICT (MSIT) in Korea; Academia Sinica (AS) and National Science and Technology Council (NSTC) in Taiwan.
Software.The violin plots and the ridgeline plots have been produced using the open source python package namib [195], which makes use of the statsmodels [196] and joypy [197] packages.

APPENDIX 1. Geometric units in TEOBPM
In TEOBPM, the expansion coefficients h lm are rescaled through the following transformations.
• Define the mass-distance units conversion factor as which converts a mass expressed in solar masses M ⊙ into a distance in Megaparsesc (Mpc).The waveform amplitude is rescaled by where M ≡ m 1 + m 2 is the total mass of the binary, and ν ≡ m1m2 (m1+m2) 2 is the symmetric mass ratio (this convention is specific to TEOBPM and e.g.NR waveforms do not follow this convention).

• Define the mass-time units conversion factor as
which converts a mass expressed in solar masses M ⊙ into a time in seconds.The time variable in expressed in units of M f , i.e. the black hole final mass expressed in solar masses, where t mrg is the merger time, defined as the time at which the maximum of the waveform's amplitude of the fundamental mode (2, 2) occurs, that is t mrg ≡ t peak 22 , corresponding to the maximum of |h 22 (t)|.With the above transformations, the waveform is said to be expressed in geometric units.Adhering to the conventions used in the rest of the text, we set G = c = M ⊙ = 1, and the conversion factors become C md = 1/Mpc and C mt = 1.Note that at the end of the implementation, these transformations need to be reversed to obtain the properly scaled signal in physical units.

Quality assessment of TEOBPM
We ascertain the accuracy of the TEOBPM model against numerical relativity simulations from the Simulating eXtreme Spacetimes catalog (SXS) of numerical simulations of merging black holes [203,204].Specifically, we have selected the simulations with spin aligned or anti-aligned through the sxs package [202], and isolated the ringdown part cutting the waveform at the time corresponding to the peak of each mode.To compare the simulations with our implementation of TEOBPM in pyRing, we generate the TEOBPM waveform from the parameters of the NR simulation at reference time, for each mode separately.The initial phase of the TEOBPM waveform is chosen to maximise the match with the simulation.We use the mismatch (MM) to quantify the agreement between the two waveforms, where FF is the fitting factor, and the time domain scalar product is defined as The bar indicates complex conjugation and the T transposition.Note that, in the definition of FF, there is no need to maximise the scalar product over the parameter space since the two waveforms are the closest possible by construction.The mismatch takes values in the interval [0, 1].
For the modes {(2, 2), ( }, Fig. 14 shows the percent mismatch of the scaled expansion coefficients r(h lm )/M , as function of the mass ratio q and the final adimensional spin a f .The quantities h lm are defined in Eq. ( 4).Each dot in the figure corresponds to a simulation, for a total of 578 simulations.The various modes show different accuracy.
• The fundamental mode is in general very accurate, with a mean percent mismatch of MM% = 0.008 ± 0.009.
• The mode (3, 3) is also highly reliable, with a mean of MM% = 1.0 ± 2.0.We note, however, that the largest mismatches occur for equal-mass systems q ∼ 1, for which this mode is suppressed by symmetry.This suggests that the higher values of the mismatch in such simulations is guided by numerical uncertainties.
• The mode (2, 1) has mismatches comparable with those of the mode (3, 3), MM% = 2.0 ± 9.0, except for a subset of simulations in which the accuracy drops.These simulations lie in a region of the parameter space in which this mode is not suppressed.
• The remaining modes {(3, 2), (4, 4), (4, 3), (4, 2)} show in general larger values of the mismatch, typically around 10% and even higher for some modes (although the mode (4, 4) tends to perform better than the others).This is not of great concern, since these are subleading modes, and contribute very little to the overall GW signal.Given their mismatch values and total contribution to the waveform, we decide to not use these modes in this work, noting that improved fits for these modes will be needed in the future, when performing analyses at higher sensitivities.
Eventually, we want to stress an important fact, that the mismatches discussed above are computed on the different modes separately, and do not refer to the waveform given by the sum of the various modes.Consequently, we are missing a complementary information given by the relative excitation between the fundamental mode and the different HMs.Since the amplitude of the HMs is only a small fraction of that of the the fundamental mode for the systems considered, the mismatch on the global waveform is in general very close to that of the mode (2, 2) alone, although this aspect has not been studied systematically.

SNR in time domain
The output of interferometric measurements can be described as where and the same goes for h and n.We define the signal-tonoise ratio (SNR) as the ratio between the power of the model P h and that of the noise P n , where the power is the expectation value of squared time series, and we used the fact that h is deterministic.In the last passage we introduce the autocovariance matrix of the noise.We now outline a procedure to re-weight h and n in such a way that the SNR is maximised.We introduce the linear filter k, (for the moment an unknown time series), that can be applied to a vector a through the standard scalar product To find the filter k that maximises the SNR (21), we rewrite the latter in terms of the filtered functions ĥ and n SNR = P ĥ where we have used that also the filter k is deterministic.Define the scalar product between two vectors a and b, weighted with the inverse autocovariance matrix ⟨a|b⟩ ≡ a T C −1 nn b. (24) Then, Eq. ( 23) can be written in terms of the scalar product (24) as (25) Eq. ( 25) conveys that the filtered SNR is proportional to the square of the scalar product ⟨C nn k|h⟩, therefore it is maximum when this scalar product is maximum.Geometrically, this happens when the vector C nn k is parallel to h.Therefore, the filter k * that maximises the SNR is for some constant α. k * is known as the matched filter.
Substituting the matched filter (26) in Eq. ( 23), we obtain the optimal signal-to-noise ratio In analogy with the optimal signal-to-noise ratio SNR opt , we also define the matched filter signal-to-noise ratio as

Analytical prediction of HMs
In Eq. (30) we have introduced the Ockham factor W and the maximum likelihood value L max , defined as Setting the prior ratio equal to one and taking the logarithm, Eq. ( 29) becomes ln B lm,22 = ln L max,lm − ln L max,22 + ln The likelihood of a deterministic model can be written as where we have used the fact that the scalar product in Eq. ( 24) is symmetric.This result can be derived from the maximum entropy principle, as proven in Sec.4.5 and App.B.2 of [205].Note that the data d are fixed, whereas the waveform depends on the set of parameters θ ∈ Θ lm introduced in section (III A).Thus, the scalar product ⟨d|h⟩ varies over the parameter space Θ lm .Consider a rescaling of the waveform h by some constant α, then where we used the linearity of the scalar product.The maximum of the logarithmic likelihood with respect to α is given by where in the last line we used the definitions of the FF (9) and optimal SNR (27).Using equation (32) Fig. 6 shows the prediction on the detectability for the modes {(3, 2), (4, 4)} as a function of the mass ratio q and inclination ι.The threshold for detection is arbitrarily set as ln B lm,22 = 3; in colours, the optimal SNR.Compared to Fig. 1, which shows the same results for the dominant modes {(3, 3), (2, 1)}, note the increase in the SNR required to detect the modes {(3, 2), (4, 4)}.Indeed, at current sensitivities in the RD, SNR ∼ 15 − 25, only the mode (4, 4) could be detected for both large inclinations and mass ratios.Finally, we note how the detectability pattern of the mode (3, 2) highly differs from that of the other modes.This is expected due to the large contributions received by mode-mixing [53,86,157,158,206].

Input parameters of pyRing
In table IV, we report the values of the input parameters from IMR used to perform our RD analysis with pyRing.The starting time t 0 is estimated from the peak of preferred IMR analysis by the LVK collaboration [4].Specifically, for each set of parameters samples from the IMR analysis we generate the corresponding signal and compute the time at which the modulus of h 2 + + h 2 × occurs.The starting time is selected as the median value of the constructed distribution.The time is expressed in the time scale Global Positioning System (GPS) and is referred to the LIGO Hanford detector.If LIGO Hanford was offline, t 0 refers to LIGO Livingstone.Also the sky position values are estimated from the IMR analysis of the LVK Collaboration.Specifically, the values in table IV are the median values of the posterior distributions for the following models: • SEOBNRv4ROM for O1 and O2; • IMRPhenomPv2 for O3a, except for -IMRPhenomPv3HM for GW190412; -NRSur7dq4 for GW190521; -IMRPhenomPv3HM for GW190814; • IMRPhenomXPHM for O3b, except for IMRPhenomPv2 for GW191215_223052.
For our RD analysis, we used cleaned data publicly available and downloaded from the GWOSC online database.Specifically, they have been taken from the channels H1_HOFT_CLEAN_SUB60HZ_C01, L1_HOFT_CLEAN_SUB60HZ_C01, V1Online, as clarified at the end of the webpage https://www.gw-openscience.org/GWTC-3/.

Figure 1 :
Figure 1: Colormap of the optimal signal-to-noise ratio needed to measure a logarithmic Bayes factor of ln B lm,22 = 3, in favour of the mode (3, 3) (left) and (2, 1) (right) over the fundamental mode.The results are for two nonspinning initial black holes χ 1 = χ 2 = 0, and are expressed as function of the mass ratio q and inclination ι.The dashed regions correspond to SNR > 1000.

Figure 2 :
Figure2: Ridgeline plot with the posterior distributions of the following parameters: primary mass m 1 , secondary mass m 2 , primary adimensional spin χ 1 , secondary adimensional spin χ 2 , luminosity distance d L , final mass M f , final spin a f , frequency and damping time of the fundamental mode, f 22 and τ 22 , inclination ι.Each row correspond to one of the events analysed.The blue distributions have been obtained with TEOBPM, the red distributions are taken from IMR analyses by the LVK Collaboration, the green distributions are from RD analyses by the LVK Collaboration.In agreement with the Bayes Factors of the multimodal analysis, for the events GW170729, GW190521_074359, GW200129_065458, GW200224_222234 we use the PE from the analysis with the modes {(2, 2), (3, 3)}, while for the other events with only the fundamental mode.

Figure 6 :
Figure 6: Colormap of the optimal signal-to-noise ratio needed to measure a logarithmic Bayes factor of ln B lm,22 = 3, in favour of the mode (3, 2) (left) and (4, 4) (right) over the fundamental mode.The results are for two nonspinning initial black holes χ 1 = χ 2 = 0, and are expressed as function of the mass ratio q and inclination ι.The dashed regions correspond to SNR > 1000.

Figure 7 :
Figure 7: PE and model selection on the event GW170729, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green, pink and yellow assume GR deviations on δω 22 , δω 33 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution.

22 - 22 Figure 8 :
Figure 8: PE and model selection on the event GW190521, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green, pink and yellow assume GR deviations on δω 22 , δω 33 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution. m

Figure 9 :
Figure9: PE and model selection on the event GW191109_010717, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green, pink and yellow assume GR deviations on δω 22 , δω 33 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution. m

Figure 10 :
Figure10: PE and model selection on the event GW200129_065458, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green, pink and yellow assume GR deviations on δω 22 , δω 33 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution.

Figure 11 :
Figure 11: PE and model selection on the event GW200224_222234, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green, pink and yellow assume GR deviations on δω 22 , δω 33 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution.

Figure 12 :
Figure12: PE and model selection on the event GW150914, as a function of the starting time.For each subplot, the first two rows display the initial masses m 1,2 , and the bottom row the logarithmic Bayes factor between the competing hypotheses, with a color corresponding to the favoured hypothesis.: red and blue assume GR with the modes {(2, 2)} and {(2, 2), (3, 3)}; green and yellow assume GR deviations on δω 22 and δτ 22 .The shaded region in the bottom row outlines the 95% CI of the corresponding IMR peaktime distribution.

22 Figure 13 :Figure 14 :
Figure13: Posteriors distributions of the initial masses m 1,2 and deviations from GR, for the different events on the x-axis.The red posteriors are from the GR analysis with only the fundamental mode (2, 2), the green ones include fractional deviations on the frequency of the fundamental mode δω 22 , the yellow ones include fractional deviations on the damping time of the fundamental mode δτ 22 .For the two subplots, the bottom row shows the logarithmic Bayes factor between the GR and non-GR analysis, for the competing hypothesis in colors.
provides the value of ln B lm,22 as a function of SNR opt .This quantifies how much the hypothesis H lm is favoured over H 22 , depending on the power of the signal h lm over noise.In other words, Eq. (10) answers the following question:

Table II :
Median and 90% symmetric credible intervals of some relevant parameters: primary mass m 1 , secondary mass m 2 , dimensionless primary spin χ 1 , inclination ι, luminosity distance d L .
h(t, θ) is a deterministic model of the observables, embedded in some instrumental noise n(t).Note the explicit dependence of h(t, θ) on a set of parameters θ that parametrize the model.The noise n(t) is treated as a stochastic variable.More in detail, the measurements consist of discrete time series of the sampled signal d(t) at n different times, such that d = h + n, with d ≡ {d(t 1 ), . . ., d(t n )}, 22ven two competing hypotheses H lm and H 22 , we define the Bayes factor B lm,22If H i is parametrized by θ i ∈ Θ i , from Bayes theorem follows that the likelihood p(d|H i , I) is 22is nested in that of H lm , so that Θ 22 ⊆ Θ lm7.Assuming that the waveform h lm which maximises L lm is close to the data, we can expand the data asd ≡ h lm +∆h, =⇒ ⟨d|h⟩ = ⟨h lm |h⟩ + O{∆h}.(38) ⟨h lm |h lm ⟩ ⟨h 22 |h 22 ⟩ ⟨h lm |h lm ⟩

Table IV :
Input parameters used in our analysis from IMR for each event analysed.t 0 being the starting time of the analysis in GPS time, {α, δ} being the right ascension and declination.The last column indicates the interferometers (IFOs) used for that event.