1 Introduction

The detection of gravitational waves (GWs) from the mergers of black holes (BHs) has revealed numerous populations of BH binaries in the Universe (Abbott et al. 2021a,b). Several formation channels have been proposed to explain the origin of such BH binaries (e.g., see Mapelli 2021; Sasaki et al. 2018). Astrophysically, binary BHs can be formed directly as the end product of the stellar evolution of a field binary. As another formation channel, individual BHs formed in a dense environment can later form binaries dynamically. An exhaustive summary of the astrophysical scenarios for the formation of BH binaries and their mergers, as well as the expected merger rate in each scenario, is found in Mandel and Broekgaarden (2021). It is also possible that BHs created immediately after the Big Bang (so-called primordial BHs, PBHs) formed binaries in the radiation-dominated epoch and became the source of the detected GWs. However, it is unclear whether the current data favor the existence of PBHs (Franciolini et al. 2021).

Because of the considerable theoretical uncertainties in each channel, it is still unknown how much each formation scenario contributes to the merger rate (Belczynski et al. 2021). Conversely, we may provide feedback on these theoretical models and update them using observational data, whose information content (merger rate, redshift, mass distribution, spin, etc.) has been increasing and will continue to increase owing to the progress of GW detectors. Along this path, we can attempt to elucidate whether a single channel dominates the observed merger events or several different channels contribute nearly equally. To this end, we focus on the particular type of merger rate density written as

$$ {\mathcal{R}}(m_{1},m_{2},t)={\mathcal{R}}_{0}~ h(m_{1},m_{2}) f(t). $$
(1)

Here, \(m_{1}\) and \(m_{2}\) are the masses of individual BHs in the binary measured in the source frame, and \(t\) is the cosmic age when the merger occurred. \(h(m_{1},m_{2})\) is normalized such that \(\int h(m_{1},m_{2}) dm_{1} dm_{2}=1\), and \(f(t)\) is normalized such that \(f(t_{0})=1\), where \(t_{0}\) is the age of the Universe. Thus, \({\mathcal{R}}_{0}\) represents the merger rate at the present time. The dimension of \({\mathcal{R}}\) is \({\text{Gpc}}^{-3}~{\text{yr}}^{-1}~M_{\odot}^{-2}\), and the rate density is defined for the comoving volume and cosmic time. Thus, \({\mathcal{R}}(m_{1},m_{2},t) dV_{c} dt dm_{1} dm_{2}\) represents the number of merger events of BHs with masses \(m_{1}\) and \(m_{2}\) which happen in the comoving volume \(dV_{c}\) and during the time interval \((t,t+dt)\).

A crucial property of the merger rate density above is that it depends on the BH masses and the merger time (i.e., redshift) in a separable manner: it is simply given by the product of a mass-dependent function and a time-dependent function. In other words, the mass distribution of the merger rate density does not evolve over time. Whether such evolution occurs depends on the formation channels. In the isolated field binary scenario, massive binary stars evolve into BH binaries after the mass transfer and the common envelope phase, whose physical processes have been investigated intensively (see e.g. Mapelli 2021 for a comprehensive review of this scenario). In this case, the merger rate density is given by the convolution of the star formation rate and the merger time delay distribution (Vitale et al. 2019). Both of these may depend on the binary masses, and the resultant merger rate density exhibits time evolution of the mass distribution (Dominik et al. 2013; Tanikawa et al. 2021). In particular, the BH masses of the binary strongly depend on factors such as the initial masses of the main-sequence stars and the metallicity, whose typical values change with redshift, as well as on whether a pulsational pair-instability supernova occurs (Belczynski et al. 2016). Dense environments such as globular clusters are sites where BHs form binaries dynamically (Fragione and Kocsis 2018), which undergo mergers; successive mergers may even form intermediate-mass BHs (Fragione et al. 2022). The merger rate of the BH binaries formed in globular clusters is given by the convolution of the globular cluster formation rate and the merger time delay distribution (Rodriguez and Loeb 2018). The time dependence of the mass distribution is determined by whether the mergers are dominated by the binaries ejected from the globular cluster or by the binaries that remain inside until they merge.
In the former case, the ejection efficiency depends on the BH masses, which yields a time-dependent mass distribution (Rodriguez et al. 2016). On the other hand, if the latter case is the dominant process, mergers follow shortly after a BH-BH encounter (Rodriguez et al. 2018) and the time evolution of the mass distribution will be suppressed (Samsing et al. 2020). As another merger channel, BH-BH encounters in galactic nuclei (Gondán et al. 2018; Rasskazov and Kocsis 2019; Gondán and Kocsis 2021) may be expected to show very little time evolution of the mass distribution (Chatterjee et al. 2017; Yang et al. 2020), similarly to single-single GW captures in globular clusters (Samsing et al. 2020). Young massive clusters and open clusters are also potential sites that contribute to the GW events (Banerjee 2018); however, the evolution of their merger rate density is not yet fully understood. In addition to binary systems, BH mergers in triple or quadruple systems may contribute importantly to the GW events, with some interesting observational consequences, such as a large orbital eccentricity in the frequency range covered by ground-based interferometers, a large spin of the merged BHs, and the formation of BHs in the low-mass gap (\(\lesssim 5~M_{\odot}\)) and the high-mass gap (\(\gtrsim 50~M_{\odot}\)) (Antonini and Perets 2012; Fragione and Kocsis 2019; Fragione et al. 2020). More studies are needed to clarify how the merger rate density in such multiple systems evolves with redshift. Finally, in the PBH scenario, the mass distribution remains almost constant over time (Kocsis et al. 2018; Raidal et al. 2019).

A quick overview of the representative scenarios of BH mergers above shows that each scenario suggests different features of the merger rate density. If several of these scenarios contribute appreciably to the merger rate density, the total merger rate density becomes a superposition of the merger rate densities of the individual scenarios and will not, in general, take the separable form (1). Thus, an observational confirmation of the time independence of the mass distribution would disfavor the possibility that multiple scenarios contribute comparably to the merger events, thereby supporting the idea that a single channel dominates. On the other hand, confirmation of the opposite case does not necessarily imply the contribution of multiple channels, since even a single channel may give a more complex merger rate density than Eq. (1). That is, if observations reveal time evolution of the mass distribution, more robust theoretical predictions for the merger rate density in individual scenarios are required to draw a reliable conclusion as to whether the merger events come from a single channel or multiple channels.

We have argued that the observational determination of the time evolution of the mass distribution provides an important key to clarifying the origin of BH mergers. Since there are currently non-negligible theoretical uncertainties in the merger rate density of each channel, an agnostic statistical approach free from a priori assumptions on the shape of the merger rate density would be a natural path to proceed, which is the motivation for the study described in this paper. In light of this, our aim is to formulate a statistical method to test whether the observed merger rate density obeys the form of Eq. (1). As we will demonstrate, our method does not assume a priori the functional shapes of \(h (m_{1},m_{2})\) and \(f(t)\), both of which strongly depend on the formation channel as well as on underlying assumptions that carry considerable uncertainties due to the dearth of robust theoretical predictions of the merger rate density in the proposed channels. Our approach differs from that of a previous study (Fishbach et al. 2021), which also focused on the time evolution of the mass distribution but took a Bayesian approach in which specific parametrizations of the merger rate density are assumed. Compared with that study, our approach is advantageous and new in this regard. On the other hand, as we will discuss in Sect. 3.4, our approach cannot be straightforwardly applied when a selection bias, which is an important factor for realistic data, is included. In practice, only the merger events below the redshifts for which the selection bias does not significantly affect the sampling can be used, which degrades the effectiveness of our approach owing to the reduction in data size as well as the decrease in the maximum redshift that we can explore. Such an issue does not arise in a Bayesian approach, where the effect of the selection bias can be directly incorporated (Mandel et al. 2019).
In this way, our approach has both an advantage and a disadvantage compared with the Bayesian approach and plays a complementary role in elucidating the origin of BH binaries.

2 Formulation of the method

For convenience, instead of the masses of the individual BHs and the cosmic merger time, we will use the total mass \(M=m_{1}+m_{2}\), the mass ratio \(q=\frac{m_{2}}{m_{1}}\ (m_{2} \le m_{1})\), and the redshift \(z\) in the following analysis. In terms of the new variables, the expected number of merger events in the small mass-plane element \(dM dq\) and the redshift bin \((z,z+dz)\) during the observation time \(T\) is given by

$$ dN={\mathcal{R}}(M,q,z) \frac{T}{1+z} \frac{4\pi r^{2}(z) dz}{H(z)} \frac{M}{{(1+q)}^{2}}dMdq. $$
(2)

Here, \(\frac{T}{1+z}\) is the time interval corresponding to \(T\) in the source frame, \(r(z)\) is the comoving distance to the redshift \(z\), \(dV_{c}=\frac{4\pi r^{2}(z)}{H(z)} dz\) is the comoving volume of the thin shell \((z,z+dz)\), and \(\frac{M}{{(1+q)}^{2}}\) is the Jacobian due to the transformation from \((m_{1},m_{2})\) to \((M,q)\). Notice that the separability of the mass dependence and the merger time dependence, which ℛ possesses (i.e., Eq. (1)), is retained by \(dN\), which plays a crucial role in the following analysis.
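As an illustration, the integrand of Eq. (2) can be assembled from standard flat-ΛCDM quantities. The sketch below is a minimal implementation under our own illustrative choices (cosmological parameters and the trapezoidal integrator are not taken from the paper; units of \({\mathcal{R}}\) are left arbitrary):

```python
import math

# Illustrative flat LambdaCDM parameters (an assumption, not from the paper)
H0 = 67.7 / 2.998e5   # Hubble constant divided by c, in 1/Mpc
OM, OL = 0.31, 0.69

def H(z):
    """Hubble rate divided by c, in 1/Mpc."""
    return H0 * math.sqrt(OM * (1 + z) ** 3 + OL)

def comoving_distance(z, steps=1000):
    """r(z) in Mpc from trapezoidal integration of dz'/H(z')."""
    zs = [z * i / steps for i in range(steps + 1)]
    vals = [1.0 / H(zz) for zz in zs]
    return (z / steps) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def dN_integrand(R, M, q, z, T):
    """Integrand of Eq. (2): R(M,q,z) * T/(1+z) * 4*pi*r^2/H * M/(1+q)^2."""
    r = comoving_distance(z)
    dVc_dz = 4 * math.pi * r ** 2 / H(z)   # comoving volume per unit dz
    jac = M / (1 + q) ** 2                 # Jacobian from (m1,m2) to (M,q)
    return R(M, q, z) * T / (1 + z) * dVc_dz * jac
```

Integrating this function over a mass region and a redshift interval gives the expected counts \(N\) used below.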

The number of events given above does not take into account the selection bias of the detector, which becomes important in the region of the \((M,q,z)\) space close to and beyond the detection horizon. This effect can be included by multiplying the right-hand side of Eq. (2) by the detection probability \(p_{\mathrm{det}}(M,q,z)\), which is the probability that a given detector (or a network of detectors) detects a merger event with masses \((M,q)\) occurring at \(z\). The concrete shape of \(p_{\mathrm{det}}\) depends on the detector (or the network of detectors) (Chen et al. 2021). Since \(p_{\mathrm{det}}\) does not take the separable form in general, the inclusion of the events corresponding to \(p_{\mathrm{det}}<1\) invalidates the separability ansatz for \(dN\). In the following analysis, we assume the ideal case \(p_{\mathrm{det}}=1\), or equivalently consider only events well within the detection horizon. In Sect. 3.4, we briefly discuss how much the selection bias affects the performance of our method.

2.1 Basic idea

Let us take two distinct closed regions in the two-dimensional mass plane spanned by \((M,q)\) (regions 1 and 2 in Fig. 1) and two intervals \((z_{a}, z_{b})\) indicated by \(L\) and \((z_{b},z_{c})\) indicated by \(H\) in the redshift axis. The shapes of regions 1 and 2 are arbitrary. For those regions, we can further define four regions as schematically described in Fig. 1. For instance, “\(1,L\)” stands for the region whose projection onto the mass plane coincides with region 1 and the projection onto the redshift axis coincides with \((z_{a},z_{b})\). Then, the expected number of merger events in this region is given by

$$ N_{1,L}=\int _{z_{a}}^{z_{b}} \int _{\mathrm{region~1}} dN. $$
(3)

The expected number in the other regions can be expressed in a similar manner. If the merger rate density takes the separable form (1), by substituting Eq. (2), we obtain

$$\begin{aligned} N_{1,L}=&\int _{z_{a}}^{z_{b}} \int _{\mathrm{region~1}} {\mathcal{R}}(M,q,z) \\ &\times \frac{T}{1+z} \frac{4\pi r^{2}(z) dz}{H(z)} \frac{M}{{(1+q)}^{2}}dMdq \\ =&T {\mathcal{R}}_{0} \bigg[ \int _{z_{a}}^{z_{b}} \frac{4\pi r^{2}(z)}{(1+z)} \frac{f(z)}{H(z)}dz \bigg] \\ &\times \bigg[ \int _{\mathrm{region~1}} \frac{M}{{(1+q)}^{2}} h (M,q) dMdq \bigg]. \end{aligned}$$
(4)
Fig. 1 Definition of the division of the \((M,q,z)\) subspace into four regions. The horizontal axis represents the two-dimensional mass plane

It then follows that a ratio defined by

$$\begin{aligned} R_{A} &\equiv \frac{N_{A,H}}{N_{A,L}} \\ &= \int _{z_{b}}^{z_{c}} \frac{4\pi r^{2}(z)}{(1+z)} \frac{f(z)}{H(z)}dz \bigg/ \int _{z_{a}}^{z_{b}} \frac{4\pi r^{2}(z)}{(1+z)} \frac{f(z)}{H(z)}dz \end{aligned}$$
(5)

becomes independent of \(A\), where \(A\) stands for either 1 or 2. Taking the contraposition of this statement, we can state that if the ratio \(R_{A}\) depends on \(A\), the merger rate density does not take the separable form of Eq. (1). Therefore, the hypothesis that the time dependence of the merger rate density is independent of the BH masses can be tested by checking whether the ratio \(R_{A}\) is independent of \(A\), which is the basic idea underlying the following analysis.
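This \(A\)-independence can be checked numerically with a toy separable rate. The sketch below is a minimal illustration with arbitrary choices of \(h\) and \(f\) (our own, not from the paper); the cosmological factors in Eq. (4) multiply the redshift integrand only, so they can be absorbed into \(f\) without affecting the check:

```python
import math

# Toy separable rate: R(M, z) = h(M) * f(z), with arbitrary shapes
h = lambda M: math.exp(-((M - 40.0) / 10.0) ** 2)   # mass-dependent part
f = lambda z: (1 + z) ** 2.7                         # time-dependent part

def integral(g, a, b, n=2000):
    """Simple trapezoidal rule on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [g(x) for x in xs]
    return (b - a) / n * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def N(region, zlo, zhi):
    """Expected counts factorize: mass integral times redshift integral."""
    return integral(h, *region) * integral(f, zlo, zhi)

region1, region2 = (10.0, 40.0), (40.0, 100.0)   # two mass intervals
za, zb, zc = 0.0, 0.5, 1.0
R1 = N(region1, zb, zc) / N(region1, za, zb)
R2 = N(region2, zb, zc) / N(region2, za, zb)
assert abs(R1 - R2) < 1e-9   # the ratio is independent of the mass region
```

The mass integral cancels in each ratio, so \(R_{1}=R_{2}\) holds for any choice of the two regions.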

2.2 Hypothesis testing

Having explained the basic idea, we now formulate the statistical test of whether the merger rate density takes the separable form given by Eq. (1). We do this by hypothesis testing. For technical convenience, instead of \(R_{A}\), we will use a different quantity defined by \(p_{A} \equiv \frac{N_{A,H}}{N_{A,L}+N_{A,H}}=\frac{R_{A}}{1+R_{A}}\) in the following analysis. Given the one-to-one correspondence between \(p_{A}\) and \(R_{A}\), the choice of \(p_{A}\) over \(R_{A}\) is purely a matter of technical convenience. From the discussion in the previous subsection, a merger rate density with the separable form leads to the relation \(p_{1}=p_{2}\). This is the mathematical statement suitable for the statistical test that we want to conduct. Since we aim to clarify whether the time evolution of the merger rate density is independent of the BH masses, we choose our null hypothesis \(H_{0}\) to be

$$ H_{0}:~p_{1}=p_{2} $$
(6)

and the alternative hypothesis \(H_{1}\) as

$$ H_{1}:~p_{1} \neq p_{2}. $$
(7)

In what follows, we will explain how to test the hypothesis \(H_{0}\).

We use the lower-case letter \(n\) to denote the number of sample merger events in each subregion introduced in the previous subsection (see also Fig. 1). For instance, the number of events in region \(A\) (\(A=1,2\)) in \((z_{a},z_{b})\) is \(n_{A,L}\) (and similarly for the others). Then, \(n_{A,H}\) obeys the binomial distribution \({\mathrm{Bin}} (n_{A}, p_{A})\), where \(n_{A} \equiv n_{A,L}+n_{A,H}\) is the sample size in region \(A\). For a large sample size, which is the assumption we are going to make, this distribution is well approximated by the normal distribution, i.e., \({\mathrm{Bin}} (n_{A}, p_{A}) \approx N(n_{A} p_{A}, n_{A} p_{A} (1-p_{A}))\). Thus, the statistic \({\bar{p}_{A}} \equiv \frac{n_{A,H}}{n_{A}}\) obeys the normal distribution \(N(p_{A}, p_{A} (1-p_{A})/n_{A})\).

Now, assuming that the hypothesis \(H_{0}\) is true, a test statistic \(T_{\mathrm{stat}}\) defined by

$$ T_{\mathrm{stat}} \equiv \frac{{\bar{p}_{1}}-{\bar{p}_{2}}}{\sqrt{{\bar{p}}(1-{\bar{p}}) \left ( \frac{1}{n_{1}}+ \frac{1}{n_{2}}\right ) }}, $$
(8)

where \({\bar{p}} \equiv \frac{n_{1} {\bar{p}_{1}}+n_{2} {\bar{p}_{2}}}{n_{1}+n_{2}}\) is the pooled population proportion, obeys the normal distribution \(N(0,1)\). Thus, we can (cannot) reject the hypothesis \(H_{0}\) at a significance level \(\alpha \) if the magnitude of \(T_{\mathrm{stat}}\) is larger (smaller) than \(\sqrt{2} {\mathrm{Erfc}}^{-1} (\alpha )\) (two-tailed test), where \({\mathrm{Erfc}}(x) \equiv \frac{2}{\sqrt{\pi}} \int _{x}^{\infty }e^{-t^{2}}dt\) is the complementary error function. This is the main strategy of our statistical test.
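This is the standard two-proportion z-test. A minimal sketch in Python (the function name is ours), using the fact that \(\sqrt{2}\,{\mathrm{Erfc}}^{-1}(\alpha)\) equals the \((1-\alpha/2)\) quantile of the standard normal distribution:

```python
from statistics import NormalDist

def two_proportion_test(n1H, n1, n2H, n2, alpha=0.05):
    """Two-tailed two-proportion z-test of H0: p1 = p2 via Eq. (8).
    Returns (T_stat, reject)."""
    p1bar, p2bar = n1H / n1, n2H / n2
    pbar = (n1H + n2H) / (n1 + n2)                    # pooled proportion
    se = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
    T = (p1bar - p2bar) / se
    # sqrt(2)*Erfc^{-1}(alpha) is the (1 - alpha/2) standard normal quantile
    threshold = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha=0.05
    return T, abs(T) > threshold
```

For example, `two_proportion_test(300, 500, 200, 500)` gives \(T \approx 6.3\) and rejects \(H_{0}\) at the \(5\%\) level.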

To get a rough idea of how well the above method works for testing the merger rate density given by Eq. (1), let us crudely estimate the sample size required to reject the hypothesis \(H_{0}\) at the \(5\%\) significance level when the merger rate density does not take the separable form. To this end, we parametrize such a case by \(p_{1}-p_{2}=\Delta p~(\neq 0)\). The factor \(\sqrt{{\bar{p}}(1-{\bar{p}})}\) in the denominator takes its maximum at \({\bar{p}}=\frac{1}{2}\), and we choose the value \(\sqrt{{\bar{p}}(1-{\bar{p}})}=\frac{1}{2}\) that minimizes \(T_{\mathrm{stat}}\) when the other parameters are fixed. Replacing \({\bar{p}_{1}}-{\bar{p}_{2}}\) in the numerator of Eq. (8) by \(\Delta p\) as a representative value, we reject the hypothesis \(H_{0}\) when \(| \Delta p | > 0.98 \sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}\). The right-hand side of this condition becomes minimum at \(n_{1}=n_{2}\) for a fixed \(n=n_{1}+n_{2}\). Thus, the minimum sample size \(n\) needed to reject \(H_{0}\) for the merger rate density parametrized by \(\Delta p\) is at least about \(3.84/{(\Delta p)}^{2}\).
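The crude estimate above can be written as a small helper (a hypothetical function of ours, generalized to an arbitrary significance level \(\alpha\); for \(\alpha=0.05\) it reduces to \(n \approx 3.84/(\Delta p)^{2}\)):

```python
from statistics import NormalDist

def min_sample_size(delta_p, alpha=0.05):
    """Crude minimum total sample size n = n1 + n2 (taking n1 = n2 and the
    worst case pbar*(1 - pbar) = 1/4) needed to reject H0 at level alpha.
    For alpha = 0.05 this reduces to n ~ 3.84 / delta_p**2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    return z ** 2 / delta_p ** 2
```

For instance, \(\Delta p = 0.1\) requires roughly 384 events, and \(\Delta p = 0.05\) roughly 1540.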

3 Demonstration

In this section, we demonstrate the statistical approach introduced in the previous section by studying the distribution of \(T_{\mathrm{stat}}\) of the samples for some specific merger rate densities. By doing this, we can assess the expected effectiveness of the proposed method when it is applied to future observational data. In what follows, we study two representative cases for the merger rate density: the separable form and the non-separable form. The study of the former case allows us to confirm the robustness of the method by checking that the distribution of \(T_{\mathrm{stat}}\) of the mock data with a large sample size approximates the normal distribution \(N(0,1)\). This can also be used to check the probability of making a type I error for the null hypothesis \(H_{0}\). Meanwhile, the analysis of the latter case illustrates how efficiently the null hypothesis \(H_{0}\) is rejected when the alternative hypothesis \(H_{1}\) is true. Thus, this case provides a good estimate of the probability of making a type II error.

As is evident from the formulation of the statistical method, the choice of the shapes of regions 1 and 2 is completely arbitrary. In the following analysis, we define these regions as

$$\begin{aligned} {\mathrm{region~1}}&=\{ (M,q) \mid M \le M_{\mathrm{div}}, ~0\le q \le 1 \}, \\ {\mathrm{region~2}}&=\{ (M,q) \mid M \ge M_{\mathrm{div}}, ~0\le q \le 1 \}. \end{aligned}$$
(9)

Here, \(M_{\mathrm{div}}\) is the critical total mass that divides regions 1 and 2.

3.1 Separable form

In this subsection, we study the merger rate density that takes the separable form (1). As the shape of the merger rate density, we consider two examples: the mergers of the PBH binaries and the mergers of the astrophysical BH binaries that follow the star formation rate.

3.1.1 PBH mergers

As for the PBH mergers, we assume that the mass dependent part \(h (m_{1},m_{2})\) is given by

$$ h (m_{1}, m_{2})=C \psi (m_{1}) \psi (m_{2}), $$
(10)

where \(\psi (m)\) is the PBH mass function, and \(C\) is a normalization constant such that \(\int h(m_{1},m_{2}) dm_{1} dm_{2}=1\). The time evolution part \(f(t)\) is given by

$$ f(t) ={\left ( \frac{t}{t_{0}} \right )}^{-\frac{34}{37}}, $$
(11)

where \(t_{0}\) is the age of the Universe. This time dependence is realized for PBH binaries that formed in the radiation-dominated epoch (Nakamura et al. 1997; Ioka et al. 1998; Sasaki et al. 2016), and this formation channel dominates the PBH merger rate if the PBH binaries are not disrupted by other gravitational sources throughout their subsequent evolution (Sasaki et al. 2018). The PBH mass function \(\psi (m)\) strongly depends on the models of the early universe. In this paper, we consider the log-normal shape

$$ \psi (m)= \exp \left ( -\frac{1}{2\sigma ^{2}}\ln ^{2} \left ( \frac{m}{m_{0}} \right ) \right ), $$
(12)

which is a widely used phenomenological functional form (Carr et al. 2017). Here, \(m_{0}\) and \(\sigma \) are free parameters, and we choose \(m_{0}=40~M_{\odot}\) and \(\sigma =0.2\) in our analysis.
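For concreteness, mock PBH events can be drawn from this model. The sketch below is our own illustrative implementation: the masses are sampled from \(\psi(m)\,dm\), which corresponds to \(\ln m \sim N(\ln m_{0}+\sigma^{2}, \sigma^{2})\) once the \(dm = m\,d\ln m\) Jacobian is included, and the merger time is drawn from Eq. (11) by inverse-CDF sampling (the cumulative distribution of \(t^{-34/37}\) is \(\propto t^{3/37}\)); the cosmological volume and time-dilation weights of Eq. (2) are ignored for simplicity:

```python
import math, random

M0, SIGMA = 40.0, 0.2   # m0 and sigma adopted in the text

def sample_pbh_masses(rng=random):
    """Draw (m1, m2) with m1 >= m2. Sampling psi(m) dm is equivalent to
    ln m ~ N(ln m0 + sigma^2, sigma^2) after including the dm = m dln(m)
    Jacobian in Eq. (12)."""
    mu = math.log(M0) + SIGMA ** 2
    m_a = math.exp(rng.gauss(mu, SIGMA))
    m_b = math.exp(rng.gauss(mu, SIGMA))
    return max(m_a, m_b), min(m_a, m_b)

def sample_merger_time(t_min, t0=13.8, rng=random):
    """Inverse-CDF draw from f(t) ~ t^(-34/37) on [t_min, t0], Eq. (11);
    the cumulative distribution is proportional to t^(3/37). Times in Gyr."""
    u = rng.random()
    lo, hi = t_min ** (3 / 37), t0 ** (3 / 37)
    return (lo + u * (hi - lo)) ** (37 / 3)
```

The cutoff `t_min` and the age `t0 = 13.8` Gyr are illustrative assumptions of this sketch.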

The red dots in the left panel of Fig. 2 show the histogram of \(T_{\mathrm{stat}}\) for one thousand realizations, each with a sample size of \(n_{1}+n_{2}=1000\). The blue dots show the histogram of \(T_{\mathrm{stat}}\) obeying the normal distribution \(N(0,1)\) that, as discussed in the previous section, should be realized if the merger rate density takes the separable form. As is evident from the figure, the distribution of \(T_{\mathrm{stat}}\) of the sample data is consistent with the normal distribution, which explicitly demonstrates the validity of the statistical method presented in the previous section. This is justified quantitatively by computing the \(p\)-value based on the Anderson-Darling test, which yields \(p=0.82\). Our choice of parameters defining the four regions shown in Fig. 1 is \((M_{\mathrm{div}}, z_{b}, z_{c})=(80~M_{\odot}, 0.5, 1.0)\).

Fig. 2 Histograms of \(T_{\mathrm{stat}}\) of one thousand realizations for the merger rate density with the separable form. Each realization has \(n_{1}+n_{2}=1000\) sample size. The left panel is for the PBH merger rate density, and the right panel is for the merger rate density of the astrophysical BHs. The explicit shape of the merger rate density and the underlying assumptions for each model are given in the main text

3.1.2 Mergers of astrophysical BHs

As for the shape of the merger rate density of the astrophysical BHs, we adopt the simple phenomenological model studied by Abbott et al. (2019). This model is disfavored by the updated analysis presented by Abbott et al. (2021c). Nevertheless, we adopt it in our analysis because our purpose is to demonstrate the effectiveness of our statistical method, and this simple model is sufficient for that purpose. Notice also that there remains a possibility that only a fraction of the merger events obey this model. The mass-dependent part \(h(m_{1},m_{2})\) in this model is given by

$$\begin{aligned} h(m_{1},m_{2})&=C m_{1}^{-\alpha} {\left ( \frac{m_{2}}{m_{1}} \right )}^{ \beta _{q}} \Theta (m_{2}-m_{\mathrm{min}}) \\ &\quad{}\times \Theta (m_{\mathrm{max}}-m_{1}) \Theta (m_{1}-m_{2}), \end{aligned}$$
(13)

where \(\Theta (x)\) is the Heaviside function and \(C\) is the normalization constant. This shape contains four free parameters: \(\alpha \), \(\beta _{q}\), \(m_{\mathrm{min}}\), and \(m_{\mathrm{max}}\). In the analysis by Abbott et al. (2019), this model has been compared with the data obtained during the first and second observation runs of LIGO and Virgo, and the posteriors of the four free parameters are derived. In our analysis, we choose them to be \(\alpha =1.3\), \(\beta _{q}=7\), and \((m_{\mathrm{min}},m_{\mathrm{max}})=(8~M_{\odot},40~M_{\odot})\), which are consistent with the posteriors mentioned above. The time evolution part \(f(t)\) is assumed to exactly follow the star formation rate (Madau and Dickinson 2014), namely,

$$ f(z)=\frac{1}{0.997} \frac{{(1+z)}^{2.7}}{1+{\left ( \frac{1+z}{2.9} \right )}^{5.6}}. $$
(14)

Here, we abuse the notation of \(f\) by changing its argument from the cosmic time \(t\) to the redshift \(z\), since the star formation rate is commonly given in terms of \(z\).

The red dots in the right panel of Fig. 2 show the histogram of \(T_{\mathrm{stat}}\) of one thousand realizations for the model defined by Eqs. (13) and (14). The sample size of each realization is the same as that in the case of the PBH mergers (i.e., the left panel): \(n=n_{1}+n_{2}=1000\). Our choice of parameters defining the four regions shown in Fig. 1 is \((M_{\mathrm{div}}, z_{b}, z_{c})=(60~M_{\odot}, 0.5, 1.0)\). The blue dots show the histogram of \(T_{\mathrm{stat}}\) obeying the normal distribution \(N(0,1)\). As in the left panel, the distribution of \(T_{\mathrm{stat}}\) of the sample data is consistent with the normal distribution (the \(p\)-value based on the Anderson-Darling test is 0.06). Thus, from the two examples, we confirm that the hypothesis testing rejects the null hypothesis \(H_{0}\) at a given significance level \(\alpha \) only if the value of \(T_{\mathrm{stat}}\) constructed from the data is larger in magnitude than \(\sqrt{2} {\mathrm{Erfc}}^{-1} (\alpha )\).
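The behavior shown in Fig. 2 can be reproduced at the level of the counts themselves: under the separable form both regions share the same \(p\), so drawing \(n_{A,H} \sim {\mathrm{Bin}}(n_{A},p)\) and computing Eq. (8) should yield a distribution close to \(N(0,1)\). A minimal Monte Carlo sketch (with arbitrary illustrative values of \(p\) and \(n_{A}\)):

```python
import random
from statistics import mean, stdev

def t_stat(n1H, n1, n2H, n2):
    """Test statistic of Eq. (8) built from the four event counts."""
    p1, p2 = n1H / n1, n2H / n2
    pbar = (n1H + n2H) / (n1 + n2)
    se = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1 - p2) / se

def binom(n, p, rng):
    """Binomial draw as a sum of Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(42)
p_true, n1, n2 = 0.6, 500, 500   # separable form implies p1 = p2 = p_true
stats = [t_stat(binom(n1, p_true, rng), n1, binom(n2, p_true, rng), n2)
         for _ in range(1000)]
# mean(stats) should be close to 0 and stdev(stats) close to 1
```

This mirrors the construction of the red histograms in Fig. 2 without simulating the full event catalogs.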

3.2 Non-separable form

Having checked that the probability of rejecting the null hypothesis \(H_{0}\) even when \(H_{0}\) is true is controlled by the significance level, we next investigate how likely it is not to reject the null hypothesis even when it is false.

3.2.1 Case 1: toy model

To this end, we first consider an extreme toy model in which the merger rate density is given by

$$\begin{aligned} &{\mathcal{R}}(m_{1},m_{2},z) \\ &\quad={\mathcal{R}}_{0} h(m_{1},m_{2}) \left ( \Theta (M_{c}-M)+{(1+z)}^{5} \Theta (M-M_{c}) \right ), \end{aligned}$$
(15)

where \(h(m_{1},m_{2})\) is defined by Eq. (13) and \(M_{c}\) is a free parameter that we choose as \(M_{c}=40~M_{\odot}\). Whereas the merger rate of the BH binaries with \(m_{1}+m_{2} < M_{c}\) does not evolve with the redshift, that with \(m_{1}+m_{2} >M_{c}\) has a strong dependence on the redshift as \(\propto {(1+z)}^{5}\). Thus, the merger rate density (15) takes the non-separable form that does not belong to the class defined by Eq. (1) and provides one example of the alternative hypothesis \(H_{1}\).

The red dots in the left panel of Fig. 3 show the histogram of \(T_{\mathrm{stat}}\) of one thousand realizations of the merger rate density given by Eq. (15). It is clear that the distribution of \(T_{\mathrm{stat}}\) is markedly shifted to the negative side and peaks at around \(T_{\mathrm{stat}} =-4\). Our choice of parameters defining the four regions shown in Fig. 1 is \((M_{\mathrm{div}}, z_{b}, z_{c})=(40~M_{\odot}, 0.7, 1.0)\). For this choice, \(p_{1}\) and \(p_{2}\) are found to be \((p_{1},p_{2})=(0.51,0.74)\) and

$$\begin{aligned} &\int _{\mathrm{Region~1}} {\mathcal{R}}(m_{1},m_{2}, t) dm_{1} dm_{2} dt \\ &\quad= c_{1} \int _{\mathrm{Region~1+Region~2}} {\mathcal{R}}(m_{1},m_{2}, t) dm_{1} dm_{2} dt \end{aligned}$$
(16)

with \(c_{1} \approx 0.055\). Using these values as typical ones for the quantities appearing in the definition of \(T_{\mathrm{stat}}\) (8), we can estimate the typical value of \(T_{\mathrm{stat}}\) as

$$ T_{\mathrm{stat}} = -3.8~ \sqrt{ \frac{n}{1000}}, $$
(17)

which is consistent with the peak value of the mock data in Fig. 3.
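The estimate of Eq. (17) follows directly from Eq. (8) by replacing \(\bar{p}_{A}\) with \(p_{A}\) and splitting the events as \(n_{1}=c_{1}n\), \(n_{2}=(1-c_{1})n\). A short sketch (the function name is ours; the value obtained, about \(-3.7\), agrees with Eq. (17) up to rounding of the quoted \(p_{1}\) and \(p_{2}\)):

```python
def t_stat_estimate(p1, p2, c1, n):
    """Typical value of Eq. (8) obtained by replacing pbar_A with p_A and
    splitting the n events as n1 = c1*n, n2 = (1 - c1)*n."""
    n1, n2 = c1 * n, (1 - c1) * n
    pbar = (n1 * p1 + n2 * p2) / n
    se = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1 - p2) / se

# Values quoted in the text for the toy model of Eq. (15)
T = t_stat_estimate(0.51, 0.74, 0.055, 1000)   # about -3.7
```

The same helper applies to any other non-separable model once \((p_{1},p_{2},c_{1})\) are computed from the assumed rate density.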

Fig. 3 Histograms of \(T_{\mathrm{stat}}\) of one thousand realizations for the merger rate density with the specific non-separable form given by Eq. (15). Each realization has \(n_{1}+n_{2}=1000\) sample size

For the current example, we find that 45 out of our one thousand realizations yield \(T_{\mathrm{stat}}>-2\). Thus, if the real merger rate density is given by Eq. (15), we can reject the null hypothesis \(H_{0}\) at about the \(5\%\) significance level for the adopted parameter values when the sample size is 1000 or larger.

The width of the distribution of the red dots in Fig. 3 is \({\mathcal{O}}(1)\). This width does not depend on \(n\): the typical variation of \(T_{\mathrm{stat}}\) due to the randomness of the sampling scales as \(\frac{\delta T_{\mathrm{stat}}}{T_{\mathrm{stat}}} \propto n^{-1/2}\), and combining this with the scaling \(T_{\mathrm{stat}} \propto n^{1/2}\) shown in the above equation yields \(\delta T_{\mathrm{stat}} \propto n^{0}\), whereas the numerical value of the proportionality coefficient, which is \({\mathcal{O}}(1)\), may depend on the underlying merger rate density as well as on the parameters \((M_{\mathrm{div}}, z_{b}, z_{c})\). To see the latter point in more detail, Fig. 4 shows the contour of the coefficient of \(\sqrt{n/1000}\) of Eq. (17) in the \((M_{\mathrm{div}}, z_{b})\) plane (\(z_{c}=1\)) in the left panel and in the \((M_{\mathrm{div}}, z_{c})\) plane (\(z_{b}=z_{c}/2\)) in the right panel. From the left panel, we clearly see that \(-T_{\mathrm{stat}}\) peaks at around \(M_{\mathrm{div}}=40~M_{\odot}\). This is natural because it corresponds to the boundary mass \(M_{c}\) in the merger rate density (15), which separates the different redshift evolutions: the difference in the distribution of the merger events between regions 1 and 2 becomes most prominent when the different redshift evolutions (i.e., the first and second terms in Eq. (15)) are covered separately by different regions. This consideration is corroborated by the right panel, where \(-T_{\mathrm{stat}}\) decreases as \(z_{c}\) decreases even when \(M_{\mathrm{div}}\) is fixed to \(40~M_{\odot}\): when the merger events are restricted to low redshifts, the first and second terms in Eq. (15) are nearly identical and, consequently, the distribution of the merger events in region 1 becomes indistinguishable from that in region 2.
From this investigation, we find that the effectiveness of the current method is controlled by three factors: the total number of merger events \(n\), the maximum redshift covered by observations \(z_{c}\), and the characteristic BH mass scale at which the merger rate density deviates from separability. Although the first two are determined solely by observations, the last one depends on the concrete shape of the underlying merger rate density, which we do not know a priori in real observations. When applying our method to real data, we will therefore need to compute \(T_{\mathrm{stat}}\) and test the null hypothesis for various values of \(M_{\mathrm{div}}\).

Fig. 4 Contour plot of \(T_{\mathrm{stat}}\) for \(n=1000\). In the left panel, \(M_{\mathrm{div}}\) and \(z_{b}\) are varied while \(z_{c}=1\) is fixed. In the right panel, \(M_{\mathrm{div}}\) and \(z_{c}\) are varied while \(z_{b}\) is fixed to \(z_{c}/2\)

3.2.2 Case 2: mixture of astrophysical BHs and PBHs

The above example is unrealistic in the sense that it is not based on astrophysics and is introduced only for the purpose of demonstrating the principle of our statistical method. In the second example, we consider a less extreme case in which the merger rate is a mixture of the mergers of the astrophysical BHs and those of PBHs, each of which has been separately investigated in the previous subsection. Namely, we assume the merger rate density given by

$$\begin{aligned} &{\mathcal{R}}(m_{1},m_{2},t) \\ &\quad=(1-r) {\mathcal{R}}_{\mathrm{astro}}(m_{1},m_{2},t)+r { \mathcal{R}}_{\mathrm{PBH}}(m_{1},m_{2},t), \end{aligned}$$
(18)

where \({\mathcal{R}}_{\mathrm{astro}}\) and \({\mathcal{R}}_{\mathrm{PBH}}\) are the merger rate densities of the astrophysical BHs and the PBHs introduced in Eqs. (10)–(14), respectively. Here, we choose the normalization of the individual contributions such that they give the same merger rate at the present time \(t_{0}\):

$$\begin{aligned} &\int {\mathcal{R}}_{\mathrm{astro}} (m_{1},m_{2},t_{0}) dm_{1} dm_{2} \\ &\quad=\int { \mathcal{R}}_{\mathrm{PBH}} (m_{1},m_{2},t_{0}) dm_{1} dm_{2}. \end{aligned}$$
(19)

Thus, \(r\) denotes the fraction of the PBH contribution to the total merger rate at the present time. Since \({\mathcal{R}}_{\mathrm{astro}}\) and \({\mathcal{R}}_{\mathrm{PBH}}\) have different \(z\) dependences, the above merger rate density is non-separable for \(0< r<1\).

We choose \(M_{\mathrm{div}}=60~M_{\odot}\) and \(z_{c}=1.0\). The left panel of Fig. 5 shows \(p_{1}-p_{2}\) as a function of \(z_{b}\); we find that \(p_{1}-p_{2}\) becomes maximal at \(z_{b} \approx 0.7\), so in the following analysis we take \(z_{b} =0.7\). The right panel of Fig. 5 shows the histogram of \(T_{\mathrm{stat}}\) for mock data with a sample size of 1000 drawn from the merger rate density given by Eq. (18). The peak of the histogram is located at about \(T_{\mathrm{stat}}=1.5\). For the same merger rate density with the same parameter values as those adopted for generating the mock data, we find \((p_{1}, p_{2}) \approx (0.636, 0.595)\) and

$$\begin{aligned} &\int _{\mathrm{Region~1}} {\mathcal{R}}(m_{1},m_{2}, t) dm_{1} dm_{2} dt \\ &\quad= c_{1} \int _{\mathrm{Region~1+Region~2}} {\mathcal{R}}(m_{1},m_{2}, t) dm_{1} dm_{2} dt \end{aligned}$$
(20)

with \(c_{1} \approx 0.54\). Using these values as typical magnitudes for the quantities appearing in the definition (8) of \(T_{\mathrm{stat}}\), we can estimate its typical value as

$$ T_{\mathrm{stat}} = 1.3~ \sqrt{ \frac{n}{1000}} $$
(21)

in terms of the sample size \(n=n_{1}+n_{2}\). As expected, this estimate is consistent with the location of the peak of the mock-data histogram, and the width of the distribution is \({\mathcal{O}}(1)\).
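The choice \(z_{b} \approx 0.7\) comes from maximizing \(p_{1}-p_{2}\) over candidate boundaries, as in the left panel of Fig. 5. A minimal grid-search sketch, where `proportions(z_b)` is a hypothetical function returning the pair \((p_{1}, p_{2})\) computed from the merger rate density (or a mock catalog) for a given boundary:

```python
def best_boundary(proportions, grid):
    """Return the z_b in `grid` that maximizes p1 - p2.

    `proportions(z_b)` is assumed to return the pair (p1, p2)
    for the redshift boundary z_b.
    """
    return max(grid, key=lambda zb: proportions(zb)[0] - proportions(zb)[1])
```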

Fig. 5

Left panel: \(p_{1}-p_{2}\) as a function of \(z_{b}\), with the other parameters (\(M_{\mathrm{div}}, z_{c}\)) fixed, for the merger rate density given by Eq. (18). Right panel: histogram of \(T_{\mathrm{stat}}\) of one thousand realizations for the same merger rate density. Each realization has a sample size of \(n_{1}+n_{2}=1000\)

To summarize, these examples demonstrate that if a given non-separable merger rate density is realized in nature, the typical value of \(T_{\mathrm{stat}}\) given by

$$ \frac{p_{1}-p_{2}}{\sqrt{p(1-p) \left ( \frac{1}{c_{1}}+ \frac{1}{c_{2}}\right ) }} \sqrt{n}, $$
(22)

where \(p = c_{1} p_{1}+c_{2} p_{2}\) and \(c_{1}\) is defined by Eq. (16) and \(c_{2} \equiv 1-c_{1}\), provides a good indicator of whether the null hypothesis \(H_{0}\) can be rejected for the data containing \(n\) merger events.
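Expression (22) is the standard two-proportion \(z\)-statistic and can be evaluated directly from \((p_{1}, p_{2}, c_{1}, n)\). A minimal sketch (the variable names are ours):

```python
from math import sqrt

def t_stat(p1, p2, c1, n):
    """Evaluate expression (22), the two-proportion z-statistic.

    p1, p2 : observed proportions in mass regions 1 and 2
    c1     : fraction of the n events lying in region 1 (c2 = 1 - c1)
    n      : total number of merger events, n = n1 + n2
    Under the null hypothesis H0 this is ~ N(0, 1) for large n.
    """
    c2 = 1.0 - c1
    p = c1 * p1 + c2 * p2  # pooled proportion, p = c1 p1 + c2 p2
    return (p1 - p2) / sqrt(p * (1.0 - p) * (1.0 / c1 + 1.0 / c2)) * sqrt(n)
```

Plugging in the Case-2 values \((p_{1}, p_{2}) \approx (0.636, 0.595)\), \(c_{1} \approx 0.54\), and \(n=1000\) reproduces the typical value \(T_{\mathrm{stat}} \approx 1.3\) of Eq. (21).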

3.3 Effect of measurement error of source parameters on the distribution of \(T_{\mathrm{stat}}\)

Thus far, our analysis has assumed no observational errors on the parameters \((M, q, z)\) of the source binaries. In reality, these parameters always carry errors. Since such errors shift the apparent position of each merger event in the \((M,q,z)\) space, the number of merger events counted in each region of Fig. 1 is affected accordingly. As a result, the effectiveness of the hypothesis testing is expected to degrade to some extent. In this subsection, we evaluate how significantly these errors affect the hypothesis testing.

To simplify our analysis, we randomly assign a \(10\%\) error to each of the three parameters \((M,q,z)\) of every merger event. This is an idealization: in reality, the magnitude of the error generally depends on the binary masses and the distance to the binary. Nevertheless, the following analysis based on this simplification captures, at least qualitatively, how observational errors affect our method.

As an explicit example, we first consider the merger rate density of the PBH binaries investigated in Sect. 3.1.1 with the same parameter values. For each randomly generated merger event, we perturb the binary parameters \((M,q,z)\) by random amounts corresponding to the \(10\%\) error; namely, we change the parameters \((M_{i}, q_{i}, z_{i})\) of the \(i\)-th merger event into \((M_{i} (1+a_{i}), q_{i} (1+b_{i}), z_{i} (1+c_{i}))\), where \((a_{i}, b_{i}, c_{i})\) are uncorrelated random numbers in the range \([-0.1,0.1]\).
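The error model above amounts to independent multiplicative perturbations of each parameter. A minimal sketch, assuming uniform draws on \([-0.1, 0.1]\) as in the text:

```python
import random

def perturb(events, frac=0.1, seed=0):
    """Apply the error model of the text: (M, q, z) ->
    (M (1 + a), q (1 + b), z (1 + c)), with a, b, c drawn independently
    and uniformly from [-frac, frac] for each event.
    """
    rng = random.Random(seed)
    return [(M * (1.0 + rng.uniform(-frac, frac)),
             q * (1.0 + rng.uniform(-frac, frac)),
             z * (1.0 + rng.uniform(-frac, frac)))
            for (M, q, z) in events]
```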

Figure 6 shows the histogram of \(T_{\mathrm{stat}}\) of one thousand realizations, each containing a sample of \(n_{1}+n_{2}=1000\) events. As we can see, the histogram of \(T_{\mathrm{stat}}\) of the mock data is hardly distinguishable from that of the normal distribution; namely, observational errors of this magnitude minimally affect the effectiveness of the hypothesis testing. This result may be understood as follows. The observational error, by which some events near the boundaries dividing the four subspaces in Fig. 1 are randomly counted in different subspaces, erases the contrast among the numbers of events in each subspace. As a result, \(p_{1}\) and \(p_{2}\) tend to take similar values, which suppresses \(T_{\mathrm{stat}}\). That is, the observational error effectively makes the apparent mass distribution of the merger events look more independent of the redshift. To corroborate this explanation, we also constructed the histogram of \(T_{\mathrm{stat}}\) with the error increased to \(50\%\), shown in the left panel of Fig. 7. As we can verify, the histogram is still consistent with the normal distribution \(N(0,1)\). As another example, the right panel of Fig. 7 shows the histogram of \(T_{\mathrm{stat}}\) for the merger rate density considered in Sect. 3.2.1 with \(10\%\) errors added. Clearly, the histogram shifts toward the normal distribution \(N(0,1)\) compared with those in Fig. 3, for which no observational error is included. Indeed, the probability that \(T_{\mathrm{stat}}\) falls within the \(95\%\) region of the normal distribution is considerably larger than \(5\%\).

Fig. 6

Histogram of \(T_{\mathrm{stat}}\) of one thousand realizations for the merger rate density of the PBH binaries with the observational errors (\(10\%\)) of the binary parameters being included. Each realization has \(n_{1}+n_{2}=1000\) sample size

Fig. 7

Left panel: histogram of \(T_{\mathrm{stat}}\) of one thousand realizations for the merger rate density of the PBH binaries with the observational errors (\(50\%\)) of the binary parameters being included. Right panel: histogram of \(T_{\mathrm{stat}}\) of one thousand realizations for the merger rate density considered in Sect. 3.2.1 with the observational errors (\(10\%\)) of the binary parameters being included

These investigations show that the inclusion of the observational error tends to favor the null hypothesis compared with the case where no errors are included. Thus, if the null hypothesis is rejected even after including the observational errors, it is a strong indication that the mass distribution of the merger events evolves with redshift.

3.4 Application to O3 data

The LIGO Scientific, Virgo, and KAGRA Collaborations released the GW events detected during the third observing run (O3) as GWTC-2.1 and GWTC-3 (Abbott et al. 2021a,b). Excluding low-mass compact objects (\(< 3~M_{\odot}\)) that could be either BHs or neutron stars, there are 74 events that we can reasonably identify as BH-BH mergers. Figure 8 shows a scatter plot of these events in the \((M,z)\) plane. At first glance, this number may appear large enough to draw a statistically meaningful conclusion from our hypothesis testing. In this subsection, we show that the detection bias is crucial: it lowers the maximum redshift of usable events and thereby reduces the number of events that can be used.

Fig. 8

Scatter plot of GW events obtained during the O3 run. Data taken from Abbott et al. (2021a,b)

In all of our analyses up to this stage, we have assumed that the detection probability of the merger events in the region of the parameter space defined in Fig. 1 is unity, namely \(p_{\mathrm{det}}=1\). This assumption is valid as long as the detection horizon of the GW detector is large enough that sufficiently many merger events lie well inside it. This ideal situation may be achieved by future detectors, but not necessarily by current ones such as LIGO during O3. In the region where \(p_{\mathrm{det}}<1\), \(p_{\mathrm{det}}\) depends nontrivially on \((M,q,z)\) and, in particular, generally takes a non-separable form; including merger events from this region therefore degrades the effectiveness of the hypothesis testing. Thus, in applying our hypothesis testing to the events obtained during the O3 run, we first need to restrict the \((M,q)\) space in Fig. 1 to the range where the selection bias is not significant. In practice, this restriction amounts to requiring \(z_{c}\) to be small enough that the distribution of \(T_{\mathrm{stat}}\), when the underlying distribution takes the separable form (1), remains close to the normal distribution \(N(0,1)\).

To determine such \(z_{c}\) for the LIGO-Virgo network O3 run, Fig. 9 shows the histograms of \(T_{\mathrm{stat}}\) for mock data obeying the merger rate density of astrophysical BHs used in Sect. 3.1.2 for two cases, \(z_{c}=0.3\) and 0.5. \(M_{\mathrm{div}}\), which divides regions 1 and 2, has been chosen such that the numbers of merger events in the two regions are equal. The selection bias has been computed by running the public Python code described in Chen et al. (2021). To make our analysis consistent with the O3 catalog, we set the number of merger events to \(n=30\) for \(z_{c}=0.3\) and \(n=48\) for \(z_{c} =0.5\). The discontinuous features of the histograms, more prominent in the left panel, are due to the discreteness of \(T_{\mathrm{stat}}\) caused by the small sample size \(n\). In the absence of the selection bias, both histograms would obey the normal distribution \(N(0,1)\). We find that the histogram in the right panel (\(z_{c}=0.5\)) clearly deviates from the normal distribution. This suggests that if we apply our method to the O3 catalog restricted to GW events with redshift less than 0.5, \(T_{\mathrm{stat}}\) can lie outside the \(2\sigma \) region with non-negligible probability even if the mass distribution of the underlying merger rate density does not evolve with redshift. Meanwhile, the histogram in the left panel (\(z_{c}=0.3\)) is consistent with the normal distribution. We therefore expect that the selection bias is not significant when only GW events with redshift less than 0.3 are used.
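Imposing a detection probability on mock events amounts to accept-reject thinning. A minimal sketch; `toy_p_det` is a purely illustrative stand-in, not the LIGO-Virgo O3 sensitivity computed with the code of Chen et al. (2021):

```python
import random

def toy_p_det(M, q, z):
    # Illustrative toy detection probability: near unity at low z,
    # decaying with redshift and mildly rising with mass.  NOT the
    # actual O3 network sensitivity.
    return min(1.0, (1.0 + M / 100.0) / (1.0 + (z / 0.3) ** 2))

def thin_by_selection(events, p_det, seed=0):
    """Keep each mock event (M, q, z) with probability p_det(M, q, z)."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() < p_det(*e)]
```

Applying this thinning to mock catalogs before computing \(T_{\mathrm{stat}}\) reproduces the qualitative effect discussed above: the surviving sample is depleted at high redshift, distorting the null distribution of \(T_{\mathrm{stat}}\) unless \(z_{c}\) is restricted.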

Fig. 9

Left panel: histogram of \(T_{\mathrm{stat}}\) where \(z_{c} =0.3\) and the selection bias is included. The number of merger events is taken to be 30 to be consistent with the O3 catalog. Right panel: histogram of \(T_{\mathrm{stat}}\) where \(z_{c} =0.5\) and the selection bias is included. The number of merger events is taken to be 48 to be consistent with the O3 catalog

Figure 10 shows \(T_{\mathrm{stat}}\) of the O3 catalog for various values of \(z_{c}\) in the range \((0.2, 1.0)\). For all \(z_{c}\), \(T_{\mathrm{stat}}\) is negative, and the figure shows its absolute value. We observe that \(T_{\mathrm{stat}}\) remains within the \(2\sigma \) region for small \(z_{c}\) but moves outside it for \(z_{c} \gtrsim 0.4\), reaching \(T_{\mathrm{stat}} \simeq -5\) at large \(z_{c}\). As discussed above, we attribute this behavior to the selection bias; thus, the result \(T_{\mathrm{stat}} \simeq -5\) at large \(z_{c}\) does not mean that the O3 data supports a mass distribution of the merger rate that evolves with redshift. For \(z_{c} \lesssim 0.3\), where the effect of the selection bias is expected to be unimportant, \(T_{\mathrm{stat}}\) is consistent with the normal distribution. On the basis of this observation, we conclude that the current GW observations are consistent with a mass distribution that does not evolve with redshift.

Fig. 10

Absolute value of \(T_{\mathrm{stat}}\) of the O3 catalog for various values of \(z_{c}\)

To summarize, the investigation in this subsection shows that the selection bias degrades the effectiveness of our method for the O3 catalog by reducing both the number of merger events and the maximum redshift (\(z_{c}\)) that can be used. Within the range where the method can be applied, there is no indication of time evolution of the mass distribution of the merger rate density.

4 Conclusions

There are several known formation channels for binary BHs that can merge within the age of the Universe. However, owing to our limited theoretical understanding, we are still far from robustly predicting how much each channel contributes to the total merger rate density. Generally, the fraction of each contribution depends on the BH masses as well as the merger redshift. Some formation channels are known to predict a time dependence of the merger rate density that is (exactly or nearly) independent of the BH masses. Naturally, this motivates a statistical test of the time independence of the mass distribution, from which we may obtain clues to the origin of the binary BHs. In this paper, we formulated the methodology to perform this test and demonstrated its effectiveness using mock data.

After providing the definition of what we exactly mean by the mass independence of the time evolution of the merger rate density, we reformulated it into an equivalent form more convenient for the statistical analysis. As a simple statistical test, we adopted hypothesis testing. Our null hypothesis is that the time evolution of the merger rate density does not depend on the BH masses. To test it, we introduced a test statistic that obeys the normal distribution \(N(0,1)\) for a large sample size if the null hypothesis is true. In Sect. 3, by generating mock data for two specific examples, both of which satisfy the null hypothesis, we confirmed explicitly that the test statistic follows the normal distribution. We also considered two other examples in which the time evolution of the merger rate density differs between BH masses and showed that the central value of the test statistic deviates from zero. An analytical estimation suggests that the shift of the test statistic is proportional to the square root of the sample size, and the shift computed from the mock data is fairly consistent with this estimation. For a given merger rate density that does not fulfill the null hypothesis, this result enables a reasonable estimate of the minimal sample size necessary to reject the null hypothesis. These results demonstrate the effectiveness of our hypothesis testing for determining from (future) observational data whether the merger rate density evolves over time independently of the BH masses.

The LIGO-Virgo-KAGRA Collaboration released more than 70 merger events detected during the O3 observing run. To avoid undermining the hypothesis testing through the selection bias caused by a low detection probability, we investigated how \(T_{\mathrm{stat}}\) varies as we change the maximum redshift of the merger events included in its computation. We found that the selection bias degrades the effectiveness of our method for the O3 catalog by reducing both the number of merger events and the maximum redshift (\(z_{c}\)). Within the range where the method can be applied, the current GW observations are consistent with a mass distribution that does not evolve with redshift. This limitation due to the selection bias is expected to be eased in future observations, which will deliver much more information about the merger events in terms of both number and redshift.

It should be stressed that our statistical test requires a priori specification neither of the mass distribution, which is largely uncertain, nor of the shape of the time evolution. Thus, the result of the statistical test is valid independently of both. This is in sharp contrast to previous statistical studies that derived or constrained the properties of the BH mergers under specific assumptions on the mass distribution.