1 Introduction

Goodness-of-fit (GoF) testing is one of the standard tasks in statistics. The testing procedure can be stated in the one-sample or two-sample setting. In the one-sample problem, we observe a sample of m independent realizations \(\{x_1, \ldots , x_m \}\) of a d-dimensional random vector X with an unknown distribution function G, i.e. \(x_i \sim G\). The task is to test whether G is equal to a specific distribution F, i.e. we would like to test

$$\begin{aligned} H_0: G = F\quad \text {vs.}\quad H_1: G \ne F. \end{aligned}$$
(1)

In the setting of the two-sample problem we are given two independent samples consisting of m and n (\(m \ne n\) in general) independent realizations of d-dimensional random vectors X and Y with unknown distribution functions F and G, respectively. This means \(X=\{x_1, \ldots , x_m\}, x_i \sim F\) and \(Y=\{y_1,\ldots ,y_n\}, y_j \sim G\), while the hypothesis is the same as in (1).

In this paper, we consider a more general notion of equivalence, replacing the equal sign above by the relation of being Euler equivalent (cf. Definition 2.1).

We are interested in the setting in which the underlying distribution is continuous. In this case, prominent GoF tests for samples from \(\mathbb {R}\) rely on the empirical distribution function, see (D’Agostino and Stephens 1986, chapter 4). These include, in the one-dimensional case, the Kolmogorov–Smirnov, Cramér–von Mises and Anderson–Darling tests. In higher dimensions, the Kolmogorov–Smirnov approach leads to the tests of Fasano and Franceschini (1987) and Peacock (1983); a general case was considered by Justel et al. (1997). A multivariate version of Cramér–von Mises was proposed by Chiu and Liu (2009). Since those tests are based on the empirical distribution function, their generalization to \(\mathbb {R}^d\) for \(d \ge 2\) is conceptually and computationally difficult. Moreover, we are not aware of an efficient implementation of a general goodness-of-fit test for high-dimensional samples.

To tackle this challenge we propose to replace the cumulative distribution function with Euler characteristic curves (ECCs) (Gonzalez and Wintz 1977; Richardson and Werman 2014; Worsley 1996), a tool from computational topology that provides a signature of the considered sample. To a given sample X, this notion associates a function \(\chi (X):[0,\infty ) \rightarrow \mathbb {Z}\), which can serve as a stand-in for the empirical distribution function in arbitrary dimensions. Subsequently, for one-sample tests, inspired by the Kolmogorov–Smirnov test, we define the test statistic to be the supremum distance between the ECC of the sample and the expected ECC for the distribution. This topologically driven testing scheme will be referred to as “TopoTest” for short.

The key characteristic of any goodness-of-fit test is its power: the type II error should be small under the requirement that the type I error is fixed at level \(\alpha \). We show that the proposed test satisfies this condition and that it performs very well in practical cases. In particular, even restricted to one-dimensional samples, its power is comparable to that of the standard GoF tests.

The paper is organized as follows: Sect. 1.1 reviews the necessary background from topology as well as the current work on the topic. In Sect. 2 we present the theoretical justification of our method. In Sect. 3 the algorithms implementing the proposed GoF tests are detailed. Sections 4 and 5 present the numerical experiments and a comparison of the presented technique to existing methods. In particular, compared to a higher-dimensional version of the Kolmogorov–Smirnov test, we find that our procedure provides better power and takes less time to compute. Finally, in Sect. 7 the conclusions are drawn.

1.1 Background

Since the seminal work of Edelsbrunner (2002) and Zomorodian and Carlsson (2005), Topological Data Analysis (TDA) is a fast-growing interdisciplinary area combining tools and results of such diverse fields of science as algebra, topology, statistics and machine learning, just to name a few. For a survey from a statistician’s perspective, see Wasserman (2018). One of the areas in which TDA can contribute to statistics is the application of topological summaries of the data to hypothesis testing. Despite ongoing research and growing interest in TDA methods, attempts to construct statistical tests within the classical Neyman–Pearson hypothesis testing paradigm based on persistent homology, the most popular topological summary of data, are limited because the distributions of test statistics under the null hypothesis are unknown. Therefore, the approaches that are most common in the literature utilize sampling and permutation based techniques (Cericola et al. 2016; Robinson and Turner 2017; Vejdemo-Johansson and Mukherjee 2022). In this work, a different topological summary of the data, namely the Euler characteristic curve (ECC), is used to construct one-sample and two-sample statistical tests. The application of ECCs is motivated by recent theoretical findings regarding the asymptotic distribution of the ECC, which enable us to construct tests in a rigorous fashion. Since the finite-sample distributions of ECCs remain unknown, extensive Monte Carlo simulations were conducted to investigate the properties and performance of the proposed tests.

1.1.1 Tools from computational topology

To start with an example, let us consider the set X of nine points in \(\mathbb {R}^2\) (Fig. 1a). The most elementary way of assigning a numeric quantity to them is to simply count them. This is a topological invariant, the number of connected components. Now if two points coincide, they should not be regarded as separate. If they are very close together, say less than some given \(\varepsilon >0\) apart, we can also connect them. So let us draw an edge between them (Fig. 1b). The number of connected components is now one less, suggesting we should subtract the number of edges from the number of points. In order to formalize what we mean by points that are close to each other, we introduce a scale parameter \(r\in \mathbb {R}^{\ge 0}\). Then we draw edges between pairs of points whose distance is at most r. Letting \(r=0\) initially and increasing it, we draw more and more edges, thereby reducing the number of connected components (Fig. 1c). Once three points are within distance r of each other, according to our intuition they should be considered as one connected component. But we have three points and three edges, which yield a difference of zero. To correct this mismatch with our intuition, we add the number of triangles (Fig. 1d). This procedure continues to higher dimensions: Once k points are within distance r of each other, we add \((-1)^{k-1}\).

Fig. 1
figure 1

With increasing scale parameter, we draw in edges and triangles. We keep track of the number of components, which is here \(\#\text {points} - \#\text {edges} + \#\text {triangles}\)

Fig. 2
figure 2

We consider three different constructions of filtered simplicial complexes with a fixed sample as vertex set

These ideas will now be formalized. For a textbook reference on these topics, we refer the reader to Edelsbrunner and Harer (2010).

Definition 1.1

An abstract simplicial complex K is a collection of nonempty sets which are closed under the subset operation:

$$\begin{aligned} \tau \in K\;\text {and}\; \sigma \subseteq \tau \Rightarrow \sigma \in K. \end{aligned}$$

The elements of K are called simplices. If \(\sigma \subsetneq \tau \in K\), we say that \(\sigma \) is a face of \(\tau \). The dimension of a simplex \(\sigma \in K\) is \(\dim (\sigma ) = \vert \sigma \vert -1\), where \(\vert \cdot \vert \) denotes the cardinality of a set. The dimension of K is the maximal dimension of any of its simplices.

The construction of drawing edges, triangles etc. between points which are close to each other can be formalized in slightly different flavours. Perhaps the simplest is the Vietoris–Rips construction:

Definition 1.2

For a finite subset \(X\subseteq \mathbb {R}^d\) and \(r\ge 0\) define the Vietoris–Rips complex at scale r to be the abstract simplicial complex

$$\begin{aligned} \mathcal {R}_r(X) = \left\{ \sigma \subseteq X :\text {diam}(\sigma )\le 2r \right\} \end{aligned}$$

where \(\text {diam}(\sigma ) = \max \{d(x,x'):x,x' \in \sigma \}\) is the diameter of the simplex \(\sigma \).
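To make the construction concrete, the definition can be turned into a few lines of code. The following naive sketch (assuming numpy; the helper name vietoris_rips and the max_dim cutoff are ours, introduced purely for illustration) enumerates candidate simplices directly and is exponential in the sample size, so it only mirrors Definition 1.2 and is not meant for real computations.

```python
# Naive Vietoris–Rips construction: all subsets of X of size at most
# max_dim + 1 whose diameter is at most 2r (cf. Definition 1.2).
from itertools import combinations
import numpy as np

def vietoris_rips(X, r, max_dim=2):
    X = np.asarray(X)
    n = len(X)
    # Pairwise distance matrix D[i, j] = ||x_i - x_j||.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = []
    for k in range(1, max_dim + 2):           # simplices with k vertices
        for idx in combinations(range(n), k):
            diam = max((D[i, j] for i, j in combinations(idx, 2)), default=0.0)
            if diam <= 2 * r:
                K.append(frozenset(idx))
    return K
```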

A closely related notion is the Čech complex:

Definition 1.3

For a finite subset \(X\subseteq \mathbb {R}^d\) and \(r\ge 0\) define the Čech complex at scale r to be the abstract simplicial complex

$$\begin{aligned} \mathcal {C}_r(X) = \left\{ \sigma \subseteq X :\bigcap _{x\in \sigma } B_r(x) \ne \emptyset \right\} , \end{aligned}$$

where \(B_r(x)\) is the closed ball of radius r centered at x.

Finally, the Alpha complex, which is the most useful in practice and the one used in our implementation, requires the following notion from computational geometry:

Definition 1.4

Let \(X\subseteq \mathbb {R}^d\) be a finite set. The Voronoi cell of \(x\in X\) is the subset of points in \(\mathbb {R}^d\) that have x as a closest point in X,

$$\begin{aligned} V_X(x) = \{y\in \mathbb {R}^d :\forall x' \in X \Vert y-x\Vert \le \Vert y-x'\Vert \}. \end{aligned}$$

Definition 1.5

For a finite subset \(X\subseteq \mathbb {R}^d\) and \(r\ge 0\) define the Alpha complex at scale r to be the abstract simplicial complex

$$\begin{aligned} \mathcal {A}_r(X) = \left\{ \sigma \subseteq X :\bigcap _{x\in \sigma } B_r(x)\cap V_X(x) \ne \emptyset \right\} . \end{aligned}$$

For illustrations of the Alpha, Čech and Vietoris–Rips complex on a small sample, consider Fig. 2a–c, respectively. We refer to r as the scale parameter or the filtration value. The latter name comes from the fact that for \(r<r'\), the complex at scale r is a subcomplex of the one at scale \(r'\).

The main advantage of the Alpha complex is its small size in low dimensions (de Berg et al. 2008); namely, the size of the Alpha complex on a random sample scales exponentially with the dimension of the sample and linearly with the sample size, see Edelsbrunner et al. (2017) for a further discussion. This is acceptable for low dimensions, but impractical for higher ones. The Vietoris–Rips complex does not scale with the dimension, but it scales exponentially with the sample size. For small samples in high dimensions, this construction should be preferred.

Counting the simplices with a sign yields the Euler characteristic, a fundamental topological invariant.

Definition 1.6

Let K be a finite abstract simplicial complex. Its Euler characteristic is

$$\begin{aligned} \chi (K) = \sum \limits _{\sigma \in K}(-1)^{\dim (\sigma )}. \end{aligned}$$
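This alternating count is immediate to implement; a minimal sketch (the helper name euler_characteristic is ours), with a complex represented as a collection of vertex sets:

```python
# Euler characteristic of Definition 1.6: sum of (-1)**dim(sigma),
# where dim(sigma) = |sigma| - 1.
def euler_characteristic(K):
    return sum((-1) ** (len(sigma) - 1) for sigma in K)

# Filled triangle: 3 vertices - 3 edges + 1 triangle = 1.
triangle = [frozenset(s) for s in
            [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]]
assert euler_characteristic(triangle) == 1
```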

In the following we use the Čech construction in the theoretical part. Due to its sparse nature the Alpha construction is used in the implementation. They are topologically equivalent by the nerve lemma (Edelsbrunner and Harer 2010, III.2), hence they give the same ECC.

It should be noted that, for a given sample X, the Euler characteristics of its Vietoris–Rips complex, \(\chi (\mathcal {R}_r(X))\), may be different from \(\chi (\mathcal {A}_r(X))\) and \(\chi (\mathcal {C}_r(X))\). An example can be found in the sample presented in Fig. 2c in which the 2-simplex (triangle) on the left is filled in the Vietoris–Rips complex, but empty for the Čech and Alpha complex.

Keeping track of how the Euler characteristic changes with the scale parameter yields the main tool of our interest:

Definition 1.7

Given a finite subset \(X\subseteq \mathbb {R}^d\), define its Euler characteristic curve (ECC) as

$$\begin{aligned} \chi (X):[0,\infty ) \rightarrow \mathbb {Z}, \; r \mapsto \chi (\mathcal {A}_r(X)). \end{aligned}$$

The ECC of the sample from Fig. 1a is displayed in Fig. 3.

Fig. 3
figure 3

The ECC of the sample from Fig. 1a. The filtration values a–d correspond to the complexes in Fig. 1a–d

First applications of the ECC date back to the work of Worsley on astrophysics and medical imaging (Worsley 1996).
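In practice, the ECC is straightforward to compute with standard TDA software. The sketch below assumes the GUDHI library (https://gudhi.inria.fr); the helper name ecc and the evaluation grid are our own illustrative choices, not the paper's released implementation. Note that gudhi.AlphaComplex stores squared radii as filtration values, so scale r corresponds to the filtration value \(r^2\).

```python
# ECC of Definition 1.7, evaluated on a grid of radii, via the Alpha
# complex (a sketch assuming the GUDHI library).
import numpy as np
import gudhi

def ecc(points, r_grid):
    st = gudhi.AlphaComplex(points=points).create_simplex_tree()
    simplices = list(st.get_filtration())          # (simplex, alpha^2) pairs
    filts = np.array([f for _, f in simplices])
    signs = np.array([(-1) ** (len(s) - 1) for s, _ in simplices])
    # chi at scale r counts simplices born up to squared radius r**2.
    return np.array([signs[filts <= r * r].sum() for r in r_grid])

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))                     # 100 points in [0, 1]^2
curve = ecc(X, np.linspace(0.0, 0.3, 60))
```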

1.1.2 Topology of random geometric complexes

In the considered setting, the vertex set from which we build simplicial complexes is sampled from some unknown distribution. The literature distinguishes two approaches, Poisson and Bernoulli sampling; see Bobrowski and Kahle (2018) for a survey. In the first setting, the samples are assumed to be generated by a spatial Poisson process. We focus on the Bernoulli sampling scheme in this paper. This means that we consider samples of n points sampled i.i.d. from some d-dimensional distribution. Furthermore, there are three regimes to be considered when the sample size goes to infinity (Penrose 2003, Section 1.4). We consider the geometric complex at scale \(r_n\) for a sequence \(r_n\rightarrow 0\), whose topology is determined by whether

$$\begin{aligned} n\cdot r_n^d \rightarrow {\left\{ \begin{array}{ll} \infty ,\\ \lambda \in (0,\infty ) \text { constant},\\ 0. \end{array}\right. } \end{aligned}$$

In the supercritical regime, \(n \cdot r_n^d \rightarrow \infty \), the domain gets densely sampled and the geometric complex is highly connected. Intuitively, this regime maintains only global topological information and forgets about local density. In the subcritical regime, \(n \cdot r_n^d \rightarrow 0\), the domain gets sparsely sampled and the geometric complex is, informally speaking, disconnected (consult Bobrowski and Kahle 2018 for details). In this paper, we focus on the thermodynamic regime, i.e. we keep the quantity \(n \cdot r_n^d = \lambda \) constant. Up to a constant factor, the quantity \(n\cdot r_n^d\) is the average number of points in a ball of radius \(r_n\) (Bobrowski and Kahle 2018, Section 1). This value neither goes to zero nor to infinity as \(n\rightarrow \infty \) in the thermodynamic regime, leading to complex topology; see for instance (Penrose 2003, Chapter 9). Now it is straightforward to observe that a subset of our sample \(\sigma \subseteq X\) forms a simplex in the Čech complex at scale \(r_n\) iff

$$\begin{aligned} \bigcap _{x\in \sigma } B_{r_n}(x) \ne \emptyset \Leftrightarrow \bigcap _{x\in n^{1/d}\sigma } B_{\lambda ^{1/d}}(x) \ne \emptyset . \end{aligned}$$

This is because for any \(x\in X, x'\in \mathbb {R}^d\), we have

$$\begin{aligned} \Vert x'-x\Vert \le r_n&\Leftrightarrow n^{1/d}\Vert x'-x\Vert \le n^{1/d}r_n \\&\Leftrightarrow \Vert n^{1/d}x'-n^{1/d}x\Vert \le \lambda ^{1/d}. \end{aligned}$$

This observation motivates us to scale a sample of size n by \(n^{1/d}\). In fact, this setup aligns with the approach of Krebs et al. (2021). Due to this scaling, the average number of points in a ball of radius \(r=\lambda ^{1/d}\) stays the same as we increase \(n\rightarrow \infty \). Therefore, it makes sense to compare ECCs at a fixed radius \(r=\lambda ^{1/d}\) for samples of different sizes. Visually speaking, we can compare (expected) ECCs from samples of different sizes in a common coordinate system using the r-axis scaled in this way. In particular, one can study the point-wise limit of the expected ECC; that is, when the sample size approaches infinity for a fixed r. Moreover, this rescaling allows us to conduct two-sample tests with samples of different sizes, cf. Sect. 2.2.
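As a sketch, the rescaling amounts to multiplying the sample by \(n^{1/d}\) before computing its ECC; dividing the curve by n then makes samples of different sizes directly comparable on a common r-axis. The helper below builds on the hypothetical ecc function from the previous listing.

```python
# Thermodynamic-regime rescaling: a sample of size n in dimension d is
# scaled by n**(1/d) (a sketch; `ecc` is the hypothetical helper above).
import numpy as np

def rescaled_ecc(points, r_grid):
    n, d = points.shape
    return ecc(n ** (1.0 / d) * points, r_grid)

rng = np.random.default_rng(1)
r_grid = np.linspace(0.0, 2.0, 100)
# Per-point (normalized) ECCs of samples of different sizes roughly overlap.
c1 = rescaled_ecc(rng.uniform(size=(100, 2)), r_grid) / 100
c2 = rescaled_ecc(rng.uniform(size=(400, 2)), r_grid) / 400
```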

1.2 Previous work

Let us briefly review some related work on the intersection of topology and statistics. The most popular tool of TDA is persistent homology. Its key property is stability (Cohen-Steiner et al. 2007); informally speaking, a small perturbation of the input yields a small change in the output. However, persistent homology is a complicated setting for statistics; for example, there are no unique means (Turner et al. 2014).

For a survey on the topology of random geometric complexes see Bobrowski and Kahle (2018). A textbook for the case of one-dimensional complexes, i.e. graphs, is Penrose (2003). The Euler characteristic of random geometric complexes has been studied in Bobrowski and Adler (2014) and Bobrowski and Mukherjee (2013). Notably, in Bobrowski and Mukherjee (2013), the limiting ECC in the thermodynamic regime is computed for the uniform distribution on \([0,1]^3\). More recently, Thomas and Owada (2021) provided a functional central limit theorem for ECCs, which was subsequently generalized by Krebs et al. (2021). The Euler characteristic has been studied in the context of random fields by Adler and Taylor (Adler and Taylor 2007b). Adler suggested to use it for model selection purposes and normality testing (Adler 2008, Section 7). Building on this work, such a normality test has been extensively studied in Bernardino et al. (2017). Using topological summaries for statistical testing has moreover been suggested by Cipriani et al. (2022) for persistence vineyards, Biscio et al. (2020) for persistent Betti numbers and Botnan and Hirsch (2021) for multiparameter persistent Betti numbers. Vejdemo-Johansson and Mukherjee (2022) describe a framework for multiple hypothesis testing for persistent homology. Very recently, Vishwanath et al. (2022) provided criteria to check the injectivity of topological summary statistics including ECCs.

1.3 Our contributions

In this paper, we present, to the best of our knowledge, the first mathematically rigorous approach that uses Euler characteristic curves to perform general goodness-of-fit testing. Our procedure is theoretically justified by Theorem 2.4. The concentration inequality for Gaussian processes (Lemma 2.2) might be of independent interest.

Simulations conducted in Sects. 4 and 5 indicate that TopoTest outperforms the Kolmogorov–Smirnov test, which we used as a baseline, in arbitrary dimension, both in terms of test power and in terms of computational time for moderate sample sizes and dimensions.

The implementation of TopoTest is publicly available at https://github.com/dioscuri-tda/topotests.

2 Method

2.1 One-sample test

While topological descriptors are computable and have a strong theory underlying them, they are not complete invariants of the underlying distributions, as recently pointed out in Vishwanath et al. (2022). Hence the statement of the null hypothesis and the alternative require some care.

Definition 2.1

We say two distributions FG are Euler equivalent, denoted \(F \overset{{\chi }}{=}G\), if \(\chi _F(t) \overset{D}{=}\ \chi _G(t)\) for all \(t>0\).

For instance, if G arises from F via translations, rotations or reflections, \(F \overset{{\chi }}{=}G\). For a more interesting instance of Euler equivalent distributions, see Example 3.1 below.

We aim to solve the following: Given a fixed null distribution F and a sample X following an unknown distribution G, we test

$$\begin{aligned} H_0: G \overset{{\chi }}{=}F\quad \text {vs.}\quad H_1: G \overset{{\chi }}{\ne }F. \end{aligned}$$
(2)

Compare this formulation to the problem stated in (1). As the ECCs of the Alpha and Čech complexes are equal, we will use them interchangeably. We write

$$\begin{aligned} \chi (n,r) = \chi ( \mathcal {C}_r(X)), \end{aligned}$$

where n is the cardinality of X. Given some distribution F on \(\mathbb {R}^d\) against which we want to test, we are interested in the expected ECC of the Čech complex of scale r of n i.i.d. points drawn according to F, denoted as \(\mathbb {E}_F(\chi (n,r))\). The TopoTest employs the supremum distance between the ECC computed based on sample points, \(\chi (\mathcal {C}_r(X))\), and the expected ECC, \(\mathbb {E}_F(\chi (n,r))\), under \(H_0\), i.e. the test statistic is

$$\begin{aligned} \Delta _n:= n^{-1/2}\sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r))|, \end{aligned}$$
(3)

where \(T \in \mathbb {R}^+\). Therefore, by using the ECC as a topological summary of the dataset, we reduce the initial d-dimensional problem to a one-dimensional setting. If \(\Delta _n\) defined in (3) is large enough, the null hypothesis is rejected, while for small values of \(\Delta _n\) the test fails to reject \(H_0\). More precisely: given the significance level \(\alpha \) we consider a rejection region \(R_\alpha = [t_\alpha , \infty )\) such that

$$\begin{aligned}&\mathbb {P}(\Delta _n \in R_\alpha \vert H_0) \nonumber \\&\quad = \mathbb {P}\left( \left. n^{-1/2}\sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r))| >t_\alpha \right| H_0 \right) \nonumber \\&\quad =\alpha . \end{aligned}$$
(4)

The threshold value \(t_\alpha \) depends on the significance level \(\alpha \) and F (and hence also on dimension d), however the dependence on F is dropped in the notation. We prove that this test is consistent below in Sect. 2.3.

Remark

The test statistic (3) is based on the difference between the sample ECC and the ECC expected under \(H_0\). A natural, yet still open, question arises: how likely is it that two isometry-nonequivalent distributions are Euler equivalent and hence indistinguishable by the test statistic (3)? In a naive search, in which we considered over 1000 different univariate probability distributions defined on \(\mathbb {R}_+\), we could not find any such example. Therefore we believe that Euler equivalence is not a practical limitation of our method.

2.2 Two-sample test

A test statistic based on the Euler characteristic curve can also be adapted to the two-sample problem. Given two samples \(X, Y \subset \mathbb {R}^d\) of possibly different sizes, following unknown distributions \(X\sim F\) and \(Y\sim G\), we are testing the null hypothesis \(H_0:G \overset{{\chi }}{=}F\). The test statistic in this setting is the supremum distance between the normalized ECCs

$$\begin{aligned} \Delta (\chi (X), \chi (Y)) = \sup \limits _{r \in [0, T]} \left| \frac{1}{|X|} \chi (\mathcal {A}_r(X)) - \frac{1}{|Y|} \chi (\mathcal {A}_r(Y)) \right| . \end{aligned}$$

Moreover, recall that we rescale the samples to have a fixed average number of points in a ball of radius r, independently of the sample size. Since the null distribution is unknown, we fall back on a permutation test (Arias-Castro 2022, Section 16.3) to compute the p-value, see Algorithm 2 for the details.

As for any permutation test, the procedure is computationally expensive as it requires computing ECCs for a variety of point sets resampled from the union of the two input datasets. The application of this approach is therefore limited to rather small sizes of input data sets. See Sect. 5 for results of a simulation study in which the performance of this approach is compared with the two-sample Kolmogorov–Smirnov test.

2.3 Power of the one-sample test

2.3.1 Overview

The TopoTest relies on the Functional Central Limit Theorem of Krebs et al. (2021, Theorem 3.4), hence it works under the following, rather technical, assumption:

Assumption 1

The null distribution has compact convex support inside \([0,1]^d\). It admits a bounded density \(\kappa \) that can be uniformly approximated by blocked functions \(\kappa _n\).

Recall from (Krebs et al. 2021, equation 3.8), that the approximation by blocked functions means \(\lim _{n\rightarrow \infty } \Vert \kappa -\kappa _n\Vert = 0\), where each \(\kappa _n\) is constant on grid elements of a partition of the unit hypercube \([0,1]^d\) into an equidistant grid of \(m^d\) subcubes. In particular, bounded measurable functions satisfy this assumption.

We will show, for a fixed significance level \(\alpha \), that the mean of the test statistic \(\Delta _n\) does not grow with n under the null hypothesis, while it grows at least like \(\sqrt{n}\) under the alternative hypothesis. Moreover, in both cases \(\Delta _n\) is concentrated around its mean, allowing us to control the type II error of the TopoTest.

2.3.2 Case \(H_0\) true

By Thomas and Owada (2021) and Krebs et al. (2021, Theorem 3.4), we have convergence in distribution, in the Skorokhod \(J_1\)-topology, of the normalized ECC process to a centered Gaussian process \(f_r\),

$$\begin{aligned} n^{-1/2} \left( \chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r)) \right) \xrightarrow [n \rightarrow \infty ]{D} f_r. \end{aligned}$$
(5)

Here it is assumed that the sample is drawn from a distribution satisfying Assumption 1 and scaled by \(n^{1/d}\). Let us denote

$$\begin{aligned} Z_T = \sup \limits _{r \in [0, T]} |f_r|. \end{aligned}$$

In the following we will approximate the finite-sample distribution of \(n^{-1/2} (\chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r)))\) by the limiting Gaussian process \(f_r\). Therefore, for sufficiently large n we assume that

$$\begin{aligned} \Delta _n {\mathop {=}\limits ^{D}} Z_T. \end{aligned}$$
(6)

The quality of this approximation was studied numerically; see Fig. 4.

For \(Z_T\) we have the Borell–TIS inequality (Adler and Taylor 2007a, Section 2.1),

$$\begin{aligned} \mathbb {P}\left( Z_T> t \right)&= \mathbb {P}\left( \sup \limits _{r \in [0, T]} |f_r| > t\right) \nonumber \\&\le \exp \left( -\left[ t-\mathbb {E}\left( \sup \limits _{r \in [0, T]} \vert f_r\vert \right) \right] ^2 / 2\sigma _T^2\right) , \end{aligned}$$
(7)

where \(\sigma _T^2 = \sup \limits _{r \in [0, T]} \mathbb {E}(f_r^2)\).

Fig. 4
figure 4

Numerical inspection of the quality of the finite-sample approximation (6). The empirical distribution of \(Z_T\) converges with increasing sample size. Even in the three-dimensional case, the distribution obtained for \(n=100\) is a reasonable approximation of the large-sample empirical distribution. An inset in each plot shows the left- and right-hand sides of inequality (7), which provides another justification for approximation (6)

Therefore, for n large enough,

$$\begin{aligned}&\mathbb {P}\left( \Delta _n> t |H_0\right) \\&\quad = \mathbb {P}\left( \sup \limits _{r \in [0, T]} \left| \frac{ \chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r))}{\sqrt{n}}\right| >t\right) \\ \nonumber&\quad \le \exp \left( -\left[ t-\mathbb {E}\left( \sup \limits _{r \in [0, T]} \frac{1}{\sqrt{n}} | \chi (\mathcal {C}_r(X)) \right. \right. \right. \\&\qquad \left. \left. \left. - \mathbb {E}_F(\chi (n,r))|\right) \right] ^2 / 2\sigma _T^2\right) . \end{aligned}$$

Plugging in (4) yields

$$\begin{aligned} \alpha&\le \exp \left( -\left[ t_\alpha -\mathbb {E}\left( \sup \limits _{r \in [0, T]} \frac{1}{\sqrt{n}}| \chi (\mathcal {C}_r(X)) \right. \right. \right. \\&\quad \left. \left. \left. - \mathbb {E}_F(\chi (n,r))|\right) \right] ^2 / 2\sigma _T^2\right) \end{aligned}$$

which leads to

$$\begin{aligned} t_\alpha&\le \sqrt{-2 \sigma _T^2 \ln \alpha } \nonumber \\&\quad + \mathbb {E}\left( \sup \limits _{r \in [0, T]} \frac{1}{\sqrt{n}} | \chi (\mathcal {C}_r(X)) - \mathbb {E}_F(\chi (n,r))|\right) , \end{aligned}$$
(8)

i.e. \(t_\alpha =O(1)\).

2.3.3 Case \(H_0\) false

Now let us study the asymptotic size of

$$\begin{aligned} \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))| \end{aligned}$$

as \(n \rightarrow \infty \) when \(G \overset{{\chi }}{\ne }F\).

We have

$$\begin{aligned}&\mathbb {E}\left( \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))|\right) \\&\quad \ge \sup \limits _{r \in [0, T]} \mathbb {E}\left| \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))\right| \\&\quad \ge \sup \limits _{r \in [0, T]} | \mathbb {E}_G(\chi (n,r)) - \mathbb {E}_F(\chi (n,r))|. \end{aligned}$$

Because the limiting distributions of the ECCs are different under the alternative hypothesis, this last expression diverges. Due to Bobrowski and Mukherjee (2013), Corollary 4.5, \(\mathbb {E}_F(\chi (n,r)) \sim n\) with a constant depending on F and d. In our setting, we obtain

$$\begin{aligned} \mathbb {E}\left( \sup \limits _{r \in [0, T]} n^{-1/2} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))|\right) = \Omega (\sqrt{n}). \end{aligned}$$
(9)

To complete the discussion, it is required to show that in the case of \(H_0\) false, one also has a concentration around the mean, i.e. one needs to control

$$\begin{aligned} C_{F,G}(t)&= \mathbb {P}\left( n^{-1/2}\left| \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))|\right. \right. \nonumber \\&\quad \left. \left. - \mathbb {E}\left( \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))|\right) \right| > t\right) . \end{aligned}$$
(10)

The lemma below provides a generalization of the Borell–TIS inequality to the case of a non-centred Gaussian process.

Lemma 2.2

Let \(f_r\) be a centred Gaussian process and g(r) some deterministic function. We have

$$\begin{aligned}&\mathbb {P}\left( \left| \sup _{r \in [0,T]} |f_r + g(r)| - \mathbb {E}\left( \sup _{r\in [0,T]} |f_r+g(r)|\right) \right| > t \right) \nonumber \\&\quad \le 2e^{-t^2/2\sigma ^2}, \end{aligned}$$
(11)

where \(\sigma = \sup _{r\in [0,T]} \left( \mathbb {E}[f_r^2] \right) ^{1/2}\).

Proof

We follow the strategy of Ledoux (2005, Section 7.1). Argument (2.35) in Ledoux (2005) yields that if \(\gamma \) is the standard Gaussian measure on \(\mathbb {R}^n\), then for every 1-Lipschitz function F on \(\mathbb {R}^n\) and \(t\ge 0\) we have

$$\begin{aligned} \gamma \left( \left\{ F \ge \int F d\gamma + t \right\} \right) \le e^{-t^2/2}. \end{aligned}$$
(12)

Let \(r_1,\ldots ,r_n\) be fixed in [0, T] and consider the centered Gaussian random vector \((f_{r_1}, \ldots , f_{r_n})\) in \(\mathbb {R}^n\) with covariance matrix \(\Gamma = B^TB\). Consequently, the law of \((f_{r_1}, \ldots , f_{r_n})\) is the same as the law of \(B\mathcal {N}\), where \(\mathcal {N}=(N_1, \ldots , N_n)^T\) is distributed according to the standard Gaussian measure \(\gamma \) on \(\mathbb {R}^n\). Let \(F: \mathbb {R}^n \rightarrow \mathbb {R}\) be defined as

$$\begin{aligned} F(x) = \max _{1 \le i \le n}\left| (Bx)_i + g(r_i) \right| , x\in \mathbb {R}^n. \end{aligned}$$

Although we have a different F in our setting than in Ledoux (2005), we can still bound the Lipschitz norm of F by the operator norm of \(B:(\mathbb {R}^n, \Vert \cdot \Vert _2) \rightarrow (\mathbb {R}^n,\Vert \cdot \Vert _\infty ) \). Indeed, consider any \(c>0\) such that \(\Vert Bx\Vert _{\infty }\le c \Vert x\Vert _2\) for all \(x\ne 0\). Using the triangle inequality, we estimate that for any \(x\ne y \in \mathbb {R}^n\),

$$\begin{aligned} \vert F(x) - F(y)\vert&= \left| \max \limits _{1\le i\le n} |(Bx)_i+g(r_i)|\right. \\&\quad \left. - \max \limits _{1\le i\le n} |(By)_i+g(r_i)|\right| \\&\le \max \limits _{1\le i\le n} \left| (Bx)_i+g(r_i) - (By)_i-g(r_i)\right| \\&= \max \limits _{1\le i\le n} \left| (B(x-y))_i\right| \\&\le c \Vert x-y\Vert _2. \end{aligned}$$

Notice that \(f_{r_i}=\sum _{j=1}^n B_{ij}N_j\) and by independence of \(\{N_j\}_{1\le j\le n}\) we have \(\mathbb {E}f_{r_i}^2 = \sum _{j=1}^n B_{ij}^2\). This allows us to bound the operator norm of B as follows:

$$\begin{aligned} \Vert B\Vert _{op}&= \max _{1 \le i \le n} \left( \sum _{j=1}^n B_{ij}^2 \right) ^{1/2} = \max _{1 \le i \le n}\left( \mathbb {E}(f_{r_i}^2)\right) ^{1/2}\\&\le \sup _{r\in [0,T]} \left( \mathbb {E}(f_{r}^2) \right) ^{1/2} = \sigma . \end{aligned}$$

Consequently, \(F/\sigma \) is 1-Lipschitz and by (12) we have

$$\begin{aligned} \mathbb {P}\left( \frac{1}{\sigma }F(\mathcal {N}) - \mathbb {E}\left[ \frac{1}{\sigma }F(\mathcal {N})\right] \ge \tilde{t} \right) \le e^{-\tilde{t}^2/2}. \end{aligned}$$

Letting \(t=\sigma \tilde{t}\) and using a symmetry argument, we obtain

$$\begin{aligned} \mathbb {P}\left( |F(\mathcal {N}) - \mathbb {E}(F(\mathcal {N}))| \ge t \right) \le 2e^{-t^2/2\sigma ^2} \end{aligned}$$

and

$$\begin{aligned}&\mathbb {P}\left( \left| \sup _{1\le i \le n}|f_{r_i} + g(r_i)| - \mathbb {E}\left( \sup _{1 \le i \le n} |f_{r_i}+g(r_i)|\right) \right| \ge t \right) \\&\quad \le 2e^{-t^2/2\sigma ^2}. \end{aligned}$$

The right-hand side does not depend on the choice of \(r_1, \ldots , r_n\); hence, letting \(n \rightarrow \infty \) over a dense set of points in [0, T], inequality (11) is obtained. \(\square \)

Using Lemma 2.2 we obtain the following theorem.

Theorem 2.3

The concentration around the mean, \(C_{F,G}(t)\) defined in (10), is exponentially bounded:

$$\begin{aligned} C_{F,G}(t) \le 2e^{-t^2/2\sigma _{G}^{2}}. \end{aligned}$$
(13)

Proof

Subtracting and adding \(\mathbb {E}_G(\chi (n,r))\) in (10) yields

$$\begin{aligned}&C_{F,G}(t) \\&\quad = \mathbb {P}\left( n^{-1/2}\left| \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))| \right. \right. \\&\qquad \left. \left. - \mathbb {E}\left( \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))|\right) \right|> t\right) \\&\quad = \mathbb {P}\left( n^{-1/2}\left| \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_G(\chi (n,r))\right. \right. \\&\qquad \left. \left. + \mathbb {E}_G(\chi (n,r)) - \mathbb {E}_F(\chi (n,r))| \right. \right. \\&\qquad \left. \left. - \mathbb {E}\left( \sup \limits _{r \in [0, T]} | \chi (\mathcal {C}_r(Y)) - \mathbb {E}_G(\chi (n,r)) \right. \right. \right. \\&\qquad \left. \left. \left. + \mathbb {E}_G(\chi (n,r))- \mathbb {E}_F(\chi (n,r))|\right) \right|> t\right) \\&\quad = \mathbb {P}\left( \left| \sup \limits _{r \in [0, T]} |g_r + h(r)| \right. \right. \\&\qquad \left. \left. - \mathbb {E}\left( \sup \limits _{r \in [0, T]}|g_r+h(r)|\right) \right| > t\right) , \end{aligned}$$

where the notation

$$\begin{aligned} g_r&= \left( \chi (\mathcal {C}_r(Y)) - \mathbb {E}_G(\chi (n,r))\right) /\sqrt{n},\\ h(r)&= \left( \mathbb {E}_G(\chi (n,r)) - \mathbb {E}_F(\chi (n,r))\right) /\sqrt{n} \end{aligned}$$

was introduced. Note that by (5), applied to the distribution G, \(g_r\) converges to a centred Gaussian process, whereas h(r) is a deterministic function. Let \(\sigma _G^2 = \sup \limits _{r \in [0, T]} \mathbb {E}(g_r^2)\). Therefore, using the same approximation as in (6), Lemma 2.2 yields the bound (13). \(\square \)

The rate of type I error is controlled by the significance level \(\alpha \). An asymptotic upper bound for type II error is given by the following theorem.

Fig. 5
figure 5

The area of shaded blue region is the probability of a type II error occurring. As \(n \rightarrow \infty \), it goes to zero. (Color figure online)

Theorem 2.4

For fixed \(\alpha \), the probability of a type II error goes to 0 exponentially as \(n \rightarrow \infty \).

Proof

We will use the threshold \(t_\alpha \) defined in (4) and the concentration inequality of Theorem 2.3. The idea is illustrated in Fig. 5. Introduce

$$\begin{aligned} t_{\alpha ,n}^{*} = \mathbb {E}\left( \sup \limits _{r \in [0, T]}n^{-1/2}\left| \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))\right| \right) - t_\alpha . \end{aligned}$$

Due to Eq. (9), the first term above is \(\Omega (\sqrt{n})\) while the second term is O(1); therefore \(t_{\alpha ,n}^{*} = \Omega (\sqrt{n})\) and it is positive for sufficiently large n. Hence we can estimate

$$\begin{aligned}&\mathbb {P}(\text {type II error}) \\&\quad \le \mathbb {P}\left( \sup \limits _{r \in [0, T]}n^{-1/2}\left| \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))\right| < t_\alpha \right) \\&\quad \le \mathbb {P}\left[ n^{-1/2}\left| \mathbb {E}\left( \sup \limits _{r \in [0, T]}\left| \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))\right| \right) \right. \right. \\&\qquad \left. \left. - \sup \limits _{r \in [0, T]}\left| \chi (\mathcal {C}_r(Y)) - \mathbb {E}_F(\chi (n,r))\right| \right| > t_{\alpha ,n}^{*} \right] \\&\quad \le 2\exp \left( \frac{-{t_{\alpha ,n}^{*}}^2}{2\sigma ^2}\right) \rightarrow 0. \end{aligned}$$

Since \({t_{\alpha ,n}^{*}}^2 = \Omega (n)\) by the above, the bound decays exponentially in n.

\(\square \)

2.4 Properties of the TopoTests

TopoTests rely on the Euler characteristic curve, which is computed based on the Alpha complex of the input sample. The Alpha complex captures the distance pattern between all data points in the sample. Therefore, TopoTest is not capable of discriminating distributions that are isometry equivalent, i.e. that differ only by translation, reflection or rotation. As a consequence TopoTest, contrary to Kolmogorov–Smirnov, is not able to distinguish, e.g., \(\mathcal {N}\left( (0, 0), \begin{bmatrix} 1 &{} 0 \\ 0 &{} 1 \end{bmatrix}\right) \) from \(\mathcal {N}\left( (\mu _1, \mu _2), \begin{bmatrix} 1 &{} \alpha \\ \alpha &{} 1 \end{bmatrix} \right) , \alpha \in [-1, 0) \cup (0, 1]\), as those distributions are equivalent up to translation and rotation. As a consequence, the alternative hypotheses in Kolmogorov–Smirnov and TopoTest are in fact slightly different: in the former we have \(H_1: G \ne F\), while in the latter the inequality is understood only up to Euler equivalence, cf. Eq. (2). The same discussion also applies to the null hypothesis. Hence, such pairs of distributions were excluded from the forthcoming numerical study.

2.5 Non-compactly supported distributions

The results on asymptotic convergence presented in Sect. 2.3 hold for compactly supported distributions. However, most distributions considered in practice, starting with normal distributions, have non-compact support, and the presented results do not apply to them directly. There are a number of ways to adjust such a distribution so that the presented methodology applies. In what follows we discuss three possible strategies, starting with the one we consider most practical:

  1.

    Restricting a distribution to a compact subset;

    In this case, the given distribution is restricted to a compact rectangle. In our case we choose a symmetric rectangle \([-a,a]^d\), with a being the largest representable double-precision number. This ensures that every sample that can be analyzed on a computer automatically comes from such a restricted distribution. We note that, formally, such a restricted distribution needs to be rescaled to become a probability distribution. However, in all practically relevant cases we are aware of, such a restricted distribution will be infinitesimally close, on its domain, to the original one defined on the unbounded domain. Therefore, we argue that in practice the presented methods can be applied even to distributions without compact support. Additionally, the simulations performed provide strong evidence for this claim.

  2.

    Rescaling a distribution to a compact subset;

    Here a transformation \(\arctan (\gamma x): \mathbb {R} \rightarrow [-\frac{\pi }{2},\frac{\pi }{2}]\) is applied separately to each coordinate to map the unbounded domain to a compact region.

    We observe that for \(x \in [-2,2]\), or for any similar interval centered around zero, \(\arctan (x)\) is close to a linear function; hence distances between points before and after applying the map should be roughly proportional, regardless of the points. To control this distortion of distances, the scaling parameter \(\gamma \) is used. For instance, we may choose it so that 10 standard deviations of our data, after multiplication by \(\gamma \), fall in the interval \([-2,2]\). For multivariate distributions the scaling can be applied separately in each dimension. Such a rescaling does not have any major impact on the powers of the tests, as discussed in Sects. 4 and 5. At the same time, it allows us to map any unbounded distribution to a compact domain. One should note, however, that a bounded density, transformed by \(\arctan \), may in some pathological cases become unbounded. Hence, before using this transformation, the boundedness of the resulting density needs to be verified.

  3.

    Transforming into a copula;

    The marginals \(F_1, \ldots , F_d\) of the distribution F are continuous, hence one can apply the probability integral transform (Casella and Berger 2002) to each component of the random vector X sampled from the distribution F. Then the random vector

    $$\begin{aligned} (U_1, \ldots , U_d) = (F_1(X_1), \ldots , F_d(X_d)) \end{aligned}$$
    (14)

    is supported on the unit cube \([0, 1]^d\) and has uniformly distributed marginals. The joint distribution function of \((U_1, \ldots , U_d)\) forms a copula. Since the null distribution F is given, the marginal distributions \(F_1, \ldots , F_d\) can be derived. The transformation (14) must be applied to both the sample and the null distribution F. Transformation (14) preserves the dependence structure and transforms the initial distribution F onto a compact support fulfilling Assumption 1. Although such a transformation is easy to compute and quite general, simulation studies showed that the power of the resulting test is significantly reduced. (Both this transform and the \(\arctan \) rescaling of the previous strategy are sketched in the listing below.)
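A minimal sketch of strategies 2 and 3 (assuming numpy and scipy; the helper names and the standard normal marginals are ours, chosen purely for illustration):

```python
# Strategy 2: coordinate-wise arctan(gamma * x) maps R^d into a bounded box.
# Strategy 3: probability integral transform (14), one marginal CDF per column.
import numpy as np
from scipy.stats import norm

def arctan_rescale(X, gamma):
    return np.arctan(gamma * X)

def copula_transform(X, marginal_cdfs):
    return np.column_stack([cdf(X[:, j]) for j, cdf in enumerate(marginal_cdfs)])

rng = np.random.default_rng(2)
X = rng.standard_normal(size=(200, 2))
V = arctan_rescale(X, gamma=0.2)                 # values in (-pi/2, pi/2)^2
U = copula_transform(X, [norm.cdf, norm.cdf])    # values in [0, 1]^2
```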

3 Algorithms

3.1 One-sample test

The test statistic for the one-sample TopoTest, \(\Delta \) defined in (3), involves \(\mathbb {E}_F(\chi (n,r))\), the ECC expected under \(H_0\). There is no compact formula that can be applied to compute \(\mathbb {E}_F(\chi (n,r))\) for an arbitrary distribution function F in arbitrary dimension d, although some formulas are available in the case of the multivariate uniform distribution (Bobrowski and Mukherjee 2013). However, one can approximate \(\mathbb {E}_F(\chi (n,r))\) by the average of ECCs computed for a collection of randomly generated samples. Notice that \(\chi (\mathcal {C}_r(X))\) can only take on finitely many values because the underlying sample is finite. Therefore, \(\mathbb {E}_F(\chi (n,r))\) is finite. The strong law of large numbers applies and we can approximate this expectation empirically, i.e. let \(Y_1,\ldots, Y_M\) be i.i.d. samples each consisting of n points drawn i.i.d. from F, then

$$\begin{aligned} \widehat{\mathbb {E}}_F(\chi (n,r)):= \sum \limits _{i=1}^{M} \frac{\chi (\mathcal {C}_r(Y_i))}{M} \xrightarrow [M\rightarrow \infty ]{a.s.} \mathbb {E}_F(\chi (n,r)). \end{aligned}$$
(15)

Due to the continuous mapping theorem, the above point-wise convergence result allows us to use the empirical estimate \(\widehat{\mathbb {E}}_F(\chi (n,r))\) instead of \(\mathbb {E}_F(\chi (n,r))\) in practice when computing the statistic \(\Delta _n\), leading to the statistic

$$\begin{aligned} \widehat{\Delta }_n&:= \widehat{\Delta }\left( \chi (\mathcal {C}(X)), \widehat{\mathbb {E}}_F(\chi (n,r))\right) \nonumber \\&:= \sup \limits _{r \in [0, T]} \frac{1}{\sqrt{n}} | \chi (\mathcal {C}_r(X)) - \widehat{\mathbb {E}}_F(\chi (n,r))|, \end{aligned}$$
(16)

which was actually used in the simulations. It should be mentioned that the estimator \(\widehat{\mathbb {E}}_F(\chi (n,r))\) does not depend on the sample being tested and, by increasing M, can be made arbitrarily close to \(\mathbb {E}_F(\chi (n,r))\).

The algorithm for computing the TopoTest for one sample can be divided into two steps. Firstly, in the preparation step, the average ECC for the given null distribution F is computed. Then the critical value of the test statistic is estimated empirically by drawing a set of random samples from F and computing the distance between the ECCs corresponding to those samples and the average ECC computed previously. Secondly, in the testing step, the distance of the ECC of the given sample to the average ECC for the considered distribution is computed and compared to the critical value obtained in the first step. This procedure is described in detail in Algorithm 1 and sketched in code below it.

Algorithm 1:
figure a

Algorithm for one-sample testing
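For concreteness, the following is a condensed sketch of Algorithm 1 built on the hypothetical helpers introduced in Sect. 1.1; it is not the released implementation (see the repository linked in Sect. 1.3). Here sampler(n) stands for any routine drawing n i.i.d. points from the null distribution F.

```python
# One-sample TopoTest (a sketch of Algorithm 1). Assumes the hypothetical
# `rescaled_ecc` helper above and a `sampler(n)` for the null distribution F.
import numpy as np

def prepare(sampler, n, r_grid, M=1000, m=1000, alpha=0.05):
    # Preparation step 1: Monte Carlo estimate of the expected ECC, Eq. (15).
    mean_ecc = np.mean([rescaled_ecc(sampler(n), r_grid) for _ in range(M)],
                       axis=0)
    # Preparation step 2: empirical null distribution of the statistic (16).
    dists = [np.max(np.abs(rescaled_ecc(sampler(n), r_grid) - mean_ecc))
             / np.sqrt(n) for _ in range(m)]
    t_alpha = np.quantile(dists, 1.0 - alpha)     # empirical critical value
    return mean_ecc, t_alpha

def one_sample_topotest(X, mean_ecc, t_alpha, r_grid):
    # Testing step: statistic (16) and the reject / fail-to-reject decision.
    delta = np.max(np.abs(rescaled_ecc(X, r_grid) - mean_ecc)) / np.sqrt(len(X))
    return delta, delta > t_alpha
```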

Remark

The preparation step in Algorithm 1 depends only on the sample size n and the null distribution F, but is independent of the actual sample X. Hence it needs to be performed only once if several data samples of size n are considered.

Remark

The threshold value \(t_{\alpha }\) used in the TopoTest is obtained from a numerical Monte Carlo simulation performed for a family of finite samples of size n and does not explicitly employ the asymptotic bounds from Sect. 2.

Remark

The Monte Carlo parameters M and m should be sufficiently large to obtain an accurate resulting test. For the distributions considered in this paper, values \(M=m=1000\) were selected.

Remark

The need to utilize the Monte Carlo approach to determine the threshold value \(t_\alpha \) stems from the fact that the distribution of the test statistic (3) depends on the null distribution F and on the size of the samples for which the TopoTest was built. In general, this distribution is unknown. The simulations showed that employing an asymptotic distribution, approximated numerically by using a large sample size n in the preparation step, provided incorrect empirical significance levels in the case of samples much smaller than n.

Fig. 6
figure 6

Euler characteristic curves of two samples of size 50; \(X \sim \mathcal {U}(0,1) \times \mathcal {U}(0,1)\) (in black) and \(Y \sim \beta (3,3) \times \beta (3,3)\) (in red). The green curve represents the expected ECC for \(\mathcal {U}(0,1) \times \mathcal {U}(0,1)\). Samples are shown in the inset. (Color figure online)

Example

Consider the samples \(X, Y\subseteq [0,1]^2\) consisting of the 50 black and 50 red points shown in the inset of Fig. 6. Let us look at the two samples separately; for each of them we perform the one-sample test against the uniform distribution. We want to test, at significance level \(\alpha =0.05\), whether they follow (up to an isometry of \(\mathbb {R}^2\)) the uniform distribution. The ECC of X is shown in black and the one of Y in red in Fig. 6. The green curve represents the expected ECC under the null hypothesis, estimated via \(M=1000\) Monte Carlo iterations using (15). We find that the test statistic (16), computed between \(\chi (\mathcal {A}_r(X))\) and the average curve, is \(\widehat{\Delta }_n = 0.612\). Comparing this with the computed threshold of \(t_\alpha = 1.318\), we conclude that we do not have evidence to reject the null hypothesis. The p-value is 0.916. In contrast, the test statistic computed for \(\chi (\mathcal {A}_r(Y))\) is much larger and equals \(\widehat{\Delta }_n = 2.267\). Again using \(\alpha = 0.05\), the test provides evidence to reject the null hypothesis, with the p-value computed to be 0.00. And indeed, we generated X from the bivariate uniform distribution (i.e. the null distribution), whereas Y was sampled from \(\beta (3,3) \times \beta (3,3)\), i.e. the Cartesian product of two independent univariate \(\beta (3,3)\) distributions.

Example

Consider the distributions F and G with densities

$$\begin{aligned} f(x)&= \frac{1}{4}\mathbb {I}_{(0, 2)}(x) + \frac{1}{2}\mathbb {I}_{(2, 3)}(x), \\ g(x)&= \frac{1}{4}\mathbb {I}_{(0,1)}(x) + \frac{1}{2}\mathbb {I}_{(1,2)}(x) + \frac{1}{4}\mathbb {I}_{(2,3)}(x). \end{aligned}$$

Observe that for each \(t>0\),

$$\begin{aligned} \int \limits _{f\ge t} f(x) \text {d}x = \int \limits _{g\ge t} g(x) \text {d}x = {\left\{ \begin{array}{ll} 1 &{} \text {if}\; t\le 1/4,\\ 1/2 &{} \text {if}\; 1/4 < t \le 1/2,\\ 0 &{} \text {if}\; t > 1/2. \end{array}\right. } \end{aligned}$$
(17)

Hence, by Lemma 5.1 of Vishwanath et al. (2022), the ECCs of F and G in the thermodynamic limit follow the same distribution. The limiting ECCs for F and G are shown in Fig. 7. Note that the distributions F and G are not isometry-equivalent, and yet the corresponding ECCs are the same, as the distributions are \(\beta \)-equivalent and hence also Euler equivalent. F and G therefore form an example of distributions that are indistinguishable by TopoTest. Indeed, the power of the one-sample Kolmogorov–Smirnov test, when F is used as the null distribution and samples of 50 elements are drawn from G, is 0.91, while it is only 0.05, i.e. \(\alpha \), for TopoTest.
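The identity (17) is easy to verify numerically; a small sketch (assuming numpy, with the densities as stated above discretized on a midpoint grid):

```python
# Numerical check of the excess-mass identity (17) for f and g above.
import numpy as np

N = 600_000
dx = 3.0 / N
x = (np.arange(N) + 0.5) * dx          # midpoints of a grid on (0, 3)
f = np.where(x < 2.0, 0.25, 0.50)
g = np.select([x < 1.0, x < 2.0], [0.25, 0.50], default=0.25)
for t in (0.2, 0.4, 0.6):
    print(t, round(f[f >= t].sum() * dx, 3), round(g[g >= t].sum() * dx, 3))
# prints 1, 1 / 0.5, 0.5 / 0, 0, matching the three cases of (17)
```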

3.2 Two-sample test

In Sect. 2.2 a related approach to the two-sample problem was presented. This idea is formalized in Algorithm 2 and sketched in code below it, while particular realizations are presented in the examples below.

Fig. 7
figure 7

Expected ECCs of distributions F and G for \(n=50\). The inset shows the corresponding densities f and g

Algorithm 2:
figure b

Two-sample testing
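A condensed sketch of Algorithm 2, built on the hypothetical rescaled_ecc helper from Sect. 1.1; K is the number of Monte Carlo permutations and the returned value is the permutation p-value.

```python
# Two-sample TopoTest (a sketch of Algorithm 2): permutation test on the
# supremum distance between size-normalized ECCs.
import numpy as np

def ecc_distance(X, Y, r_grid):
    return np.max(np.abs(rescaled_ecc(X, r_grid) / len(X)
                         - rescaled_ecc(Y, r_grid) / len(Y)))

def two_sample_topotest(X, Y, r_grid, K=1000, seed=None):
    rng = np.random.default_rng(seed)
    d_obs = ecc_distance(X, Y, r_grid)
    pooled = np.vstack([X, Y])
    hits = 0
    for _ in range(K):
        perm = rng.permutation(len(pooled))       # resample a split of X u Y
        Xp, Yp = pooled[perm[:len(X)]], pooled[perm[len(X):]]
        hits += ecc_distance(Xp, Yp, r_grid) >= d_obs
    return hits / K                               # permutation p-value
```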

Let us begin with the situation in which the null hypothesis is not rejected.

Example

Consider both X and Y sampled from \(\mathcal {U}(0,1)^2\) with \(\vert X\vert = 30\), \(\vert Y\vert =50\), shown in the inset of Fig. 8.

Fig. 8
figure 8

Normalized Euler Characteristic Curves of two samples of size 30 and 50 drawn from bivariate uniform distribution, \(\mathcal {U}(0,1) \times \mathcal {U}(0,1)\). Samples are shown in the inset

We compute the supremum distance between the normalized ECCs to be \(D=0.227\), as illustrated in Fig. 8. Using \(K=1000\) Monte Carlo iterations we find that a distance between ECCs at least as extreme as D happens roughly 73% of the time. We conclude that we do not have evidence to reject the null hypothesis at significance level \(\alpha =0.05\).

Now let us turn to an example in which the null hypothesis is rejected.

Example

In Fig. 9, we have sampled X as 30 points from the bivariate uniform distribution on the unit square, \(\mathcal {U}(0,1)^2\), whereas Y consists of 50 points sampled from \(\beta (3,3)\times \mathcal {U}(0,1)\). We compute the distance between the corresponding normalized ECCs to be \(D=0.453\). In \(K=1000\) Monte Carlo iterations, we find that an ECC distance at least as extreme as D never occurred; hence, using \(\alpha = 0.05\), this establishes evidence to reject the null hypothesis.

Fig. 9
figure 9

Normalized Euler Characteristic Curves of two samples of size 30 and 50 drawn from different distributions: \(X \sim \mathcal {U}(0,1) \times \mathcal {U}(0,1)\) and \(Y \sim \beta (3, 3) \times \mathcal {U}(0,1)\)

Fig. 10
figure 10

Average power of the TopoTest (left panel) and the Kolmogorov–Smirnov test (right panel) for selected trivariate distributions with compact support on \([0, 1]^3\). Average power, at significance level \(\alpha =0.05\), is estimated based on \(K=1000\) Monte Carlo realizations for sample size \(n=100\)

4 Numerical experiments, one-sample problem

In this study, Monte Carlo simulations were used to evaluate the power of TopoTests and compare it with the power of the corresponding Kolmogorov–Smirnov tests. In the case of univariate distributions, the Cramér–von Mises test was considered as well for completeness. To obtain more detailed insight into the performance of TopoTests, samples of various sizes, ranging from \(n=30\) up to \(n=1000\), were examined. In the following subsections three types of experiments are presented:

  1.

    Fixing the null distribution to be standard normal and testing samples drawn from a vast variety of alternative distributions with different parameters: Laplace, uniform and Student’s t-distributions, as well as Cauchy, logistic and Gaussian mixture distributions. This set of experiments allowed us to assess how well TopoTest recognizes the standard normal distribution.

  2.

    Fixing a family of distributions and treating each of them as the null distribution, while all others are considered as alternative distributions. For each such pair of distributions, the empirical power of the test, i.e. 1 minus the probability of a type II error, was computed using Monte Carlo methods. The results are visualized in the form of heat maps.

  3.

    In addition, for various dimensions, the relation between the power of the test and the number n of data points in the sample was examined. As expected, the power of the test increases monotonically with the sample size.

In this section, both distributions satisfying Assumption 1 and those that do not satisfy it (for instance, the multivariate normal) were considered. To theoretically underpin this approach, several ideas were suggested in Sect. 2.5. In practice, the fact that Assumption 1 was not satisfied in some cases did not affect the test powers.

Remark

In this section we benchmark TopoTest by comparing its power with the power of the Kolmogorov–Smirnov test, i.e. the probability that the test correctly rejects the null hypothesis when the alternative distribution differs from the null distribution. Since TopoTest is not able to distinguish different but Euler-equivalent distributions, which Kolmogorov–Smirnov can distinguish, the setting under which it operates (2) is different from the Kolmogorov–Smirnov setting (1), and hence the reported power of TopoTest might be overestimated. To mitigate this effect, a vast collection of distributions was considered.

4.1 Compactly supported distributions

As a first example, a collection of distributions supported on the three-dimensional unit cube \([0, 1]^3\) was considered. The collection consisted of a number of three-fold Cartesian products of independent beta, cosine (rescaled to fit the unit interval) and uniform univariate distributions. In such a setup, Assumption 1 is fulfilled and the developed theory can be applied straightforwardly. In Fig. 10 the power of the TopoTest is compared with the power of the Kolmogorov–Smirnov test for a collection of trivariate distributions on a compact domain. Several sample sizes were considered, but here only results obtained for \(n=100\) are reported, as similar conclusions can be drawn for different values of n.

Table 1 Empirical powers of the one-sample TopoTest for different alternative distributions and sample sizes n—the null distribution was standard normal \(\mathcal {N}(0,1)\)

The TopoTest provided higher power for the vast majority of considered pairs of null and alternative distributions, resulting in an average power over this collection of distributions, at significance level \(\alpha =0.05\), of 0.82 for TopoTest and 0.73 for Kolmogorov–Smirnov. In fact, for the collection of distributions considered in Fig. 10, in only one out of 72 comparisons was the power of the Kolmogorov–Smirnov test higher than that of the TopoTest, and the difference was slim (0.07 vs. 0.08).

4.2 Univariate unbounded distributions

In this section we consider a vast collection of univariate unbounded distributions represented on a computer (hence, restricted to the representable range of double-precision numbers). The collection includes normal distributions \(\mathcal {N}(0, \sigma ^2)\) with different values of \(\sigma \), the Cauchy, Laplace and logistic distributions, Student’s t-distributions with an increasing number of degrees of freedom \(\nu \), as well as Gaussian mixtures defined as \(GM(p, \mu , \sigma ) = p\mathcal {N}(0,1) + (1-p)\mathcal {N}(\mu , \sigma )\), for \(p\in \{0.1, 0.3, 0.5, 0.7, 0.9\}\), \(\mu \in \{0, 1\}\) and \(\sigma \in \{\frac{1}{2}, 1, 2\}\). For completeness, some distributions with compact support are considered as well. A sampler for the Gaussian mixtures is sketched below.
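Drawing from the mixture \(GM(p, \mu , \sigma )\) is straightforward; a minimal numpy sketch (the helper name is ours, and \(\sigma \) is interpreted here as a standard deviation):

```python
# Sample n points from GM(p, mu, sigma) = p*N(0,1) + (1-p)*N(mu, sigma).
import numpy as np

def sample_gm(p, mu, sigma, n, seed=None):
    rng = np.random.default_rng(seed)
    from_first = rng.random(n) < p                # mixture component labels
    return np.where(from_first,
                    rng.normal(0.0, 1.0, size=n),
                    rng.normal(mu, sigma, size=n))
```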

Table 1 provides the empirical power of TopoTests, assessed based on \(K=5000\) Monte Carlo simulations, in distinguishing a standard normal \(\mathcal {N}(0,1)\) from a number of alternative distributions at significance level \(\alpha =0.05\).

As we can observe in Table 1, TopoTest outperformed the Kolmogorov–Smirnov test when distinguishing the standard normal distribution from a normal distribution with variance different from 1, regardless of the sample size. The power of the TopoTest is also greater when the alternative distribution is a Student’s t-distribution: the difference compared to the Kolmogorov–Smirnov test was particularly pronounced when the number of degrees of freedom \(\nu \) was small. When \(\nu \) was 10 or more, the power of both tests is much lower, as expected, but TopoTest still outperformed the Kolmogorov–Smirnov test. A similar conclusion can be drawn for heavier-tailed alternative distributions such as the Cauchy, Laplace or logistic distributions: the empirical probability of a type II error was always lower for TopoTest than for its Kolmogorov–Smirnov counterpart. On the other hand, when Gaussian mixtures were considered, it was the Kolmogorov–Smirnov test that performed better, regardless of the value of the mixing coefficient p.

Table 2 The same as Table 1 but for two-dimensional distributions

4.3 Two- and three-dimensional unbounded distributions

In Table 2, results for a collection of bivariate distributions are shown. MG(a) denotes a multivariate normal distribution with a non-diagonal covariance matrix, i.e.

$$\begin{aligned} MG(a) = \mathcal {N}\left( 0, \begin{bmatrix} 1 &{} a &{} a &{} \dots &{} a \\ a &{} 1 &{} a &{} \dots &{} a \\ \vdots &{} \vdots &{} \vdots &{} \ddots &{} \vdots \\ a &{} a &{} a &{} \dots &{} 1 \end{bmatrix} \right) , \end{aligned}$$
(18)

where the value of the parameter a varies from 0 to 1 to reflect increasing correlation of components.
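A sample from MG(a) can be drawn with a few lines of numpy (the helper name sample_mg is ours; the covariance matrix is the one in Eq. (18)):

```python
# Sample n points from MG(a): d-variate normal with unit variances and
# constant off-diagonal covariance a (positive definite for suitable a).
import numpy as np

def sample_mg(a, d, n, seed=None):
    rng = np.random.default_rng(seed)
    cov = np.full((d, d), a) + (1.0 - a) * np.eye(d)   # 1 on the diagonal
    return rng.multivariate_normal(np.zeros(d), cov, size=n)

Y = sample_mg(a=0.5, d=2, n=250)
```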

Similarly to the univariate case, TopoTests provided lower type II errors in the case of alternative distributions being products involving a Student’s t-distribution. This conclusion also holds when one of the marginal distributions was \(\mathcal {N}(0,1)\) and the second a Student’s t-distribution. A similar result is true for bivariate distributions formed as a Cartesian product involving the logistic or Laplace distribution. We notice that TopoTest usually provided higher efficiency in the case of Gaussian mixtures. On the other hand, TopoTest is significantly weaker than Kolmogorov–Smirnov when considering correlated multivariate normal distributions MG. All of these conclusions generalize to three-dimensional distributions, as indicated by the results in Table 3.

The last rows of Tables 1, 2 and 3 show the average powers of the TopoTest and the Kolmogorov–Smirnov test for the considered set of alternative distributions. The average power of the TopoTest is greater than that of the Kolmogorov–Smirnov test for all studied sample sizes.

Table 3 The same as Table 1 but for three-dimensional distributions

4.4 All-to-all tests

Results presented in Tables 1, 2 and 3 focused on the ability to discriminate the standard normal distribution from a set of different distributions. However, in the TopoTest one can choose an arbitrary continuous distribution as the null. Hence, below we present power matrices in which all possible pairs of null and alternative distributions formed from the previous set were considered; the results are shown in Figs. 11, 12 and 13. For easier evaluation of the effectiveness of the TopoTest in comparison to the Kolmogorov–Smirnov test, the figures show the difference in power. Hence, blue regions correspond to combinations of null and alternative distributions for which the TopoTest yielded higher power, while red regions reflect combinations for which the TopoTest was outperformed by the Kolmogorov–Smirnov test. White stands for combinations for which both tests performed similarly.
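Schematically, such a matrix is assembled by looping a Monte Carlo power estimate over all ordered pairs of distributions; the sketch below uses the Kolmogorov–Smirnov p-value as a placeholder (either test’s p-value function can be plugged in, and the figures plot the difference of the two resulting matrices; all names are ours):

import numpy as np
from scipy import stats

def power_matrix(samplers, pvalue, n=100, K=1000, alpha=0.05, seed=0):
    """power[i, j]: rejection rate when distribution i is the null and
    samples of size n are drawn from distribution j."""
    rng = np.random.default_rng(seed)
    m = len(samplers)
    power = np.zeros((m, m))
    for i, (null_name, _) in enumerate(samplers):
        for j, (_, draw) in enumerate(samplers):
            rejected = sum(pvalue(draw(n, rng), null_name) < alpha
                           for _ in range(K))
            power[i, j] = rejected / K
    return power

# Two-entry example; "norm" and "cauchy" are scipy.stats distribution names.
samplers = [("norm", lambda n, rng: rng.standard_normal(n)),
            ("cauchy", lambda n, rng: rng.standard_cauchy(n))]
ks_pvalue = lambda x, null_name: stats.kstest(x, null_name).pvalue
print(power_matrix(samplers, ks_pvalue))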

Fig. 11

Comparison of the power of the TopoTest and Kolmogorov–Smirnov one-sample tests for univariate probability distributions. Each matrix element gives the difference between the power of the TopoTest and that of the Kolmogorov–Smirnov test, estimated from \(K=1000\) Monte Carlo realizations. Left and right panels show test powers for sample sizes \(n=100\) and \(n=250\), respectively. The average power (excluding diagonal elements) is 0.722 (0.832) for the TopoTest and 0.634 (0.794) for the Kolmogorov–Smirnov test for \(n=100\) (\(n=250\)). (Color figure online)

Fig. 12

The same as Fig. 11 but for bivariate distributions. Results based on \(K=1000\) Monte Carlo realizations. Average power is 0.642 (0.772) for TopoTest and 0.560 (0.720) for Kolmogorov–Smirnov for \(n=100\) (\(n=250\)). (Color figure online)

Fig. 13

The same as Fig. 11 but for three-dimensional distributions. Results based on \(K=250\) Monte Carlo realizations. Average power is 0.708 (0.824) for TopoTest and 0.602 (0.763) for Kolmogorov–Smirnov for \(n=100\) (\(n=250\)). (Color figure online)

The analysis was also conducted in dimension \(d=5\), as can be seen in Fig. 14. For \(d>3\) the Kolmogorov–Smirnov test was not performed due to its prohibitive computation time; hence only results for the TopoTest, whose computational cost remains feasible, are presented.

As can be seen, the TopoTest remained sensitive enough to differentiate between the multivariate normal distribution and Cartesian products with Student’s t and standard normal marginals, especially given that the considered sample sizes are small for such high-dimensional spaces.

The heatmap presented in Fig. 12 reveals several prominent red blocks, i.e. combinations of null and alternative distributions for which the power of the TopoTest is significantly lower than the power of the KS test, e.g. the combination \(G=p\mathcal {N}(0, 1) + (1-p)\mathcal {N}(0, 2)\) and \(F=p\mathcal {N}(0, 1) + (1-p)\mathcal {N}(\mu , 2)\), \(\mu =1\). This observation is related to Lemma 5.1 of Vishwanath et al. (2022) (cf. Example 3.1) regarding equivalence of expected ECCs. Although the distributions F and G are not Euler equivalent and condition (17) is only approximately met, their expected ECCs are quite similar for small values of \(\mu \), making them hard to distinguish by the TopoTest statistic (4). A similar situation holds for trivariate distributions, as shown in Fig. 13.
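This near-equivalence can be inspected directly by overlaying empirical ECCs of the two mixtures. The sketch below computes an empirical ECC from an alpha complex using the GUDHI library; it is an illustration only (we assume spherical bivariate components with the quoted standard deviations, and note that GUDHI reports alpha filtration values as squared radii, which merely rescales the curves):

import numpy as np
import gudhi

def ecc(points, grid):
    """Empirical Euler characteristic curve of the alpha complex of points,
    evaluated on a grid of filtration values (squared radii in GUDHI)."""
    st = gudhi.AlphaComplex(points=points).create_simplex_tree()
    pairs = list(st.get_filtration())
    filt = np.array([f for _, f in pairs])
    # Each k-simplex contributes (-1)^k once it enters the filtration.
    sign = np.array([(-1) ** (len(s) - 1) for s, _ in pairs])
    return np.array([sign[filt <= t].sum() for t in grid])

def draw_mixture(n, mu, rng):
    """0.5*N(0, I) + 0.5*N((mu, mu), 4*I) in d = 2."""
    first = rng.random(n) < 0.5
    return np.where(first[:, None],
                    rng.normal(0.0, 1.0, size=(n, 2)),
                    rng.normal(mu, 2.0, size=(n, 2)))

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)
chi_G = ecc(draw_mixture(250, 0.0, rng), grid)  # G: mu = 0
chi_F = ecc(draw_mixture(250, 1.0, rng), grid)  # F: mu = 1
print(np.abs(chi_G - chi_F).max())  # sup distance, cf. statistic (4)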

Fig. 14

Average power of the TopoTest for five-dimensional distributions, for sample sizes \(n=250\) and \(n=500\). Results based on \(K=1000\) Monte Carlo realizations

Fig. 15

Average power of the TopoTest (black curve) and the Kolmogorov–Smirnov test (red curve) as a function of sample size n for dimensions \(d=1, 2, 3\). For \(d=1\) the average power of the Cramér–von Mises test (green curve) is shown as well. To guide the eye, the data points are connected by lines. (Color figure online)

4.5 Dependence of the test power on sample size

The dependence of the power of the TopoTest and Kolmogorov–Smirnov tests on the sample size n is shown in Fig. 15 for random samples in dimensions \(d=1, 2, 3\). To compute the average power, all combinations of null and alternative distributions considered in Figs. 11, 12 and 13 were taken into account, except for the alternative being equal to the null distribution. In all cases, the average power increased with sample size, as expected. For univariate distributions (leftmost panel in Fig. 15), the results obtained using the Cramér–von Mises test were added for completeness. The overall performance of this test is similar to that of the Kolmogorov–Smirnov test, hence a detailed analysis was omitted. The TopoTest, however, provides higher average power for all sample sizes regardless of the data dimension. It should be noted that the powers presented in Fig. 15 should not be directly compared across dimensions, as the actual values depend on the list of considered distributions, which differs for each dimension.

Table 4 Empirical powers of the two-sample TopoTest for different alternative distributions and sample sizes n—the null distribution is standard normal \(\mathcal {N}(0,1)\)
Table 5 The same as Table 4 but for \(d=2\)

5 Numerical experiments, two-sample problem

A numerical study was also conducted for the two-sample problem, in which Algorithm 2 was applied. The two-sample problem was considered for completeness, as its practical application is limited by high computational cost; therefore the results presented here are restricted to a comparison of the empirical power of the two-sample TopoTest and Kolmogorov–Smirnov tests in \(d=1\) (cf. Table 4) and \(d=2\) (cf. Table 5). Simulations showed that in both cases the TopoTest outperformed the Kolmogorov–Smirnov test: in the vast majority of examined cases the power of the former was greater. Moreover, the average power of the TopoTest is greater than the corresponding average power of the Kolmogorov–Smirnov test for all sample sizes n.
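For reference, a permutation-style two-sample comparison of ECCs in the spirit of Algorithm 2 can be sketched as follows (this is our schematic, not the implementation used in the experiments; it assumes the ecc helper sketched in Sect. 4.4 is in scope):

import numpy as np

def two_sample_topotest(X, Y, grid, K=1000, rng=None):
    """Permutation p-value for H0: X and Y come from the same distribution,
    using the sup distance between empirical ECCs as the test statistic."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(X)
    D = np.abs(ecc(X, grid) - ecc(Y, grid)).max()  # observed distance
    pooled = np.vstack([X, Y])
    exceed = 0
    for _ in range(K):
        perm = rng.permutation(len(pooled))
        Xp, Yp = pooled[perm[:m]], pooled[perm[m:]]
        d_p = np.abs(ecc(Xp, grid) - ecc(Yp, grid)).max()
        exceed += d_p >= D
    return exceed / K

# Example: two bivariate samples of size 100.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
Y = rng.standard_t(3, size=(100, 2))
print(two_sample_topotest(X, Y, np.linspace(0.0, 1.0, 200), rng=rng))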

As in Sect. 4, the above collection of distributions was also examined in the all-to-all setting. The differences in power between the TopoTest and the Kolmogorov–Smirnov test are shown in Fig. 16.

Fig. 16

Difference in average power of the two-sample TopoTest and two-sample Kolmogorov–Smirnov tests for univariate (left panel) and bivariate (right panel) distributions. In both cases the sample sizes were \(n=100\) and \(K=500\) Monte Carlo realizations were performed to estimate the average power. The average power of the TopoTest is 0.643 (0.537), while for the Kolmogorov–Smirnov test it is 0.453 (0.437), in \(d=1\) (\(d=2\))

6 Real data analysis

In this section, we present two illustrative applications of the developed method to the analysis of real data.

First, we consider Fisher’s Iris data in the one-sample setting. This dataset includes three multivariate samples corresponding to three species of Iris: Iris setosa, Iris virginica, and Iris versicolor. There are 50 observations for each species, each containing four measurements of the flower. We would like to determine whether the distribution of each species follows a four-dimensional normal distribution. This can be formulated as a one-sample problem, where G is the distribution of a sample and F is the specified four-dimensional normal distribution. F involves an unknown mean vector \(\mathbf {\mu }\) and an unknown covariance matrix \(\Sigma \). For each species, \(\mathbf {\mu }\) and \(\Sigma \) are estimated by the sample mean and the sample covariance matrix. Our one-sample test of \(H_0: G = F\) against \(H_1: G \ne F\) gave p-values of 0.057, 0.569 and 0.999 for Iris setosa, Iris virginica and Iris versicolor, respectively. These p-values indicate that, at significance level 0.05, \(H_0\) should not be rejected for any of the Iris species. However, when the same procedure is applied to the entire Iris dataset (i.e. without splitting into species), the p-value is \(<10^{-4}\), hence the null hypothesis is rejected, which indicates that a multivariate normal distribution does not fit the whole Iris dataset. These conclusions are consistent with the literature (Dhar et al. 2014).
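The parameter estimation step is straightforward to reproduce; the sketch below uses scikit-learn’s copy of the Iris data (the fitted \(\hat{\mu }\) and \(\hat{\Sigma }\) then specify the null distribution F for the one-sample test):

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
for label, name in enumerate(iris.target_names):
    X = iris.data[iris.target == label]  # 50 observations x 4 measurements
    mu_hat = X.mean(axis=0)              # sample mean vector
    sigma_hat = np.cov(X, rowvar=False)  # 4 x 4 sample covariance matrix
    print(name, mu_hat.round(2))
# The null F = N(mu_hat, sigma_hat) is then tested against each sample.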

Fig. 17

Spatial distribution of selected social services within the municipality of Rennes, France (left panel), and the corresponding Euler characteristic curves (right panel)

In our second example, we consider a dataset introduced in Floch et al. (2018) consisting of the geographic locations of four distinct social services, i.e. doctor offices, clothing stores, schools, and pharmacies, in the municipality of Rennes, France. It is visualized in Fig. 17 as a map. The two-sample TopoTest was used to detect whether there are significant differences in the distributions of those facilities. The test was conducted for all possible pairs. The p-values for all tests involving the distribution of clothing stores were below \(10^{-4}\), meaning that in Algorithm 2, in all of \(K=10{,}000\) iterations, \(d_{(p)} < D\); this indicates that their geographic distribution is significantly different from the distributions of doctor offices, schools, and pharmacies. This conclusion is supported by the plots of the corresponding ECCs (cf. Fig. 17, right panel): the curve computed for clothing stores (blue) is visually distinct from the other curves. In contrast, no statistically significant difference was observed between the distribution of pharmacies and the distribution of schools; the p-value of the TopoTest is 0.306. All the above conclusions agree with previous findings on this dataset obtained using the Fasano–Franceschini test (Fasano and Franceschini 1987; Puritz et al. 2022). In addition, however, the TopoTest rejects the hypothesis of equal geographical distributions of doctor offices vs. pharmacies and doctor offices vs. schools (in both cases the p-value is below \(10^{-4}\)), while the Fasano–Franceschini test does not (p-values 0.881 and 0.435, respectively, as computed using the fasano.franceschini.test R package). This is an interesting observation in the context of the previously discussed simulation study, where we showed that the TopoTest is more powerful than the Kolmogorov–Smirnov test (closely related to the Fasano–Franceschini test) and hence more often correctly rejects the null hypothesis.

7 Discussion

Using Euler characteristic curves, we introduced a new framework for goodness-of-fit testing in arbitrary dimensions. In addition, we provided a theoretical justification of the method. Although the distribution of the test statistic is unknown for finite n and, contrary to the Kolmogorov–Smirnov test, depends on F, its asymptotic distribution is given by (5), while Theorem 2.4 provides an upper bound on the type II error.

A simulation study was conducted to assess the power of the TopoTest in comparison with the Kolmogorov–Smirnov test, in both the one- and two-sample settings. In both settings, the TopoTest often yielded better performance than the Kolmogorov–Smirnov test. It should, however, be highlighted that the Kolmogorov–Smirnov test and TopoTests operate in slightly different frameworks: the former is capable of distinguishing between distributions that differ, e.g., in a location parameter, while TopoTests are insensitive to distribution shifts, rotations and reflections, as described in Sect. 2.4.