Robust Change-Point Detection for Functional Time Series Based on $U$-Statistics and Dependent Wild Bootstrap

The aim of this paper is to develop a change-point test for functional time series that uses the full functional information and is less sensitive to outliers than the classical CUSUM test. To this end, the Wilcoxon two-sample test is generalized to functional data. To obtain the asymptotic distribution of the test statistic, we prove a limit theorem for a process of $U$-statistics with values in a Hilbert space under weak dependence. Critical values can be obtained by a newly developed version of the dependent wild bootstrap for non-degenerate two-sample $U$-statistics.


Introduction
Statistical methods for observations consisting of functions have been widely discussed since at least the work by Ramsay [1982], and interest has grown in recent years because more and more data is available in a resolution so high that it cannot be treated as multivariate data. Functional data analysis can even be helpful for one-dimensional time series (see e.g. Hörmann and Kokoszka [2010]). Functional observations are often modelled as random variables taking values in a Hilbert space; we recommend the book by Hörmann and Kokoszka [2012] for an introduction.
In this paper, we will propose new methods for the detection of change-points: Suppose that we observe X 1 , ..., X n , part of a time series (X n ) n∈Z with values in a separable Hilbert space H (equipped with inner product ⟨•, •⟩ and norm ∥ • ∥ = √⟨•, •⟩). The at most one change-point problem is to test the null hypothesis of stationarity against the alternative of an abrupt change of the distribution at an unknown time point k ⋆ : X 1 D = ... D = X k⋆ , X k⋆+1 D = ... D = X n , with the distribution changing at k ⋆
(where X i D = X j means that X i and X j have the same distribution). Functional data is often projected onto lower-dimensional spaces with functional principal components; see Berkes et al. [2009] for a change in mean of independent data and Aston and Kirch [2012] for a change in mean of time series. Fremdt et al. [2014] proposed to let the dimension of the subspace onto which the data is projected grow with the sample size. But it is also possible to use change-point tests without dimension reduction, as done by Horváth et al. [2014] under independence and by Sharipov et al. [2016] and Aue et al. [2018] under dependence. Since using the asymptotic distribution would require knowledge of the infinite-dimensional covariance operator, it is convenient to use bootstrap methods. In the context of change-point detection for functional time series, the non-overlapping block bootstrap was studied by Sharipov et al. [2016], the dependent wild bootstrap by Bucchia and Wendler [2017] and the block multiplier bootstrap (for Banach-space-valued time series) by Dette et al. [2020].
Typically, these tests are based on variants of the CUSUM test, where CUSUM stands for cumulated sums. Such tests make use of sample means and are thus sensitive to outliers. For real-valued time series, several authors have constructed more robust tests based on the Mann-Whitney-Wilcoxon U-test. For the two-sample problem (do the two real-valued samples X 1 , ..., X n1 and Y 1 , ..., Y n2 have the same location?), the Mann-Whitney-Wilcoxon U-statistic can, up to normalization, be written as Σ_{i=1}^{n1} Σ_{j=1}^{n2} (X i − Y j )/|X i − Y j | (where 0/0 is set to 0). Chakraborty and Chaudhuri [2017] have generalized this test statistic to Hilbert spaces by replacing the sign by the so-called spatial sign x ↦ x/∥x∥. They have shown the weak convergence to a Gaussian distribution for independent random variables. For change-point detection, one encounters several problems: In practice, the change-point is typically unknown, so it is not known where to split the sequence of observations into two samples. In many applications, the assumption of independence is not realistic; one rather has to deal with time series. Furthermore, the covariance operator is not known.
To deal with these problems, we will study limit theorems for two-sample U-processes with values in Hilbert spaces and deduce the asymptotic distribution of the Wilcoxon-type change-point statistic max_{k=1,...,n−1} ∥ n^{−3/2} Σ_{i=1}^{k} Σ_{j=k+1}^{n} (X i − X j )/∥X i − X j ∥ ∥ for a short-range dependent, Hilbert-space-valued time series (X n ) n∈Z . Change-point tests based on Wilcoxon-type statistics have been studied before, but mainly for real-valued observations, starting with Darkhovsky [1976] and Pettitt [1979]. Yu and Chen [2022] used the maximum of componentwise Wilcoxon-type statistics. Very recently and independently of our work, Jiang et al. [2022] introduced a test statistic based on spatial signs for independent, high-dimensional observations, which is very similar to the square of our test statistic. However, Jiang et al. [2022] obtained the limit for a growing dimension of the observations, assuming that the entries of each vector form a stationary, weakly dependent time series, while we consider observations in a fixed Hilbert space H and take the limit for a growing number of observations. Furthermore, they use self-normalization instead of the bootstrap to obtain critical values.
Let us note that spatial signs have been used for change-point detection before by other authors: Vogel and Fried [2015] have studied a robust test for changes in the dependence structure of a finite-dimensional time series based on the spatial sign covariance matrix.
As the Mann-Whitney-Wilcoxon U-statistic is a special case of a two-sample U-statistic, authors like Csörgő and Horváth [1989] and Gombay and Horváth [2002] have studied more general U-statistics for change-point detection under independence, and Dehling et al. [2015] under dependence. We will provide our theory not only for the special case of the test statistic based on spatial signs, but for general test statistics based on two-sample H-valued U-statistics under dependence.
As the limit depends on the unknown, infinite-dimensional long-run covariance operator, one would either need to estimate this operator, or one could use resampling techniques. Leucht and Neumann [2013] have developed a variant of the dependent wild bootstrap (introduced by Shao [2010]) for U-statistics. However, their method works only for degenerate U-statistics. As the Wilcoxon-type statistic is non-degenerate, we propose a new version of the dependent wild bootstrap for this type of U-statistic. The bootstrap version of our change-point test statistic is again a maximum over k = 1, ..., n − 1, with the summands weighted by multipliers ε 1 , ..., ε n : a stationary sequence of dependent N (0, 1)-distributed random variables, independent of X 1 , ..., X n . We will prove the asymptotic validity of our new bootstrap method. Our variant of the dependent wild bootstrap is similar, but not identical, to the variant proposed by Doukhan et al. [2015] for non-degenerate von Mises statistics. Note that this bootstrap differs from the multiplier bootstrap proposed by Bücher and Kojadinovic [2016], as it does not rely on pre-linearization, that is, replacing the U-statistic by a partial sum.

Main Results
We will treat the CUSUM statistic and the Wilcoxon-type statistic as two special cases of a general class based on two-sample U-statistics. Let h : H 2 → H be a kernel function. We define U n,k = n^{−3/2} Σ_{i=1}^{k} Σ_{j=k+1}^{n} h(X i , X j ) and base our test on max_{k=1,...,n−1} ∥U n,k ∥. For h(x, y) = x − y, a short calculation gives U n,k = n^{−1/2} ( Σ_{i=1}^{k} X i − (k/n) Σ_{j=1}^{n} X j ), which is the CUSUM statistic for functional data. On the other hand, with the kernel h(x, y) = (x − y)/∥x − y∥, we get the Wilcoxon-type statistic. Other kernels would be possible, e.g. h(x, y) = (x − y)/(c + ∥x − y∥) for some c > 0 as a compromise between the CUSUM and the Wilcoxon approach. Before stating our limit theorem for this class based on two-sample U-statistics, we have to define some concepts and our assumptions. We will start with our concept of short-range dependence, which is based on a combination of absolute regularity (introduced by Volkonskii and Rozanov [1959]) and P-near-epoch dependence (introduced by Dehling et al. [2017]). In the following, let H be a separable Hilbert space with inner product ⟨•, •⟩ and norm ∥x∥ = √⟨x, x⟩.
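To make the statistic concrete, the following sketch (a minimal illustration, not the authors' implementation) evaluates max_k ∥U_{n,k}∥ for discretized curves under both the CUSUM and the spatial sign kernel; the n^{−3/2} scaling and the 0/0 := 0 convention follow the definitions above, and the O(n³) double loop is chosen for clarity rather than efficiency:

```python
import numpy as np

def spatial_sign_kernel(x, y):
    # h(x, y) = (x - y) / ||x - y||, with 0/0 set to 0
    d = x - y
    r = np.linalg.norm(d)
    return d / r if r > 0 else np.zeros_like(d)

def cusum_kernel(x, y):
    # h(x, y) = x - y recovers the CUSUM statistic
    return x - y

def changepoint_statistic(X, kernel):
    """max over k of || n^{-3/2} sum_{i<=k} sum_{j>k} h(X_i, X_j) ||.
    X: (n, d) array whose rows are discretized functional observations."""
    n = X.shape[0]
    best = 0.0
    for k in range(1, n):  # candidate change after observation k
        S = np.zeros(X.shape[1])
        for i in range(k):
            for j in range(k, n):
                S += kernel(X[i], X[j])
        best = max(best, np.linalg.norm(S) / n ** 1.5)
    return best
```

For data with a pronounced mean shift, both kernels yield a large value, but the spatial sign kernel caps the contribution of each pair at norm one, which is exactly the source of its outlier robustness.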
Definition 1 (Absolute Regularity). Let (ζ n ) n∈Z be a stationary sequence of random variables. We define the mixing coefficients (β m ) m∈N by β m = E[ sup_{A ∈ F m ∞} |P(A | F −∞ 0 ) − P(A)| ], where F b a is the σ-field generated by ζ a , . . ., ζ b , and call the sequence absolutely regular if β m → 0 as m → ∞. P-NED (near-epoch dependence in probability, approximating X n in probability by functions f k (ζ n−k , ..., ζ n+k ) of finitely many ζ's) has the advantage of not implying finite moments (unlike L p -NED), which is useful to allow for heavy-tailed distributions.
Additionally, we will need assumptions on the kernel: Antisymmetric kernels are natural candidates for comparing two distributions, because if X and X are independent, H-valued random variables with the same distribution and h is antisymmetric, we have E[h(X, X)] = 0, so our test statistic should have values close to 0, see also Račkauskas and Wendler [2020].
If there exists a constant M < ∞ such that E[∥h(X i,l , X j,l )∥ m ] ≤ M uniformly over i, j and the approximation level l, we say that the kernel has uniform m-th moments under approximation.
Furthermore, we need the following mild continuity condition on the kernel, which is called the variation condition and was introduced by Denker and Keller [1986]. The kernel h(x, y) = (x − y)/∥x − y∥ fulfills the condition as long as there exists a constant C such that P (∥X 1 − x∥ ≤ ϵ) ≤ Cϵ for all x ∈ H and ϵ > 0. This can be proved along the lines of Remark 2 in Dehling et al. [2022]. The condition P (∥X 1 − x∥ ≤ ϵ) ≤ Cϵ for all x ∈ H, ϵ > 0 does not hold if the distribution of X 1 has points with positive mass, but it can still hold if the distribution is concentrated on finite-dimensional subspaces.
Definition 6 (Variation condition). The kernel h fulfills the variation condition if there exist L, ϵ 0 > 0 such that for every ϵ ∈ (0, ϵ 0 ): E[ sup{ ∥h(x, y) − h(X, X̄)∥ : ∥x − X∥ ≤ ϵ, ∥y − X̄∥ ≤ ϵ } ] ≤ Lϵ, where X, X̄ are independent copies of X 1 . Finally, we will need Hoeffding's decomposition of the kernel to be able to define the limit distribution: Definition 7 (Hoeffding's decomposition). Let h : H × H → H be an antisymmetric kernel and let X, X̄ be two independent random variables with the same distribution as X 1 . Hoeffding's decomposition of h is defined as h(x, y) = h 1 (x) − h 1 (y) + h 2 (x, y) with h 1 (x) = E[h(x, X̄)]. Now we can state our first theorem on the asymptotic distribution of our test statistic under the null hypothesis (stationarity of the time series): Theorem 1. Let (X n ) n∈Z be stationary and P-NED on an absolutely regular sequence, with suitable rates for the approximation constants and the mixing coefficients and for some δ > 0. Assume that h : H 2 → H is an antisymmetric kernel that fulfills the variation condition and is either bounded or has uniform (4+δ)-moments under approximation. Then max_{k=1,...,n−1} ∥U n,k ∥ converges in distribution to sup_{λ∈[0,1]} ∥W (λ) − λW (1)∥, where W is an H-valued Brownian motion and the covariance operator S of W (1) is given by the long-run covariance operator of (h 1 (X n )) n∈Z . For the kernel h(x, y) = x − y, we obtain as a special case a limit theorem for the functional CUSUM statistic similar to Corollary 1 of Sharipov et al. [2016] (although our assumptions on near-epoch dependence are stronger). In the next section, we will compare the Wilcoxon-type statistic and the CUSUM statistic in a simulation study. The proofs of the results can be found in Section 5. The next theorem will show that the test statistic converges to infinity in probability under some alternatives, so a test based on this statistic consistently detects these types of changes.
For this, we consider the following model: We have a stationary, H ⊗ H-valued sequence (X n , Z n ) n∈Z and we observe Y 1 , ..., Y n with Y i = X i for i ≤ ⌊λ ⋆ n⌋ and Y i = Z i for i > ⌊λ ⋆ n⌋, so λ ⋆ ∈ (0, 1) is the proportion of observations after which the change happens. If the distribution of X i and Z i is not the same, then the alternative hypothesis holds. A simple example might be Z i = X i + µ, where µ ∈ H and µ ̸ = 0. However, let us point out that not all changes in distribution can be consistently detected. The change is detectable if E[h(X 1 , Z̃ 1 )] ̸ = 0 for an independent copy Z̃ 1 of Z 1 . For example, with the kernel h(x, y) = x − y and Z i = X i + µ with µ ̸ = 0, the change is always detectable, since then E[h(X 1 , Z̃ 1 )] = −µ ̸ = 0.
Theorem 2. Let (X n , Z n ) n∈Z be P-NED on an absolutely regular sequence with the rates of Theorem 1. Assume that h : H 2 → H is an antisymmetric kernel that fulfills the variation condition and is either bounded or has uniform (4 + δ)-moments under approximation for both processes (X n ) n∈Z and (Z n ) n∈Z , that E[∥h(X 1 , Z̃ 1 )∥ 4+δ ] < ∞, and that E[h(X 1 , Z̃ 1 )] ̸ = 0, where Z̃ 1 is an independent copy of Z 1 . Then the test statistic converges to infinity in probability. These results on the asymptotic distribution cannot be applied directly in many practical applications, because the covariance operator is unknown. For this reason, we introduce the dependent wild bootstrap for non-degenerate U-statistics: Let (ε i,n ) i≤n,n∈N be a rowwise stationary triangular scheme of N (0, 1)-distributed variables (we often drop the second index for notational convenience). The bootstrap version of our U-statistic is then obtained by weighting the summands with these multipliers. Theorem 3. Let the assumptions of Theorem 1 hold for (X n ) n∈Z and h : H 2 → H. Assume that (ε i,n ) i≤n,n∈N is independent of (X n ) n∈Z , has standard normal marginal distribution and Cov(ε i , ε j ) = w(|i − j|/q n ), where w is symmetric and continuous with w(0) = 1 and q n grows at a suitable rate. Then the statistic and its bootstrap version converge jointly, where W and W ⋆ are two independent, H-valued Brownian motions with covariance operator as in Theorem 1.
From this statement, it follows that the bootstrap is consistent and that it can be evaluated using the Monte Carlo method. If one generates several copies of the bootstrapped test statistic, independently conditional on X 1 , ..., X n , the empirical quantiles of the bootstrapped test statistics can be used as critical values for the test. For a deeper discussion of bootstrap validity, see Bücher and Kojadinovic [2019]. Of course, in practical applications, the function w and the bandwidth q n have to be chosen. We will apply a method by Rice and Shang [2017] for the bandwidth selection.
Instead of using multipliers with a standard normal distribution, one might also choose other distributions for (ε i,n ) i≤n,n∈N . This is done for the traditional wild bootstrap to capture skewness. Under the null hypothesis, the distribution of h(X i , X j ) is close to symmetric for i and j far apart, so we do not expect a large improvement from non-Gaussian multipliers and limit our analysis in this paper to Gaussian multipliers.

Data Example and Simulation Results
Bootstrap procedure. Since no closed-form critical values for the limit distribution of our test statistic are available, we perform a bootstrap to find critical values for a test decision. The procedure to find the critical value for significance level α ∈ (0, 1) is the following:
• Calculate h(X i , X j ) for all i < j.
• For each of the bootstrap iterations t = 1, ..., m: generate multipliers (ε i (t) ) i≤n as described below and compute the bootstrap version of the test statistic.
• If the test statistic computed from the data exceeds the empirical (1 − α)-quantile of the m bootstrap test statistics, reject the null hypothesis.
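The quantile-comparison step of this procedure can be sketched generically as follows (a sketch only: `bootstrap_statistic` is a hypothetical callable producing one bootstrap replicate of the test statistic from the data and fresh multipliers):

```python
import numpy as np

def bootstrap_decision(T_obs, bootstrap_statistic, m=1000, alpha=0.05, rng=None):
    """Monte Carlo step: draw m conditionally independent bootstrap
    replicates, take their empirical (1 - alpha)-quantile as critical
    value, and reject if the observed statistic exceeds it."""
    rng = np.random.default_rng() if rng is None else rng
    draws = np.array([bootstrap_statistic(rng) for _ in range(m)])
    critical_value = np.quantile(draws, 1.0 - alpha)
    return T_obs > critical_value, critical_value
```

The empirical quantile of the m replicates plays the role of the unknown quantile of the limit distribution, which is justified by the bootstrap consistency result of Theorem 3.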
To ensure a covariance structure within the multipliers that fulfills the assumptions of the multiplier theorem, we calculate them as (ε 1 , ..., ε n ) t = Aη, where η 1 , ..., η n are i.i.d. N (0, 1)-distributed and A is the square root of the quadratic spectral covariance matrix constructed with bandwidth parameter q (chosen with the method by Rice and Shang [2017] described below). That means AA t = B, where B has the entries B ij = w(|i − j|/q), with w the quadratic spectral kernel w(x) = 25/(12π 2 x 2 ) · ( sin(6πx/5)/(6πx/5) − cos(6πx/5) ).
Bandwidth. We use a data-adapted bandwidth parameter q adpt in the bootstrap, which is evaluated for each data sample X 1 , ..., X n by a plug-in procedure based on the outer products X i ⊗ X i+k for lags k = 1, ..., q 0 , again using the quadratic spectral kernel w. For the theoretical details of the data-adapted bandwidth we refer to Rice and Shang [2017].
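Assuming w is the standard quadratic spectral kernel in the form given above, the dependent multipliers ε = Aη with AA^t = B can be generated as follows (a minimal sketch; the symmetric square root of B is taken via an eigendecomposition):

```python
import numpy as np

def qs_weight(x):
    """Quadratic spectral kernel w(x) with w(0) = 1 (standard form,
    assumed to be the kernel intended in the bandwidth procedure)."""
    if x == 0.0:
        return 1.0
    z = 6.0 * np.pi * x / 5.0
    return 25.0 / (12.0 * np.pi ** 2 * x ** 2) * (np.sin(z) / z - np.cos(z))

def dependent_multipliers(n, q, rng):
    """Draw eps = A @ eta with eta ~ N(0, I) and A A^T = B,
    B_{ij} = w(|i - j| / q), so Cov(eps_i, eps_j) = w(|i - j| / q)."""
    B = np.array([[qs_weight(abs(i - j) / q) for j in range(n)] for i in range(n)])
    vals, vecs = np.linalg.eigh(B)  # symmetric square root of B
    A = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return A @ rng.standard_normal(n)
```

Since the quadratic spectral kernel is positive semidefinite, B is a valid covariance matrix and the clipping only removes numerical noise in the smallest eigenvalues.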
Data example. We look at data from 344 monitoring stations of the 'Umweltbundesamt' for air pollutants, located all over Germany (source: Umweltbundesamt, https://www.umweltbundesamt.de/daten/luft/luftdaten/stationen, accessed on August 6, 2020). The particular data is the daily average of particulate matter with particles smaller than 10µm (P M 10 ), measured in µg/m³, from January 1, 2020 to May 31, 2020. This means we have n = 152 observations and treat the measurements of all stations on one day as data in R 344 .
Since the official restrictions of the German government in the course of the COVID-19 pandemic came into force on March 22, 2020, a frequently asked question was whether these restrictions (social distancing, closed gastronomy, closed or reduced workplaces, work from home) had an effect on the air quality in Germany. This question stems from the assumption that the restrictions led to reduced traffic, resulting in a reduced amount of particulate matter.
There are several publications from various countries studying the effects of lockdown measures on air pollution parameters like nitrogen oxides (N O, N O 2 ), ozone (O 3 ) and particulate matter (P M 10 , P M 2.5 ). For example, Lian et al. [2020] investigated data from the city of Wuhan, and Zangari et al. [2020] data for New York City. Data for Berlin, as for 19 other cities around the world, was investigated by Fu et al. [2020]. They observed a decline in particulate matter (P M 10 and P M 2.5 , only significant for P M 2.5 ) in the period of lockdown. But the observed time period is rather short (one month, March 17 to April 19, 2020), and the findings for a densely populated city may not simply be transferred to the whole of Germany. In contrast to that, we use data from measuring stations located across the whole country and over a period of five months.
Looking at the empirical p-values of the CUSUM test and the Wilcoxon-type test (based on spatial signs) resulting from m = 3000 bootstrap iterations in Table 1, we see that with CUSUM, the null hypothesis H 0 is never rejected for any significance level α < 0.2. But the Wilcoxon-type test rejects H 0 for any significance level α larger than 0.03.
Since the data exhibits a massive outlier located at January 1 (likely New Year's firework), we repeated the test procedure without the data of this day.
We observed that the resulting p-value for the Wilcoxon-type test changed only slightly (Table 2), whereas the p-value for CUSUM decreased notably: it is now around 0.08. In this example we see that CUSUM is clearly more influenced by the outlier in the data than the spatial-sign-based test. Evaluation showed that the data-adapted bandwidth was set to q adpt = 3 for both the CUSUM test and the Wilcoxon-type test in both scenarios.
p-values (data excluding Jan. 1):

    CUSUM         0.078
    Spatial sign  0.030

Table 2. Empirical p-values for the CUSUM and spatial sign tests with data-adapted bandwidth for data excluding January 1, 2020; m = 3000 bootstrap iterations were used.
A natural approach to estimate the location k of the change-point is to determine the smallest 1 ≤ k < n for which the test statistic attains its maximum. The maximum of the spatial sign test statistic, which marks our estimated change-point, is attained at March 15, 2020. (The maximum of the CUSUM statistic is indeed located at the same point.) The estimated change-point in our example thus lies a week before the official restrictions regarding COVID-19 were imposed. One could argue that the citizens, being aware of the situation, changed their behaviour beforehand, without strict official restrictions. Data projects using mobile phone data (e.g. the Covid-19 Mobility Project and Destatis) indeed show a decline in mobility preceding the official restrictions of March 22 by around a week (see https://www.covid-19-mobility.org/de/data-info/ and https://www.destatis.de/DE/Service/EXDAT/Datensaetze/mobilitaetsindikatoren-mobilfunkdaten.html). But looking at our data (Fig. 1), one gets the impression that a change in mean would rather be upwards than downwards, meaning that the daily average pollution increased after March 15, 2020 compared to the beginning of the year. Indeed, after averaging over the 344 monitoring stations and applying the two-sample Hodges-Lehmann estimator to the resulting one-dimensional time series, we estimate the average increase to be 3.8 µg/m³. However, our test does not reject the null hypothesis when applied to this one-dimensional time series.
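The averaging step above uses the classical two-sample Hodges-Lehmann shift estimator, i.e. the median of all pairwise differences between the two segments; a minimal sketch:

```python
import numpy as np

def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimator of the shift between
    samples x and y: the median of all pairwise differences y_j - x_i."""
    diffs = np.subtract.outer(np.asarray(y), np.asarray(x))
    return float(np.median(diffs))
```

Like the Wilcoxon test it accompanies, this estimator is robust: a single outlier in either sample moves only a small fraction of the pairwise differences and thus barely moves the median.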
Similar findings about an increase in P M 10 were made by Ropkins and Tate [2021], who studied the impact of the COVID-19 lockdown on air quality across the UK. Using long-term data (January 2015 to June 2020) from Rural Background, Urban Background and Urban Traffic stations, they observed an increase for P M 10 and P M 2.5 during the lockdown. Noting that this trend is "highly inconsistent with an air quality response to the lockdown", they discussed the possibility that the lockdown did not greatly limit the largest impacts on particulate matter. We assume that the findings are to some extent comparable to Germany due to the similar geographic and demographic characteristics of the countries.
Furthermore, the German 'Umweltbundesamt' states that traffic is no longer the main contributor to P M 10 in Germany and that other sources of particulate matter (e.g. fertilization, Saharan dust, soil erosion, fires) can overlay effects of reduced traffic (source: https://www.umweltbundesamt.de/faq-auswirkungen-der-corona-krise-auf-die#welche-auswirkungen-hat-die-corona-krise-auf-die-feinstaub-pm10-belastung). It is known that one major meteorological effect on particulate matter is precipitation, since it washes the dust out of the air (scavenging). Comparing the meteorological recordings (Fig. 2) with Figure 1, we can see that this explanation fits the data quite well. Especially in February and the first half of March, with a higher quantity of precipitation, we have a relatively low quantity of P M 10 . With the beginning of the drought weather, the concentration of P M 10 goes up, and especially the local minima are now higher than before, meaning that days with a concentration of P M 10 as low as in the beginning of the year are clearly rarer. We would like to note that these findings do not contradict the satellite data published by ESA (e.g. https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-5P/Air_pollution_remains_low_as_Europeans_stay_at_home), which shows reduced air pollution over Europe in 2020 compared to 2019. While the satellites measure atmospheric pollution, the data of the 'Umweltbundesamt' is collected at stations at ground level. It is known that there is a difference between these two sorts of pollution.
Simulation Study. In this section we report the results of our simulation study. We compare the size and power performance of our test statistic with the well-established CUSUM test. To do so, we construct different data examples, which are described below. Note that we can easily adapt the bootstrap and the adapted-bandwidth procedure described above to CUSUM by using h(x, y) = x − y instead of the spatial sign kernel function h(x, y) = (x − y)/∥x − y∥.
Generating Sample. We use a functional AR(1)-process on [0, 1], where the innovations are standard Brownian motions. We use an approximation on a finite grid with d grid points, if not indicated otherwise. To be more precise, we simulate the data recursively, where the scalar a ∈ R is an AR-parameter; we use a = 1. The first (BI + 1) simulated functions are discarded as burn-in. Through this simulation structure we achieve dependence both across the n observations and across the d grid points. We consider n = 200 and d = 100 if not stated otherwise.
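A minimal sketch of such a simulation (the pointwise recursion X_i = c · X_{i−1} + B_i, the value c = 0.5 and the burn-in length are illustrative assumptions, since the paper's exact recursion is not reproduced here; the innovations are standard Brownian motions on a grid of d points):

```python
import numpy as np

def brownian_path(d, rng):
    # standard Brownian motion evaluated on the grid 1/d, 2/d, ..., 1
    return np.cumsum(rng.standard_normal(d)) / np.sqrt(d)

def functional_ar1(n, d=100, c=0.5, burn_in=51, rng=None):
    """Sketch of a functional AR(1) sample: X_i = c * X_{i-1} + B_i
    with i.i.d. Brownian-motion innovations B_i; burn_in plays the
    role of the discarded (BI + 1) initial simulations."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.zeros(d)
    sample = np.empty((n, d))
    for i in range(burn_in + n):
        x = c * x + brownian_path(d, rng)
        if i >= burn_in:
            sample[i - burn_in] = x
    return sample
```

The Brownian innovations induce dependence along the grid (within each curve), while the AR recursion induces dependence across time, mirroring the two kinds of dependence described above.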
Size. To calculate the empirical size, the data simulation and the bootstrap test procedure are repeated S = 3000 times with m = 1000 bootstrap repetitions. We count the number of times the null hypothesis was rejected, both for the CUSUM-type and the Wilcoxon-type statistic. By using S = 3000 simulation runs, the standard deviation of the rejection frequencies is always below 1%, and below 0.4% if the true rejection probability is at 5%. To analyse how well the test statistics perform if outliers are present or if Gaussianity fails, we study two additional simulations:
• Data simulated as above, but in the presence of outliers: X i is replaced by 10X i for i ∈ {0.2n, 0.4n, 0.6n, 0.8n}.
• Data simulated similarly to the above, but with heavy-tailed innovations.
As we can see in Table 3, the Wilcoxon-type test and the CUSUM test perform almost identically under normality; both are somewhat undersized, especially for the smaller size n = 100, but also for n = 200 or n = 250. In the presence of outliers or for heavy-tailed data, the rejection frequency of the Wilcoxon-type test does not change much.
To compare the empirical power, we consider four scenarios:
Scenario 1: Uniform jump after n/2 of the observations in the direction u = (1, ..., 1) t .
Scenario 2: Sinus-jump after n/2 of the observations.
Scenario 3: Uniform jump of +0.3 after n/2 observations in the presence of outliers at 0.2n, 0.4n, 0.6n, 0.8n.
Scenario 4: Heavy tails: in the simulation of (X i ) i≤n we use heavy-tailed innovations and a uniform jump of +5 after n/2 observations.
As in the analysis under the null hypothesis H 0 , we chose m = 1000 bootstrap repetitions. The data simulation and test procedure via bootstrap is repeated S = 3000 times for each scenario, and the number of times H 0 was rejected is counted to calculate the empirical power. To compare our test statistic with CUSUM, we calculate the Wilcoxon-type test (spatial sign) and the CUSUM test simultaneously in each simulation run.
Comparing the size-power plots for both test statistics (Figure 3), we see that the Wilcoxon-type test outperforms the CUSUM test in Scenarios 1 and 2. For these two scenarios with a jump after one half of the observations, the Wilcoxon-type test provides similar empirical size and at the same time higher empirical power. In the third scenario, the jump with outliers in the data, we see that the CUSUM test shows a lower empirical size than the Wilcoxon-type test, but the spatial-sign-based test shows clearly more empirical power. In Scenario 4, we see that the CUSUM test barely provides any empirical power at all: even for α = 0.1, CUSUM shows an empirical power below 0.04. In stark contrast, the Wilcoxon-type test shows relatively large empirical power, being greater than 0.9 for α ≥ 0.025.
For the exact values of the empirical power in each scenario, see Table 5 in the appendix. The appendix also contains a short examination of the behaviour of the test statistics if the change-point lies closer to the beginning of the observations or if d is larger than n (Table 7). We just note here that the spatial-sign-based test suffers less loss in power than the CUSUM test if the change-point lies closer to the edges or if d ≫ n.

Auxiliary Results
4.1. Hoeffding Decomposition and Linear Part. The proofs will make use of Hoeffding's decomposition of the kernel h, so recall that Hoeffding's decomposition of h is defined as h(x, y) = h 1 (x) − h 1 (y) + h 2 (x, y) with h 1 (x) = E[h(x, X̄)], where X, X̄ are independent copies of X 0 . It is well known that h 2 is degenerate, that means E[h 2 (x, X)] = E[h 2 (X, y)] = 0, see e.g. Section 1.6 in the book of Lee [2019].
Lemma 1 (Hoeffding's decomposition of U n,k ). Let h : H × H → H be an antisymmetric kernel. Under Hoeffding's decomposition it holds for the test statistic that U n,k = n −3/2 ( (n − k) Σ_{i=1}^{k} h 1 (X i ) − k Σ_{j=k+1}^{n} h 1 (X j ) ) + n −3/2 Σ_{i=1}^{k} Σ_{j=k+1}^{n} h 2 (X i , X j ). Proof. To prove the formula for U n,k , we use Hoeffding's decomposition for h:

□
To use existing results about partial sums, we need to investigate the properties of the sequence (h 1 (X n )) n∈Z .
Lemma 2. Under the assumptions of Theorem 1, (h 1 (X n )) n∈Z is L 2 -NED on (ζ n ) n∈Z .
Proof. By Hoeffding's decomposition for h it holds for all x that h 1 (x) = E[h(x, X̄)]. Let X, X̄ be independent copies of X 0 . Then, by Jensen's inequality for conditional expectations and the variation condition, we can bound the approximation error. We introduce the following notation: Let X n,k = f k (ζ n−k , ..., ζ n+k ) and X̄ n,k an independent copy of this random variable. Now, we can find the approximation constants of (h 1 (X n )) n by using (1) and some further inequalities: the resulting bound is of order Ck −8δ/(3+δ) by the assumption on the P-NED coefficients.
By taking the square root, we get the result. Proposition 1. Under the assumptions of Theorem 1, the partial sum process (n −1/2 Σ_{i=1}^{⌊λn⌋} h 1 (X i )) λ∈[0,1] converges weakly to (W (λ)) λ∈[0,1] , where (W (λ)) λ∈[0,1] is a Brownian motion with covariance operator as defined in Theorem 1.
Proof. We want to use Theorem 1 of Sharipov et al. [2016] for (h 1 (X n )) n∈Z , so we have to check the assumptions. Assumption 1: (h 1 (X n )) n∈Z is L 1 -NED. We know by Lemma 2 that (h 1 (X n )) n∈Z is L 2 -NED; thus, L 1 -NED follows by Jensen's inequality. Assumption 2: Existence of (4 + δ)-moments. This follows from the assumption of uniform moments under approximation; in the case that h is bounded, the same holds for h 1 .
Assumption 3: This holds directly by the assumed rate on the coefficients β m .
We have checked that all assumptions of Theorem 1 of Sharipov et al. [2016] are fulfilled, and since E[h 1 (X 0 )] = 0 because h is antisymmetric, the statement of the theorem follows. □
4.2. Degenerate part.
Lemma 3. Under the assumptions of Theorem 1, there exists a universal constant C > 0 such that for every i, k, l ∈ N, ϵ > 0 it holds that
where X i,l = f l (ζ i−l , ..., ζ i+l ).
Proof. By Lemma D1 of Dehling et al. [2017] there exist copies (ζ ′ n ) n∈Z , (ζ ′′ n ) n∈Z of (ζ n ) n∈Z which are independent of each other and satisfy (2). With the help of these, we can split the term in question into the three summands (3), (4) and (5) by using the triangle inequality. We will look at the three summands separately.
For abbreviation, we define
For (3.A), we use Hölder's inequality together with our assumptions on uniform moments under approximation, where we used property (2) of the copied series (ζ ′ n ) n∈Z , (ζ ′′ n ) n∈Z for the second-to-last inequality. For (3.B), we split up again: for the first summand, we use the variation condition; for the second, notice that on B, by our moment assumptions and Hölder's inequality, the corresponding bound holds. Combining the results for (3.A) and (3.B), we get the bound for (3).
We can now look at (4). Again, we split the term into two summands; similarly as for (3), we use the variation condition for the first and Hölder's inequality for the second summand. Lastly, we split up (5) as well. Since on B it is X i+k+2l,l = X ′ i+k+2l,l and X i,l = X ′′ i,l , the second summand equals zero. For the first summand, we use Hölder's inequality again and the properties of (ζ ′ n ) n≤i+l , (ζ ′′ n ) n≤i+l , see (2): (5) ≤ 2M
We can finally put everything together. □
Lemma 4. Under the assumptions of Theorem 1, it holds for any n 1 < n 2 < n 3 < n 4 and l = n
Proof. The important step of the proof is to bound the expectation on the left-hand side from above by a sum of E[∥h 2 (X i , X j ) − h 2 (X i,l , X j,l )∥ 2 ] 1/2 terms. We can then use Lemma 3 to achieve the stated approximation. First note that for j there are at most (n 4 − n 3 ) possibilities. The analogue holds for h 2 (X i,l , X j,l ). Thus, by Lemma 3, we obtain (6). Now set ϵ = l −8δ/(3+δ) and define β k = 1 for k < 0. Then, by our assumptions on the approximation constants and the mixing coefficients, the statement of the lemma is proven. □
Lemma 5. Under the assumptions of Theorem 1, it holds for any n 1 < n 2 < n 3 < n 4 and l = n
for all i, j ∈ N, with X̃ i,l = f l (ζ̃ i−l , ..., ζ̃ i+l ), where (ζ̃ n ) n∈Z is an independent copy of (ζ n ) n∈Z .
Proof. For (ζ̃ n ) n∈Z an independent copy of (ζ n ) n∈Z , write X̃ i = f ((ζ̃ i+n ) n∈Z ). So (X̃ i ) i∈Z is an independent copy of (X n ) n∈Z . We will use Hoeffding's decomposition and rewrite h 2 as h 2 (x, y) = h(x, y) − E[h(x, X̃ j )] − E[h(X̃ i , y)], and similarly for h 2,l . By doing so, we obtain the terms (7) and (8). Here E X denotes the expectation with respect to X, and E = E X,X̃ is the expectation with respect to X and X̃. We bound the two terms separately, starting with (8): For the first summand, we use the variation condition, and Hölder's inequality for the second. The bound then follows by our moment and P-NED assumptions.
For (8.B) we use similar arguments. Putting these two terms together, we get the stated bound. Bounding (7) works completely analogously, just with i and j interchanged.

All together this yields
.

So we finally get that
where the last line is achieved by setting ϵ = l −8δ/(3+δ) and similar calculations as in Lemma 4. □
Lemma 6. Under the assumptions of Theorem 1, it holds for any n 1 < n 2 < n 3 < n 4 and l = n
For the definition of h 2,l , see Lemma 5.
Proof. In this proof, we want to use Lemma 1 of Yoshihara [1976], which is the following: Let g(x 1 , ..., x k ) be a Borel function satisfying the moment condition (♢) for some δ > 0, where I = {i 1 , ..., i j }, I C = {i j+1 , ..., i k } and X ′ is an independent copy of X, for any 0 ≤ j ≤ k − 1. Then the stated covariance bound holds. Now, for the proof of the lemma, first observe that we can rewrite the squared norm as the scalar product. Again, the second expectation equals zero. Plugging this into (13), we get the corresponding bound. Third case: Checking that (♢) holds true for I = {i 1 } and I C = {j 1 , j 2 } works completely similarly to the second case, and noting that we have to condition on X i1,l , X ′ j2,l in this case yields the bound. We can conclude for the quadratic term: For a fixed m we have the following possibilities to choose: Since we assumed m = j 1 − i 1 , there are
• at most n 2 − n 1 < n 4 possibilities for i 1 , so only 1 possibility for j 1 ,
• at most (n 4 − n 3 ) possibilities for j 2 , so at most m possibilities for i 2 , since by the definition of m the value j 2 − i 2 is smaller than or equal to m.
So, recalling that δ′ = δ/2, we obtain the stated bound. For the mixed term it works very similarly; just a few comments on what changes: in the first case we get I = {i_1, j_1, i_2}, I^C = {j_2}, which leads to defining the function g(X_{i_1,l}, X_{j_1,l}, X_{i_2,l}, X′_{j_2,l}) := ⟨h_{2,l}(X_{i_1,l}, X_{j_1,l}), h_{2,l}(X_{i_2,l}, X′_{j_2,l})⟩ and conditioning on X_{i_1,l}, X_{j_1,l}, X_{i_2,l}. In the second case the function is defined analogously and we condition on X_{i_1,l}, X_{j_1,l}. This proves the lemma. □

Proposition 2. Under the assumptions of Theorem 1, a) the stated moment bound holds and b) the degenerate part converges to 0 almost surely.

Proof. Part a): We split the expectation with the help of the triangle inequality into the three parts (15), (16) and (17), and use Lemmas 4 to 6 to bound them. Because the summands of (15) are all positive, the bound follows from Lemma 4; (16) can be bounded in the same way, using Lemma 5. For (17), the idea is to rewrite the double sum, after which we can conclude by Lemma 6, as n ≤ 2^s. By Theorem 1 of Móricz [1976] (which also holds in Hilbert spaces) the maximal inequality follows, and taking the square root yields the claim of part a).

Part b): Recall that s is chosen such that n ≤ 2^s and thus n^{3/2} ≤ 2^{3s/2}. To prove almost sure convergence, it is enough to show that the relevant probabilities are summable for any ϵ > 0. We do this using Markov's inequality and the result from part a); the almost sure convergence then follows by the Borel–Cantelli lemma. □

Here λ⋆ ∈ (0, 1) is the proportion of observations after which the change happens.
We assume that the process (X_i, Z_i)_{i∈Z} is stationary and P-NED on an absolutely regular sequence (ζ_n)_{n∈Z}.
Let h : H × H → H be an antisymmetric kernel and assume that E[h(X_0, Z̃_0)] ≠ 0, where Z̃_0 is an independent copy of Z_0 and independent of X_0. Since X_0 and Z̃_0 are not identically distributed, Hoeffding's decomposition of h takes the form (19).

Lemma 7. Let the assumptions of Theorem 2 hold for (X_i, Z_i)_{i∈Z} and let h⋆_2 be defined as in (19). Then the stated convergence holds, where Z̃_0 is an independent copy of Z_0 and independent of X_0.
Proof. The proof follows the steps of the proof of Theorem 1, so we have to check the assumptions of Theorem 1 of Sharipov et al. [2016]. We do this for h⋆_1(X_i); for h_1(Z_i) everything holds analogously. □

Corollary 1. Under the assumptions of Theorem 1, the maximum considered above is stochastically bounded.
Proof. This follows from Lemma 8 above: both summands converge weakly to a Gaussian limit and are hence stochastically bounded. □

4.4. Dependent Wild Bootstrap.
Proposition 3. Let (ε_i)_{i≤n,n∈N} be a triangular scheme of random multipliers independent of (X_i)_{i∈Z}. Then, under the assumptions of Theorem 1, the stated convergence holds.

Proof. The statement follows along the lines of the proofs of Lemmas 5 and 6 and of Proposition 2. For this, note that by the independence of (ε_i)_{i≤n,n∈N} and (X_i)_{i∈Z} and by Lemma 3 the corresponding moment bound holds. From this, we can conclude the analogue of Lemma 4 for any n_1 < n_2 < n_3 < n_4 and l = n. Similarly (making use of the independence of (ε_i)_{i≤n} and (X_i)_{i∈Z} again), we obtain the analogue of Lemma 5 for any n_1 < n_2 < n_3 < n_4 and l = n. With the same type of argument, we also obtain the analogous result to Lemma 6, and then we can proceed as in the proof of Proposition 2. □

Lemma 9. Under the assumptions of Theorem 3, the stated convergence holds for any 0 = t_0 < t_1 < ... < t_k = 1 and any a_1, ..., a_k ∈ H.

Proof. To simplify the notation, we introduce a triangular scheme V_{i,n} = ⟨a_j, h_1(X_i)⟩ for i = ⌊nt_{j−1}⌋ + 1, ..., ⌊nt_j⌋. By our assumptions, Cov(ε_i, ε_j) = w(|i − j|/q_n), so, conditionally on X_1, ..., X_n, we obtain for the variance the kernel estimator of the variance, which is consistent even for heteroscedastic time series under the assumptions of de Jong and Davidson [2000]. The L_2-NED property follows by Lemma 2. Note that the mixing coefficients for absolute regularity are larger than the strong mixing coefficients used by de Jong and Davidson [2000], so their mixing assumption follows directly from ours. □

Proposition 4. Under the assumptions of Theorem 3, we have the stated weak convergence, where W and W⋆ are independent Brownian motions with covariance operator as in Theorem 1.

Proof. We have to prove finite-dimensional convergence and tightness. As the tightness for the first component was already established in the proof of Theorem 1 of Sharipov et al.
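For illustration, dependent Gaussian multipliers with Cov(ε_i, ε_j) = w(|i − j|/q_n) can be generated from a Cholesky factor of their covariance matrix; the Bartlett window and the values of n and q_n below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, q_n = 200, 10  # illustrative sample size and bandwidth

def w_bartlett(u):
    """Bartlett window, one admissible choice for the multiplier covariance."""
    return np.maximum(0.0, 1.0 - np.abs(u))

# covariance matrix Sigma[i, j] = w(|i - j| / q_n)
idx = np.arange(n)
Sigma = w_bartlett((idx[:, None] - idx[None, :]) / q_n)

# dependent Gaussian multipliers: eps = L z with L the Cholesky factor
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))
eps = L @ rng.standard_normal(n)

# sanity check over many draws: Cov(eps_0, eps_1) should be close to w(1/q_n)
draws = L @ rng.standard_normal((n, 5000))
emp_cov = np.mean(draws[0] * draws[1])
print(emp_cov, w_bartlett(1 / q_n))
```

The small ridge added before the Cholesky factorization only guards against numerical rank deficiency; the Bartlett matrix itself is positive semidefinite.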
[2016], we only have to deal with the second component. The tightness of the partial sum process of h_1(X_i)ε_i, i ∈ N, can be shown along the lines of the proof of the same theorem: for this, note that by the independence of (ε_i)_{i≤n} and X the corresponding moment bound holds; the rest follows as in Lemma 2.24 of Borovkova et al. [2001] and in the proof of Theorem 1 of Sharipov et al. [2016].
For the finite-dimensional convergence, we will show the weak convergence of the second component (the partial sums of h_1(X_i)ε_i, i ∈ N) conditional on X_1, ..., X_n, because the weak convergence of the first component was already established in Proposition 1. By the continuity of the limit process, it is sufficient to study the distribution for t_1, ..., t_k ∈ Q ∩ [0, 1], and by the Cramér–Wold device and the separability of H, it is enough to show that the conditional distribution of the corresponding linear combinations is Gaussian with expectation 0 and variance converging to the right limit in probability, which holds by Lemma 9.
Using a well-known characterization of convergence in probability, for every subsequence there is a further subsequence along which this convergence holds almost surely. So we can construct a subsequence along which the almost sure convergence holds for all k, all t_1, ..., t_k ∈ Q ∩ [0, 1] and all a_1, ..., a_k from a countable dense subset of H; that is, a subsequence along which the convergence of the finite-dimensional distributions holds almost surely. Thus, the finite-dimensional convergence of the conditional distributions holds in probability, and the statement of the proposition is proved. □
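To make the bootstrap procedure concrete, here is a minimal sketch combining an estimated linear part h_1(X_i) with dependent multipliers to obtain a bootstrap p-value; the spatial sign kernel, the plug-in estimate of h_1, the Bartlett window and all sizes are illustrative assumptions (the simulations in the paper use m = 3000 bootstrap iterations).

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, q_n, m = 100, 5, 8, 200   # illustrative sizes; the paper uses m = 3000
X = rng.normal(size=(n, d))     # toy data under the null (no change)

def h(x, y):
    """Spatial sign kernel; antisymmetric."""
    diff = x - y
    nrm = np.linalg.norm(diff)
    return diff / nrm if nrm > 0 else np.zeros_like(diff)

# plug-in estimate of the linear part h_1(X_i)
h1 = np.array([np.mean([h(X[i], X[j]) for j in range(n) if j != i], axis=0)
               for i in range(n)])

# dependent Gaussian multipliers with Cov(eps_i, eps_j) = w(|i - j|/q_n)
idx = np.arange(n)
Sigma = np.maximum(0.0, 1.0 - np.abs(idx[:, None] - idx[None, :]) / q_n)
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))

def cusum_max(v):
    """max_k n^{-1/2} || sum_{i<=k} v_i - (k/n) sum_{i<=n} v_i ||."""
    S = np.cumsum(v, axis=0)
    dev = S - np.outer(np.arange(1, n + 1) / n, S[-1])
    return np.linalg.norm(dev, axis=1).max() / np.sqrt(n)

stat = cusum_max(h1)
boot = np.array([cusum_max(h1 * (L @ rng.standard_normal(n))[:, None])
                 for _ in range(m)])
p_value = np.mean(boot >= stat)
print(p_value)
```

The bootstrap statistic multiplies the same estimated linear part by a fresh set of dependent multipliers in each iteration, mimicking the conditional distribution established above.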

Proof of Main Results
Proof of Theorem 1. Using Hoeffding's decomposition as in Lemma 1 and the triangle inequality, we bound the maximum from above by the sum of the degenerate and the linear part. For the degenerate part, we can use the convergence to 0 from Proposition 2, since convergence in probability follows from almost sure convergence. Now observe that the linear part can be written as a functional of the partial sum process. By the continuous mapping theorem applied to x ↦ sup_{λ∈[0,1]} ∥x(λ) − λx(1)∥, the claimed convergence follows.
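For orientation only, the Wilcoxon-type statistic max_k n^{−3/2} ∥U_{n,k}∥ can be evaluated directly on toy data; the spatial sign kernel, the grid size and the mean shift below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 80, 10                  # n curves evaluated on a grid of d points (toy)
X = rng.normal(size=(n, d))
X[n // 2:] += 1.0              # mean shift after the midpoint (alternative)

def h(x, y):
    """Spatial sign kernel (x - y)/||x - y||; antisymmetric."""
    diff = x - y
    nrm = np.linalg.norm(diff)
    return diff / nrm if nrm > 0 else np.zeros_like(diff)

# pairwise kernel table H[i, j] = h(X_i, X_j)
H = np.zeros((n, n, d))
for i in range(n):
    for j in range(n):
        if i != j:
            H[i, j] = h(X[i], X[j])

# U_{n,k} = sum_{i<=k} sum_{j>k} h(X_i, X_j), maximized over k
stats = [np.linalg.norm(H[:k, k:].sum(axis=(0, 1))) / n ** 1.5
         for k in range(1, n)]
k_hat = int(np.argmax(stats)) + 1
print(k_hat)
```

With the strong shift built into the toy data, the maximizing index lands near the true change-point at n/2.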

□
Proof of Theorem 2. We can bound the maximum from below using the reverse triangle inequality and then make use of previous results, where by using the reverse triangle inequality again we can split off the deterministic drift. By Corollary 1 we know that the first part is stochastically bounded, and by Lemma 7 the second part converges. But since E[h(X_0, Z̃_0)] ≠ 0, the last part diverges to infinity. □

Proof of Theorem 3. We prove the convergence in distribution of max_{1≤k<n} n^{−3/2}∥U⋆_{n,k}∥ conditional on X_1, ..., X_n. For this, we apply the Hoeffding decomposition. The second sum converges to 0 by Proposition 3. The first summand can be split into three parts by a short calculation. By Proposition 4 and the continuous mapping theorem, we have the weak convergence to sup_λ ∥W⋆(λ) − λW⋆(1)∥ conditional on X_1, ..., X_n. For the second part, note that the variance converges to 0 for n → ∞ by our assumptions on q_n. Because the ε_i are Gaussian, the corresponding maximal bound follows, and by Theorem 1 of Móricz [1976] we can bound the expected maximum.

The size-power plots of Scenarios 5 and 6 (Figure 4) show that the spatial sign based test suffers less loss in power than the CUSUM test if the change-point lies closer to the beginning of the observations or if d becomes larger than n.
In particular, we see (Table 7) that in Scenario 5 with γ = 0.3, the power of both statistics is smaller than in Scenario 1, where the change-point is in the middle of the observations. Nevertheless, the empirical power of the spatial sign based test is still larger than that of the CUSUM test, and for α = 0.1 the spatial sign test still provides an empirical power of about 0.9. For γ = 0.15, we see a drastic decline in power for both statistics, with empirical power smaller than 0.4 even for α = 0.1. The spatial sign test nevertheless keeps a small advantage over the CUSUM test in this scenario.
In the last scenario we consider the situation d ≫ n. For the empirical size, we generated data as described in Section 3, but with n = 150 and d = 350, and obtained the values presented in Table 6. We see that the size of both statistics is even smaller than in Scenario 1. Looking at the empirical power (Table 7), we see a reduction in power for both statistics compared to Scenario 1. Nevertheless, the Wilcoxon-type test still provides greater empirical power than the CUSUM test; in particular for α = 0.1, the test using spatial signs still shows a power of about 0.9.

Figure 1. Daily average of PM10 in µg/m³ for 344 monitoring stations from January 1, 2020 to May 31, 2020. Each line corresponds to one station. The blue vertical line is the estimated change-point location. The massive outlier at January 1 could result from New Year's fireworks.

Figure 2. Daily rainfall (precipitation) in mm in Germany averaged over 1637 weather stations.

□

4.3. Results under the Alternative. Recall our model under the alternative: (X_n, Z_n)_{n∈Z} is a stationary, H ⊗ H-valued sequence and we observe Y_1, ..., Y_n, where Y_i = X_i before the change-point and Y_i = Z_i afterwards. Since the convergence of max_{1≤k<n} n^{−3/2}∥U_{n,k}∥ has already been established in Theorem 1, it is enough to prove the convergence in distribution of max_{1≤k<n} n^{−3/2}∥U⋆_{n,k}∥. The normalized partial sum of h_1(X_i) is stochastically bounded, see Proposition 1. For the third part, we consider increments of the partial sum and bound the variance of the increments, similarly as above, by Var(Σ_{i=l+1}^k ε_i) ≤ Ckq_n.

Table 1. Empirical p-values for CUSUM and spatial sign test with data-adapted bandwidth; m = 3000 bootstrap iterations were used.

Looking at the daily precipitation height from 1637 weather stations in Germany (01 January to 31 May 2020), another explanation for the change-point becomes visible: while January was relatively warm with little precipitation, February and the first half of March had a lot of it. Beginning in the middle of March, a relatively dry period started and lasted through April and May. (Data extracted from DWD Climate Data Center (CDC): Daily station observations precipitation height in mm, v19.3, 02.09.2020. https://cdc.dwd.de/portal/202107291811/mapview)

In contrast, the CUSUM test is very conservative in these situations.

Table 3. Empirical size of CUSUM and spatial sign test with Gaussian data, significance level α and different sample sizes n.

Table 4. Empirical size of CUSUM and spatial sign test with significance level α, sample size n = 200 and different distributions.

Power. To evaluate the performance of the test statistics in the presence of a change in mean, we construct four scenarios.

Table 5. Empirical power of CUSUM and spatial sign test for different significance levels α, Scenarios 1-4.

Appendix. The two additional scenarios analyse what happens if the change-point lies closer to the beginning of the observations or if d ≫ n. They are designed as follows: Scenario 5: uniform jump of +0.3 after γn observations.
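A minimal sketch of how Scenario 5 data can be generated, assuming Gaussian noise curves on a finite grid; the grid size and the noise law are illustrative, only the jump of +0.3 after γn observations comes from the text.

```python
import numpy as np

rng = np.random.default_rng(5)

def scenario5(n=150, d=50, gamma=0.3, jump=0.3):
    """Toy version of Scenario 5: uniform mean jump after gamma * n observations."""
    X = rng.normal(size=(n, d))   # illustrative Gaussian noise curves
    k_star = int(gamma * n)       # change-point location
    X[k_star:] += jump            # uniform jump at every grid point
    return X, k_star

X, k_star = scenario5()
diff = X[k_star:].mean() - X[:k_star].mean()
print(X.shape, k_star, round(diff, 3))
```

The empirical mean difference between the two segments recovers the jump height up to Monte Carlo error.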

Table 6. Empirical size of CUSUM and spatial sign test for different significance levels α, Scenario 6.

Table 7. Empirical power of CUSUM and spatial sign test for different significance levels α, Scenarios 5 and 6.