First of all, I would like to compliment the authors for writing this very fine survey paper on some recent developments in change-point analysis. There is a lot of current research activity devoted to change-point analysis, and many articles have appeared since the publication of the seminal book by Csörgő and Horváth (1997). Among the topics covered in the present paper are empirical process techniques, Darling–Erdős laws, changes in correlations, changes in regression parameters, sequential testing, panel models, and functional data. Horváth and Rice provide an excellent survey of some of the recent results, which will be helpful to all researchers in this area.

In my discussion, I will complement the paper by Horváth and Rice by presenting some recent results on U-statistics-based robust change-point tests for time series. In a series of papers (Dehling and Fried 2012; Dehling et al. 2013a, b, c; Betken 2014), we have investigated such tests and derived their asymptotic distributions for both short-range and long-range dependent time series.

We consider a model where the data are generated by \( X_i=\mu _i+\epsilon _i\), where \(\mu _i\) is an unknown signal and \(\epsilon _i\) is a stationary ergodic noise process with \(E(\epsilon _i)=0\). Given the observations \(X_1,\ldots ,X_n\), we wish to test the hypothesis of no change, i.e. \( H: \mu _1=\cdots =\mu _n\), against the alternative

$$\begin{aligned} A: \mu _1=\cdots =\mu _k\ne \mu _{k+1}=\cdots =\mu _n,\quad \text{ for } \text{ some } 1\le k\le n-1. \end{aligned}$$

The motivation for our change-point tests, as for many other change-point tests, arises from the two-sample problem that one obtains when the change point \(k\) is assumed to be known. In this situation, we are given two samples \(X_1,\ldots ,X_k,\) and \(X_{k+1},\ldots ,X_n\), and want to test for a difference in location. In this paper, we focus on three tests, namely the Gauss test, the Wilcoxon test and the Hodges–Lehmann test, which are associated with the test statistics

$$\begin{aligned} T_{n,1}(k)&= \frac{1}{n-k}\sum _{i=k+1}^n X_i -\frac{1}{k} \sum _{i=1}^k X_i \\ T_{n,2}(k)&= \sum _{i=1}^k \sum _{j=k+1}^n 1_{\{X_i\le X_j\}} \\ T_{n,3}(k)&= \hbox {median} \{(X_j-X_i): 1\le i\le k, k+1\le j\le n\}, \end{aligned}$$

respectively. When the change point is unknown, we consider some summary statistic, e.g. a weighted maximum \(\max _{1\le k\le n-1} |a_n(k)T_{n,i}(k)|\).
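To make the three statistics concrete, the following Python sketch computes \(T_{n,1}\), \(T_{n,2}\) and \(T_{n,3}\) for a candidate change point \(k\), together with a weighted maximum scan. The function names are mine, and the CUSUM-type weights \(a_n(k)=\frac{k}{n}(1-\frac{k}{n})\) are one common choice of weighting, not the only one.

```python
import numpy as np

def gauss_stat(x, k):
    """T_{n,1}(k): difference of sample means after/before candidate point k."""
    return x[k:].mean() - x[:k].mean()

def wilcoxon_stat(x, k):
    """T_{n,2}(k): number of pairs (i, j) with i <= k < j and X_i <= X_j."""
    return np.sum(x[:k, None] <= x[None, k:])

def hodges_lehmann_stat(x, k):
    """T_{n,3}(k): median of the pairwise differences X_j - X_i, i <= k < j."""
    return np.median(x[k:, None] - x[None, :k])

def max_scan(x, stat, center=0.0):
    """Weighted maximum over all candidate change points, using the
    CUSUM-type weights a_n(k) = (k/n)(1 - k/n); `center` subtracts a
    known centering (e.g. 0 for T_{n,1} and T_{n,3})."""
    n = len(x)
    return max(abs((k / n) * (1 - k / n) * (stat(x, k) - center))
               for k in range(1, n))
```

For data with a single level shift, all three scans peak near the true change point; they differ in their robustness to heavy tails and outliers.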

To derive the asymptotic distribution of these change-point test statistics, we study weak convergence of the processes \((T_{n,i}([n\lambda ]))_{0\le \lambda \le 1}\), properly normalized. For the first two statistics, this leads to the two-sample U-statistic process

$$\begin{aligned} U_{[n\lambda ], n-[n\lambda ]} =\frac{1}{[n\lambda ](n-[n\lambda ])} \sum _{i=1}^{[n\lambda ]} \sum _{j=[n\lambda ]+1}^n h(X_i,X_j), \quad 0\le \lambda \le 1. \end{aligned}$$

The Hodges–Lehmann estimator is the median of the empirical distribution of the pairwise differences \(X_j-X_i, 1\le i \le k < j\le n\). Thus, we are led to the study of the empirical distribution and the empirical quantiles of arbitrary kernels \(g(X_i,X_j), 1\le i\le k < j\le n\). We define the empirical distribution function

$$\begin{aligned} U_{[n\lambda ],n-[n\lambda ]}(t)=\frac{1}{[n\lambda ](n-[n\lambda ])} \sum _{i=1}^{[n\lambda ]} \sum _{j=[n\lambda ]+1}^n 1_{\{g(X_i,X_j)\le t \}}, \quad 0\le \lambda \le 1, t\in \mathbb {R}, \end{aligned}$$

and the quantile function \(Q_{[n\lambda ],n-[n\lambda ]}(p)=U^{-1}_{[n\lambda ],n-[n\lambda ]}(p)\), where \(U^{-1}\) denotes the generalized inverse.
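A minimal sketch of these two objects in Python (illustrative; the function names are mine): the empirical distribution function of the pairwise kernel values, and its generalized inverse computed as an order statistic. With \(g(x,y)=y-x\) and \(p=1/2\), the quantile is exactly the Hodges–Lehmann statistic.

```python
import numpy as np

def two_sample_edf(x, k, g, t):
    """U_{k,n-k}(t): fraction of pairs i <= k < j with g(X_i, X_j) <= t."""
    vals = g(x[:k, None], x[None, k:])
    return np.mean(vals <= t)

def two_sample_quantile(x, k, g, p):
    """Q_{k,n-k}(p) = inf{t : U_{k,n-k}(t) >= p}, the generalized inverse,
    computed as the ceil(p*N)-th order statistic of the N pairwise values."""
    vals = np.sort(g(x[:k, None], x[None, k:]).ravel())
    idx = int(np.ceil(p * len(vals))) - 1
    return vals[max(idx, 0)]
```

Note that the generalized inverse is well defined even though the empirical distribution function is a step function with flat stretches.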

We have obtained results under both short-range dependent (SRD) and long-range dependent (LRD) noise. We first focus on the SRD case, where we assume that the noise has a representation as a functional of a \(\beta \)-mixing process \((Z_i)_{i\in \mathbb {Z}}\), i.e. that \(\epsilon _i=f(Z_i,Z_{i-1},\ldots )\), where \(f\) is a Lipschitz continuous function. This covers the standard examples from time series analysis, such as ARMA and GARCH models, as well as many deterministic dynamical systems, such as expanding maps of the unit interval.

Dehling et al. (2013a) have studied the asymptotic distribution of the two-sample U-statistic process in the case of SRD noise, extending earlier results of Csörgő and Horváth (1988) for IID noise. Under some technical conditions concerning the \(\beta \)-mixing coefficients and the continuity of \(f\) and \(h\), and under the null hypothesis of no change, we could show that

$$\begin{aligned} \sqrt{n} \lambda (1-\lambda ) (U_{[n\lambda ], n-[n\lambda ]}-\theta )_{0\le \lambda \le 1} \mathop {\rightarrow }\limits ^{\mathcal {D}} ((1\!-\!\lambda )W_1(\lambda ) +\lambda (W_2(1)-W_2(\lambda )))_{0\le \lambda \le 1}, \end{aligned}$$

where \((W_1,W_2)\) is a two-dimensional Brownian motion with covariance structure

$$\begin{aligned} \hbox {Cov}(W_i(\lambda ), W_j(\mu )) =(\lambda \wedge \mu ) \sum _{k\in \mathbb {Z}} \hbox {Cov} (h_i(X_0),h_j(X_k)), \end{aligned}$$

for \(i,j\in \{1,2\}\) and \(0\le \lambda ,\mu \le 1\). The functions \(h_i\) are the first-order terms in the Hoeffding decomposition of the kernel \(h\), given by \(h(x,y)=\theta +h_1(x)+h_2(y)+\psi (x,y)\). Here, \(\theta =Eh(X,Y)\), \(h_1(x)=Eh(x,Y)-\theta \), \(h_2(y)=Eh(X,y)-\theta \) and \(\psi (x,y)=h(x,y)-\theta -h_1(x)-h_2(y)\), where \(X,Y\) are independent random variables with the same distribution as \(X_1\).
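As a concrete illustration, consider the Wilcoxon kernel \(h(x,y)=1_{\{x\le y\}}\) with a continuous marginal distribution function \(F\); its Hoeffding decomposition has

$$\begin{aligned} \theta =P(X\le Y)=\frac{1}{2}, \quad h_1(x)=P(x\le Y)-\frac{1}{2}=\frac{1}{2}-F(x), \quad h_2(y)=F(y)-\frac{1}{2}. \end{aligned}$$

In particular, \(h_1(X_0)\) and \(h_2(X_k)\) are centred transforms of \(F(X_0)\) and \(F(X_k)\), so the covariance series above reduces to sums of terms \(\hbox {Cov}(F(X_0),F(X_k))\).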

Applying the above limit theorem for the two-sample U-process to the kernels \(h(x,y;t)=1_{\{ g(x,y)\le t\}}\), \(t\in \mathbb {R}\), one obtains, for fixed \(t\), convergence of the two-sample empirical U-process \((\sqrt{n}\lambda (1-\lambda ) (U_{[n\lambda ],n-[n\lambda ]}(t)-U(t)))_{0\le \lambda \le 1}\), where \(U(t)=P(g(X,Y)\le t)\). Using a Bahadur–Kiefer representation, one can then obtain convergence of the corresponding two-sample quantile process. In a forthcoming paper, Dehling et al. (2014) show that, again under some technical conditions concerning the \(\beta \)-mixing coefficients and the continuity of \(f\), \(g\) and \(U\), and under the null hypothesis of no change, the two-sample quantile process

$$\begin{aligned} \sqrt{n}\lambda (1-\lambda ) (Q_{[n\lambda ],n-[n\lambda ] }(p) -Q(p))_{0\le \lambda \le 1} \end{aligned}$$

converges in distribution to the process \(((1-\lambda ) W_1(\lambda ) +\lambda (W_2(1)-W_2(\lambda )))_{0\le \lambda \le 1}\), where \((W_1(\lambda ),W_2(\lambda ))\) is a two-dimensional Brownian motion with covariance function

$$\begin{aligned} \hbox {Cov}(W_i(\mu ),W_j(\lambda )) =\frac{\mu \wedge \lambda }{u^2(Q(p))} \sum _{k \in \mathbb {Z}} \hbox {Cov}(h_i(X_0;Q(p)), h_j(X_k;Q(p))). \end{aligned}$$

Here, \(Q(p)=U^{-1}(p)\) denotes the quantile function. Moreover, we denote the first-order terms of the Hoeffding decomposition of \(h(x,y;t)\) by \(h_1(x;t)\) and \(h_2(y;t)\), and define \(u(t)=U^\prime (t)\).

When applying this result to the case of the Hodges–Lehmann change-point test, we can show that

$$\begin{aligned} \sqrt{n}\max _{1\le k\le n-1} \frac{k}{n}\left( 1-\frac{k}{n}\right) |\hbox {median}\{ (X_j-X_i): 1\le i\le k<j\le n \}| \end{aligned}$$

has asymptotically the same distribution as \(\frac{\sigma }{u(0)}\sup _{0\le \lambda \le 1} |W^{(0)}(\lambda )|\), where \(\sigma ^2=\sum _{k=-\infty }^\infty \hbox {Cov}(F(X_0),F(X_k))\) and \((W^{(0)}(\lambda ))_{0\le \lambda \le 1}\) denotes a standard Brownian bridge.
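The supremum of the absolute value of a Brownian bridge follows the Kolmogorov distribution, so critical values for this test can be computed from its classical series expansion. A small Python sketch (in practice the scale factor \(\sigma /u(0)\) must additionally be estimated from the data):

```python
import math

def sup_brownian_bridge_tail(x, terms=100):
    """P(sup_{0<=t<=1} |W^(0)(t)| > x) for a standard Brownian bridge:
    the Kolmogorov distribution tail 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))
```

For instance, the familiar Kolmogorov critical values 1.358 and 1.628 correspond to tail probabilities of roughly 0.05 and 0.01.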

Dehling et al. (2013b) have studied the asymptotic distribution of the Wilcoxon change-point test in the case of LRD noise. We specifically consider Gaussian subordinated processes, i.e. we assume that \(\epsilon _i=H(\xi _i)\), where \((\xi _i)_{i\ge 1}\) is a stationary Gaussian process with standard normal marginals and autocorrelation function \(\rho _k=k^{-D} L(k)\), \(0<D<1\), \(L\) slowly varying, and where \(H\) is some measurable real-valued function. Moreover, we define \(J_k(x)=E(1_{\{ H(\xi )\le x\}} H_k(\xi ))\), where \(H_k\) is the \(k\)th-order Hermite polynomial. The smallest integer \(m\) such that \(J_m(x)\not \equiv 0\) is called the Hermite rank. Define the normalizing constants \(d_n^{2}=\mathrm {Var}(\sum _{i=1}^n H_m(\xi _i))\) and recall that \(d_n^{2}\sim c n^{2H}L^m(n)\), where \(H:=1-mD/2\) is the Hurst coefficient.
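The Hermite rank of the class \(\{1_{\{H(\xi )\le x\}}\}\) can often be found analytically, but a quick Monte Carlo check of \(J_k(x)\) is easy to sketch in Python. The transformations used below, \(H(z)=z\) and \(H(z)=z^2\), are my own illustrations: by symmetry the latter has \(J_1\equiv 0\) but \(J_2\not \equiv 0\), hence Hermite rank 2.

```python
import numpy as np

# Probabilists' Hermite polynomials H_k of low order.
HERMITE = {1: lambda z: z,
           2: lambda z: z**2 - 1,
           3: lambda z: z**3 - 3 * z}

def estimate_J(H, k, x, n=200_000, seed=0):
    """Monte Carlo estimate of J_k(x) = E[1{H(xi) <= x} H_k(xi)],
    with xi standard normal."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    return np.mean((H(z) <= x) * HERMITE[k](z))
```

For the identity \(H(z)=z\) one has \(J_1(x)=-\varphi (x)\ne 0\), so the rank is 1; the estimate at \(x=0\) should be close to \(-\varphi (0)\approx -0.399\).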

Assuming that \(m<1/D\), Dehling et al. (2013b) showed that, under the null hypothesis of no change, the process

$$\begin{aligned} \frac{1}{n\,d_n}\left( \sum _{i=1}^{[n\lambda ]} \sum _{j=[n\lambda ]+1}^n \left( 1_{\{X_i\le X_j \}} -\frac{1}{2}\right) \right) _{0\le \lambda \le 1} \end{aligned}$$

converges in distribution to the process \(\frac{\int J_m(x) dF(x)}{m!} (Z_m(\lambda )-\lambda Z_m(1))_{0\le \lambda \le 1}\), where \((Z_m(\lambda ))_{0\le \lambda \le 1}\) denotes an \(m\)th-order Hermite process. In a recent paper, Betken (2014) gives a self-normalized version of the Wilcoxon change-point test which is asymptotically distribution free under the null hypothesis.

In a subsequent paper, Dehling et al. (2013c) have investigated the asymptotic distribution of the Wilcoxon and the CUSUM change-point tests under local alternatives, and calculated their asymptotic relative efficiency (ARE). For Gaussian errors, the ARE equals \(1\), while for heavy-tailed data, the Wilcoxon test has superior power. Simulations confirm these results for finite samples.