1 Introduction

The problem of a mean change at an unknown location in a sequence of observations has received considerable attention in the literature. For example, Sen and Srivastava [1], Hawkins [2], Worsley [3] proposed tests for a change in the mean of normal series. Yao [4] proposed some estimators of the change point in a sequence of independent variables. For serially correlated data, Bai [5] considered the estimation of the change point in linear processes. Horváth and Kokoszka [6] gave an estimator of the change point in a long-range dependent series.

Most of the existing results in the statistics and econometrics literature have concentrated on the case where the innovations are Gaussian. In fact, many economic and financial time series can be very heavy-tailed with infinite variances; see e.g. Mittnik and Rachev [7]. Series with infinite-variance innovations have therefore attracted a great deal of interest from researchers in statistics, such as Phillips [8], Horváth and Kokoszka [9], Han and Tian [10, 11]. For heavy-tailed innovations it is more efficient to construct robust procedures, such as the M procedures in Hušková [12, 13] and the references therein. De Jong et al. [14] proposed a robust KPSS test based on the ‘sign’ of the data minus the sample median, which behaves rather well for heavy-tailed series. In this paper, we construct a robust test for a mean change in a sequence.

The rest of this paper is organized as follows. Section 2 introduces the model and the assumptions needed for the asymptotic properties. Section 3 gives the asymptotic distribution and the consistency of the proposed test. Section 4 examines the finite-sample behavior of the test through simulations and an empirical application. All mathematical proofs are collected in the Appendix.

2 Model and assumptions

In the following, we consider the model

$$ Y_{t}=\mu(t)+X_{t},\quad \mu(t)=\left \{ \begin{array}{@{}l@{\quad}l} \mu_{1}, & t\leq{k_{0}},\\ \mu_{2}, & t>{k_{0}}, \end{array} \right . $$
(1)

where \(k_{0}\) is the change point.

In order to obtain the weak convergence and the convergence rate, we assume that \(X_{t}\) satisfies the following.

Assumption 1

  1.

    The \(X_{j}\) are strictly stationary random variables, and \(\tilde{\mu}\) is the unique population median of \(\{X_{t}, 1\leq{t}\leq{T}\}\).

  2.

    The \(X_{j}\) are strong (α-) mixing, and for some finite \(r>2\) and \(C>0\), and for some \(\eta>0\), \(\alpha(m)\leq Cm^{-r/(r-2)-\eta}\).

  3.

    \(X_{j}-\tilde{\mu}\) has a continuous density \(f(x)\) in a neighborhood \([-\eta,\eta]\) of 0 for some \(\eta>0\), and \(\inf_{x\in[-\eta,\eta]}f(x)>0\).

  4.

    \(\sigma^{2}\in(0,\infty)\), where \(\sigma^{2}\) is defined as follows:

    $$\sigma^{2}=\lim_{T\rightarrow\infty}E \Biggl(T^{-1/2}\sum _{t=1}^{T}\operatorname{sgn}(X_{t}- \tilde{\mu}) \Biggr)^{2}. $$

To derive the CLT of sign-transformed data, we need a kernel estimator, so we make the following assumption on the kernel function.

Assumption 2

  1.

    \(k(\cdot)\) satisfies \(\int_{-\infty}^{\infty}|\psi(\xi)|\,d\xi<\infty\), where

    $$\psi(\xi)=(2\pi)^{-1}\int_{-\infty}^{\infty}k(x) \exp(-ix\xi)\,dx. $$

  2.

    \(k(x)\) is continuous at all but a finite number of points, \(k(x)=k(-x)\), \(|k(x)|\leq l(x)\) where \(l(x)\) is nonincreasing and \(\int_{0}^{\infty}|l(x)|\,dx<\infty\), and \(k(0)=1\).

  3.

    The bandwidth \(\gamma_{T}\) satisfies \(\gamma_{T}/T\rightarrow0\) and \(\gamma_{T}\rightarrow\infty\) as \(T\rightarrow\infty\).

Remark 1

De Jong et al. [14] test the stationarity of a sequence under Assumption 1. We detect a change in the mean of a sequence, so Assumption 1 is imposed on \(X_{t}\) under both the null and the alternative hypotheses. Since there is no moment condition for \(X_{t}\) in Assumption 1, even Cauchy series are allowed. The class of α-mixing sequences includes many time series, such as autoregressive or heteroscedastic series under some conditions. Assumption 2 admits common choices such as the Bartlett, quadratic spectral, and Parzen kernel functions.
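The three kernels named in Remark 1 can be written down explicitly. The sketch below uses their standard textbook forms; the function names are ours, and the source does not state which normalization of each kernel it uses, so the constants should be read as an assumption.

```python
import math


def bartlett(x: float) -> float:
    """Bartlett (triangular) kernel: 1 - |x| on [-1, 1], zero elsewhere."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def parzen(x: float) -> float:
    """Parzen kernel: piecewise cubic on [-1, 1], zero elsewhere."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax**2 + 6.0 * ax**3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def quadratic_spectral(x: float) -> float:
    """Quadratic spectral kernel; it has unbounded support."""
    if x == 0.0:
        return 1.0  # value at the origin, where the formula below is 0/0
    z = 6.0 * math.pi * x / 5.0
    return 25.0 / (12.0 * math.pi**2 * x**2) * (math.sin(z) / z - math.cos(z))
```

All three are symmetric and equal 1 at the origin, as Assumption 2 requires. Only the quadratic spectral kernel gives nonzero weight to every lag.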

3 Main results

Let \(m_{T}=\operatorname{med}\{Y_{1},\ldots,Y_{T}\}\). Then we transform the data \(Y_{1},\ldots, Y_{T}\) into the indicator data \(\operatorname{sgn}(Y_{t}-m_{T})\), where \(\operatorname{sgn}(x)=1\) if \(x>0\), \(\operatorname{sgn}(x)=-1\) if \(x<0\), \(\operatorname{sgn}(x)=0\) if \(x=0\). Based on these indicator data, De Jong et al. [14] replace \(\hat{\epsilon}_{t}=Y_{t}-\bar{Y}_{T}\) with \(\operatorname{sgn}(Y_{t}-m_{T})\) in the usual KPSS test and their simulations show that the new KPSS test exhibits some robustness for the heavy-tailed series.
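As a minimal sketch of this transformation, the snippet below maps a sample to its signs around the sample median. The helper names `sgn` and `sign_transform` are ours. For an even sample size, `statistics.median` averages the two middle values, which is one convention among several and is an assumption here.

```python
import statistics


def sgn(x: float) -> int:
    """Sign function: 1 if x > 0, -1 if x < 0, 0 if x == 0."""
    return (x > 0) - (x < 0)


def sign_transform(y):
    """Return sgn(Y_t - m_T) for every observation, with m_T the sample median."""
    m_T = statistics.median(y)
    return [sgn(v - m_T) for v in y]
```

For example, `sign_transform([1, 5, 1000, 2, 3])` gives `[-1, 1, 1, -1, 0]`. The extreme value 1000 contributes no more than any other observation above the median, which is the source of the robustness.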

A widely used test for a mean change is of the CUSUM type:

$$ \Xi_{T}(\tau)=\frac{[T\tau][T(1-\tau)]}{T^{2}} \Biggl\{ \frac{1}{[T\tau]}\sum _{t=1}^{[T\tau]}{Y_{t}} -\frac{1}{[T(1-\tau)]}\sum _{t=[T\tau]+1}^{T}{Y_{t}} \Biggr\} . $$
(2)

We rewrite \(\Xi_{T}(\tau)\) under \(H_{0}\) as

$$ \Xi_{T}(\tau)=\frac{[T\tau][T(1-\tau)]}{T^{2}} \Biggl\{ \frac{1}{[T\tau]} \sum_{t=1}^{[T\tau]}{(Y_{t}- \bar{Y}_{T})} -\frac{1}{[T(1-\tau)]}\sum_{t=[T\tau]+1}^{T}{(Y_{t}- \bar{Y}_{T})} \Biggr\} . $$
(3)
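To see why (2) and (3) agree, it may help to check the identity at the grid points \(\tau=k/T\), where \([T\tau]=k\) and the second block contains exactly \(T-k\) terms. Restricting to these points is our simplification, and \(S_{k}\) below is our notation. Subtracting \(\bar{Y}_{T}\) from both averages leaves their difference unchanged:

$$ \frac{1}{k}\sum_{t=1}^{k}(Y_{t}-\bar{Y}_{T})-\frac{1}{T-k}\sum_{t=k+1}^{T}(Y_{t}-\bar{Y}_{T}) =\frac{1}{k}\sum_{t=1}^{k}Y_{t}-\frac{1}{T-k}\sum_{t=k+1}^{T}Y_{t}. $$

Moreover, writing \(S_{k}=\sum_{t=1}^{k}(Y_{t}-\bar{Y}_{T})\) and using \(\sum_{t=1}^{T}(Y_{t}-\bar{Y}_{T})=0\), the second sum equals \(-S_{k}\), so that

$$ \Xi_{T}(k/T)=\frac{k(T-k)}{T^{2}} \biggl\{ \frac{S_{k}}{k}+\frac{S_{k}}{T-k} \biggr\} =\frac{S_{k}}{T}, $$

which is the familiar partial-sum form of the CUSUM statistic.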

Following the idea of De Jong et al. [14], we replace \(\hat{\epsilon}_{t}=Y_{t}-\bar{Y}_{T}\) with \(\operatorname{sgn}(Y_{t}-m_{T})\) in (3) and obtain a robust version of the CUSUM statistic:

$$ \Xi_{T}(\tau)=\frac{[T\tau][T(1-\tau)]}{T^{2}} \Biggl\{ \frac{1}{[T\tau]}\sum _{t=1}^{[T\tau]}\operatorname{sgn}(Y_{t}-m_{T}) -\frac{1}{[T(1-\tau)]}\sum_{t=[T\tau]+1}^{T} \operatorname{sgn}(Y_{t}-m_{T}) \Biggr\} . $$
(4)

Then the test statistic proposed in this paper is

$$ \Gamma_{T}=T^{1/2}{\sigma}^{-1}\max _{\tau\in{(0,1)}}\bigl|\Xi_{T}(\tau)\bigr|. $$
(5)
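The following sketch evaluates (4) and (5) on the grid \(\tau=k/T\), \(k=1,\ldots,T-1\), for a known \(\sigma\). Restricting the maximum to this grid and using \(T-k\) as the length of the second block are our assumptions, and the function names are hypothetical.

```python
import math
import statistics


def robust_cusum_path(y):
    """Values of Xi_T(k/T) from (4) for k = 1, ..., T-1 (list index k-1)."""
    T = len(y)
    m_T = statistics.median(y)
    s = [(v > m_T) - (v < m_T) for v in y]  # sgn(Y_t - m_T)
    total = sum(s)
    partial = 0
    path = []
    for k in range(1, T):
        partial += s[k - 1]
        left_mean = partial / k                  # average sign up to k
        right_mean = (total - partial) / (T - k)  # average sign after k
        path.append(k * (T - k) / T**2 * (left_mean - right_mean))
    return path


def gamma_statistic(y, sigma):
    """Gamma_T from (5): sqrt(T) / sigma times the largest |Xi_T(k/T)|."""
    T = len(y)
    return math.sqrt(T) / sigma * max(abs(v) for v in robust_cusum_path(y))
```

For the toy sample `[0, 1, 2, 3, 10, 11, 12, 13]` the path has its largest absolute value, 0.5, at `k = 4`, where the level shifts.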

Under Assumptions 1, 2, we can obtain two asymptotic results as follows.

Theorem 1

If Assumptions 1, 2 hold, then under the null hypothesis \(H_{0}\), we have

$$ T^{1/2}{\sigma}^{-1}\max_{\tau\in{(0,1)}}\bigl|\Xi_{T}(\tau)\bigr| \quad\Longrightarrow\quad \sup_{\tau\in{(0,1)}}\bigl|W(\tau)- \tau W(1)\bigr|, \quad\textit{as } T\rightarrow\infty, $$
(6)

where ‘\(\Longrightarrow\)’ stands for weak convergence and \(W(\cdot)\) is a standard Brownian motion.

Under the alternative hypothesis \(H_{1}\), a change in the mean happens at some time, which we denote by \([T\tau_{0}]\). Let \(F(\cdot)\) be the common distribution function of \(X_{t}\) and let \(\mu^{*}\) be the median of

$$ F^{*}(\cdot)=\tau_{0}F(\cdot-\mu_{1})+(1- \tau_{0})F(\cdot-\mu_{2}). $$
(7)

Then we have the following.

Theorem 2

If Assumptions 1, 2 hold, then under the alternative hypothesis \(H_{1}\), we have

$$ \max_{\tau\in{(0,1)}}\bigl|\Xi_{T}(\tau)\bigr| \stackrel{P}{ \rightarrow}\tau_{0}(1-\tau_{0})|\Delta|, $$
(8)

where \(\Delta=F(\mu^{*}-\mu_{1})-F(\mu^{*}-\mu_{2})\).
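As an illustration of the quantities in (7) and (8), consider a special case that is not part of Theorem 2: suppose that \(F\) is continuous and symmetric about 0 and that \(\tau_{0}=1/2\), and write \(d=\mu_{2}-\mu_{1}\) (our notation). Evaluating (7) at the midpoint of the two means gives

$$ F^{*}\biggl(\frac{\mu_{1}+\mu_{2}}{2}\biggr)=\frac{1}{2}F\biggl(\frac{d}{2}\biggr)+\frac{1}{2}F\biggl(-\frac{d}{2}\biggr)=\frac{1}{2}, $$

so \(\mu^{*}=(\mu_{1}+\mu_{2})/2\) and

$$ \Delta=F\biggl(\frac{d}{2}\biggr)-F\biggl(-\frac{d}{2}\biggr)=2F\biggl(\frac{d}{2}\biggr)-1. $$

In this case the limit in (8) equals \(|2F(d/2)-1|/4\), which is nonzero whenever \(d\neq0\) because the median of \(F\) is unique.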

Remark 2

By Theorem 1, we reject \(H_{0}\) if \(\Gamma_{T}>c_{p}\), where the critical value \(c_{p}\) is the \((1-p)\) quantile of the Kolmogorov-Smirnov distribution. By Theorem 2, \(\Gamma_{T}\) diverges at the rate \(T^{1/2}\) under \(H_{1}\) whenever \(\Delta\neq0\), so the test is consistent as the sample size \(T\rightarrow\infty\).
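The critical value \(c_{p}\) can be computed numerically. The sketch below relies on the classical series for the distribution of the supremum of the absolute value of a Brownian bridge, \(P(\sup|B|\leq x)=1-2\sum_{k\geq1}(-1)^{k-1}e^{-2k^{2}x^{2}}\), and inverts it by bisection. The function names, the truncation at 100 terms, and the search interval are our choices.

```python
import math


def kolmogorov_cdf(x: float, terms: int = 100) -> float:
    """P(sup |W(t) - t W(1)| <= x) via the truncated alternating series."""
    if x <= 0.0:
        return 0.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
                 for k in range(1, terms + 1))
    return 1.0 - 2.0 * series


def critical_value(p: float, lo: float = 0.2, hi: float = 5.0,
                   tol: float = 1e-10) -> float:
    """Bisection for c_p, the (1 - p) quantile of the limit distribution."""
    target = 1.0 - p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `p = 0.05` this returns approximately 1.358.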

In order to apply the test in (5), we replace the unknown \(\sigma^{2}\) with the HAC estimator

$$ \hat{\sigma}_{T}^{2}=T^{-1}\sum _{i=1}^{T}\sum_{j=1}^{T}k \bigl((i-j)/\gamma_{T}\bigr)\operatorname{sgn}(Y_{i}-m_{T}) \operatorname{sgn}(Y_{j}-m_{T}). $$
(9)

The following theorem gives the limit of the estimator \(\hat{\sigma}_{T}^{2}\) under \(H_{0}\) and \(H_{1}\), respectively.

Theorem 3

(i) If the conditions of Theorem 1 hold, then, as \(T\rightarrow\infty\),

$$ \hat{\sigma}^{2}_{T}\stackrel{P}{\rightarrow} \sigma^{2}. $$
(10)

(ii) If the conditions of Theorem 2 hold, then, as \(T\rightarrow\infty\),

$$ \hat{\sigma}^{2}_{T}\stackrel{P}{\rightarrow} \sigma_{2}^{2}, $$
(11)

where \(\sigma^{2}_{2}\) is defined as follows:

$$\sigma^{2}_{2}=\lim_{T\rightarrow\infty}E \Biggl(T^{-1/2}\sum_{t=1}^{T} \operatorname{sgn}\bigl(Y_{t}-\mu^{*}\bigr) \Biggr)^{2}. $$
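The estimator in (9) can be sketched as follows. Because the kernel is symmetric, the double sum is grouped by lag \(h=|i-j|\) rather than looped over all pairs. The Bartlett kernel is used as the default purely for illustration, and the function names are ours.

```python
import statistics


def bartlett(x: float) -> float:
    """Bartlett kernel, one of the choices allowed by Assumption 2."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def hac_sign_variance(y, bandwidth, kernel=bartlett):
    """HAC estimator (9) computed from the signs sgn(Y_t - m_T)."""
    T = len(y)
    m_T = statistics.median(y)
    s = [(v > m_T) - (v < m_T) for v in y]
    # Lag 0 term of the double sum.
    total = kernel(0.0) * sum(v * v for v in s)
    # Lags +h and -h contribute equally because k(x) = k(-x).
    for h in range(1, T):
        weight = kernel(h / bandwidth)
        if weight == 0.0:
            continue
        total += 2.0 * weight * sum(s[t] * s[t + h] for t in range(T - h))
    return total / T
```

Grouping by lag reduces the work to roughly \(T\gamma_{T}\) operations for kernels with bounded support, instead of \(T^{2}\) for the literal double sum.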

4 Simulation and empirical application

4.1 Simulation

In this section, we present Monte Carlo simulations to investigate the size and the power of the robust CUSUM and the ordinary CUSUM tests. Since a lot of information is lost by using the indicator data instead of the original data, we ask two questions: is the indicator CUSUM test robust to heavy-tailed sequences, and how large is the loss in power from using indicators when the data have a nearly normal distribution? The HAC estimator \(\hat{\sigma}^{2}_{T}\) in the robust CUSUM test is a kernel estimator, so it is also important to analyze whether the performance is affected by the choice of the kernel function \(k(\cdot)\) and the bandwidth \(\gamma_{T}\).

We consider the model as follows:

$$ Y_{t}=\left \{ \begin{array}{@{}l@{\quad}l} 0+X_{t}, & t\leq{T\tau_{0}},\\ \mu_{2}+X_{t}, & t>{T\tau_{0}}, \end{array} \right . $$
(12)

Here \(X_{t}\) is an autoregressive process \(X_{t}=0.5X_{t-1}+e_{t}\), where the \(\{e_{t}\}\) are independent noise variables generated by the program from J. P. Nolan. We vary the tail thickness of \(\{e_{t}\}\) through the characteristic indices \(\alpha=1.97, 1.83, 1.41, 1.14\), and we consider the break fractions \(\tau_{0}=0.3\) and \(0.5\). Throughout the simulations, we adopt 1.358 as the asymptotic critical value of \(\sup_{\tau\in{(0,1)}}|W(\tau)-\tau W(1)|\) at the 5% significance level (the 95% quantile) for the sample sizes \(T=300, 500, 1{,}000\).
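A data-generating step of this kind can be sketched as follows. The source uses Nolan's program for the innovations; the snippet instead swaps in the Chambers-Mallows-Stuck formula for symmetric stable draws, so symmetry, unit scale, the burn-in length, and all function names are our assumptions rather than the paper's settings.

```python
import math
import random


def symmetric_stable(alpha: float, rng: random.Random) -> float:
    """One symmetric alpha-stable draw (Chambers-Mallows-Stuck, beta = 0)."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def simulate_series(T, tau0, mu2, alpha, rho=0.5, burn_in=200, seed=None):
    """Draw Y_1..Y_T from model (12) with AR(1) errors X_t = rho X_{t-1} + e_t."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(burn_in):           # discard start-up values (our choice)
        x = rho * x + symmetric_stable(alpha, rng)
    k0 = math.floor(T * tau0)          # last observation before the change
    y = []
    for t in range(1, T + 1):
        x = rho * x + symmetric_stable(alpha, rng)
        y.append(x if t <= k0 else mu2 + x)
    return y
```

Setting `mu2=0.0` produces a series under the null hypothesis, which is what the size experiments require.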

First, we consider the size of the tests. Tables 1 and 2 report the results when \(\sigma^{2}\) is estimated by the Bartlett kernel and the quadratic spectral kernel with the bandwidths \(\gamma_{T}=[4(T/100)^{1/4}]\) and \(\gamma_{T}=[8(T/100)^{1/4}]\), respectively, in \(1{,}000\) repetitions. From Tables 1 and 2, the ordinary CUSUM test based on the Bartlett kernel has better sizes, whereas the one based on the quadratic spectral kernel suffers from severe overrejection. We therefore conclude that the choice of the kernel function has a greater impact on the sizes of the two CUSUM tests than the selection of the bandwidth. Comparing the two tests based on the Bartlett kernel, the ordinary CUSUM test underrejects increasingly as the tail index α moves from 2 toward 1, while the sizes of the robust test stay closer to the nominal size 0.05. Furthermore, the size gets closer to 0.05 as the sample size T increases, which is consistent with Theorem 1.
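For concreteness, the two bandwidth rules can be evaluated at the sample sizes used here. The sketch assumes that \([\cdot]\) denotes the integer part, as in \([T\tau]\) earlier; the function name and the argument `c` are ours.

```python
import math


def bandwidth(T: int, c: float) -> int:
    """gamma_T = [c (T/100)^(1/4)], reading [.] as the integer part."""
    return math.floor(c * (T / 100.0) ** 0.25)
```

For \(T=300, 500, 1{,}000\) this gives 5, 5, 7 with `c=4` and 10, 11, 14 with `c=8`, so the bandwidth grows only slowly with the sample size.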

Table 1 The empirical levels of the robust CUSUM test and the CUSUM test for dependent innovations
Table 2 The empirical levels of the robust CUSUM test and the CUSUM test for dependent innovations

Next, we examine the power of the two tests. The empirical powers are the proportions of rejections of the null hypothesis \(H_{0}\) in \(1{,}000\) repetitions when the alternative hypothesis \(H_{1}\) holds. The results are reported in Tables 3, 4, 5, 6, from which we draw the following conclusions. (i) The two CUSUM tests based on the Bartlett kernel and the quadratic spectral kernel become more powerful as the sample size T becomes larger. (ii) As the tail of the innovations gets heavier, the ordinary CUSUM test becomes less powerful and eventually hardly works, while the CUSUM test based on indicators is rather robust to the heavy-tailed innovations. (iii) The selection of the bandwidth has comparatively little impact on the powers of the two CUSUM tests.
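The way such empirical rejection rates are obtained can be sketched with a small Monte Carlo loop. To keep the example short, it uses Cauchy innovations (the stable case \(\alpha=1\), which Remark 1 allows) in place of the indices studied here, the Bartlett kernel with \(\gamma_{T}=[4(T/100)^{1/4}]\), and far fewer repetitions than the text. These settings and all names are our assumptions.

```python
import math
import random
import statistics

CRITICAL_VALUE = 1.358  # asymptotic 5% critical value quoted in the text


def simulate_cauchy_ar1(T, tau0, mu2, rng, rho=0.5):
    """Model (12) with AR(1) errors driven by standard Cauchy innovations."""
    x, y = 0.0, []
    for t in range(1, T + 1):
        x = rho * x + math.tan(rng.uniform(-math.pi / 2, math.pi / 2))
        y.append(x + (mu2 if t > T * tau0 else 0.0))
    return y


def robust_gamma(y, bandwidth):
    """Gamma_T from (5) with sigma replaced by the Bartlett HAC estimate (9)."""
    T = len(y)
    m_T = statistics.median(y)
    s = [(v > m_T) - (v < m_T) for v in y]
    var = sum(v * v for v in s) / T
    for h in range(1, min(bandwidth, T)):  # Bartlett weights vanish beyond this
        w = 1.0 - h / bandwidth
        var += 2.0 * w * sum(s[t] * s[t + h] for t in range(T - h)) / T
    if var <= 0.0:
        raise ValueError("degenerate variance estimate")
    total, partial, peak = sum(s), 0, 0.0
    for k in range(1, T):
        partial += s[k - 1]
        xi = k * (T - k) / T**2 * (partial / k - (total - partial) / (T - k))
        peak = max(peak, abs(xi))
    return math.sqrt(T) * peak / math.sqrt(var)


def rejection_rate(T, tau0, mu2, reps=200, seed=0):
    """Share of replications in which Gamma_T exceeds the critical value."""
    rng = random.Random(seed)
    bw = math.floor(4 * (T / 100.0) ** 0.25)
    hits = sum(robust_gamma(simulate_cauchy_ar1(T, tau0, mu2, rng), bw)
               > CRITICAL_VALUE for _ in range(reps))
    return hits / reps
```

Calling `rejection_rate(T, tau0, 0.0)` estimates the empirical size, and a nonzero `mu2` gives the empirical power at that shift.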

Table 3 The empirical powers of the robust CUSUM test and the CUSUM test for dependent innovations
Table 4 The empirical powers of the robust CUSUM test and the CUSUM test for dependent innovations
Table 5 The empirical powers of the robust CUSUM test and the CUSUM test for dependent innovations
Table 6 The empirical powers of the robust CUSUM test and the CUSUM test for dependent innovations

Finally, we consider the effect of skewness in the innovations \(\{ e_{t}\}\) on the power of the proposed test through simulations. To obtain the results reported in Table 7, we take the \(e_{t}\) in the model (12) to follow chi-square distributions with \(n=1,2\mbox{ and }10\) degrees of freedom, respectively. On the basis of the simulations, the skewness of the innovations affects the powers of the two CUSUM tests significantly.

Table 7 The empirical powers of the two CUSUM tests for the skewed dependent innovations

4.2 Empirical application

In this section, we apply the test to a series of daily stock prices of LBC (SHANDONG LUBEI CHEMICAL Co., LTD) on the Shanghai Stock Exchange. The stock prices were observed from July 1st, 2004 to December 30th, 2005, giving a sample of 367 observations (as shown in Figure 1), and can be found at http://stock.business.sohu.com. As Figure 2 shows, the logarithmic return series exhibits a number of ‘outliers’, which are a manifestation of a heavy-tailed distribution; see Wang et al. [15]. The data can be well fitted by stable sequences.

Figure 1 Stock prices of LBC in Shanghai Stock Exchange.

Figure 2 The logarithm return rates of LBC in Shanghai Stock Exchange.

Fitting a mean and computing the proposed test statistic gives \(\Gamma_{T}=4.2123>1.358\), which indicates that a change in the mean occurred, and \(\Xi_{T}(k)\) attains its maximum at \(k_{0}=175\) (March 21st, 2005) (as shown in Figure 3). Recall that LBC announced, at the 17th Meeting of the 3rd Session of its Board of Directors on March 8th, 2005 (\(k_{1}=166\)), that its net profits in 2005 would decrease to 50% of those in 2004. The influence of the bad news was so strong that the stock price fell immediately over the following nine days, and the mean of the logarithmic return rate changed significantly after \(k_{0}=175\).

Figure 3 The robust CUSUM values of LBC in Shanghai Stock Exchange.

5 Concluding remarks

In this paper, we construct a nonparametric test based on the indicators of the data minus the sample median. When there is no change in the mean of the data, the test statistic has the usual limit distribution, namely that of the supremum of the absolute value of a Brownian bridge. As Bai [5] pointed out, detecting a change is a difficult task in applications of autoregressive models. First, the order of an autoregressive model is not known a priori and has to be estimated. Second, the common way of determining the order via the Akaike information criterion (AIC) or the Bayes information criterion (BIC) tends to overestimate the order if a change exists. The proposed test, however, relies neither on a precise autoregressive model nor on prior knowledge of the tail index α, so it is more widely applicable, although its size is slightly distorted for dependent sequences.