1 Introduction

Detecting and estimating changepoints in all kinds of mathematical and stochastic models is a common task, not only in statistics. Such procedures are also important from a practical point of view and they may often be crucial in real-life problems. For instance, detecting a changepoint in a data-generating model may trigger model retraining mechanisms or, more frequently, it may govern important decisions affecting specific subjects or even the whole population, such as the pandemic restrictions related to the recent Covid-19 infection spread. The estimation of changepoints, on the other hand, may lead to correction procedures, specific treatment implementations, additional target-specific decisions, or simply a deeper understanding of the underlying data-generating process.

Considering the basic stochastic principles of changepoint detection and the various estimation methods, two different approaches are usually adopted in practical implementations. If the whole data sample is available at the very beginning of the analysis, the detection algorithm is called an offline procedure. If the data arrive over time (usually observation by observation) and the changepoint detection algorithm runs concurrently as new observations appear, such algorithms are referred to as online procedures.

In this paper, we focus on the online regime, where the proposed changepoint detection algorithm is applied to a nonlinear parametric regression model. In addition to this nonlinearity, the conditional expectile estimation of the unknown parameters is adopted (similarly to Newey and Powell (1987), where, however, a simple linear model was investigated) in order to obtain a coherent risk measure while also accounting for possibly asymmetric random error distributions. The changepoint detection itself is performed in terms of a consistent statistical test based on a dataset that accumulates at each consecutive step of the proposed online procedure.

There is a vast literature available on both offline and online changepoint detection strategies, considering different models and various technical assumptions. Focusing on the online procedures, Nedényi (2018) proposed an online testing approach based on a CUSUM test statistic to detect changes in a parameter of a discrete-time stochastic process. Linear regression models with independent error terms are considered in Chu et al (1996) and Horváth et al (2004), where a standard least squares estimator is employed. Possible detection delays in a sequential changepoint test for a multiple linear regression model are discussed in Aue et al (2009). Linear regression models with dependent observations are investigated in Fremdt (2015), and online changepoint detection procedures within autoregressive time series are studied, for instance, in Hušková et al (2007). Some generalizations for multivariate cases can be found in Aue et al (2009) or Hoga (2017), and these results are further generalized in Barassi et al (2020), where a semiparametric CUSUM test is proposed to perform the online changepoint detection for various correlation structures of nonlinear multivariate regression models with dynamically evolving volatilities. Nonlinear integer-valued time series are also discussed from this perspective in Lee and Lee (2019). A comprehensive overall review of online procedures can be found in Basseville and Nikiforov (1993).

The method presented in this paper advocates the idea of semiparametric CUSUM approaches in combination with robustness with respect to the underlying error terms. First, a nonlinear regression model is assumed to govern the data generating process. Although the underlying regression function is deterministic, it is allowed to be nonlinear with respect to a set of unknown parameters, which introduces a relatively flexible class of possible functions. Second, despite the independent error terms assumed for the proposed online detection regime, there are no restrictive assumptions imposed on the underlying error distribution; in particular, substantial robustness is achieved with the proposed expectile estimation, which also allows for asymmetric and heavy-tailed error distributions. The conditional expectiles define the only coherent and elicitable risk measure (see, for instance, Bellini et al (2018) or Ziegel (2016)), which is particularly important in situations where some risk-related assessment is needed. Moreover, despite many similarities with conditional quantiles, the conditional expectiles are well known to remain viable in situations where the conditional quantiles fail (see Philipps (2022) for a more comprehensive comparison). Third, the proposed test statistic follows, under the null hypothesis of no change, a relatively simple distribution which depends neither on the underlying regression function nor on the set of unknown parameters. Finally, the whole procedure can be implemented in a straightforward way and all calculations required within the proposed online regime can be easily performed. Thus, the presented real-time changepoint detection method has great potential for practical applicability, extending well beyond the Covid-19 example illustrated at the end.

The rest of the paper is structured as follows: The underlying data and the corresponding changepoint model are described in the next section. A real-time changepoint detection in terms of a formal statistical test is introduced in Sect. 3. The asymptotic properties of the proposed test are also detailed there. In Sect. 4, finite sample properties are investigated and the Covid-19 prevalence data from Prague, Czech Republic, are analysed using the proposed methodological framework. Section 5 concludes with some final remarks. All theoretical proofs and further technical details are postponed to the Appendix.

2 Asymmetric least squares with changepoint

Let us consider a set of historical data denoted as \(\{(Y_i, \textbf{X}_i^\top )^\top ;~i = 1, \dots , m\}\) for some deterministic q-dimensional vector of explanatory variables \(\varvec{X}_i = (X_{i 1}, \dots , X_{i q})^\top \) and some integer \(m \in \mathbb {N}\). The data are assumed to follow a general nonlinear parametric regression model

$$\begin{aligned} Y_i=f(\textbf{X}_i, {\varvec{\beta }})+\varepsilon _i, \qquad i=1, \dots , m, \end{aligned}$$
(1)

where \(f(\cdot , {\varvec{\beta }})\) is an explicit function depending on some unknown vector parameter \(\varvec{\beta } = (\beta _1, \dots , \beta _p)^\top \in \Gamma \subseteq \mathbb {R}^{p}\) with the true (unknown) value denoted as \({\varvec{\beta }}^0 \in \mathbb {R}^p\). A different approach could consider the \(\varvec{X}_i\)’s as random vectors; however, we concentrate on the fixed design because we want to adopt a robust (i.e., distribution-free) approach with only minimal assumptions imposed on the underlying data distribution. Nevertheless, with respect to the forthcoming theory, analogous results for the random design can be derived as well (under some additional technical assumptions needed for the deterministic convergences to become convergences in probability).

After the historical data are observed, another \(T_m \in \mathbb {N}\) observations arrive in real time, for both the response variable \( Y_i\) and the explanatory vector \(\textbf{X}_i \in \varUpsilon \subseteq \mathbb {R}^q\), for \(i = m + 1, \dots , m + T_m\). The underlying model for these new observations (the online data) is assumed to take an analogous form

$$\begin{aligned} Y_i=f(\textbf{X}_i, {\varvec{\beta }}_i)+\varepsilon _i, \qquad i=m+1, \dots , m+T_m, \end{aligned}$$
(2)

where the underlying regression functional form remains the same and \({\varvec{\beta }}_i \in \mathbb {R}^p\). For the parameter vectors \(\{\varvec{\beta }_i\}_{i = m + 1}^{m + T_m}\) in (2), it is either assumed that their true (unknown) values are all equal to \({\varvec{\beta }}^0\) (thus, there is no changepoint present in the overall combined model (1) and (2)) or, instead, there is some specific index \(k_m^0 \in \{1, \dots , T_m - 1\}\) such that \({\varvec{\beta }}_i = {\varvec{\beta }}^0\) for all \(i = m + 1, \dots , m + k_m^0\), while \({\varvec{\beta }}_i \ne {\varvec{\beta }}^0\) for \(i = m + k_m^0 + 1, \dots , m + T_m\). In such case, there is a changepoint (located at \(k_m^0\)) present in the model generating the online data \(\{(Y_i, \textbf{X}_i^\top )^\top ;~i = m + 1, \dots , m + T_m\}\).

The error terms \(\{\varepsilon _i\}_{1 \leqslant i \leqslant m+T_m}\) from the overall model (1) and (2) are assumed to be independent and, moreover, they all follow the same distribution. A generic random error term from the underlying distribution is denoted as \(\varepsilon \). The idea is to use the historical data to estimate the unknown parameter vector \({\varvec{\beta }}\in \mathbb {R}^p\). Later, the online data—starting from the observation index \(i = m+1\)—are measured in real time while, for each new observation \(i \ge m+1\), asking whether the underlying model remains unchanged (i.e., \({\varvec{\beta }}_i={\varvec{\beta }}^0\)) or whether some change is detected in terms of the unknown parameter vectors \({\varvec{\beta }}_i \in \mathbb {R}^p\). If no changepoint is detected for the given i, then all available observations are used in the next step to ask the same question regarding the most recent observation. The whole changepoint detection process stops at the first observation \(i \in \{m + 1, \dots , m + T_m\}\) for which there is statistical evidence that \({\varvec{\beta }}_i \ne {\varvec{\beta }}^0\).

From a formal theoretical point of view, in the first step, the historical data \(\{(Y_i, \textbf{X}_i^\top )^\top :\,i = 1, \dots , m\}\) are used to obtain a conditional expectile estimator of the unknown parameter vector \({\varvec{\beta }}\in \mathbb {R}^p\). In particular, for a given expectile index \(\tau \in (0,1)\), the expectile function is defined as

$$\begin{aligned} \rho _\tau (x)= \bigg | \tau - \mathbb {I}_{\{x <0\}} \bigg | x^2, \qquad \text {for} \quad x \in \mathbb {R}, \end{aligned}$$
(3)

and the corresponding expectile estimator of the unknown (true) parameter vector \({\varvec{\beta }}^0 \in \mathbb {R}^p\) from the model in (1) is defined as

$$\begin{aligned} \widehat{\varvec{\beta }}_m\equiv {\mathrm {arg\,min}}_{{\varvec{\beta }}\in \mathbb {R}^p} \sum ^m_{i=1} \rho _\tau \big (Y_i-f(\textbf{X}_i, {\varvec{\beta }})\big ), \end{aligned}$$
(4)

where \(\widehat{\varvec{\beta }}_m = \big ( \widehat{\beta }_{m 1}, \dots ,\widehat{\beta }_{m p} \big )^\top \in \mathbb {R}^p\). It is straightforward to verify that for \(\tau =1/2\) the expectile estimate \(\widehat{\varvec{\beta }}_m\) defined by (4) reduces to the standard (nonlinear) least squares (LS) estimator of \({\varvec{\beta }}^0 \in \mathbb {R}^p\). In general, the \(\tau \)th expectile of a given distribution can be interpreted as the hypothetical mean of another distribution that would be obtained if the values above the expectile in the original distribution occurred \(\frac{\tau }{1 - \tau }\) times more frequently. Thus, the choice of \(\tau \in (0,1)\) can also be seen as an “exploratory” device that “balances” the distribution towards the (zero) mean and provides useful information about the skewness and possible outlying/extreme observations. Also note that, depending on the choice of the regression function f, the minimization problem in (4) may or may not be convex. This restricts the choice of the algorithm used to obtain the final solution. For numerical issues and different techniques for fitting nonlinear models we refer to Chambers (1973). Computational aspects are further discussed in Sect. 4.
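To make the estimation step concrete, the following minimal Python sketch (numpy assumed; the function names, e.g. `fit_expectile`, are illustrative and not from the paper) evaluates the expectile function (3) and solves (4) by a plain grid search of the kind mentioned in Sect. 4:

```python
import numpy as np

def expectile_loss(x, tau):
    """Asymmetric squared loss rho_tau(x) = |tau - 1{x<0}| * x^2 from (3)."""
    x = np.asarray(x, dtype=float)
    return np.abs(tau - (x < 0)) * x**2

def fit_expectile(y, X, f, beta_grid, tau=0.5):
    """Minimize the empirical expectile loss (4) over candidate parameters.

    A plain grid search over `beta_grid` (an iterable of candidate
    parameter vectors); `f(X, beta)` is the user-supplied regression
    function.  Any nonlinear optimizer could be substituted here; the
    grid search only serves as a transparent illustration.
    """
    best_beta, best_loss = None, np.inf
    for beta in beta_grid:
        loss = expectile_loss(y - f(X, beta), tau).sum()
        if loss < best_loss:
            best_beta, best_loss = np.asarray(beta, dtype=float), loss
    return best_beta
```

For \(\tau = 1/2\) the loss reduces to ordinary squared error, so the search returns the least squares fit, in line with the remark above.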

In the second step, the expectile estimator \(\widehat{\varvec{\beta }}_m\) obtained from the historical data \(\{(Y_i, \textbf{X}_i^\top )^\top ;~i = 1, \dots , m\}\) is used to perform a real-time changepoint detection in the online data \(\{(Y_i, \textbf{X}_i^\top )^\top ;~i = m + 1, \dots , m + T_m\}\) in terms of a formal statistical test of the null hypothesis

$$\begin{aligned} H_0: \, {\varvec{\beta }}_i= {\varvec{\beta }}^0, \qquad \text {for }\;\; i=m+1, \dots , m+T_m; \end{aligned}$$
(5)

against the alternative hypothesis

$$\begin{aligned} H_1:&\exists k^0_m \in \{1, \dots , T_m - 1\} \nonumber \\&\text {such that} \left\{ \begin{array}{ll} {\varvec{\beta }}_i ={\varvec{\beta }}^0, & i=m+1, \dots , m + k^0_m; \\ {\varvec{\beta }}_i={\varvec{\beta }}^1, & i=m + k^0_m+1, \dots , m+T_m, \end{array} \right. \end{aligned}$$
(6)

where \({\varvec{\beta }}^0 \ne {\varvec{\beta }}^1\). The proposed test statistic, sensitive to the null hypothesis, is defined as

$$\begin{aligned} \mathcal {T}(m) = \sup _{1\leqslant k \leqslant T_m} \frac{\Vert \mathbf{{S}}(m,k) \Vert _\infty }{z(m,k,\gamma )}, \end{aligned}$$
(7)

for a standard supremum norm \(\Vert \cdot \Vert _\infty \), a regularization function \(z(m,k,\gamma ) \equiv m^{1/2}(1+k/m)(k/(k+m))^\gamma \) for some \(\gamma \in [0, 1/2)\), and

$$\begin{aligned} \mathbf{{S}}(m,k) \equiv \textbf{J}^{-1/2}_m (\widehat{\varvec{\beta }}_m) \sum ^{m+k}_{i=m+1} \nabla \!f(\textbf{X}_i, \widehat{\varvec{\beta }}_m) g_\tau (\widehat{\varepsilon }_i), \end{aligned}$$
(8)

where \(g_\tau (x) \equiv \rho '_\tau (x) = 2 \tau x \mathbb {I}_{\{x \ge 0\}}+2(1-\tau )x \mathbb {I}_{\{x<0\}}\) stands for the first derivative of the expectile function \(\rho _\tau (x)\) and \(\widehat{\varepsilon }_i = Y_i - f(\varvec{X}_i, \widehat{\varvec{\beta }}_m)\) are the so-called expectile residuals for \(i = 1, \dots , m, m + 1, \dots , m + T_m\). Similarly, by \(h_\tau (x) \equiv \rho ''_\tau (x) =2 \tau \mathbb {I}_{\{x \ge 0\}}+2(1-\tau ) \mathbb {I}_{\{x<0\}}\) we denote the second derivative of \(\rho _\tau (x)\). In addition, \(\nabla \!f(\textbf{X}_i, \widehat{\varvec{\beta }}_m)\) stands for the p-dimensional vector of the first partial derivatives \(\frac{\partial }{\partial \varvec{\beta }}f(\varvec{X}_i, \varvec{\beta })\) evaluated at the expectile estimate \(\widehat{\varvec{\beta }}_m\), and

$$\begin{aligned} \textbf{J}_m (\widehat{\varvec{\beta }}_m)\equiv \frac{\mathbb {V}\text{ ar }\,[g_\tau (\varepsilon )]}{m} \sum ^m_{i=1} \nabla \! f(\textbf{X}_i,\widehat{\varvec{\beta }}_m) \nabla ^\top \!\! f(\textbf{X}_i,\widehat{\varvec{\beta }}_m), \end{aligned}$$
(9)

where \(\textbf{J}^{-1/2}_m (\widehat{\varvec{\beta }}_m)\) in (8) denotes the inverse of the square root matrix (in the sense of the Cholesky factorization) of \(\textbf{J}_m(\widehat{\varvec{\beta }}_m)\). A formal decision with respect to the null hypothesis in (5) is made by comparing the test statistic in (7) with the corresponding quantile of the limit distribution, which is a functional of a Wiener process (see Theorem 2). Details regarding the behaviour of the test statistic under the null and the alternative hypothesis are derived in the next section.

Remark 1

In practical applications, the theoretical quantity \(\mathbb {V}\text{ ar }\,[g_\tau (\varepsilon )]\) in (9) is typically unknown. However, the corresponding finite sample counterpart \(S_{g_{\hat{\tau }}}^2 = \frac{1}{(m - 1)} \sum _{i = 1}^m \big [g_{\hat{\tau }}(\widehat{\varepsilon }_i) \big ]^2\) can be used instead as a plug-in estimate, where \(\hat{\tau } \in (0,1)\) (implicitly) solves \(\frac{1}{m}\sum _{i = 1}^{m} g_{\tau }(\widehat{\varepsilon }_i) = 0\) (i.e., the empirical version of the theoretical assumption \(\mathbb {E}[g_\tau (\varepsilon )] = 0\)) and \(\{\widehat{\varepsilon }_i\}_{i = 1}^m\) are the model-based residuals. The empirical estimates for the theoretical quantities \(\mathbb {V}\text{ ar }\,[g_\tau (\varepsilon )]\), \(\mathbb {E}[g_\tau (\varepsilon )]\), and \(\tau \in (0,1)\) are all based on the historical data \(\{(Y_i, \textbf{X}_i^\top )^\top ;~i = 1, \dots , m\}\).
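Because \(g_\tau \) is linear in \(\tau \), the empirical moment condition of Remark 1 can even be solved in closed form: writing \(S_+\) (\(S_-\)) for the sum of the nonnegative residuals (of the absolute values of the negative residuals), the root of \(\frac{1}{m}\sum _i g_{\tau }(\widehat{\varepsilon }_i) = 0\) is \(\hat{\tau } = S_-/(S_+ + S_-)\). A small numpy sketch (function names are ours):

```python
import numpy as np

def estimate_tau(resid):
    """Solve (1/m) * sum_i g_tau(resid_i) = 0 for tau (Remark 1).

    g_tau is linear in tau, so the root is available in closed form:
    tau_hat = S_minus / (S_plus + S_minus), where S_plus (S_minus) is the
    sum of the nonnegative residuals (of |negative residuals|).
    Assumes residuals of both signs are present.
    """
    s_plus = resid[resid >= 0].sum()
    s_minus = -resid[resid < 0].sum()
    return s_minus / (s_plus + s_minus)

def var_g_hat(resid, tau):
    """Sample counterpart S^2 of Var[g_tau(eps)] from Remark 1 (divisor m - 1)."""
    g = 2.0 * np.where(resid >= 0, tau, 1.0 - tau) * resid
    return np.sum(g**2) / (resid.size - 1)
```

For symmetric residuals the formula returns \(\hat{\tau } = 0.5\), consistent with the least squares case.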

3 Theoretical results

Besides a p-dimensional vector \(\nabla \! f({\textbf{x}},{\varvec{\beta }}) = \partial f({\textbf{x}}, {\varvec{\beta }})/ \partial {\varvec{\beta }}\), for any \(\varvec{x} \in \varUpsilon \) and \({\varvec{\beta }}\in \Gamma \) let, analogously, \(\nabla ^2 \! f({\textbf{x}},{\varvec{\beta }}) \equiv \partial ^2 f({\textbf{x}}, {\varvec{\beta }})/ \partial {\varvec{\beta }}^2\) be a \((p \times p)\)-dimensional matrix of the second partial derivatives. In addition, let \(\nabla ^2_j f({\textbf{x}},{\varvec{\beta }}) \equiv \big (\partial ^2 f({\textbf{x}}, {\varvec{\beta }})/(\partial \beta _l \partial \beta _j)\big )_{1 \leqslant l \leqslant p} \), which is again a p-vector for each \(j \in \{1, \dots , p\}\). Finally, \(\textbf{V}_m({\varvec{\beta }})\) stands for a \((p \times p)\)-dimensional matrix being defined as \( \textbf{V}_m({\varvec{\beta }}) \equiv m^{-1} \sum ^m_{i=1} \nabla \! f(\textbf{X}_i,{\varvec{\beta }}) \nabla ^\top \!\! f(\textbf{X}_i,{\varvec{\beta }})\) and for any two constants \(a, b \in \mathbb {R}\) let \(a \vee b= \max (a,b)\) and \(a \wedge b= \min (a,b)\).

3.1 Model assumptions

Considering the overall changepoint model in (1) and (2), the theoretical results formulated in this section rely on the set of assumptions stated below. For a better organization of the whole paper, the assumptions are split into five groups, (A)–(E).

ASSUMPTION (A):

(A1):

The parameter space \(\Gamma \subseteq \mathbb {R}^p\) is a compact set and the design space \(\varUpsilon \subseteq \mathbb {R}^q\) is assumed to be bounded;

(A2):

For each \(i \in \{1, \dots , m, m+1, \dots , m+T_m\}\), the partial derivatives \(\nabla \! f(\textbf{X}_i,{\varvec{\beta }})\) and \(\nabla ^2 \! f(\textbf{X}_i,{\varvec{\beta }})\) all exist and, moreover, \(\nabla f(\textbf{X}_i,{\varvec{\beta }})\) is continuous on \(\varUpsilon \times \Gamma \);

(A3):

For \(q_m({\varvec{\beta }}) \equiv Card \{i \in \{1, \dots , m\}; f(\textbf{X}_i, {\varvec{\beta }}) \ne f(\textbf{X}_i, {\varvec{\beta }}^0)\}\) and every \({\varvec{\beta }}\in \Gamma \) such that \({\varvec{\beta }}\ne {\varvec{\beta }}^0\), it holds that \(0 < \lim _{m \rightarrow \infty } q_m({\varvec{\beta }})/m \le 1\).

ASSUMPTION (B): The density function of the random error terms \(\{\varepsilon _i\}_{i = 1}^{m + T_m}\) (of the generic error term \(\varepsilon \), respectively) is continuous and strictly positive at zero.

ASSUMPTION (C): There exists a positive definite matrix \(\textbf{V}({\varvec{\beta }}^0)\) such that \(\textbf{V}_m({\varvec{\beta }}^0)=m^{-1} \sum ^m_{i=1} \nabla \! f(\textbf{X}_i,{\varvec{\beta }}^0) \nabla ^\top \!\! f(\textbf{X}_i,{\varvec{\beta }}^0) \longrightarrow \textbf{V}({\varvec{\beta }}^0)\) for \(m \rightarrow \infty \).

ASSUMPTION (D): The model errors \(\{\varepsilon _i\}_{i = 1}^{m + T_m}\) are independent and identically distributed (i.i.d.) with a continuous distribution, such that \(\mathbb {E}[\varepsilon _i^4]< \infty \) and \(\mathbb {E}[ g_\tau (\varepsilon _i)]=0\).

Assumptions (A), (B), and (C) are common conditions needed to show a strong consistency of the conditional expectile estimate \(\widehat{\varvec{\beta }}_m\) defined in (4). Analogous conditions are used, for instance, by Choi et al (2003). Similarly, Assumption (D) is quite standard for the expectile models (e.g., Gu and Zou (2016), Kim and Lee (2016), or Ciuperca (2022)).

3.2 Asymptotic behaviour of the expectile estimator

In order to study the asymptotic behaviour of the expectile estimator \(\widehat{\varvec{\beta }}_m\) defined in (4) let us consider the p-square matrix

$$\begin{aligned} {\varvec{\Omega }}\equiv \mathbb {E}[h_\tau (\varepsilon )] \textbf{V}({\varvec{\beta }}^0). \end{aligned}$$

In addition to Assumption (A2), it is also required to impose slightly stricter assumptions on the matrix of the second partial derivatives \(\nabla ^2 \! f({\textbf{x}},{\varvec{\beta }})\).

ASSUMPTION (E): The elements of \(\nabla ^2 \! f({\textbf{x}},{\varvec{\beta }})\) are all bounded for any \({\textbf{x}}\in \varUpsilon \) and for \({\varvec{\beta }}\) from a neighborhood of \({\varvec{\beta }}^0\) of radius of the order \(m^{-1/2}\).

The assumption above is a common property which is satisfied, under Assumption (A1), by any function f whose second partial derivatives are continuous on \(\Upsilon \times \Gamma \). It is considered, for instance, for a sequential test in a nonlinear changepoint model in Ciuperca (2013), where an ordinary least squares (LS) estimation framework was used instead. For the expectile estimation framework proposed in this paper, the asymptotic behaviour of the estimator in (4) is formulated in the next proposition.

Proposition 1

Under Assumptions (A)–(E),

$$\begin{aligned} \widehat{\varvec{\beta }}_m={\varvec{\beta }}^0+{\varvec{\Omega }}^{-1} \frac{1}{m} \sum ^m_{i=1} \nabla \!f (\textbf{X}_i,{\varvec{\beta }}^0) g_\tau (\varepsilon _i)+o_\mathbb {P}(m^{-1/2}), \quad \text {as } m \rightarrow \infty . \end{aligned}$$

If the regression function f is linear in \({\varvec{\beta }}\in \Gamma \), then the asymptotic behaviour in the proposition reduces to a special case of Proposition 1 from Ciuperca (2022). Similarly, if the regression function f in (1) is nonlinear in \({\varvec{\beta }}\in \Gamma \), but the random error terms follow some normal distribution \(N(0, \sigma ^2)\) with \(\sigma ^2 < \infty \), the asymptotic behaviour in Proposition 1 gives the results of Theorem 2.1 in Seber and Wild (2003).

3.3 Test statistic under \(H_0\) and \(H_1\)

The asymptotic behaviour of the test statistic defined in (7) is investigated in this section under both the null hypothesis in (5) and the alternative hypothesis in (6). Note that the parameter vectors \({\varvec{\beta }}^0, {\varvec{\beta }}^1\in \Gamma \), where \({\varvec{\beta }}^0 \ne {\varvec{\beta }}^1\), are both unknown. Let \(\textbf{J}_m ({\varvec{\beta }})\equiv \mathbb {V}\text{ ar }\,[g_\tau (\varepsilon )]\textbf{V}_m({\varvec{\beta }})\), for \({\varvec{\beta }}\in \Gamma \), be a \(p \times p\) matrix—a theoretical (deterministic) version of its empirical counterpart, the \(p \times p\) matrix \(\textbf{J}_m (\widehat{\varvec{\beta }}_m)\) defined in (9). Considering the size \(m \in \mathbb {N}\) of the historical data and the size \(T_m \in \mathbb {N}\) of the online data, there are two specific possibilities which should be considered separately:

  • if \(\lim _{m \rightarrow \infty } T_m/m=\infty \) for either \(T_m=\infty \) or \(T_m < \infty \), then such a scenario is called an open-end procedure;

  • if \(\lim _{m \rightarrow \infty } T_m/m=T\) for \(T_m < \infty \) where \(T \in (0, \infty )\), then such a scenario is called a closed-end procedure.

By a common convention, it is usually assumed that for the open-end procedures it holds that \(T=\infty \).

Theorem 2

Let Assumptions (A)–(E) be satisfied. Then, under \(H_0\),

$$\begin{aligned} \mathcal {T}(m) \equiv \sup _{1\leqslant k \leqslant T_m} \frac{\Vert \mathbf{{S}}(m,k) \Vert _\infty }{z(m,k,\gamma )} \overset{\mathcal{L}}{\underset{m \rightarrow \infty }{\longrightarrow }} \sup _{0< t < L(T) } \frac{\Vert {\textbf {W}}_p(t)\Vert _\infty }{t^\gamma }, \end{aligned}$$

where \(\{{\textbf {W}}_p(t); \; t \in (0,\infty )\}\) is a p-dimensional Wiener process where \(L(T)=1\) for the open-end procedure and \(L(T)=T/(1+T)\) for the closed-end procedure.
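The critical values of the limit distribution in Theorem 2 are not available in closed form, but they can be approximated by simulating the Wiener process on a grid. A possible Monte Carlo sketch (numpy assumed; grid and replication sizes are illustrative defaults, not values from the paper):

```python
import numpy as np

def critical_value(p, gamma, L=1.0, alpha=0.05, n_grid=1000, n_mc=2000, seed=1):
    """Monte Carlo (1 - alpha)-quantile of sup_{0<t<L} ||W_p(t)||_inf / t^gamma.

    W_p is approximated on an equidistant grid by cumulated independent
    Gaussian increments; L = 1 for the open-end procedure and
    L = T / (1 + T) for the closed-end procedure (Theorem 2).
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(L / n_grid, L, n_grid)
    dt = L / n_grid
    sups = np.empty(n_mc)
    for b in range(n_mc):
        # p-dimensional Wiener path on the grid
        W = np.cumsum(rng.standard_normal((n_grid, p)) * np.sqrt(dt), axis=0)
        sups[b] = np.max(np.abs(W).max(axis=1) / t**gamma)
    return np.quantile(sups, 1.0 - alpha)
```

The grid discretization slightly underestimates the supremum, so in practice a fine grid (and many replications) should be used.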

The test statistic in Theorem 2 is based on the expectile estimator \(\widehat{\varvec{\beta }}_m\) of the true parameter vector \({\varvec{\beta }}^0 \in \Gamma \) calculated from the historical data. However, the limit process is the same as for the expectile estimator in the linear model considered in Ciuperca (2022), or the quantile estimator proposed in Zhou et al (2015). On the other hand, the test statistic is different from that proposed by Ciuperca (2013) or Horváth et al (2004) where the authors rather considered the CUSUM type statistic based on the least squares residuals of the linear model or the nonlinear model respectively.

In addition, the asymptotic behaviour of the test statistic under the null hypothesis in Theorem 2 does not depend on the underlying form of the nonlinear regression function f nor on the true value \({\varvec{\beta }}^0\), unlike the test statistic applied for the parametric nonlinear model proposed in Ciuperca (2013). Therefore, the test statistic in Theorem 2 is generally less restrictive, easier to use, and more straightforward to apply, also for the least squares estimation (i.e., when \(\tau =1/2\)).

For the behaviour of the test statistic under the alternative hypothesis, more caution is needed. The model in (1) changes after the historical data and this change must be identifiable. Consequently, some reasonable assumptions are needed for the difference between the true parameter vectors \({\varvec{\beta }}^0\) and \({\varvec{\beta }}^1\) and, also, the underlying regression function f. Specific details are formulated in the next theorem.

Theorem 3

Let Assumptions (A)–(E) be satisfied and let \(m^{1/2}\Vert {\varvec{\beta }}^0 - {\varvec{\beta }}^1\Vert _2 \rightarrow \infty \) as \(m \rightarrow \infty \). If there exists \(C>0\) such that

$$\begin{aligned} \frac{1}{m^s} \left\| \sum ^{m+\widetilde{k}_m}_{i=m+k^0_m+1} c_i \nabla \!f(\textbf{X}_i, {\varvec{\beta }}^0) \big [ f(\textbf{X}_i, {\varvec{\beta }}^1) - f(\textbf{X}_i, {\varvec{\beta }}^0)\big ]\right\| _\infty>C>0 \end{aligned}$$

for some constants \(|c_i | \in [2(\tau \wedge (1-\tau )), 2(\tau \vee (1-\tau ))]\), then

$$\begin{aligned} \sup _{1\leqslant k \leqslant T_m} \frac{\Vert \mathbf{{S}}(m,k) \Vert _\infty }{z(m,k,\gamma )} \overset{\mathbb {P}}{\underset{m \rightarrow \infty }{\longrightarrow }} \infty . \end{aligned}$$

Considering the assertions of both theorems together, the statistical test based on the proposed test statistic in (7) is proved to be consistent. The decision rule can be defined directly by considering the corresponding quantiles of the limit process from Theorem 2.

Example 1

For a simple linear function \(f(x, {\varvec{\beta }})=\beta _1+\beta _2 x\), the unknown parameter vectors \({\varvec{\beta }}^0=(\beta ^0_1, \beta ^0_2)^\top \in \Gamma \) and \({\varvec{\beta }}^1=(\beta ^1_1, \beta ^1_2)^\top \in \Gamma \), and \(x \in \Upsilon \subseteq \mathbb {R}\), one just needs that \(\Gamma \subseteq \mathbb {R}^2\) is a compact set, \(\Upsilon \) is bounded, and \((\beta ^1_1-\beta ^0_1) (\beta ^1_2-\beta ^0_2) \ne 0\) for the assumptions in (A1)–(A3) to hold. Assumption (B) is typically valid for common (continuous) error distributions. Assumptions (C) and (E) are satisfied trivially by the linearity of f. Finally, Assumption (D) usually cannot be verified in a straightforward way, but a sample estimate of \(\tau \in (0,1)\) can be used such that the empirical counterpart of the equation \(\mathbb {E}[g_\tau (\varepsilon _i)] = 0\) is satisfied.

Example 2

For a nonlinear function \(f(x, {\varvec{\beta }}) = \exp \{ - \beta _1 e^{- \beta _2 x}\}\) (the Gompertz curve for \(x \in \Upsilon \)) with \({\varvec{\beta }}^0=(\beta ^0_1, \beta ^0_2)^\top \in \Gamma \) and \({\varvec{\beta }}^1=(\beta ^1_1, \beta ^1_2)^\top \in \Gamma \) for \(\Gamma = (0, \infty )\times (0, \infty )\) and some bounded \(\Upsilon \subset \mathbb {R}\) it is easy to see that \(\nabla \!f(x, {\varvec{\beta }}) = (- f(x, {\varvec{\beta }}) e^{-\beta _2 x}, f(x, {\varvec{\beta }}) \beta _1 x e^{-\beta _2 x})^\top \) is continuous on \(\Upsilon \times \Gamma \) and \(\nabla ^2 f(x, {\varvec{\beta }})\) exists. Thus, for (A1)–(A3) to hold, one just needs that \((\beta ^1_1-\beta ^0_1) (\beta ^1_2-\beta ^0_2) \ne 0\). Assumption (C) can be shown in a straightforward way and the remaining assumptions are analogous to Example 1.
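The closed-form gradient of the Gompertz curve from Example 2 can be sanity-checked numerically, e.g. by central finite differences (a small numpy sketch; the function names are ours):

```python
import numpy as np

def gompertz(x, b1, b2):
    """f(x, beta) = exp(-beta_1 * exp(-beta_2 * x)) from Example 2."""
    return np.exp(-b1 * np.exp(-b2 * x))

def grad_gompertz(x, b1, b2):
    """Closed-form gradient from Example 2:
    (-f * exp(-b2*x), f * b1 * x * exp(-b2*x))."""
    f = gompertz(x, b1, b2)
    e = np.exp(-b2 * x)
    return np.array([-f * e, f * b1 * x * e])

# central finite-difference approximation at an arbitrary point
x, b1, b2, h = 0.3, 10.0, 5.0, 1e-6
num = np.array([
    (gompertz(x, b1 + h, b2) - gompertz(x, b1 - h, b2)) / (2 * h),
    (gompertz(x, b1, b2 + h) - gompertz(x, b1, b2 - h)) / (2 * h),
])
```

The two approximations agree with the analytic gradient up to the discretization error of order \(h^2\).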

On the basis of the results obtained above one can define a stopping time—i.e., the first observation for which the null hypothesis in (5) is rejected in favor of the alternative hypothesis—considering the significance level \(\alpha \in (0,1)\). The corresponding changepoint estimate is defined as

$$\begin{aligned} \widehat{k}_m \equiv \left\{ \begin{array}{l} \inf \Big \{ k \in \{1, \dots , T_m\}; \; \; \frac{\Vert \mathbf{{S}}(m,k) \Vert _\infty }{z(m,k,\gamma )} > c_{\alpha }(\gamma ) \Big \}; \\ \\ \infty , \qquad \qquad \text {if } \frac{\Vert \mathbf{{S}}(m,k) \Vert _\infty }{z(m,k,\gamma )} \le c_{\alpha }(\gamma ) \text { for all } k=1, \dots , T_m, \end{array} \right. \end{aligned}$$

where \(c_{\alpha }(\gamma )\) is the \((1- \alpha )\)-quantile of the distribution of \( \sup _{0< t < L(T) } {\Vert {\textbf {W}}_p(t)\Vert _\infty }/{t^\gamma }\). Note that \(\widehat{k}_m\) is the corresponding index referring to the online data only (i.e., \(\widehat{k}_m \in \{1, \dots , T_m\}\)). Thus, from the overall point of view, the underlying model changes after \(m + \widehat{k}_m\) observations. It holds that \(\lim _{m \rightarrow \infty } \mathbb {P}[\widehat{k}_m < \infty \mid H_0 \text { true} ]=\alpha \) and, similarly, \(\lim _{m \rightarrow \infty } \mathbb {P}[\widehat{k}_m < \infty \mid H_1 \text { true} ]=1\). Hence, the proposed test is consistent.
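The stopping rule \(\widehat{k}_m\) translates into a simple online monitoring loop. The sketch below (numpy assumed; `monitor` is an illustrative name, and the caller supplies the residuals \(\widehat{\varepsilon }_i\) and gradients \(\nabla f(\textbf{X}_i, \widehat{\varvec{\beta }}_m)\)) returns the first exceedance index, or None for \(\widehat{k}_m = \infty \):

```python
import numpy as np

def monitor(resid_hist, grad_hist, stream, tau, gamma, c_alpha):
    """Online stopping rule for k_hat_m: flag the first k whose standardized
    partial sum ||S(m,k)||_inf / z(m,k,gamma) exceeds c_alpha(gamma).

    `stream` yields (residual, gradient) pairs for i = m+1, m+2, ...;
    the variance Var[g_tau(eps)] is replaced by its historical sample
    counterpart, as in Remark 1.
    """
    m = resid_hist.shape[0]
    g_h = 2.0 * np.where(resid_hist >= 0, tau, 1.0 - tau) * resid_hist
    J = np.mean(g_h**2) / m * grad_hist.T @ grad_hist     # matrix (9)
    J_inv_sqrt = np.linalg.inv(np.linalg.cholesky(J))
    S = np.zeros(grad_hist.shape[1])
    for k, (resid, grad) in enumerate(stream, start=1):
        g = 2.0 * (tau if resid >= 0 else 1.0 - tau) * resid
        S += J_inv_sqrt @ (grad * g)                      # running sum S(m, k)
        z = np.sqrt(m) * (1 + k / m) * (k / (k + m))**gamma
        if np.max(np.abs(S)) / z > c_alpha:
            return k                                      # detection: k_hat_m = k
    return None                                           # no detection: k_hat_m = inf
```

Only the running sum needs to be stored, so each online step costs \(O(p^2)\) at most, matching the claim that all calculations in the online regime are easily obtained.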

4 Empirical study

Finite sample properties of the proposed real-time changepoint detection method based on the expectile estimator defined in (4) are closely investigated in this section. Firstly, the empirical level of the test is assessed under various settings and the empirical power of the test is investigated for various changepoint scenarios. In the second part, the proposed methodology is also applied to analyze the Covid-19 prevalence data from Prague, Czech Republic, in order to link some authorities’ decisions to the real-time pandemic situation.

4.1 Simulation experiment

The main concept of the simulation study is analogous to that presented in Choi et al (2003). However, instead of a simple exponential function used for the underlying regression, a more complex Gompertz curve of the form

$$\begin{aligned} f(x, \varvec{\beta }) = \exp \{- \beta _1 e^{-\beta _2 x}\} \end{aligned}$$

is employed, where \(\varvec{\beta }^0 = (\beta _1, \beta _2)^\top \equiv (10, 5)^\top \) and \(x \in (0,1)\). The reason is that the function used in Choi et al (2003) becomes very insensitive to any parameter change for large \(x_t = t\) (even for \(t \ge 10\)). A simple iterative grid search algorithm is implemented to solve (4) and the changepoint test is performed in terms of Theorem 2. For the length of the historical period, three different options are considered (\(m \in \{20, 50, 200\}\)). Analogously to Choi et al (2003), three error distributions are used: a symmetric standard normal distribution (with \(\tau = 0.5\)), an asymmetric normal distribution with the mean and variance both equal to one (\(\hat{\tau } = 0.0719\)), and, finally, a heavy-tailed (symmetric) Laplace distribution with zero mean and unit variance (again, \(\tau = 0.5\) due to the symmetry). In order to mimic both situations—the closed-end scenario and the open-end scenario—there are again three options considered for \(T_m \in \{10, m/2, m \log m \}\). The empirical results under the null hypothesis (of no change in the model) are summarized in Table 1 and in Fig. 1. Different values of the regularization parameter \(\gamma \in [0, 1/2)\) were considered as well, but no substantial differences were found; therefore, all reported results are for \(\gamma = 0.1\) only.
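The simulation design above can be reproduced along the following lines (a numpy sketch under our own naming; the Laplace scale \(1/\sqrt{2}\) yields unit variance, and `beta1`/`k0` inject a changepoint after the \(k_0\)-th online observation, with `beta1=None` reproducing the null hypothesis):

```python
import numpy as np

def gompertz(x, beta):
    """Gompertz regression curve f(x, beta) = exp(-beta_1 * exp(-beta_2 * x))."""
    return np.exp(-beta[0] * np.exp(-beta[1] * x))

def simulate(m, T_m, beta0=(10.0, 5.0), beta1=None, k0=0, dist="normal", seed=0):
    """One simulated trajectory of the changepoint model (1)-(2)."""
    rng = np.random.default_rng(seed)
    n = m + T_m
    x = rng.uniform(0.0, 1.0, n)
    if dist == "normal":
        eps = rng.standard_normal(n)                     # N(0, 1)
    elif dist == "asymmetric":
        eps = rng.standard_normal(n) + 1.0               # normal, mean and variance one
    else:
        eps = rng.laplace(0.0, 1.0 / np.sqrt(2.0), n)    # unit-variance Laplace
    y = gompertz(x, beta0) + eps
    if beta1 is not None:
        idx = np.arange(n) >= m + k0                     # post-change observations
        y[idx] = gompertz(x[idx], beta1) + eps[idx]
    return x, y
```

The first m pairs then serve as the historical sample and the remaining \(T_m\) pairs as the online stream.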

Table 1 Simulation results under the null hypothesis (with the theoretical value of \(\tau = 0.5\) for the symmetric distributions and the empirical estimate \(\hat{\tau } = 0.0719\) in terms of Remark 1 for the asymmetric distribution)

The empirical level of the test seems to maintain the nominal level of \(\alpha = 0.05\) for all considered scenarios. The results are slightly conservative for the symmetric distributions (the normal distribution N(0, 1) and the double exponential distribution L(0, 1)). On the other hand, a slightly underestimated nominal level is observed for the asymmetric error distribution (the normal distribution N(1, 1)), but the actual differences are rather negligible. The corresponding expectile estimates of the unknown (true) parameters \(\beta _1 = 10\) and \(\beta _2 = 5\) both appear consistent for all considered scenarios and no inconsistencies are observed in Table 1.

Fig. 1
figure 1

The asymptotic behavior of the empirical Type I error probabilities for three different values of \(m \in \{20, 50, 200\}\), three error distributions (standard normal, asymmetric normal with unit mean and variance, and double exponential with zero mean and unit variance) and three different choices of \(T_m\) in order to mimic the closed-end and open-end scenarios

Table 2 Empirical powers of the proposed real-time changepoint test based on 5000 Monte Carlo simulations given for various simulation settings

On the other hand, the situation under the alternative hypothesis is more involved, as there are many different changepoint scenarios that could be considered. For brevity, only the results for one representative situation are provided in this manuscript; many other situations were considered as well, with rather analogous results.

In particular, the following simulation scenarios under the alternative hypothesis were considered:

  • A change occurs either in \(\beta _1\), or in \(\beta _2\), or in both elements of \(\varvec{\beta } = (\beta _1, \beta _2)^\top \) simultaneously;

  • A change occurs immediately after the historical data or the changepoint occurs after the first half of the online data;

  • The magnitude of the change is relatively small compared to the true parameter values (\(20\%\) change with respect to the true value) or the change is relatively large (the parameter(s) after the changepoint is(are) doubled);

  • Finally, if the changepoint occurs in both elements of \(\varvec{\beta } = (\beta _1, \beta _2)^\top \), the corresponding effects of the changes may act against each other (so that the regression function after the change is very similar to the regression function before the change) or, alternatively, the effects may act in the same direction (so that the regression function after the change differs substantially from the underlying regression function before the change, and there is also more power in the data to reveal such a change).

All these situations have, of course, an important impact on the simulation results and, in particular, on the performance of the proposed test in terms of its empirical power. For illustration purposes, one particular scheme (with the changepoint in \(\beta _2\) only and the change magnitude equal to the true value of \(\beta _2\)) is reported in Table 2. It is obvious from the table that the performance of the proposed test (in terms of the empirical power) mostly depends on the true changepoint location and the length of the online data, but in all considered situations the proposed test appears to be consistent.

Fig. 2
figure 2

Covid-19 positive cases in Prague, Czech Republic. The overall daily increments in the upper panel and the cumulative counts in the lower panel are, just for better illustration, provided also separately for males (blue) and females (red). The vertical lines represent the date when the strict pandemic restrictions in effect before Christmas 2020 were relaxed. The Gompertz population model in (10) is fitted on the historical data, i.e., the data before the restrictions release on December 1, 2020. The projection of the model into the future is provided in dashed red. The estimated saturation of \(\widehat{K} = 189~616\) is visualized in dotted red

Note that in the situations where the changepoint occurs in the first half of the online data (the rows denoted as \(k_m^{(1)}\) in Table 2), roughly \(5\%\) of the observed rejections occur in the first half of the online data, before the actual change appears. Such false rejections are not considered in Table 2, and only the rejections after the first half of the online data are reported. This is also reflected by the fact that the average and median changepoint location indicators in the brackets are always greater than 0.50, which stands for the half of the online data sequence.

An average changepoint location indicator of, let us say, 0.25 indicates that the changepoint was detected (when averaged over all simulations) after the first quarter of the online data. If the median location indicator (the second value in the brackets) is higher than the average, then the majority of the changepoint recoveries occurred after the first quarter, but there were also some relatively rare yet very early recoveries (including even the very first online observation). On the other hand, if the median location indicator is smaller than the average, the majority of the changepoint recoveries occurred before the first quarter, but there were also some very late recoveries (including the very last observations).
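The interplay between the average and the median location indicator can be illustrated on a small hypothetical batch of detection times (the numbers below are made up for illustration only and do not come from the reported simulations):

```python
import numpy as np

# Hypothetical first-rejection times (indices within an online sequence
# of length T_m) collected over a batch of Monte Carlo runs:
T_m = 100
detection_times = np.array([20, 24, 26, 27, 30, 95])  # one very late recovery

indicators = detection_times / T_m        # changepoint location indicators
avg = indicators.mean()                   # 0.37
med = np.median(indicators)               # 0.265
# Here med < avg: most recoveries occur early (before the "average"
# location), with a few very late ones pulling the average up.
```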

4.2 Covid-19 prevalence

Relatively recently, the world society was very much affected by the Covid-19 pandemic. We therefore apply the proposed estimation and changepoint detection method to a nonlinear parametric population risk model, a three-parameter Gompertz curve, to model the cumulative counts of the Covid-19 positive cases in Prague, the capital of the Czech Republic, over the period from the first positive case appearance (March 1, 2020) until the end of May 2021. The data, provided for academic purposes by the Institute of Health Information and Statistics of the Czech Republic, are assumed to follow the typical nonlinear (growth) model in (1), where

$$\begin{aligned} f(\varvec{X}_i, {\varvec{\beta }}) = K \exp \Big \{ -\beta _1 e^{- \beta _2 x_i} \Big \} \end{aligned}$$
(10)

for the unknown parameter vector \(\varvec{\beta } = (\beta _1, \beta _2, K)^\top \in \mathbb {R}_{+}^{3}\). The univariate explanatory variables \(\varvec{X}_{i} \equiv x_i\) stand for the current day, and the dependent random variables \(Y_i\) in (1) reflect the cumulative Covid-19 positive cases on the given day. A similar population growth model, a five-parameter logistic curve, was recently applied in Chen et al (2020) to predict the overall number of positive Covid-19 cases in the US. The resulting model, however, turned out to heavily underestimate the true number of positive cases, which could also be caused by the underlying distributional symmetry assumption.
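For readers who wish to experiment, the Gompertz curve in (10) can be fitted to synthetic cumulative counts; the sketch below uses ordinary least squares via `scipy.optimize.curve_fit` as a stand-in for the expectile estimation actually used in the paper (the two coincide only for \(\tau = 0.5\) under symmetric errors), and all numerical values are illustrative, not the Prague data:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(x, beta1, beta2, K):
    """Three-parameter Gompertz growth curve from (10)."""
    return K * np.exp(-beta1 * np.exp(-beta2 * x))

# Synthetic cumulative counts (illustrative parameter values only):
days = np.arange(1.0, 276.0)
true_curve = gompertz(days, 90.0, 0.02, 190_000.0)
rng = np.random.default_rng(0)
counts = true_curve + rng.normal(0.0, 300.0, size=days.size)

# Least-squares fit; a reasonable starting point helps the nonlinear solver.
popt, _ = curve_fit(gompertz, days, counts, p0=(50.0, 0.015, 150_000.0))
beta1_hat, beta2_hat, K_hat = popt
```

The estimated \(K\) plays the role of the saturation level, i.e., the projected overall number of positive cases.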

Table 3 Parameter estimates for the underlying Gompertz model in (10) for three different data scenarios: first, the historical data until the restrictions release are considered; second, the proposed online testing procedure is applied until the null hypothesis is rejected and the model is retrained; finally, all available data are used to estimate the overall model
Fig. 3
figure 3

The test statistic profile for the online data in panel (a) and the first five days only, for a more detailed insight, in panel (b); the limit distribution from Theorem 2 with the corresponding \(95\%\) sample quantile \(c_{0.95}(\gamma ) = 2.4260\), for \(\gamma = 0.1\), in panel (c); the model residuals from (10) with the corresponding density estimate, the empirical mean, and the empirical expectile for \(\widehat{\tau } = 0.11\), such that the empirical counterpart of \(E[g_{\tau }(\varepsilon )]\) equals zero, all in panel (d); finally, the residual autocorrelation and partial autocorrelation plots in panels (e) and (f), respectively

In our approach, instead of trying to predict the overall positive cases, we pursue a slightly different goal. First, the data are split into two parts: the historical data, from the very first Covid-19 positive case in Prague until December 1, 2020 (when a rather populist and much-criticized government decision lifted some of the strict pandemic restrictions before Christmas), and the online data, arriving after December 1, 2020. Second, the proposed changepoint test is adopted to test whether the model before the government decision and the model after the government decision are the same, or not. Finally, the model can also be used to obtain predictions of the overall Covid-19 positive cases over the whole follow-up period.

The data, the daily positive cases, are visualized in Fig. 2a. The corresponding cumulative counts are given in the panel below, Fig. 2b. The Gompertz model from (10) is fitted on the historical data, i.e., the period from March 1, 2020 until December 1, 2020. The estimated parameters are provided in Table 3. The estimated number of the overall Covid-19 positive cases is \(\widehat{K} = 188~576\), while the true number of all positive cases reported until May 26, 2021, is 184 959.

The proposed changepoint detection test based on (7) is performed to verify the stability of the model trained on the historical data, for \(m = 275\), while new online data arrive in a step-by-step manner (for \(T_m = 176\)). The values of the test statistic in (7) at each step of the online testing regime are plotted in Fig. 3a. The null hypothesis of no changepoint in the vector parameter \(\varvec{\beta } = (\beta _1, \beta _2, K)^\top \) is rejected relatively fast, just two days after the government reduced the restrictions: the corresponding test statistic is \(\mathcal {T}(m) = 4.1618\) for \(m = 275\), while the corresponding \(95\%\) quantile of the limit distribution from Theorem 2 is \(c_{0.95}(\gamma ) = 2.4260\) for \(\gamma = 0.1\). This may suggest that the actual change in the model occurred already before the online data, which can also be seen in Fig. 2, either from the first peak and the consecutive drop-off in panel (a) or from the evident underestimation at the end of the historical data in panel (b). The estimated parameters for the retrained model after the changepoint detection are, for comparison, also reported in Table 3. Alternatively, one could consider another (and maybe slightly more representative) set of historical data, from the very first case until the first culmination (i.e., the beginning of November 2020, thus \(m = 245\)), and test whether the model changes significantly after the peak as the daily Covid-19 cases start to decrease. The estimated parameters are very similar (\(\widehat{\beta }_1 = 88.15\), \(\widehat{\beta }_2 = 0.0166\), and \(\widehat{K} = 197~264\)), but it takes 8 days for the proposed test statistic to detect a significant change in the model. Nevertheless, despite some obvious correlation among the model-based residuals (Fig. 3e and f), the estimated model seems to be relatively stable and the proposed changepoint detection test performs very well.
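The overall structure of such a monitoring run can be sketched generically: a CUSUM of monitoring residuals is standardized by the historical scale estimate and compared against the classical boundary function \(\sqrt{m}\,(1 + k/m)(k/(k+m))^{\gamma }\). The sketch below is only a schematic stand-in mimicking detectors of this type, not the exact statistic in (7):

```python
import numpy as np

def online_monitor(resid_hist, resid_stream, critical, gamma=0.1):
    """Schematic CUSUM-type online changepoint monitor.

    Returns the first time k at which the standardized CUSUM of the
    monitoring residuals exceeds `critical`, or None if it never does.
    """
    m = len(resid_hist)
    sigma = np.std(resid_hist, ddof=1)  # scale estimated from history only
    cusum = 0.0
    for k, r in enumerate(resid_stream, start=1):
        cusum += r
        boundary = np.sqrt(m) * (1.0 + k / m) * (k / (k + m)) ** gamma
        if abs(cusum) / (sigma * boundary) > critical:
            return k  # first rejection time
    return None

rng = np.random.default_rng(2)
hist = rng.normal(0.0, 1.0, size=275)     # stable historical residuals
stream = rng.normal(2.0, 1.0, size=176)   # mean shift from the very start
delay = online_monitor(hist, stream, critical=2.4260)
```

With a shift present from the first online observation, the detector rejects after a short delay, mirroring the two-day detection delay reported above.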

5 Conclusions

In this paper, we proposed an online procedure for testing the stability of a nonlinear parametric regression model within the conditional expectile estimation framework. There are three main pillars behind the proposed methodology. First, the nonlinear parametric form of the unknown regression function improves the overall flexibility of the model, while the dependence on the unknown parameters still preserves a relatively simple and straightforward interpretation of the overall regression function estimate. Second, the expectile estimation method allows for additional robustness, especially with respect to asymmetric distributions. The estimation algorithm depends on the “asymmetry index” \(\tau \in (0, 1)\), which is usually unknown, but it can either be anticipated from the data generating mechanism or some plug-in estimate can be used instead. Third, the online regime for the changepoint detection makes the proposed method instantly applicable, which may turn out to be convenient in situations when real-time decisions and model adaptations are required. Finally, given the underlying regression function, the whole minimization problem formulated in (4) does not have to be convex; therefore, we proposed a widely applicable, general iterative grid search algorithm which can be effectively used in practical applications.

The proposed methodological framework enriches the class of online procedures for changepoint detection. To the best of our knowledge, the specific model setup considered in this paper has not been studied in the literature yet. The empirical performance is illustrated through an extensive simulation study. The practical applicability of the whole methodological framework is illustrated on a real data example concerning some of the most recent challenges related to online decision making, in particular, essential decisions related to the Covid-19 pandemic made by local and global authorities.