1 Introduction

In recent years, the scale of data in many real-life applications has increased enormously. Traditional centralized computing faces many challenges and is sometimes entirely infeasible for large-scale problems. As a result, distributed algorithms over multi-agent systems have received much attention from researchers in diverse areas, including the consensus problem [1–4], resource allocation (RA) [5, 6], multi-unmanned aerial vehicle (MUAV) control [7], and distributed target tracking [8]. Distributed algorithms are usually associated with a network of agents where each agent has limited computation and communication ability. The agents are required to cooperatively achieve a global objective by using their local observations and the information transmitted from their neighbors. Compared with centralized approaches, distributed algorithms have the advantages of robustness against network link failures, privacy protection, and reduced communication and computation costs.

One important branch of distributed algorithms is the distributed optimization problem, which seeks the minimizer of a global function written as a sum of the local functions of the agents. In particular, distributed optimization for time-invariant cost functions has become a mature discipline with many results; see [9–11] and the references therein. On the other hand, optimization problems with time-varying cost functions have attracted much attention due to their appearance in various applications, for example, signal processing [12] and online optimization [13, 14]. The main challenge of time-varying optimization lies in the fact that the minimizer of the time-varying cost function changes with time. Since traditional optimization algorithms can only move the estimates towards the minimizer of the cost function at the current time, they cannot track the movement of the minimizer in a dynamic environment. To cope with this issue, two different strategies have been developed. The first one is the running method [15, 16], where the algorithms sample the time-varying cost function at a fixed frequency and perform traditional optimization on the sampled function between sampling instants. The second one is the prediction-correction method [17, 18], where the algorithms only optimize the cost function of the current time at each step, but additional information on the dynamics of the moving minimizer is required, such as the second derivative of the cost function [18] or an additional constraint on the dynamics of the minimizer [14]. While we consider a different problem in this paper, the method we utilize is similar to the second one.

It can be noticed that an optimization problem can often be transformed into a root-seeking problem, since the optimization of a differentiable convex function is equivalent to seeking the root of its gradient, and for the case where the gradients are unavailable we can use finite differences to estimate the gradient [19]. It is therefore natural to consider the distributed root-tracking problem. Distributed stochastic approximation for time-invariant regression functions has been studied by many researchers as a solution to the distributed root-seeking problem [20–22]. Inspired by the work on dynamic stochastic approximation [23, 24], in this paper we propose a distributed stochastic approximation algorithm for tracking the changing root of a sum of time-varying regression functions over a network. Each agent aims to track the changing root of the global function, but it can only access a noise-corrupted local observation and the information transmitted from its neighbors. In addition, the noise-corrupted dynamics of the root of the global regression function is assumed to be known to all agents.

In this paper, the distributed root-tracking problem for time-varying regression functions is considered. First, motivated by the truncation technique given in [22], a distributed stochastic approximation algorithm with expanding truncations is introduced. The key difference is that the observation of the local function in this algorithm is noise-corrupted, while exact gradient information is often required in the optimization algorithms mentioned above. Second, under the assumption that the noise-corrupted dynamics of the global root is known to all agents, the convergence conditions of the algorithm are introduced. Third, it is proved that the estimates generated by the distributed algorithm achieve both consensus and convergence with probability one. Finally, we apply the algorithm to a distributed target tracking problem, and a numerical example is given to demonstrate its performance.

The rest of the paper is organized as follows. The problem formulation and the distributed stochastic approximation algorithm are given in Section 2. The convergence conditions and results are presented in Section 3. To help the proof of the convergence result, two auxiliary sequences are defined and analysed in Section 4. The proof of the main result is given in Section 5. In Section 6, a distributed target tracking problem is solved by the algorithm and the numerical example is demonstrated. Some concluding remarks are addressed in Section 7.

2 Problem formulation and distributed root-tracking algorithm

2.1 Problem formulation

Consider a network system consisting of N agents. The interaction relationship among agents is described by a time-varying digraph \(\mathcal {G}(k)=\left \{\mathcal {V},\mathcal {E}(k)\right \}\), where k is the time index, \(\mathcal {V}=\{1,\dots,N\}\) is the agent set, and \(\mathcal {E}(k)\subset \mathcal {V}\times \mathcal {V}\) is the edge set. By \((i,j)\in \mathcal {E}(k)\) we mean that agent j can receive information from agent i at time k. Assume \((i,i)\in \mathcal {E}(k)\) for all \(k=1,2,\dots \) Denote the neighbor set of agent i at time k by \(N_{i}(k)=\left \{j\in \mathcal {V}:(j,i)\in \mathcal {E}(k)\right \}\). The adjacency matrix associated with the graph is denoted by \(W(k)=\left [w_{ij}(k)\right ]_{i,j=1}^{N}\), where wij(k)>0 if and only if \((j,i)\in \mathcal {E}(k)\), and wij(k)=0 otherwise.

A time-independent digraph \(\mathcal {G}=\{\mathcal {V},\mathcal {E}\}\) is called strongly connected if for any \(i,j\in \mathcal {V}\), there exists a directed path from i to j. A directed path is a sequence of edges \(\left (i,i_{1}\right),\left (i_{1},i_{2}\right),\dots,\left (i_{p-1},j\right)\) in the digraph with distinct agents \(i_{k}\in \mathcal {V},~0\le k\le p-1\), where p is called the length of this path. A nonnegative matrix A is called doubly stochastic if \(A\mathbf{1}=\mathbf{1}\) and \(\mathbf{1}^{T}A=\mathbf{1}^{T}\), where \(\mathbf{1}\) denotes the vector of all ones.

The time-varying global regression function is given by

$$\begin{array}{*{20}l} f_{k}(\cdot)=\frac{1}{N}\sum_{i=1}^{N}f_{i,k}(\cdot), \end{array} $$
(1)

where \(f_{i,k}(\cdot):\mathbb {R}^{l}\to \mathbb {R}^{l}\) is the local function associated with agent i. Denote by θk the root of the sum function fk(·) at time k, i.e., \(f_{k}(\theta _{k})=0,~ k=1,2,\dots \)

Further, assume that the dynamics of the root θk is governed by

$$\begin{array}{*{20}l} \theta_{k+1}=g_{k}(\theta_{k})+\xi_{k+1},~k\geq0, \end{array} $$
(2)

where the function \(g_{k}(\cdot):\mathbb {R}^{l}\to \mathbb {R}^{l}\) is known to all agents, and {ξk} is the sequence of dynamic noises. As can be seen in Section 6, this assumption is reasonable in some real-life application problems and has been studied before in [14, 25, 26].

For each agent i, the distributed root-tracking problem is to track the dynamic root of the time-varying global function by using its noise-corrupted observation of the local function fi,k(·), the dynamic information of the root gk(·), and the information obtained from its neighbors.

2.2 Algorithm

We now introduce the distributed root-tracking algorithm as follows:

$$\begin{array}{*{20}l} x^{\prime}_{i,k+1}&=\left\{\sum_{j\in N_{i}(k)}w_{ij}(k)g_{k}\left(x_{j,k}\right)+a_{k}O_{i,k+1}\right\}\mathbb{I}_{\left[\sigma_{i,k}=\hat{\sigma}_{i,k}\right]}\\ &+h_{k}(x^{*})\mathbb{I}_{\left[\sigma_{i,k}<\hat{\sigma}_{i,k}\right]}, \end{array} $$
(3)
$$\begin{array}{*{20}l} x_{i,k+1}&=x^{\prime}_{i,k+1}\mathbb{I}_{\left[||x^{\prime}_{i,k+1}-h_{k}(x^{*})||\le M_{\hat{\sigma}_{i,k}}\right]} \\ &+h_{k}(x^{*})\mathbb{I}_{\left[||x^{\prime}_{i,k+1}-h_{k}(x^{*})||> M_{\hat{\sigma}_{i,k}}\right]}, \end{array} $$
(4)
$$\begin{array}{*{20}l} \sigma_{i,k+1}&=\hat{\sigma}_{i,k}+\mathbb{I}_{\left[||x^{\prime}_{i,k+1}-h_{k}(x^{*})||> M_{\hat{\sigma}_{i,k}}\right]}, \end{array} $$
(5)
$$\begin{array}{*{20}l} \sigma_{i,0}&=0,~\hat{\sigma}_{i,k}=\max_{j\in N_{i}(k)}\sigma_{j,k}, \end{array} $$
(6)
$$\begin{array}{*{20}l} O_{i,k+1}&=f_{i,k+1}\left(g_{k}(x_{i,k})\right)+\epsilon_{i,k+1}, \end{array} $$
(7)

where 1) \(x_{i,k}\in \mathbb {R}^{l}\) is the estimate of θk given by agent i at time k, 2) Oi,k+1 defined by (7) is the local observation of agent i, 3) {ak}k≥0 is the sequence of step-sizes used by all agents, 4) x∗ is a fixed vector in \(\mathbb {R}^{l}\) known to all agents, 5) {Mk}k≥0 is a sequence of positive numbers increasingly diverging to infinity with M0≥||x∗||, 6) σi,k is the truncation number of agent i up to time k, and 7) hk(·) is a function defined as follows:

$$ {}h_{1}(x)=g_{1}(x),\quad h_{k}(x)=g_{k}\left(h_{k-1}(x)\right),\quad \text{for}~k=2,3,\dots. $$
(8)

Let us explain the algorithm. 1) For agent i, xi,k is the estimate of θk. Since the dynamics of {θk} is governed by (2), in order to make sure that the estimates track the dynamic root, the update at time k+1 utilizes gk(xi,k) instead of xi,k, as shown in (3). 2) For agent i, truncation happens when one of the following cases holds: a) \(\sigma _{i,k}<\hat {\sigma }_{i,k}\), which means that at least one neighbor has a truncation number larger than that of agent i; b) \(||x^{\prime }_{i,k+1}-h_{k}(x^{*})||> M_{\hat {\sigma }_{i,k}}\), which means that the distance between the intermediate value \(x^{\prime }_{i,k+1}\) and hk(x∗) exceeds the truncation bound. When truncation happens, the estimate xi,k is pulled back to hk−1(x∗). 3) It can be seen that truncations may not happen at the same time for different agents in the network. So for agent i, the update (5) makes sure that the truncation number of agent i is not smaller than the largest truncation number of its neighbors, i.e., \(\hat {\sigma }_{i,k}\). As will be shown in Lemma 4, this technique guarantees that the difference between the truncation numbers of different agents is bounded, which helps the algorithm converge. 4) The truncation mechanism makes sure that the estimates xi,k do not drift too far away from hk−1(x∗). As will be shown in Lemma 1, the distance between {hk−1(x∗)} and the dynamic root {θk} is bounded, so this truncation condition is a reasonable choice.

Remark 1

In our previous work [27], we proposed a distributed root-tracking algorithm without the expanding truncation. To make sure the algorithm converges, we assumed in [27] that the dynamic root {θk} and the estimates {xi,k} of all agents are bounded sequences. With the introduction of the expanding truncation mechanism, this assumption is removed in this paper.

3 Assumptions and convergence result

3.1 Assumptions

Let us list the assumptions to be used in the paper.

  • A1. \(a_{k}>0,~a_{k}\to 0,~\sum _{k=1}^{\infty }a_{k}=\infty \).

  • A2. There exists a continuously differentiable function \(v(\cdot):\mathbb {R}^{l}\to \mathbb {R}\) such that v(x)≠0 for any x≠0, v(0)=0, and for any \(0<r_{1}<r_{2}<\infty \)

    $$\begin{array}{*{20}l} \sup_{k}\sup_{r_{1}\le ||x-\theta_{k}||\le r_{2}}f_{k}^{T}(x)v_{x}(x-\theta_{k})<-a, \end{array} $$

    where a is a positive constant possibly depending on r1,r2. A constant r>η exists such that

    $$\begin{array}{*{20}l} \sup_{||y||\le\eta}v(y)<\sup_{||x||=r}v(x), \end{array} $$

    where η is an unknown constant specified later in Lemma 1.

  • A3. The class of functions {fi,k(·)}k≥0 is equi-continuous for i=1,…,N, i.e., for any fixed i and any ε>0, there exists δ>0 such that

    $$ ||f_{i,k}(x)-f_{i,k}(y)||\le\epsilon \quad\forall k,\quad \text{whenever}\quad ||x-y||\le\delta, $$

    where δ only depends on ε. Furthermore, for any c>0, there exists a constant α(c) such that ||fi,k(θk+ν)||<α(c) for any ν with \(||\nu ||\le c\), any \(i\in \mathcal {V}\), and \(k=1,2,\dots \)

  • A4. a) The adjacency matrices W(k), k≥0, are doubly stochastic;

    b) There exists a constant 0<κ<1 such that

    $$\begin{array}{*{20}l} w_{ij}(k)\ge\kappa\quad\forall j\in N_{i}(k)\quad \forall i\in\mathcal{V}\quad\forall k\ge 0, \end{array} $$

    c) The digraph \(\mathcal {G}_{\infty }=\{\mathcal {V},\mathcal {E}_{\infty }\}\) is strongly connected, where

    $$\begin{array}{*{20}l} {}\mathcal{E}_{\infty}=\{(j,i):(j,i)\in\mathcal{E}(k)\text{ for infinitely many indices }k\}, \end{array} $$

    d) There exists a positive integer B such that

    $$\begin{array}{*{20}l} (j,i)\in\mathcal{E}(k)\cup\mathcal{E}(k+1)\cup\cdots\cup\mathcal{E}(k+B-1) \end{array} $$

    for all \((j,i)\in \mathcal {E}_{\infty }\) and any k≥0.

  • A5. For any \(i\in \mathcal {V}\), the noise sequence {εi,k+1}k≥0 is such that

    $${\lim}_{T\to 0}\limsup_{k\to\infty}\frac{1}{T}||\sum_{m=n_{k}}^{m(n_{k},t_{k})}a_{m}\epsilon_{i,m+1}||=0,\quad\forall t_{k}\in[0,T],$$

    where \(m(k,T)\triangleq \max \left \{m:\sum _{i=k}^{m}a_{i}\le T\right \}\) and {nk} denotes the indices of any convergent subsequence \(\left \{x_{i,n_{k}}-\theta _{n_{k}}\right \}\).

  • A6. \(g_{k}(\cdot):\mathbb {R}^{l}\to \mathbb {R}^{l}\) is equi-continuous with respect to k and is such that \(||d_{k}(x)||\le \gamma _{k}||x-\theta _{k}||~\forall x,~k=1,2,\dots \), where

    $$d_{k}(x)\triangleq g_{k}(x)-g_{k}(\theta_{k})-(x-\theta_{k}),$$

    γk=o(ak), and \(\sum _{k=1}^{\infty }\gamma _{k}<\infty \).

  • ||ξk||=o(ak) and \(\sum _{k=1}^{\infty }||\xi _{k}||<\infty \).

A1 and A2 are the standard assumptions for stochastic approximation. A3 implies the local boundedness of the functions fi,k(·). Notice that the upper bound α(c) in A3 should be uniform with respect to k.

A4 describes the information exchange among agents. We refer to [9] for a detailed explanation. Set \(\Phi (k,k+1)\triangleq \mathbf {I}_{N}\) and

$$\Phi(k,s)\triangleq W(k)\cdots W(s)\quad\forall k\ge s.$$

By Proposition 1 in [9] it follows that there exist constants c>0 and 0<ρ<1 such that

$$\begin{array}{*{20}l} ||\Phi(k,s)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T}||\le c\rho^{k-s+1}\quad\forall k\ge s. \end{array} $$
(9)

Notice that in A5, the noise condition is required only along the indices of any convergent subsequence \(\left \{x_{i,n_{k}}-\theta _{n_{k}}\right \}\). As will be seen in the next section, this makes the convergence analysis much easier compared with requiring the noise condition to hold along the whole sequence.

In A6, dk(x) measures the difference between the estimation error xi,k−θk and the prediction error gk(xi,k)−gk(θk). This assumption implies that the dynamics of the root, i.e., gk(·), tends to a linear (translation-like) map as time k goes to infinity. For example, if the dynamics of the changing root is gk(x)=x+c, then A6 holds with γk=0.

3.2 Main result

Set \(\text {col}\left \{x_{1},\dots,x_{m}\right \}\triangleq \left (x_{1}^{T},\dots,x_{m}^{T}\right)^{T}\), and define

$$\begin{array}{*{20}l} &X_{k}\triangleq \text{col}\left\{x_{1,k},\dots,x_{N,k}\right\},\\ &\Theta_{k}\triangleq \mathbf{1}\otimes \theta_{k} \in\mathbb{R}^{Nl},\\ &\epsilon_{k}\triangleq \text{col}\left\{\epsilon_{1,k},\dots,\epsilon_{N,k}\right\},\\ &\Xi_{k}\triangleq \mathbf{1}\otimes \xi_{k} \in\mathbb{R}^{Nl},\\ &G_{k}(X_{k})\triangleq \text{col}\left\{g_{k}\left(x_{1,k}\right),\dots,g_{k}\left(x_{N,k}\right)\right\},\\ &F_{k+1}(X_{k})\triangleq \text{col}\left\{f_{1,k+1}\left(x_{1,k}\right),\dots,f_{N,k+1}\left(x_{N,k}\right)\right\},\\ &D_{k}(X_{k})\triangleq \text{col}\left\{d_{k}\left(x_{1,k}\right),\dots,d_{k}\left(x_{N,k}\right)\right\}. \end{array} $$

Further, we denote the disagreement vector of Xk by \(X_{\bot,k}\triangleq D_{\bot }X_{k}\) with \(D_{\bot }\triangleq \left (\mathbf {I}_{N}-\frac {\mathbf {1}\mathbf {1}^{T}}{N}\right)\otimes \mathbf {I}_{l}\). Define \(x_{k}\triangleq \frac {1}{N}\sum _{i=1}^{N}x_{i,k}\), the average of all agents’ estimates at time k. Define \(\Delta _{i,k}\triangleq x_{i,k}-\theta _{k}, \Lambda _{k}\triangleq X_{k}-\Theta _{k}\), and \(\Delta _{k}\triangleq x_{k}-\theta _{k}\).

Theorem 1

Let {xi,k} be the estimates produced by (3)–(7) with arbitrary initial values xi,0. Assume A1–A4 and A6 hold. If, for a fixed sample ω, A5 holds for all agents and A7 holds, then for this ω the following assertions take place:

i) There exists a positive integer k0 depending on ω such that

$$ x_{i,k+1}=\sum_{j\in N_{i}(k)}w_{ij}(k)g_{k}\left(x_{j,k}\right)+a_{k}O_{i,k+1}, $$
(10)

or in the compact form

$$ X_{k+1}=(W(k)\otimes\mathbf{I}_{l})G_{k}(X_{k})+a_{k}\left(F_{k+1}\left(G_{k}(X_{k})\right)+\epsilon_{k+1}\right) $$
(11)

for any kk0;

ii)

$$\begin{array}{*{20}l} {\lim}_{k\to\infty}X_{\bot,k}=\mathbf{0},\quad {\lim}_{k\to\infty} \Lambda_{k}=\mathbf{0}. \end{array} $$

Theorem 1 i) shows that the truncations cease after a finite number of steps. This implies that the difference between the estimate xi,k and hk−1(x∗) is bounded, which is desirable, as shown in Lemma 1. Before we move on to the proof of Theorem 1, we first show that the truncation mechanism is reasonable.

Lemma 1

If A6 and A7 hold, the sequence {hk(x)−θk+1} is bounded for any x.

Proof

By A6 and A7 from (2) it follows that

$$\begin{array}{*{20}l} &||h_{k}(x^{*})-\theta_{k+1}||=||g_{k}\left(h_{k-1}(x^{*})\right)-g_{k}(\theta_{k})-\xi_{k+1}||\\ &=||d_{k}\left(h_{k-1}(x^{*})\right)+h_{k-1}(x^{*})-\theta_{k}-\xi_{k+1}||\\ &\le(1+\gamma_{k})||h_{k-1}(x^{*})-\theta_{k}||+||\xi_{k+1}||\\ &\le\prod_{i=1}^{k}\left(1+\gamma_{i}\right)||g_{1}(x^{*})\,-\,\theta_{1}||\,+\,\sum_{i=1}^{k}\prod_{j=i+1}^{k}\left(1+\gamma_{j}\right)||\xi_{i+1}||\\ &\le\prod_{i=1}^{\infty}(1+\gamma_{i})||g_{1}(x^{*})-\theta_{1}||\,+\,\sum_{i=1}^{\infty}\prod_{j=i+1}^{\infty}\left(1+\gamma_{j}\right)||\xi_{i+1}||\\ &\triangleq\eta, \end{array} $$

where \(\prod _{i=1}^{\infty }(1+\gamma _{i})<\infty \) is implied by \(\sum _{j=1}^{\infty }\gamma _{j}<\infty \). □

As we mentioned in Section 2, Lemma 1 shows that the distance between {hk−1(x∗)} and {θk} is bounded. Since we hope the estimates {xi,k} generated by the algorithm (3)–(7) track the root {θk}, the truncation mechanism \(\mathbb {I}_{\left [||x^{\prime }_{i,k+1}-h_{k}(x^{*})||\le M_{\hat {\sigma }_{i,k}}\right ]}\) is intuitively reasonable.

4 Auxiliary sequences

The next two sections of this paper focus on the proof of Theorem 1. But prior to analyzing {xi,k}, we need to introduce two auxiliary sequences \(\left \{\tilde {x}_{i,k}\right \}\) and \(\left \{\tilde {\epsilon }_{i,k}\right \}\) for each agent \(i\in \mathcal {V}\). The motivation for constructing these two sequences comes from the nature of the distributed algorithm with expanding truncations. Recall the convergence analysis of the stochastic approximation algorithm with expanding truncations (SAAWET) [28]. The key step is to show that the truncations cease after a finite number of steps; therefore, the boundedness of the estimates is established. If the number of truncations increased unboundedly, then the estimate would be pulled back to x∗ infinitely many times. This would produce a convergent subsequence of the estimate sequence, and a contradiction could then be shown by analysing the estimates along this subsequence, which proves the boundedness of the estimates.

Although the problem in this paper differs from the one in [28] since the regression function here is time-varying, we use the same approach to prove the boundedness of the estimates. Notice the distributed algorithm with expanding truncations (3)–(7). When \({\lim }_{k\to \infty }\sigma _{i,k}=\infty ~\forall i\in \mathcal {V}\), the estimate xi,k is pulled back to hk−1(x∗) infinitely many times. By Lemma 1, we know that ||hk−1(x∗)−θk|| is bounded. So {Δi,k} contains a convergent subsequence. However, {Δk} may still not contain any convergent subsequence. This is because truncations may occur at different times for different \(i\in \mathcal {V}\). Therefore, the analysis approach used for SAAWET cannot be directly applied to the algorithm (3)–(7).

To overcome this difficulty, we introduce the auxiliary sequences \(\left \{\tilde {x}_{i,k}\right \}\) and \(\left \{\tilde {\epsilon }_{i,k}\right \}\). As will be shown, the auxiliary sequences \(\left \{\tilde {x}_{i,k}\right \}\) satisfy the recursions (19)–(21), for which the number of truncations at time k is the same for all agents, and the estimates \(\tilde {x}_{i,k}\) of all the agents are pulled back to hk−1(x∗) when σk>σk−1. The auxiliary noises \(\left \{\tilde {\epsilon }_{i,k}\right \}\) satisfy a condition similar to A5. These facts make the analysis of (19)–(21) feasible.

It is shown below that the important feature of the auxiliary sequences is that \(\left \{\tilde {x}_{i,k}\right \}\) and {xi,k} coincide after a finite number of steps, which means that the convergence of these two sequences is equivalent.

Denote by \(\tau _{i,m}\triangleq \inf \left \{k:\sigma _{i,k}=m\right \}\) the smallest time when the truncation number of agent i reaches m, by \(\tau _{m}\triangleq \min _{i\in \mathcal {V}}\tau _{i,m}\) the smallest time when at least one of the agents has its truncation number reach m, and by

$$\begin{array}{*{20}l} \sigma_{k}\triangleq\max_{i\in\mathcal{V}}\sigma_{i,k} \end{array} $$
(12)

the largest truncation number among all agents at time k. Set \(\tilde {\tau }_{i,m}\triangleq \tau _{i,m}\land \tau _{m+1}\), where ab= min{a,b}.

For any \(i\in \mathcal {V}\), define the auxiliary sequences \(\left \{\tilde {x}_{i,k}\right \}_{k\ge 0}\) and \(\left \{\tilde {\epsilon }_{i,k}\right \}_{k\ge 0}\) as follows:

$$\begin{array}{*{20}l} \tilde{x}_{i,k}\triangleq h_{k-1}(x^{*}),\quad\tilde{\epsilon}_{i,k+1}\triangleq -f_{i,k+1}\left(h_{k}(x^{*})\right),\\ \forall k:\tau_{m}\le k<\tilde{\tau}_{i,m}, \end{array} $$
(13)
$$\begin{array}{*{20}l} \tilde{x}_{i,k}\triangleq x_{i,k},\quad\tilde{\epsilon}_{i,k+1}\triangleq \epsilon_{i,k+1},\\ \forall k:\tilde{\tau}_{i,m}\le k<\tau_{m+1}, \end{array} $$
(14)

where m is an integer.

Note that for the considered sample ω, for each integer k≥0 there exists a unique integer m≥0 such that τm≤k<τm+1. By definition \(\tilde {\tau }_{i,m}\le \tau _{m+1}~\forall i\in \mathcal {V}\). So \(\left \{\tilde {x}_{i,k}\right \}_{k\ge 0}\) and \(\left \{\tilde {\epsilon }_{i,k}\right \}_{k\ge 0}\) are uniquely determined by the sequences {xi,k}k≥0 and {εi,k}k≥0.

Lemma 2

For k∈[τm,τm+1), the following assertions hold:

$$\begin{array}{*{20}l} \text{i)}~\tilde{x}_{i,k}&=h_{k-1}(x^{*}),~\tilde{\epsilon}_{i,k+1} \\ &=-f_{i,k+1}\left(h_{k}(x^{*})\right),~\text{if}~\sigma_{i,k}<m; \end{array} $$
(15)
$$\begin{array}{*{20}l} &\text{ii)}~\tilde{x}_{i,k}=x_{i,k},~\tilde{\epsilon}_{i,k+1}=\epsilon_{i,k+1},~\text{if}~\sigma_{i,k}=m; \end{array} $$
(16)
$$\begin{array}{*{20}l} &\text{iii)}~\tilde{x}_{j,k}=h_{k-1}(x^{*}),~\text{if}~\sigma_{j,k-1}<m; \end{array} $$
(17)
$$\begin{array}{*{20}l} &\text{iv)}~\tilde{x}_{j,k+1}=h_{k}(x^{*}),~\forall j\in\mathcal{V},~\text{if}~\sigma_{k+1}=m+1. \end{array} $$
(18)

Proof

i) Since σi,k<m, by the definition of τi,m and the fact that k∈[τm,τm+1), we know τi,m>k. Thus, \(\tilde {\tau }_{i,m}=\tau _{i,m}\land \tau _{m+1}>k\). Hence, we conclude (15) from (13).

ii) Since σi,k=m, by definition we have τi,mk. Hence \(\tilde {\tau }_{i,m}=\tau _{i,m}\land \tau _{m+1}=\tau _{i,m}\le k\). So by (14) we conclude (16).

iii) By τm≤k<τm+1 we know σj,k≤m. We consider two cases: σj,k=m and σj,k<m. 1) For σj,k=m, since σj,k−1<m, we know that truncation happens at time k for agent j. Truncation happens only when one of the following cases holds: a) \(\sigma _{j,k-1}<\hat {\sigma }_{j,k-1}\). For this case, by (3) we have \(x^{\prime }_{j,k}=h_{k-1}(x^{*})\), hence by (4) we have xj,k=hk−1(x∗); b) \(\|x^{\prime }_{j,k}-h_{k-1}(x^{*})\|> M_{\hat {\sigma }_{j,k-1}}\). For this case, by (4) we have xj,k=hk−1(x∗). In conclusion, when σj,k=m holds, we have xj,k=hk−1(x∗). Furthermore, from (16) it follows that \(\tilde {x}_{j,k}=x_{j,k}\) if σj,k=m. So we have \(\tilde {x}_{j,k}=h_{k-1}(x^{*})\). 2) For σj,k<m, from (15) we have \(\tilde {x}_{j,k}=h_{k-1}(x^{*})\).

iv) From k∈[τm,τm+1) we know σk=m. Hence from σk+1=m+1 by definition we have τm+1=k+1, and k+1∈[τm+1,τm+2). By σk=m we see \(\sigma _{j,k}<m+1~\forall j\in \mathcal {V}\). Then we derive (18) from (17). □

Lemma 3

The auxiliary sequences \(\{\tilde {x}_{i,k}\},~\{\tilde {\epsilon }_{i,k}\}\) defined by (13)(14) satisfy the following recursions:

$$\begin{array}{*{20}l} \hat{x}_{i,k+1}&=\sum_{j\in N_{i}(k)}w_{ij}(k)g_{k}(\tilde{x}_{j,k}) \\ & +a_{k}\left(f_{i,k+1}\left(g_{k}\left(\tilde{x}_{i,k}\right)\right)+\tilde{\epsilon}_{i,k+1}\right) \end{array} $$
(19)
$$\begin{array}{*{20}l} \tilde{x}_{i,k+1}&=\hat{x}_{i,k+1}\mathbb{I}_{\left[||\hat{x}_{j,k+1}-h_{k}(x^{*})||\le M_{\sigma_{k}}~\forall j\in\mathcal{V}\right]}\\ &+h_{k}(x^{*})\mathbb{I}_{\left[\exists j\in\mathcal{V}:~||\hat{x}_{j,k+1}-h_{k}(x^{*})||>M_{\sigma_{k}}\right]} \end{array} $$
(20)
$$\begin{array}{*{20}l} \sigma_{k+1}&=\sigma_{k}+\mathbb{I}_{\left[\exists j\in\mathcal{V}:~||\hat{x}_{j,k+1}-h_{k}(x^{*})||>M_{\sigma_{k}}\right]},~\sigma_{0}=0. \end{array} $$
(21)

Proof

We prove this by induction.

First we prove (19)–(21) for k=0. Since 0∈[τ0,τ1) and \(\sigma _{i,0}=0~\forall i\in \mathcal {V}\), by (16) we have \(\tilde {x}_{i,0}=x_{i,0},~\tilde {\epsilon }_{i,1}=\epsilon _{i,1}~\forall i\in \mathcal {V}\). Then by \(\hat {\sigma }_{i,0}=\sigma _{i,0}=0~\forall i\in \mathcal {V}\), from (3) and (19) we see

$$\begin{array}{*{20}l} \hat{x}_{i,1}=x^{\prime}_{i,1},~\forall i\in\mathcal{V}. \end{array} $$
(22)

Now we prove that \(\tilde {x}_{i,1}\) and σ1 generated by (19)–(21) are consistent with the definition (12)(13)(14). We consider two cases:

i) There is no truncation at time k=1, i.e., \(\sigma _{i,1}=0~\forall i\in \mathcal {V}\). Since σi,0=0, by (5) we know that \(||x_{i,1}^{\prime }-h_{0}(x^{*})||\le M_{0}\). Then we have \(x_{i,1}=x^{\prime }_{i,1}\) by (4), and \(\tilde {x}_{i,1}=\hat {x}_{i,1}, \sigma _{1}=0\) by (20)(21). Combining these with (22) shows that \(\tilde {x}_{i,1}=x_{i,1}~\forall i\in \mathcal {V}\), which is consistent with (14) since \(\tilde {\tau }_{i,0}\le 1<\tau _{1}\). By (12) we see \(\sigma _{1}=\max _{i\in \mathcal {V}}\sigma _{i,1}=0\), which is consistent with the one derived from (21).

ii) There is a truncation at k=1 for some agent i0, i.e., \(\sigma _{i_{0},1}=1\). Then by (4)(5) we have \(x_{i_{0},1}=h_{0}(x^{*}),~||x_{i_{0},1}^{\prime }-h_{0}(x^{*})||>M_{0}\). Hence \(||\hat {x}_{i_{0},1}-h_{0}(x^{*})||>M_{0}\) by (22). From (20)(21) we have \(\tilde {x}_{i,1}=h_{0}(x^{*})~\forall i\in \mathcal {V}\) and σ1=1. By (12) from \(\sigma _{i_{0},1}=1\) we also derive σ1=1. Since 0∈[τ0,τ1) and σ1=1, by (18) we have \(\tilde {x}_{i,1}=h_{0}(x^{*})~\forall i\in \mathcal {V}\). Thus, \(\tilde {x}_{i,1}\) and σ1 defined by (13)(14)(12) are consistent with those generated by (19)–(21).

In summary, the lemma is proved for k=0.

Now, by induction, assume (19)–(21) hold for \(k=0,1,\dots,p\). At the fixed sample ω, for the given integer p there exists a unique integer m such that τm≤p<τm+1. We aim to show that (19)–(21) hold for k=p+1. Before this, we first express \(\hat {x}_{i,p+1}~\forall i\in \mathcal {V}\) produced by (19) in the following two cases:

Case 1: σi,p<m. Since p∈[τm,τm+1), by (15) we see

$$\begin{array}{*{20}l} \tilde{x}_{i,p}=h_{p-1}(x^{*}),\quad\tilde{\epsilon}_{i,p+1}=-f_{i,p+1}\left(h_{p}(x^{*})\right). \end{array} $$
(23)

From σi,p<m it follows that σj,p−1<m ∀j∈Ni(p) by (5). Then by (17) we have \(\tilde {x}_{j,p}=h_{p-1}(x^{*})~\forall j\in N_{i}(p)\), which combined with (19) and (23) shows

$$\begin{array}{*{20}l} \hat{x}_{i,p+1}=h_{p}(x^{*})\quad \forall i:\sigma_{i,p}<m \end{array} $$
(24)

Case 2: σi,p=m. By τmp<τm+1 we have \(\sigma _{j,p}\le m~\forall j\in \mathcal {V}\) and hence by (6) we derive

$$\begin{array}{*{20}l} \hat{\sigma}_{i,p}=m,\quad\forall i:\sigma_{i,p}=m \end{array} $$
(25)

Then by (3)

$$\begin{array}{*{20}l} x^{\prime}_{i,p+1}&=\sum_{j\in N_{i}(p)}w_{ij}(p)g_{p}(x_{j,p})\\ &+a_{p}\Big(f_{i,p+1}\big(g_{p}(x_{i,p})\big)+\epsilon_{i,p+1}\Big). \end{array} $$
(26)

From σi,p=m and p∈[τm,τm+1), by (16) it can be shown that

$$\begin{array}{*{20}l} \tilde{x}_{i,p}=x_{i,p},\quad\tilde{\epsilon}_{i,p+1}=\epsilon_{i,p+1}. \end{array} $$
(27)

Substituting (27) into (19), by (26) we know that

$$\begin{array}{*{20}l} \hat{x}_{i,p+1}=x^{\prime}_{i,p+1}\quad\forall i:\sigma_{i,p}=m. \end{array} $$
(28)

So we have expressed \(\hat {x}_{i,p+1}~\forall i\in \mathcal {V}\) produced by (19) for the two cases above.

Since τmp<τm+1, we have σp<m+1 and hence σp+1m+1. From τmp it follows that σp=m and σp+1m, hence mσp+1m+1.

Now we show that \(\tilde {x}_{i,p+1}\) and σp+1 generated by (19)–(21) are consistent with their definitions (12)(13)(14). We prove this for two cases σp+1=m+1 and σp+1=m.

Case 1: σp+1=m+1. We first show

$$\begin{array}{*{20}l} \sigma_{i,p+1}\le m,~\text{if}~\sigma_{i,p}<m \end{array} $$
(29)

for the following two cases: 1) σi,p<m and σj,p<mjNi(p). For this case by (6) we have \(\hat {\sigma }_{i,p}<m\), and hence \(\sigma _{i,p+1}\le \hat {\sigma }_{i,p}+1\le m\) by (5). 2) σi,p<m and σj,p=m for some jNi(p). For this case we derive \(\hat {\sigma }_{i,p}=m, x^{\prime }_{i,p+1}=h_{p}(x^{*})\) by (3)(6). So, by (5) we derive \(\sigma _{i,p+1}=\hat {\sigma }_{i,p}=m\). Thus, σi,p+1m when σi,p<m. Hence (29) holds. Furthermore, this means that

$$\begin{array}{*{20}l} \sigma_{i,p+1}=m+1~\text{only if}~\sigma_{i,p}=m. \end{array} $$
(30)

Since we are considering the case where σp+1=m+1, by definition we know that there exists some agent \(i_{0}\in \mathcal {V}\) such that \(\sigma _{i_{0},p+1}=m+1\). Then \(\sigma _{i_{0},p}=m\) by (30), and hence \(\hat {\sigma }_{i_{0},p}=m\) from (25). Then from \(\sigma _{i_{0},p+1}=m+1\) by (5) we know that \(||x^{\prime }_{i_{0},p+1}-h_{p}(x^{*})||>M_{m}\). So from (20)(21) we derive \(\tilde {x}_{i,p+1}=h_{p}(x^{*})~\forall i\in \mathcal {V}\) and σp+1=m+1, which is consistent with the σp+1 defined by (12). Since σp+1=m+1 and p∈[τm,τm+1), by (18) we see that \(\tilde {x}_{i,p+1}=h_{p}(x^{*})~\forall i\in \mathcal {V}\), which is consistent with that generated by (19)–(21).

Case 2: σp+1=m. In this case \(\sigma _{i,p+1}\le m~\forall i\in \mathcal {V}\). By (25), from (4)(5) we see that

$$\begin{array}{*{20}l} {}||x^{\prime}_{i,p+1}-h_{p}(x^{*})||\le M_{m},~x_{i,p+1}=x^{\prime}_{i,p+1}~\forall~i:\sigma_{i,p}=m. \end{array} $$
(31)

So, by (28) we derive

$$\begin{array}{*{20}l} ||\hat{x}_{i,p+1}-h_{p}(x^{*})||\le M_{m},~\forall i:\sigma_{i,p}=m. \end{array} $$
(32)

From (24) we have \(||\hat {x}_{i,p+1}-h_{p}(x^{*})||=0\le M_{m}~\forall i:\sigma _{i,p}<m\), which together with (32) yields \(||\hat {x}_{i,p+1}-h_{p}(x^{*})||\le M_{m}~\forall i\in \mathcal {V}\). Then from (20) it follows that

$$\begin{array}{*{20}l} \tilde{x}_{i,p+1}=\hat{x}_{i,p+1}~\forall i\in\mathcal{V},\quad \sigma_{p+1}=m, \end{array} $$
(33)

which means that σp+1 is consistent with that defined by (12).

It remains to show that \(\tilde {x}_{i,p+1}\) generated by (19)–(21) is consistent with that defined by (13)(14). We consider two cases: 1) σi,p=m. For this case, by (28), (31), and (33) we see \(\tilde {x}_{i,p+1}=x_{i,p+1}~\forall i:\sigma _{i,p}=m\). By σp+1=m we see p+1∈[τm,τm+1), and hence \(\tilde {x}_{i,p+1}=x_{i,p+1}\) by (16). So the consistency assertion holds for any i with σi,p=m. 2) σi,p<m. For this case, from σp+1=m we see p+1∈[τm,τm+1), and hence by σi,p<m and (17) we know that \(\tilde {x}_{i,p+1}\) defined by (13)(14) is equal to hp(x∗). By (24)(33) we derive \(\tilde {x}_{i,p+1}=h_{p}(x^{*})\). So the consistency assertion holds for i with σi,p<m too.

In summary, \(\tilde {x}_{i,p+1}\) and σp+1 generated by (19)–(21) are consistent with their definitions (12)(13)(14). So the induction is complete. □

Lemma 4

Assume A4 holds. Then

i)

$$\begin{array}{*{20}l} \sigma_{j,k+Bd_{i,j}}\ge\sigma_{i,k}\quad\forall j\in\mathcal{V}~\forall k>0, \end{array} $$
(34)

where di,j is the length of the shortest directed path from i to j in \(\mathcal {G}_{\infty }\), and B is the positive integer given in A4 d).

ii)

$$\begin{array}{*{20}l} \tilde{\tau}_{j,m}\le\tau_{m}+BD\quad\forall j\in\mathcal{V}~\text{for}~m\ge 1, \end{array} $$
(35)

where \(D\triangleq \max _{i,j\in \mathcal {V}}d_{i,j}\).

Proof

i) Since \(\mathcal {G}_{\infty }\) is strongly connected by A4 c), for any \(j\in \mathcal {V}\) there exists a sequence of nodes \(i_{1},i_{2},\dots,i_{d_{i,j}-1}\) such that \((i,i_{1})\in \mathcal {E}_{\infty },(i_{1},i_{2})\in \mathcal {E}_{\infty },\dots,\left (i_{d_{i,j}-1},j\right)\in \mathcal {E}_{\infty }\).

Noticing that \((i,i_{1})\in \mathcal {E}_{\infty }\), by A4 d) we have

$$\begin{array}{*{20}l} (i,i_{1})\in\mathcal{E}(k)\cup\mathcal{E}(k+1)\cup\dots\cup\mathcal{E}(k+B-1). \end{array} $$

Therefore, there exists a positive integer \(k^{\prime }\in [k,k+B-1]\) such that \((i,i_{1})\in \mathcal {E}(k^{\prime })\). So \(i\in N_{i_{1}}(k^{\prime })\), and hence by (6) and (5) we have

$$\begin{array}{*{20}l} \sigma_{i_{1},k+B}\ge\sigma_{i_{1},k^{\prime}+1}\ge\hat{\sigma}_{i_{1},k^{\prime}}\ge\sigma_{i,k^{\prime}}\ge\sigma_{i,k}. \end{array} $$

Repeating this procedure, we obtain \(\sigma _{i_{2},k+2B}\ge \sigma _{i_{1},k+B}\ge \sigma _{i,k}\), and finally we arrive at (34).

ii) For some m≥1, let τm=k1. Then there exists an agent i such that τi,m=k1. By (34) we have \(\sigma _{j,k_{1}+Bd_{i,j}}\ge \sigma _{i,k_{1}}=m~\forall j\in \mathcal {V}\).

For the case where \(\sigma _{j,k_{1}+Bd_{i,j}}=m~\forall j\in \mathcal {V}\), we have \(\tau _{j,m}\le k_{1}+Bd_{i,j}~\forall j\in \mathcal {V}\). Noticing τm=k1, by the definition of \(\tilde {\tau }_{j,m}\) we obtain (35):

$$\begin{array}{*{20}l} \tilde{\tau}_{j,m}\le\tau_{j,m}\le\tau_{m}+Bd_{i,j}\le\tau_{m}+BD\quad j\in\mathcal{V}. \end{array} $$

For the case where \(\sigma _{j,k_{1}+Bd_{i,j}}>m\) for some \(j\in \mathcal {V}\), we have τm+1k1+Bdi,j for some \(j\in \mathcal {V}\), and hence τm+1τm+BD. Again, we obtain (35):

$$\begin{array}{*{20}l} \tilde{\tau}_{j,m}\le\tau_{m+1}\le\tau_{m}+BD\quad j\in\mathcal{V}. \end{array} $$

□

Corollary 1

If \(\sigma _{k}\xrightarrow [k\to \infty ]\infty \), then \({\lim }_{k\to \infty }\sigma _{i,k}=\infty ~\forall i\in \mathcal {V}\).

This corollary can be easily obtained from (34).

Lemma 5

Assume that A1, A3, A6, and A7 hold, and that A5 holds at the sample path ω under consideration. Then for this ω

$$\begin{array}{*{20}l} {\lim}_{T\to 0}&\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m(n_{k},t_{k})\land(\tau_{\sigma_{n_{k}}+1}-1)}a_{s}\tilde{\epsilon}_{s+1}||=0\\ &\forall t_{k}\in[0,T] \end{array} $$
(36)

along indices {nk} whenever \(\left \{\widetilde {\Lambda }_{n_{k}}\right \}\) converges at ω, where \(\tilde {\epsilon }_{k}\triangleq col\left \{\tilde {\epsilon }_{1,k},\dots,\tilde {\epsilon }_{N,k}\right \}, \tilde {X}_{k}\triangleq col\left \{\tilde {x}_{1,k},\dots,\tilde {x}_{N,k}\right \}\), and \(\widetilde {\Lambda }_{k}\triangleq \tilde {X}_{k}-\Theta _{k}\).

Proof

It suffices to show

$$\begin{array}{*{20}l} {\lim}_{T\to 0}&\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m\left(n_{k},t_{k}\right)\land\left(\tau_{\sigma_{n_{k}}+1}-1\right)}a_{s}\tilde{\epsilon}_{i,s+1}||=0\\ &\forall t_{k}\in[0,T] \end{array} $$
(37)

along indices {nk} whenever \(\left \{\tilde {x}_{i,n_{k}}-\theta _{n_{k}}\right \}\) converges at the sample ω where A5 holds for agent i.

We consider two cases:

Case 1: \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \). From the definition we obtain

$$\begin{array}{*{20}l} \tau_{\sigma+1}=\infty~\text{when }{\lim}_{k\to\infty}\sigma_{k}=\sigma. \end{array} $$
(38)

By (35) we have \(\tilde {\tau }_{i,\sigma }\le \tau _{\sigma }+BD\), hence by (14)(38)

$$\begin{array}{*{20}l} \tilde{x}_{i,k}=x_{i,k},~\tilde{\epsilon}_{i,k+1}=\epsilon_{i,k+1}\quad\forall k\ge\tau_{\sigma}+BD. \end{array} $$
(39)

So,

$$\begin{array}{*{20}l} ||\sum_{s=k}^{m(k,t)\land(\tau_{\sigma_{k}+1}-1)}a_{s}\tilde{\epsilon}_{i,s+1}||=||\sum_{s=k}^{m(k,t)}a_{s}\epsilon_{i,s+1}|| \end{array} $$

for any t>0 and any sufficiently large k. Then by A5 we conclude (37).

Case 2: \({\lim }_{k\to \infty }\sigma _{k}=\infty \). In this case we prove (37) for three separate cases:

i): \(\tilde {\tau }_{i,\sigma _{n_{p}}}\le n_{p}\). For this case, \(\left [n_{p},\tau _{\sigma _{n_{p}}+1}\right)\subset \left [\tau _{i,\sigma _{n_{p}}},\tau _{\sigma _{n_{p}}+1}\right)\). So by (14) we have

$$\begin{array}{*{20}l} \tilde{x}_{i,s}=x_{i,s},~\tilde{\epsilon}_{i,s+1}=\epsilon_{i,s+1}\quad\forall n_{p}\le s\le\tau_{\sigma_{n_{p}}+1}. \end{array} $$
(40)

Thus, for any tp∈[0,T]

$$\begin{array}{*{20}l} &||\sum_{s=n_{p}}^{m\left(n_{p},t_{p}\right)\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}\tilde{\epsilon}_{i,s+1}||\\ &=||\sum_{s=n_{p}}^{m\left(n_{p},t_{p}\right)\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}\epsilon_{i,s+1}||. \end{array} $$
(41)

Notice that by (40), \(\tilde {x}_{i,n_{p}}=x_{i,n_{p}}\), and the indices {np} are taken such that \(\{\tilde {x}_{i,n_{p}}-\theta _{n_{p}}\}\) is a convergent subsequence. So \(\{x_{i,n_{p}}-\theta _{n_{p}}\}\) is a convergent subsequence as well. It can be seen that \(\sum _{s=n_{p}}^{m(n_{p},t_{p})\land (\tau _{\sigma _{n_{p}}+1}-1)}a_{s}\le \sum _{s=n_{p}}^{m(n_{p},t_{p})}a_{s}\le t_{p}\le T\). Hence, from (41) and A5 we conclude (37).

ii): \(\tilde {\tau }_{i,\sigma _{n_{p}}}>n_{p}\) and \(\tilde {\tau }_{i,\sigma _{n_{p}}}=\tau _{\sigma _{n_{p}}+1}\). By the definitions of τk and σk we have \(\tau _{\sigma _{k}}\le k\), and hence \(\tau _{\sigma _{n_{p}}}\le n_{p}\). Then \([n_{p},\tau _{\sigma _{n_{p}}+1})\subset [\tau _{\sigma _{n_{p}}},\tilde {\tau }_{i,\sigma _{n_{p}}})\), and hence by (13) we have

$$\begin{array}{*{20}l} \tilde{x}_{i,s}=h_{s-1}(x^{*}),~\tilde{\epsilon}_{i,s+1}=-f_{i,s+1}(h_{s}(x^{*}))\\ \forall s:n_{p}\le s<\tau_{\sigma_{n_{p}}+1}. \end{array} $$
(42)

From \(\tilde {\tau }_{i,\sigma _{n_{p}}}=\tau _{\sigma _{n_{p}}+1}\) by (35) we know that \(\tau _{\sigma _{n_{p}}+1}\le \tau _{\sigma _{n_{p}}}+BD\le n_{p}+BD\). Then for any tp∈[0,T], utilizing A1, A3 and Lemma 1 we have

$$\begin{array}{*{20}l} &||\sum_{s=n_{p}}^{m\left(n_{p},t_{p}\right)\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}\tilde{\epsilon}_{i,s+1}|| \\ & \le\sum_{s=n_{p}}^{n_{p}+BD}a_{s}||f_{i,s+1}\left(h_{s}(x^{*})\right)|| \\ &=\sum_{s=n_{p}}^{n_{p}+BD}a_{s}||f_{i,s+1}\left(\theta_{s+1}+h_{s}(x^{*})-\theta_{s+1}\right)||\\ &\le\sum_{s=n_{p}}^{n_{p}+BD}a_{s}\alpha(\eta) \\ &\le BD\cdot a_{n_{p}}\cdot\alpha(\eta)\xrightarrow[p\to\infty]{}0, \end{array} $$
(43)

and hence (37) holds for this case.

iii): \(\tilde {\tau }_{i,\sigma _{n_{p}}}>n_{p}\) and \(\tilde {\tau }_{i,\sigma _{n_{p}}}<\tau _{\sigma _{n_{p}}+1}\). For this case from definition we know that \(\tilde {\tau }_{i,\sigma _{n_{p}}}=\tau _{i,\sigma _{n_{p}}}\). So by (35) we have \(\tau _{i,\sigma _{n_{p}}}\le \tau _{\sigma _{n_{p}}}+BD\). Noticing \(\tau _{\sigma _{n_{p}}}\le n_{p}\), we conclude that

$$\begin{array}{*{20}l} \tau_{\sigma_{n_{p}}}\le n_{p}<\tilde{\tau}_{i,\sigma_{n_{p}}}=\tau_{i,\sigma_{n_{p}}}\le n_{p}+BD \end{array} $$
(44)

So, \([n_{p},\tau _{i,\sigma _{n_{p}}})\subset [\tau _{\sigma _{n_{p}}},\tilde {\tau }_{i,\sigma _{n_{p}}})\). From this and \(\tilde {\tau }_{i,\sigma _{n_{p}}}=\tau _{i,\sigma _{n_{p}}}\), by (13)(14) we derive

$$\begin{array}{*{20}l} {}\tilde{x}_{i,s}=h_{s-1}(x^{*}),~\tilde{\epsilon}_{i,s+1}=-f_{i,s+1}\left(h_{s}(x^{*})\right),~\forall n_{p}\le s<\tau_{i,\sigma_{n_{p}}},\\ \tilde{x}_{i,s}=x_{i,s},~\tilde{\epsilon}_{i,s+1}=\epsilon_{i,s+1},~\forall \tau_{i,\sigma_{n_{p}}}\le s<\tau_{\sigma_{n_{p}}+1}. \end{array} $$

Consequently, for any tp∈[0,T]

$$\begin{array}{*{20}l} &||\sum_{s=n_{p}}^{m(n_{p},t_{p})\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}\tilde{\epsilon}_{i,s+1}||\\ &\le||\sum_{s=n_{p}}^{m(n_{p},t_{p})\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}f_{i,s+1}\left(h_{s}(x^{*})\right)\mathbb{I}_{[n_{p}\le s<\tau_{i,\sigma_{n_{p}}}]}||\\ &+||\sum_{s=\tau_{i,\sigma_{n_{p}}}}^{m\left(n_{p},t_{p}\right)\land\left(\tau_{\sigma_{n_{p}}+1}-1\right)}a_{s}\epsilon_{i,s+1}||. \end{array} $$
(45)

Analyze the first term on the right-hand side of (45):

$$\begin{array}{*{20}l} &||\sum_{s=n_{p}}^{m(n_{p},t_{p})\land(\tau_{\sigma_{n_{p}}+1}-1)}a_{s}f_{i,s+1}\big(h_{s}(x^{*})\big)\mathbb{I}_{[n_{p}\le s<\tau_{i,\sigma_{n_{p}}}]}||\\ &\le\sum_{s=n_{p}}^{\tau_{i,\sigma_{n_{p}}}}a_{s}||f_{i,s+1}\big(h_{s}(x^{*})\big)||\\ &\le\sum_{s=n_{p}}^{n_{p}+BD}a_{s}\alpha(\eta)\xrightarrow[p\to\infty]{}0. \end{array} $$

From the definition of τi,k, the truncation number of agent i at time \(\tau _{i,\sigma _{n_{p}}}\) is \(\sigma _{n_{p}}\), while it is smaller than \(\sigma _{n_{p}}\) at time \(\tau _{i,\sigma _{n_{p}}}-1\). So by the algorithm (3)–(6) we know \(x_{i,\tau _{i,\sigma _{n_{p}}}}=h_{\tau _{i,\sigma _{n_{p}}}-1}(x^{*})\). Now consider the second term on the right-hand side of (45). If we can show that \(\{h_{\tau _{i,\sigma _{n_{p}}}-1}(x^{*})-\theta _{\tau _{i,\sigma _{n_{p}}}}\}\) is convergent, then, combining this with the fact that \(\sum _{s=\tau _{i,\sigma _{n_{p}}}}^{m(n_{p},t_{p})\land \tau _{\sigma _{n_{p}}+1}-1}a_{s}\le \sum _{s=n_{p}}^{m(n_{p},t_{p})}a_{s}\le t_{p}\), from A5 we can conclude that the second term on the right-hand side of (45) tends to zero as p→∞.

We show that {hk(x∗)−θk+1} is a convergent sequence by proving that it is a Cauchy sequence. For two integers j>i>0, we see

$$\begin{array}{*{20}l} &||h_{j}(x^{*})-\theta_{j+1}-h_{i}(x^{*})+\theta_{i+1}||\\ &\le||h_{j}(x^{*})-\theta_{j+1}-h_{j-1}(x^{*})+\theta_{j}\\ &+h_{j-1}(x^{*})-\theta_{j}-h_{j-2}(x^{*})+\theta_{j-1}+\cdots+h_{i+1}(x^{*})\\&-\theta_{i+2}\\ &-h_{i}(x^{*})+\theta_{i+1}||\\ &=||g_{j}\big(h_{j-1}(x^{*})\big)\,-\,g_{j}(\theta_{j})-h_{j-1}(x^{*})+\theta_{j}-\xi_{j+1}+\cdots||\\ &\le\sum_{l=i+1}^{j}||d_{l}(h_{l-1}(x^{*}))||+\sum_{l=i+2}^{j+1}||\xi_{l}||\\ &\le\sum_{l=i+1}^{j}\gamma_{l}\cdot\eta+\sum_{l=i+2}^{j+1}||\xi_{l}||, \end{array} $$

where the last inequality comes from A6 and Lemma 1. By A6 and A7 we know that \(\sum _{k=1}^{\infty }\gamma _{k}<\infty \) and \(\sum _{k=1}^{\infty }||\xi _{k}||<\infty \). Thus, for any ε>0, we can find a sufficiently large N0>0 such that ||hj(x∗)−θj+1−hi(x∗)+θi+1||<ε for all i,j>N0, which means that {hk(x∗)−θk+1} is a Cauchy sequence. Hence, (37) holds for case iii) as well.

Since one of the cases i), ii), iii) must take place when \({\lim }_{k\to \infty }\sigma _{k}=\infty \), we conclude that (37) holds in Case 2.

Combining Case 1 and Case 2, we conclude (36). □

Corollary 2

In (38) we showed that τσ+1=∞ when \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \). So, if \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \), by (20) we know that \(\{\tilde {x}_{i,k}\}\) and \(\{x_{i,k}\}\), as well as \(\{\tilde {\epsilon }_{i,k}\}\) and {εi,k}, coincide after a finite number of steps.

5 Proof of the main result

Define:

$$\begin{array}{*{20}l} \Psi(k,s)\triangleq&\big[D_{\bot}(W(k)\otimes\mathbf{I}_{l})\big]\big[D_{\bot}(W(k-1)\otimes\mathbf{I}_{l})\big]\cdots\\ &\big[D_{\bot}(W(s)\otimes\mathbf{I}_{l})\big]\quad\forall k\ge s, \end{array} $$

and

$$\quad \Psi(k-1,k)\triangleq\mathbf{I}_{Nl}.$$

Since the W(k) are doubly stochastic, by the property of the Kronecker product \((A\otimes B)(C\otimes D)=(AC)\otimes (BD)\) we know that for any k≥s−1

$$\begin{array}{*{20}l} &\Psi(k,s)=\big(\Phi(k,s)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T}\big)\otimes\mathbf{I}_{l}, \end{array} $$
(46)
$$\begin{array}{*{20}l} &\Psi(k,s)D_{\bot}=\big(\Phi(k,s)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T}\big)\otimes\mathbf{I}_{l}. \end{array} $$
(47)

The following lemma characterizes the closeness of the auxiliary sequence \(\{\widetilde {\Lambda }_{k}\}_{k\geq 1}\) along its convergent subsequence \(\{\widetilde {\Lambda }_{n_{k}}\}\).

Lemma 6

Assume A1, A3, A4, and A6 hold. Further, for a fixed sample ω, assume A5 and A7 hold for all the agents. Let \(\{\widetilde {\Lambda }_{n_{k}}\}\) be a convergent subsequence of \(\{\widetilde {\Lambda }_{k}\}\) for ω, say \(\widetilde {\Lambda }_{n_{k}}\xrightarrow [k\to \infty ]{}\widetilde {\Lambda }\). Then for this ω there is a T>0 such that for all sufficiently large k and any Tk∈[0,T]

$$\begin{array}{*{20}l} {}\tilde{X}_{m+1}\,=\,(W(m)\!\otimes\!\mathbf{I}_{l})G_{m}(\tilde{X}_{m})\,+\,a_{m}\Big(F_{m+1}\big(G_{m}(\tilde{X}_{m})\big)+\tilde{\epsilon}_{m+1}\Big) \end{array} $$
(48)

for any \(m=n_{k},\dots,m(n_{k},T_{k})\), and

$$\begin{array}{*{20}l} ||\widetilde{\Lambda}_{m+1}-\widetilde{\Lambda}_{n_{k}}||\le c_{1}T_{k}+M_{0}^{\prime} \end{array} $$
(49)
$$\begin{array}{*{20}l} ||\widetilde{\Delta}_{m+1}-\widetilde{\Delta}_{n_{k}}||\le c_{2}T_{k},\quad \forall n_{k}\le m\le m(n_{k},T_{k}), \end{array} $$
(50)

where \(\tilde {x}_{k}\triangleq \frac {1}{N}\sum _{k=1}^{N}\tilde {x}_{i,k}, \widetilde {\Delta }_{k}\triangleq \tilde {x}_{k}-\theta _{k}\), and \(\widetilde {\Delta }_{i,k}\triangleq \tilde {x}_{i,k}-\theta _{k}\).

Proof

Consider a fixed sample path ω where A5 and A7 hold.

Let \(C>||\widetilde {\Lambda }||\). There exists an integer kC>0 such that

$$\begin{array}{*{20}l} ||\widetilde{\Lambda}_{n_{k}}||\le C,~\gamma_{k}<a_{k},\\ ||\xi_{k+1}||<a_{k},~a_{k}<1,\quad\forall k\ge k_{C} \end{array} $$
(51)

From Lemma 5 we know that there exist constants T1>0 and k0>kC such that

$$\begin{array}{*{20}l} ||\sum_{s=n_{k}}^{m(n_{k},t_{k})\land(\tau_{\sigma_{n_{k}}+1}-1)}a_{s}\tilde{\epsilon}_{s+1}||\le T_{0}\\ \forall t_{k}\in[0,T_{0}],~\forall T_{0}\in[0,T_{1}],~\forall k\ge k_{0}. \end{array} $$
(52)

Define

$$\begin{array}{*{20}l} M_{0}^{\prime}\triangleq C(c\rho+2)+1, \end{array} $$
(53)
$$\begin{array}{*{20}l} c_{1}\triangleq\sqrt{N}\cdot c_{2}+2+\frac{c(1+\rho)}{1-\rho}, \end{array} $$
(54)
$$\begin{array}{*{20}l} c_{2}\triangleq M_{0}^{\prime}+C+2+\alpha(2M_{0}^{\prime}+2C+3)+\frac{1}{\sqrt{N}}, \end{array} $$
(55)

where c and ρ are given by (9). Select T such that

$$\begin{array}{*{20}l} 0<T\le T_{1},~c_{1}T<1. \end{array} $$
(56)

For any kk0 and any Tk∈[0,T] define

$$\begin{array}{*{20}l} &s_{k}\triangleq\sup\{s\ge n_{k}:||\widetilde{\Lambda}_{j}-\widetilde{\Lambda}_{n_{k}}||\\ &\le c_{1}T_{k}+M_{0}^{\prime}\quad\forall n_{k}\le j\le s\} \end{array} $$
(57)

So from (51) and (56) it follows that

$$\begin{array}{*{20}l} ||\widetilde{\Lambda}_{j}||\le M^{\prime}_{0}+C+1,\quad\forall n_{k}\le j\le s_{k}. \end{array} $$
(58)

We intend to prove sk>m(nk,Tk). Assume the converse, i.e., that for sufficiently large k≥k0 and any Tk∈[0,T]

$$\begin{array}{*{20}l} s_{k}\le m(n_{k},T_{k}). \end{array} $$
(59)

We first show that there exists a positive integer k1>k0 such that for any kk1

$$\begin{array}{*{20}l} s_{k}<\tau_{\sigma_{n_{k}}+1},~\forall k\ge k_{1},~\forall T_{k}\in[0,T]. \end{array} $$
(60)

We prove (60) for two cases: \({\lim }_{k\to \infty }\sigma _{k}=\infty \) and \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \).

i) \({\lim }_{k\to \infty }\sigma _{k}=\infty \): From (58) we know that \(||\tilde {x}_{i,n_{k}}-\theta _{n_{k}}||\le M^{\prime }_{0}+C+1~\forall i\in \mathcal {V}\). First, we prove that for sufficiently large k, truncation does not happen at time nk+1. For any \(i\in \mathcal {V}\), we consider the following two cases:

a) \(\tilde {x}_{i,n_{k}}\) and \(\tilde {\epsilon }_{i,n_{k}+1}\) take value as (13): From (19) we have

$$\begin{array}{*{20}l} {}\hat{x}_{i,n_{k}+1}=&\sum_{j\in N_{i}(n_{k})}w_{ij}(n_{k})g_{n_{k}}(\tilde{x}_{j,n_{k}})\\ =&\sum_{j\in N_{i}(n_{k})}w_{ij}(n_{k})\Big(g_{n_{k}}(\tilde{x}_{j,n_{k}})\,-\,g_{n_{k}}(\theta_{n_{k}})\,-\,(\tilde{x}_{j,n_{k}}\,-\,\theta_{n_{k}})\Big)\\ &+\sum_{j\in N_{i}(n_{k})}w_{ij}(n_{k})\Big(\theta_{n_{k}+1}-\xi_{n_{k}+1}+(\tilde{x}_{j,n_{k}}-\theta_{n_{k}})\Big). \end{array} $$

Since A4 indicates that W(nk) is doubly stochastic, by A6, (51) and direct calculation we have the following inequalities

$$\begin{array}{*{20}l} \|\hat{x}_{i,n_{k}+1}-\theta_{n_{k}+1}\|\le&(\gamma_{n_{k}}+1)(M^{\prime}_{0}+C+1)+\|\xi_{n_{k}+1}\|\\ \le&2M^{\prime}_{0}+2C+3, \end{array} $$

and hence by Lemma 1 we know \(||\hat {x}_{i,n_{k}+1}-h_{n_{k}}(x^{*})||\le \eta +2M^{\prime }_{0}+2C+3\).

b) \(\tilde {x}_{i,n_{k}}\) and \(\tilde {\epsilon }_{i,n_{k}+1}\) take value as (14): From (19) we have

$$\begin{array}{*{20}l} {}\hat{x}_{i,n_{k}+1}\!=&\sum_{j\in N_{i}(n_{k})}w_{ij}(n_{k})\Big(g_{n_{k}}(\tilde{x}_{j,n_{k}})\,-\,g_{n_{k}}(\theta_{n_{k}})\,-\,(\tilde{x}_{j,n_{k}}\,-\,\theta_{n_{k}})\Big)\\ &+\sum_{j\in N_{i}(n_{k})}w_{ij}(n_{k})\Big(\theta_{n_{k}+1}-\xi_{n_{k}+1}+(\tilde{x}_{j,n_{k}}-\theta_{n_{k}})\Big)\\ &+a_{n_{k}}f_{i,n_{k}+1}\left(\theta_{n_{k}+1}+g_{n_{k}}(\tilde{x}_{i,n_{k}})-g_{n_{k}}(\theta_{n_{k}})\right.\\ &\quad-\left.(\tilde{x}_{i,n_{k}}-\theta_{n_{k}})+(\tilde{x}_{i,n_{k}}-\theta_{n_{k}})-\xi_{n_{k}+1}\right)\\ &+a_{n_{k}}\epsilon_{i,n_{k}+1}. \end{array} $$

By A5 we know that \(a_{n_{k}}||\epsilon _{i,n_{k}+1}||<1\) for sufficiently large k. Then, by A3, A4, A6, and (51), we have the following inequalities

$$\begin{array}{*{20}l} {}\|\hat{x}_{i,n_{k}+1}-\theta_{n_{k}+1}\|\le&2M^{\prime}_{0}+2C+3+a_{n_{k}}\alpha(2M^{\prime}_{0}+2C+3)\\&+a_{n_{k}}\|\epsilon_{i,n_{k}+1}\|\\ \le&2M^{\prime}_{0}+2C+4+\alpha(2M^{\prime}_{0}+2C+3)\\&\triangleq M_{1}, \end{array} $$

and hence by Lemma 1 we know \(||\hat {x}_{i,n_{k}+1}-h_{n_{k}}(x^{*})||\le \eta +M_{1}\).

So we have shown that when \(||\widetilde {\Lambda }_{n_{k}}||\le M^{\prime }_{0}+C+1\), we have \(||\hat {x}_{i,n_{k}+1}-h_{n_{k}}(x^{*})||\le \eta +M_{1}\). Since {Mk} is a sequence of positive numbers increasingly diverging to infinity, there exists a positive integer k1>k0 such that \(M_{\sigma _{n_{k}}}>\eta +M_{1}\) for all k≥k1. Thus, truncation does not happen at time nk+1.

Notice that (58) holds for all j with nk≤j≤sk. So, similarly to the proof above, we can show that truncation does not happen at times \(n_{k}+1,\dots,s_{k}+1\). Then we conclude \(s_{k}<\tau _{\sigma _{n_{k}}+1}\).

ii) \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \): For this case there exists a positive integer k1>k0 such that \(\sigma _{n_{k}}=\sigma \) for all k≥k1, and hence \(\tau _{\sigma _{n_{k}}+1}=\infty \) by definition. Then \(m(n_{k},T_{k})<\tau _{\sigma _{n_{k}}+1}\), hence by (59) we know \(s_{k}<\tau _{\sigma _{n_{k}}+1}\). So (60) is proven.

By (56) we see Tk∈[0,T]⊂[0,T1]. Then from (52) we know that for sufficiently large k>k1 and any Tk∈[0,T]

$$\begin{array}{*{20}l} ||\sum_{s=n_{k}}^{m(n_{k},t_{k})\land(\tau_{\sigma_{n_{k}}+1}-1)}a_{s}\tilde{\epsilon}_{s+1}||\le T_{k}\quad\forall t_{k}\in[0,T_{k}]. \end{array} $$
(61)

By setting \(t_{k}=\sum _{m=n_{k}}^{s}a_{m}\) for some s∈[nk,sk], from (59) we see \(\sum _{m=n_{k}}^{s}a_{m}\le \sum _{m=n_{k}}^{s_{k}}a_{m}\le T_{k}\). Noticing m(nk,tk)=s, from (60) we derive \(m(n_{k},t_{k})\land (\tau _{\sigma _{n_{k}}+1}-1)=s\). So by (61) we know that

$$\begin{array}{*{20}l} ||\sum_{m=n_{k}}^{s}a_{m}\tilde{\epsilon}_{m+1}||\le T_{k}\quad\forall s:n_{k}\le s\le s_{k} \end{array} $$
(62)

for sufficiently large kk1 and any Tk∈[0,T].

Now we consider the following recursive algorithm starting from nk:

$$ \begin{aligned} Z_{m+1}=(W(m)\otimes\mathbf{I}_{l})G_{m}(Z_{m})+a_{m}\Big(F_{m+1}\big(G_{m}(Z_{m})\big)+\tilde{\epsilon}_{m+1}\Big),\\ Z_{n_{k}}=\tilde{X}_{n_{k}}. \end{aligned} $$
(63)

where \(Z_{k}\triangleq \text {col}\{z_{1,k},\dots,z_{N,k}\}\). By (60) we know that (48) holds for \(m=n_{k},\dots,s_{k}-1\), for all k≥k1 and Tk∈[0,T]. Then we derive

$$\begin{array}{*{20}l} Z_{m}=\tilde{X}_{m}\quad\forall m:n_{k}\le m\le s_{k} \end{array} $$
(64)

Set \(z_{k}=\frac {\mathbf {1}^{T}\otimes \mathbf {I}_{l}}{N}Z_{k}, \widehat {\Delta }_{i,k}\triangleq z_{i,k}-\theta _{k}, \widehat {\Delta }_{k}\triangleq z_{k}-\theta _{k}\), and \(\widehat {\Lambda }_{k}\triangleq Z_{k}-\Theta _{k}\). Multiplying both sides of (63) by \(\frac {1}{N}(\mathbf {1}^{T}\otimes \mathbf {I}_{l})\), from \(\mathbf {1}^{T}W(m)=\mathbf {1}^{T}\) and \((A\otimes B)(C\otimes D)=(AC)\otimes (BD)\) we derive

$$\begin{array}{*{20}l} {}z_{s+1}=\frac{1}{N}\sum_{i=1}^{N}g_{s}(z_{i,s})+\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}a_{s}\Big(F_{s+1}\big(G_{s}(Z_{s})\big)+\tilde{\epsilon}_{s+1}\Big) \end{array} $$

and hence

$$\begin{array}{*{20}l} &z_{s+1}=\frac{1}{N}\sum_{i=1}^{N}g_{s}(z_{i,s})-\frac{1}{N}\sum_{i=1}^{N}g_{s}(\theta_{s})-\xi_{s+1}+\theta_{s+1}\\ &+\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}a_{s}F_{s+1}\big(G_{s}(Z_{s})-G_{s}(\Theta_{s})-\Xi_{s+1}+\Theta_{s+1}\big)\\ &+\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}a_{s}\tilde{\epsilon}_{s+1}\\ &=\frac{1}{N}\sum_{i=1}^{N}d_{s}(z_{i,s})+\widehat{\Delta}_{s}-\xi_{s+1}+\theta_{s+1}\\ &+\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}a_{s}F_{s+1}\big(\Theta_{s+1}+D_{s}(Z_{s})+\widehat{\Lambda}_{s}-\Xi_{s+1}\big)\\ &+\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}a_{s}\tilde{\epsilon}_{s+1}. \end{array} $$

So

$$\begin{array}{*{20}l} &||\widehat{\Delta}_{s+1}-\widehat{\Delta}_{n_{k}}||\le||\sum_{j=n_{k}}^{s}\frac{1}{N}\sum_{i=1}^{N}d_{j}(z_{i,j})||+||\sum_{j=n_{k}}^{s}\xi_{j+1}||\\ &+\big|\big|\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}\sum_{j=n_{k}}^{s}a_{j}F_{j+1}\big(\Theta_{j+1}+D_{j}(Z_{j})+\widehat{\Lambda}_{j}-\Xi_{j+1}\big)\big|\big|\\ &+||\frac{\mathbf{1}^{T}\otimes\mathbf{I}_{l}}{N}\sum_{j=n_{k}}^{s}a_{j}\tilde{\epsilon}_{j+1}||\\ &\le\frac{1}{N}\sum_{j=n_{k}}^{s}\sum_{i=1}^{N}\gamma_{j}||z_{i,j}-\theta_{j}||+||\sum_{j=n_{k}}^{s}\xi_{j+1}||\\ &+\frac{1}{N}\sum_{j=n_{k}}^{s}a_{j}\sum_{i=1}^{N}\big|\big|f_{i,j+1}\big(\theta_{j+1}+d_{j}(z_{i,j})+\widehat{\Delta}_{i,j}-\xi_{j+1}\big)\big|\big|\\ &+\frac{1}{\sqrt{N}}||\sum_{j=n_{k}}^{s}a_{j}\tilde{\epsilon}_{j+1}||\\ &\le\sum_{j=n_{k}}^{s}\gamma_{j}\cdot(M_{0}^{\prime}+C+1)+||\sum_{j=n_{k}}^{s}\xi_{j+1}||\\ &+\sum_{j=n_{k}}^{s}a_{j}\alpha\Big((1+\gamma_{j})(M^{\prime}_{0}+C+1)+||\xi_{j+1}||\Big)\\ &+\frac{1}{\sqrt{N}}T_{k}\\ &\le\Big(M^{\prime}_{0}+C+1+1+\alpha(2M^{\prime}_{0}+2C+3)+\frac{1}{\sqrt{N}}\Big)T_{k}\\ &= c_{2}T_{k},\qquad \forall s:n_{k}\le s\le s_{k}, \end{array} $$
(65)

where the second inequality comes from A6, the third inequality comes from (58), A3, and A5, the fourth inequality comes from (51) and (59), and the last inequality comes from (59).

Denote by \(Z_{\perp,s}=D_{\perp }Z_{s}\) the disagreement vector of Zs. Multiplying both sides of (63) by \(D_{\perp }\) we have

$$\begin{array}{*{20}l} &Z_{\perp,m+1}=D_{\perp}(W(m)\otimes\mathbf{I}_{l})G_{m}(Z_{m})+\\ &a_{m}D_{\perp}\Big(F_{m+1}\big(G_{m}(Z_{m})\big)+\tilde{\epsilon}_{m+1}\Big) \end{array} $$

Notice that \(D_{\perp }(W(m)\otimes \mathbf {I}_{l})=D_{\perp }(W(m)\otimes \mathbf {I}_{l})D_{\perp }\), so we have

$$\begin{array}{*{20}l} {}Z_{\perp,m+1}=&D_{\perp}(W(m)\otimes\mathbf{I}_{l})D_{\perp}Z_{m}\\&+D_{\perp}(W(m)\otimes\mathbf{I}_{l})\big(G_{m}(Z_{m})-Z_{m}\big)\\ &+a_{m}D_{\perp}\Big(F_{m+1}\big(G_{m}\big(Z_{m})\big)+\tilde{\epsilon}_{m+1}\Big) \end{array} $$

By definition we know that \(D_{\perp }G_{m}(\Theta _{m})=D_{\perp }\Theta _{m}=\mathbf {0}\), hence we have

$$\begin{array}{*{20}l} &Z_{\perp,m+1}=D_{\perp}(W(m)\otimes\mathbf{I}_{l})Z_{\perp,m}\\ &+D_{\perp}(W(m)\otimes\mathbf{I}_{l})\big(G_{m}(Z_{m})-G_{m}(\Theta_{m})-Z_{m}+\Theta_{m}\big)\\ &+a_{m}D_{\perp}\Big(F_{m+1}\big(G_{m}(Z_{m})\big)+\tilde{\epsilon}_{m+1}\Big)\\ &=D_{\perp}(W(m)\otimes\mathbf{I}_{l})Z_{\perp,m}+D_{\perp}(W(m)\otimes\mathbf{I}_{l})D_{m}(Z_{m})\\ &+a_{m}D_{\perp}\Big(F_{m+1}\big(G_{m}(Z_{m})\big)+\tilde{\epsilon}_{m+1}\Big). \end{array} $$

So inductively

$$\begin{array}{*{20}l} &Z_{\perp,s+1}=\Psi(s,n_{k})Z_{n_{k}}+\sum_{m=n_{k}}^{s}\Psi(s,m)D_{m}(Z_{m})\\ &+\sum_{m=n_{k}}^{s}\Psi(s,m+1)D_{\perp}a_{m}F_{m+1}\big(G_{m}(Z_{m})\big)\\ &+\sum_{m=n_{k}}^{s}\Psi(s,m+1)D_{\perp}a_{m}\tilde{\epsilon}_{m+1}, \end{array} $$

by (46)(47) we have

$$\begin{array}{*{20}l} {}&Z_{\perp,s+1}=\Big[(\Phi(s,n_{k})-\frac{1}{N}\mathbf{1}\mathbf{1}^{T})\otimes\mathbf{I}_{l}\Big](Z_{n_{k}}-\Theta_{n_{k}})\\ {}&+\sum_{m=n_{k}}^{s}\Big[(\Phi(s,m)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T})\otimes\mathbf{I}_{l}\Big]D_{m}(Z_{m})\\ {}&+\sum_{m=n_{k}}^{s}a_{m}\Big[(\Phi(s,m+1)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T})\otimes\mathbf{I}_{l}\Big]F_{m+1}\big(G_{m}(Z_{m})\big)\\ {}&+\sum_{m=n_{k}}^{s}a_{m}\Big[(\Phi(s,m+1)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T})\otimes\mathbf{I}_{l}\Big]\tilde{\epsilon}_{m+1}. \end{array} $$
(66)

From (9), (51), (58), (64), A3, and A6 we can derive

$$\begin{array}{*{20}l} {}&||Z_{\perp,s+1}||\le Cc\rho^{s+1-n_{k}}+\sum_{m=n_{k}}^{s}a_{m}||Z_{m}-\Theta_{m}||c\rho^{s+1-m}\\ {}&+\sum_{m=n_{k}}^{s}a_{m}\alpha(2M^{\prime}_{0}+2C+3)c\rho^{s-m+2}\\ {}&+\Big|\Big|\sum_{m=n_{k}}^{s}a_{m}\Big[\big(\Phi(s,m+1)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T}\big)\otimes\mathbf{I}_{l}\Big]\tilde{\epsilon}_{m+1}\Big|\Big|. \end{array} $$
(67)

Set \(\Gamma _{n}\triangleq \sum _{m=1}^{n}a_{m}\tilde {\epsilon }_{m+1}\). By (62) we know that \(||\Gamma _{s}-\Gamma _{n_{k}-1}||\le T_{k}\) for all \(s:n_{k}\le s\le s_{k}\). Notice

$$\begin{array}{*{20}l} &\sum_{m=n_{k}}^{s}a_{m}\big(\Phi(s,m+1)\otimes\mathbf{I}_{l}\big)\tilde{\epsilon}_{m+1}\\ &=\sum_{m=n_{k}}^{s}\big(\Phi(s,m+1)\otimes\mathbf{I}_{l}\big)(\Gamma_{m}-\Gamma_{m-1})\\ &=\sum_{m=n_{k}}^{s}\big(\Phi(s,m+1)\otimes\mathbf{I}_{l}\big)(\Gamma_{m}-\Gamma_{n_{k}-1})\\ &-\sum_{m=n_{k}}^{s}\big(\Phi(s,m+1)\otimes\mathbf{I}_{l}\big)(\Gamma_{m-1}-\Gamma_{n_{k}-1}). \end{array} $$

So, summing by parts with (9) we have

$$\begin{array}{*{20}l} &||\sum_{m=n_{k}}^{s}a_{m}(\Phi(s,m+1)\otimes\mathbf{I}_{l})\tilde{\epsilon}_{m+1}||\\ &\le||\Gamma_{s}-\Gamma_{n_{k}-1}||\\ &+\sum_{m=n_{k}}^{s-1}||\Phi(s,m+1)-\Phi(s,m+2)||\cdot||\Gamma_{m}-\Gamma_{n_{k}-1}||\\ &\le T_{k}+\sum_{m=n_{k}}^{s-1}(c\rho^{s-m}+c\rho^{s-m-1})\cdot T_{k}\\ &\le T_{k}+\frac{c(\rho+1)}{1-\rho}T_{k}, \end{array} $$

which, combined with (62), yields

$$\begin{array}{*{20}l} &\Big|\Big|\sum_{m=n_{k}}^{s}a_{m}\Big[\big(\Phi(s,m+1)-\frac{1}{N}\mathbf{1}\mathbf{1}^{T}\big)\otimes\mathbf{I}_{l}\Big]\tilde{\epsilon}_{m+1}\Big|\Big|\\ &\le \Big(2+\frac{c(\rho+1)}{1-\rho}\Big)T_{k}\quad\forall s:n_{k}\le s\le s_{k} \end{array} $$
(68)

for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T].
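
For the reader's convenience, the summation by parts used above is the standard discrete Abel transform: for matrices B<sub>m</sub> and vectors u<sub>m</sub>,

$$\begin{array}{*{20}l} \sum_{m=p}^{q}B_{m}(u_{m}-u_{m-1})=B_{q}u_{q}-B_{p}u_{p-1}-\sum_{m=p}^{q-1}(B_{m+1}-B_{m})u_{m}, \end{array} $$

applied here with \(B_{m}=\Phi(s,m+1)\otimes\mathbf{I}_{l}\) and \(u_{m}=\Gamma_{m}-\Gamma_{n_{k}-1}\); since \(u_{n_{k}-1}=0\), the boundary term \(B_{p}u_{p-1}\) vanishes and only \(\Gamma_{s}-\Gamma_{n_{k}-1}\) and the increments of Φ remain.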

Notice that \(\sum _{m=n_{k}}^{s}a_{m}\rho ^{s-m}\le \frac {1}{1-\rho }\sup _{m\ge n_{k}}a_{m}\), and that \(\sup _{m\ge n_{k}}a_{m}\xrightarrow [k\to \infty ]{}0\) since \(a_{m}\xrightarrow [m\to \infty ]{}0\). Combining this with (67) and (68), we have

$$\begin{array}{*{20}l} {}&||Z_{\perp,s+1}||\le Cc\rho+(M_{0}^{\prime}+C+1)C\frac{1}{1-\rho}\sup_{m\ge n_{k}}a_{m}\\ {}&+\alpha(2M_{0}^{\prime}+2C+3)C\frac{1}{1-\rho}\sup_{m\ge n_{k}}a_{m}\,+\,\Big(2+\frac{c(\rho+1)}{1-\rho}\Big)T_{k}\\ {}&\le Cc\rho+1+\Big(2+\frac{c(\rho+1)}{1-\rho}\Big)T_{k}\quad\forall s:n_{k}\le s\le s_{k} \end{array} $$
(69)

for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]. Notice that \(\widehat {\Lambda }_{s}=Z_{\perp,s}+(\mathbf {1}\otimes \mathbf {I}_{l})\widehat {\Delta }_{s}\) and that \(||(\mathbf{1}\otimes\mathbf{I}_{l})v||=\sqrt{N}||v||\) for any \(v\in\mathbb{R}^{l}\). We derive

$$\begin{array}{*{20}l} &||\widehat{\Lambda}_{s+1}-\widehat{\Lambda}_{n_{k}}||=||(\mathbf{1}\otimes\mathbf{I}_{l})\widehat{\Delta}_{s+1}+Z_{\perp,s+1}-\\ &Z_{\perp,n_{k}}-(\mathbf{1}\otimes\mathbf{I}_{l})\widehat{\Delta}_{n_{k}}||\\ &\le||Z_{\perp,s+1}||+||Z_{\perp,n_{k}}||+\sqrt{N}||\widehat{\Delta}_{s+1}-\widehat{\Delta}_{n_{k}}||. \end{array} $$

Since \(||Z_{\perp,n_{k}}||\le 2||\widehat {\Lambda }_{n_{k}}||\le 2C\), from (65) and (69) it follows that for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]

$$\begin{array}{*{20}l} &||\widehat{\Lambda}_{s+1}-\widehat{\Lambda}_{n_{k}}||\\ &\le Cc\rho+1+\Big(2+\frac{c(\rho+1)}{1-\rho}\Big)T_{k}+2C+\sqrt{N}c_{2}T_{k}\\ &\le C(c\rho+2)+1+(2+\frac{c(\rho+1)}{1-\rho}+c_{2}\sqrt{N})T_{k}\\ &=m^{\prime}_{0}+c_{1}T_{k}. \end{array} $$
(70)

Therefore, from (56) and (51) we know that for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]

$$\begin{array}{*{20}l} ||\widehat{\Lambda}_{s_{k}+1}||\le||\widehat{\Lambda}_{n_{k}}||+m_{0}^{\prime}+c_{1}T_{k}\le M_{0}^{\prime}+1+C. \end{array} $$
(71)

Now we look back at the recursive algorithm (19) and rewrite it in compact form as follows:

$$\begin{array}{*{20}l} \widehat{X}_{s_{k}+1}&=[W(s_{k})\otimes\mathbf{I}_{l}]G_{s_{k}}(\widetilde{X}_{s_{k}})\\&\quad+a_{s_{k}}\Big(F_{s_{k}+1}\big(G_{s_{k}}(\widetilde{X}_{s_{k}})\big)+\tilde{\epsilon}_{s_{k}+1}\Big), \end{array} $$

where \(\widehat {X}_{k}\triangleq \text {col}\{\hat {x}_{1,k},\dots,\hat {x}_{N,k}\}\). Then by (63) and (64), \(\widehat {X}_{s_{k}+1}=Z_{s_{k}+1}\). So by (71) it follows that

$$\begin{array}{*{20}l} ||\widehat{X}_{s_{k}+1}-\Theta_{s_{k}+1}||\le M^{\prime}_{0}+1+C. \end{array} $$
(72)

We now show

$$\begin{array}{*{20}l} \tilde{X}_{s_{k}+1}=\widehat{X}_{s_{k}+1},~s_{k}+1<\tau_{\sigma_{k}+1} \end{array} $$
(73)

for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]. We consider the following two cases: \({\lim }_{k\to \infty }\sigma _{k}=\infty \) and \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \).

i) \({\lim }_{k\to \infty }\sigma _{k}=\infty \): Notice \(M_{\sigma _{k}}>\eta +M_{0}^{\prime }+1+C\) when \(k\ge k_{1}\). By (20) and (21) we know that \(\tilde {X}_{s_{k}+1}=\widehat {X}_{s_{k}+1}\) and \(\sigma _{s_{k}+1}=\sigma _{s_{k}}\). So \(s_{k}+1<\tau _{\sigma _{n_{k}}+1}\) by (60).

ii) \({\lim }_{k\to \infty }\sigma _{k}=\sigma <\infty \): For this case \(\tau _{\sigma _{n_{k}}+1}=\infty \) for all \(k\ge k_{1}\). By (60) we see \(s_{k}+1<\tau _{\sigma _{n_{k}}+1}\). Then by \(\sigma _{n_{k}}=\sigma \) we conclude \(\sigma _{s_{k}+1}=\sigma _{s_{k}}=\sigma \), and hence by (20) we derive \(\tilde {X}_{s_{k}+1}=\widehat {X}_{s_{k}+1}\). Thus (73) holds.

From (73) we know that (48) holds for m=sk for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]. From \(\widehat {X}_{s_{k}+1}=Z_{s_{k}+1}\) and (73) we see \(\tilde {X}_{s_{k}+1}=Z_{s_{k}+1}\). It follows that for sufficiently large \(k\ge k_{1}\) and any Tk∈[0,T]

$$\begin{array}{*{20}l} ||\widetilde{\Lambda}_{s_{k}+1}-\widetilde{\Lambda}_{n_{k}}||\le M_{0}^{\prime}+c_{1}T_{k}, \end{array} $$

which contradicts the definition of sk. Thus (59) does not hold, so sk>m(nk,Tk) and hence (49) holds.

Since sk>m(nk,Tk), we know \(\{\widetilde {\Lambda }_{s}:n_{k}\le s\le m(n_{k},T_{k})+1\}\) is bounded. Similarly to the proof of (60), it can be shown that \(m(n_{k},T_{k})<\tau _{\sigma _{k}+1}\). So (48) holds for \(m=n_{k},\dots,m(n_{k},T_{k})\). Similarly to (65) we can prove

$$\begin{array}{*{20}l} ||\widetilde{\Delta}_{s+1}-\widetilde{\Delta}_{n_{k}}||\le c_{2}T \end{array} $$

for sufficiently large k and any Tk∈[0,T]. Hence, (50) holds.

In conclusion, the proof of Lemma 6 is complete. □

By multiplying both sides of (48) with \(\frac {1}{N}(\mathbf {1}^{T}\otimes \mathbf {I}_{l})\), we have

$$\begin{array}{*{20}l} &\tilde{x}_{m+1}=\frac{1}{N}\sum_{i=1}^{N}g_{m}(\tilde{x}_{i,m})+a_{m}\frac{1}{N}\sum_{i=1}^{N}f_{i,m+1}\big(g_{m}(\tilde{x}_{i,m})\big)\\ &+a_{m}\frac{1}{N}\sum_{i=1}^{N}\tilde{\epsilon}_{i,m+1}\\ &=g_{m}(\tilde{x}_{m})+a_{m}f_{m+1}\big(g_{m}(\tilde{x}_{m})\big)\\ &+a_{m}\frac{1}{N}\frac{1}{a_{m}}\sum_{i=1}^{N}\Big(g_{m}(\tilde{x}_{i,m})-g_{m}(\tilde{x}_{m})\Big)\\ &+a_{m}\frac{1}{N}\sum_{i=1}^{N}\Big(f_{i,m+1}\big(g_{m}(\tilde{x}_{i,m})\big)-f_{i,m+1}\big(g_{m}(\tilde{x}_{m})\big)\Big)\\ &+a_{m}\frac{1}{N}\sum_{i=1}^{N}\tilde{\epsilon}_{i,m+1}. \end{array} $$
(74)

Setting

$$\begin{array}{*{20}l} \zeta^{(1)}_{k+1}=\frac{1}{N}\frac{1}{a_{k}}\sum_{i=1}^{N}\Big(g_{k}(\tilde{x}_{i,k})-g_{k}(\tilde{x}_{k})\Big),\\ \zeta^{(2)}_{k+1}=\frac{1}{N}\sum_{i=1}^{N}\Big(f_{i,k+1}\big(g_{k}(\tilde{x}_{i,k})\big)-f_{i,k+1}\big(g_{k}(\tilde{x}_{k})\big)\Big),\\ \zeta^{(3)}_{k+1}=\frac{1}{N}\sum_{i=1}^{N}\tilde{\epsilon}_{i,k+1},\\ \zeta_{k+1}=\zeta^{(1)}_{k+1}+\zeta^{(2)}_{k+1}+\zeta^{(3)}_{k+1} \end{array} $$

we can rewrite (74) as

$$\begin{array}{*{20}l} \tilde{x}_{m+1}=g_{m}(\tilde{x}_{m})+a_{m}f_{m+1}\big(g_{m}(\tilde{x}_{m})\big)+a_{m}\zeta_{m+1}. \end{array} $$
(75)

The following lemma gives the noise property of the sequence {ζk+1}.

Lemma 7

Assume that all the conditions in Lemma 6 hold and that \(\{\widetilde {\Lambda }_{n_{k}}\}\) is a convergent subsequence with limit \(\widetilde {\Lambda }\) at the considered sample path ω. Then for this ω

$$\begin{array}{*{20}l} {\lim}_{T\to0}\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta_{s+1}||=0\quad T_{k}\in[0,T]. \end{array} $$
(76)

Proof

In the proof of Lemma 6 it has been pointed out that there exists a T∈(0,1) such that \(m(n_{k},T)<\tau _{\sigma _{n_{k}}+1}\) for sufficiently large k. So \(||\sum _{s=n_{k}}^{m(n_{k},T_{k})\land (\tau _{\sigma _{n_{k}}+1}-1)}a_{s}\tilde {\epsilon }_{s+1}||=||\sum _{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\tilde {\epsilon }_{s+1}||\). Thus, by Lemma 5 we can immediately derive that

$$\begin{array}{*{20}l} {\lim}_{T\to0}\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(3)}_{s+1}||=0\quad T_{k}\in[0,T]. \end{array} $$

Now we need to show that \(\zeta ^{(i)}_{k+1}\) also satisfies the property above for i=1,2. First we consider \(\zeta ^{(1)}_{k+1}\). We see that

$$\begin{array}{*{20}l} &||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(1)}_{s+1}||\\ &=||\sum_{s=n_{k}}^{m(n_{k},T_{k})}\frac{1}{N}\sum_{i=1}^{N}\big(g_{s}(\tilde{x}_{i,s})-g_{s}(\tilde{x}_{s})\big)||\\ &=||\frac{1}{N}\sum_{s=n_{k}}^{m(n_{k},T_{k})}\sum_{i=1}^{N}\big(g_{s}(\tilde{x}_{i,s})-g_{s}(\theta_{s})-g_{s}(\tilde{x}_{s})+g_{s}(\theta_{s})\big)||\\ &=\frac{1}{N}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}\sum_{i=1}^{N}\big(d_{s}(\tilde{x}_{i,s})+\widetilde{\Delta}_{i,s}-d_{s}(\tilde{x}_{s})-\widetilde{\Delta}_{s}\big)||\\ &=\frac{1}{N}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}\sum_{i=1}^{N}\big(d_{s}(\tilde{x}_{i,s})-d_{s}(\tilde{x}_{s})\big)||\\ &\le\frac{1}{N}\sum_{s=n_{k}}^{m(n_{k},T_{k})}\sum_{i=1}^{N}||d_{s}(\tilde{x}_{i,s})||+\sum_{s=n_{k}}^{m(n_{k},T_{k})}||d_{s}(\tilde{x}_{s})||\\ &\le\frac{1}{N}\sum_{s=n_{k}}^{m(n_{k},T_{k})}\sum_{i=1}^{N}\gamma_{s}||\widetilde{\Delta}_{i,s}||+\sum_{s=n_{k}}^{m(n_{k},T_{k})}\gamma_{s}||\widetilde{\Delta}_{s}||, \end{array} $$
(77)

where the last inequality comes from A6.

Since \({\lim }_{k\to \infty }\widetilde {\Lambda }_{n_{k}}=\widetilde {\Lambda }\), by setting \(\widetilde {\Delta }\triangleq \frac {\mathbf {1}^{T}\otimes \mathbf {I}_{l}}{N}\widetilde {\Lambda }\) we see that \({\lim }_{k\to \infty }\widetilde {\Delta }_{n_{k}}=\widetilde {\Delta }\). So by (49) and (50) we conclude that \(\{||\widetilde {\Delta }_{i,s}||,s:n_{k}\le s\le m(n_{k},T_{k})\}\) and \(\{||\widetilde {\Delta }_{s}||,s:n_{k}\le s\le m(n_{k},T_{k})\}\) are bounded. Without loss of generality we denote the common bound of these two sequences by A. Then from (77) with A6 it follows that

$$\begin{array}{*{20}l} ||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(1)}_{s+1}||\le 2A\sum_{s=n_{k}}^{m(n_{k},T_{k})}\gamma_{s}\xrightarrow[k\to\infty]{}0. \end{array} $$

So we conclude

$$\begin{array}{*{20}l} {\lim}_{T\to0}\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(1)}_{s+1}||=0\quad T_{k}\in[0,T]. \end{array} $$

Finally, we consider the case i=2. Notice

$$\begin{array}{*{20}l} &||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(2)}_{s+1}||\\ &=\Big|\Big|\sum_{s=n_{k}}^{m(n_{k},T_{k})}\frac{1}{N}\sum_{i=1}^{N}a_{s}\Big(f_{i,s+1}\big(g_{s}(\tilde{x}_{i,s})\big)-f_{i,s+1}\big(g_{s}(\tilde{x}_{s})\big)\Big)\Big|\Big|. \end{array} $$

Similarly to the proof of Lemma 6, there exist constants c3,c4,c5>0 such that for sufficiently large k

$$\begin{array}{*{20}l} ||\tilde{X}_{\perp,s+1}||\le c_{3}\rho^{s+1-n_{k}}+c_{4}\sup_{m\ge n_{k}}a_{m}+c_{5}T \end{array} $$
(78)

holds for all \(s:n_{k}\le s\le m(n_{k},T)\). From A1, A6, and A7 we can assume that for sufficiently large k

$$\begin{array}{*{20}l} a_{k}<1,~\gamma_{k}\le a_{k},~ ||\xi_{k+1}||\le a_{k}. \end{array} $$
(79)

Since 0<ρ<1, there exists a positive integer m′ such that \(\rho ^{m^{\prime }}<T\). Then \(\sum _{m=n_{k}}^{n_{k}+m^{\prime }}a_{m}\xrightarrow [k\to \infty ]{}0\). Thus, we have nk+m′<m(nk,T) for sufficiently large k. So from (78) it follows that

$$\begin{array}{*{20}l} &||\tilde{X}_{\perp,s+1}||\le \text{o}(1)+(c_{3}+c_{5})T\\ &\forall s:n_{k}+m^{\prime}\le s\le m(n_{k},T). \end{array} $$
(80)

where o(1)→0 as k. Hence for nk+m<m(nk,T) and sufficiently large k

$$\begin{array}{*{20}l} ||\tilde{x}_{i,s}-\tilde{x}_{s}||\le\text{o}(1)+\delta(T) \end{array} $$

where δ(T)→0 as T→0.

So

$$\begin{array}{*{20}l} &||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(2)}_{s+1}||\\ &\le\Big|\Big|\sum_{s=n_{k}}^{n_{k}+m^{\prime}}\frac{1}{N}\sum_{i=1}^{N}a_{s}\Big(f_{i,s+1}\big(g_{s}(\tilde{x}_{i,s})\big)-f_{i,s+1}\big(g_{s}(\tilde{x}_{s})\big)\Big)\Big|\Big|\\ &+\Big|\Big|\sum_{s=n_{k}+m^{\prime}}^{m(n_{k},T)}\frac{1}{N}\sum_{i=1}^{N}a_{s}\Big(f_{i,s+1}\big(g_{s}(\tilde{x}_{i,s})\big)-f_{i,s+1}\big(g_{s}(\tilde{x}_{s})\big)\Big)\Big|\Big|\\ &\le\sum_{s=n_{k}}^{n_{k}+m^{\prime}}\frac{1}{N}\sum_{i=1}^{N}a_{s}\Big(\Big|\Big|f_{i,s+1}\big(g_{s}(\tilde{x}_{i,s})\big)\Big|\Big|+\Big|\Big|f_{i,s+1}\big(g_{s}(\tilde{x}_{s})\big)\Big|\Big|\Big)\\ &+\sum_{s=n_{k}+m^{\prime}}^{m(n_{k},T)}a_{s}\big(\text{o}(1)+\delta(T)\big)\\ &\le\sum_{s=n_{k}}^{n_{k}+m^{\prime}}a_{s}\alpha(2A+1)+T\big(\text{o}(1)+\delta(T)\big)\\ &\le\alpha(2A+1)\cdot m^{\prime}\sup_{m\ge n_{k}}a_{m}+T\big(\text{o}(1)+\delta(T)\big). \end{array} $$

And hence \(\limsup _{k\to \infty }\frac {1}{T}||\sum _{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta ^{(2)}_{s+1}||\le \delta (T)\), which implies that

$$\begin{array}{*{20}l} {\lim}_{T\to0}\limsup_{k\to\infty}\frac{1}{T}||\sum_{s=n_{k}}^{m(n_{k},T_{k})}a_{s}\zeta^{(2)}_{s+1}||=0\quad T_{k}\in[0,T]. \end{array} $$

This completes the proof. □

Lemma 8

Assume A1–A4 and A6 hold, and that A5 and A7 hold at the sample path ω under consideration. Then any nonempty interval [δ1,δ2] with 0∉[δ1,δ2] cannot be crossed by \(\{v(\widetilde {\Delta }_{m_{k}}),\dots,v(\widetilde {\Delta }_{l_{k}})\}\) infinitely many times with \(\{\widetilde {\Lambda }_{m_{k}}\}\) bounded. By "[δ1,δ2] being crossed by \(\{v(\widetilde {\Delta }_{m_{k}}),\dots,v(\widetilde {\Delta }_{l_{k}})\}\)" we mean that \(v(\widetilde {\Delta }_{m_{k}})\le \delta _{1}\), \(v(\widetilde {\Delta }_{l_{k}})\ge \delta _{2}\), and \(\delta _{1}<v(\widetilde {\Delta }_{s})<\delta _{2}\) for all \(s:m_{k}<s<l_{k}\).

Proof

Assume the converse: for some interval [δ1,δ2] with 0∉[δ1,δ2], there are infinitely many crossings \(\{v(\widetilde {\Delta }_{m_{k}}),\dots,v(\widetilde {\Delta }_{l_{k}})\}\) with \(\{\widetilde {\Lambda }_{m_{k}}\}\) bounded.

By the boundedness of \(\{\widetilde {\Lambda }_{m_{k}}\}\) we can extract a convergent subsequence, still denoted by \(\{\widetilde {\Lambda }_{m_{k}}\}\), with limit \({\lim }_{k\to \infty }\widetilde {\Lambda }_{m_{k}}=\widetilde {\Lambda }\). So \({\lim }_{k\to \infty }\widetilde {\Delta }_{m_{k}}=\widetilde {\Delta }\) with \(\widetilde {\Delta }\triangleq \frac {\mathbf {1}^{T}\otimes \mathbf {I}_{l}}{N}\widetilde {\Lambda }\). By (50), letting T→0 we have \(||\widetilde {\Delta }_{m_{k}+1}-\widetilde {\Delta }_{m_{k}}||\le c_{2}T\to 0\). Since by the definition of crossing \(v(\widetilde {\Delta }_{m_{k}})\le \delta _{1}<v(\widetilde {\Delta }_{m_{k}+1})\), we obtain

$$\begin{array}{*{20}l} v(\widetilde{\Delta}_{m_{k}})\xrightarrow[k\to\infty,T\to0]{}\delta_{1}=v(\widetilde{\Delta})>0. \end{array} $$
(81)

So by the assumption v(x)=0⇔x=0 we know there exists a constant β>0 such that \(||\widetilde {\Delta }||>\beta \). Hence by (50) we conclude

$$\begin{array}{*{20}l} ||\widetilde{\Delta}_{j}||>\frac{\beta}{2},\quad j:m_{k}\le j\le m(m_{k},T)+1 \end{array} $$
(82)

for sufficiently small T>0 and large k.

Let \(\widetilde {\Delta }_{k}\) be a vector in between \(\widetilde {\Delta }_{m_{k}}\) and \(\widetilde {\Delta }_{m(m_{k},T)}\). From (50) it follows that \(||\widetilde {\Delta }_{k}||\le c_{2}T+||\widetilde {\Delta }||+1\) for sufficiently large k. We consider the following Taylor expansion:

$$\begin{array}{*{20}l} {}&v(\widetilde{\Delta}_{m(m_{k},T)})-v(\widetilde{\Delta}_{m_{k}})\\ {}&=v_{x}^{T}(\widetilde{\Delta}_{k})\sum_{j=m_{k}}^{m(m_{k},T)-1}\Big(d_{j}(\tilde{x}_{j})-\xi_{j+1}+a_{j}\big(f_{j+1}(g_{j}(\tilde{x}_{j}))+\zeta_{j+1}\big)\Big)\\ {}&=v_{x}^{T}(\widetilde{\Delta}_{k})\Big\{\sum_{j=m_{k}}^{m(m_{k},T)-1}\big(d_{j}(\tilde{x}_{j})-\xi_{j+1}\big)+\sum_{j=m_{k}}^{m(m_{k},T)-1}a_{j}\zeta_{j+1}\Big\}\\ {}&+\sum_{j=m_{k}}^{m(m_{k},T)-1}a_{j}\big(v_{x}^{T}(\widetilde{\Delta}_{k})-v_{x}^{T}(g_{j}(\tilde{x}_{j})-\theta_{j+1})\big)f_{j+1}\big(g_{j}(\tilde{x}_{j})\big)\\ {}&+\sum_{j=m_{k}}^{m(m_{k},T)-1}a_{j}v_{x}^{T}(g_{j}(\tilde{x}_{j})-\theta_{j+1})f_{j+1}\big(g_{j}(\tilde{x}_{j})\big). \end{array} $$
(83)

Similarly to (51), for sufficiently large k, by A3, A6, and Lemma 7 there exists a constant c′>0 such that

$$\begin{array}{*{20}l} &\Big|\Big|a_{j}\Big(f_{j+1}\big(g_{j}(\tilde{x}_{j})\big)+\zeta_{j+1}\Big)\Big|\Big|\\ &\le\Big|\Big|a_{j}f_{j+1}\big(\theta_{j+1}+d_{j}(\tilde{x}_{j})-\xi_{j+1}+\widetilde{\Delta}_{j}\big)\Big|\Big|\\ &+\Big|\Big|\sum_{l=m_{k}}^{j}a_{l}\zeta_{l+1}-\sum_{l=m_{k}}^{j-1}a_{l}\zeta_{l+1}\Big|\Big|\\ &\le a_{j}\alpha\Big(2||\widetilde{\Delta}||+1\Big)+2c^{\prime}T<\frac{\beta}{4} \end{array} $$
(84)

for sufficiently large k and sufficiently small T, where j:mkjm(mk,T)−1. It follows that

$$\begin{array}{*{20}l} {}||g_{j}(\tilde{x}_{j})-\theta_{j+1}||&=||\widetilde{\Delta}_{j+1}+g_{j}(\tilde{x}_{j})-\tilde{x}_{j+1}||\\ &\ge\frac{\beta}{2}-\frac{\beta}{4}=\frac{\beta}{4}, \end{array} $$
(85)
$$\begin{array}{*{20}l} {}||g_{j}(\tilde{x}_{j})-\theta_{j+1}||&=||\tilde{x}_{j+1}-\theta_{j+1}-a_{j}\{f_{j+1}(g_{j}(\tilde{x}_{j}))+\zeta_{j+1}\}||\\ &<\frac{\beta}{4}+||\widetilde{\Delta}_{j+1}||\le\frac{\beta}{4}+c_{2}T+||\widetilde{\Delta}||+1. \end{array} $$
(86)

Identifying r1 and r2 in A2 with \(\frac {\beta }{4}\) and \(\frac {\beta }{4}+c_{2}T+||\widetilde {\Delta }||+1\), respectively, we can find a>0 such that for all \(j:m_{k}\le j\le m(m_{k},T)\)

$$\begin{array}{*{20}l} v_{x}^{T}(g_{j}(\tilde{x}_{j})-\theta_{j+1})f_{j+1}(g_{j}(\tilde{x}_{j}))<-a. \end{array} $$
(87)

Noticing that for all \(j:m_{k}\le j\le m(m_{k},T)\), \(||d_{j}(\tilde {x}_{j})||\le \gamma _{j}||\widetilde {\Delta }_{j}||\le \gamma _{j}(c_{2}T+||\widetilde {\Delta }||+1)\), by A6 and A7 we have

$$\begin{array}{*{20}l} {\lim}_{k\to\infty}\sum_{j=m_{k}}^{m(m_{k},T)-1}(d_{j}(\tilde{x}_{j})-\xi_{j+1})=0. \end{array} $$
(88)

By Lemma 7 it follows

$$\begin{array}{*{20}l} \limsup_{k\to\infty}||\sum_{j=m_{k}}^{m(m_{k},T)-1}a_{j}\zeta_{j+1}||\le\delta(T). \end{array} $$
(89)

Notice that for all \(j:m_{k}\le j\le m(m_{k},T)\)

$$\begin{array}{*{20}l} {}&||\widetilde{\Delta}_{k}-(g_{j}(\tilde{x}_{j})-\theta_{j+1})||\\ {}&\le||\widetilde{\Delta}_{k}-\widetilde{\Delta}_{m_{k}}||+||\widetilde{\Delta}_{j}-\widetilde{\Delta}_{m_{k}}||+||g_{j}(\tilde{x}_{j})-\theta_{j+1}-\widetilde{\Delta}_{j}||\\ {}&\le 2c_{1}T+||d_{j}(\tilde{x}_{j})-\xi_{j+1}||\\ {}&\le 2c_{1}T+\gamma_{j}(c_{2}T+||\widetilde{\Delta}||+1)+||\xi_{j+1}||\xrightarrow[k\to\infty,T\to 0]{}0. \end{array} $$
(90)

So by the continuity of vx(·) we know

$$\begin{array}{*{20}l} v_{x}^{T}\Big(\widetilde{\Delta}_{k}\Big)-v_{x}^{T}\Big(g_{j}(\tilde{x}_{j})-\theta_{j+1}\Big)\xrightarrow[k\to\infty,T\to 0]{}0. \end{array} $$
(91)

From A3, A6, and (51), as already used above, we have

$$\begin{array}{*{20}l} \Big|\Big|f_{j+1}\Big(g_{j}(\tilde{x}_{j})\Big)\Big|\Big|\le\alpha\Big(2||\widetilde{\Delta}||+1\Big). \end{array} $$
(92)

So, from (88), (89), and (92) we can conclude that the first and second terms of (83) are o(T) as k→∞ and T→0. Combining this with (87), it follows from (83) that for sufficiently large k and sufficiently small T

$$\begin{array}{*{20}l} v(\widetilde{\Delta}_{m(m_{k},T)})-v(\widetilde{\Delta}_{m_{k}})\le-\frac{a}{2}T. \end{array} $$
(93)

Letting k→∞, we have

$$\begin{array}{*{20}l} \limsup_{k\to\infty}v(\widetilde{\Delta}_{m(m_{k},T)})\le\delta_{1}-\frac{a}{2}T. \end{array} $$
(94)

Notice that by Lemma 6 we have

$$\begin{array}{*{20}l} {\lim}_{T\to0}\max_{m_{k}\le m\le m(m_{k},T)}\big|v(\widetilde{\Delta}_{m})-v(\widetilde{\Delta}_{m_{k}})\big|=0, \end{array} $$

which implies that m(mk,T)<lk for sufficiently small T. Therefore, \(v(\widetilde {\Delta }_{m(m_{k},T)})\in [\delta _{1},\delta _{2}]\), which contradicts (94). So the converse assumption is not true, and the proof is completed. □

Lemma 9

Assume all the assumptions required by Lemma 8 hold. Then there exists a positive integer σ such that

$$\begin{array}{*{20}l} {\lim}_{k\to\infty}\sigma_{k}=\sigma<\infty. \end{array} $$
(95)

Proof

Assume the converse:

$$\begin{array}{*{20}l} {\lim}_{k\to\infty}\sigma_{k}=\infty. \end{array} $$
(96)

Then there exists a sequence of integers {nk}k≥0 such that \(\sigma _{n_{k}}=k\) and \(\sigma _{n_{k}-1}=k-1\). By the algorithm (3)–(7) we know that \(\tilde {x}_{i,n_{k}}=h_{n_{k}-1}(x^{*})~\forall i\in \mathcal {V}\). Therefore, from Lemma 1 we know that \(\{\widetilde {\Lambda }_{n_{k}}\}\) is a bounded sequence and hence contains a convergent subsequence. For the sake of convenience, we denote the convergent subsequence still by \(\{\widetilde {\Lambda }_{n_{k}}\}\) with limit \(\widetilde {\Lambda }\).

Since {Mk}k≥0 is a sequence of positive numbers increasingly diverging to infinity, there exists a positive integer k0 such that

$$\begin{array}{*{20}l} M_{k}\ge 2\sqrt{N}r+2+M_{1}^{\prime}~~\forall k\ge k_{0} \end{array} $$
(97)

where r is given in A2 and

$$\begin{array}{*{20}l} M_{1}^{\prime}=2+(2\sqrt{N}r+2)(c\rho+2). \end{array} $$
(98)

Now we show that, under the converse assumption, \(\{\tilde {\Delta }_{n_{k}}\}\) exits the ball with radius r infinitely many times. Define

$$\begin{array}{*{20}l} &m_{k}\triangleq\inf\{s>n_{k}:\|\widetilde{\Lambda}_{s}\|\ge 2\sqrt{N}r+2+M_{1}^{\prime}\}, \end{array} $$
(99)
$$\begin{array}{*{20}l} &l_{k}\triangleq\sup\{s<m_{k}:\|\widetilde{\Lambda}_{s}\|\le 2\sqrt{N}r+2\}. \end{array} $$
(100)

Noticing that \(\|\widetilde {\Lambda }_{n_{k}}\|\le \sqrt {N}\eta \) by Lemma 1 and that r>η from A2, we derive \(\|\widetilde {\Lambda }_{n_{k}}\|<\sqrt {N}r\). Hence from (99) and (100) we have nk<lk<mk. By the definition of lk we know that \(\{\widetilde {\Lambda }_{l_{k}}\}\) is bounded, so there exists a convergent subsequence denoted still by \(\{\widetilde {\Lambda }_{l_{k}}\}\).

By Lemma 6 there exist constants \(M_{0}^{\prime }>0\) defined by (53) with \(C=2\sqrt {N}r+2, c_{1}>0\) defined by (54), c2>0 defined by (55), and 0<T<1 with c1T<1 such that

$$\begin{array}{*{20}l} \|\widetilde{\Lambda}_{m+1}-\widetilde{\Lambda}_{l_{k}}\|\le c_{1}T+M_{0}^{\prime}~~\forall m:l_{k}\le m\le m(l_{k},T) \end{array} $$

for sufficiently large \(k\ge k_{0}\). Then for sufficiently large \(k\ge k_{0}\) we have

$$\begin{array}{*{20}l} \|\widetilde{\Lambda}_{m+1}\|\le&\|\widetilde{\Lambda}_{l_{k}}\|+c_{1}T+M_{0}^{\prime}\\ \le&2\sqrt{N}r+2+1+1+(2\sqrt{N}r+2)(c\rho+2)\\ =&2\sqrt{N}r+2+M_{1}^{\prime}~~\forall m:l_{k}\le m\le m(l_{k},T). \end{array} $$
(101)

Then \(m(l_{k},T)<n_{k+1}\) for sufficiently large \(k\ge k_{0}\) by (97) and the definition of nk.

From (101) and the definition (99) of mk, we conclude \(m(l_{k},T)+1<m_{k}\) for sufficiently large \(k\ge k_{0}\). Then by (99) and (100) we know that for sufficiently large \(k\ge k_{0}\)

$$\begin{array}{*{20}l} 2\sqrt{N}r+2<\|\widetilde{\Lambda}_{m+1}\|\le2\sqrt{N}r+2+M_{1}^{\prime} \end{array} $$
(102)

holds for m:lkmm(lk,T).

Since 0<ρ<1, there exists a positive integer m0 such that \(4c\rho ^{m_{0}}<1\). Then \(\sum _{m=l_{k}}^{l_{k}+m_{0}}a_{m}\xrightarrow [k\to \infty ]{}0\) by A1, and hence \(l_{k}+m_{0}<m(l_{k},T)<n_{k+1}\) for sufficiently large \(k\ge k_{0}\). So, from (102) it can be seen that for sufficiently large \(k\ge k_{0}\) we have

$$\begin{array}{*{20}l} \|\widetilde{\Lambda}_{l_{k}+m_{0}}\|>2\sqrt{N}r+2 \end{array} $$
(103)

Noticing that \(\{\widetilde {\Lambda }_{m+1}:l_{k}\le m\le m(l_{k},T)\}\) is bounded, similarly to (67) and (69) we know that for sufficiently large \(k\ge k_{0}\)

$$\begin{array}{*{20}l} &\|\tilde{X}_{\bot,m+1}\|\le(2\sqrt{N}r+2)c\rho^{m+1-l_{k}}\\&\quad+(M_{0}^{\prime}+C+1)C\frac{1}{1-\rho}\sup_{m\ge n_{k}}a_{m}\\ &+\alpha(2M_{0}^{\prime}+2C+3)C\frac{1}{1-\rho}\sup_{m\ge n_{k}}a_{m}\\&\quad+\Big(2+\frac{c(\rho+1)}{1-\rho}\Big)T. \end{array} $$

Since \(4c\rho ^{m_{0}}<1\), \(c_{1}T<1\), and \(a_{k}\xrightarrow [k\to \infty ]{}0\), it follows that

$$\begin{array}{*{20}l} &\|\tilde{X}_{\bot,l_{k}+m_{0}}\|\le\frac{1}{2}(\sqrt{N}r+1)+\frac{1}{2}+1=\frac{\sqrt{N}r}{2}+2 \end{array} $$
(104)

for sufficiently large \(k\ge k_{0}\). By noticing \((\mathbf {1}\otimes \mathbf {I}_{l})\tilde {\Delta }_{l_{k}+m_{0}}=\widetilde {\Lambda }_{l_{k}+m_{0}}-\tilde {X}_{\bot,l_{k}+m_{0}}\), from (103) and (104) we conclude that

$$\begin{array}{*{20}l} &\sqrt{N}\|\tilde{\Delta}_{l_{k}+m_{0}}\|=\|\widetilde{\Lambda}_{l_{k}+m_{0}}-\tilde{X}_{\bot,l_{k}+m_{0}}\|\\ &\ge\|\widetilde{\Lambda}_{l_{k}+m_{0}}\|-\|\tilde{X}_{\bot,l_{k}+m_{0}}\|>\frac{3}{2}\sqrt{N}r. \end{array} $$
(105)

Therefore, \(\|\tilde {\Delta }_{l_{k}+m_{0}}\|>r\). This proves that \(\{\tilde {\Delta }_{n_{k}}\}\) exits the ball with radius r infinitely many times.

Since \(\{\widetilde {\Lambda }_{n_{k}}\}\) is convergent, we know that \(\{\widetilde {\Delta }_{n_{k}}\}\) is convergent, and from Lemma 1 it follows that \(||\widetilde {\Delta }_{n_{k}}||\le \eta \). Moreover, as shown above, \(\{\tilde {\Delta }_{n_{k}}\}\) exits the ball with radius r infinitely many times. Therefore, there exists an interval \([\delta _{1},\delta _{2}]\subset (\sup _{||y||\le \eta }v(y),\inf _{||x||=r}v(x))\) with 0∉[δ1,δ2] such that for any k there is a sequence \(\tilde {\Delta }_{s_{k}},\dots,\tilde {\Delta }_{t_{k}}\) with \(n_{k}\le s_{k}\), \(v(\tilde {\Delta }_{s_{k}})\le \delta _{1}\), \(\delta _{1}<v(\tilde {\Delta }_{j})<\delta _{2}\) for all \(j:s_{k}<j<t_{k}\), and \(v(\tilde {\Delta }_{t_{k}})\ge \delta _{2}\). In other words, the values of v(·) at the sequences \(\{\tilde {\Delta }_{s_{k}},\dots,\tilde {\Delta }_{t_{k}}\}\) cross the interval [δ1,δ2] infinitely many times with \(\{\widetilde {\Lambda }_{s_{k}}\}\) bounded. This contradicts Lemma 8, so the converse assumption is false and \({\lim }_{k\to \infty }\sigma _{k}<\infty \) indeed holds. □

Lemma 10

Assume all the assumptions required by Lemma 9 hold. Then

$$\begin{array}{*{20}l} {\lim}_{k\to\infty}\sigma_{i,k}=\sigma<\infty\quad\forall i\in\mathcal{V}. \end{array} $$
(106)

Proof

From Lemma 9 it follows that

$$\begin{array}{*{20}l} \sigma_{i,k}\le\sigma\quad\forall i\in\mathcal{V}. \end{array} $$

By Lemma 4 we know \(\tilde {\tau }_{i,\sigma }=\tau _{i,\sigma }\le BD+\tau _{\sigma }\). So, by definition we know that \(\sigma _{i,k}\ge \sigma \) for all \(k\ge BD+\tau _{\sigma }\).

In conclusion, we have \(\sigma _{i,k}=\sigma \quad \forall k\ge BD+\tau _{\sigma }\quad \forall i\in \mathcal {V}\). The proof is completed. □

By the definition of the auxiliary sequence \(\{\tilde {x}_{i,k}\}\), Lemma 10 indicates that \(\{\tilde {x}_{i,k}\}\) and {xi,k} coincide after a finite number of steps.

Proof of Theorem 1: By (95) and (106), there exists a positive integer σ depending on ω such that

$$\begin{array}{*{20}l} \hat{\sigma}_{i,k}=\sigma_{i,k}=\sigma\quad\forall k\ge k_{0}\triangleq BD+\tau_{\sigma}\quad\forall i\in\mathcal{V}, \end{array} $$
(107)

and hence by (3)

$$\begin{array}{*{20}l} {}x^{\prime}_{i,k+1}\,=\,\sum_{j\in N_{i}(k)}w_{ij}(k)g_{j}(x_{j,k})+a_{k}O_{i,k+1}\quad\forall k\ge k_{0}\quad\forall i\in\mathcal{V}, \end{array} $$
(108)

by (5) \(||x^{\prime }_{i,k+1}-h_{k}(x^{*})||\le M_{\sigma }\) and by (4) \(x_{i,k+1}=x^{\prime }_{i,k+1}\) for any kk0 and any \(i\in \mathcal {V}\). So, we have proved the assertion i).

Multiplying (11) by D⊥ from the left, we derive

$$\begin{array}{*{20}l} {}X_{\bot,k+1}&=D_{\bot}(W(k)\otimes\mathbf{I}_{l})G_{k}(X_{k})\\ &\quad+a_{k}D_{\bot}\Big(F_{k+1}\big(G_{k}(X_{k})\big)+\epsilon_{k+1}\Big)\\ &=D_{\bot}(W(k)\otimes\mathbf{I}_{l})D_{\bot}X_{k}\\ &\quad+D_{\bot}\big(W(k)\otimes\mathbf{I}_{l}\big)\big(G_{k}(X_{k})-X_{k}\big)\\ &\quad+a_{k}D_{\bot}\Big(F_{k+1}\big(G_{k}(X_{k})\big)+\epsilon_{k+1}\Big)\\ &=D_{\bot}(W(k)\otimes\mathbf{I}_{l})X_{\bot,k}\\ &\quad+D_{\bot}\big(W(k)\otimes\mathbf{I}_{l}\big)\big(G_{k}(X_{k})-G_{k}(\Theta_{k})-X_{k}\,+\,\Theta_{k}\big)\\ &\quad+a_{k}D_{\bot}\big(F_{k+1}(G_{k}(X_{k}))+\epsilon_{k+1}\big)\\ &=D_{\bot}(W(k)\otimes\mathbf{I}_{l})X_{\bot,k}\\ &\quad+D_{\bot}(W(k)\otimes\mathbf{I}_{l})D_{k}(X_{k})\\ &\quad+a_{k}D_{\bot}\big(F_{k+1}(G_{k}(X_{k}))+\epsilon_{k+1}\big), \end{array} $$

where the second equality comes from the fact that D⊥(W(k)⊗Il)=D⊥(W(k)⊗Il)D⊥ and the third from D⊥Gk(Θk)=D⊥Θk=0. So, for any k≥k0, by induction we have

$$\begin{array}{*{20}l} {}X_{\bot,k+1}&=\Psi(k,k_{0})(X_{k_{0}}-\Theta_{k_{0}})+\sum_{m=k_{0}}^{k}\Psi(k,m)D_{m}(X_{m})\\ {}&+\sum_{m=k_{0}}^{k}a_{m}\Psi(k,m\,+\,1)D_{\bot}\big(F_{m+1}(G_{m}(X_{m}))\,+\,\epsilon_{m+1}\big). \end{array} $$

From Lemma 10 we know the number of truncations is finite. So, \(\{x_{i,k}-h_{k-1}(x^{*})\}_{k\ge 1}\) is bounded. Furthermore, Lemma 1 shows that \(\{h_{k-1}(x^{*})-\theta _{k}\}_{k\ge 1}\) is bounded as well. Therefore, \(\{x_{i,k}-\theta _{k}\}_{k\ge 1}\) is bounded. By assumptions A6 and A7 we can take sufficiently large k1>k0 such that \(||\xi _{k}||\le 1\) and \(\gamma _{k}\le 1\) for all \(k\ge k_{1}\). So, for sufficiently large k, there exist constants \(c_{6},c_{7},c_{8},c^{\prime }_{8},c_{9}>0\) such that

$$\begin{array}{*{20}l} ||X_{\bot,k+1}||&\le c_{6}\rho^{k+1-k_{0}}+c_{7}\sum_{m=k_{0}}^{k}\gamma_{m}\rho^{k+1-m}\\ &+c_{8}\sum_{m=k_{0}}^{k}\alpha(2||\Lambda_{m}||+1)a_{m}\rho^{k-m+2}\\ &+c_{9}\sum_{m=k_{0}}^{k}a_{m}\rho^{k-m+2}||\epsilon_{m+1}||\\ &\le c_{6}\rho^{k+1-k_{0}}+c_{7}\sum_{m=k_{0}}^{k}\gamma_{m}\rho^{k+1-m}\\ &+c^{\prime}_{8}\sum_{m=k_{0}}^{k}a_{m}\rho^{k-m+2}\\&+c_{9}\sum_{m=k_{0}}^{k}a_{m}\rho^{k-m+2}||\epsilon_{m+1}||. \end{array} $$
(109)

Notice that for any ε>0, there exists an integer k2>k1 such that γk<ε for all k≥k2. We can derive

$$\begin{array}{*{20}l} {}\sum_{m=0}^{k}\gamma_{m}\rho^{k-m+1}&=\sum_{m=0}^{k_{2}}\gamma_{m}\rho^{k-m+1}+\sum_{m=k_{2}+1}^{k}\gamma_{m}\rho^{k-m+1}\\ &\le\rho^{k-k_{2}+1}\sum_{m=0}^{k_{2}}\gamma_{m}\,+\,\epsilon\frac{1}{1-\rho}\xrightarrow[k\to\infty,\epsilon\to0]{}0. \end{array} $$

Therefore, the second and third terms on the right-hand side of (109) tend to zero as k→∞. Similarly, the last term of (109) also tends to zero since \({\lim }_{k\to \infty }a_{k}\epsilon _{k+1}=0\). The first term of (109) tends to zero as k→∞ as well since 0<ρ<1. So, we conclude that

$$\begin{array}{*{20}l} X_{\bot,k}\xrightarrow[k\to\infty]{}0. \end{array} $$

We now show the convergence of \(\{v(\widetilde {\Delta }_{k})\}\). Since

$$\begin{array}{*{20}l} v_{1}\triangleq\liminf_{k\to\infty}v(\widetilde{\Delta}_{k})\le\limsup_{k\to\infty}v(\widetilde{\Delta}_{k})\triangleq v_{2}, \end{array} $$

we aim to prove v1=v2. Assume the converse: v1<v2. Then there exists an interval \([\delta _{1},\delta _{2}]\subset (v_{1},v_{2})\) with 0∉[δ1,δ2] such that \(v(\widetilde {\Delta }_{k})\) crosses the interval [δ1,δ2] infinitely many times. By Lemma 9 and (13)–(14) we know \(\widetilde {\Delta }_{k}\) is bounded, so \(\widetilde {\Lambda }_{k}\) is bounded as well. This contradicts Lemma 8. Therefore, \(\{v(\widetilde {\Delta }_{k})\}\) is convergent.

Finally, we show \(\widetilde {\Delta }_{k}\xrightarrow [k\to \infty ]{}0\). Assume the converse. Then there exists a convergent subsequence \(\{\widetilde {\Delta }_{n_{k}}\}\) with limit \(\widetilde {\Delta }\neq 0\). Take β>0 such that \(||\widetilde {\Delta }||>\beta \). From Lemma 6 we know for sufficiently large k and sufficiently small T we have

$$\begin{array}{*{20}l} ||\widetilde{\Delta}_{j}||>\frac{\beta}{2},\quad n_{k}\le j\le m(n_{k},T). \end{array} $$

Similar to the proof of Lemma 8, by Taylor’s expansion there exists a>0 such that

$$\begin{array}{*{20}l} v(\widetilde{\Delta}_{m(n_{k},T)})-v(\widetilde{\Delta}_{n_{k}})<-\frac{a}{2}T. \end{array} $$
(110)

Since \(\{v(\widetilde {\Delta }_{k})\}\) is convergent and by definition \(m(n_{k},T)\xrightarrow [k\to \infty ]{}\infty \), letting k→∞ on both sides of (110) we derive \(0\le -\frac {a}{2}T\), which is impossible. So \(\widetilde {\Delta }_{k}\xrightarrow [k\to \infty ]{}0\). By Lemma 10 we know that after a finite number of steps \(\widetilde {\Delta }_{k}=\Delta _{k}\), and hence \(\Delta _{k}\xrightarrow [k\to \infty ]{}0\). Combining \(\Delta _{k}\xrightarrow [k\to \infty ]{}0\) with \(X_{\bot,k}\xrightarrow [k\to \infty ]{}0\), we conclude \(\Lambda _{k}\xrightarrow [k\to \infty ]{}0\). □

6 Numerical simulation

In this section, we apply the distributed algorithm to a distributed tracking problem and demonstrate its performance. Consider a maneuvering target in the 2-D plane. The state of the target θk at each time k consists of four components \(\theta _{k}=[\theta _{k}^{1},\theta _{k}^{2},\theta _{k}^{3},\theta _{k}^{4}]^{T}\): the horizontal position, horizontal velocity, vertical position, and vertical velocity, respectively. The dynamic model of the target is chosen to be a nearly constant velocity model [29], which means that the motion of the target is governed by:

$$\begin{array}{*{20}l} \theta_{k+1}=A\theta_{k}+\xi_{k+1}, \end{array} $$
(111)

where \(\xi _{k+1}\in \mathbb {R}^{4}\) is noise, and A is defined as

$$\begin{array}{*{20}l} A=\mathbf{I}_{2}\otimes\left(\begin{array}{cc} 1 & T\\ 0 & 1\\ \end{array}\right) \end{array} $$
(112)

with T being the sampling interval. It can be seen that when ξk+1=0 the target follows a constant velocity movement. The goal of the network is to track this target by estimating the state θk.
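
For illustration, the target model (111)–(112) is straightforward to simulate; in the following sketch the initial state, horizon, and noise sequence are hypothetical placeholders, not values prescribed by the experiment.

```python
import numpy as np

def ncv_transition(T):
    """State-transition matrix A of (112): I_2 kron [[1, T], [0, 1]]."""
    return np.kron(np.eye(2), np.array([[1.0, T], [0.0, 1.0]]))

def simulate_target(theta0, T, num_steps, noise):
    """Iterate the target dynamics (111): theta_{k+1} = A theta_k + xi_{k+1}.

    `noise(k)` returns the 4-dimensional noise xi_k; with identically
    zero noise the target moves with constant velocity.
    """
    A = ncv_transition(T)
    traj = [np.asarray(theta0, dtype=float)]
    for k in range(num_steps):
        traj.append(A @ traj[-1] + noise(k + 1))
    return np.stack(traj)  # shape (num_steps + 1, 4)

# Noise-free example: exact constant-velocity motion.
traj = simulate_target([0.0, 1.0, 0.0, 0.5], T=0.1, num_steps=5,
                       noise=lambda k: np.zeros(4))
```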

Consider a sensor network \(\mathcal {G}=(\mathcal {V},\mathcal {E})\) with \(\mathcal {V}=\{1,\cdots,N\},~N=20,\) and \(\mathcal {E}=G(N,p_{N})\) being the Poisson random graph with design parameter 0≤pN≤1. We choose pN=0.25. Denote by Ni the neighbor set of agent i and by ni the cardinality of Ni. Set \(W(k)=[w_{ij}]_{i,j=1}^{N}~\forall k\ge 1\) with \(w_{ij}=\frac {1}{n_{i}}\) if agent j is in the set Ni. All agents aim to track the target state θk cooperatively. We assume that each agent can observe, with noise, only one component of the target state; a code sketch of this setup is given after (114). In mathematical terms, the local function for agent i is defined as:

$$\begin{array}{*{20}l} f_{i,k}(x)\triangleq e_{k_{i}}\theta_{k}-e_{k_{i}}x, \end{array} $$
(113)

where ek is a 4-dimensional square diagonal matrix whose kth diagonal element is 1 and all other elements are 0, i.e.,

$$\begin{array}{*{20}l} e_{1}\triangleq \text{diag}(1,0,0,0)\\ e_{2}\triangleq \text{diag}(0,1,0,0)\\ e_{3}\triangleq \text{diag}(0,0,1,0)\\ e_{4}\triangleq \text{diag}(0,0,0,1). \end{array} $$

The selection of ki will be explained later. Since the state θk is unknown to the agents, each agent can only get a noise-corrupted observation of this local function instead of the exact value. The global function can be written as:

$$\begin{array}{*{20}l} f_{k}(x)\triangleq\frac{1}{N}\sum_{i=1}^{N}\big(e_{k_{i}}\theta_{k}-e_{k_{i}}x\big). \end{array} $$
(114)

It can be seen that while each agent can only estimate one component of θk with its own local function, θk is the unique root of the global function fk(x).
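
A minimal sketch of this setup follows. The random seed, the self-loop convention for the neighbor sets \(N_{i}\), and the helper names are choices of this sketch rather than of the paper; the component-selection rule for \(k_{i}\) is the one described in the next paragraph.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p_N = 20, 0.25

# Poisson (Erdos-Renyi-type) random graph G(N, p_N): each undirected
# edge appears independently with probability p_N; self-loops are added
# so that every agent also uses its own observation (an assumption of
# this sketch).
upper = np.triu(rng.random((N, N)) < p_N, k=1)
adj = upper | upper.T | np.eye(N, dtype=bool)

# Row-stochastic weights: w_ij = 1/n_i for j in N_i, 0 otherwise.
W = adj / adj.sum(axis=1, keepdims=True)

# Component observed by agent i (agents are 1-based):
# k_i = i mod 4, replaced by 4 when i mod 4 == 0.
k_sel = np.array([i % 4 if i % 4 != 0 else 4 for i in range(1, N + 1)])

def f_local(i, x, theta):
    """Local regression function (113): e_{k_i} theta - e_{k_i} x."""
    e = np.zeros((4, 4))
    e[k_sel[i] - 1, k_sel[i] - 1] = 1.0
    return e @ (theta - x)

def f_global(x, theta):
    """Global function (114): the average of the local functions."""
    return np.mean([f_local(i, x, theta) for i in range(N)], axis=0)

theta = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(f_global(theta, theta), 0.0)  # theta is the root
```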

For our experiment, we take \(\xi _{k}\triangleq \frac {1}{k^{2}}v_{k}\), where {vk} is a sequence of i.i.d. random variables uniformly distributed over [−1,1], and the step sizes \(a_{k}\triangleq \frac {20}{k}\). We let the sampling interval be \(T\triangleq 0.1\) s, the truncation bounds \(M_{k}\triangleq k+80\), and \(x^{*}\triangleq [1,1,1,1]^{T}\). The initial value xi,0 of each agent is drawn from the uniform distribution over [−2,2]. The observation noise εi,k is white Gaussian noise. As for the selection of ki: for agent i, if i mod 4≠0, then \(k_{i}\triangleq i\mod 4\); if i mod 4=0, then \(k_{i}\triangleq 4\).
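
To make the experiment concrete, a stripped-down tracking loop in the spirit of the compact form (108) might look as follows; it reuses `ncv_transition` and `f_local` from the sketches above. This is only a sketch under explicit assumptions: the known root dynamics is taken to be \(g_{k}(x)=Ax\), observations are evaluated at the propagated estimates, the noise level 0.1 is illustrative, and the expanding-truncation mechanism of (3)–(7) with bounds \(M_{k}=k+80\) is omitted.

```python
a = lambda k: 20.0 / k                # step sizes a_k = 20 / k
A = ncv_transition(T=0.1)             # assumed known dynamics g_k(x) = A x

X = rng.uniform(-2.0, 2.0, size=(N, 4))   # initial estimates x_{i,0}
theta = np.zeros(4)                       # initial target state

for k in range(1, 2001):
    # Target moves according to (111) with xi_k = v_k / k^2.
    theta = A @ theta + rng.uniform(-1.0, 1.0, 4) / k**2
    G = X @ A.T                       # each agent propagates: g_k(x_{i,k})
    # Noisy local observations taken at the propagated estimates.
    obs = np.stack([f_local(i, G[i], theta) for i in range(N)])
    obs += 0.1 * rng.standard_normal((N, 4))
    # Neighbor averaging of propagated estimates plus local correction.
    X = W @ G + a(k) * obs

print(np.abs(X.mean(axis=0) - theta).max())  # tracking error of the average
```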

Denote by \(\{x_{i,k}\}_{k\ge 1},i\in \mathcal {V}\) the estimates given by (3)–(7) and by \(x_{k}=\frac {1}{N}\sum _{i=1}^{N}x_{i,k}\) the average of the estimates \(x_{i,k},~i\in \mathcal {V}\). In Fig. 1, the dashed lines denote the state of the moving target and the solid lines the averaged estimates of the entries \(\{\theta _{k}^{j},~j=1,\cdots,4\}_{k\geq 1}\) of {θk}k≥1. From the figure we can see that the estimates track the moving target successfully.

Fig. 1 Average estimation sequences of \(\theta _{k}^{j},~j=1,\cdots,4\)

7 Conclusion

The distributed root-tracking problem for a sum of time-varying regression functions over a network is considered in this paper. It is assumed that noise-corrupted information on the dynamics of the roots is known to all agents in the network. Each agent updates its estimate by using its local observations, the dynamic information of the global root, and the information received from its neighbors. A distributed stochastic approximation algorithm is proposed, and the consensus and convergence of the estimates are established.

For future research, it is of interest to relax the conditions on the dynamic information of the global roots and to establish convergence of the algorithm over unbalanced networks.

8 Notations

Table 1