
Gaussian process decentralized data fusion meets transfer learning in large-scale distributed cooperative perception


Abstract

This paper presents novel Gaussian process decentralized data fusion algorithms exploiting the notion of agent-centric support sets for distributed cooperative perception of large-scale environmental phenomena. To overcome the limitations of scale in existing works, our proposed algorithms allow every mobile sensing agent to utilize a different support set and dynamically switch to another during execution for encapsulating its own data into a local summary that, perhaps surprisingly, can still be assimilated with the other agents’ local summaries (i.e., based on their current support sets) into a globally consistent summary to be used for predicting the phenomenon. To achieve this, we propose a novel transfer learning mechanism for a team of agents capable of sharing and transferring information encapsulated in a summary based on a support set to that utilizing a different support set with some loss that can be theoretically bounded and analyzed. To alleviate the issue of information loss accumulating over multiple instances of transfer learning, we propose a new information sharing mechanism to be incorporated into our algorithms in order to achieve memory-efficient lazy transfer learning. Empirical evaluation on three real-world datasets for up to 128 agents shows that our algorithms outperform the state-of-the-art methods.


Notes

  1. PITC generalizes the Bayesian Committee Machine (BCM) (Schwaighofer and Tresp 2002), the latter of which assumes the support set to be the set of unobserved input locations whose measurements are to be predicted (Quiñonero-Candela and Rasmussen 2005). As a result, BCM does not scale well with a large set of such unobserved input locations.

  2. An exception is the work of Park et al. (2011) that overcomes this boundary effect by imposing continuity constraints along the boundaries in a centralized manner.

  3. The conditional independence of \(Y_{\mathcal {D}_1},\ldots ,Y_{\mathcal {D}_N}\) given \(Y_{\mathcal {S}}\) assumed by PITC and PIC (hence, GP-DDF and GP-\(\hbox {DDF}^+\)) improves their scalability over the GP model (Sect. 2) at the cost of poorer predictive performance. To potentially reduce the degree of violation of this assumption, an informative support set can be selected (see footnote 6). Furthermore, the experimental results in Chen et al. (2015) show that GP-DDF and GP-\(\hbox {DDF}^+\) can achieve predictive performance comparable to that of the GP model at lower computational cost. The predictive performance of GP-DDF and GP-\(\hbox {DDF}^+\) can be improved by increasing the size of \(\mathcal {S}\) at the expense of greater time and communication overhead.

  4. Naively, an agent can delay transfer learning by simply storing a separate local summary based on the support set for every previously visited local area, which is not memory-efficient.

  5. Multiple backups of the local summary and support set for the same local area may exist if agents leave this area at the same time, which rarely happens. In this case, agent i should retrieve (and remove) all these backups from the agents storing them.

  6. Alternatively, active learning can be used to select an informative support set a priori for each local area (Chen et al. 2015). Empirically, this yields little performance improvement because a sufficiently dense (yet small) support set, uniformly distributed over the local area and extending slightly beyond its boundary by \(10\%\) of its width, already performs well.

  7. Local GPs result from imposing a sparse block-diagonal structure on \(\varSigma _{\mathcal {D}\mathcal {D}}\) in (2).

  8. The predictive performance of centralized PITC corresponds to that of GP-DDF, as discussed in Sect. 2.2. Hence, the RMSE of centralized PITC coincides exactly with that of GP-DDF in Fig. 8.

  9. The time incurred by centralized PITC is slightly less than that of GP-DDF (Fig. 8) multiplied by the total number of agents. This agrees with the analysis of the time complexity of PITC versus GP-DDF in Sect. 2.2. It can also be observed in Fig. 9, where the incurred time of GP-DDF increases nearly twofold when the number of agents is halved.

  10. If the subset sizes differ, then “virtual” locations are added to each subset to make all subsets of the same size \(T\triangleq \max _{s\in \mathcal {S}} |\mathcal {D}_{is}|\) (\(T'\triangleq \max _{s\in \mathcal {S}} |\mathcal {S}'_{s}|\)). The virtual locations added to \(\mathcal {D}_{is}\) (\(\mathcal {S}'_{s}\)) are chosen to coincide with \(s\in \mathcal {S}\) so that they do not induce additional errors but will loosen the bound.

References

  • Cao, N., Low, K. H., & Dolan, J. M. (2013). Multi-robot informative path planning for active sensing of environmental phenomena: A tale of two algorithms. In Proceedings of AAMAS.

  • Chen, J., Cao, N., Low, K. H., Ouyang, R., Tan, C. K. Y., & Jaillet, P. (2013a). Parallel Gaussian process regression with low-rank covariance matrix approximations. In Proceedings of UAI (pp. 152–161).

  • Chen, J., Low, K. H., Jaillet, P., & Yao, Y. (2015). Gaussian process decentralized data fusion and active sensing for spatiotemporal traffic modeling and prediction in mobility-on-demand systems. IEEE Transactions on Automation Science and Engineering, 12, 901–921.

  • Chen, J., Low, K. H., Tan, C. K. Y., Oran, A., Jaillet, P., Dolan, J. M., & Sukhatme, G. S. (2012). Decentralized data fusion and active sensing with mobile sensors for modeling and predicting spatiotemporal traffic phenomena. In Proceedings of UAI (pp. 163–173).

  • Chen, J., Low, K. H., & Tan, C. K. Y. (2013b). Gaussian process-based decentralized data fusion and active sensing for mobility-on-demand system. In Proceedings of robotics: science and systems conference.

  • Choudhury, A., Nair, P. B., & Keane, A. J. (2002). A data parallel approach for large-scale Gaussian process modeling. In Proceedings of SDM (pp. 95–111).

  • Chung, T. H., Gupta, V., Burdick, J. W., & Murray, R. M. (2004). On a decentralized active sensing strategy using mobile sensor platforms in a network. In Proceedings of CDC (pp. 1914–1919).

  • Coates, M. (2004). Distributed particle filters for sensor networks. In Proceedings of IPSN (pp. 99–107).

  • Cortes, J. (2009). Distributed kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12), 2816–2827.

  • Das, J., Harvey, J. B. J., Py, F., Vathsangam, H., Graham, R., Rajan, K., & Sukhatme, G. S. (2013). Hierarchical probabilistic regression for AUV-based adaptive sampling of marine phenomena. In Proceedings of IEEE ICRA (pp. 5571–5578).

  • Das, K., & Srivastava, A. N. (2010). Block-GP: Scalable Gaussian process regression for multimodal data. In Proceedings of ICDM (pp. 791–796).

  • Daxberger, E., & Low, K. H. (2017). Distributed batch Gaussian process optimization. In Proceedings of ICML (pp. 951–960).

  • Dolan, J. M., Podnar, G., Stancliff, S., Low, K. H., Elfes, A., Higinbotham, J., Hosler, J. C., Moisan, T. A., & Moisan, J. (2009). Cooperative aquatic sensing using the telesupervised adaptive ocean sensor fleet. In Proceedings of SPIE conference on remote sensing of the ocean, sea ice, and large water regions Vol. 7473.

  • Guestrin, C., Bodik, P., Thibaus, R., Paskin, M., & Madden, S. (2004). Distributed regression: An efficient framework for modeling sensor network data. In Proceedings of IPSN (pp. 1–10).

  • Hoang, T. N., Hoang, Q. M., & Low, K. H. (2016). A distributed variational inference framework for unifying parallel sparse Gaussian process regression models. In Proceedings of ICML (pp. 382–391).

  • Hoang, Q. M., Hoang, T. N., & Low, K. H. (2017). A generalized stochastic variational Bayesian hyperparameter learning framework for sparse spectrum Gaussian process regression. In Proceedings of AAAI (pp. 2007–2014).

  • Hoang, T. N., Hoang, Q. M., & Low, K. H. (2018). Decentralized high-dimensional Bayesian optimization with factor graphs. In Proceedings of AAAI (pp. 3231–3238).

  • Hoang, T. N., Hoang, Q. M., Low, K. H., & How, J. P. (2019). Collective online learning of Gaussian processes in massive multi-agent systems. In Proceedings of AAAI.

  • Hoang, T. N., Low, K. H., Jaillet, P., & Kankanhalli, M. (2014). Nonmyopic \(\epsilon \)-Bayes-optimal active learning of Gaussian processes. In Proceedings of ICML (pp. 739–747).

  • Kim, Y., & Shell, D. (2014). Distributed robotic sampling of non-homogeneous spatiotemporal fields via recursive geometric sub-division. In Proceedings of IEEE ICRA (pp. 557–562).

  • Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, 9, 235–284.

  • Leonard, N. E., Palley, D. A., Lekien, F., Sepulchre, R., Fratantoni, D. M., & Davis, R. E. (2007). Collective motion, sensor networks, and ocean sampling. Proceedings of the IEEE, 95(1), 48–74.

  • Ling, C. K., Low, K. H., & Jaillet, P. (2016). Gaussian process planning with Lipschitz continuous reward functions: Towards unifying Bayesian optimization, active learning, and beyond. In Proceedings of AAAI (pp. 1860–1866).

  • Low, K. H., Chen, J., Dolan, J. M., Chien, S., & Thompson, D. R. (2012). Decentralized active robotic exploration and mapping for probabilistic field classification in environmental sensing. In Proceedings of AAMAS (pp. 105–112).

  • Low, K. H., Chen, J., Hoang, T. N., Xu, N., & Jaillet, P. (2015a). Recent advances in scaling up Gaussian process predictive models for large spatiotemporal data. In S. Ravela, A. Sandu (Eds.), Proceedings of dynamic data-driven environmental systems science conference (DyDESS’14), LNCS 8964, Springer.

  • Low, K. H., Dolan, J. M., & Khosla, P. (2008). Adaptive multi-robot wide-area exploration and mapping. In Proceedings of AAMAS (pp. 23–30).

  • Low, K. H., Dolan, J. M., & Khosla, P. (2009). Information-theoretic approach to efficient adaptive path planning for mobile robotic environmental sensing. In Proceedings of ICAPS.

  • Low, K. H., Dolan, J. M., & Khosla, P. (2011). Active Markov information-theoretic path planning for robotic environmental sensing. In Proceedings of AAMAS (pp. 753–760).

  • Low, K. H., Gordon, G. J., Dolan, J. M., & Khosla, P. (2007). Adaptive sampling for multi-robot wide-area exploration. In Proceedings of IEEE ICRA (pp. 755–760).

  • Low, K. H., Yu, J., Chen, J., & Jaillet, P. (2015b). Parallel Gaussian process regression for big data: Low-rank representation meets Markov approximation. In Proceedings of AAAI (pp. 2821–2827).

  • Min, W., & Wynter, L. (2011). Real-time road traffic prediction with spatio-temporal correlations. Transportation Research Part C: Emerging Technologies, 19(4), 606–616.

  • Ouyang, R., & Low, K. H. (2018). Gaussian process decentralized data fusion meets transfer learning in large-scale distributed cooperative perception. In Proceedings of AAAI (pp. 3876–3883).

  • Ouyang, R., Low, K. H., Chen, J., & Jaillet, P. (2014). Multi-robot active sensing of non-stationary Gaussian process-based environmental phenomena. In Proceedings of AAMAS (pp. 573–580).

  • Park, C., Huang, J. Z., & Ding, Y. (2011). Domain decomposition approach for fast Gaussian process regression of large spatial data sets. JMLR, 12, 1697–1728.

  • Paskin, M. A., Guestrin, C. (2004). Robust probabilistic inference in distributed systems. In Proceedings of UAI (pp. 436–445).

  • Podnar, G., Dolan, J. M., Low, K. H., & Elfes, A. (2010). Telesupervised remote surface water quality sensing. In Proceedings of IEEE aerospace conference.

  • Quiñonero-Candela, J., & Rasmussen, C. E. (2005). A unifying view of sparse approximate Gaussian process regression. JMLR, 6, 1939–1959.

  • Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.

  • Rosencrantz, M., Gordon, G., & Thrun, S. (2003). Decentralized sensor fusion with distributed particle filters. In Proceedings of UAI (pp. 493–500).

  • Schwaighofer, A., & Tresp, V. (2002). Transductive and inductive methods for approximate Gaussian process regression. In Proceedings of NIPS (pp. 953–960).

  • Singh, A., Krause, A., Guestrin, C., & Kaiser, W. J. (2009). Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research, 34, 707–755.

  • Snelson, E. L., & Ghahramani, Z. (2007). Local and global sparse Gaussian process approximation. In Proceedings of AISTATS.

  • Sukkarieh, S., Nettleton, E., Kim, J., Ridley, M., Goktogan, A., & Durrant-Whyte, H. (2003). The ANSER project: Data fusion across multiple uninhabited air vehicles. IJRR, 22(7–8), 505–539.

  • Sun, S., Zhao, J., & Zhu, J. (2015). A review of Nyström methods for large-scale machine learning. Information Fusion, 26, 36–48.

  • Thompson, D. R., Cabrol, N., Furlong, M., Hardgrove, C., Low, K. H., Moersch, J., & Wettergreen, D. (2013). Adaptive sampling of time series with application to remote exploration. In Proceedings of IEEE ICRA (pp. 3463–3468).

  • Wang, Y., & Papageorgiou, M. (2005). Real-time freeway traffic state estimation based on extended Kalman filter: A general approach. Transportation Research Part B: Methodological, 39(2), 141–167.

  • Work, D. B., Blandin, S., Tossavainen, O., Piccoli, B., & Bayen, A. (2010). A traffic model for velocity data assimilation. AMRX, 1, 1–35.

  • Xu, N., Low, K. H., Chen, J., Lim, K. K., & Özgül, E.B. (2014). GP-Localize: Persistent mobile robot localization using online sparse Gaussian process observation model. In Proceedings of AAAI (pp. 2585–2592).

  • Zhang, K., Tsang, I. W., & Kwok, J. T. (2008). Improved Nyström low-rank approximation and error analysis. In Proceedings of ICML (pp. 1232–1239).

  • Zhang, Y., Hoang, T. N., Low, K. H., & Kankanhalli, M. (2016). Near-optimal active learning of multi-output Gaussian processes. In Proceedings of AAAI.


Author information

Correspondence to Bryan Kian Hsiang Low.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is one of several papers published in Autonomous Robots comprising the Special Issue on Multi-Robot and Multi-Agent Systems.

This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2, MOE2016-T2-2-156.

Appendices

Appendix A: Gaussian predictive distribution computed by the GP-\(\hbox {DDF}^+\) algorithm

Definition 5

(GP-\(\hbox {DDF}^+\)) Given a common support set \(\mathcal {S}\subset \mathcal {X}\) known to all N agents, global summary \((\dot{\nu }_{\mathcal {S}},\dot{\varPsi }_{\mathcal {S}\mathcal {S}})\), local summary \((\nu _{\mathcal {S}|\mathcal {D}_i},\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})\), and a column vector \(y_{\mathcal {D}_i}\) of realized measurements for observed locations \(\mathcal {D}_i\), the GP-\(\hbox {DDF}^+\) algorithm run by each agent i computes a Gaussian predictive distribution \(\mathcal {N}(\overline{\mu }_{x}, \overline{\sigma }^2_{x})\) of the measurement for any unobserved location \(x \in \mathcal {X}{\setminus }\mathcal {D}\) where

$$\begin{aligned} \begin{aligned} \overline{\mu }_{x}&\triangleq \displaystyle \mu _{x}+\left( \gamma _{x\mathcal {S}}^i\dot{\varPsi }^{-1}_{\mathcal {S}\mathcal {S}}\dot{\nu }_{\mathcal {S}} -\varSigma _{x\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\nu _{\mathcal {S}|\mathcal {D}_i}\right) +{\nu }_{x|\mathcal {D}_i},\\ \overline{\sigma }^2_{x}&\triangleq \displaystyle \sigma _{xx} - \Big (\gamma _{x\mathcal {S}}^i\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}x}-\varSigma _{x\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varPsi _{\mathcal {S}x|\mathcal {D}_i}\\&\quad \displaystyle -\gamma _{x\mathcal {S}}^i\dot{\varPsi }_{\mathcal {S}\mathcal {S}}^{-1}\gamma _{\mathcal {S}x}^i \Big )-\varPsi _{xx|\mathcal {D}_i}, \end{aligned} \end{aligned}$$
(10)

\(\gamma _{x\mathcal {S}}^i \triangleq \displaystyle \varSigma _{x\mathcal {S}}+\varSigma _{x\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}-\varPsi _{x\mathcal {S}|\mathcal {D}_i}\ ,\) and \(\gamma _{\mathcal {S}x}^i \triangleq \gamma _{x\mathcal {S}}^{i\top }.\)

The Gaussian predictive distribution (10) computed by the GP-\(\hbox {DDF}^+\) algorithm exploits both the local and global summaries (i.e., the bracketed terms) and the data local to agent i (i.e., the \({\nu }_{x|\mathcal {D}_i}\) and \(\varPsi _{xx|\mathcal {D}_i}\) terms).
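
To make the evaluation of (10) concrete, below is a minimal numpy sketch (ours, not the authors' released code) that computes \(\overline{\mu }_{x}\) and \(\overline{\sigma }^2_{x}\) for a single unobserved location x, assuming the prior terms \(\mu _{x}\), \(\sigma _{xx}\), \(\varSigma _{x\mathcal {S}}\), \(\varSigma _{\mathcal {S}\mathcal {S}}\), the global summary \((\dot{\nu }_{\mathcal {S}},\dot{\varPsi }_{\mathcal {S}\mathcal {S}})\), and agent i's local terms have already been assembled as arrays; all argument names are our own.

```python
import numpy as np

def gp_ddf_plus_predict(mu_x, sigma_xx, Sigma_xS, Sigma_SS,
                        nu_dot_S, Psi_dot_SS,
                        nu_S_Di, Psi_SS_Di, Psi_xS_Di,
                        nu_x_Di, Psi_xx_Di):
    """Evaluate (10) for one location x; vectors indexed by S have length |S|."""
    # gamma_{xS}^i = Sigma_xS + Sigma_xS Sigma_SS^{-1} Psi_{SS|D_i} - Psi_{xS|D_i}
    gamma = Sigma_xS + Sigma_xS @ np.linalg.solve(Sigma_SS, Psi_SS_Di) - Psi_xS_Di
    mu_bar = (mu_x
              + gamma @ np.linalg.solve(Psi_dot_SS, nu_dot_S)   # global summary term
              - Sigma_xS @ np.linalg.solve(Sigma_SS, nu_S_Di)   # local summary term
              + nu_x_Di)                                        # local data term
    var_bar = (sigma_xx
               - (gamma @ np.linalg.solve(Sigma_SS, Sigma_xS)           # gamma Sig^{-1} Sig_Sx
                  - Sigma_xS @ np.linalg.solve(Sigma_SS, Psi_xS_Di)     # Sig_xS Sig^{-1} Psi_Sx|D_i
                  - gamma @ np.linalg.solve(Psi_dot_SS, gamma))         # gamma PsiDot^{-1} gamma_Sx
               - Psi_xx_Di)
    return mu_bar, var_bar
```

Using np.linalg.solve in place of explicit inverses is a standard numerical choice; the \(\mathcal {O}(|\mathcal {S}|^3)\) solves can be amortized across query locations by factorizing \(\varSigma _{\mathcal {S}\mathcal {S}}\) and \(\dot{\varPsi }_{\mathcal {S}\mathcal {S}}\) once.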

Appendix B: Proof of Proposition 1

$$\begin{aligned} \begin{aligned}&\omega _{\mathcal {S}'|\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}'\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}(y_{\mathcal {D}_i}-\mu _{\mathcal {D}_i})\\&\quad = \displaystyle \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}(y_{\mathcal {D}_i}-\mu _{\mathcal {D}_i})\\&\quad =\displaystyle \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i} \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned}&\varPhi _{\mathcal {S}'\mathcal {S}'|\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}'\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}'}\\&\quad = \displaystyle \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {S}'}\\&\quad =\displaystyle \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {S}'} \end{aligned} \end{aligned}$$

where the second equalities above follow from the assumption that \(\mathcal {S}'\) and \(\mathcal {D}_i\) are conditionally independent given \(\mathcal {S}\) (i.e., \(\varSigma _{\mathcal {S}'\mathcal {D}_i|\mathcal {S}}= \varSigma _{\mathcal {S}'\mathcal {D}_i} - \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i} =\underline{0}\)).
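
Both identities are purely algebraic consequences of this conditional independence assumption, which the following numpy sketch (ours) verifies numerically by imposing \(\varSigma _{\mathcal {S}'\mathcal {D}_i}= \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\) exactly; the squared exponential kernel and its hyperparameters are illustrative assumptions.

```python
import numpy as np

def sqexp(X, Y, s2=1.0, ell=0.2):
    """Squared exponential covariance of (1), noise-free."""
    d = ((X[:, None, :] - Y[None, :, :]) / ell) ** 2
    return s2 * np.exp(-0.5 * d.sum(-1))

rng = np.random.default_rng(0)
S = rng.uniform(size=(8, 2))     # old support set S
Sp = rng.uniform(size=(6, 2))    # new support set S'
Di = rng.uniform(size=(20, 2))   # agent i's observed locations D_i
y = rng.standard_normal(20)      # centered measurements y_{D_i} - mu_{D_i}

K_SS, K_SpS, K_SDi = sqexp(S, S), sqexp(Sp, S), sqexp(S, Di)
K_DiDi = sqexp(Di, Di) + 1e-2 * np.eye(20)       # noise variance on observed locations
K_SpDi = K_SpS @ np.linalg.solve(K_SS, K_SDi)    # impose conditional independence

# omega_{S'|D_i} computed from raw data vs. transferred from omega_{S|D_i}
omega_S = K_SDi @ np.linalg.solve(K_DiDi, y)
assert np.allclose(K_SpDi @ np.linalg.solve(K_DiDi, y),
                   K_SpS @ np.linalg.solve(K_SS, omega_S))

# Phi_{S'S'|D_i} computed from raw data vs. transferred from Phi_{SS|D_i}
Phi_S = K_SDi @ np.linalg.solve(K_DiDi, K_SDi.T)
assert np.allclose(K_SpDi @ np.linalg.solve(K_DiDi, K_SpDi.T),
                   K_SpS @ np.linalg.solve(K_SS, Phi_S) @ np.linalg.solve(K_SS, K_SpS.T))
```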

Appendix C: Proof of Proposition 2

$$\begin{aligned} \begin{aligned}&\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i|\mathcal {S}}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}(\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}+\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1})\varSigma _{\mathcal {D}_i\mathcal {S}}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\\&\qquad \displaystyle +\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}+\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}})^{-1}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}+\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(I+(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}+\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\varSigma _{\mathcal {S}\mathcal {S}} \end{aligned} \end{aligned}$$

where the second equality follows from the matrix inverse lemma on \(\varSigma _{\mathcal {D}_i\mathcal {D}_i|\mathcal {S}}^{-1} = (\varSigma _{\mathcal {D}_i\mathcal {D}_i}-\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i})^{-1} =\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}+\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1} \varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\). As a result,

$$\begin{aligned}&\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\\&\quad =\varSigma _{\mathcal {S}\mathcal {S}}^{-1}(\varSigma _{\mathcal {S}\mathcal {S}} -\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1} =\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1} - \varSigma _{\mathcal {S}\mathcal {S}}^{-1}.\\&\nu _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i|\mathcal {S}}^{-1}y_{\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}(\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}+\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1})y_{\mathcal {D}_i}\\&\quad =\displaystyle \varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}y_{\mathcal {D}_i}+\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}y_{\mathcal {D}_i}\\&\quad =\displaystyle \omega _{\mathcal {S}|\mathcal {D}_i}+\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}})^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \omega _{\mathcal {S}|\mathcal {D}_i}+\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}+(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1})\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\\&\qquad \displaystyle ((\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}+I)\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}(\varSigma _{\mathcal {S}\mathcal {S}}-\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i})^{-1}\varSigma _{\mathcal {S}\mathcal {S}}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle (\varSigma _{\mathcal {S}\mathcal {S}}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}-I)^{-1}\varSigma _{\mathcal {S}\mathcal {S}}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle (\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}-\varSigma _{\mathcal {S}\mathcal {S}}^{-1})^{-1}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\\&\quad =\displaystyle \varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}\varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i} \end{aligned}$$

where the second equality follows from the matrix inverse lemma on \(\varSigma _{\mathcal {D}_i\mathcal {D}_i|\mathcal {S}}^{-1} =\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}+\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\varSigma _{\mathcal {D}_i\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}\varSigma _{\mathcal {D}_i\mathcal {D}_i}^{-1}\). As a result, \(\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\nu _{\mathcal {S}|\mathcal {D}_i} = \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\). So, (8) follows.
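
Since the identities \(\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1} = \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1} - \varSigma _{\mathcal {S}\mathcal {S}}^{-1}\) and \(\varPsi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\nu _{\mathcal {S}|\mathcal {D}_i} = \varPhi _{\mathcal {S}\mathcal {S}|\mathcal {D}_i}^{-1}\omega _{\mathcal {S}|\mathcal {D}_i}\) hold for any valid covariance matrices, they can also be checked numerically, as in the following sketch (ours; kernel and hyperparameters are again illustrative assumptions).

```python
import numpy as np

def sqexp(X, Y, s2=1.0, ell=0.2):
    """Squared exponential covariance of (1), noise-free."""
    d = ((X[:, None, :] - Y[None, :, :]) / ell) ** 2
    return s2 * np.exp(-0.5 * d.sum(-1))

rng = np.random.default_rng(1)
S, Di = rng.uniform(size=(8, 2)), rng.uniform(size=(20, 2))
y = rng.standard_normal(20)                                 # y_{D_i}
K_SS = sqexp(S, S)
K_SDi = sqexp(S, Di)
K_DiDi = sqexp(Di, Di) + 1e-2 * np.eye(20)                  # observed locations carry noise
K_DiDi_S = K_DiDi - K_SDi.T @ np.linalg.solve(K_SS, K_SDi)  # Sigma_{D_i D_i | S}

Phi = K_SDi @ np.linalg.solve(K_DiDi, K_SDi.T)    # Phi_{SS|D_i}
Psi = K_SDi @ np.linalg.solve(K_DiDi_S, K_SDi.T)  # Psi_{SS|D_i}
omega = K_SDi @ np.linalg.solve(K_DiDi, y)        # omega_{S|D_i}
nu = K_SDi @ np.linalg.solve(K_DiDi_S, y)         # nu_{S|D_i}

assert np.allclose(np.linalg.inv(Psi), np.linalg.inv(Phi) - np.linalg.inv(K_SS))
assert np.allclose(np.linalg.solve(Psi, nu), np.linalg.solve(Phi, omega))
```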

Appendix D: Proof of Theorem 1

The following lemma is necessary for deriving our main result here:

Lemma 1

Define \(\sigma _{xx'}\) using a squared exponential covariance function. Then, every covariance component \(\sigma _{xx'}\) in \(\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}}\), \(\varSigma _{\mathcal {S}\mathcal {S}}\), \(\varSigma _{\mathcal {S}'_{t}\mathcal {S}}\), and \(\varSigma _{\mathcal {D}_{it'}\mathcal {S}}\) satisfies \((\sigma _{xx'}-\sigma _{ss'})^2\le {3e^{-1}\sigma _s^4}(\Vert \varLambda ^{-1}(x-s)\Vert ^2 + \Vert \varLambda ^{-1}(x'-s')\Vert ^2)\) for all \(x,x',s,s'\in \mathcal {X}\).

Proof

Since every covariance component \(\sigma _{xx'}\) in \(\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}}\), \(\varSigma _{\mathcal {S}\mathcal {S}}\), \(\varSigma _{\mathcal {S}'_{t}\mathcal {S}}\), and \(\varSigma _{\mathcal {D}_{it'}\mathcal {S}}\) does not involve the noise variance \(\sigma ^2_n\), it follows from (1) that

$$\begin{aligned} \begin{aligned} \sigma _{xx'}&=\displaystyle \sigma _s^2\exp \left( -\left\| \frac{{\varLambda }^{-1}({x} - {x}')}{\sqrt{2}}\right\| ^2\right) \\&=\displaystyle \sigma _s^2 k\left( \left\| \frac{{\varLambda }^{-1}({x} - {x}')}{\sqrt{2}}\right\| \right) \end{aligned} \end{aligned}$$

where \(k(a)\triangleq \exp (-a^2)\). Then,

$$\begin{aligned} \begin{aligned}&\displaystyle (\sigma _{xx'}-\sigma _{ss'})^2\\&\quad \displaystyle = \sigma _s^4 \left\{ k\left( \left\| \frac{{\varLambda }^{-1}({x} - {x}')}{\sqrt{2}}\right\| \right) - k\left( \left\| \frac{{\varLambda }^{-1}({s} - {s}')}{\sqrt{2}}\right\| \right) \right\} ^2\\&\quad \displaystyle = 0.5\sigma _s^4 k'(\xi )^2(\Vert \varLambda ^{-1}({x} - {x}')\Vert - \Vert \varLambda ^{-1}({s} - {s}')\Vert )^2\\&\quad \displaystyle \le e^{-1}\sigma _s^4 (\Vert \varLambda ^{-1}({x} - {x}')\Vert - \Vert \varLambda ^{-1}({s} - {s}')\Vert )^2\\&\quad \displaystyle \le e^{-1}\sigma _s^4(\Vert \varLambda ^{-1}(x-s)\Vert + \Vert \varLambda ^{-1}(x'-s')\Vert )^2\\&\quad \displaystyle \le {3e^{-1}\sigma _s^4}(\Vert \varLambda ^{-1}(x-s)\Vert ^2 + \Vert \varLambda ^{-1}(x'-s')\Vert ^2) \end{aligned} \end{aligned}$$

where the second equality follows from the mean value theorem, with \(k'(\xi )\) denoting the first-order derivative of k evaluated at some \(\xi \) between \(\Vert \varLambda ^{-1}(s-s')\Vert /\sqrt{2}\) and \(\Vert \varLambda ^{-1}(x-x')\Vert /\sqrt{2}\); the first inequality follows from the fact that \(|k'(a)|\) is maximized at \(a=\pm 1/\sqrt{2}\) and hence \(k'(\xi )^2\le k'(-1/\sqrt{2})^2=2/e\); the second inequality is due to the triangle inequality (i.e., \(\Vert \varLambda ^{-1}(x-x')\Vert \le \Vert \varLambda ^{-1}(x-s)\Vert +\Vert \varLambda ^{-1}(s-s')\Vert +\Vert \varLambda ^{-1}(s'-x')\Vert \)); and the last inequality follows from \((a+b)^2\le 2(a^2+b^2)\le 3(a^2+b^2)\). \(\square \)
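
As a quick numerical sanity check of Lemma 1 (ours; the signal variance \(\sigma _s^2=1.5\) and a scalar length-scale \(\ell =0.3\) standing in for \(\varLambda \) are assumed), one can draw random quadruples of locations and confirm the bound:

```python
import numpy as np

s2, ell = 1.5, 0.3   # assumed signal variance and (scalar) length-scale

def k(x, xp):
    """Squared exponential covariance of (1), noise-free."""
    return s2 * np.exp(-0.5 * np.sum(((x - xp) / ell) ** 2))

rng = np.random.default_rng(3)
for _ in range(10_000):
    x, xp, s, sp = rng.uniform(size=(4, 2))
    lhs = (k(x, xp) - k(s, sp)) ** 2
    rhs = 3 / np.e * s2 ** 2 * (np.sum(((x - s) / ell) ** 2)
                                + np.sum(((xp - sp) / ell) ** 2))
    assert lhs <= rhs + 1e-12   # Lemma 1
```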

Supposing each subset \(\mathcal {D}_{is}\) (\(\mathcal {S}'_s\)) contains T (\(T'\)) locations (see footnote 10), select one location from each subset to form a new subset \(\mathcal {D}_{it'}\triangleq \{x_{it's} \}_{s\in \mathcal {S}}\) (\(\mathcal {S}'_t\triangleq \{x'_{ts} \}_{s\in \mathcal {S}}\)) of \(|\mathcal {S}|\) locations for \(t'=1\) (\(t=1\)) and repeat this for \(t'=2,\ldots ,T\) (\(t=2,\ldots ,T'\)). Then, \(\mathcal {D}_{i}=\bigcup ^T_{t'=1}\mathcal {D}_{it'}\) and \(\mathcal {S}'=\bigcup ^{T'}_{t=1}\mathcal {S}'_{t}\). It follows that \(\varSigma _{\mathcal {S}'\mathcal {S}} = [\varSigma _{\mathcal {S}'_{t}\mathcal {S}}]_{t=1,\ldots ,T'}\), \(\varSigma _{\mathcal {S}\mathcal {D}_{i}} = [\varSigma _{\mathcal {S}\mathcal {D}_{it'}}]_{t'=1,\ldots ,T}\), and \(\varSigma _{\mathcal {S}'\mathcal {D}_{i}} = [\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}}]_{t=1,\ldots ,T',t'=1,\ldots ,T}\).

Using the definition of the Frobenius norm followed by the subadditivity of the square root function,

$$\begin{aligned} \begin{aligned}&\displaystyle ||\varSigma _{\mathcal {S}'\mathcal {D}_i} - \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}||_F\\&\quad = \displaystyle ||\varSigma _{\mathcal {S}'\mathcal {D}_i|\mathcal {S}}||_F\\&\quad = \displaystyle \sqrt{\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}|\mathcal {S}}||^2_F}\\&\quad \le \displaystyle \sum ^{T'}_{t=1} \sum ^{T}_{t'=1} ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}|\mathcal {S}}||_F. \end{aligned} \end{aligned}$$
(11)

Let \(A_{\mathcal {S}'_{t}\mathcal {D}_{it'}}\triangleq \varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}}-\varSigma _{\mathcal {S}\mathcal {S}}\), \(B_{\mathcal {S}'_t\mathcal {S}}\triangleq \varSigma _{\mathcal {S}'_t\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {S}}\), and \(C_{\mathcal {D}_{it'}\mathcal {S}}\triangleq \varSigma _{\mathcal {D}_{it'}\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {S}}\). Then,

$$\begin{aligned} \begin{aligned}&\displaystyle ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}|\mathcal {S}}||_F\\&\quad = \displaystyle ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}} - \varSigma _{\mathcal {S}'_t\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_{it'}}||_F\\&\quad = \displaystyle || \varSigma _{\mathcal {S}\mathcal {S}}+A_{\mathcal {S}'_{t}\mathcal {D}_{it'}} \\&\qquad -\displaystyle (\varSigma _{\mathcal {S}\mathcal {S}}+B_{\mathcal {S}'_t\mathcal {S}})\varSigma _{\mathcal {S}\mathcal {S}}^{-1}(\varSigma _{\mathcal {S}\mathcal {S}}+C_{\mathcal {D}_{it'}\mathcal {S}})^{\top }||_F\\&\quad = \displaystyle || \varSigma _{\mathcal {S}\mathcal {S}}+A_{\mathcal {S}'_{t}\mathcal {D}_{it'}} -\varSigma _{\mathcal {S}\mathcal {S}}^{\top }-C_{\mathcal {D}_{it'}\mathcal {S}}^{\top } -B_{\mathcal {S}'_t\mathcal {S}}\\&\qquad -\displaystyle B_{\mathcal {S}'_t\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}C_{\mathcal {D}_{it'}\mathcal {S}}^{\top }||_F\\&\quad \le \displaystyle || A_{\mathcal {S}'_{t}\mathcal {D}_{it'}}||_F +||B_{\mathcal {S}'_t\mathcal {S}}||_F +||C_{\mathcal {D}_{it'}\mathcal {S}}||_F \\&\quad \quad +||B_{\mathcal {S}'_t\mathcal {S}}||_F ||C_{\mathcal {D}_{it'}\mathcal {S}}||_F ||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F \end{aligned} \end{aligned}$$
(12)

where the inequality is due to the subadditivity and submultiplicativity of the matrix norm.

Let \(\epsilon _{\mathcal {S}'_{t}}\triangleq (1/|\mathcal {S}|)\sum _{x\in \mathcal {S}'_{t}}||\varLambda ^{-1}(x-c(x))||^2\) and \(\epsilon _{\mathcal {D}_{it'}}\triangleq (1/|\mathcal {S}|)\sum _{x\in \mathcal {D}_{it'}}||\varLambda ^{-1}(x-c(x))||^2\). Then,

$$\begin{aligned} \begin{aligned}&\displaystyle || A_{\mathcal {S}'_{t}\mathcal {D}_{it'}}||^2_F\\&\quad = \displaystyle ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}}-\varSigma _{\mathcal {S}\mathcal {S}}||^2_F\\&\quad = \displaystyle \sum _{s,s'\in \mathcal {S}} (\sigma _{x'_{ts}x_{it's'}}-\sigma _{ss'})^2\\&\quad \le 3e^{-1}\sigma ^4_s \displaystyle \sum _{s,s'\in \mathcal {S}}\left( ||\varLambda ^{-1}(x'_{ts}-s)||^2 + ||\varLambda ^{-1}(x_{it's'}-s')||^2\right) \\&\quad = 3e^{-1}\sigma ^4_s|\mathcal {S}|\displaystyle \Bigg (\sum _{s\in \mathcal {S}}||\varLambda ^{-1}(x'_{ts}-s)||^2 \\&\qquad \displaystyle +\sum _{s'\in \mathcal {S}}||\varLambda ^{-1}(x_{it's'}-s')||^2\Bigg )\\&\quad =3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle (\epsilon _{\mathcal {S}'_{t}} + \epsilon _{\mathcal {D}_{it'}}) \end{aligned} \end{aligned}$$
(13)

since \(\epsilon _{\mathcal {S}'_{t}}=(1/|\mathcal {S}|)\sum _{s\in \mathcal {S}}||\varLambda ^{-1}(x'_{ts}-s)||^2\) and \(\epsilon _{\mathcal {D}_{it'}}=(1/|\mathcal {S}|)\)\(\sum _{s'\in \mathcal {S}}||\varLambda ^{-1}(x_{it's'}-s')||^2\). The inequality is due to Lemma 1.

$$\begin{aligned} \begin{aligned}&\displaystyle || B_{\mathcal {S}'_{t}\mathcal {S}}||^2_F\\&\quad = \displaystyle ||\varSigma _{\mathcal {S}'_{t}\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {S}}||^2_F\\&\quad = \displaystyle \sum _{s,s'\in \mathcal {S}} (\sigma _{x'_{ts}s'}-\sigma _{ss'})^2\\&\quad \le 3e^{-1}\sigma ^4_s\displaystyle \sum _{s,s'\in \mathcal {S}}\left( ||\varLambda ^{-1}(x'_{ts}-s)||^2 + ||\varLambda ^{-1}(s'-s')||^2\right) \\&\quad = 3e^{-1}\sigma ^4_s|\mathcal {S}|\displaystyle \sum _{s\in \mathcal {S}}||\varLambda ^{-1}(x'_{ts}-s)||^2\\&\quad =3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {S}'_{t}} \end{aligned} \end{aligned}$$
(14)

where the inequality is due to Lemma 1.

$$\begin{aligned} \begin{aligned}&\displaystyle || C_{\mathcal {D}_{it'}\mathcal {S}}||^2_F\\&\quad = \displaystyle ||\varSigma _{\mathcal {D}_{it'}\mathcal {S}}-\varSigma _{\mathcal {S}\mathcal {S}}||^2_F\\&\quad = \displaystyle \sum _{s,s'\in \mathcal {S}} (\sigma _{x_{it's}s'}-\sigma _{ss'})^2\\&\quad \le 3e^{-1}\sigma ^4_s\displaystyle \sum _{s,s'\in \mathcal {S}}\left( ||\varLambda ^{-1}(x_{it's}-s)||^2 + ||\varLambda ^{-1}(s'-s')||^2\right) \\&\quad = 3e^{-1}\sigma ^4_s|\mathcal {S}|\displaystyle \sum _{s\in \mathcal {S}}||\varLambda ^{-1}(x_{it's}-s)||^2\\&\quad =3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {D}_{it'}} \end{aligned} \end{aligned}$$
(15)

where the inequality is due to Lemma 1.

By substituting (13), (14), and (15) into (12),

$$\begin{aligned} \begin{aligned}&\displaystyle ||\varSigma _{\mathcal {S}'_{t}\mathcal {D}_{it'}|\mathcal {S}}||_F\\&\quad \le \displaystyle \sqrt{3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle (\epsilon _{\mathcal {S}'_{t}} + \epsilon _{\mathcal {D}_{it'}})} +\sqrt{3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {S}'_{t}}}\\&\qquad \displaystyle +\sqrt{3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {D}_{it'}}}\\&\qquad +\displaystyle \sqrt{3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {S}'_{t}}} \sqrt{3e^{-1}\sigma ^4_s|\mathcal {S}|^2\displaystyle \epsilon _{\mathcal {D}_{it'}}} ||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F\\&\quad =\displaystyle \sqrt{3/e}\sigma ^2_s|\mathcal {S}|\Big (\sqrt{\epsilon _{\mathcal {S}'_{t}} + \epsilon _{\mathcal {D}_{it'}}}+\sqrt{\epsilon _{\mathcal {S}'_{t}}}+\sqrt{\epsilon _{\mathcal {D}_{it'}}} \\&\qquad \displaystyle +\sigma ^2_s||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F|\mathcal {S}|\sqrt{3\epsilon _{\mathcal {S}'_{t}}\epsilon _{\mathcal {D}_{it'}}/e}\Big ). \end{aligned} \end{aligned}$$
(16)

By substituting (16) into (11),

$$\begin{aligned} \begin{aligned}&\displaystyle ||\varSigma _{\mathcal {S}'\mathcal {D}_i} - \varSigma _{\mathcal {S}'\mathcal {S}}\varSigma _{\mathcal {S}\mathcal {S}}^{-1}\varSigma _{\mathcal {S}\mathcal {D}_i}||_F\\&\quad \le \displaystyle \sqrt{3/e}\sigma ^2_s|\mathcal {S}|\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} \Big (\sqrt{\epsilon _{\mathcal {S}'_{t}} + \epsilon _{\mathcal {D}_{it'}}}+\sqrt{\epsilon _{\mathcal {S}'_{t}}}+\sqrt{\epsilon _{\mathcal {D}_{it'}}} \\&\qquad \displaystyle +\sigma ^2_s||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F|\mathcal {S}|\sqrt{3\epsilon _{\mathcal {S}'_{t}}\epsilon _{\mathcal {D}_{it'}}/e}\Big )\\&\quad \le \displaystyle \sqrt{3/e}\sigma ^2_s|\mathcal {S}| \Bigg (\sqrt{TT'\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} (\epsilon _{\mathcal {S}'_{t}} + \epsilon _{\mathcal {D}_{it'}})} \\&\qquad \displaystyle +\sqrt{TT'\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} \epsilon _{\mathcal {S}'_{t}}} + \sqrt{TT'\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} \epsilon _{\mathcal {D}_{it'}}} \\&\qquad \displaystyle +\sigma ^2_s||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F|\mathcal {S}|\sqrt{TT'(3/e)\sum ^{T'}_{t=1} \sum ^{T}_{t'=1} \epsilon _{\mathcal {S}'_{t}}\epsilon _{\mathcal {D}_{it'}}}\Bigg )\\&\quad = \displaystyle \sqrt{3/e}\sigma ^2_s|\mathcal {S}|\Bigg (\sqrt{TT'\left( T\sum ^{T'}_{t=1} \epsilon _{\mathcal {S}'_{t}} + T'\sum ^{T}_{t'=1}\epsilon _{\mathcal {D}_{it'}}\right) } \\&\qquad \displaystyle +\sqrt{T^2T'\sum ^{T'}_{t=1} \epsilon _{\mathcal {S}'_{t}}} + \sqrt{TT'^2 \sum ^{T}_{t'=1} \epsilon _{\mathcal {D}_{it'}}} \\&\qquad \displaystyle +\sigma ^2_s||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F|\mathcal {S}|\sqrt{TT'(3/e)\sum ^{T'}_{t=1} \epsilon _{\mathcal {S}'_{t}}\sum ^{T}_{t'=1} \epsilon _{\mathcal {D}_{it'}}}\Bigg )\\&\quad = \displaystyle \sqrt{3/e}\sigma ^2_s|\mathcal {S}|TT' \Big (\sqrt{\epsilon _{\mathcal {S}'} + \epsilon _{\mathcal {D}_{i}}} +\sqrt{\epsilon _{\mathcal {S}'}}+\sqrt{\epsilon _{\mathcal {D}_{i}}} \\&\qquad \displaystyle +\sigma ^2_s||\varSigma _{\mathcal {S}\mathcal {S}}^{-1}||_F|\mathcal {S}|\sqrt{3\epsilon _{\mathcal {S}'}\epsilon _{\mathcal {D}_{i}}/e}\Big ) \end{aligned} \end{aligned}$$

where the second inequality follows from

$$\begin{aligned} \sum ^T_{t=1}\sqrt{a_t}\le \sqrt{T\sum ^T_{t=1}a_t} \end{aligned}$$

which can be obtained by applying Jensen’s inequality to the concave square root function. The last equality is due to \(\epsilon _{\mathcal {S}'} = (1/T')\sum ^{T'}_{t=1}\epsilon _{\mathcal {S}'_t}\) and \(\epsilon _{\mathcal {D}_{i}} = (1/T)\sum ^{T}_{t'=1}\epsilon _{\mathcal {D}_{it'}}\).
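
The end-to-end bound can likewise be sanity-checked numerically. The sketch below (ours) constructs \(\mathcal {D}_i\) and \(\mathcal {S}'\) by perturbing each support point, so that c(x) is the generating support point and the \(\epsilon \) terms are easy to compute; the kernel hyperparameters and perturbation scale are illustrative assumptions.

```python
import numpy as np

def sqexp(X, Y, s2=1.0, ell=0.2):
    """Squared exponential covariance of (1), noise-free."""
    d = ((X[:, None, :] - Y[None, :, :]) / ell) ** 2
    return s2 * np.exp(-0.5 * d.sum(-1))

rng = np.random.default_rng(2)
s2, ell, m, T, Tp = 1.0, 0.2, 6, 5, 4            # |S| = m; subset sizes T, T'
S = rng.uniform(size=(m, 2))
Di = np.vstack([s + 0.02 * rng.standard_normal((T, 2)) for s in S])   # c(x) = s
Sp = np.vstack([s + 0.02 * rng.standard_normal((Tp, 2)) for s in S])

K_SS, K_SpS = sqexp(S, S, s2, ell), sqexp(Sp, S, s2, ell)
K_SDi, K_SpDi = sqexp(S, Di, s2, ell), sqexp(Sp, Di, s2, ell)
lhs = np.linalg.norm(K_SpDi - K_SpS @ np.linalg.solve(K_SS, K_SDi), 'fro')

# epsilon terms: mean squared scaled distance to the assigned support point
eps_Di = np.mean(np.sum(((Di - np.repeat(S, T, axis=0)) / ell) ** 2, axis=1))
eps_Sp = np.mean(np.sum(((Sp - np.repeat(S, Tp, axis=0)) / ell) ** 2, axis=1))

bound = (np.sqrt(3 / np.e) * s2 * m * T * Tp
         * (np.sqrt(eps_Sp + eps_Di) + np.sqrt(eps_Sp) + np.sqrt(eps_Di)
            + s2 * np.linalg.norm(np.linalg.inv(K_SS), 'fro') * m
              * np.sqrt(3 * eps_Sp * eps_Di / np.e)))
assert lhs <= bound
print(f"Frobenius norm {lhs:.4f} <= bound {bound:.4f}")
```

Shrinking the perturbation scale drives both \(\epsilon \) terms and the Frobenius norm toward zero, matching the theorem's message that the transfer loss vanishes as the data and the new support set cluster around the old support set.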

Appendix E: Hyperparameter learning

The hyperparameters of our GP-DDF-ASS and GP-\(\hbox {DDF}^+\)-ASS algorithms are learned by maximizing the sum of log-marginal likelihoods \(\sum _{\mathcal {S}} \log p(y_\mathcal {D}|\mathcal {S})\) over the support sets \(\mathcal {S}\) of the different local areas via gradient ascent with respect to a common set of signal variance, noise variance, and length-scale hyperparameters (Sect. 2) where, as derived in Quiñonero-Candela and Rasmussen (2005),

$$\begin{aligned} \log p(y_\mathcal {D}|\mathcal {S})= & {} -0.5 (\log |\varXi _{\mathcal {D}\mathcal {D}|\mathcal {S}}|+y^{\top }_\mathcal {D}\varXi _{\mathcal {D}\mathcal {D}|\mathcal {S}}^{-1}y_\mathcal {D} \\&+ |\mathcal {D}|\log (2\pi )) \end{aligned}$$

where \(\varXi _{\mathcal {D}\mathcal {D}|\mathcal {S}}\triangleq \varPhi _{\mathcal {D}\mathcal {D}|\mathcal {S}}+\text {blockdiag}[\varSigma _{\mathcal {D}\mathcal {D}|\mathcal {S}}]+\sigma ^2_n I\). Note that these learned hyperparameters of our GP-DDF-ASS and GP-\(\hbox {DDF}^+\)-ASS algorithms correspond to the case where our proposed lazy transfer learning mechanism incurs minimal information loss, as explained in Sect. 4.2.
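
Below is a minimal sketch (ours) of evaluating this log-marginal likelihood for one support set under the squared exponential kernel of (1); the data, the partition into local subsets, and the hyperparameter values are illustrative, and a gradient-ascent loop over the common hyperparameters (e.g., via autodiff or finite differences) would wrap this evaluation.

```python
import numpy as np

def sqexp(X, Y, s2, ell):
    """Squared exponential covariance of (1), noise-free."""
    d = ((X[:, None, :] - Y[None, :, :]) / ell) ** 2
    return s2 * np.exp(-0.5 * d.sum(-1))

def pitc_log_marginal(y, X, S, blocks, s2, ell, noise):
    """log p(y_D | S) with Xi = Phi_{DD|S} + blockdiag[Sigma_{DD|S}] + sigma_n^2 I."""
    K_SS = sqexp(S, S, s2, ell)
    K_SD = sqexp(S, X, s2, ell)
    Xi = K_SD.T @ np.linalg.solve(K_SS, K_SD)    # Phi_{DD|S}: low-rank Nystrom part
    for idx in blocks:
        ix = np.ix_(idx, idx)
        # Overwriting the diagonal blocks with the exact covariance is equivalent
        # to adding blockdiag[Sigma_{DD|S}] on top of Phi_{DD|S}.
        Xi[ix] = sqexp(X[idx], X[idx], s2, ell)
    Xi += noise * np.eye(len(y))                 # + sigma_n^2 I
    _, logdet = np.linalg.slogdet(Xi)
    return -0.5 * (logdet + y @ np.linalg.solve(Xi, y) + len(y) * np.log(2 * np.pi))

# Illustrative usage for one local area: support set S, data split into 4 subsets
rng = np.random.default_rng(4)
X = rng.uniform(size=(60, 2))
y = rng.standard_normal(60)
S, blocks = X[::10], np.array_split(np.arange(60), 4)
print(pitc_log_marginal(y, X, S, blocks, s2=1.0, ell=0.2, noise=0.01))
```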

About this article

Cite this article

Ouyang, R., Low, B.K.H. Gaussian process decentralized data fusion meets transfer learning in large-scale distributed cooperative perception. Auton Robot 44, 359–376 (2020). https://doi.org/10.1007/s10514-018-09826-z
