Abstract
Process regression models, such as the Gaussian process regression (GPR) model, have been widely applied to analyze various kinds of functional data. This paper introduces a composite of two T-processes (CT), in which the first captures the smooth global trend and the second models local details. Compared with the general T-process, the CT better accommodates local variability. Furthermore, a composite T-process regression (CTP) model is developed based on the composite T-process. It inherits many of the nice properties of GPR, while being more robust against outliers. Numerical studies, including simulations and a real data application, show that the CTP model performs well in prediction.
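The global-plus-local construction described above can be illustrated with a minimal sketch. This is not the paper's implementation (the paper uses T-processes and model-specific parameters); it only shows the generic idea of summing a long-length-scale kernel for the global trend with a short-length-scale kernel for local details, with illustrative parameter names and values.

```python
import numpy as np

def rbf(x1, x2, variance, lengthscale):
    """Squared-exponential kernel matrix between two 1-D input vectors."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def composite_kernel(x1, x2, tau2=1.0, theta=2.0, sigma2=0.2, alpha=0.1):
    """Sum of a global kernel (long length-scale theta) and a local
    kernel (short length-scale alpha); parameter names are illustrative."""
    return rbf(x1, x2, tau2, theta) + rbf(x1, x2, sigma2, alpha)

x = np.linspace(0.0, 1.0, 5)
K = composite_kernel(x, x)
```

The sum of two positive-definite kernels is again positive definite, so the composite covariance remains a valid kernel; the short-length-scale component decays quickly off the diagonal and so mainly adds local variability.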
References
Álvarez, M., Luengo, D., Titsias, M., Lawrence, N.D.: Efficient multioutput Gaussian processes through variational inducing kernels. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 25–32 (2010)
Archambeau, C., Bach, F.: Multiple Gaussian process models. arXiv:1110.5238 (2010)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Ba, S., Joseph, V.R.: Composite Gaussian process models for emulating expensive functions. Ann. Appl. Stat. 6, 1838–1860 (2012)
Boyd, S., Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Dai, J., Krems, R.V.: Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by Gaussian processes with composite kernels. J. Chem. Theory Comput. 16(3), 1386–1395 (2020)
Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. J. Am. Stat. Assoc. 103(483), 1119–1130 (2008)
Hampel, F.R.: Robust Statistics: The Approach Based on Influence Functions, vol. 196. Wiley, New York (1986)
Higdon, D., Swall, J., Kern, J.: Non-stationary spatial modeling. Bayesian Stat. 6(1), 761–768 (1999)
Jiao, J., Hengjian, C.: Parametric estimation based on robust scatter matrix in semiparametric regression model. J. Beijing Normal Univ. Nat. Sci. Ed. 42(3), 224 (2006)
Kuß, M.: Gaussian process models for robust regression, classification, and reinforcement learning. PhD thesis, Technische Universität (2006)
Liu, H., Cai, J., Ong, Y.S.: Remarks on multi-output Gaussian process regression. Knowl. Based Syst. 144, 102–121 (2018)
Naish-Guzman, A., Holden, S.: Robust regression with twinned Gaussian processes. In: Advances in Neural Information Processing Systems, vol. 20, pp. 1065–1072 (2007)
Neal, R.M.: Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. arXiv preprint physics (1997)
Paciorek, C.J., Schervish, M.J.: Nonstationary covariance functions for Gaussian process regression. In: Advances in Neural Information Processing Systems, vol. 16, pp. 273–280 (2003)
Paciorek, C.J., Schervish, M.J.: Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5), 483–506 (2006)
Qin, G., Zhu, Z., Fung, W.K.: Robust estimation of covariance parameters in partial linear model for longitudinal data. J. Stat. Plan. Inference 139(2), 558–570 (2009)
Ranjan, P., Haynes, R., Karsten, R.: A computationally stable approach to Gaussian process interpolation of deterministic computer simulation data. Technometrics 53(4), 366–378 (2011)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Sampson, P.D., Guttorp, P.: Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Stat. Assoc. 87(417), 108–119 (1992)
Seeger, M.W., Kakade, S.M., Foster, D.P.: Information consistency of nonparametric Gaussian process methods. IEEE Trans. Inf. Theory 54, 2376–2382 (2008)
Shah, A., Wilson, A.G., Ghahramani, Z.: Student-t processes as alternatives to Gaussian processes. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 877–885 (2014)
Shi, J.Q., Choi, T.: Gaussian Process Regression Analysis for Functional Data. Chapman and Hall/CRC, London (2011)
Vanhatalo, J., Jylänki, P., Vehtari, A.: Gaussian process regression with student-t likelihood. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1910–1918 (2009)
Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
Wang, X., Jiang, Y., Huang, M., Zhang, H.: Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 108, 632–643 (2013)
Wang, Z., Shi, J.Q., Lee, Y.: Extended t-process regression models. J. Stat. Plan. Inference 189, 38–60 (2017)
Wang, Z., Li, K., Shi, J.Q.: A robust estimation for the extended t-process regression model. Stat. Probab. Lett. 157, 108626 (2020)
Wauthier, F.L., Jordan, M.I.: Heavy-tailed process priors for selective shrinkage. In: Advances in Neural Information Processing Systems, vol. 23, pp. 2406–2414 (2010)
Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, vol. 2. The MIT Press, Cambridge (2006)
Xiong, Y., Chen, W., Apley, D., Ding, X.: A non-stationary covariance-based kriging method for metamodelling in engineering design. Int. J. Numer. Methods Eng. 71(6), 733–756 (2007)
Xu, P., Lee, Y., Shi, J.Q., Eyre, J.: Automatic detection of significant areas for functional data with directional error control. Stat. Med. 38(3), 376–397 (2019)
Xu, Z., Yan, F., Qi, Y.: Sparse matrix-variate t process blockmodels. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 543–548 (2011)
Yu, S., Tresp, V., Yu, K.: Robust multi-task learning with t-processes. In: Proceedings of the 24th International Conference on Machine Learning, pp. 1103–1110 (2007)
Zang, Q., Klette, R.: Evaluation of an adaptive composite Gaussian model in video surveillance. In: International Conference on Computer Analysis of Images and Patterns, Springer, pp. 165–172 (2003)
Zhang, Y., Yeung, D.Y.: Multi-task learning using generalized t process. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 964–971 (2010)
Acknowledgements
This work is supported by National Natural Science Foundation of China (Grant No. 11971457) and Anhui Provincial Natural Science Foundation (Grant No. 1908085MA06).
Appendix A: Robustness and Consistency
Proof of Theorem 3.1
The score function of parameter \({\varvec{\beta }}\) from the CTP model is as follows,
where \(c_{1 i}=(n+2\nu ) /\left( 2 (\nu -1)+{{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}\right) \). The score function from the GPR model has the same form with \(c_{1i}=1\). When \(y_{i j} \rightarrow \infty \) for some j, it is easy to show that the score function s from the CTP is bounded, while that from the GPR is not.
For a given parameter \(\nu \), following [8], the estimator \(\hat{{\varvec{\beta }}}\) of \({\varvec{\beta }}\) has the influence function
where \({\varvec{y}}=\{{{\varvec{y}}_{1}},\ldots ,{{\varvec{y}}_{m}}\}\).
Note that the influence function is dominated by the score function \(s({\varvec{\beta }} ; {\varvec{y}})\), which indicates that the influence function from the CTP is bounded, while that from the GPR is unbounded. \(\square \)
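The boundedness contrast above can be checked numerically. The sketch below uses the weight \(c_{1i}\) defined in the proof with a generic identity working covariance; all specific values (n, \(\nu\), the outlier magnitude) are illustrative, not from the paper.

```python
import numpy as np

def ctp_weight(y, K_inv, nu):
    """Weight c_{1i} = (n + 2*nu) / (2*(nu - 1) + y' K^{-1} y)
    appearing in the CTP score function."""
    n = len(y)
    return (n + 2 * nu) / (2 * (nu - 1) + y @ K_inv @ y)

n, nu = 10, 3.0
K_inv = np.eye(n)            # illustrative inverse covariance
y = np.ones(n)
w_clean = ctp_weight(y, K_inv, nu)

y_out = y.copy()
y_out[0] = 1e6               # one observation pushed toward infinity
w_out = ctp_weight(y_out, K_inv, nu)

# the weighted score contribution stays bounded for CTP,
# while the GPR counterpart (weight fixed at 1) blows up
score_ctp = w_out * np.abs(K_inv @ y_out).max()
score_gpr = 1.0 * np.abs(K_inv @ y_out).max()
```

As the outlier grows, \(c_{1i}\) shrinks like \(1/y_{ij}^2\), so the product with the score direction vanishes; with weight 1 (GPR) the contribution grows without bound.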
Before presenting the proof of Theorem 3.2, we first introduce Lemma A.1.
Lemma A.1
Suppose \({{\varvec{y}}_{i}}=\left\{ y_{i 1}, \ldots , y_{i n}\right\} \) are generated from the model with mean function \(h_{i}({\varvec{x}})=0\), and suppose the covariance kernel functions \(\tau _{i}^{2} g_{i}\) and \(l_i\) are bounded and continuous in the parameters \({{\varvec{\theta }}_{i}}\) and \({{\varvec{\alpha }}_{i}}\). Then, for a positive constant c and any \(\varepsilon >0\), when n is large enough, we have
where \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma _{i}^{2} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau _{i}^{2} {{\varvec{G}}_{in}}\), \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top } \left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}\), and \(\left\| f_{0 i}\right\| _{k}\) is the reproducing kernel Hilbert space norm of \(f_{0 i}\), \(\hat{{\varvec{\eta }}}_{i}=(\hat{{\varvec{\theta }}}_{i},\hat{{\varvec{\alpha }}}_{i})\).
Proof
Let
where \({\tilde{f}}_{i}=f_{i,\text {global}} | r_{i} \sim G P\left( 0, r_{i} \tau _{i}^{2}g_{i}\right) \).
Since \(r_{i}\) is independent of covariates \({{\varvec{X}}_{i}}\), we have
The proof of Lemma A.1 is similar to that of Theorem 1 in [21]; we present more details for our model. According to the Fenchel–Legendre duality relationship, we have
where P is the zero-mean GP prior given by \(G P\left( 0, r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in}\right) \), and Q is the posterior distribution of \({\tilde{f}}_{i}\) from a GP model with prior \(G P\left( 0, r_{i}\tau _{i}^{2}{{\varvec{G}}_{in}}\right) \). More details can be found in [5, 19].
For given \(r_{i}\), \(f_{0i}\) can be written as \(f_{0 i} \doteq r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in} {\varvec{\gamma }}\) in the reproducing kernel Hilbert space (RKHS); see [3, 25]. Here \({\varvec{\gamma }}=\left( \gamma _{1}, \ldots , \gamma _{n}\right) ^{\top }\).
To obtain the distribution of Q, consider
So, we have \(f_{i,\text{ global }} | {\mathcal {D}}_{n}\sim N ({{\varvec{\mu }}_{i n}}, {{\varvec{C}}_{i n}})\), with
We let \({{\varvec{H}}_{i n}}=\tau ^{2}_{i} {{\varvec{G}}_{in}}\), \(\hat{{\varvec{H}}}_{i n}=\tau ^{2}_{i} \hat{{\varvec{G}}}_{in}\) and \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma ^{2} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}}\). Hence, the mean of Q is \({\varvec{m}}={{\varvec{H}}_{in}}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}} {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) ^{-1}\hat{{\varvec{y}}}_{i}=r_{i}{{\varvec{H}}_{i n}}{\varvec{\gamma }}\), and the covariance of Q is \(M=r_{i} {{\varvec{H}}_{i n}}{{\varvec{B}}_{i}^{-1}}\), where \(\hat{{\varvec{y}}}_{i}=r_{i}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}}\right. \left. {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) {\varvec{\gamma }}\).
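The mean and covariance of Q above follow the standard zero-mean GP posterior algebra. As a generic numerical sketch (using an illustrative RBF kernel and a plain diagonal noise term in place of the model-specific \({\varvec{\varSigma }}_{in}^{1/2}{\varvec{L}}_{in}{\varvec{\varSigma }}_{in}^{1/2}+\phi _{0}{\varvec{I}}_{n}\) structure):

```python
import numpy as np

def gp_posterior(K, S, y):
    """Zero-mean GP posterior at the training inputs:
    m = K (K + S)^{-1} y,  C = K - K (K + S)^{-1} K,
    where S is the (noise) covariance added to the prior K."""
    A = np.linalg.solve(K + S, np.eye(len(y)))
    m = K @ A @ y
    C = K - K @ A @ K
    return m, C

x = np.linspace(0.0, 1.0, 8)
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * (d / 0.3) ** 2)   # illustrative RBF prior covariance
S = 0.1 * np.eye(8)                  # illustrative noise covariance
y = np.sin(2 * np.pi * x)
m, C = gp_posterior(K, S, y)
```

Conditioning can only reduce uncertainty, so each posterior variance on the diagonal of C is no larger than the corresponding prior variance in K.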
From the definition of the Kullback–Leibler divergence, we have
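Evaluating this term uses the closed form of the Kullback–Leibler divergence between two multivariate normals, \(\mathrm{KL}(N(m_0,S_0)\,\Vert\,N(m_1,S_1))=\tfrac{1}{2}[\operatorname{tr}(S_1^{-1}S_0)+(m_1-m_0)^{\top }S_1^{-1}(m_1-m_0)-n+\log (|S_1|/|S_0|)]\). A generic sketch of this standard formula (not specific to the model here):

```python
import numpy as np

def kl_mvn(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for multivariate normals."""
    n = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - n + logdet1 - logdet0)

m = np.zeros(3)
S = np.eye(3)
kl_same = kl_mvn(m, S, m, S)          # identical distributions: 0
kl_shift = kl_mvn(m, S, m + 1.0, S)   # unit shift in 3-D: 0.5 * 3 = 1.5
```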
By expanding \(-\log p\left( {{\varvec{y}}_{i}} | {\tilde{f}}_{i}, r_{i}\right) \) to second order, we have
It follows that
Since the covariance function \({{\varvec{H}}_{i n}}\) is bounded and continuous in \({{\varvec{\eta }}_{i}}\) and \(\hat{{\varvec{\eta }}}_{i} \rightarrow {{\varvec{\eta }}_{i}}\), we have \(\hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}} \rightarrow 0\) as \(n \rightarrow \infty \). Hence, for sufficiently large n, there exist positive constants c and \(\varepsilon \) such that
Plugging (5.3) into (5.2), we have
Then, we have
From Lemma 2 of [27], we obtain the following equality,
where \(g^{*}(r)\) is the density function of \(I G\left( \nu +n / 2,(\nu -1)+q_{i}^{2} / 2\right) \).
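As a numerical sanity check on this inverse-gamma density (with illustrative shape and scale values standing in for \(\nu +n/2\) and \((\nu -1)+q_{i}^{2}/2\)), one can verify that \(I G(a,b)\) integrates to one and has mean \(b/(a-1)\):

```python
import numpy as np
from math import gamma

def invgamma_pdf(r, a, b):
    """Density of IG(a, b): b^a / Gamma(a) * r^(-a-1) * exp(-b / r)."""
    return b**a / gamma(a) * r**(-a - 1) * np.exp(-b / r)

# illustrative values standing in for a = nu + n/2, b = (nu - 1) + q_i^2/2
a, b = 5.0, 3.0
r = np.linspace(1e-3, 100.0, 400_000)
dr = r[1] - r[0]
pdf = invgamma_pdf(r, a, b)
total = np.sum(pdf) * dr        # numerical integral, should be close to 1
mean = np.sum(r * pdf) * dr     # should be close to b / (a - 1) = 0.75
```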
Therefore, we have
which shows that this lemma holds. \(\square \)
To prove Theorem 3.2, we need the following condition:
(A) \(\left\| f_{0 i}\right\| _{k}\) is bounded and \(E_{X_{i}}\left( \log \left| {{\varvec{B}}_{i}}\right| \right) =o(n)\).
Proof of Theorem 3.2
Since \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top }\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}=O(n)\), applying condition (A) and Lemma A.1, we obtain
Hence, Theorem 3.2 holds. \(\square \)
Wang, Z., Lv, Y. & Wu, Y. Composite T-Process Regression Models. Commun. Math. Stat. 11, 307–323 (2023). https://doi.org/10.1007/s40304-021-00249-4
Keywords
- Composite Gaussian process regression
- Composite T-process regression
- Extended T-process regression
- Functional data