Abstract
Process regression models, such as the Gaussian process regression (GPR) model, have been widely applied to analyze various kinds of functional data. This paper introduces a composite of two T-processes (CT), in which the first captures the smooth global trend and the second models local details. Compared with the general T-process, the CT better accommodates local variability. Furthermore, a composite T-process regression (CTP) model is developed based on the composite T-process. It inherits many of the nice properties of GPR, while being more robust against outliers. Numerical studies, including simulations and a real data application, show that the CTP model performs well in prediction.
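The global-plus-local construction described above can be illustrated with a minimal sketch. This is not the paper's implementation (the paper uses T-processes and model-specific parameters); it only shows the generic idea of summing a long-length-scale kernel for the global trend with a short-length-scale kernel for local details, with illustrative parameter names and values.

```python
import numpy as np

def rbf(x1, x2, variance, lengthscale):
    """Squared-exponential kernel matrix between two 1-D input vectors."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def composite_kernel(x1, x2, tau2=1.0, theta=2.0, sigma2=0.2, alpha=0.1):
    """Sum of a global kernel (long length-scale theta) and a local
    kernel (short length-scale alpha); parameter names are illustrative."""
    return rbf(x1, x2, tau2, theta) + rbf(x1, x2, sigma2, alpha)

x = np.linspace(0.0, 1.0, 5)
K = composite_kernel(x, x)
```

The sum of two positive-definite kernels is again positive definite, so the composite covariance remains a valid kernel; the short-length-scale component decays quickly off the diagonal and so mainly adds local variability.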
References
Álvarez, M., Luengo, D., Titsias, M., Lawrence, N.D.: Efficient multioutput Gaussian processes through variational inducing kernels. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 25–32 (2010)
Archambeau, C., Bach, F.: Multiple Gaussian process models. arXiv:1110.5238 (2010)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Ba, S., Joseph, V.R.: Composite Gaussian process models for emulating expensive functions. Ann. Appl. Stat. 6, 1838–1860 (2012)
Boyd, S., Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Dai, J., Krems, R.V.: Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by Gaussian processes with composite kernels. J. Chem. Theory Comput. 16(3), 1386–1395 (2020)
Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. J. Am. Stat. Assoc. 103(483), 1119–1130 (2008)
Hampel, F.R.: Robust Statistics: The Approach Based on Influence Functions, vol. 196. Wiley, New York (1986)
Higdon, D., Swall, J., Kern, J.: Non-stationary spatial modeling. Bayesian Stat. 6(1), 761–768 (1999)
Jiao, J., Hengjian, C.: Parametric estimation based on robust scatter matrix in semiparametric regression model. J. Beijing Normal Univ. Nat. Sci. Ed. 42(3), 224 (2006)
Kuß, M.: Gaussian process models for robust regression, classification, and reinforcement learning. PhD thesis, Technische Universität (2006)
Liu, H., Cai, J., Ong, Y.S.: Remarks on multi-output Gaussian process regression. Knowl. Based Syst. 144, 102–121 (2018)
Naish-Guzman, A., Holden, S.: Robust regression with twinned Gaussian processes. In: Advances in Neural Information Processing Systems, vol. 20, pp. 1065–1072 (2007)
Neal, R.M.: Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. arXiv preprint physics (1997)
Paciorek, C.J., Schervish, M.J.: Nonstationary covariance functions for Gaussian process regression. In: Advances in Neural Information Processing Systems, vol. 16, pp. 273–280 (2003)
Paciorek, C.J., Schervish, M.J.: Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5), 483–506 (2006)
Qin, G., Zhu, Z., Fung, W.K.: Robust estimation of covariance parameters in partial linear model for longitudinal data. J. Stat. Plan. Inference 139(2), 558–570 (2009)
Ranjan, P., Haynes, R., Karsten, R.: A computationally stable approach to Gaussian process interpolation of deterministic computer simulation data. Technometrics 53(4), 366–378 (2011)
Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
Sampson, P.D., Guttorp, P.: Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Stat. Assoc. 87(417), 108–119 (1992)
Seeger, M.W., Kakade, S.M., Foster, D.P.: Information consistency of nonparametric Gaussian process methods. IEEE Trans. Inf. Theory 54, 2376–2382 (2008)
Shah, A., Wilson, A.G., Ghahramani, Z.: Student-t processes as alternatives to Gaussian processes. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 877–885 (2014)
Shi, J.Q., Choi, T.: Gaussian Process Regression Analysis for Functional Data. Chapman and Hall/CRC, London (2011)
Vanhatalo, J., Jylänki, P., Vehtari, A.: Gaussian process regression with student-t likelihood. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1910–1918 (2009)
Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
Wang, X., Jiang, Y., Huang, M., Zhang, H.: Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 108, 632–643 (2013)
Wang, Z., Shi, J.Q., Lee, Y.: Extended t-process regression models. J. Stat. Plan. Inference 189, 38–60 (2017)
Wang, Z., Li, K., Shi, J.Q.: A robust estimation for the extended t-process regression model. Stat. Probab. Lett. 157, 108626 (2020)
Wauthier, F.L., Jordan, M.I.: Heavy-tailed process priors for selective shrinkage. In: Advances in Neural Information Processing Systems, vol. 23, pp. 2406–2414 (2010)
Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, vol. 2. The MIT Press, Cambridge (2006)
Xiong, Y., Chen, W., Apley, D., Ding, X.: A non-stationary covariance-based kriging method for metamodelling in engineering design. Int. J. Numer. Methods Eng. 71(6), 733–756 (2007)
Xu, P., Lee, Y., Shi, J.Q., Eyre, J.: Automatic detection of significant areas for functional data with directional error control. Stat. Med. 38(3), 376–397 (2019)
Xu, Z., Yan, F., Qi, Y.: Sparse matrix-variate t process blockmodels. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 543–548 (2011)
Yu, S., Tresp, V., Yu, K.: Robust multi-task learning with t-processes. In: Proceedings of the 24th International Conference on Machine Learning, pp. 1103–1110 (2007)
Zang, Q., Klette, R.: Evaluation of an adaptive composite Gaussian model in video surveillance. In: International Conference on Computer Analysis of Images and Patterns, Springer, pp. 165–172 (2003)
Zhang, Y., Yeung, D.Y.: Multi-task learning using generalized t process. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 964–971 (2010)
Acknowledgements
This work is supported by National Natural Science Foundation of China (Grant No. 11971457) and Anhui Provincial Natural Science Foundation (Grant No. 1908085MA06).
Appendix A: Robustness and Consistency
Proof of Theorem 3.1
The score function of parameter \({\varvec{\beta }}\) from the CTP model is as follows,
where \(c_{1 i}=(n+2\nu ) /\left( 2 (\nu -1)+{{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}\right) \). The score function from the GPR model has the same form with \(c_{1i}=1\). When \(y_{i j} \rightarrow \infty \) for some j, it is easy to show that the score function s from the CTP is bounded, while that from the GPR is not.
For a given parameter \(\nu \), following [8], the estimator \(\hat{{\varvec{\beta }}}\) of \({\varvec{\beta }}\) has the influence function
where \({\varvec{y}}=\{{{\varvec{y}}_{1}},\ldots ,{{\varvec{y}}_{m}}\}\).
Note that the influence function is dominated by the score function \(s({\varvec{\beta }} ; {\varvec{y}})\), which indicates that the influence function from the CTP is bounded, while that from the GPR is unbounded. \(\square \)
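The boundedness contrast above can be checked numerically. The sketch below uses the weight \(c_{1i}\) defined in the proof with a generic identity working covariance; all specific values (n, \(\nu\), the outlier magnitude) are illustrative, not from the paper.

```python
import numpy as np

def ctp_weight(y, K_inv, nu):
    """Weight c_{1i} = (n + 2*nu) / (2*(nu - 1) + y' K^{-1} y)
    appearing in the CTP score function."""
    n = len(y)
    return (n + 2 * nu) / (2 * (nu - 1) + y @ K_inv @ y)

n, nu = 10, 3.0
K_inv = np.eye(n)            # illustrative inverse covariance
y = np.ones(n)
w_clean = ctp_weight(y, K_inv, nu)

y_out = y.copy()
y_out[0] = 1e6               # one observation pushed toward infinity
w_out = ctp_weight(y_out, K_inv, nu)

# the weighted score contribution stays bounded for CTP,
# while the GPR counterpart (weight fixed at 1) blows up
score_ctp = w_out * np.abs(K_inv @ y_out).max()
score_gpr = 1.0 * np.abs(K_inv @ y_out).max()
```

As the outlier grows, \(c_{1i}\) shrinks like \(1/y_{ij}^2\), so the product with the score direction vanishes; with weight 1 (GPR) the contribution grows without bound.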
Before presenting the proof of Theorem 3.2, we first introduce Lemma A.1.
Lemma A.1
Suppose \({{\varvec{y}}_{i}}=\left\{ y_{i 1}, \ldots , y_{i n}\right\} \) are generated from the model with mean function \(h_{i}({\varvec{x}})=0\), and suppose the covariance kernel functions \(\tau _{i}^{2} g_{i}\) and \(l_i\) are bounded and continuous in the parameters \({{\varvec{\theta }}_{i}}\) and \({{\varvec{\alpha }}_{i}}\). Then, for a positive constant c and any \(\varepsilon >0\), when n is large enough, we have
where \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma _{i}^{2} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau _{i}^{2} {{\varvec{G}}_{in}}\), \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top } \left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}\), and \(\left\| f_{0 i}\right\| _{k}\) is the reproducing kernel Hilbert space norm of \(f_{0 i}\), \(\hat{{\varvec{\eta }}}_{i}=(\hat{{\varvec{\theta }}}_{i},\hat{{\varvec{\alpha }}}_{i})\).
Proof
Let
where \({\tilde{f}}_{i}=f_{i,\text {global}} | r_{i} \sim G P\left( 0, r_{i} \tau _{i}^{2}g_{i}\right) \).
Since \(r_{i}\) is independent of covariates \({{\varvec{X}}_{i}}\), we have
The proof of Lemma A.1 is similar to that of Theorem 1 in [21]; we present more details for our model. According to the Fenchel–Legendre duality relationship, we have
where P is the zero-mean GP prior given by \(G P\left( 0, r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in}\right) \), and Q is the posterior distribution of \({\tilde{f}}_{i}\) from a GP model with prior \(G P\left( 0, r_{i}\tau _{i}^{2}{{\varvec{G}}_{in}}\right) \). More details can be found in [5, 19].
For given \(r_{i}\), \(f_{0i}\) can be written as \(f_{0 i} \doteq r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in} {\varvec{\gamma }}\) in the reproducing kernel Hilbert space (RKHS); see [3, 25]. Here \({\varvec{\gamma }}=\left( \gamma _{1}, \ldots , \gamma _{n}\right) ^{\top }\).
To obtain the distribution of Q, consider
So, we have \(f_{i,\text{ global }} | {\mathcal {D}}_{n}\sim N ({{\varvec{\mu }}_{i n}}, {{\varvec{C}}_{i n}})\), with
We let \({{\varvec{H}}_{i n}}=\tau ^{2}_{i} {{\varvec{G}}_{in}}\), \(\hat{{\varvec{H}}}_{i n}=\tau ^{2}_{i} \hat{{\varvec{G}}}_{in}\) and \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma ^{2} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}}\). Hence, the mean of Q is \({\varvec{m}}={{\varvec{H}}_{in}}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}} {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) ^{-1}\hat{{\varvec{y}}}_{i}=r_{i}{{\varvec{H}}_{i n}}{\varvec{\gamma }}\), and the covariance of Q is \(M=r_{i} {{\varvec{H}}_{i n}}{{\varvec{B}}_{i}^{-1}}\), where \(\hat{{\varvec{y}}}_{i}=r_{i}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}}\right. \left. {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) {\varvec{\gamma }}\).
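The mean and covariance of Q above follow the standard zero-mean GP posterior algebra. As a generic numerical sketch (using an illustrative RBF kernel and a plain diagonal noise term in place of the model-specific \({\varvec{\varSigma }}_{in}^{1/2}{\varvec{L}}_{in}{\varvec{\varSigma }}_{in}^{1/2}+\phi _{0}{\varvec{I}}_{n}\) structure):

```python
import numpy as np

def gp_posterior(K, S, y):
    """Zero-mean GP posterior at the training inputs:
    m = K (K + S)^{-1} y,  C = K - K (K + S)^{-1} K,
    where S is the (noise) covariance added to the prior K."""
    A = np.linalg.solve(K + S, np.eye(len(y)))
    m = K @ A @ y
    C = K - K @ A @ K
    return m, C

x = np.linspace(0.0, 1.0, 8)
d = x[:, None] - x[None, :]
K = np.exp(-0.5 * (d / 0.3) ** 2)   # illustrative RBF prior covariance
S = 0.1 * np.eye(8)                  # illustrative noise covariance
y = np.sin(2 * np.pi * x)
m, C = gp_posterior(K, S, y)
```

Conditioning can only reduce uncertainty, so each posterior variance on the diagonal of C is no larger than the corresponding prior variance in K.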
From the definition of the Kullback–Leibler divergence, we have
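Evaluating this term uses the closed form of the Kullback–Leibler divergence between two multivariate normals, \(\mathrm{KL}(N(m_0,S_0)\,\Vert\,N(m_1,S_1))=\tfrac{1}{2}[\operatorname{tr}(S_1^{-1}S_0)+(m_1-m_0)^{\top }S_1^{-1}(m_1-m_0)-n+\log (|S_1|/|S_0|)]\). A generic sketch of this standard formula (not specific to the model here):

```python
import numpy as np

def kl_mvn(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for multivariate normals."""
    n = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff
                  - n + logdet1 - logdet0)

m = np.zeros(3)
S = np.eye(3)
kl_same = kl_mvn(m, S, m, S)          # identical distributions: 0
kl_shift = kl_mvn(m, S, m + 1.0, S)   # unit shift in 3-D: 0.5 * 3 = 1.5
```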
By expanding \(-\log p\left( {{\varvec{y}}_{i}} | {\tilde{f}}_{i}, r_{i}\right) \) to second order, we have
It follows that
Since the covariance function \({{\varvec{H}}_{i n}}\) is bounded and continuous in \({{\varvec{\eta }}_{i}}\) and \(\hat{{\varvec{\eta }}}_{i} \rightarrow {{\varvec{\eta }}_{i}}\), we have \(\hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}} \rightarrow 0\) as \(n \rightarrow \infty \). Hence, for sufficiently large n, there exist positive constants c and \(\varepsilon \) such that
Plugging (5.3) into (5.2), we have
Then, we have
From Lemma 2 of [27], we obtain the following equality,
where \(g^{*}(r)\) is the density function of \(I G\left( \nu +n / 2,(\nu -1)+q_{i}^{2} / 2\right) \).
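As a numerical sanity check on this inverse-gamma density (with illustrative shape and scale values standing in for \(\nu +n/2\) and \((\nu -1)+q_{i}^{2}/2\)), one can verify that \(I G(a,b)\) integrates to one and has mean \(b/(a-1)\):

```python
import numpy as np
from math import gamma

def invgamma_pdf(r, a, b):
    """Density of IG(a, b): b^a / Gamma(a) * r^(-a-1) * exp(-b / r)."""
    return b**a / gamma(a) * r**(-a - 1) * np.exp(-b / r)

# illustrative values standing in for a = nu + n/2, b = (nu - 1) + q_i^2/2
a, b = 5.0, 3.0
r = np.linspace(1e-3, 100.0, 400_000)
dr = r[1] - r[0]
pdf = invgamma_pdf(r, a, b)
total = np.sum(pdf) * dr        # numerical integral, should be close to 1
mean = np.sum(r * pdf) * dr     # should be close to b / (a - 1) = 0.75
```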
Therefore, we have
which shows that this lemma holds. \(\square \)
To prove Theorem 3.2, we need the following condition:
(A) \(\left\| f_{0 i}\right\| _{k}\) is bounded and \(E_{X_{i}}\left( \log \left| {{\varvec{B}}_{i}}\right| \right) =o(n)\).
Proof of Theorem 3.2
Since \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top }\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}=O(n)\), applying condition (A) and Lemma A.1, we obtain
Hence, Theorem 3.2 holds. \(\square \)
Wang, Z., Lv, Y. & Wu, Y. Composite T-Process Regression Models. Commun. Math. Stat. 11, 307–323 (2023). https://doi.org/10.1007/s40304-021-00249-4
Keywords
- Composite Gaussian process regression
- Composite T-process regression
- Extended T-process regression
- Functional data