Composite T-Process Regression Models

Abstract

Process regression models, such as the Gaussian process regression (GPR) model, have been widely applied to analyze various kinds of functional data. This paper introduces a composite of two T-processes (CT), where the first captures the smooth global trend and the second models local details. Compared with the general T-process, the CT has an advantage in capturing local variability. Furthermore, a composite T-process regression (CTP) model is developed based on the composite T-process. It inherits many nice properties of GPR, while being more robust against outliers than GPR. Numerical studies, including a simulation and a real-data application, show that the CTP performs well in prediction.

References

  1. Álvarez, M., Luengo, D., Titsias, M., Lawrence, N.D.: Efficient multioutput Gaussian processes through variational inducing kernels. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 25–32 (2010)

  2. Archambeau, C., Bach, F.: Multiple Gaussian process models. arXiv:1110.5238 (2010)

  3. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)

  4. Ba, S., Joseph, V.R.: Composite Gaussian process models for emulating expensive functions. Ann. Appl. Stat. 6, 1838–1860 (2012)

  5. Boyd, S., Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  6. Dai, J., Krems, R.V.: Interpolation and extrapolation of global potential energy surfaces for polyatomic systems by Gaussian processes with composite kernels. J. Chem. Theory Comput. 16(3), 1386–1395 (2020)

  7. Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an application to computer modeling. J. Am. Stat. Assoc. 103(483), 1119–1130 (2008)

  8. Hampel, F.R.: Robust Statistics: The Approach Based on Influence Functions, vol. 196. Wiley, New York (1986)

  9. Higdon, D., Swall, J., Kern, J.: Non-stationary spatial modeling. Bayesian Stat. 6(1), 761–768 (1999)

  10. Jiao, J., Hengjian, C.: Parametric estimation based on robust scatter matrix in semiparametric regression model. J. Beijing Normal Univ. Nat. Sci. Ed. 42(3), 224 (2006)

  11. Kuß, M.: Gaussian process models for robust regression, classification, and reinforcement learning. PhD thesis, Technische Universität (2006)

  12. Liu, H., Cai, J., Ong, Y.S.: Remarks on multi-output Gaussian process regression. Knowl. Based Syst. 144, 102–121 (2018)

  13. Naish-Guzman, A., Holden, S.: Robust regression with twinned Gaussian processes. In: Advances in Neural Information Processing Systems, 20, 1065–1072 (2007)

  14. Neal, R.M.: Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. arXiv preprint physics (1997)

  15. Paciorek, C.J., Schervish, M.J.: Nonstationary covariance functions for Gaussian process regression. In: Advances in Neural Information Processing Systems, 16, 273–280 (2003)

  16. Paciorek, C.J., Schervish, M.J.: Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17(5), 483–506 (2006)

  17. Qin, G., Zhu, Z., Fung, W.K.: Robust estimation of covariance parameters in partial linear model for longitudinal data. J. Stat. Plan. Inference 139(2), 558–570 (2009)

  18. Ranjan, P., Haynes, R., Karsten, R.: A computationally stable approach to Gaussian process interpolation of deterministic computer simulation data. Technometrics 53(4), 366–378 (2011)

  19. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)

  20. Sampson, P.D., Guttorp, P.: Nonparametric estimation of nonstationary spatial covariance structure. J. Am. Stat. Assoc. 87(417), 108–119 (1992)

  21. Seeger, M.W., Kakade, S.M., Foster, D.P.: Information consistency of nonparametric Gaussian process methods. IEEE Trans. Inf. Theory 54, 2376–2382 (2008)

  22. Shah, A., Wilson, A.G., Ghahramani, Z.: Student-t processes as alternatives to Gaussian processes. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 877–885 (2014)

  23. Shi, J.Q., Choi, T.: Gaussian Process Regression Analysis for Functional Data. Chapman and Hall/CRC, London (2011)

  24. Vanhatalo, J., Jylänki, P., Vehtari, A.: Gaussian process regression with student-t likelihood. In: Advances in Neural Information Processing Systems, 22, 1910–1918 (2009)

  25. Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)

  26. Wang, X., Jiang, Y., Huang, M., Zhang, H.: Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 108, 632–643 (2013)

  27. Wang, Z., Shi, J.Q., Lee, Y.: Extended t-process regression models. J. Stat. Plan. Inference 189, 38–60 (2017)

  28. Wang, Z., Li, K., Shi, J.Q.: A robust estimation for the extended t-process regression model. Stat. Probab. Lett. 157, 108626 (2020)

  29. Wauthier, F.L., Jordan, M.I.: Heavy-tailed process priors for selective shrinkage. In: Advances in Neural Information Processing Systems, 23, 2406–2414 (2010)

  30. Williams, C.K., Rasmussen, C.E.: Gaussian Processes for Machine Learning, vol. 2. The MIT Press, Cambridge (2006)

  31. Xiong, Y., Chen, W., Apley, D., Ding, X.: A non-stationary covariance-based kriging method for metamodelling in engineering design. Int. J. Numer. Methods Eng. 71(6), 733–756 (2007)

  32. Xu, P., Lee, Y., Shi, J.Q., Eyre, J.: Automatic detection of significant areas for functional data with directional error control. Stat. Med. 38(3), 376–397 (2019)

  33. Xu, Z., Yan, F., Qi, Y.: Sparse matrix-variate t process blockmodels. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp. 543–548 (2011)

  34. Yu, S., Tresp, V., Yu, K.: Robust multi-task learning with t-processes. In: Proceedings of the 24th International Conference on Machine Learning, pp. 1103–1110 (2007)

  35. Zang, Q., Klette, R.: Evaluation of an adaptive composite Gaussian model in video surveillance. In: International Conference on Computer Analysis of Images and Patterns, Springer, pp. 165–172 (2003)

  36. Zhang, Y., Yeung, D.Y.: Multi-task learning using generalized t process. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 964–971 (2010)

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 11971457) and the Anhui Provincial Natural Science Foundation (Grant No. 1908085MA06).

Author information

Corresponding author

Correspondence to Yaohua Wu.

Appendix A: Robustness and Consistency

Proof of Theorem 3.1

The score function of the parameter \({\varvec{\beta }}\) from the CTP model is as follows:

$$\begin{aligned} s\left( {\varvec{\beta }} ; {{\varvec{y}}_{1}}, \ldots , {{\varvec{y}}_{m}}\right) =\frac{1}{2} \sum _{i=1}^{m} {\text {Tr}}\left( \left( c_{1 i} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}} ({{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}})^{\top }-{{\varvec{K}}_{in}^{-1}}\right) \frac{\partial {{\varvec{K}}_{in}}}{\partial {\varvec{\beta }}}\right) , \end{aligned}$$

where \(c_{1 i}=(n+2\nu ) /\left( 2 (\nu -1)+{{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}\right) \). The score function from the GPR has the same form with \(c_{1 i}=1\). When \(y_{i j} \rightarrow \infty \) for some j, it is easy to show that the score function s from the CTP is bounded, while that from the GPR is not.
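
The boundedness can be made explicit by a short calculation. Writing \(\lambda _{\max }\) for the largest eigenvalue of \({{\varvec{K}}_{in}^{-1}}\) and using \({{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-2}} {{\varvec{y}}_{i}} \le \lambda _{\max } {{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}\), the quadratic term in s satisfies

$$\begin{aligned} c_{1 i} {{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-2}} {{\varvec{y}}_{i}} \le \frac{(n+2 \nu ) \lambda _{\max } {{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}}{2(\nu -1)+{{\varvec{y}}_{i}^{\top }} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}}} \le (n+2 \nu ) \lambda _{\max }, \end{aligned}$$

so the rank-one term \(c_{1 i} {{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}} ({{\varvec{K}}_{in}^{-1}} {{\varvec{y}}_{i}})^{\top }\) stays bounded however large \({{\varvec{y}}_{i}}\) becomes, whereas with \(c_{1 i}=1\) (the GPR case) it grows quadratically in \({{\varvec{y}}_{i}}\).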

For a given parameter \(\nu \), following [8], the estimator \(\hat{{\varvec{\beta }}}\) of \({\varvec{\beta }}\) has the influence function

$$\begin{aligned} I F({\varvec{y}} ; \hat{{\varvec{\beta }}}, F)=-\left( E\left( \frac{\partial ^{2} l({\varvec{\beta }} ; \nu )}{\partial {\varvec{\beta }} \partial {{\varvec{\beta }}^{\top }}}\right) \right) ^{-1} s({\varvec{\beta }} ; {\varvec{y}}), \end{aligned}$$

where \({\varvec{y}}=\{{{\varvec{y}}_{1}},\ldots ,{{\varvec{y}}_{m}}\}\).

Note that the influence function is dominated by the score function \(s({\varvec{\beta }} ; {\varvec{y}})\), which indicates that the influence function from the CTP is bounded, while that from the GPR is unbounded. \(\square \)
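
As an informal numerical illustration of this contrast (not part of the proof), the following Python sketch compares the score contribution under the CTP weight \(c_{1i}\) with the GPR weight \(c_{1i}=1\) as a single observation is inflated; the squared-exponential kernel, the illustrative kernel derivative, and all parameter values are assumptions chosen for this sketch.

import numpy as np

# Assumed setup: squared-exponential kernel on a small grid.
n, nu = 20, 3.0
x = np.linspace(0, 1, n)
d2 = (x[:, None] - x[None, :])**2
K = np.exp(-0.5 * d2 / 0.1**2) + 1e-6 * np.eye(n)
dK = -d2 * K  # illustrative dK/d(theta) for K = exp(-theta * d2)

Kinv = np.linalg.inv(K)
rng = np.random.default_rng(0)
y0 = rng.standard_normal(n)

for scale in [1, 10, 100, 1000]:
    y = y0.copy()
    y[n // 2] += scale                                # grow one outlier
    a = Kinv @ y
    c1 = (n + 2 * nu) / (2 * (nu - 1) + y @ a)        # CTP weight c_{1i}
    s_ctp = 0.5 * np.trace((c1 * np.outer(a, a) - Kinv) @ dK)
    s_gpr = 0.5 * np.trace((np.outer(a, a) - Kinv) @ dK)  # GPR: c_{1i} = 1
    print(f"outlier={scale:5d}  |s_CTP|={abs(s_ctp):.2f}  |s_GPR|={abs(s_gpr):.1f}")

The CTP column settles to a finite value while the GPR column grows roughly like the squared size of the outlier, matching the boundedness argument above.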

Before presenting the proof of Theorem 3.2, we first introduce Lemma A.1.

Lemma A.1

Suppose \({{\varvec{y}}_{i}}=\left\{ y_{i 1}, \ldots , y_{i n}\right\} \) are generated from the model with mean function \(h_{i}({\varvec{x}})=0\), and suppose the covariance kernel functions \(\tau _{i}^{2} g_{i}\) and \(l_i\) are bounded and continuous in the parameters \({{\varvec{\theta }}_{i}}\) and \({{\varvec{\alpha }}_{i}}\), respectively. Then, for a positive constant c and any \(\varepsilon >0\), when n is large enough, we have

$$\begin{aligned}&\frac{1}{n}\left( -\log p_{\phi _{0}, {\hat{\eta }}_{i}}\left( {{\varvec{y}}_{i}} | {{\varvec{X}}_{i}}\right) +\log p_{\phi _{0}}\left( {{\varvec{y}}_{i}} | f_{0 i}, {{\varvec{X}}_{i}}\right) \right) \le \frac{1}{n}\left\{ \frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| \right. \\&\left. \quad +\frac{q_{i}^{2}+2(\nu -1)}{2(n+2 \nu -2)}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) +c\right\} +\varepsilon , \end{aligned}$$

where \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma _{i}^{2} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau _{i}^{2} {{\varvec{G}}_{in}}\), \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top } \left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}\), and \(\left\| f_{0 i}\right\| _{k}\) is the reproducing kernel Hilbert space norm of \(f_{0 i}\), \(\hat{{\varvec{\eta }}}_{i}=(\hat{{\varvec{\theta }}}_{i},\hat{{\varvec{\alpha }}}_{i})\).

Proof

Let

$$\begin{aligned} p_{G}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right)&=\int _{{\mathcal {F}}} p_{\phi _{0}}\left( {{\varvec{y}}_{i}} |{\tilde{f}}, r_{i}, {{\varvec{X}}_{i}}\right) \mathrm{d} {\tilde{p}}_{\eta _{i}}({\tilde{f}}), \\ p_{0}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right)&=p_{\phi _{0}}\left( {{\varvec{y}}_{i}} | f_{0 i}, r_{i}, {{\varvec{X}}_{i}}\right) , \end{aligned}$$

where \({\tilde{f}}_{i}=f_{i,\text {global}} | r_{i} \sim G P\left( 0, r_{i} \tau _{i}^{2}g_{i}\right) \).

Since \(r_{i}\) is independent of covariates \({{\varvec{X}}_{i}}\), we have

$$\begin{aligned} p_{\phi _{0}, {\hat{\eta }}_{i}}\left( {{\varvec{y}}_{i}} | {{\varvec{X}}_{i}}\right)&=\int p_{G}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) g(r) \mathrm{d} r ,\\ p_{\phi _{0}}\left( {{\varvec{y}}_{i}} | f_{0 i}, {{\varvec{X}}_{i}}\right)&=\int p_{0}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) g(r) \mathrm{d} r. \end{aligned}$$
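
For intuition, the scale-mixture representation above can be simulated directly. The following minimal sketch assumes a squared-exponential kernel for \(g_{i}\) and, as the inverse-gamma form of \(g^{*}(r)\) later in the proof suggests, a prior \(r \sim I G(\nu , \nu -1)\); all numerical values are illustrative assumptions.

import numpy as np

# Minimal sketch: a t-process path as a scale mixture of GP paths.
rng = np.random.default_rng(1)
n, nu, tau2 = 100, 3.0, 1.0
x = np.linspace(0, 1, n)
G = tau2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.2**2)  # assumed kernel g_i

# r ~ IG(nu, nu - 1): draw as the reciprocal of a Gamma(nu, rate = nu - 1) variate.
r = 1.0 / rng.gamma(shape=nu, scale=1.0 / (nu - 1.0))

# Given r, f | r ~ GP(0, r * tau2 * g_i); marginally f is a t-process path.
L = np.linalg.cholesky(r * G + 1e-10 * np.eye(n))
f = L @ rng.standard_normal(n)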

The proof of Lemma A.1 is similar to that of Theorem 1 in [21]; here we present the details for our model. By the Fenchel–Legendre duality relationship, we have

$$\begin{aligned} -\log p_{G}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) \le E_{Q}\left( -\log p\left( {{\varvec{y}}_{i}} | {\tilde{f}}_{i}, r_{i}\right) \right) +D[Q, P], \end{aligned}$$
(5.1)

where P is the zero-mean GP prior given by \(G P\left( 0, r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in}\right) \), and Q is the posterior distribution of \({\tilde{f}}_{i}\) from a GP model with prior \(G P\left( 0, r_{i}\tau _{i}^{2}{{\varvec{G}}_{in}}\right) \). More details can be found in [5, 19].

For a given \(r_{i}\), \(f_{0i}\) can be written as \(f_{0 i} \doteq r_{i} \tau _{i}^{2}\hat{{\varvec{G}}}_{in} {\varvec{\gamma }}\) in the reproducing kernel Hilbert space (RKHS) [3, 25], where \({\varvec{\gamma }}=\left( \gamma _{1}, \ldots , \gamma _{n}\right) ^{\top }\).

To obtain the distribution of Q, consider

$$\begin{aligned} \left( \begin{array}{l} f_{i,\text{ global }} \\ {{\varvec{y}}_{i}} \end{array}\right) | {{\varvec{X}}_{i}} \sim N\left( \left( \begin{array}{l} 0 \\ 0 \end{array}\right) , \left( \begin{array}{ll} r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}}&{} r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}} \\ r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}} &{} r_{i} (\tau ^{2}_{i} {{\varvec{G}}_{in}}+ \sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}} ) \end{array}\right) \right) . \end{aligned}$$

So, we have \(f_{i,\text{ global }} | {\mathcal {D}}_{n}\sim N ({{\varvec{\mu }}_{i n}}, {{\varvec{C}}_{i n}})\), with

$$\begin{aligned} {{\varvec{\mu }}_{i n}}&=\tau _{i}^{2} {{\varvec{G}}_{in}}(\tau ^{2}_{i} {{\varvec{G}}_{in}}+ \sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}} )^{-1} {{\varvec{y}}_{i}},\\ {{\varvec{C}}_{i n}}&=r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}}-\tau ^{2}_{i} {{\varvec{G}}_{in}}(\tau ^{2}_{i} {{\varvec{G}}_{in}}+ \sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}} )^{-1}(r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}})\\&=r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}}({{\varvec{I}}_{n}}-\phi _{0}^{-1}(\phi _{0}^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}}+ \phi _{0}^{-1}\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}+{{\varvec{I}}_{n}})^{-1}(\tau ^{2}_{i} {{\varvec{G}}_{in}}))\\&=r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}}+\phi _{0}^{-1}\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})\\&=r_{i} \tau ^{2}_{i} {{\varvec{G}}_{in}}({{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}})^{-1}. \end{aligned}$$

We let \({{\varvec{H}}_{i n}}=\tau ^{2}_{i} {{\varvec{G}}_{in}}\), \(\hat{{\varvec{H}}}_{i n}=\tau ^{2}_{i} \hat{{\varvec{G}}}_{in}\) and \({{\varvec{B}}_{i}}={{\varvec{I}}_{n}}+\phi _{0}^{-1}({{\varvec{I}}_{n}}+\phi _{0}^{-1}\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}})^{-1}\tau ^{2}_{i} {{\varvec{G}}_{in}}\). Hence, the mean of Q is \({\varvec{m}}={{\varvec{H}}_{in}}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}} {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) ^{-1}\hat{{\varvec{y}}}_{i}=r_{i}{{\varvec{H}}_{i n}}{\varvec{\gamma }}\), and the covariance of Q is \(M=r_{i} {{\varvec{H}}_{i n}}{{\varvec{B}}_{i}^{-1}}\), where \(\hat{{\varvec{y}}}_{i}=r_{i}\left( {{\varvec{H}}_{i n}}+\sigma ^{2}_{i} {{\varvec{\varSigma }}_{i n}^{1 / 2}}\right. \left. {{\varvec{L}}_{i n}} {{\varvec{\varSigma }}_{i n}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}}\right) {\varvec{\gamma }}\).
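
The last rearrangement of \({{\varvec{C}}_{i n}}\), which produces \({{\varvec{B}}_{i}}\), can be checked numerically. The sketch below verifies the matrix identity with arbitrary symmetric positive-definite stand-ins for \({{\varvec{H}}_{i n}}\) and the noise part \(\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}\); the dimensions and constants are arbitrary.

import numpy as np

# Check: r*H - H (H + S + phi*I)^{-1} (r*H)  ==  r*H B^{-1},
# where B = I + phi^{-1} (I + phi^{-1} S)^{-1} H.
rng = np.random.default_rng(2)
n, r, phi = 8, 1.7, 0.5
A1 = rng.standard_normal((n, n)); H = A1 @ A1.T + np.eye(n)  # stand-in for H_in
A2 = rng.standard_normal((n, n)); S = A2 @ A2.T + np.eye(n)  # stand-in for the noise part
I = np.eye(n)

C_direct = r * H - H @ np.linalg.solve(H + S + phi * I, r * H)
B = I + (1.0 / phi) * np.linalg.solve(I + S / phi, H)
C_via_B = r * H @ np.linalg.inv(B)

print(np.allclose(C_direct, C_via_B))  # True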

From the definition of the Kullback–Leibler divergence, we have

$$\begin{aligned} \begin{aligned}&D[Q, P]\\&\quad =\int \left( \log p_{Q}-\log p_{P}\right) \mathrm{d} p_{Q} \\&\quad =\int \log \frac{\left| {{\varvec{H}}_{in}} {{\varvec{B}}_{i}^{-1}}\right| ^{-\frac{1}{2}}}{\left| \hat{{\varvec{H}}}_{in}\right| ^{-\frac{1}{2}}}-\frac{1}{2}({\varvec{x}}-{\varvec{m}})^{\top }r_{i}^{-1}{{\varvec{B}}_{i}}{{\varvec{H}}_{i n}^{-1}}({\varvec{x}}-{\varvec{m}})+\frac{1}{2}{{\varvec{x}}^{\top }}r_{i}^{-1}\hat{{\varvec{H}}}_{in}^{-1}{\varvec{x}}\mathrm{d} p_{Q}\\&\quad =\frac{1}{2}\left\{ -\log \left| \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{in}}\right| +\log |{{\varvec{B}}_{i}}|+{\text {Tr}}\left( r_{i}^{-1}\hat{{\varvec{H}}}_{in}^{-1} r_{i}{{\varvec{H}}_{i n}} {{\varvec{B}}_{i}^{-1}}\right) \right. \\&\left. \qquad +{{\varvec{m}}^{\top }}r_{i}^{-1}{{\varvec{B}}_{i}}{{\varvec{H}}_{in}^{-1}}{\varvec{m}}-n\right\} \\&\quad =\frac{1}{2}\left\{ -\log \left| \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}\right| +\log |{{\varvec{B}}_{i}}|+{\text {Tr}}\left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}} {{\varvec{B}}_{i}^{-1}}\right) +r_{i} {{\varvec{\gamma }}^{\top }} {{\varvec{H}}_{i n}}\hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}} {\varvec{\gamma }}-n\right\} \\&\quad =\frac{1}{2}\{-\log \left| \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}\right| +\log |{{\varvec{B}}_{i}}|+{\text {Tr}}\left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}} {{\varvec{B}}_{i}^{-1}}\right) +r_{i}\left\| f_{0 i}\right\| _{k}^{2}\\&\qquad +r_{i} {{\varvec{\gamma }}^{\top }} {{\varvec{H}}_{i n}}\left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}}\right) {\varvec{\gamma }}-n\}. \end{aligned} \end{aligned}$$
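
As an informal check (not part of the original argument), the simplified expression for D[Q, P] can be compared numerically with the generic Gaussian Kullback–Leibler formula; the stand-in matrices and constants below are arbitrary assumptions, with \({{\varvec{B}}_{i}}\) built from the stand-ins as above.

import numpy as np

# Compare the generic Gaussian KL with the simplified form derived above.
rng = np.random.default_rng(3)
n, r, phi = 6, 1.3, 0.5
A = rng.standard_normal((n, n)); H = A @ A.T + np.eye(n)     # stand-in for H_in
Ah = rng.standard_normal((n, n)); Hh = Ah @ Ah.T + np.eye(n) # stand-in for \hat H_in
As = rng.standard_normal((n, n)); S = As @ As.T + np.eye(n)  # stand-in for the noise part
gam = rng.standard_normal(n)                                 # stand-in for gamma
I = np.eye(n)

B = I + (1.0 / phi) * np.linalg.solve(I + S / phi, H)
m, M = r * H @ gam, r * H @ np.linalg.inv(B)                 # mean and covariance of Q
SP = r * Hh                                                  # covariance of P
Hhinv = np.linalg.inv(Hh)

generic = 0.5 * (np.log(np.linalg.det(SP) / np.linalg.det(M))
                 + np.trace(np.linalg.inv(SP) @ M) + m @ np.linalg.inv(SP) @ m - n)
simplified = 0.5 * (-np.log(np.linalg.det(Hhinv @ H)) + np.log(np.linalg.det(B))
                    + np.trace(Hhinv @ H @ np.linalg.inv(B))
                    + r * gam @ H @ Hhinv @ H @ gam - n)
print(np.isclose(generic, simplified))  # True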

By expanding \(-\log p\left( {{\varvec{y}}_{i}} | {\tilde{f}}_{i}, r_{i}\right) \) to second order, we have

$$\begin{aligned}&E_{Q}\left( -\log p\left( {{\varvec{y}}_{i}} | {\tilde{f}}_{i}, r_{i}\right) \right) \\&\quad \le -\log p\left( {{\varvec{y}}_{i}} | f_{0 i}, r_{i}\right) +\frac{1}{2} {\text {Tr}}\left( (\sigma ^{2}_{i} {{\varvec{\varSigma }}_{in}^{1 / 2}} {{\varvec{L}}_{in}} {{\varvec{\varSigma }}_{in}^{1 / 2}}+\phi _{0} {{\varvec{I}}_{n}} )^{-1}{{\varvec{H}}_{i n}} {{\varvec{B}}_{i}^{-1}}\right) \\&\quad =-\log p_{0}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) +\frac{1}{2} {\text {Tr}}\left( ({{\varvec{B}}_{i}}-{{\varvec{I}}_{n}}){{\varvec{B}}_{i}^{-1}}\right) \\&\quad =-\log p_{0}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) -\frac{1}{2} {\text {Tr}}\left( {{\varvec{B}}_{i}^{-1}}\right) +\frac{n}{2}. \end{aligned}$$

It follows that

$$\begin{aligned} \begin{aligned}&-\log p_{G}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) +\log p_{0}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) \\&\quad \le \frac{1}{2}\{-\log \left| \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}\right| +\log |{{\varvec{B}}_{i}}|+{\text {Tr}}\left( \left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}}\right) {{\varvec{B}}_{i}^{-1}}\right) +r_{i}\left\| f_{0 i}\right\| _{k}^{2}\\&\qquad +r_{i} {{\varvec{\gamma }}^{\top }} {{\varvec{H}}_{i n}}(\hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}}) {\varvec{\gamma }}\}. \end{aligned} \end{aligned}$$
(5.2)

Since the covariance function \({{\varvec{H}}_{i n}}\) is bounded and continuous in \({{\varvec{\eta }}_{i}}\) and \(\hat{{\varvec{\eta }}}_{i} \rightarrow {{\varvec{\eta }}_{i}}\), we have \(\hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}} \rightarrow 0\) as \(n \rightarrow \infty \). Hence, for sufficiently large n, there exist positive constants c and \(\varepsilon \) such that

$$\begin{aligned} \begin{array}{l} -\log \left| \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}\right|<c, \quad {{\varvec{\gamma }}^{\top }} {{\varvec{H}}_{in}}\left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}}-{{\varvec{I}}_{n}}\right) {\varvec{\gamma }}<c , \\ \quad {\text {Tr}}\left( \hat{{\varvec{H}}}_{in}^{-1} {{\varvec{H}}_{i n}} {{\varvec{B}}_{i}^{-1}}\right) <{\text {Tr}}\left( \left( {{\varvec{I}}_{n}}+\varepsilon {{\varvec{H}}_{in}}\right) {{\varvec{B}}_{i}^{-1}}\right) . \end{array} \end{aligned}$$
(5.3)

Plugging (5.3) into (5.2), we have

$$\begin{aligned} -\log p_{G}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) +\log p_{0}\left( {{\varvec{y}}_{i}} | r_{i}, {{\varvec{X}}_{i}}\right) \le \frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| +\frac{r_{i}}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) +c+n \varepsilon . \end{aligned}$$

Then, we have

$$\begin{aligned}&-\log \int p_{G}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) g(r) \mathrm{d} r \\&\quad \le \frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| +c+n \varepsilon -\log \int p_{0}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) \exp \left\{ -\left( \frac{r}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) \right) \right\} g(r) \mathrm{d} r. \end{aligned}$$

From Lemma 2 of [27], we obtain the following equality:

$$\begin{aligned}&\int p_{0}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) \exp \left\{ -\left( \frac{r}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) \right) \right\} g(r) \mathrm{d} r\\&\quad =\int p_{0}\left( {{\varvec{y}}_{i}} | r, {{\varvec{X}}_{i}}\right) g(r) \mathrm{d} r \int \exp \left\{ -\left( \frac{r}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) \right) \right\} g^{*}(r) \mathrm{d} r, \end{aligned}$$

where \(g^{*}(r)\) is the density function of \(I G\left( \nu +n / 2,(\nu -1)+q_{i}^{2} / 2\right) \).

Therefore, we have

$$\begin{aligned}&-\log p_{\phi _{0}, {\hat{\eta }}_{i}}\left( {{\varvec{y}}_{i}} | {{\varvec{X}}_{i}}\right) +\log p_{\phi _{0}}\left( {{\varvec{y}}_{i}} | f_{0 i}, {{\varvec{X}}_{i}}\right) \\&\quad \le \frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| +c+n \varepsilon -\log \int \exp \left\{ -\left( \frac{r}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) \right) \right\} g^{*}(r) \mathrm{d} r \\&\quad \le \frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| +c+n \varepsilon +\frac{\left\| f_{0 i}\right\| _{k}^{2}+c}{2} \int r g^{*}(r) \mathrm{d} r \\&\quad =\frac{1}{2} \log \left| {{\varvec{B}}_{i}}\right| +\frac{q_{i}^{2}+2(\nu -1)}{2(n+2 \nu -2)}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) +c+n \varepsilon , \end{aligned}$$

which shows that this lemma holds. \(\square \)
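
For completeness, the last two steps in the display above can be unpacked as follows. The second inequality is Jensen's inequality applied to the convex function \(-\log \),

$$\begin{aligned} -\log \int \exp \left\{ -\frac{r}{2}\left( \left\| f_{0 i}\right\| _{k}^{2}+c\right) \right\} g^{*}(r) \mathrm{d} r \le \frac{\left\| f_{0 i}\right\| _{k}^{2}+c}{2} \int r g^{*}(r) \mathrm{d} r, \end{aligned}$$

and the final equality uses the mean of \(I G\left( \nu +n / 2,(\nu -1)+q_{i}^{2} / 2\right) \),

$$\begin{aligned} \int r g^{*}(r) \mathrm{d} r=\frac{(\nu -1)+q_{i}^{2} / 2}{\nu +n / 2-1}=\frac{q_{i}^{2}+2(\nu -1)}{n+2 \nu -2}. \end{aligned}$$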

To prove Theorem 3.2, we need the following additional condition:

(A) \(\left\| f_{0 i}\right\| _{k}\) is bounded and \(E_{X_{i}}\left( \log \left| {{\varvec{B}}_{i}}\right| \right) =o(n)\).

Proof of Theorem 3.2

Since \(q_{i}^{2}=\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) ^{\top }\left( {{\varvec{y}}_{i}}-f_{0 i}\left( {{\varvec{X}}_{i}}\right) \right) / \phi _{0}=O(n)\), condition (A) and Lemma A.1 yield

$$\begin{aligned} \frac{1}{n} E_{X_{i}}\left( D\left[ p_{\phi _{0}}\left( {{\varvec{y}}_{i}} | f_{0 i}, {{\varvec{X}}_{i}}\right) , p_{\phi _{0}, {\hat{\eta }}_{i}}\left( {{\varvec{y}}_{i}} | {{\varvec{X}}_{i}}\right) \right] \right) \rightarrow 0, \quad \text{ as } n \rightarrow \infty . \end{aligned}$$

Hence, Theorem 3.2 holds. \(\square \)

Cite this article

Wang, Z., Lv, Y. & Wu, Y. Composite T-Process Regression Models. Commun. Math. Stat. 11, 307–323 (2023). https://doi.org/10.1007/s40304-021-00249-4
