
Optimal Recovery from Inaccurate Data in Hilbert Spaces: Regularize, But What of the Parameter?


Abstract

In Optimal Recovery, the task of learning a function from observational data is tackled deterministically by adopting a worst-case perspective tied to an explicit model assumption made on the functions to be learned. Working in the framework of Hilbert spaces, this article considers a model assumption based on approximability. It also incorporates observational inaccuracies modeled via additive errors bounded in \(\ell _2\). Earlier works have demonstrated that regularization provides algorithms that are optimal in this situation, but did not fully identify the desired hyperparameter. This article fills the gap in both a local scenario and a global scenario. In the local scenario, which amounts to the determination of Chebyshev centers, the semidefinite recipe of Beck and Eldar (legitimately valid in the complex setting only) is complemented by a more direct approach, with the proviso that the observational functionals have orthonormal representers. In the said approach, the desired parameter is the solution to an equation that can be resolved via standard methods. In the global scenario, where linear algorithms rule, the parameter elusive in the works of Micchelli et al. is found as the byproduct of a semidefinite program. Additionally and quite surprisingly, in case of observational functionals with orthonormal representers, it is established that any regularization parameter is optimal.


Notes

  1. It is likely that the results are still valid in the infinite-dimensional case. However, it would then be unclear how semidefinite programs such as (8) and (9) are solved numerically, so the infinite-dimensional case is not given proper scrutiny in the article.

  2. MATLAB and Python files illustrating the findings of this article are located at https://github.com/foucart/COR.

  3. Intuitively, the solution to the program (10) written as the minimization of \(\Vert Rf-r\Vert ^2 + (\tau /(1-\tau ))\Vert Sf-s\Vert ^2\) becomes, as \(\tau \rightarrow 1\), the minimizer of \(\Vert Rf-r\Vert ^2\) subject to \(\Vert Sf-s\Vert ^2=0\). This explains the interpretation of \(f_1\). A similar argument explains the interpretation of \(f_0\).

References

  1. Beck, A., Eldar, Y.C.: Regularization in regression with bounded noise: a Chebyshev center approach. SIAM J. Matrix Anal. Appl. 29(2), 606–625 (2007)


  2. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Data assimilation in reduced modeling. SIAM/ASA J. Uncertain. Quantif. 5(1), 1–29 (2017)


  3. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)


  4. Chen, Z., Haykin, S.: On different facets of regularization theory. Neural Comput. 14(12), 2791–2846 (2002)


  5. Cohen, A., Dahmen, W., Mula, O., Nichols, J.: Nonlinear reduced models for state and parameter estimation. SIAM/ASA J. Uncertain. Quantif. 10(1), 227–267 (2022)


  6. DeVore, R., Petrova, G., Wojtaszczyk, P.: Data assimilation and sampling in Banach spaces. Calcolo 54(3), 963–1007 (2017)


  7. Diamond, S., Boyd, S.: CVXPY: a Python-embedded modeling language for convex optimization. J. Mach. Learn. Res. 17(83), 1–5 (2016)


  8. Ettehad, M., Foucart, S.: Instances of computational optimal recovery: dealing with observation errors. SIAM/ASA J. Uncertain. Quantif. 9(4), 1438–1456 (2021)


  9. Foucart, S.: Mathematical Pictures at a Data Science Exhibition. Cambridge University Press, Cambridge (2022)


  10. Foucart, S., Liao, C., Shahrampour, S., Wang, Y.: Learning from non-random data in Hilbert spaces: an optimal recovery perspective. Sampl. Theory Signal Process. Data Anal. 20, 1–19 (2022)


  11. Garkavi, A.L.: On the optimal net and best cross-section of a set in a normed space. Izvest. Rossiiskoi Akad. Nauk. Seriya Matemat. 26(1), 87–106 (1962)


  12. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1 (2014). http://cvxr.com/cvx

  13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, Berlin (2009)


  14. Maday, Y., Patera, A.T., Penn, J.D., Yano, M.: A parameterized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics. Int. J. Numer. Methods Eng. 102(5), 933–965 (2015)


  15. Melkman, A.A., Micchelli, C.A.: Optimal estimation of linear operators in Hilbert spaces from inaccurate data. SIAM J. Numer. Anal. 16(1), 87–105 (1979)


  16. Micchelli, C.A.: Optimal estimation of linear operators from inaccurate data: a second look. Numer. Algorithms 5(8), 375–390 (1993)


  17. Micchelli, C.A., Rivlin, T.J.: A survey of optimal recovery. In: Optimal Estimation in Approximation Theory, pp. 1–54. Springer, Berlin (1977)

  18. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems: Linear Information. European Mathematical Society, Zürich (2008)


  19. Plaskota, L.: Noisy Information and Computational Complexity. Cambridge University Press, Cambridge (1996)


  20. Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007)


  21. Polyak, B.T.: Convexity of quadratic transformations and its use in control and optimization. J. Optim. Theory Appl. 99(3), 553–583 (1998)


  22. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)



Author information

Corresponding author

Correspondence to Simon Foucart.

Additional information

Communicated by Albert Cohen.

Dedicated to Ron DeVore, a constant source of enlightenment and inspiration, to celebrate his 80th birthday.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

S. F. is supported by grants from the NSF (CCF-1934904, DMS-2053172) and from the ONR (N00014-20-1-2787).

Appendix

This additional section collects justifications for a few facts that were mentioned but not explained in the main text. These facts are: the uniqueness of a Chebyshev center for the model- and data-consistent set (see Sect. 1.3), the efficient computation of the solution to (7) when \(\Lambda \Lambda ^* = \mathrm {Id}_{\mathbb {R}^m}\) (see Sect. 2.2), the form of Newton's method when solving Eq. (29) (see p. 8), and the reason why the constraint in (41) always implies the constraint in (40) (see Sects. 4.1 and 4.2).

1.1 Uniqueness of the Chebyshev Center

Let \(\widehat{f_1},\widehat{f_2}\) be two Chebyshev centers, i.e., minimizers of \(\max \{ \Vert f-g\Vert : \Vert P_{\mathcal {V}^\perp } g \Vert \le \varepsilon , \Vert \Lambda g - y\Vert \le \eta \}\) and let \(\mu \) be the value of the minimum. Consider \(\overline{g} \in H\) such that \(\Vert (\widehat{f_1}+\widehat{f_2})/2 - \overline{g}\Vert = \max \{ \Vert (\widehat{f_1}+\widehat{f_2})/2 - g\Vert : \Vert P_{\mathcal {V}^\perp } g \Vert \le \varepsilon , \Vert \Lambda g - y\Vert \le \eta \}\). Then

$$\begin{aligned} \mu&\le \Vert (\widehat{f_1}+\widehat{f_2})/2 - \overline{g}\Vert \le \frac{1}{2} \Vert \widehat{f_1} - \overline{g}\Vert + \frac{1}{2} \Vert \widehat{f_2} - \overline{g}\Vert \\&\le \frac{1}{2} \max \{ \Vert \widehat{f_1}-g\Vert : \Vert P_{\mathcal {V}^\perp } g \Vert \le \varepsilon , \Vert \Lambda g - y\Vert \le \eta \}\\&\quad + \frac{1}{2} \max \{ \Vert \widehat{f_2}-g\Vert : \Vert P_{\mathcal {V}^\perp } g \Vert \le \varepsilon , \Vert \Lambda g - y\Vert \le \eta \}\\&= \frac{1}{2} \mu + \frac{1}{2} \mu = \mu . \end{aligned}$$

Thus, equality must hold all the way through. In particular, \(\Vert \widehat{f_1} - \overline{g}\Vert = \Vert \widehat{f_2} - \overline{g}\Vert = \mu \) and equality holds in the triangle inequality \(\Vert (\widehat{f_1}-\overline{g}) + (\widehat{f_2}-\overline{g})\Vert = \Vert \widehat{f_1}-\overline{g}\Vert + \Vert \widehat{f_2}-\overline{g}\Vert \). In a Hilbert space, the latter forces \(\widehat{f_1}-\overline{g}\) and \(\widehat{f_2}-\overline{g}\) to be nonnegative multiples of one another, and since they share the same norm, they must coincide. This implies that \(\widehat{f_1} - \overline{g} = \widehat{f_2} - \overline{g}\), i.e., that \(\widehat{f_1} = \widehat{f_2}\), as expected.

1.2 Computation of the Regularized Solution

Let \((v_1,\ldots ,v_n)\) be a basis for \(\mathcal {V}\) and let \(u_1,\ldots ,u_m\) denote the Riesz representers of the observation functionals \(\lambda _1,\ldots ,\lambda _m\), which form an orthonormal basis for \(\mathrm{{im}}(\Lambda ^*)\) under the assumption that \(\Lambda \Lambda ^* = \mathrm {Id}_{\mathbb {R}^m}\). With \(C \in \mathbb {R}^{m \times n}\) representing the cross-Gramian with entries \(\langle u_i,v_j \rangle = \lambda _i(v_j)\), the solution to the regularization program (7) is given, even when H is infinite dimensional, by

$$\begin{aligned} f_\tau = \tau \sum _{i=1}^m a_i u_i + \sum _{j=1}^n b_j v_j, \end{aligned}$$

where the coefficient vectors \(a \in \mathbb {R}^m\) and \(b \in \mathbb {R}^n\) are computed according to

$$\begin{aligned} b = \big ( C^\top C \big )^{-1} C^\top y \qquad \text{ and } \qquad a = y-Cb. \end{aligned}$$

This is fairly easy to see for \(\tau =0\) and it has been established in Foucart, Liao, Shahrampour, and Wang [10, Theorem 2] for \(\tau =1\), so the general result follows from Proposition 3. Alternatively, it can be obtained by replicating the steps from the proof of the case \(\tau = 1\) with minor changes.
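
For concreteness, here is a minimal numerical sketch of this computation, in the spirit of the repository mentioned in Note 2 but not taken from it. It assumes \(H = \mathbb {R}^d\), with the observation map stored as a matrix whose rows are the orthonormal representers; the toy data and the names L, V, C, f_tau are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy instance: H = R^d, observation map L with orthonormal rows
# (so L @ L.T = Id), and a matrix V whose columns form a basis of the space V.
rng = np.random.default_rng(0)
d, m, n = 10, 5, 3
L = np.linalg.qr(rng.standard_normal((d, m)))[0].T   # rows u_1, ..., u_m, orthonormal
V = rng.standard_normal((d, n))                      # columns v_1, ..., v_n
y = rng.standard_normal(m)                           # inaccurate observations

C = L @ V                                            # cross-Gramian, C[i, j] = <u_i, v_j>
b = np.linalg.solve(C.T @ C, C.T @ y)                # b = (C^T C)^{-1} C^T y
a = y - C @ b                                        # a = y - C b

def f_tau(tau):
    """Regularized solution f_tau = tau * sum_i a_i u_i + sum_j b_j v_j."""
    return tau * (L.T @ a) + V @ b

# Consistency check: for tau = 1 the solution interpolates the data (Lambda f_1 = y),
# while for tau = 0 it reduces to the element V @ b of the space V.
assert np.allclose(L @ f_tau(1.0), y)
```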

1.3 Newton's Method

Equation (29) takes the form \(F(\tau ) = 0\), where

$$\begin{aligned} F(\tau ) = \lambda _{\min }((1-\tau ) R + \tau S) - \frac{(1-\tau )^2 \varepsilon ^2 - \tau ^2 \eta ^2}{(1-\tau ) \varepsilon ^2 - \tau \eta ^2 + (1-\tau )\tau (1-2\tau ) \delta ^2}. \end{aligned}$$

Newton's method produces a sequence \((\tau _k)_{k \ge 0}\) converging to a solution via the recursion

$$\begin{aligned} \tau _{k+1} = \tau _k - \frac{F(\tau _k)}{F'(\tau _k)}, \qquad k \ge 0. \end{aligned}$$
(50)

In order to apply this method, we need to compute the derivative of F with respect to \(\tau \). Writing \(\lambda _{\min }\) for \(\lambda _{\min }((1-\tau )R+\tau S)\), the task essentially reduces to the computation of \(d \lambda _{\min }/d\tau \), which is performed via the argument below. Note that the argument is not rigorous, as we take for granted the differentiability of the eigenvalue \(\lambda _{\min }\) and of a normalized eigenvector h associated with it. However, nothing prevents us from applying the scheme (50) with the expression for \(d \lambda _{\min }/d\tau \) given in (51) below and agreeing that a solution has been found once the output \(\tau _K\) satisfies \(F(\tau _K) < \iota \) for some prescribed tolerance \(\iota >0\). Now, the argument starts from the identities

$$\begin{aligned} ((1-\tau )R+\tau S) h = \lambda _{\min } h \qquad \text{ and } \qquad \langle h,h \rangle =1, \end{aligned}$$

which we differentiate to obtain

$$\begin{aligned} (S-R)h + ((1-\tau )R+\tau S) \frac{\mathrm{{d}}h}{\mathrm{{d}}\tau } = \frac{\mathrm{{d}} \lambda _{\min }}{\mathrm{{d}} \tau } h + \lambda _{\min } \frac{\mathrm{{d}}h}{\mathrm{{d}} \tau } \qquad \text{ and } \qquad 2 \Big \langle h, \frac{\mathrm{{d}}h}{\mathrm{{d}}\tau } \Big \rangle = 0. \end{aligned}$$

By taking the inner product with h in the first identity and using the second identity, we derive

$$\begin{aligned} \langle (S-R)h , h \rangle = \frac{\mathrm{{d}} \lambda _{\min }}{\mathrm{{d}}\tau }, \qquad \text{ i.e., } \qquad \frac{\mathrm{{d}} \lambda _{\min }}{\mathrm{{d}}\tau } = \Vert S h\Vert ^2 - \Vert Rh\Vert ^2. \end{aligned}$$

According to Lemma 9, this expression can be transformed, after some work, into

$$\begin{aligned} \frac{\mathrm{{d}} \lambda _{\min }}{\mathrm{{d}}\tau } = \frac{1-2\tau }{\tau (1-\tau )} \, \frac{\lambda _{\min }(1-\lambda _{\min })}{1-2\lambda _{\min }}. \end{aligned}$$
(51)
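
The scheme (50), combined with (51), translates into a few lines of code. The sketch below is a minimal illustration, assuming that R and S are given as symmetric matrices and that \(\varepsilon \), \(\eta \), \(\delta \) are the scalars appearing in Eq. (29); the function name newton_tau, the starting value, and the stopping rule are hypothetical choices, not taken from the article's files.

```python
import numpy as np

def newton_tau(R, S, eps, eta, delta, tau0=0.5, tol=1e-10, max_iter=100):
    """Newton scheme (50) for F(tau) = 0, with d(lambda_min)/dtau taken from (51)."""
    tau = tau0
    for _ in range(max_iter):
        lam = np.linalg.eigvalsh((1 - tau) * R + tau * S)[0]   # smallest eigenvalue
        num = (1 - tau) ** 2 * eps ** 2 - tau ** 2 * eta ** 2
        den = ((1 - tau) * eps ** 2 - tau * eta ** 2
               + (1 - tau) * tau * (1 - 2 * tau) * delta ** 2)
        F = lam - num / den
        if abs(F) < tol:
            break
        # derivative of lambda_min from (51)
        dlam = (1 - 2 * tau) / (tau * (1 - tau)) * lam * (1 - lam) / (1 - 2 * lam)
        # derivative of the rational part of F by the quotient rule
        dnum = -2 * (1 - tau) * eps ** 2 - 2 * tau * eta ** 2
        dden = -eps ** 2 - eta ** 2 + (1 - 6 * tau + 6 * tau ** 2) * delta ** 2
        dF = dlam - (dnum * den - num * dden) / den ** 2
        tau = tau - F / dF
    return tau
```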

1.4 Relation Between Semidefinite Constraints

Suppose that the constraint in (41) holds for a regularization map \(\Delta _\tau \). In view of the expressions

$$\begin{aligned}&\Delta _\tau = \big ( (1-\tau ) P_{\mathcal {V}^\perp } + \tau \Lambda ^* \Lambda \big )^{-1} (\tau \Lambda ^*) \quad \text{ and } \\&\mathrm {Id}- \Delta _\tau \Lambda = \big ( (1-\tau ) P_{\mathcal {V}^\perp } + \tau \Lambda ^* \Lambda \big )^{-1} ((1-\tau ) P_{\mathcal {V}^\perp }), \end{aligned}$$

this constraint also reads

$$\begin{aligned} \begin{bmatrix} c P_{\mathcal {V}^\perp } & 0\\ 0 & d \, \mathrm {Id}_{\mathbb {R}^m} \end{bmatrix} \succeq \begin{bmatrix} (1-\tau ) P_{\mathcal {V}^\perp }\\ \tau \Lambda \end{bmatrix} \big ( (1-\tau ) P_{\mathcal {V}^\perp } + \tau \Lambda ^* \Lambda \big )^{-2} \begin{bmatrix} (1-\tau ) P_{\mathcal {V}^\perp } & \tau \Lambda ^* \end{bmatrix}. \end{aligned}$$

Multiplying on the left by \(\begin{bmatrix} P_{\mathcal {V}^\perp } & \Lambda ^* \end{bmatrix}\) and on the right by \(\begin{bmatrix} P_{\mathcal {V}^\perp } \\ \Lambda \end{bmatrix}\), and using that \(P_{\mathcal {V}^\perp }\) is an orthogonal projector (so that \(P_{\mathcal {V}^\perp }^2 = P_{\mathcal {V}^\perp }\)), yields

$$\begin{aligned}&c P_{\mathcal {V}^\perp } + d \Lambda ^* \Lambda \\&\quad \succeq ( (1-\tau ) P_{\mathcal {V}^\perp } + \tau \Lambda ^* \Lambda ) \big ( (1-\tau ) P_{\mathcal {V}^\perp }+ \tau \Lambda ^* \Lambda \big )^{-2} ( (1-\tau ) P_{\mathcal {V}^\perp } + \tau \Lambda ^* \Lambda ) = \mathrm {Id}. \end{aligned}$$

This is the constraint in (40).
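
As a numerical sanity check of this manipulation, the sketch below builds a toy instance (hypothetical dimensions and arbitrary values of c, d, \(\tau \)) and verifies the two conjugation identities used above, namely that the right-hand side collapses to the identity and the left-hand side to \(c P_{\mathcal {V}^\perp } + d \Lambda ^* \Lambda \). All variable names are illustrative assumptions.

```python
import numpy as np

# Toy instance: projector P onto V^perp, observation map Lam, scalars tau, c, d.
rng = np.random.default_rng(1)
d_dim, n, m = 8, 3, 5
V = np.linalg.qr(rng.standard_normal((d_dim, n)))[0]     # orthonormal basis of V
P = np.eye(d_dim) - V @ V.T                              # projector onto V^perp
Lam = rng.standard_normal((m, d_dim))                    # observation map Lambda
tau, c, d = 0.3, 2.0, 3.0

M = (1 - tau) * P + tau * Lam.T @ Lam                    # (1-tau)P + tau Lambda^* Lambda
A = np.linalg.inv(M) @ np.linalg.inv(M)                  # its inverse squared
B = np.vstack([(1 - tau) * P, tau * Lam])                # block column [(1-tau)P; tau Lambda]
rhs = B @ A @ B.T                                        # right-hand side of the block inequality
lhs = np.block([[c * P, np.zeros((d_dim, m))],
                [np.zeros((m, d_dim)), d * np.eye(m)]])  # block-diagonal left-hand side

T = np.hstack([P, Lam.T])                                # conjugation matrix [P | Lambda^*]
assert np.allclose(T @ rhs @ T.T, np.eye(d_dim))         # collapses to the identity
assert np.allclose(T @ lhs @ T.T, c * P + d * Lam.T @ Lam)  # collapses to c P + d Lambda^* Lambda
```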


Cite this article

Foucart, S., Liao, C. Optimal Recovery from Inaccurate Data in Hilbert Spaces: Regularize, But What of the Parameter?. Constr Approx 57, 489–520 (2023). https://doi.org/10.1007/s00365-022-09590-5
