Abstract
PolieDRO is a novel analytics framework for classification and regression that harnesses the power and flexibility of data-driven distributionally robust optimization (DRO) to circumvent the need for regularization hyperparameters. Recent literature shows that traditional machine learning methods such as SVM and the (square-root) LASSO can be written as Wasserstein-based DRO problems. Inspired by those results, we propose a hyperparameter-free ambiguity set that exploits the polyhedral structure of data-driven convex hulls, generating computationally tractable regression and classification methods for any convex loss function. Numerical results based on 100 real-world databases and an extensive experiment with synthetically generated data show that our methods consistently outperform their traditional counterparts.
Data availability
The real-world experiments use publicly available data from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). The synthetic data were generated with a Julia script available upon request.
Code availability
The current version of the code is available upon request. An open-source Julia package, PolieDRO.jl, is under development; it will allow users to apply different loss functions to regression and classification problems using the PolieDRO framework.
Notes
Note that we enforce \(\underline{p}_0=\overline{p}_0=1\) to ensure that \(P\) is a probability measure.
Funding
Gutierrez's research was funded by FAPERJ grant 26/200.598/2021, a CAPES grant, and CNPq grant 141185/2019-8. Valladão's research was funded by FAPERJ grant E-26/201.287/2021 and CNPq grant 309456/2020-7.
Author information
Contributions
Valladão devised the project and the main conceptual ideas. Gutierrez, Valladão, and Pagnoncelli worked out the technical details and proofs. Valladão and Pagnoncelli conceived and planned the experiments, while Gutierrez performed the implementation and ran the algorithms for all the numerical results in the paper. Gutierrez analyzed the results and organized them in the paper, and Gutierrez, Valladão, and Pagnoncelli wrote the manuscript.
Ethics declarations
Conflict of interest
Not applicable.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix 1: Problem reformulation
Consider the problem formulation in (4). In its current form the problem is not directly solvable, so we write its Lagrangian formulation:
Reorganizing the terms, we can then write:
The Lagrange dual function \(g(\lambda , \kappa )\) can then be written as:
Notice that the term within parentheses can be analyzed as:
Since we are only interested in the finite cost case, the dual problem \(\displaystyle \min _{\lambda \ge 0,\kappa \ge 0} g(\lambda ,\kappa )\) is given by:
Although writing the dual version of the problem removed the integrals, the problem is still not tractable, as the first constraint must hold at an infinite number of points.
Proposition 2
Let \({\mathcal {R}}(w)\) be a constraint that is valid \(\forall w \in {\mathcal {C}}_0\), and let \(\bigcup _{i \in {\mathcal {F}}} \overline{{\mathcal {C}}}_i\) be a partition of \({\mathcal {C}}_0\). Then, the following are equivalent:
Based on Proposition 2, we can rewrite the constraint:
as
Proposition 3
Consider the sets \({\mathcal {F}} = \{0, 1,\ldots , I\}\), \(\{{\mathcal {V}}_i\}_{i \in {\mathcal {F}}}\), and \(\{{\mathcal {C}}_i\}_{i \in {\mathcal {F}}}\) obtained from a procedure as defined in Algorithm 1. In addition, let \(w' \in {\mathcal {C}}_0\) and let \(\bigcup _{i \in {\mathcal {F}}} \overline{{\mathcal {C}}}_i\) be a partition of \({\mathcal {C}}_0\), so that \(w' \in \overline{{\mathcal {C}}}_i\) for some \(i \in {\mathcal {F}}\). We define the index set of all supersets (antecedents of \({\mathcal {C}}_i\)) by \({\mathcal {A}}(i)= \{i\} \cup \{ i' \in {\mathcal {F}}: {\mathcal {C}}_i \subsetneq {\mathcal {C}}_{i'}\}\) and the index set of all remaining hulls (descendants of \({\mathcal {C}}_i\)) by \({\mathcal {D}}(i)= {\mathcal {F}} {\setminus } {\mathcal {A}}(i)\).
Thus we can write:
1. \(w' \in {\mathcal {C}}_j\) for all \(j \in {\mathcal {A}}(i)\), and \(w' \notin {\mathcal {C}}_j\) for all \(j \in {\mathcal {D}}(i)\);
2. \(\displaystyle \sum _{i' \in {\mathcal {F}}} {\mathbb {I}}_{\{w' \in {\mathcal {C}}_{i'}\}} = \sum _{i' \in {\mathcal {A}}(i)} 1 = |{\mathcal {A}}(i)|\).
Based on Proposition 3, we can rewrite the constraint:
as
Since the constraint above is valid \(\forall w \in \overline{{\mathcal {C}}}_i, \forall i \in {\mathcal {F}}\), we can write:
Concluding the proof \(\blacksquare\).
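The nesting identity behind Proposition 3 can be checked numerically. The sketch below is illustrative only: it is written in Python with SciPy (the authors' implementation is in Julia), and it assumes that Algorithm 1 produces nested convex hulls by repeatedly peeling off the vertices of the current hull. For hulls obtained this way, the membership indicators of any point form a non-increasing sequence, so their sum equals \(|{\mathcal {A}}(i)|\):

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

rng = np.random.default_rng(0)
pts = rng.normal(size=(300, 2))

# Peel nested convex hulls C_0 ⊇ C_1 ⊇ ...: each hull is built on the
# points that remain after removing the previous hull's vertices.
hulls, remaining = [], pts
while len(remaining) > 8:
    hull = ConvexHull(remaining)
    hulls.append(Delaunay(remaining))  # Delaunay provides a point-in-hull test
    remaining = np.delete(remaining, hull.vertices, axis=0)

def membership_flags(w):
    """Indicators of w ∈ C_i, outermost hull first."""
    return [int(tri.find_simplex(w) >= 0) for tri in hulls]

# Because the hulls are nested, a point inside C_i lies in every antecedent
# hull, so the indicators are non-increasing and their sum equals |A(i)|.
for w in rng.normal(size=(50, 2)):
    flags = membership_flags(w)
    assert flags == sorted(flags, reverse=True)
```

The `membership_flags` helper is hypothetical and only serves to make the indicator sum of Proposition 3 concrete.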
Appendix 2: The estimation of empirical probabilities
During the process of estimating the data-driven ambiguity set, we must obtain two results: (i) the polyhedral convex sets and (ii) their associated probability coverage intervals. In Sect. 2.2, we describe the data-driven procedure and present the approximation used to estimate the intervals. In this section, we present results that support the use of this approximation methodology.
Consider the outcome of applying Algorithm 1 and the coverage intervals associated with each convex hull \({\mathcal {C}}_i\). Ideally, the probability interval \([\underline{p}_i, \hspace{1mm} \overline{p}_i]\) should include the true probability that a random point drawn from the original data distribution falls within the convex hull \({\mathcal {C}}_i\), given the specified significance level \(\alpha\). In this context, we refer to accuracy as the percentage of probability intervals \([\underline{p}_i, \hspace{1mm} \overline{p}_i]\) that include the true probability \(p_i\). Another property of interest is how well the outermost hull \({\mathcal {C}}_0\) approximates the true distribution support, which we refer to as coverage.
To assess these properties, we ran the following experiment, with a random variable X following a multivariate normal distribution as the data-generating process:
1. Let \(X \sim N(\mu , \Sigma )\), where \(\mu\) is the unit vector of dimension \(d = 3\) and \(\Sigma\) is the \(d \times d\) identity matrix;
2. Let N be the sample size used to construct the ambiguity set;
3. For a given N, we generate a sample and apply Algorithm 1, obtaining the convex hulls \(\{{\mathcal {C}}_i\}_{i = 0}^{{\mathcal {I}}}\) and the probability intervals \(\{[\underline{p}_i, \hspace{1mm} \overline{p}_i]\}_{i = 0}^{{\mathcal {I}}}\);
4. For each convex hull \({\mathcal {C}}_i\), we approximate the true probability coverage \(p_i\) under the data-generating distribution by drawing a very large sample of size \(S = 1{,}000{,}000\) and verify whether \(p_i \in [\underline{p}_i, \hspace{1mm} \overline{p}_i]\), where the interval is the one computed in step 3 from the sample of size N;
5. We calculate the experiment's accuracy (percentage of estimated intervals that contain the true probability) and coverage (percentage of the distribution support within \({\mathcal {C}}_0\)).
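The steps above can be sketched as follows. This is a minimal illustration in Python with SciPy (the authors' implementation is a Julia script); it assumes that Algorithm 1 peels nested convex hulls and that the intervals \([\underline{p}_i, \overline{p}_i]\) are normal-approximation binomial confidence intervals, as stand-ins for the actual constructions of Sect. 2.2 and Algorithm 1:

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay
from scipy.stats import norm

rng = np.random.default_rng(42)
d, N, S, alpha = 3, 500, 100_000, 0.10
z = norm.ppf(1 - alpha / 2)

# Steps 1-3: sample X ~ N(mu, I) and peel nested convex hulls from it.
sample = rng.normal(loc=1.0, size=(N, d))
hulls, remaining = [], sample
while len(remaining) > 4 * (d + 1):
    h = ConvexHull(remaining)
    hulls.append(Delaunay(remaining))  # Delaunay provides a point-in-hull test
    remaining = np.delete(remaining, h.vertices, axis=0)

# Step 4: approximate each hull's true coverage with a very large sample.
big = rng.normal(loc=1.0, size=(S, d))
hits = 0
for tri in hulls[1:]:  # p_0 is fixed to 1, so C_0 is assessed via coverage instead
    p_hat = np.mean(tri.find_simplex(sample) >= 0)  # empirical coverage
    half = z * np.sqrt(p_hat * (1 - p_hat) / N)     # assumed Wald-style interval
    p_true = np.mean(tri.find_simplex(big) >= 0)    # proxy for the true probability
    hits += int(p_hat - half <= p_true <= p_hat + half)

# Step 5: accuracy = share of intervals containing the true probability;
# coverage = share of the distribution's mass inside the outer hull C_0.
accuracy = hits / (len(hulls) - 1)
coverage = np.mean(hulls[0].find_simplex(big) >= 0)
```

Both the peeling loop and the interval formula are assumptions for illustration; the paper's exact constructions may differ.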
We vary the sample size N from 10 to 10,000 and repeat the experiment 5,000 times for each value, using a significance level of \(\alpha = 10\%\). The average accuracy is presented in Fig. 10 and the average coverage in Fig. 11.
One can observe that as the sample size grows, both accuracy and coverage converge to the values expected given the experiment's significance level. Naturally, the quality of the approximation improves with the sample size; however, even for relatively small samples the approximation is reasonable. Moreover, our empirical results validate the method, as the experiments in Sects. 4.2 and 4.3 show.
Appendix 3: The choice of significance level
To apply the PolieDRO framework to classification or regression models, one must choose the significance level \(\alpha\). This value can be interpreted as the flexibility of the probability coverage of each convex hull defining the space of distributions under consideration, that is, the ambiguity set. The idea is that the convex hulls capture structure that arises from the data, and \(\alpha\) controls how broad the set of distributions admitted by the ambiguity set is.
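To make the role of \(\alpha\) concrete, the sketch below shows how shrinking \(\alpha\) widens each hull's coverage interval and hence enlarges the ambiguity set. The `coverage_interval` helper is a hypothetical, normal-approximation stand-in for the interval construction of Sect. 2.2:

```python
import math

# Two-sided z-values for the typical significance levels used in the paper.
Z = {0.10: 1.645, 0.05: 1.960, 0.01: 2.576}

def coverage_interval(p_hat, n, alpha):
    """Assumed Wald-style interval around an empirical coverage p_hat
    estimated from n points, clipped to [0, 1]."""
    half = Z[alpha] * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# Smaller alpha -> wider interval -> more distributions in the ambiguity set.
widths = []
for alpha in (0.10, 0.05, 0.01):
    lo, hi = coverage_interval(0.8, 200, alpha)
    widths.append(hi - lo)
assert widths[0] < widths[1] < widths[2]
```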
These values should be treated as statistical significance parameters, with typical values such as \(\alpha = 10\%\), \(\alpha = 5\%\), or \(\alpha = 1\%\). We repeat the experiments for these values and report the results in Tables 11, 12, 13, 14 and 15 for the real-world data sets and in Tables 16, 17 and 18 for the synthetic data.
In Tables 11, 12, 13, 14 and 15, we have highlighted in bold the cases where the PolieDRO version outperformed its nominal counterpart for each value of \(\alpha\). We have summarized the results in Table 19.
For the synthetic datasets, we followed the same criteria as in Sect. 4.3. We identified the highest number of wins (W), ties (T), or losses (L) for each experiment in Tables 16, 17 and 18, and provided a summary of the results in Table 20.
Our results indicate that the choice of the statistical parameter \(\alpha\) has little impact on the study results. In most cases, it does not alter the performance of the PolieDRO models, and in the few cases where it does, the change is not substantial.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gutierrez, T., Valladão, D. & Pagnoncelli, B.K. PolieDRO: a novel classification and regression framework with non-parametric data-driven regularization. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06544-9