Abstract
Unbiased assessment of the predictivity of models learned by supervised machine learning (ML) methods requires evaluating the learned function on a reserved test set (not used by the learning algorithm). The quality of the assessment depends, naturally, on the properties of the test set and on the error statistic used to estimate the prediction error. In this work we tackle both issues, proposing a new predictivity criterion that carefully weights the individual observed errors to obtain a global error estimate, and using incremental experimental design methods to “optimally” select the test points on which the criterion is computed. Several incremental constructions are studied, including greedy packing (coffee-house design), support points and kernel herding techniques. Our results show that the incremental and weighted versions of the latter two, based on Maximum Mean Discrepancy (MMD) concepts, yield superior performance. An industrial test case provided by the historical French electricity supplier (EDF) illustrates the practical relevance of the methodology and shows that it is an efficient alternative to expensive cross-validation techniques.
References
Baudin, M., Dutfoy, A., Iooss, B., Popelin, A.-P.: OpenTURNS: an industrial software for uncertainty quantification in simulation. In: Ghanem, R., Higdon, D., Owhadi, H. (eds) Handbook of Uncertainty Quantification, pp. 2001–2038. Springer (2017)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer (2004)
Borovicka, T., Jirina, M., Jr., Kordik, P., Jirina, M.: Selecting representative data sets. In: Karahoca, A. (ed) Advances in Data Mining, Knowledge Discovery and Applications, pp. 43–70. INTECH (2012)
Chen, W.Y., Barp, A., Briol, F.-X., Gorham, J., Girolami, M., Mackey, L., Oates, C.: Stein Point Markov Chain Monte Carlo. arXiv preprint. arXiv:1905.03673 (2019)
Chen, W.Y., Mackey, L., Gorham, J., Briol, F.-X., Oates, C.J.: Stein Points. Proc. ICML (2018). arXiv preprint arXiv:1803.10161v4
Chen, Y., Welling, M., Smola, A.: Super-samples from kernel herding. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pp. 109–116. AUAI Press (2010)
Chevalier, C., Bect, J., Ginsbourger, D., Picheny, V., Richet, Y., Vazquez, E.: Fast kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics 56, 455–465 (2014)
Crombecq, K., Laermans, E., Dhaene, T.: Efficient space-filling and non-collapsing sequential design strategies for simulation-based modelling. Eur. J. Oper. Res. 214, 683–696 (2011)
Da Veiga, S.: Global sensitivity analysis with dependence measures. J. Stat. Comput. Simul. 85, 1283–1305 (2015)
Da Veiga, S., Gamboa, F., Iooss, B., Prieur, C.: Basics and Trends in Sensitivity Analysis. Theory and Practice in R. SIAM (2021)
de Crécy, A., Bazin, P., Glaeser, H., Skorek, T., Joucla, J., Probst, P., Fujioka, K., Chung, B.D., Oh, D.Y., Kyncl, M., Pernica, R., Macek, J., Meca, R., Macian, R., D’Auria, F., Petruzzi, A., Batet, L., Perez, M., Reventos, F.: Uncertainty and sensitivity analysis of the LOFT L2–5 test: results of the BEMUSE programme. Nucl. Eng. Des. 238(12), 3561–3578 (2008)
Demay, C., Iooss, B., Le Gratiet, L., Marrel, A.: Model selection for Gaussian Process regression: an application with highlights on the model variance validation. Qual. Reliab. Eng. Int. J. 38, 1482–1500 (2022). https://doi.org/10.1002/qre.2973
Dubrule, O.: Cross validation of kriging in a unique neighborhood. J. Int. Assoc. Math. Geol. 15(6), 687–699 (1983)
ENIQ: Qualification of an AI/ML NDT system—Technical basis. NUGENIA, ENIQ Technical Report (2019)
Fang, K.-T., Li, R., Sudjianto, A.: Design and Modeling for Computer Experiments. Chapman & Hall/CRC (2006)
Geffraye, G., Antoni, O., Farvacque, M., Kadri, D., Lavialle, G., Rameau, B., Ruby, A.: CATHARE2 V2.5_2: a single version for various applications. Nucl. Eng. Des. 241, 4456–4463 (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)
Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings Algorithmic Learning Theory, pp. 63–77. Springer-Verlag (2005)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer (2009)
Hawkins, R., Paterson, C., Picardi, C., Jia, Y., Calinescu, R., Habli, I.: Guidance on the assurance of machine learning in autonomous systems (AMLAS). University of York, Assuring Autonomy International Programme (AAIP) (2021)
Iooss, B.: Sample selection from a given dataset to validate machine learning models. In Proceedings of 50th Meeting of the Italian Statistical Society (SIS2021), pp. 88–93. Pisa, Italy, June (2021)
Iooss, B., Boussouf, L., Feuillard, V., Marrel, A.: Numerical studies of the metamodel fitting and validation processes. Int. J. Adv. Syst. Measure. 3, 11–21 (2010)
Joseph, V.R., Vakayil, A.: SPlit: an optimal method for data splitting. Technometrics 64(2), 166–176 (2022)
Kennard, R.W., Stone, L.A.: Computer aided design of experiments. Technometrics 11, 137–148 (1969)
Kleijnen, J.P.C., Sargent, R.G.: A methodology for fitting and validating metamodels in simulation. Eur. J. Oper. Res. 120, 14–29 (2000)
Lemaire, M., Chateauneuf, A., Mitteau, J.-C.: Structural Reliability. Wiley (2009)
Li, W., Lu, L., Xie, X., Yang, M.: A novel extension algorithm for optimized Latin hypercube sampling. J. Stat. Comput. Simul. 87, 2549–2559 (2017)
Lorenzo, G., Zanocco, P., Giménez, M., Marquès, M., Iooss, B., Bolado-Lavin, R., Pierro, F., Galassi, G., D’Auria, F., Burgazzi, L.: Assessment of an isolation condenser of an integral reactor in view of uncertainties in engineering parameters. Sci. Technol. Nucl. Install. (2011). https://doi.org/10.1155/2011/827354
Mak, S., Joseph, V.R.: Support points. Ann. Stat. 46, 2562–2592 (2018)
Marrel, A., Chabridon, V.: Statistical developments for target and conditional sensitivity analysis: Application on safety studies for nuclear reactor. Reliab. Eng. Syst. Saf. 214, 107711 (2021)
Marrel, A., Iooss, B., Chabridon, V.: The ICSCREAM methodology: identification of penalizing configurations in computer experiments using screening and metamodel - Applications in thermal-hydraulics. Nucl. Sci. Eng. 196, 301–321 (2022). https://doi.org/10.1080/00295639.2021.1980362
Molnar, C.: Interpretable Machine Learning. github (2019)
Morris, M.D., Mitchell, T.J.: Exploratory designs for computational experiments. J. Stat. Planning Inference 43, 381–402 (1995)
Müller, W.G.: Collecting Spatial Data, 3rd edn. Springer (2007)
Nash, J., Sutcliffe, J.: River flow forecasting through conceptual models part I-A discussion of principles. J. Hydrol. 10(3), 282–290 (1970)
Nogales Gómez, A., Pronzato, L., Rendas, M.-J.: Incremental space-filling design based on coverings and spacings: improving upon low discrepancy sequences. J. Stat. Theory Pract. 15(4), 77 (2021)
Pronzato, L.: Performance analysis of greedy algorithms for minimising a maximum mean discrepancy. Statistics and Computing, to appear (2022), hal-03114891. arXiv:2101.07564
Pronzato, L., Müller, W.: Design of computer experiments: space filling and beyond. Stat. Comput. 22, 681–701 (2012)
Pronzato, L., Rendas, M.-J.: Validation design I: construction of validation designs via kernel herding. Preprint (2021), hal-03474805. arXiv:2112.05583
Pronzato, L., Zhigljavsky, A.A.: Bayesian quadrature and energy minimization for space-filling design. SIAM/ASA J. Uncertainty Quant. 8, 959–1011 (2020)
Qian, P.Z.G., Ai, M., Wu, C.F.J.: Construction of nested space-filling designs. Ann. Stat. 37, 3616–3643 (2009)
Qian, P.Z.G., Wu, C.F.J.: Sliced space filling designs. Biometrika 96, 945–956 (2009)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press (2006)
Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments. Springer (2003)
Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 41(5), 2263–2291 (2013)
Shang, B., Apley, D.W.: Fully-sequential space-filling design algorithms for computer experiments. J. Qual. Technol. 53(2), 173–196 (2021)
Sheikholeslami, R., Razavi, S.: Progressive Latin hypercube sampling: an efficient approach for robust sampling-based analysis of environmental models. Environ. Model. Softw. 93, 109–126 (2017)
Smith, R.C.: Uncertainty Quantification. SIAM (2014)
Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007)
Snee, R.D.: Validation of regression models: methods and examples. Technometrics 19, 415–428 (1977)
Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)
Székely, G.J., Rizzo, M.L.: Testing for equal distributions in high dimension. InterStat 5, 1–6 (2004)
Székely, G.J., Rizzo, M.L.: Energy statistics: a class of statistics based on distances. J. Stat. Planning Inference 143, 1249–1272 (2013)
Teymur, O., Gorham, J., Riabiz, M., Oates, C.J.: Optimal quantisation of probability measures using maximum mean discrepancy. In International Conference on Artificial Intelligence and Statistics, pp. 1027–1035 (2021). arXiv preprint arXiv:2010.07064v1
Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. Syst. 58(2), 109–130 (2001)
Xu, Y., Goodacre, R.: On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J. Anal. Testing 2, 249–262 (2018)
Acknowledgements
This work was supported by project INDEX (INcremental Design of EXperiments) ANR-18-CE91-0007 of the French National Research Agency (ANR). The authors are grateful to Guillaume Levillain and Thomas Bittar for their code development during their work at EDF. Thanks also to Sébastien Da Veiga for fruitful discussions.
Appendix
Appendix A: Maximum Mean Discrepancy
Let K be a positive definite kernel on \(\mathcal {X}\times \mathcal {X}\), defining a reproducing kernel Hilbert space (RKHS) \(\mathcal {H}_K\) of functions on \(\mathcal {X}\), with scalar product \(\langle f,g\rangle _{\mathcal {H}_K}\) and norm \(\Vert f\Vert _{\mathcal {H}_K}\); see, e.g., [2]. For any \(f\in \mathcal {H}_K\) and any probability measures \(\mu \) and \(\xi \) on \(\mathcal {X}\), we have
\[ \int _{\mathcal {X}} f(\textbf{x})\,\textrm{d}\xi (\textbf{x}) - \int _{\mathcal {X}} f(\textbf{x})\,\textrm{d}\mu (\textbf{x}) = \left\langle f,\, P_{K,\xi }-P_{K,\mu }\right\rangle _{\mathcal {H}_K}, \qquad (16) \]
where we have denoted \(K_\textbf{x}(\cdot )=K(\textbf{x},\cdot )\) and used the reproducing property \(f(\textbf{x})=\langle f,K_\textbf{x}\rangle _{\mathcal {H}_K}\) for all \(\textbf{x}\in \mathcal {X}\), and where, for any probability measure \(\nu \) on \(\mathcal {X}\) and \(\textbf{x}\in \mathcal {X}\),
\[ P_{K,\nu }(\textbf{x}) = \int _{\mathcal {X}} K(\textbf{x},\textbf{x}')\,\textrm{d}\nu (\textbf{x}') \]
is the potential of \(\nu \) at \(\textbf{x}\); \(P_{K,\nu }\in \mathcal {H}_K\), and it is called the kernel embedding of \(\nu \) in the ML literature. In some cases the potential can be expressed analytically (see Appendix B); otherwise it can be estimated by numerical quadrature (e.g., quasi-Monte Carlo). The Cauchy-Schwarz inequality applied to (16) gives
\[ \left| \int _{\mathcal {X}} f\,\textrm{d}\xi - \int _{\mathcal {X}} f\,\textrm{d}\mu \right| \le \Vert f\Vert _{\mathcal {H}_K}\, \Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}, \]
and therefore
\[ \sup _{\Vert f\Vert _{\mathcal {H}_K}\le 1} \left| \int _{\mathcal {X}} f\,\textrm{d}\xi - \int _{\mathcal {X}} f\,\textrm{d}\mu \right| = \Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}. \]
The Maximum Mean Discrepancy (MMD) between \(\xi \) and \(\mu \) (for the kernel K and set \(\mathcal {X}\)) is \(d_K(\xi ,\mu )=\Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}\). Direct calculation gives
\[ d_K^2(\xi ,\mu ) = \mathbb {E}_{\zeta ,\zeta '\sim \xi }\,K(\zeta ,\zeta ') - 2\,\mathbb {E}_{\zeta \sim \xi ,\,\zeta '\sim \mu }\,K(\zeta ,\zeta ') + \mathbb {E}_{\zeta ,\zeta '\sim \mu }\,K(\zeta ,\zeta '), \qquad (19) \]
where the random variables \(\zeta \) and \(\zeta '\) in (19) are independent, see [49]. When K is the energy distance kernel (10), one recovers the expression (11) for the corresponding MMD. One may refer to [51] for an illuminating exposition on MMD, kernel embedding, and conditions on K (the notion of characteristic kernel) that make \(d_K\) a metric on the space of probability measures on \(\mathcal {X}\). The distance and Matérn kernels considered in this paper are characteristic.
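For empirical measures, (19) reduces to finite sums of kernel evaluations, which is how MMD-based selection criteria can be computed in practice. The following is a minimal sketch (our own illustration, not code from the paper), using a Matérn 5/2 kernel with a hypothetical correlation length `theta`:

```python
import numpy as np

def matern52_gram(X, Y, theta=1.0):
    """Pairwise Matérn 5/2 kernel matrix between the rows of X and Y."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    u = np.sqrt(5.0) * d / theta
    return (1.0 + u + u**2 / 3.0) * np.exp(-u)

def mmd2(X, Y, theta=1.0):
    """Squared MMD d_K^2 between the empirical measures of samples X and Y,
    i.e. (19) with xi and mu replaced by uniform measures on the two samples."""
    return (matern52_gram(X, X, theta).mean()
            - 2.0 * matern52_gram(X, Y, theta).mean()
            + matern52_gram(Y, Y, theta).mean())

rng = np.random.default_rng(0)
Y = rng.uniform(size=(1000, 2))                  # reference sample from mu = U[0,1]^2
X = rng.uniform(size=(200, 2))                   # representative candidate test set
Z = 0.5 + 0.05 * rng.standard_normal((200, 2))   # clustered, non-representative set
print(mmd2(X, Y), mmd2(Z, Y))  # the representative set yields a much smaller MMD
```

A characteristic kernel makes this quantity a genuine metric on distributions, so a test set with small empirical MMD to the candidate set can be regarded as representative of it.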
Appendix B: Analytical Computation of Potentials for Matérn Kernels
Since, for tensor-product kernels, the potential is the product of the corresponding one-dimensional potentials, we only consider one-dimensional input spaces.
For \(\mu \) the uniform distribution on [0, 1] and K the Matérn kernel \(K_{5/2,\theta }\) with smoothness \(\nu =5/2\) and correlation length \(\theta \), see (15), we get
where
The expressions \(P_{K_{\nu ,\theta },\mu }(x)\) for \(\nu =1/2\) and \(\nu =3/2\) can be found in [40].
When \(\mu \) is the standard normal distribution \(\mathcal {N}(0,1)\), the potential \(P_{K_{5/2,\theta },\mathcal {N}(0,1)}\) is \( P_{K_{5/2,\theta },\mathcal {N}(0,1)}(x) = T_\theta (x) + T_\theta (-x), \) where
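The uniform-case potential can be re-derived by direct integration of the kernel. The sketch below is our own derivation (not the paper's equation, and the function names are ours), assuming the common parametrization \(K_{5/2,\theta }(x,x')=(1+u+u^2/3)\,e^{-u}\) with \(u=\sqrt{5}\,|x-x'|/\theta \); it cross-checks the closed form against brute-force quadrature:

```python
import numpy as np

SQRT5 = np.sqrt(5.0)

def matern52(d, theta):
    """Matérn 5/2 kernel as a function of the distance d (correlation length theta)."""
    u = SQRT5 * np.abs(d) / theta
    return (1.0 + u + u**2 / 3.0) * np.exp(-u)

def potential_uniform(x, theta):
    """P(x) = int_0^1 K_{5/2,theta}(x, u) du for mu = U[0,1], with 0 <= x <= 1.

    Obtained by direct integration: P(x) = F(x) + F(1 - x), where
    F(t) = (8/(3a)) (1 - exp(-a t)) - exp(-a t) (5t/3 + a t^2/3),  a = sqrt(5)/theta.
    """
    a = SQRT5 / theta
    def F(t):
        return (8.0 / (3.0 * a)) * (1.0 - np.exp(-a * t)) \
            - np.exp(-a * t) * (5.0 * t / 3.0 + a * t**2 / 3.0)
    return F(x) + F(1.0 - x)

# Cross-check against composite-trapezoid quadrature on a fine grid
theta = 0.3
u = np.linspace(0.0, 1.0, 200001)
h = u[1] - u[0]
for x in (0.0, 0.25, 0.7, 1.0):
    f = matern52(x - u, theta)
    numeric = np.sum((f[:-1] + f[1:]) * h / 2.0)
    assert abs(numeric - potential_uniform(x, theta)) < 1e-6
```

For a tensor-product Matérn kernel in dimension d, the potential at a point is then simply the product of the d one-dimensional potentials along each coordinate.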
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Fekhari, E., Iooss, B., Muré, J., Pronzato, L., Rendas, M.-J.: Model predictivity assessment: incremental test-set selection and accuracy evaluation. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-16609-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16608-2
Online ISBN: 978-3-031-16609-9