
Model Predictivity Assessment: Incremental Test-Set Selection and Accuracy Evaluation

Conference paper. In: Studies in Theoretical and Applied Statistics (SIS 2021).

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406).

Abstract

Unbiased assessment of the predictivity of models learnt by supervised machine learning (ML) methods requires knowledge of the learned function over a reserved test set (not used by the learning algorithm). The quality of the assessment depends, naturally, on the properties of the test set and on the error statistic used to estimate the prediction error. In this work we tackle both issues, proposing a new predictivity criterion that carefully weights the individual observed errors to obtain a global error estimate, and using incremental experimental design methods to “optimally” select the test points on which the criterion is computed. Several incremental constructions are studied, including greedy-packing (coffee-house design), support points and kernel herding techniques. Our results show that the incremental and weighted versions of the latter two, based on Maximum Mean Discrepancy concepts, yield superior performance. An industrial test case provided by the historical French electricity supplier (EDF) illustrates the practical relevance of the methodology, indicating that it is an efficient alternative to expensive cross-validation techniques.
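
The selection and weighting algorithms are detailed in the body of the paper (not reproduced in this extract). As a rough, self-contained illustration of the kernel-herding idea, the following Python sketch (ours; all function names are ours, and the Matérn 5/2 kernel follows Appendix B) greedily picks, from a finite candidate set, the point that most reduces the discrepancy between the candidate distribution and the growing test set:

```python
import numpy as np

def matern52(X, Y, theta=0.3):
    """Matern 5/2 kernel matrix between the rows of X and Y."""
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    a = np.sqrt(5.0) * r / theta
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def kernel_herding(candidates, n_points, theta=0.3):
    """Greedily select n_points rows of `candidates`: at each step, take the
    point maximizing the target potential minus the potential of the points
    already selected (the standard kernel-herding criterion), the target
    potential being estimated on the candidate set itself."""
    K = matern52(candidates, candidates, theta)
    potential = K.mean(axis=1)               # empirical potential of mu
    selected, running_sum = [], np.zeros(len(candidates))
    for k in range(n_points):
        scores = potential - running_sum / (k + 1)
        scores[selected] = -np.inf           # forbid repeats
        j = int(np.argmax(scores))
        selected.append(j)
        running_sum += K[:, j]
    return np.asarray(selected)

rng = np.random.default_rng(0)
candidates = rng.uniform(size=(500, 2))      # finite candidate set in [0,1]^2
test_idx = kernel_herding(candidates, n_points=20)
print(candidates[test_idx])                  # space-filling test points
```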


Notes

  1. https://pypi.org/project/otkerneldesign/.
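
A minimal usage sketch of this package (untested assumptions, ours): we assume it exposes a KernelHerding class built from an OpenTURNS distribution and a select_design(size) method, as its online documentation suggests; check names and signatures against the installed version.

```python
# Assumed API: KernelHerding(distribution=...) and select_design(size);
# verify against the installed otkerneldesign version.
import openturns as ot
import otkerneldesign as otkd

distribution = ot.ComposedDistribution([ot.Uniform(0.0, 1.0)] * 2)
herding = otkd.KernelHerding(distribution=distribution)  # assumed signature
test_points = herding.select_design(size=20)             # assumed method
print(test_points)
```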

References

  1. Baudin, M., Dutfoy, A., Iooss, B., Popelin, A.-P.: OpenTURNS: an industrial software for uncertainty quantification in simulation. In: Ghanem, R., Higdon, D., Owhadi, H. (eds.) Springer Handbook on Uncertainty Quantification, pp. 2001–2038. Springer (2017)

  2. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer (2004)

  3. Borovicka, T., Jirina, M., Jr., Kordik, P., Jirina, M.: Selecting representative data sets. In: Karahoca, A. (ed.) Advances in Data Mining, Knowledge Discovery and Applications, pp. 43–70. INTECH (2012)

  4. Chen, W.Y., Barp, A., Briol, F.-X., Gorham, J., Girolami, M., Mackey, L., Oates, C.: Stein point Markov chain Monte Carlo. arXiv preprint arXiv:1905.03673 (2019)

  5. Chen, W.Y., Mackey, L., Gorham, J., Briol, F.-X., Oates, C.J.: Stein points. In: Proceedings of ICML (2018). arXiv preprint arXiv:1803.10161v4

  6. Chen, Y., Welling, M., Smola, A.: Super-samples from kernel herding. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pp. 109–116. AUAI Press (2010)

  7. Chevalier, C., Bect, J., Ginsbourger, D., Picheny, V., Richet, Y., Vazquez, E.: Fast kriging-based stepwise uncertainty reduction with application to the identification of an excursion set. Technometrics 56, 455–465 (2014)

  8. Crombecq, K., Laermans, E., Dhaene, T.: Efficient space-filling and non-collapsing sequential design strategies for simulation-based modelling. Eur. J. Oper. Res. 214, 683–696 (2011)

  9. Da Veiga, S.: Global sensitivity analysis with dependence measures. J. Stat. Comput. Simul. 85, 1283–1305 (2015)

  10. Da Veiga, S., Gamboa, F., Iooss, B., Prieur, C.: Basics and Trends in Sensitivity Analysis: Theory and Practice in R. SIAM (2021)

  11. de Crécy, A., Bazin, P., Glaeser, H., Skorek, T., Joucla, J., Probst, P., Fujioka, K., Chung, B.D., Oh, D.Y., Kyncl, M., Pernica, R., Macek, J., Meca, R., Macian, R., D'Auria, F., Petruzzi, A., Batet, L., Perez, M., Reventos, F.: Uncertainty and sensitivity analysis of the LOFT L2-5 test: results of the BEMUSE programme. Nucl. Eng. Des. 238(12), 3561–3578 (2008)

  12. Demay, C., Iooss, B., Le Gratiet, L., Marrel, A.: Model selection for Gaussian process regression: an application with highlights on the model variance validation. Qual. Reliab. Eng. Int. 38, 1482–1500 (2022). https://doi.org/10.1002/qre.2973

  13. Dubrule, O.: Cross validation of kriging in a unique neighborhood. J. Int. Assoc. Math. Geol. 15(6), 687–699 (1983)

  14. ENIQ: Qualification of an AI/ML NDT system - technical basis. NUGENIA, ENIQ Technical Report (2019)

  15. Fang, K.-T., Li, R., Sudjianto, A.: Design and Modeling for Computer Experiments. Chapman & Hall/CRC (2006)

  16. Geffraye, G., Antoni, O., Farvacque, M., Kadri, D., Lavialle, G., Rameau, B., Ruby, A.: CATHARE2 V2.5_2: a single version for various applications. Nucl. Eng. Des. 241, 4456–4463 (2011)

  17. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)

  18. Gretton, A., Bousquet, O., Smola, A., Schölkopf, B.: Measuring statistical dependence with Hilbert-Schmidt norms. In: Proceedings of Algorithmic Learning Theory, pp. 63–77. Springer (2005)

  19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer (2009)

  20. Hawkins, R., Paterson, C., Picardi, C., Jia, Y., Calinescu, R., Habli, I.: Guidance on the assurance of machine learning in autonomous systems (AMLAS). University of York, Assuring Autonomy International Programme (AAIP) (2021)

  21. Iooss, B.: Sample selection from a given dataset to validate machine learning models. In: Proceedings of the 50th Meeting of the Italian Statistical Society (SIS2021), pp. 88–93. Pisa, Italy (2021)

  22. Iooss, B., Boussouf, L., Feuillard, V., Marrel, A.: Numerical studies of the metamodel fitting and validation processes. Int. J. Adv. Syst. Meas. 3, 11–21 (2010)

  23. Joseph, V.R., Vakayil, A.: SPlit: an optimal method for data splitting. Technometrics 64(2), 166–176 (2022)

  24. Kennard, R.W., Stone, L.A.: Computer aided design of experiments. Technometrics 11, 137–148 (1969)

  25. Kleijnen, J.P.C., Sargent, R.G.: A methodology for fitting and validating metamodels in simulation. Eur. J. Oper. Res. 120, 14–29 (2000)

  26. Lemaire, M., Chateauneuf, A., Mitteau, J.-C.: Structural Reliability. Wiley (2009)

  27. Li, W., Lu, L., Xie, X., Yang, M.: A novel extension algorithm for optimized Latin hypercube sampling. J. Stat. Comput. Simul. 87, 2549–2559 (2017)

  28. Lorenzo, G., Zanocco, P., Giménez, M., Marquès, M., Iooss, B., Bolado-Lavin, R., Pierro, F., Galassi, G., D'Auria, F., Burgazzi, L.: Assessment of an isolation condenser of an integral reactor in view of uncertainties in engineering parameters. Sci. Technol. Nucl. Install. (2011). https://doi.org/10.1155/2011/827354

  29. Mak, S., Joseph, V.R.: Support points. Ann. Stat. 46, 2562–2592 (2018)

  30. Marrel, A., Chabridon, V.: Statistical developments for target and conditional sensitivity analysis: application on safety studies for nuclear reactor. Reliab. Eng. Syst. Saf. 214, 107711 (2021)

  31. Marrel, A., Iooss, B., Chabridon, V.: The ICSCREAM methodology: identification of penalizing configurations in computer experiments using screening and metamodel - applications in thermal-hydraulics. Nucl. Sci. Eng. 196, 301–321 (2022). https://doi.org/10.1080/00295639.2021.1980362

  32. Molnar, C.: Interpretable Machine Learning. github (2019)

  33. Morris, M.D., Mitchell, T.J.: Exploratory designs for computational experiments. J. Stat. Plan. Inference 43, 381–402 (1995)

  34. Müller, W.G.: Collecting Spatial Data, 3rd edn. Springer (2007)

  35. Nash, J., Sutcliffe, J.: River flow forecasting through conceptual models part I - a discussion of principles. J. Hydrol. 10(3), 282–290 (1970)

  36. Nogales Gómez, A., Pronzato, L., Rendas, M.-J.: Incremental space-filling design based on coverings and spacings: improving upon low discrepancy sequences. J. Stat. Theory Pract. 15(4), 77 (2021)

  37. Pronzato, L.: Performance analysis of greedy algorithms for minimising a maximum mean discrepancy. Stat. Comput., to appear (2022). hal-03114891, arXiv:2101.07564

  38. Pronzato, L., Müller, W.: Design of computer experiments: space filling and beyond. Stat. Comput. 22, 681–701 (2012)

  39. Pronzato, L., Rendas, M.-J.: Validation design I: construction of validation designs via kernel herding. Preprint (2021). hal-03474805, arXiv:2112.05583

  40. Pronzato, L., Zhigljavsky, A.A.: Bayesian quadrature and energy minimization for space-filling design. SIAM/ASA J. Uncertain. Quantif. 8, 959–1011 (2020)

  41. Qian, P.Z.G., Ai, M., Wu, C.F.J.: Construction of nested space-filling designs. Ann. Stat. 37, 3616–3643 (2009)

  42. Qian, P.Z.G., Wu, C.F.J.: Sliced space filling designs. Biometrika 96, 945–956 (2009)

  43. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press (2006)

  44. Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer Experiments. Springer (2003)

  45. Sejdinovic, D., Sriperumbudur, B., Gretton, A., Fukumizu, K.: Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Stat. 41(5), 2263–2291 (2013)

  46. Shang, B., Apley, D.W.: Fully-sequential space-filling design algorithms for computer experiments. J. Qual. Technol. 53(2), 173–196 (2021)

  47. Sheikholeslami, R., Razavi, S.: Progressive Latin hypercube sampling: an efficient approach for robust sampling-based analysis of environmental models. Environ. Model. Softw. 93, 109–126 (2017)

  48. Smith, R.C.: Uncertainty Quantification. SIAM (2014)

  49. Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007)

  50. Snee, R.D.: Validation of regression models: methods and examples. Technometrics 19, 415–428 (1977)

  51. Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)

  52. Székely, G.J., Rizzo, M.L.: Testing for equal distributions in high dimension. InterStat 5, 1–6 (2004)

  53. Székely, G.J., Rizzo, M.L.: Energy statistics: a class of statistics based on distances. J. Stat. Plan. Inference 143, 1249–1272 (2013)

  54. Teymur, O., Gorham, J., Riabiz, M., Oates, C.J.: Optimal quantisation of probability measures using maximum mean discrepancy. In: International Conference on Artificial Intelligence and Statistics, pp. 1027–1035 (2021). arXiv preprint arXiv:2010.07064v1

  55. Wold, S., Sjöström, M., Eriksson, L.: PLS-regression: a basic tool of chemometrics. Chemometr. Intell. Lab. Syst. 58(2), 109–130 (2001)

  56. Xu, Y., Goodacre, R.: On splitting training and validation set: a comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J. Anal. Test. 2, 249–262 (2018)


Acknowledgements

This work was supported by project INDEX (INcremental Design of EXperiments) ANR-18-CE91-0007 of the French National Research Agency (ANR). The authors are grateful to Guillaume Levillain and Thomas Bittar for their code development during their work at EDF. Thanks also to Sébastien Da Veiga for fruitful discussions.

Author information

Corresponding author: Bertrand Iooss.


Appendix

1.1 Appendix A: Maximum Mean Discrepancy

Let K be a positive definite kernel on \(\mathcal {X}\times \mathcal {X}\), defining a reproducing kernel Hilbert space (RKHS) \(\mathcal {H}_K\) of functions on \(\mathcal {X}\), with scalar product \(\langle f,g\rangle _{\mathcal {H}_K}\) and norm \(\Vert f\Vert _{\mathcal {H}_K}\); see, e.g., [2]. For any \(f\in \mathcal {H}_K\) and any probability measures \(\mu \) and \(\xi \) on \(\mathcal {X}\), we have

$$\begin{aligned} \left| \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\xi (\textbf{x}) - \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\mu (\textbf{x}) \right| &= \left| \int _\mathcal {X}\langle f,K_\textbf{x}\rangle _{\mathcal {H}_K}\, \mathrm{d}(\xi -\mu )(\textbf{x}) \right| \nonumber \\ &= \left| \langle f,P_{K,\xi }-P_{K,\mu }\rangle _{\mathcal {H}_K} \right| \,, \end{aligned}$$
(16)

where we have denoted \(K_\textbf{x}(\cdot )=K(\textbf{x},\cdot )\) and used the reproducing property \(f(\textbf{x})=\langle f,K_\textbf{x}\rangle _{\mathcal {H}_K}\) for all \(\textbf{x}\in \mathcal {X}\), and where, for any probability measure \(\nu \) on \(\mathcal {X}\) and \(\textbf{x}\in \mathcal {X}\),

$$\begin{aligned} P_{K,\nu }(\textbf{x}) = \int _\mathcal {X}K(\textbf{x}, \textbf{x}') \, \mathrm{d}\nu (\textbf{x}') \,, \end{aligned}$$
(17)

is the potential of \(\nu \) at \(\textbf{x}\). The function \(P_{K,\nu }\in \mathcal {H}_K\) is called the kernel embedding of \(\nu \) in the ML literature. In some cases the potential can be expressed analytically (see Appendix B); otherwise it can be estimated by numerical quadrature (e.g., quasi-Monte Carlo). The Cauchy-Schwarz inequality applied to (16) gives

$$ \left| \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\xi (\textbf{x}) - \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\mu (\textbf{x}) \right| \le \Vert f\Vert _{\mathcal {H}_K}\,\Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K} $$

and therefore

$$ \Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}=\sup _{f\in \mathcal {H}_K:\ \Vert f\Vert _{\mathcal {H}_K}=1} \left| \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\xi (\textbf{x}) - \int _\mathcal {X}f(\textbf{x})\, \mathrm{d}\mu (\textbf{x}) \right| \,. $$

The Maximum Mean Discrepancy (MMD) between \(\xi \) and \(\mu \) (for the kernel K and set \(\mathcal {X}\)) is \(d_K(\xi ,\mu )=\Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}\). Direct calculation gives

$$\begin{aligned} d_K^2(\xi ,\mu ) &= \Vert P_{K,\xi }-P_{K,\mu }\Vert _{\mathcal {H}_K}^2 = \int _{\mathcal {X}^2} K(\textbf{x},\textbf{x}')\, \mathrm{d}(\xi -\mu )(\textbf{x})\, \mathrm{d}(\xi -\mu )(\textbf{x}') \end{aligned}$$
(18)
$$\begin{aligned} &= \mathbb {E}_{\zeta ,\zeta '\sim \xi } K(\zeta ,\zeta ') + \mathbb {E}_{\zeta ,\zeta '\sim \mu } K(\zeta ,\zeta ') - 2\,\mathbb {E}_{\zeta \sim \xi , \zeta '\sim \mu } K(\zeta ,\zeta ') \,, \end{aligned}$$
(19)

where the random variables \(\zeta \) and \(\zeta '\) in (19) are independent, see [49]. When K is the energy distance kernel (10), one recovers the expression (11) for the corresponding MMD. One may refer to [51] for an illuminating exposition on MMD, kernel embedding, and conditions on K (the notion of characteristic kernel) that make \(d_K\) a metric on the space of probability measures on \(\mathcal {X}\). The distance and Matérn kernels considered in this paper are characteristic.
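
To make (19) concrete, the following self-contained Python sketch (ours, not code from the paper) estimates \(d_K^2(\xi ,\mu )\) from two samples by replacing the expectations in (19) with empirical means (a plug-in V-statistic). The energy-distance kernel below uses one common normalization and may differ from the paper's Eq. (10), not reproduced here, by a constant factor.

```python
import numpy as np

def energy_kernel(X, Y):
    """Energy-distance kernel K(x, x') = ||x|| + ||x'|| - ||x - x'||
    (one common normalization; the paper's Eq. (10) may differ by a factor)."""
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    return nx + ny - np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

def mmd2(X, Y, kernel=energy_kernel):
    """Plug-in (V-statistic) estimate of d_K^2(xi, mu), following Eq. (19),
    with X a sample from xi and Y a sample from mu."""
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))     # sample from xi
Y = rng.uniform(size=(1000, 2))    # sample from mu (same law here)
print(mmd2(X, Y))                  # near 0 when the two distributions match
```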

1.2 Appendix B: Analytical Computation of Potentials for Matérn Kernels

Since, for tensor-product kernels, the potential is the product of the one-dimensional potentials, we only consider one-dimensional input spaces.

For \(\mu \) the uniform distribution on [0, 1] and K the Matérn kernel \(K_{5/2,\theta }\) with smoothness \(\nu =5/2\) and correlation length \(\theta \), see (15), we get

$$\begin{aligned} P_{K_{5/2,\theta },\mu }(x) = \frac{16 \theta }{3 \sqrt{5}} - \frac{1}{15 \theta } (S_\theta (x) + S_\theta (1-x)), \end{aligned}$$

where

$$\begin{aligned} S_\theta (x) = \exp \left( - \frac{\sqrt{5}}{\theta } x \right) \left( 5 \sqrt{5}\, x^2 + 25\, \theta x + 8 \sqrt{5}\, \theta ^2 \right) . \end{aligned}$$

The expressions \(P_{K_{\nu ,\theta },\mu }(x)\) for \(\nu =1/2\) and \(\nu =3/2\) can be found in [40].

When \(\mu \) is the standard normal distribution \(\mathcal {N}(0,1)\), the potential \(P_{K_{5/2,\theta },\mathcal {N}(0,1)}\) is \( P_{K_{5/2,\theta },\mathcal {N}(0,1)}(x) = T_\theta (x) + T_\theta (-x), \) where

$$\begin{aligned} T_\theta (x) &= \frac{1}{6} \left( \frac{5}{\theta ^2} x^2 + \left( 3 - \frac{10}{\theta ^2} \right) \frac{\sqrt{5}}{\theta } x + \frac{5}{\theta ^2} \left( \frac{5}{\theta ^2} - 2 \right) + 3 \right) \\ &\quad \times \, \textrm{erfc} \left( \frac{\frac{\sqrt{5}}{\theta } - x}{\sqrt{2}} \right) \exp \left( \frac{5}{2 \theta ^2} - \frac{\sqrt{5}}{\theta }x \right) \\ &\quad + \frac{1}{3 \sqrt{2 \pi }} \frac{\sqrt{5}}{\theta } \left( 3 - \frac{5}{\theta ^2} \right) \exp \left( -\frac{x^2}{2} \right) . \end{aligned}$$
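
As a numerical sanity check, both closed-form potentials above can be coded directly and compared with one-dimensional quadrature; the sketch below (ours, assuming NumPy/SciPy are available) does exactly that, writing \(c=\sqrt{5}/\theta \) so that \(5/\theta ^2=c^2\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

SQRT5 = np.sqrt(5.0)

def matern52(x, xp, theta):
    """One-dimensional Matern 5/2 kernel with correlation length theta."""
    a = SQRT5 * abs(x - xp) / theta
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def potential_uniform(x, theta):
    """Closed-form potential P_{K,mu}(x) for mu uniform on [0, 1]."""
    S = lambda t: np.exp(-SQRT5 * t / theta) * (
        5 * SQRT5 * t**2 + 25 * theta * t + 8 * SQRT5 * theta**2)
    return 16 * theta / (3 * SQRT5) - (S(x) + S(1.0 - x)) / (15 * theta)

def potential_normal(x, theta):
    """Closed-form potential P_{K,mu}(x) for mu standard normal."""
    c = SQRT5 / theta                       # c = sqrt(5)/theta
    def T(x):
        poly = (c**2 * x**2 + (3 - 2 * c**2) * c * x
                + c**2 * (c**2 - 2) + 3) / 6.0
        return (poly * erfc((c - x) / np.sqrt(2)) * np.exp(c**2 / 2 - c * x)
                + c * (3 - c**2) * np.exp(-x**2 / 2) / (3 * np.sqrt(2 * np.pi)))
    return T(x) + T(-x)

theta, x = 0.7, 0.3
print(potential_uniform(x, theta),
      quad(lambda t: matern52(x, t, theta), 0.0, 1.0)[0])       # should agree
phi = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
print(potential_normal(x, theta),
      quad(lambda t: matern52(x, t, theta) * phi(t), -np.inf, np.inf)[0])
```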


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fekhari, E., Iooss, B., Muré, J., Pronzato, L., Rendas, M.-J. (2022). Model Predictivity Assessment: Incremental Test-Set Selection and Accuracy Evaluation. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_20

