Abstract
Distribution regression is the regression setting in which the input objects are distributions. Many machine learning problems can be analyzed in this framework, such as multi-instance learning and learning from noisy data. This paper builds a conformal predictive system (CPS) for distribution regression, where the prediction of the system for a test input is a cumulative distribution function (CDF) of the corresponding test label. The CDF output by a CPS provides useful information about the test label, as it can estimate the probability of any event related to the label and can be transformed into prediction intervals and point predictions via the corresponding quantiles. Furthermore, a CPS has the property of validity, since the predictive CDFs and the prediction intervals are statistically compatible with the realizations. This property is desirable for many risk-sensitive applications, such as weather forecasting. To the best of our knowledge, this is the first work to extend the learning framework of CPS to distribution regression problems. We first embed the input distributions into a reproducing kernel Hilbert space using kernel mean embedding approximated by random Fourier features, and then build a fast CPS on top of the embeddings. While inheriting the property of validity from the CPS learning framework, our algorithm is simple, easy to implement, and fast. The proposed approach is tested on synthetic data sets and applied to the statistical postprocessing of ensemble forecasts, demonstrating its effectiveness for distribution regression problems.
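To make the pipeline described in the abstract concrete, the following is a minimal sketch of the two stages: (i) embedding each input distribution, represented by a bag of samples, into a finite-dimensional feature space with random Fourier features that approximate an RBF kernel mean embedding, and (ii) producing a predictive CDF on top of the embeddings. The function names (rff_mean_embedding, split_conformal_cdf), the ridge regressor, and the split-conformal calibration step are illustrative assumptions and simplifications, not the paper's fast CPS; they only indicate the general structure of such an approach.

```python
import numpy as np

def rff_mean_embedding(bags, D=200, sigma=1.0, seed=0):
    """Approximate RBF kernel mean embeddings of sample bags with random Fourier features.

    bags : list of arrays, each of shape (n_i, d) -- samples from each input distribution.
    Returns an array of shape (len(bags), D), one embedding per distribution.
    """
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # random frequencies for the RBF kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    feats = []
    for X in bags:
        Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)     # RFF map of each sample
        feats.append(Z.mean(axis=0))                 # empirical kernel mean embedding
    return np.asarray(feats)

def split_conformal_cdf(Phi_train, y_train, Phi_cal, y_cal, phi_test, lam=1e-3):
    """Toy split-conformal predictive distribution on top of the embeddings.

    Fits ridge regression on the proper training part, collects calibration
    residuals, and returns the point prediction plus a step-function CDF
    supported on (point prediction + calibration residuals).
    """
    D = Phi_train.shape[1]
    A = Phi_train.T @ Phi_train + lam * np.eye(D)
    w = np.linalg.solve(A, Phi_train.T @ y_train)    # ridge weights
    resid = np.sort(y_cal - Phi_cal @ w)             # sorted calibration residuals
    y_hat = float(phi_test @ w)
    grid = y_hat + resid                             # support of the predictive CDF
    def cdf(y):
        return (np.searchsorted(grid, y, side="right") + 1) / (len(grid) + 1)
    return y_hat, cdf

# Usage on synthetic bags: label = mean of the underlying Gaussian
rng = np.random.default_rng(1)
means = rng.uniform(-2, 2, size=300)
bags = [rng.normal(m, 1.0, size=(50, 1)) for m in means]
Phi, y = rff_mean_embedding(bags, D=100), means
y_hat, cdf = split_conformal_cdf(Phi[:150], y[:150], Phi[150:250], y[150:250], Phi[260])
print(y_hat, cdf(y_hat + 0.5))
```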
Data availability
The data used during the current study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank the anonymous editor and reviewers for their valuable comments and suggestions which improved this work.
Funding
This work was supported by the National Natural Science Foundation of China under Grants 62106169, 72261147706, 72231005, and 61972282.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Wei Zhang and Di Wang. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
Wei Zhang, Zhen He and Di Wang declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, W., He, Z. & Wang, D. A conformal predictive system for distribution regression with random features. Soft Comput 27, 11789–11800 (2023). https://doi.org/10.1007/s00500-023-07859-w