Abstract
Drawing conclusions from set-valued data calls for a trade-off between caution and precision. In this paper, we propose a way to construct a hierarchical family of subsets within set-valued categorical observations. Each subset corresponds to a level of cautiousness, with the smallest one, a singleton, representing the most optimistic choice. To achieve this, we extend the framework of Optimistic Superset Learning (OSL), which disambiguates set-valued data by determining the singleton corresponding to the most predictive model. We use a variant of OSL for classification under the 0/1 loss to find the instantiations whose empirical risks lie below context-dependent thresholds; varying this threshold induces a hierarchy among the instantiations. To rule out ties between instantiations with the same classification error, we exploit a hyperparameter of Support Vector Machines (SVMs) that controls the model's complexity, repurposing its tuning to find instantiations whose optimal separations have the greatest generality. Finally, we apply our method to the prototypical example of still undecided political voters as set-valued observations. To this end, we use both simulated data and pre-election polls by Civey that include undecided voters for the 2021 German federal election.
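To make the levelwise disambiguation concrete, here is a minimal sketch under stated assumptions: it is not the authors' implementation, it enumerates all precise instantiations by brute force (exponential in general), and it uses scikit-learn's SVC as the classifier together with illustrative names such as `levelwise_instantiations`.

```python
# Sketch of OSL-style levelwise disambiguation under the 0/1 loss.
# Assumptions: scikit-learn's SVC as the model class, brute-force
# enumeration of precise instantiations; names are illustrative.
from itertools import product

import numpy as np
from sklearn.svm import SVC


def empirical_risk(clf, X, y):
    """0/1 empirical risk of a fitted classifier on (X, y)."""
    return float(np.mean(clf.predict(X) != y))


def levelwise_instantiations(X, Y_sets, thresholds, C=1.0):
    """For each threshold t, collect every precise instantiation y
    (one label drawn from each set-valued observation in Y_sets)
    whose fitted SVM has empirical 0/1 risk at most t."""
    levels = {t: [] for t in thresholds}
    for y in product(*Y_sets):           # brute force: exponential in general
        y = np.asarray(y)
        if np.unique(y).size < 2:        # SVC needs at least two classes
            continue
        clf = SVC(C=C, kernel="linear").fit(X, y)
        risk = empirical_risk(clf, X, y)
        for t in thresholds:
            if risk <= t:
                levels[t].append(y)
    return levels
```

By construction the levels are nested: any instantiation admissible at a threshold t remains admissible at every larger threshold, so lowering t traces out the hierarchical family of subsets described in the abstract, with the OSL-optimal instantiations contained in the smallest non-empty level.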
Keywords
- Optimistic superset learning
- Set-valued data
- Support vector machines
- Data disambiguation
- Epistemic imprecision
- Undecided voters
Acknowledgements
We sincerely thank the polling institute Civey for providing the data, as well as the anonymous reviewers for their valuable feedback and stimulating remarks. DK further thanks the LMU mentoring program for its support.
Notes
- 1. Note that this formalization allows \(Y_i\) to also (partially) consist of singletons.
- 2. Notably, \(q = |\mathcal {Y}| - k\), where k is the number of categories in \(\mathcal {Y}\) that are not present in the data.
- 3. This subsetting of \(\textbf{Y}\) can be seen as a form of "data choice" similar to model choice.
- 4. Criterion (1) aims at a unique minimum. In general, in light of the next section, we understand \(\arg\min\) in a potentially set-valued manner, i.e. as the set of all elements at which the minimum is attained.
- 5. The loss is called optimistic due to the minimum in (2): each prediction \(\hat{y}_i\) is assessed optimistically by assuming the most favorable ground truth \(y \in Y_i\). A schematic form of this loss is displayed after these notes.
- 6. Notably, some models can be more informative on certain aspects of the data generating process than others. For instance, naive Bayes classifiers model the joint distribution \(\mathbb {P}(x,y)\), as opposed to standard regression models, which are typically concerned with the conditional distribution \(\mathbb {P}(y|x)\).
- 7. Note that \(n \cdot \mathcal {R}_{emp}(\textbf{h}, \textbf{x}, \textbf{y}) \in \mathbb {N}\), i.e. the empirical 0/1 risk can only take the values \(0, 1/n, \dots, 1\); this is why several instantiations may be tied at the same risk level.
- 8. However, in [9, Sect. 3.1] the class of models, and thus the model's hyperparameters, is fixed.
- 9.
- 10. For kernelized versions of SVMs, this hyperplane is generally linear only in the transformed feature space. However, we can still think of C as a proxy for the generality of the optimal separation in that transformed space.
- 11. We use grid search to solve this minimization problem; a minimal sketch is given after these notes. When evaluations are expensive, Bayesian optimization, simulated annealing, or evolutionary algorithms might be preferred. For an overview of these heuristic optimizers and their limitations, see [23, Chapter 10].
- 12.
- 13. The covariates appear to be of rather low predictive power: training and generalization errors, even exclusively for the decided voters, are high.
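For reference (cf. note 5), the optimistic extension of a loss \(L\) to set-valued targets, as developed in [8, 9], can be displayed as follows. The notation is adapted to the notes above; this stands in for, rather than reproduces, the paper's equation (2):

```latex
% Optimistic ("minimin") extension of a loss L to set-valued targets,
% and the induced empirical risk; cf. note 5 and [8, 9].
\[
  L^{*}\bigl(Y_i, \hat{y}_i\bigr) = \min_{y \in Y_i} L\bigl(y, \hat{y}_i\bigr),
  \qquad
  \mathcal{R}_{emp}\bigl(\textbf{h}, \textbf{x}, \textbf{Y}\bigr)
  = \frac{1}{n} \sum_{i=1}^{n} \min_{y \in Y_i} L\bigl(y, h(x_i)\bigr).
\]
```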
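As for the tuning mentioned in notes 10 and 11, a plain grid search over the SVM cost parameter C could look as follows. This is a sketch under an assumption: we prefer the smallest C that reaches a target training-error level, as a proxy for the most general separation; this selection rule is our reading of the notes, not necessarily the authors' exact criterion. The paper's experiments use the R package kernlab [11], whereas the sketch uses scikit-learn.

```python
# Hypothetical grid search over the SVM cost parameter C (see notes
# 10-11): among all C values attaining the target 0/1 training risk,
# return the smallest one, i.e. the widest-margin ("most general")
# separation. The selection rule is an assumption, not verbatim
# from the paper.
import numpy as np
from sklearn.svm import SVC


def most_general_C(X, y, target_risk=0.0, grid=np.logspace(-3, 3, 13)):
    for C in np.sort(grid):                      # ascending: small C first
        clf = SVC(C=C, kernel="linear").fit(X, y)
        if np.mean(clf.predict(X) != y) <= target_risk:
            return C, clf                        # first admissible C is the smallest
    return None, None                            # target not attainable on this grid
```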
References
1. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)
2. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
3. Couso, I., Dubois, D.: Statistical reasoning with set-valued information: ontic vs. epistemic views. Int. J. Approximate Reasoning 55, 1502–1518 (2014)
4. Couso, I., Sánchez, L.: Machine learning models, epistemic set-valued data and generalized loss functions: an encompassing approach. Inf. Sci. 358, 129–150 (2016)
5. Denœux, T.: Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans. Knowl. Data Eng. 25(1), 119–130 (2011)
6. Faas, T., Klingelhöfer, T.: The more things change, the more they stay the same? The German federal election of 2017 and its consequences. West Eur. Polit. 42(4), 914–926 (2019)
7. Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: Advances in Neural Information Processing Systems, vol. 11 (1998)
8. Hüllermeier, E.: Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization. Int. J. Approximate Reasoning 55, 1519–1534 (2014)
9. Hüllermeier, E., Cheng, W.: Superset learning based on generalized loss minimization. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9285, pp. 260–275. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23525-7_16
10. Hüllermeier, E., Destercke, S., Couso, I.: Learning from imprecise data: adjustments of optimistic and pessimistic variants. In: Ben Amor, N., Quost, B., Theobald, M. (eds.) SUM 2019. LNCS (LNAI), vol. 11940, pp. 266–279. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35514-2_20
11. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab – an S4 package for kernel methods in R. J. Stat. Softw. 11(9), 1–20 (2004)
12. Kreiss, D., Augustin, T.: Undecided voters as set-valued information – towards forecasts under epistemic imprecision. In: Davis, J., Tabia, K. (eds.) SUM 2020. LNCS (LNAI), vol. 12322, pp. 242–250. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58449-8_18
13. Kreiss, D., Augustin, T.: Towards a paradigmatic shift in pre-election polling adequately including still undecided voters: some ideas based on set-valued data for the 2021 German federal election. arXiv preprint arXiv:2109.12069 (2021)
14. Kreiss, D., Nalenz, M., Augustin, T.: Undecided voters as set-valued information – machine learning approaches under complex uncertainty. In: ECML/PKDD 2020 Tutorial and Workshop on Uncertainty in Machine Learning (2020)
15. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28, 129–137 (1982)
16. Manski, C.: Partial Identification of Probability Distributions. Springer, Cham (2003)
17. Molchanov, I., Molinari, F.: Random Sets in Econometrics. Cambridge University Press, Cambridge (2018)
18. Nguyen, N., Caruana, R.: Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 551–559 (2008)
19. Oscarsson, H., Oskarson, M.: Sequential vote choice: applying a consideration set model of heterogeneous decision processes. Electoral Stud. 57, 275–283 (2019)
20. Plass, J., Cattaneo, M., Augustin, T., Schollmeyer, G., Heumann, C.: Reliable inference in categorical regression analysis for non-randomly coarsened observations. Int. Stat. Rev. 87, 580–603 (2019)
21. Plass, J., Fink, P., Schöning, N., Augustin, T.: Statistical modelling in surveys without neglecting the undecided. In: ISIPTA '15, pp. 257–266. SIPTA (2015)
22. Ponomareva, M., Tamer, E.: Misspecification in moment inequality models: back to moment equalities? Econometrics J. 14, 186–203 (2011)
23. Rodemann, J.: Robust generalizations of stochastic derivative-free optimization. Master's thesis, LMU Munich (2021)
24. Schollmeyer, G., Augustin, T.: Statistical modeling under partial identification: distinguishing three types of identification regions in regression analysis with interval data. Int. J. Approximate Reasoning 56, 224–248 (2015)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Rodemann, J., Kreiss, D., Hüllermeier, E., Augustin, T. (2022). Levelwise Data Disambiguation by Cautious Superset Classification. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds.) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science, vol. 13562. Springer, Cham. https://doi.org/10.1007/978-3-031-18843-5_18
DOI: https://doi.org/10.1007/978-3-031-18843-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18842-8
Online ISBN: 978-3-031-18843-5