Levelwise Data Disambiguation by Cautious Superset Classification

Part of the Lecture Notes in Computer Science book series (LNAI, volume 13562)


Drawing conclusions from set-valued data calls for a trade-off between caution and precision. In this paper, we propose a way to construct a hierarchical family of subsets within set-valued categorical observations. Each subset corresponds to a level of cautiousness; the smallest one is a singleton representing the most optimistic choice. To achieve this, we extend the framework of Optimistic Superset Learning (OSL), which disambiguates set-valued data by determining the singleton corresponding to the most predictive model. We utilize a variant of OSL for classification with 0/1 loss to find the instantiations whose corresponding empirical risks are below context-dependent thresholds. Varying this threshold induces a hierarchy among those instantiations. To rule out ties corresponding to the same classification error, we utilize a hyperparameter of Support Vector Machines (SVM) that controls the model’s complexity. We twist the tuning of this hyperparameter to find instantiations whose optimal separations have the greatest generality. Finally, we apply our method to the prototypical example of yet undecided political voters as set-valued observations. To this end, we use both simulated data and pre-election polls by Civey including undecided voters for the 2021 German federal election.
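The levelwise idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a one-dimensional threshold rule (a decision stump) stands in for the SVM, all instantiations are enumerated by brute force, the tie-breaking via the SVM hyperparameter is left out, and projecting the admissible instantiations onto per-observation label subsets is an assumption about how the hierarchy is read off. The names `stump_risk` and `levelwise_subsets` and the toy data are hypothetical.

```python
from itertools import product


def stump_risk(xs, ys, labels):
    """Lowest empirical 0/1 risk of any 1-D threshold rule fitted to (xs, ys)."""
    n = len(xs)
    # A cut below min(xs) yields the constant classifiers as special cases.
    cuts = [min(xs) - 1.0] + sorted(set(xs))
    best_errors = n
    for cut in cuts:
        for left, right in product(labels, repeat=2):
            errors = sum((left if x <= cut else right) != y
                         for x, y in zip(xs, ys))
            best_errors = min(best_errors, errors)
    return best_errors / n


def levelwise_subsets(xs, supersets, labels, thresholds):
    """Map each risk threshold to the per-observation label subsets it admits."""
    # Every instantiation picks exactly one label from each set-valued observation.
    candidates = product(*(sorted(s) for s in supersets))
    risks = {inst: stump_risk(xs, inst, labels) for inst in candidates}
    levels = {}
    for eps in thresholds:
        admissible = [inst for inst, risk in risks.items() if risk <= eps]
        # Assumption: a label stays in the level-eps subset of observation i
        # if at least one admissible instantiation assigns it to i.
        levels[eps] = [{inst[i] for inst in admissible}
                       for i in range(len(xs))]
    return levels


# Hypothetical toy data: observations 2 and 5 are undecided between "A" and "B".
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
supersets = [{"A"}, {"A", "B"}, {"A"}, {"B"}, {"A", "B"}]
levels = levelwise_subsets(xs, supersets, ("A", "B"), [0.0, 0.2, 0.4])
for eps, subsets in levels.items():
    print(eps, [sorted(s) for s in subsets])
```

On this toy data the threshold 0.0 admits a single instantiation, so every subset is a singleton. Raising the threshold to 0.2 already restores the full sets for the two undecided observations, and the subsets are nested across levels.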


  • Optimistic superset learning
  • Set-valued data
  • Support vector machines
  • Data disambiguation
  • Epistemic imprecision
  • Undecided voters

We sincerely thank the polling institute Civey for providing the data as well as the anonymous reviewers for their valuable feedback and stimulating remarks. DK further thanks the LMU mentoring program for its support.


  1. Note that this formalization allows \(Y_i\) to also (partially) consist of singletons.

  2. Notably, \(q = |\mathcal {Y}| - k\), where \(k\) is the number of categories in \(\mathcal {Y}\) that are not present in the data.

  3. This subsetting of \(\textbf{Y}\) can be seen as a form of “data choice” similar to model choice.

  4. Criterion (1) aims at a unique minimum. In general, in the light of the next section, we understand \(\arg \min\) potentially in a set-valued manner, i.e. as giving the set of all elements where the minimum is attained.

  5. The loss is called optimistic due to the minimum in (2): each prediction \(\hat{y}_i\) is assessed optimistically by assuming the most favorable ground truth \(y \in Y_i\).

  6. Notably, some models can be more informative on certain aspects of the data-generating process than others. For instance, naive Bayes classifiers model the joint distribution \(\mathbb {P}(x,y)\), as opposed to standard regression models, which are typically concerned with the conditional distribution \(\mathbb {P}(y|x)\).

  7. Note that \(n \cdot \mathcal {R}_{emp}(\textbf{h}, \textbf{x}, \textbf{y}) \in \mathbb {N}\).

  8. However, in [9, Sect. 3.1] the class of models, and thus the model’s hyperparameters, is fixed.

  9. For multi-class classification (as in Sect. 5), hyperplanes from one-versus-all classifications are combined by a voting scheme and Platt scaling; for details see [11, pages 8–9]. When tuning with regard to C, one common C-value is used for all one-versus-all classifications.

  10. For kernelized versions of SVMs, this hyperplane is generally only linear in the transformed feature space. However, we can still think of C as a proxy for the generality of the optimal separation in that transformed space.

  11. We use grid search for solving this minimization problem. When evaluations are rather expensive, Bayesian optimization, simulated annealing or evolutionary algorithms might be preferred. For an overview of these heuristic optimizers and their limitations, see [23, chapter 10].

  12. Any clustering algorithm can be used. In our applications in Sect. 5, we opt for k-means clustering as proposed by [15].

  13. The covariates appear to be generally of rather low predictive power: training and generalization error, even exclusively for the decided, are high.
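
Footnotes 5 and 7 refer to equations that are not reproduced in this excerpt. As a hedged reminder, and not necessarily in the paper's exact notation, the optimistic superset loss of [8, 9] for a base loss \(L\) and a set-valued observation \(Y_i\) can be written as

\[ L_O(Y_i, \hat{y}_i) = \min_{y \in Y_i} L(y, \hat{y}_i), \]

which for the 0/1 loss reduces to \(\mathbb{1}[\hat{y}_i \notin Y_i]\). The symbol \(L_O\) is an assumption made here for illustration. For a fixed instantiation \(\textbf{y} = (y_1, \dots, y_n)\), and assuming \(h(x_i)\) denotes the fitted classifier's prediction for observation \(i\), the empirical 0/1 risk is an error count divided by \(n\):

\[ n \cdot \mathcal {R}_{emp}(\textbf{h}, \textbf{x}, \textbf{y}) = \sum_{i=1}^{n} \mathbb{1}[h(x_i) \ne y_i] \in \{0, 1, \dots, n\}. \]

Hence only \(n + 1\) distinct risk values can occur, so at most \(n + 1\) distinct threshold levels can be told apart.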


  1. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152 (1992)

  2. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  3. Couso, I., Dubois, D.: Statistical reasoning with set-valued information: Ontic vs. epistemic views. Int. J. Approximate Reasoning 55, 1502–1518 (2014)

  4. Couso, I., Sánchez, L.: Machine learning models, epistemic set-valued data and generalized loss functions: an encompassing approach. Inf. Sci. 358, 129–150 (2016)

  5. Denœux, T.: Maximum likelihood estimation from uncertain data in the belief function framework. IEEE Trans. Knowl. Data Eng. 25(1), 119–130 (2011)

  6. Faas, T., Klingelhöfer, T.: The more things change, the more they stay the same? The German federal election of 2017 and its consequences. West Eur. Polit. 42(4), 914–926 (2019)

  7. Gentile, C., Warmuth, M.: Linear hinge loss and average margin. In: Advances in Neural Information Processing Systems, vol. 11 (1998)

  8. Hüllermeier, E.: Learning from imprecise and fuzzy observations: data disambiguation through generalized loss minimization. Int. J. Approximate Reasoning 55, 1519–1534 (2014)

  9. Hüllermeier, E., Cheng, W.: Superset learning based on generalized loss minimization. In: Appice, A., Rodrigues, P.P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9285, pp. 260–275. Springer, Cham (2015).

  10. Hüllermeier, E., Destercke, S., Couso, I.: Learning from imprecise data: adjustments of optimistic and pessimistic variants. In: Ben Amor, N., Quost, B., Theobald, M. (eds.) SUM 2019. LNCS (LNAI), vol. 11940, pp. 266–279. Springer, Cham (2019).

  11. Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A.: kernlab – an S4 package for kernel methods in R. J. Stat. Softw. 11(9), 1–20 (2004)

  12. Kreiss, D., Augustin, T.: Undecided voters as set-valued information – towards forecasts under epistemic imprecision. In: Davis, J., Tabia, K. (eds.) SUM 2020. LNCS (LNAI), vol. 12322, pp. 242–250. Springer, Cham (2020).

  13. Kreiss, D., Augustin, T.: Towards a paradigmatic shift in pre-election polling adequately including still undecided voters – some ideas based on set-valued data for the 2021 German federal election. arXiv preprint arXiv:2109.12069 (2021)

  14. Kreiss, D., Nalenz, M., Augustin, T.: Undecided voters as set-valued information, machine learning approaches under complex uncertainty. In: ECML/PKDD 2020 Tutorial and Workshop on Uncertainty in Machine Learning (2020)

  15. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28, 129–137 (1982)

  16. Manski, C.: Partial Identification of Probability Distributions. Springer, Cham (2003)

  17. Molchanov, I., Molinari, F.: Random Sets in Econometrics. Cambridge University Press, Cambridge (2018)

  18. Nguyen, N., Caruana, R.: Classification with partial labels. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 551–559 (2008)

  19. Oscarsson, H., Oskarson, M.: Sequential vote choice: applying a consideration set model of heterogeneous decision processes. Electoral Stud. 57, 275–283 (2019)

  20. Plass, J., Cattaneo, M., Augustin, T., Schollmeyer, G., Heumann, C.: Reliable inference in categorical regression analysis for non-randomly coarsened observations. Int. Stat. Rev. 87, 580–603 (2019)

  21. Plass, J., Fink, P., Schöning, N., Augustin, T.: Statistical modelling in surveys without neglecting the undecided. In: ISIPTA 15, pp. 257–266. SIPTA (2015)

  22. Ponomareva, M., Tamer, E.: Misspecification in moment inequality models: back to moment equalities? Econometrics J. 14, 186–203 (2011)

  23. Rodemann, J.: Robust generalizations of stochastic derivative-free optimization. Master’s thesis, LMU Munich (2021)

  24. Schollmeyer, G., Augustin, T.: Statistical modeling under partial identification: distinguishing three types of identification regions in regression analysis with interval data. Int. J. Approximate Reasoning 56, 224–248 (2015)

Author information


Corresponding author

Correspondence to Julian Rodemann.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rodemann, J., Kreiss, D., Hüllermeier, E., Augustin, T. (2022). Levelwise Data Disambiguation by Cautious Superset Classification. In: Dupin de Saint-Cyr, F., Öztürk-Escoffier, M., Potyka, N. (eds) Scalable Uncertainty Management. SUM 2022. Lecture Notes in Computer Science, vol. 13562. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18842-8

  • Online ISBN: 978-3-031-18843-5

  • eBook Packages: Computer Science, Computer Science (R0)