Consistency of Losses for Learning from Weak Labels

  • Jesús Cid-Sueiro
  • Darío García-García
  • Raúl Santos-Rodríguez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8724)


In this paper we analyze the consistency of loss functions for learning from weakly labelled data, and its relation to properness. We show that the consistency of a given loss depends on the mixing matrix, which is the transition matrix relating the weak labels and the true class. A linear transformation can be used to convert a conventional classification-calibrated (CC) loss into a weak CC loss. By comparing the maximal dimension of the set of mixing matrices that are admissible for a given CC loss with that for proper losses, we show that classification calibration is a much less restrictive condition than properness. Moreover, we show that while the transformation of conventional proper losses into weak proper losses does not preserve convexity in general, conventional convex CC losses can be easily transformed into weak and convex CC losses. Our analysis provides a general procedure to construct convex CC losses, and to identify the set of mixing matrices admissible for a given transformation. Several examples are provided to illustrate our approach.
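To make the idea of a linear transformation concrete, the following is a minimal sketch and not the construction given in the paper. It assumes a binary problem, a known mixing matrix M with entries M[z][c] = P(weak label z | true class c), and a matrix Y chosen so that Y M is the identity. The weak loss is taken to be a Y-weighted combination of conventional log losses; the names log_loss, weak_loss and expected, and all numeric values, are hypothetical. Under these assumptions the expected weak loss over weak labels coincides with the expected conventional loss over true classes.

```python
import math

def log_loss(c, f):
    """Conventional log loss for true class c and predicted probabilities f."""
    return -math.log(f[c])

def weak_loss(z, f, Y, base_loss=log_loss):
    """Assumed weak loss for weak label z: sum over classes c of Y[c][z] * base_loss(c, f)."""
    return sum(Y[c][z] * base_loss(c, f) for c in range(len(Y)))

def expected(loss, probs, f):
    """Expectation of loss(k, f) when label k is drawn with probability probs[k]."""
    return sum(p * loss(k, f) for k, p in enumerate(probs))

# Hypothetical mixing matrix: each column is a distribution over weak labels.
M = [[0.8, 0.3],
     [0.2, 0.7]]
# Its inverse, used as the linear transformation (Y M = I).
Y = [[1.4, -0.6],
     [-0.4, 1.6]]

p = [0.9, 0.1]  # true class posterior (illustrative values)
q = [sum(M[z][c] * p[c] for c in range(2)) for z in range(2)]  # weak label distribution
f = [0.6, 0.4]  # some prediction

clean_risk = expected(log_loss, p, f)
weak_risk = expected(lambda z, g: weak_loss(z, g, Y), q, f)
print(clean_risk, weak_risk)
```

Because Y contains negative entries, the transformed loss for a single weak label mixes log-loss terms with opposite signs, which is one way to see why convexity need not survive such a transformation.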







Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Jesús Cid-Sueiro (1)
  • Darío García-García (2)
  • Raúl Santos-Rodríguez (3)
  1. Universidad Carlos III de Madrid, Spain
  2. OMNIA Team, Commonwealth Bank of Australia, Australia
  3. Image Processing Lab., Univ. de Valencia, Spain
