Generalization Error Bounds Using Unlabeled Data

  • Matti Kääriäinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3559)

Abstract

We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that the method is of little use in practice due to its limitation to the realizable case.
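The disagreement probability of two classifiers depends only on the input distribution, so it can be estimated from unlabeled data alone. Below is a minimal sketch of such an estimate together with a Hoeffding-style confidence term; the function name, the two-sided Hoeffding bound, and the toy classifiers are illustrative assumptions rather than the paper's exact construction.

    import numpy as np

    def disagreement_estimate(f, g, X_unlabeled, delta=0.05):
        """Empirical disagreement rate of classifiers f and g on unlabeled data,
        plus a two-sided Hoeffding deviation term valid with prob. >= 1 - delta."""
        m = len(X_unlabeled)
        d_hat = float(np.mean(f(X_unlabeled) != g(X_unlabeled)))
        # Hoeffding: |d_hat - P(f(X) != g(X))| <= sqrt(ln(2/delta) / (2m))
        eps = float(np.sqrt(np.log(2.0 / delta) / (2.0 * m)))
        return d_hat, eps

    # Toy usage: two decision stumps that differ only on a thin slice of inputs.
    rng = np.random.default_rng(0)
    X_u = rng.uniform(size=10_000)                 # unlabeled sample
    f = lambda x: (x > 0.50).astype(int)
    g = lambda x: (x > 0.55).astype(int)
    d_hat, eps = disagreement_estimate(f, g, X_u)  # d_hat near 0.05, eps about 0.014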

The idea in our second method is to use unlabeled data to transform bounds for randomized classifiers into bounds for simpler deterministic classifiers. As a concrete example of how the general method works in practice, we apply it to a bound based on cross-validation. The result is a semi-supervised bound for classifiers learned from all the labeled data. The bound is easy to implement and apply and should be tight whenever cross-validation makes sense. Applying the bound to SVMs on the MNIST benchmark data set gives results suggesting that the bound may be tight enough to be useful in practice.
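As a rough illustration of the shape such a derandomized bound can take (not the paper's exact statement or constants), suppose a high-confidence error bound for an auxiliary randomized classifier g is already in hand, e.g. from a cross-validation bound, together with an unlabeled-data estimate of how often the final deterministic classifier f disagrees with g. The triangle inequality for the disagreement pseudo-metric gives err(f) <= err(g) + P(f(X) != g(X)); the names, the one-sided Hoeffding term, and the example numbers below are assumptions made only for this sketch.

    import numpy as np

    def derandomized_bound(bound_g, d_hat, m_unlabeled, delta=0.05):
        """Upper bound on err(f) from a bound on err(g) plus an unlabeled-data
        estimate d_hat of the disagreement probability P(f(X) != g(X)).

        Uses err(f) <= err(g) + P(f != g) and a one-sided Hoeffding term for
        estimating P(f != g) from m_unlabeled unlabeled examples; the failure
        probabilities of bound_g and of the Hoeffding term add up, so the
        combined guarantee holds with correspondingly reduced confidence.
        """
        eps = np.sqrt(np.log(1.0 / delta) / (2.0 * m_unlabeled))
        return bound_g + d_hat + eps

    # Hypothetical numbers: a 2.1% bound for g and 0.8% observed disagreement
    # between f and g on 50,000 unlabeled examples.
    print(derandomized_bound(0.021, 0.008, 50_000))   # roughly 0.034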

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Matti Kääriäinen
  1. Department of Computer Science, University of Helsinki, Finland
