Implicitly Constrained Semi-supervised Least Squares Classification

  • Jesse H. Krijthe
  • Marco Loog
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9385)


We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data among the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show this approach can be formulated as a quadratic programming problem and its solution can be found using a simple gradient descent procedure. We prove that, in a certain way, our method never leads to performance worse than the supervised classifier. Experimental results corroborate this theoretical result in the multidimensional case on benchmark datasets, also in terms of the error rate.
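The abstract does not spell out the optimization problem, so the following is only a minimal sketch of one way to read it, under stated assumptions: labels are encoded as 0 and 1, each unlabeled object receives a soft label q in [0, 1], the parameters implied by a labeling are the least squares solution on the labeled and unlabeled objects combined, and the squared loss on the labeled data is then minimized over q by projected gradient descent. The names `icls_fit` and `labeled_loss`, the soft-label relaxation, and the step-size choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def labeled_loss(X, y, beta):
    """Squared loss of the linear model beta on the labeled data."""
    r = X @ beta - y
    return float(r @ r)


def icls_fit(X, y, X_u, n_iter=2000):
    """Sketch of implicitly constrained least squares (assumed formulation).

    X   : (n, d) labeled features (include a constant column for an intercept)
    y   : (n,)   labels encoded as 0.0 / 1.0
    X_u : (m, d) unlabeled features
    Returns the parameter vector and the soft labels q in [0, 1]^m.
    """
    n = X.shape[0]
    X_e = np.vstack([X, X_u])            # labeled and unlabeled objects stacked
    P = np.linalg.pinv(X_e)              # maps a full label vector to parameters
    P_l, P_u = P[:, :n], P[:, n:]

    # Labeled residual as an affine function of q:  X @ beta(q) - y = A @ q + b
    A = X @ P_u
    b = X @ (P_l @ y) - y

    # The gradient of ||A q + b||^2 is Lipschitz with constant 2 * ||A||_2^2.
    lipschitz = 2.0 * np.linalg.norm(A, 2) ** 2
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0

    q = np.full(X_u.shape[0], 0.5)       # start from maximally uncertain labels
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ q + b)
        q = np.clip(q - step * grad, 0.0, 1.0)   # project back onto the box

    beta = P_l @ y + P_u @ q
    return beta, q


# Usage on synthetic data (hypothetical example, not from the paper).
rng = np.random.default_rng(0)
X_lab = np.hstack([np.ones((10, 1)), rng.normal(size=(10, 2))])
y_lab = (X_lab[:, 1] > 0).astype(float)
X_unl = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])

beta_icls, q = icls_fit(X_lab, y_lab, X_unl)
beta_sup = np.linalg.lstsq(X_lab, y_lab, rcond=None)[0]
print("labeled loss, supervised:", labeled_loss(X_lab, y_lab, beta_sup))
print("labeled loss, ICLS sketch:", labeled_loss(X_lab, y_lab, beta_icls))
```

Because the objective is quadratic in q and the feasible set is a box, this is a small quadratic program; projected gradient descent is used here only because it is easy to state, and any box-constrained solver could take its place.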



Part of this work was funded by project P23 of the Dutch public-private research community COMMIT.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Pattern Recognition Laboratory, Delft University of Technology, Delft, The Netherlands
  2. Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
  3. The Image Group, University of Copenhagen, Copenhagen, Denmark
