Scaling Up Semi-supervised Learning: An Efficient and Effective LLGC Variant

  • Bernhard Pfahringer
  • Claire Leschi
  • Peter Reutemann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4426)

Abstract

Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the total number of both labeled and unlabeled training examples. In this paper we apply modifications to the standard LLGC algorithm to improve efficiency to a point where we can handle datasets with hundreds of thousands of training examples. The modifications are priming of the unlabeled data and, most importantly, sparsification of the similarity matrix. We report promising results on large text classification problems.
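
To make the sparsification idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly stated LLGC iteration F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), cosine similarity as the affinity, and a k-nearest-neighbour rule for sparsifying W. The function names `knn_sparsify` and `llgc`, the parameter values, and the toy data are all hypothetical. Priming of the unlabeled data is omitted because the abstract does not describe how it is done.

```python
import numpy as np

def knn_sparsify(X, k):
    """Keep each row's k largest cosine similarities, then symmetrize.

    Assumption: a kNN rule is one way to sparsify; the abstract does not
    say which rule the paper uses. A dense n x n array is built here for
    brevity; a large-scale version would need a sparse representation.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = np.clip(Xn @ Xn.T, 0.0, None)   # non-negative affinities
    np.fill_diagonal(W, 0.0)            # no self-similarity
    n = W.shape[0]
    # Column indices of the k most similar points in every row.
    idx = np.argpartition(-W, k - 1, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask |= mask.T                      # keep an edge if either end chose it
    return np.where(mask, W, 0.0)

def llgc(W, Y, alpha=0.9, n_iter=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y on the affinity matrix W."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0                     # guard against isolated points
    S = W / np.sqrt(np.outer(d, d))     # symmetric normalization
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy data: two well-separated groups, one labeled example in each.
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.8, 0.1],
              [0.1, 1.0], [0.2, 0.9], [0.1, 0.8]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0   # point 0 is labeled class 0
Y[3, 1] = 1.0   # point 3 is labeled class 1

W = knn_sparsify(X, k=2)
F = llgc(W, Y)
pred = F.argmax(axis=1)
print(pred)
```

With k = 2 the toy graph splits into two disconnected triangles, so each labeled point propagates its label only within its own group. The iterative form avoids the explicit matrix inverse of the closed-form solution, and each step costs time proportional to the number of retained similarities when W is stored sparsely.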

Keywords

  • Support Vector Machine
  • Unlabeled Data
  • Neural Information Processing System
  • Linear Support Vector Machine
  • Global Consistency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Bernhard Pfahringer (1)
  • Claire Leschi (2)
  • Peter Reutemann (1)
  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
  2. INSA Lyon, France
