Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

  • Mohamed Farouk Abdel Hady
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5997)

Abstract

Semi-supervised learning reduces the cost of labeling the training data of a supervised learning algorithm by using unlabeled data together with labeled data to improve performance. Co-Training is a popular semi-supervised learning algorithm that requires multiple redundant and independent sets of features (views). In many real-world application domains, this requirement cannot be satisfied. In this paper, a single-view variant of Co-Training, CoBC (Co-Training by Committee), is proposed, which requires an ensemble of diverse classifiers instead of redundant and independent views. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and committee-based active learning. An empirical study on handwritten digit recognition is conducted in which the random subspace method (RSM) is used to create ensembles of diverse C4.5 decision trees. Experiments show that these two combinations outperform the other, non-committee-based ones.
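
The committee idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the confidence measure, the selection schedule, or how QBC-then-CoBC and QBC-with-CoBC interleave the two steps. The sketch assumes vote entropy as the disagreement measure (a common choice for Query by Committee) and assumes that samples on which the committee agrees most are the ones to pseudo-label. All function names (`random_subspaces`, `vote_entropy`, `split_pool`) are hypothetical.

```python
import math
import random
from collections import Counter


def random_subspaces(n_features, n_members, subspace_size, seed=0):
    """Random subspace method: draw one random feature subset per committee member."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_members)]


def vote_entropy(votes):
    """Committee disagreement on one sample: entropy of the label vote distribution."""
    total = len(votes)
    return sum(-(count / total) * math.log(count / total)
               for count in Counter(votes).values())


def majority_label(votes):
    """Label predicted by the largest number of committee members."""
    return Counter(votes).most_common(1)[0][0]


def split_pool(committee_votes, n_query, n_pseudo):
    """Split an unlabeled pool using committee votes (assumed selection rule).

    committee_votes maps a sample id to the list of labels predicted by the
    committee members. Returns (query_ids, pseudo_labels):
      - query_ids: the n_query samples with the highest disagreement, to be
        labeled by a human (active learning step),
      - pseudo_labels: the n_pseudo samples with the lowest disagreement,
        mapped to the majority label (semi-supervised step).
    The two sets can overlap if n_query + n_pseudo exceeds the pool size.
    """
    ranked = sorted(committee_votes,
                    key=lambda sid: vote_entropy(committee_votes[sid]))
    pseudo_labels = {sid: majority_label(committee_votes[sid])
                     for sid in ranked[:n_pseudo]}
    query_ids = ranked[len(ranked) - n_query:][::-1]
    return query_ids, pseudo_labels


if __name__ == "__main__":
    # Votes of a three-member committee on three unlabeled digit images.
    votes = {"a": [3, 3, 3], "b": [3, 8, 5], "c": [1, 1, 7]}
    print(split_pool(votes, n_query=1, n_pseudo=1))
    print(random_subspaces(n_features=16, n_members=3, subspace_size=8))
```

In this sketch each committee member would be trained on its own feature subset from `random_subspaces`; any base learner could fill that role, with C4.5 decision trees being the choice named in the abstract.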

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mohamed Farouk Abdel Hady (1)
  • Friedhelm Schwenker (1)
  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
