Multi-view Discriminative Sequential Learning

  • Ulf Brefeld
  • Christoph Büscher
  • Tobias Scheffer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)

Abstract

Discriminative learning techniques for sequential data have proven to be more effective than generative models for named entity recognition, information extraction, and related sequence labeling tasks. However, semi-supervised learning mechanisms that utilize inexpensive unlabeled sequences in addition to a few labeled sequences – such as the Baum-Welch algorithm – are available only for generative models. The multi-view approach is based on the principle of maximizing the consensus among multiple independent hypotheses; we develop this principle into a semi-supervised hidden Markov perceptron and a semi-supervised hidden Markov support vector learning algorithm. Experiments reveal that the resulting procedures utilize unlabeled data effectively and discriminate more accurately than their purely supervised counterparts.
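The consensus principle sketched in the abstract lends itself to a compact illustration. The following Python sketch is our illustration, not the authors' implementation: it trains one hidden Markov perceptron per feature view with Viterbi decoding; on labeled sequences each view takes a standard structured perceptron step, and on unlabeled sequences each view is nudged toward its peer's decoding whenever the two disagree. The split of token features into two views, the disagreement-driven update rule, and all names (`ViewPerceptron`, `train`) are assumptions made for this sketch.

```python
"""Illustrative sketch (assumed details, not the authors' code) of a
two-view semi-supervised hidden Markov perceptron: each view holds its
own emission and transition weights, and unlabeled data is used by
pulling disagreeing views toward each other's Viterbi decoding."""
import numpy as np

def viterbi(emit, trans, n_labels):
    """Best label path under emission scores (T x L) and transition scores (L x L)."""
    T = emit.shape[0]
    delta = np.zeros((T, n_labels))
    back = np.zeros((T, n_labels), dtype=int)
    delta[0] = emit[0]
    for t in range(1, T):
        # scores[i, j]: best score ending with label i at t-1 and label j at t
        scores = delta[t - 1][:, None] + trans + emit[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

class ViewPerceptron:
    """Hidden Markov perceptron restricted to one feature view."""
    def __init__(self, n_feats, n_labels):
        self.W = np.zeros((n_labels, n_feats))   # emission weights
        self.T = np.zeros((n_labels, n_labels))  # transition weights
        self.n_labels = n_labels

    def decode(self, X):                          # X: T x n_feats
        return viterbi(X @ self.W.T, self.T, self.n_labels)

    def update(self, X, y_hat, y_target):
        """Additive update toward y_target (a list of ints), away from y_hat."""
        for t, (p, q) in enumerate(zip(y_hat, y_target)):
            if p != q:
                self.W[q] += X[t]
                self.W[p] -= X[t]
        for t in range(1, len(y_target)):
            self.T[y_target[t - 1], y_target[t]] += 1
            self.T[y_hat[t - 1], y_hat[t]] -= 1

def train(labeled, unlabeled, n_labels, epochs=10):
    """labeled: list of ((X1, X2), y); unlabeled: list of (X1, X2)."""
    v1 = ViewPerceptron(labeled[0][0][0].shape[1], n_labels)
    v2 = ViewPerceptron(labeled[0][0][1].shape[1], n_labels)
    for _ in range(epochs):
        for (X1, X2), y in labeled:               # supervised perceptron step
            for v, X in ((v1, X1), (v2, X2)):
                y_hat = v.decode(X)
                if y_hat != y:
                    v.update(X, y_hat, y)
        for X1, X2 in unlabeled:                  # consensus step on unlabeled data
            y1, y2 = v1.decode(X1), v2.decode(X2)
            if y1 != y2:                          # disagreement: move each view
                v1.update(X1, y1, y2)             # toward its peer's prediction
                v2.update(X2, y2, y1)
    return v1, v2
```

The consensus step only changes the weights on sequences where the views disagree, so an unlabeled sequence on which both views already agree contributes nothing, which is one simple way to realize "maximizing the consensus among multiple independent hypotheses."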

Keywords

Unlabeled Data, Neural Information Processing Systems, Named Entity Recognition, Word Sense Disambiguation, Entity Recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Ulf Brefeld (1)
  • Christoph Büscher (1)
  • Tobias Scheffer (1)

  1. Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
