Abstract
Discriminative learning techniques for sequential data have proven more effective than generative models for named entity recognition, information extraction, and other discriminative tasks. However, semi-supervised learning mechanisms that exploit inexpensive unlabeled sequences in addition to a few labeled sequences, such as the Baum-Welch algorithm, are available only for generative models. The multi-view approach rests on the principle of maximizing the consensus among multiple independent hypotheses; we develop this principle into a semi-supervised hidden Markov perceptron and a semi-supervised hidden Markov support vector learning algorithm. Experiments show that the resulting procedures use unlabeled data effectively and discriminate more accurately than their purely supervised counterparts.
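The consensus principle described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it simplifies the hidden Markov perceptron to a per-example multiclass perceptron with two hand-split feature views (the paper's methods decode whole label sequences with Viterbi). On unlabeled examples, whenever the two views disagree, each view is updated toward the other view's prediction; all data, dimensions, and the `epochs` parameter are invented for the demo.

```python
import numpy as np

def predict(W, x):
    # Score each class with the view's weight matrix; return the argmax class.
    return int(np.argmax(W @ x))

def multi_view_perceptron(labeled, unlabeled, d1, d2, n_classes, epochs=10):
    """Toy two-view semi-supervised perceptron (illustrative only).

    labeled:   list of (x_view1, x_view2, y) triples
    unlabeled: list of (x_view1, x_view2) pairs
    """
    W1 = np.zeros((n_classes, d1))  # weights for view 1
    W2 = np.zeros((n_classes, d2))  # weights for view 2
    for _ in range(epochs):
        # Standard mistake-driven perceptron updates on the labeled examples.
        for x1, x2, y in labeled:
            for W, x in ((W1, x1), (W2, x2)):
                y_hat = predict(W, x)
                if y_hat != y:
                    W[y] += x
                    W[y_hat] -= x
        # Consensus updates on unlabeled examples: when the views disagree,
        # each view treats the other view's prediction as its target.
        for x1, x2 in unlabeled:
            y1, y2 = predict(W1, x1), predict(W2, x2)
            if y1 != y2:
                W1[y2] += x1; W1[y1] -= x1
                W2[y1] += x2; W2[y2] -= x2
    return W1, W2
```

A toy run on two linearly separable classes, with each view seeing a different (here identically distributed) feature vector:

```python
labeled = [(np.array([1.0, 0.0]), np.array([1.0, 0.0]), 0),
           (np.array([0.0, 1.0]), np.array([0.0, 1.0]), 1)]
unlabeled = [(np.array([0.9, 0.1]), np.array([1.0, 0.2])),
             (np.array([0.1, 0.9]), np.array([0.0, 1.0]))]
W1, W2 = multi_view_perceptron(labeled, unlabeled, d1=2, d2=2, n_classes=2)
```

After training, the two views agree on the unlabeled points, which is the consensus objective the multi-view approach maximizes.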
Keywords
- Unlabeled Data
- Neural Information Processing System
- Named Entity Recognition
- Word Sense Disambiguation
- Entity Recognition
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brefeld, U., Büscher, C., Scheffer, T. (2005). Multi-view Discriminative Sequential Learning. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_11
DOI: https://doi.org/10.1007/11564096_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)