Abstract
We study sequential prediction models in cases where only fragments of the sequences are annotated with ground-truth labels. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, by the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.
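To make the idea of a loss-augmented perceptron on partially annotated sequences concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a first-order model with token-identity emission features, a Hamming loss counted on annotated tokens only, and a constrained Viterbi pass that fills in unannotated tokens. The tag set, the toy sentences, and the names `decode`, `features`, and `train` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

LABELS = ["O", "PER", "LOC"]  # hypothetical tag set, for illustration only
NEG_INF = float("-inf")


def decode(tokens, w, fixed=None, penalize=None):
    """First-order Viterbi over LABELS for a non-empty token list.

    fixed[i]    -- if not None, position i must take exactly this label.
    penalize[i] -- if not None, every other label at i gets +1
                   (Hamming loss augmentation on annotated tokens).
    """
    n = len(tokens)
    fixed = fixed or [None] * n
    penalize = penalize or [None] * n

    def local(i, lab):
        if fixed[i] is not None and lab != fixed[i]:
            return NEG_INF
        bonus = 1.0 if penalize[i] is not None and lab != penalize[i] else 0.0
        return w[("emit", tokens[i], lab)] + bonus

    score = [{lab: local(0, lab) for lab in LABELS}]
    back = []
    for i in range(1, n):
        row, ptr = {}, {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: score[-1][p] + w[("trans", p, lab)])
            row[lab] = score[-1][prev] + w[("trans", prev, lab)] + local(i, lab)
            ptr[lab] = prev
        score.append(row)
        back.append(ptr)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: score[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def features(tokens, labels):
    """Count emission and transition features of a labeled sequence."""
    phi = defaultdict(float)
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        phi[("emit", tok, lab)] += 1.0
        if i > 0:
            phi[("trans", labels[i - 1], lab)] += 1.0
    return phi


def train(data, epochs=5):
    """Perceptron updates against a target completed by constrained decoding."""
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, partial in data:
            target = decode(tokens, w, fixed=partial)    # fill in unlabeled tokens
            guess = decode(tokens, w, penalize=partial)  # loss-augmented prediction
            if guess != target:
                for k, v in features(tokens, target).items():
                    w[k] += v
                for k, v in features(tokens, guess).items():
                    w[k] -= v
    return w


# Toy data: None marks a token without annotation.
DATA = [
    (["Anna", "visited", "Paris"], ["PER", None, "LOC"]),
    (["Paris", "welcomed", "Anna"], ["LOC", None, "PER"]),
]
weights = train(DATA)
prediction = decode(["Anna", "visited", "Paris"], weights)
```

In this sketch the annotated tokens act as hard constraints when the target sequence is built and as the only source of loss when the competing prediction is computed, so unannotated tokens influence learning only through the model's own completion.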
Keywords
- Hidden Markov Model
- Unlabeled Data
- Neural Information Processing System
- Entity Recognition
- Annotate Sequence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fernandes, E.R., Brefeld, U. (2011). Learning from Partially Annotated Sequences. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_36
DOI: https://doi.org/10.1007/978-3-642-23780-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer Science, Computer Science (R0)