Learning from Partially Annotated Sequences

Fernandes, Eraldo R.; Brefeld, Ulf

doi:10.1007/978-3-642-23780-5_36

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6911)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We study sequential prediction models in cases where only fragments of the sequences are annotated with the ground truth. The task does not match the standard semi-supervised setting and is highly relevant in areas such as natural language processing, where completely labeled instances are expensive and require editorial data. We propose to generalize the semi-supervised setting and devise a simple transductive loss-augmented perceptron to learn from inexpensive partially annotated sequences that could, for instance, be provided by laymen, by the wisdom of the crowd, or even automatically. Experiments on mono- and cross-lingual named entity recognition tasks with automatically generated partially annotated sentences from Wikipedia demonstrate the effectiveness of the proposed approach. Our results show that learning from partially labeled data is never worse than standard supervised and semi-supervised approaches trained on data with the same ratio of labeled and unlabeled tokens.

Author information

Authors and Affiliations

Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Eraldo R. Fernandes
Yahoo! Research, Barcelona, Spain
Ulf Brefeld

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece
Dimitrios Gunopulos
Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland
Thomas Hofmann
Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Donato Malerba
Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece
Michalis Vazirgiannis

Cite this paper

Fernandes, E.R., Brefeld, U. (2011). Learning from Partially Annotated Sequences. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23780-5_36

DOI: https://doi.org/10.1007/978-3-642-23780-5_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23779-9
Online ISBN: 978-3-642-23780-5
eBook Packages: Computer Science, Computer Science (R0)
