Training Conditional Random Fields with Unlabeled Data and Limited Number of Labeled Examples

  • Tak-Lam Wong
  • Wai Lam
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3930)


Conditional random fields (CRFs) constitute a probabilistic approach that has been applied to sequence labeling tasks with good performance. We extend the model so that the human effort in preparing labeled training examples can be reduced by also exploiting unlabeled data. Instead of maximizing the conditional likelihood, we aim at maximizing the likelihood of the observation sequences from both the labeled and the unlabeled data. We have conducted extensive experiments on two different data sets to evaluate the performance. The experimental results show that the model learned from both labeled and unlabeled data outperforms the model learned from labeled training examples alone.
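To make this objective concrete, here is a minimal sketch in our own notation (the paper's exact formulation may differ): let p_λ(y | x) be a linear-chain CRF over label sequences y and observation sequences x, with an associated joint model p_λ(y, x), and let D_L and D_U denote the labeled and unlabeled sets. Standard CRF training maximizes the conditional log-likelihood over D_L alone; the objective described above instead maximizes the likelihood of the observations, marginalizing out the unknown labels of D_U:

  % Sketch of a semi-supervised CRF objective (assumed notation, not the paper's exact formulation)
  \mathcal{L}(\lambda) \;=\; \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}_L} \log p_\lambda(\mathbf{y}, \mathbf{x}) \;+\; \sum_{\mathbf{x} \in \mathcal{D}_U} \log \sum_{\mathbf{y}} p_\lambda(\mathbf{y}, \mathbf{x})

Because the unlabeled term sums over all possible label sequences, objectives of this form are typically optimized with an EM-style procedure, computing the required marginals over hidden labels with the forward-backward algorithm.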


Keywords: Hidden Markov Model · Unlabeled Data · Conditional Random Field · Labeled Training · Chain Graph





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tak-Lam Wong¹
  • Wai Lam¹
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
