Efficient Inference in Large Conditional Random Fields

  • Trevor Cohn
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Abstract

Conditional Random Fields (CRFs) are widely known to scale poorly, particularly for tasks with large numbers of states or with richly connected graphical structures. This is a consequence of inference having a time complexity that is at best quadratic in the number of states. This paper describes a novel parameterisation of the CRF which ties the majority of clique potentials, while allowing individual potentials for a subset of the labellings. This has two beneficial effects: the parameter space of the model (and thus the propensity to over-fit) is reduced, and the time complexity of training and decoding becomes sub-quadratic. On a standard natural language task, we reduce CRF training time four-fold, with no loss in accuracy. We also show how inference can be performed efficiently in richly connected graphs, for which current methods are intractable.
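To make the sub-quadratic claim concrete, below is a minimal sketch of the tied-potential idea for the simplest case, a linear-chain CRF: when every transition potential equals a single tied value except for a sparse set of individually parameterised labellings S, each forward message costs O(n + |S|) rather than O(n^2). The names (forward_tied, special, tied) are illustrative assumptions, not the paper's code, and the paper's parameterisation applies more generally to clique potentials in arbitrary factor graphs.

    import numpy as np

    def forward_tied(obs, tied, special):
        # Unnormalised forward pass for a linear-chain CRF whose transition
        # potentials all equal `tied`, except for a sparse dict
        # special[(i, j)] = psi_ij of individually parameterised transitions.
        #   obs: (T, n) array of per-position observation potentials.
        # Each step costs O(n + |special|) instead of the dense O(n^2).
        T, n = obs.shape
        alpha = obs[0].copy()
        for t in range(1, T):
            # Tied part: every successor state receives tied * sum(alpha) ...
            new = np.full(n, tied * alpha.sum())
            # ... corrected for the few transitions with their own potential.
            for (i, j), psi in special.items():
                new[j] += (psi - tied) * alpha[i]
            alpha = new * obs[t]
        return alpha.sum()  # the partition function Z

    # Toy check against the dense O(n^2) recursion:
    rng = np.random.default_rng(0)
    obs = rng.random((4, 3))
    special = {(0, 1): 2.0, (2, 2): 0.5}
    psi = np.full((3, 3), 1.0)
    for (i, j), v in special.items():
        psi[i, j] = v
    alpha = obs[0].copy()
    for t in range(1, 4):
        alpha = obs[t] * (alpha @ psi)
    assert np.isclose(forward_tied(obs, 1.0, special), alpha.sum())

The backward recursion admits the same correction; Viterbi decoding needs slightly more care, since the tied maximum must exclude the overridden entries, but it too remains sub-quadratic.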


Keywords: Belief Propagation · Conditional Random Field · Parse Tree · Factor Graph · Observation Feature



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Trevor Cohn
  1. School of Informatics, University of Edinburgh, United Kingdom
