A Probabilistic Graphical Model for Recognizing NP Chunks in Texts

  • Minhua Huang
  • Robert M. Haralick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5459)

Abstract

We present a probabilistic graphical model for identifying noun phrase patterns in texts. The model is derived mathematically under two reasonable conditional independence assumptions, and it takes a different perspective from other graphical models such as CRFs or MEMMs. Empirical results show that our model is effective. In experiments on WSJ data from the Penn Treebank, our method achieves an average precision of 97.7% and an average recall of 98.7%. Further experiments on the CoNLL-2000 shared task data set show that our method achieves the best performance among the competing methods that other researchers have published on this data set, with an average precision of 95.15% and an average recall of 96.05%.

Keywords

NP chunking, graphical models, cliques, separators

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Minhua Huang (1)
  • Robert M. Haralick (1)
  1. Computer Science, Graduate Center, City University of New York, New York, USA
