A Decision Tree Approach to Sentence Chunking

  • Samuel W. K. Chan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4830)

Abstract

This paper proposes an algorithm which can chunk a given sentence into meaningful and coherent segments. The algorithm is based on the assumption that segment boundaries can be identified by analyzing various information-theoretic measures of the part-of-speech (POS) n-grams within the sentence. The assumption is supported by a series of experiments using the POS-tagged corpus and Treebank from Academia Sinica. Experimental results show that the combination of different classifiers based on the measures improves the system coverage while maintaining its precision in our evaluation of 10,000 sentences.
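The abstract does not say which information-theoretic measures are used or how the classifiers are combined. As a rough illustration of the underlying assumption only, the following sketch estimates pointwise mutual information (PMI) for adjacent POS-tag pairs and places a segment boundary wherever the PMI drops below a threshold. The function names `bigram_pmi` and `chunk`, the threshold value, and the toy tags (`DET`, `NOUN`, `VERB`) are assumptions made for this example; they are not taken from the paper or from the Academia Sinica tagset.

```python
import math
from collections import Counter


def bigram_pmi(tagged_sentences):
    """Estimate PMI for each adjacent POS-tag pair seen in the corpus.

    tagged_sentences: iterable of lists of POS tags (hypothetical input format).
    """
    unigrams, bigrams = Counter(), Counter()
    for tags in tagged_sentences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def chunk(tags, pmi, threshold=0.0):
    """Split a tag sequence wherever adjacent-pair PMI is below the threshold.

    Tag pairs never seen in training are treated as boundaries (an assumption).
    """
    if not tags:
        return []
    chunks, current = [], [tags[0]]
    for a, b in zip(tags, tags[1:]):
        if pmi.get((a, b), float("-inf")) < threshold:
            chunks.append(current)  # weak association: close the segment
            current = [b]
        else:
            current.append(b)       # strong association: extend the segment
    chunks.append(current)
    return chunks


# Toy corpus with illustrative tags, not real Sinica data.
corpus = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "NOUN", "VERB"],
    ["NOUN", "VERB", "DET", "NOUN"],
]
pmi_table = bigram_pmi(corpus)
segments = chunk(["DET", "NOUN", "VERB", "DET", "NOUN"], pmi_table, threshold=1.5)
```

In this toy setting the only pair scoring below the threshold is `VERB` followed by `DET`, so the sentence is split there. A system along the lines the abstract describes would presumably feed several such measures into trained classifiers rather than rely on a single fixed threshold.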

Keywords

Mutual Information; Noun Phrase; Natural Language Processing; Parse Tree; Decision Tree Approach

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Samuel W. K. Chan, Dept. of Decision Sciences, The Chinese University of Hong Kong, Hong Kong SAR