Abstract
Noun phrases (NP) are generally divided into two types: the Base Noun Phrase (BNP) and the Maximal-Length Noun Phrase (MNP). MNP identification can substantially reduce the complexity of full parsing, help analyze the overall structure of complex sentences, and provide important clues for detecting main predicates in Chinese sentences. In this paper, we propose a two-phase hybrid approach to MNP identification that uses salient features, such as expanded chunks and classified punctuations, to improve performance. Experimental results show an F1-measure of 89.66%.
Expanded chunks and classified punctuations are explained in detail in Section 3.
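For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-measure over identified phrases is commonly computed. It assumes exact boundary matching between gold and predicted spans, which is the usual convention in chunking evaluation; this excerpt does not state the paper's own matching criterion. The function name `span_f1` and the toy spans are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# A span is (start, end) in token offsets, end exclusive (an assumed convention).
Span = Tuple[int, int]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching."""
    correct = len(gold & predicted)  # spans whose boundaries match exactly
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: three gold MNPs; the system gets one right boundary wrong.
gold = {(0, 3), (5, 9), (10, 12)}
predicted = {(0, 3), (5, 8), (10, 12)}
p, r, f = span_f1(gold, predicted)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

Under exact matching, a phrase with one wrong boundary counts as both a false positive and a false negative, which is why long phrases such as MNPs are harder to score well on than base phrases.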
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bai, XM., Li, JJ., Kim, DI., Lee, JH. (2006). Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_28
DOI: https://doi.org/10.1007/11940098_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)