Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese

  • Conference paper
Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead (ICCPOL 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

Noun phrases (NPs) generally fall into two types: Base Noun Phrases (BNPs) and Maximal-Length Noun Phrases (MNPs). MNP identification can substantially reduce the complexity of full parsing, help analyze the overall structure of complex sentences, and provide important clues for detecting main predicates in Chinese sentences. In this paper, we propose a two-phase hybrid approach to MNP identification that uses salient features such as expanded chunks and classified punctuations to improve performance. Experimental results show an F1-measure of 89.66%.

Expanded chunks and classified punctuations are explained in detail in Section 3.
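As a reading aid for the F1-measure quoted in the abstract, the following is a minimal sketch of how precision, recall, and F1 are commonly computed for phrase identification. It assumes exact-match scoring over (start, end) token spans, which is conventional in chunking evaluations; the evaluation protocol of the paper itself is not visible in this preview, and the function name `span_f1` and the sample spans are hypothetical.

```python
def span_f1(gold, predicted):
    """Return (precision, recall, F1) for exact-match phrase spans.

    Assumption: a predicted phrase counts as correct only if both of its
    boundaries match a gold phrase exactly. Spans are (start, end) tuples.
    """
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans for one sentence: three gold phrases, two predictions,
# of which only the first matches a gold phrase exactly.
gold_spans = [(0, 3), (5, 9), (11, 12)]
predicted_spans = [(0, 3), (5, 8)]
precision, recall, f1 = span_f1(gold_spans, predicted_spans)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under exact matching, a prediction that covers most of a long phrase but misses one boundary earns no credit, which is one reason maximal-length phrases tend to be harder to score well on than base phrases.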

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bai, XM., Li, JJ., Kim, DI., Lee, JH. (2006). Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_28

  • DOI: https://doi.org/10.1007/11940098_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science (R0)
