Abstract
Based on a second-order hidden Markov model (HMM), this paper proposes a chunking algorithm with Viterbi decoding and a novel post-processing algorithm for chunking. The HMM parameters are estimated using token subcategory and lexicalization information. This balances the disambiguation ability that subcategorization and lexicalization provide against the data sparseness they introduce into maximum likelihood estimation (MLE). To compensate for the lack of complex context in HMM-based chunking, the proposed post-processing algorithm yields a consistent improvement over the base chunker and avoids illegal token paths in the chunking output. Experiments show that the system achieves an F-measure of 93% on the standard CoNLL-2000 test corpus.
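The abstract names two mechanisms without spelling them out: Viterbi decoding under a second-order HMM, and the avoidance of illegal token paths. The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation. It assumes IOB2 chunk tags (`B-X`, `I-X`, `O`), POS tags as observations, a first-order emission model P(o | t), and a crude probability floor in place of proper smoothing. The names `viterbi2`, `is_legal`, and `log_p` are hypothetical. The paper handles illegal paths in a separate post-processing step. This sketch instead prunes them inside the decoder, which is a simpler and different placement of the same constraint.

```python
import math

NEG_INF = float("-inf")
START = "<s>"  # hypothetical padding tag for the positions before the sentence


def is_legal(prev: str, cur: str) -> bool:
    """Assumed IOB2 constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        chunk_type = cur[2:]
        return prev in ("B-" + chunk_type, "I-" + chunk_type)
    return True


def log_p(table: dict, key: tuple, floor: float) -> float:
    """Log-probability lookup; unseen or zero entries fall back to `floor`."""
    p = table.get(key, 0.0)
    return math.log(p if p > 0 else floor)


def viterbi2(obs, tags, trans, emit, floor=1e-6):
    """Second-order (trigram) Viterbi decoding.

    trans[(t2, t1, t)] = P(t | t2, t1)   trigram tag transitions
    emit[(t, o)]       = P(o | t)        emission of observation o
    Illegal IOB2 transitions are pruned, so the output is always well formed.
    """
    if not obs:
        return []
    # delta maps a state (previous tag, current tag) to its best log score.
    delta = {}
    for t in tags:
        if is_legal(START, t):
            delta[(START, t)] = (log_p(trans, (START, START, t), floor)
                                 + log_p(emit, (t, obs[0]), floor))
    back = []  # back[k][(t1, t)] = best tag two steps back, for position k + 1
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for (t2, t1), score in delta.items():
            for t in tags:
                if not is_legal(t1, t):
                    continue  # skip illegal token paths
                s = (score + log_p(trans, (t2, t1, t), floor)
                     + log_p(emit, (t, o), floor))
                if s > new_delta.get((t1, t), NEG_INF):
                    new_delta[(t1, t)] = s
                    pointers[(t1, t)] = t2
        delta = new_delta
        back.append(pointers)
    # Follow back-pointers from the best final state.
    t1, t = max(delta, key=delta.get)
    path = [t]
    for pointers in reversed(back):
        t2 = pointers[(t1, t)]
        path.append(t1)
        t1, t = t2, t1
    return path[::-1]


if __name__ == "__main__":
    tags = ["B-NP", "I-NP", "O"]
    emit = {("B-NP", "DT"): 0.9, ("I-NP", "NN"): 0.8, ("O", "VBZ"): 0.9}
    trans = {}  # no trigram counts: every transition falls back to the floor
    print(viterbi2(["DT", "NN", "VBZ"], tags, trans, emit))
    # → ['B-NP', 'I-NP', 'O']
```

Because the state is a pair of tags, each step costs O(|tags|^3) per token. One way to approximate the lexicalization the abstract describes is to use (word, POS) pairs as observations for frequent words. That, too, is an assumption and is not taken from the paper.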
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Liu, S., Zhang, Z., Liu, P. (2015). Lexicalized Token Subcategory and Complex Context Based Shallow Parsing. In: Lu, Q., Gao, H. (eds) Chinese Lexical Semantics. CLSW 2015. Lecture Notes in Computer Science, vol. 9332. Springer, Cham. https://doi.org/10.1007/978-3-319-27194-1_46
DOI: https://doi.org/10.1007/978-3-319-27194-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27193-4
Online ISBN: 978-3-319-27194-1
eBook Packages: Computer Science, Computer Science (R0)