Efficient and Robust Phrase Chunking Using Support Vector Machines

Wu, Yu-Chieh; Yang, Jie-Chi; Lee, Yue-Shi; Yen, Show-Jane

doi:10.1007/11880592_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Automatic text chunking aims to recognize phrase structures in natural language text. It is a key technology for knowledge-based systems, where phrase structures provide important syntactic information for knowledge representation. Support Vector Machine (SVM)-based phrase chunking systems have been shown to achieve high performance for text chunking, but their inefficiency limits practical use on large datasets: they handle only several thousand tokens per second. In this paper, we first show that state-of-the-art performance (94.25) in the CoNLL-2000 shared task can be achieved with conventional SVM learning. However, off-the-shelf SVM classifiers are inefficient when the number of phrase types becomes large. We therefore present two novel methods that make the system substantially faster in both training and testing, at the cost of only a slight decrease in system performance. Experimental results show that our method achieves an F rate of 94.09 while handling 13000 tokens per second on the CoNLL-2000 chunking task.
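To make the reported F rate concrete, the following is a minimal sketch of a chunk-level F measure, assuming chunks are encoded with IOB2-style tags (B-X, I-X, O) and that a predicted chunk counts as correct only when both its boundaries and its phrase type match the gold chunk. The function names `extract_chunks` and `chunk_f1` are illustrative and not taken from the paper, and the authors' actual evaluation script is not shown in this preview.

```python
from typing import List, Optional, Set, Tuple

# A chunk is (start index, end index exclusive, phrase type).
Chunk = Tuple[int, int, str]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect chunk spans from a sequence of IOB2 tags (assumed encoding)."""
    chunks: Set[Chunk] = set()
    start: Optional[int] = None
    ctype: Optional[str] = None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and ctype == tag[2:]
        if continues:
            continue
        if ctype is not None:
            chunks.add((start, i, ctype))
        if tag == "O":
            start, ctype = None, None
        else:
            # B-X, or an I-X that does not continue the open chunk,
            # starts a new chunk of type X.
            start, ctype = i, tag[2:]
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of chunk-level precision and recall."""
    gold_chunks = extract_chunks(gold)
    pred_chunks = extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
    pred = ["B-NP", "I-NP", "B-VP", "O", "O"]
    print(f"F = {100 * chunk_f1(gold, pred):.2f}")
```

In this toy example the prediction recovers two of the three gold chunks and proposes no spurious ones, so precision is 1.0, recall is 2/3, and the F measure is 0.8. Scores such as 94.25 and 94.09 are the same quantity expressed on a 0 to 100 scale.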

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Central University,
Yu-Chieh Wu
Graduate Institute of Network Learning Technology, National Central University, No.300, Jhong-Da Rd., Jhongli City, Taoyuan County, 320, Taiwan, R.O.C.
Jie-Chi Yang
Department of Computer Science and Information Engineering, Ming Chuan University, No.5, De-Ming Rd, Gweishan District, Taoyuan, 333, Taiwan, R.O.C.
Yue-Shi Lee & Show-Jane Yen

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Hwee Tou Ng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Mun-Kew Leong
Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Min-Yen Kan
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, P.O. Box, 119613, Singapore
Donghong Ji

About this paper

Cite this paper

Wu, YC., Yang, JC., Lee, YS., Yen, SJ. (2006). Efficient and Robust Phrase Chunking Using Support Vector Machines. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_27

DOI: https://doi.org/10.1007/11880592_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science, Computer Science (R0)
