Abstract
Recently, there has been increasing interest in integrating rule-based methods with statistical techniques to develop robust, wide-coverage, high-performance parsing systems. In this paper, we describe the UCSG shallow parser architecture, which combines three components to determine the best shallow parse for a given sentence: linguistic constraints expressed as finite state grammars, statistical rating using HMMs built from a POS-tagged corpus, and an A* search for global optimization. The primary aim of the design is a judicious combination of linguistic and statistical methods that yields wide-coverage, robust shallow parsing systems without the need for large-scale manually parsed training corpora. The UCSG architecture uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so as to produce the best parse first without compromising the ability to produce all possible parses. The architecture also supports bootstrapping, with the aim of reducing the need for parsed training corpora. The complete system has been implemented in Perl under Linux. We first describe the UCSG shallow parsing architecture and then focus on evaluating the UCSG finite state grammar on the chunking task for English. Recall of 91.16% and 93.73% has been obtained on the Susanne parsed corpus and the CoNLL 2000 chunking task test data set, respectively. Extensive experimentation is under way to evaluate the other modules.
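The division of labour described above (a grammar proposes candidate chunks, a statistical component rates them, and a search picks the globally best combination) can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. The function name `best_chunk_sequence`, the chunk labels, and the probabilities are hypothetical, and the search is A* with a trivial zero heuristic (equivalent to uniform-cost search) rather than whatever heuristic the UCSG system actually uses.

```python
import heapq
import itertools
import math
from typing import Dict, List, Tuple

# A candidate chunk: (start, end, label, probability); `end` is exclusive.
# In this sketch the candidates stand in for the output of a finite state
# grammar, and the probabilities stand in for HMM ratings (both hypothetical).
Chunk = Tuple[int, int, str, float]


def best_chunk_sequence(n_words: int, candidates: List[Chunk]) -> Tuple[float, List[Chunk]]:
    """Return (cost, chunks) for the cheapest non-overlapping cover of words 0..n_words.

    Cost is the sum of negative log probabilities, so the cheapest cover is
    the most probable one. The heuristic is zero, which is admissible.
    """
    by_start: Dict[int, List[Chunk]] = {}
    for chunk in candidates:
        by_start.setdefault(chunk[0], []).append(chunk)

    tie_breaker = itertools.count()  # keeps heap entries comparable on equal cost
    frontier = [(0.0, next(tie_breaker), 0, [])]  # (cost, tie, position, path)
    expanded = set()
    while frontier:
        cost, _, position, path = heapq.heappop(frontier)
        if position == n_words:
            return cost, path  # first complete cover popped is optimal
        if position in expanded:
            continue
        expanded.add(position)
        for chunk in by_start.get(position, []):
            _, end, _, probability = chunk
            new_cost = cost - math.log(probability)
            heapq.heappush(frontier, (new_cost, next(tie_breaker), end, path + [chunk]))
    raise ValueError("no sequence of candidate chunks covers the sentence")


# Hypothetical candidates for the four-word sentence "the old man sat".
CANDIDATES: List[Chunk] = [
    (0, 2, "NG", 0.3),  # "the old"
    (0, 3, "NG", 0.6),  # "the old man"
    (2, 3, "NG", 0.2),  # "man"
    (2, 4, "VG", 0.1),  # "man sat"
    (3, 4, "VG", 0.9),  # "sat"
]

if __name__ == "__main__":
    cost, chunks = best_chunk_sequence(4, CANDIDATES)
    for start, end, label, probability in chunks:
        print(label, start, end, probability)
```

Because the grammar's candidates are kept separate from their ratings, the search could be continued past the first solution to enumerate lower-ranked covers, which mirrors the stated goal of producing the best parse first while keeping the remaining parses reachable.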
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, G.B., Murthy, K.N. (2006). UCSG Shallow Parser. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_18
DOI: https://doi.org/10.1007/11671299_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32205-4
Online ISBN: 978-3-540-32206-1
eBook Packages: Computer Science (R0)