
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3878)

Abstract

There has been growing interest in integrating rule-based methods with statistical techniques to build robust, wide-coverage, high-performance parsing systems. In this paper, we describe the UCSG shallow parser architecture, which combines linguistic constraints expressed as finite state grammars, statistical rating using HMMs built from a POS-tagged corpus, and an A* search for global optimization to determine the best shallow parse for a given sentence. The primary design aim is a judicious combination of linguistic and statistical methods that yields wide-coverage, robust shallow parsing systems without the need for large-scale manually parsed training corpora. The architecture uses a grammar to specify all valid structures and a statistical component to rate and rank the possible alternatives, so that the best parse is produced first without compromising the ability to produce all possible parses. The architecture also supports bootstrapping, with the aim of reducing the need for parsed training corpora. The complete system has been implemented in Perl under Linux. We first describe the UCSG shallow parsing architecture and then focus on evaluating the UCSG finite state grammar on the chunking task for English. Recall of 91.16% and 93.73% was obtained on the Susanne parsed corpus and the CoNLL 2000 chunking task test data set, respectively. Extensive experimentation is under way to evaluate the other modules.
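The three-stage idea in the abstract (a grammar proposes every valid chunk, a statistical component rates the candidates, and an A* search picks the globally best sequence) can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation. The tag patterns in `GRAMMAR`, the fixed values in `COST` (a stand-in for HMM-derived negative log probabilities), the fallback label `O`, and the function names are all assumptions made for illustration.

```python
import heapq
import re

# Hypothetical finite-state chunk grammar: each chunk type is a regular
# expression over space-separated POS tags (Penn Treebank style, assumed).
GRAMMAR = {
    "NP": re.compile(r"(DT )?(JJ )*(NN|NNS)( (NN|NNS))*"),
    "VP": re.compile(r"(MD )?(RB )?(VB|VBD|VBZ|VBP)( (VBN|VBG))?"),
    "PP": re.compile(r"IN"),
}

# Made-up per-chunk costs standing in for the statistical rating.
# "O" is a fallback single-token chunk so that every sentence has a parse.
COST = {"NP": 1.0, "VP": 1.0, "PP": 1.0, "O": 5.0}


def candidate_chunks(tags):
    """Enumerate every span (start, end, label) licensed by the grammar."""
    n = len(tags)
    chunks = []
    for i in range(n):
        chunks.append((i, i + 1, "O"))
        for j in range(i + 1, n + 1):
            span = " ".join(tags[i:j])
            for label, pattern in GRAMMAR.items():
                if pattern.fullmatch(span):
                    chunks.append((i, j, label))
    return chunks


def best_parse(tags):
    """A* search over token positions; returns (chunk list, total cost)."""
    n = len(tags)
    by_start = {}
    for start, end, label in candidate_chunks(tags):
        by_start.setdefault(start, []).append((end, label))
    min_cost = min(COST.values())

    def heuristic(pos):
        # Admissible: any unfinished parse needs at least one more chunk.
        return 0.0 if pos == n else min_cost

    frontier = [(heuristic(0), 0.0, 0, ())]
    closed = set()
    while frontier:
        _, cost, pos, path = heapq.heappop(frontier)
        if pos == n:
            return list(path), cost
        if pos in closed:
            continue
        closed.add(pos)
        for end, label in by_start.get(pos, []):
            new_cost = cost + COST[label]
            new_path = path + ((pos, end, label),)
            heapq.heappush(
                frontier, (new_cost + heuristic(end), new_cost, end, new_path)
            )
    return [], float("inf")


# Tags for "The old man saw a dog in the park".
tags = ["DT", "JJ", "NN", "VBD", "DT", "NN", "IN", "DT", "NN"]
chunks, cost = best_parse(tags)
print(chunks, cost)
```

In this sketch the grammar alone decides which chunks are possible, and the costs only decide the ranking, which mirrors the division of labour the abstract describes. A real rating component would replace the constant table with scores estimated from a POS-tagged corpus.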




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, G.B., Murthy, K.N. (2006). UCSG Shallow Parser. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2006. Lecture Notes in Computer Science, vol 3878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11671299_18


  • DOI: https://doi.org/10.1007/11671299_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32205-4

  • Online ISBN: 978-3-540-32206-1

  • eBook Packages: Computer Science, Computer Science (R0)
