Using parsed corpora for circumventing parsing

Joshi, Aravind K.; Srinivas, B.

doi:10.1007/3-540-60925-3_63

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Included in the following conference series:

International Joint Conference on Artificial Intelligence

Abstract

Corpora tagged with part-of-speech and phrase-structure information have been used both for exploratory data analysis and for unsupervised learning of language models. These corpora have proved to be invaluable resources for research activities such as training part-of-speech taggers, disambiguating word senses, detecting noun phrases, inducing selectional restrictions, extracting argument structure, and inducing probabilistic grammars.

In this paper, we present some new techniques that use parsed corpora not for inducing grammars but for circumventing parsing as much as possible. In particular, we describe how a corpus parsed with a wide-coverage Lexicalized Tree Adjoining Grammar (LTAG) is used for this purpose. The first technique exploits the fact that LTAGs represent dependency and constituency information in a uniform way. The second technique uses Explanation-Based Learning methodology to view parsing as Finite State Transduction. Both techniques exploit the central notions of LTAGs: lexicalization, an extended domain of locality, and the factoring of recursion from the domain over which dependencies are specified.
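
As a rough illustration of the second idea only, the toy Python sketch below stores a generalized analysis of a previously parsed sentence under its part-of-speech sequence and reuses it for new sentences instead of parsing them. It is not the authors' implementation: the tag names, the head-index encoding of dependencies, the choice of `ADJ` as the only recursive modifier, and the functions `collapse`, `learn` and `parse` are all assumptions made for this example, and a dictionary lookup stands in for a real finite state transducer.

```python
from itertools import groupby

# Assumption: adjectives are the only "recursive" modifiers in this toy example.
RECURSIVE_TAGS = {"ADJ"}


def collapse(tags):
    """Collapse each run of a recursive tag into one key position.

    Returns (key, groups), where groups[p] lists the token indices
    covered by key position p.
    """
    key, groups = [], []
    start = 0
    for tag, run in groupby(tags):
        length = len(list(run))
        if tag in RECURSIVE_TAGS:
            key.append(tag)
            groups.append(list(range(start, start + length)))
        else:
            for i in range(start, start + length):
                key.append(tag)
                groups.append([i])
        start += length
    return tuple(key), groups


def learn(index, tags, heads):
    """Generalize one parsed sentence and store it under its collapsed tag key.

    heads[i] is the token index of the head of token i, or -1 for the root.
    """
    key, groups = collapse(tags)
    position_of = {t: p for p, group in enumerate(groups) for t in group}
    template = []
    for group in groups:
        head = heads[group[0]]
        template.append(-1 if head == -1 else position_of[head])
    index[key] = template


def parse(index, tags):
    """Look up a stored template for the tag sequence; None if nothing matches."""
    key, groups = collapse(tags)
    template = index.get(key)
    if template is None:
        return None
    heads = [None] * len(tags)
    for p, group in enumerate(groups):
        head_position = template[p]
        # Heads are assumed never to be recursive modifiers themselves.
        head = -1 if head_position == -1 else groups[head_position][-1]
        for t in group:
            heads[t] = head
    return heads


index = {}
# "the old man sleeps": the -> man, old -> man, man -> sleeps, sleeps = root
learn(index, ["DET", "ADJ", "N", "V"], [2, 2, 3, -1])
# "the tall old wise man sleeps" reuses the stored analysis without parsing.
print(parse(index, ["DET", "ADJ", "ADJ", "ADJ", "N", "V"]))  # [4, 4, 4, 4, 5, -1]
```

Collapsing the run of adjectives before lookup is meant to mirror, very loosely, the factoring of recursion: one stored analysis then covers sentences with any positive number of stacked modifiers. Sentences whose collapsed tag sequence was never seen return `None` and would still need a conventional parser.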

We would like to thank R. Chandrasekhar, Christine Doran, Mitch Marcus and Martha Palmer for their valuable comments.

Author information

Authors and Affiliations

Department of Computer and Information Science, University of Pennsylvania, 19104-6228, Philadelphia, PA
Aravind K. Joshi & B. Srinivas

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

About this paper

Cite this paper

Joshi, A.K., Srinivas, B. (1996). Using parsed corpora for circumventing parsing. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_63

DOI: https://doi.org/10.1007/3-540-60925-3_63
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive
