Abstract
Corpora tagged with part-of-speech and phrase structure information have been used both for exploratory data analysis and for unsupervised learning of language models. These corpora have proved invaluable resources for research activities such as training part-of-speech taggers, disambiguating word senses, detecting noun phrases, inducing selectional restrictions, extracting argument structure, and inducing probabilistic grammars.
In this paper, we present some new techniques that use parsed corpora not for inducing grammars but for circumventing parsing as much as possible. In particular, we describe how a corpus parsed with a wide-coverage Lexicalized Tree Adjoining Grammar (LTAG) is used for this purpose. The first technique exploits the fact that LTAGs represent dependency and constituency information in a uniform way. The second technique uses Explanation-Based Learning methodology to view parsing as Finite State Transduction. Both techniques exploit the central notions of LTAGs: lexicalization, extended domain of locality, and factoring of recursion from the domain over which dependencies are specified.
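The general idea of reusing stored parses instead of parsing from scratch can be illustrated with a minimal sketch. This is not the authors' system. It assumes a toy corpus of part-of-speech-tagged sentences with head indices, generalizes each sentence to its tag sequence, and uses a plain dictionary lookup where the paper uses finite state transduction. It does not model LTAG elementary trees or the factoring of recursion. All names (`generalize`, `build_index`, `parse_by_lookup`) and the data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]   # (word, part-of-speech) pairs
Heads = Tuple[int, ...]          # head index per word; -1 marks the root


def generalize(tagged: Tagged) -> Tuple[str, ...]:
    """Abstract away from the words: keep only the tag sequence as the key."""
    return tuple(pos for _, pos in tagged)


def build_index(corpus: List[Tuple[Tagged, Heads]]) -> Dict[Tuple[str, ...], Heads]:
    """Store one generalized parse per tag sequence (first one seen wins)."""
    index: Dict[Tuple[str, ...], Heads] = {}
    for tagged, heads in corpus:
        index.setdefault(generalize(tagged), heads)
    return index


def parse_by_lookup(
    index: Dict[Tuple[str, ...], Heads], tagged: Tagged
) -> Optional[List[Tuple[str, str]]]:
    """Return (word, head word) pairs if a stored parse matches, else None."""
    heads = index.get(generalize(tagged))
    if heads is None:
        return None  # a real system would fall back to a full parser here
    words = [word for word, _ in tagged]
    return [(w, "ROOT" if h < 0 else words[h]) for w, h in zip(words, heads)]


# Hypothetical one-sentence "parsed corpus" (assumed data, not from the paper).
corpus = [([("the", "D"), ("dog", "N"), ("barked", "V")], (1, 2, -1))]
index = build_index(corpus)
result = parse_by_lookup(index, [("a", "D"), ("cat", "N"), ("slept", "V")])
```

A new sentence with the same tag sequence as a stored one receives the stored dependency structure with its own words filled in, so no parsing is done for it. Sentences with unseen tag sequences return `None` in this sketch.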
We would like to thank R. Chandrasekhar, Christine Doran, Mitch Marcus and Martha Palmer for their valuable comments.
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joshi, A.K., Srinivas, B. (1996). Using parsed corpora for circumventing parsing. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_63
DOI: https://doi.org/10.1007/3-540-60925-3_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive