Knowledge acquisition of predicate argument structures from technical texts using Machine Learning: the system Asium
In this paper, we describe the Machine Learning system Asium, which learns subcategorization frames of verbs and ontologies from the syntactic parsing of technical texts in natural language. The selection restrictions in the subcategorization frames are filled by the ontology's concepts. Applications requiring such knowledge are crucial and numerous. The most direct applications are semantic control of texts and syntactic parsing disambiguation. This knowledge acquisition task cannot be performed fully automatically. Instead, we propose a cooperative ML method which provides the user with a global view of the acquisition task and also with acquisition tools such as automatic concept splitting, example generation, and an ontology view with attachments to the verbs. Validation steps using these features are intertwined with learning steps so that the user validates the concepts as they are learned. Experiments performed on two different corpora (cooking domain and patents) give very promising results.
Keywords: machine learning, natural language processing, ontology, predicate argument structure, corpus-based learning, clustering
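To make the idea of filling selection restrictions with learned concepts more concrete, here is a minimal Python sketch. It is not Asium's actual algorithm: the input triples, the Jaccard overlap measure, the 0.5 threshold, and the function names `basic_classes`, `merge_similar`, and `frames` are all assumptions chosen for illustration. It groups head nouns by the verb and role they occur with, merges groups that overlap enough into a concept, and then expresses each verb's frame in terms of those concepts.

```python
from collections import defaultdict

# Hypothetical parser output: (verb, syntactic role or preposition, head noun).
TRIPLES = [
    ("cook", "object", "pasta"),
    ("cook", "object", "rice"),
    ("cook", "in", "pan"),
    ("boil", "object", "pasta"),
    ("boil", "object", "rice"),
    ("boil", "object", "water"),
    ("boil", "in", "pan"),
    ("boil", "in", "pot"),
]


def basic_classes(triples):
    """Group head nouns by the (verb, role) context they occur in."""
    classes = defaultdict(set)
    for verb, role, noun in triples:
        classes[(verb, role)].add(noun)
    return dict(classes)


def jaccard(a, b):
    """Overlap between two noun sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b)


def merge_similar(classes, threshold=0.5):
    """Greedy single pass: merge contexts whose noun sets overlap enough."""
    concepts = []  # each entry: (set of contexts, set of member nouns)
    for context, nouns in sorted(classes.items()):
        for contexts, members in concepts:
            if jaccard(members, nouns) >= threshold:
                contexts.add(context)
                members |= nouns
                break
        else:
            concepts.append(({context}, set(nouns)))
    return concepts


def frames(concepts):
    """Build verb -> {role: concept index} frames from the merged concepts."""
    result = defaultdict(dict)
    for index, (contexts, _members) in enumerate(concepts):
        for verb, role in contexts:
            result[verb][role] = index
    return dict(result)


concepts = merge_similar(basic_classes(TRIPLES))
for index, (contexts, members) in enumerate(concepts):
    print(index, sorted(members), sorted(contexts))
print(frames(concepts))
```

On this toy input the merge lets "cook" accept "water" as an object even though that pairing never occurred in the data. Generalizations of this kind are what a user would accept or reject in the validation steps the abstract describes.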