Knowledge acquisition of predicate argument structures from technical texts using Machine Learning: the system Asium

  • David Faure
  • Claire Nédellec
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1621)


In this paper, we describe the Machine Learning system Asium, which learns subcategorization frames of verbs and ontologies from the syntactic parsing of technical texts in natural language. The selection restrictions in the subcategorization frames are filled by the ontology's concepts. Applications requiring such knowledge are crucial and numerous; the most direct are semantic control of texts and syntactic parsing disambiguation. This knowledge acquisition task cannot be performed fully automatically. Instead, we propose a cooperative ML method which provides the user with a global view of the acquisition task and with acquisition tools such as automatic concept splitting, example generation, and an ontology view with attachments to the verbs. Validation steps using these features are intertwined with learning steps so that the user validates the concepts as they are learned. Experiments performed on two different corpora (cooking domain and patents) give very promising results.
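
The abstract gives no implementation details, so the following is only a minimal sketch of the kind of knowledge it describes: a verb's subcategorization frame whose selection restrictions are ontology concepts, used for a simple semantic check of a parsed clause. The names `ONTOLOGY`, `Frame`, `satisfies`, and `COOK`, the role labels, and the toy cooking vocabulary are all assumptions made for illustration and are not taken from Asium.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: concept name -> head words it covers.
ONTOLOGY = {
    "Food": {"carrot", "onion", "chicken"},
    "Container": {"pan", "pot", "oven"},
}


@dataclass
class Frame:
    """Hypothetical subcategorization frame for one verb."""
    verb: str
    # Syntactic role or preposition -> name of the concept allowed there.
    restrictions: dict


def satisfies(frame, verb, arguments):
    """Return True if each argument's head word belongs to the concept
    that the frame requires for that argument's role."""
    if verb != frame.verb:
        return False
    for role, head in arguments.items():
        concept = frame.restrictions.get(role)
        # Reject roles the frame does not know and heads outside the concept.
        if concept is None or head not in ONTOLOGY[concept]:
            return False
    return True


# Illustrative frame: "cook <Food> in <Container>".
COOK = Frame("cook", {"object": "Food", "in": "Container"})
```

Under these assumptions, `satisfies(COOK, "cook", {"object": "carrot", "in": "pan"})` accepts the clause, while swapping the two head words is rejected. This mirrors the "semantic control of texts" application only in spirit.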


machine learning, natural language processing, ontology, predicate argument structure, corpus-based learning, clustering





Copyright information

© Springer-Verlag 1999

Authors and Affiliations

  • David Faure
    • 1
  • Claire Nédellec
    • 1
  1. Laboratoire de Recherche en Informatique, UMR 86-23 du CNRS, Équipe Inférence et Apprentissage, Université Paris-Sud, Orsay, France
