Machine Translation, Volume 8, Issue 3, pp 147–173

What can be learned from raw texts?

An integrated tool for the acquisition of case roles, taxonomic relations and disambiguation criteria
  • Roberto Basili
  • Maria Teresa Pazienza
  • Paola Velardi


The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. In this paper we present an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines typical NLP techniques, such as (shallow) syntax and semantic markers, with numerical processing. The lexical data extracted by this method, called clustered association data, are used for a variety of interesting purposes, such as the detection of selectional restrictions, the derivation of syntactic ambiguity criteria and the acquisition of taxonomic relations.


Artificial Intelligence, Computational Linguistic, Association Data, Language Translation, Numerical Processing
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Roberto Basili (1)
  • Maria Teresa Pazienza (1)
  • Paola Velardi (2)
  1. Dip. di Ingegneria Elettronica, Università di Roma “Tor Vergata”, Italy
  2. Istituto d'Informatica, Università di Ancona, Italy
