Machine Translation, Volume 8, Issue 3, pp 175–201

Acquisition of selectional patterns in sublanguages

  • Roberto Basili
  • Maria Teresa Pazienza
  • Paola Velardi


When implementing computational lexicons, it is important to keep in mind the texts that an NLP system must deal with. Words relate to each other in many different, often odd ways; this information is rarely found in dictionaries, and it is quite hard to deduce a priori. In this paper we present a technique for the acquisition of statistically significant selectional restrictions from corpora and discuss the results of an experimental application with reference to two specific sublanguages (legal and commercial). We show that there are important cooccurrence preferences among words which cannot be established a priori, as they are determined for each choice of sublanguage. The method for detecting cooccurrences is based on the analysis of word associations augmented with syntactic markers and semantic tags. Word pairs are extracted by a morphosyntactic analyzer and clustered according to their semantic tags. A statistical measure is applied to the data to evaluate the significance of any relations detected. Selectional restrictions are acquired by a two-step process. First, statistically prevailing ‘coarse grained’ conceptual patterns are used by a linguist to identify the relevant selectional restrictions in sublanguages. Second, semiautomatically acquired ‘coarse’ selectional restrictions are used as the ‘semantic bias’ of a system, ARIOSTO_LEX, for the automatic acquisition of a case-based semantic lexicon.
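The extraction, clustering and scoring steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the statistical measure, so pointwise mutual information is used here only as a stand-in, and the word pairs, syntactic markers (`V_N`, `N_V`) and semantic tags (`ACT`, `DOCUMENT`, `GOODS`, `ORGANIZATION`) are hypothetical toy data.

```python
from collections import Counter
from math import log2

# Hypothetical output of a morphosyntactic analyzer:
# (word1, syntactic marker, word2). All values are invented for illustration.
word_pairs = [
    ("sign", "V_N", "contract"),
    ("approve", "V_N", "law"),
    ("sell", "V_N", "wine"),
    ("company", "N_V", "sell"),
]

# Hypothetical coarse-grained semantic tags assigned to each word.
semantic_tags = {
    "sign": "ACT", "approve": "ACT", "sell": "ACT",
    "contract": "DOCUMENT", "law": "DOCUMENT",
    "wine": "GOODS", "company": "ORGANIZATION",
}

def cluster_by_tags(pairs, tags):
    """Group word pairs into (tag1, marker, tag2) patterns and count them."""
    counts = Counter()
    for w1, marker, w2 in pairs:
        counts[(tags[w1], marker, tags[w2])] += 1
    return counts

def pmi_scores(counts):
    """Score each pattern with pointwise mutual information (assumed measure)."""
    total = sum(counts.values())
    left, right = Counter(), Counter()
    for (t1, marker, t2), c in counts.items():
        left[(t1, marker)] += c   # marginal count of the left tag with its marker
        right[t2] += c            # marginal count of the right tag
    return {
        (t1, marker, t2): log2(c * total / (left[(t1, marker)] * right[t2]))
        for (t1, marker, t2), c in counts.items()
    }

pattern_counts = cluster_by_tags(word_pairs, semantic_tags)
scores = pmi_scores(pattern_counts)

# Print patterns from most to least strongly associated.
for pattern, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pattern, pattern_counts[pattern], round(score, 3))
```

Running the same counting over corpora from different sublanguages would give different pattern rankings, which is the kind of sublanguage dependence the abstract reports. In the two-step process described above, a linguist would then inspect the prevailing patterns rather than accept the scores automatically.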


Keywords (added by machine, not by the authors): Statistical Measure, Word Pair, Computational Linguistic, Word Association, Language Translation





Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Roberto Basili (1)
  • Maria Teresa Pazienza (1)
  • Paola Velardi (2)

  1. Dip. di Ingegneria Elettronica, Università di Roma “Tor Vergata”, Italy
  2. Istituto d'Informatica, Università di Ancona, Italy
