Acquisition of selectional patterns in sublanguages

Abstract

When implementing computational lexicons it is important to keep in mind the texts that an NLP system must deal with. Words relate to each other in many different, often odd ways; this information is rarely found in dictionaries, and it is quite hard to deduce a priori. In this paper we present a technique for the acquisition of statistically significant selectional restrictions from corpora and discuss the results of an experimental application to two specific sublanguages (legal and commercial). We show that there are important cooccurrence preferences among words which cannot be established a priori, as they are specific to each sublanguage. The method for detecting cooccurrences is based on the analysis of word associations augmented with syntactic markers and semantic tags. Word pairs are extracted by a morphosyntactic analyzer and clustered according to their semantic tags. A statistical measure is applied to the data to evaluate the significance of any relations detected. Selectional restrictions are acquired by a two-step process. First, statistically prevailing ‘coarse-grained’ conceptual patterns are used by a linguist to identify the relevant selectional restrictions in sublanguages. Second, the semiautomatically acquired ‘coarse’ selectional restrictions are used as the ‘semantic bias’ of a system, ARIOSTO_LEX, for the automatic acquisition of a case-based semantic lexicon.
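
The pipeline described in the abstract (word pairs with syntactic markers, clustering by semantic tag, a statistical score per pattern) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the statistical measure, so pointwise mutual information is swapped in here, and the words, the marker names such as `V_N`, and the tags such as `ASSET` are invented rather than taken from the paper or from ARIOSTO_LEX.

```python
import math
from collections import Counter

# Hypothetical (head word, syntactic marker, modifier word) triples, standing in
# for the output of a morphosyntactic analyzer. All values are invented.
TRIPLES = [
    ("sell", "V_N", "share"),
    ("sell", "V_N", "bond"),
    ("buy", "V_N", "share"),
    ("buy", "V_N", "company"),
    ("sell", "V_prep_N:to", "company"),
    ("transfer", "V_prep_N:to", "bank"),
]

# Hypothetical coarse semantic tags, one per word.
TAGS = {
    "sell": "ACT", "buy": "ACT", "transfer": "ACT",
    "share": "ASSET", "bond": "ASSET",
    "company": "ORGANIZATION", "bank": "ORGANIZATION",
}


def cluster(triples, tags):
    """Count each (tag, marker, tag) pattern obtained by replacing words with tags."""
    return Counter((tags[h], m, tags[d]) for h, m, d in triples)


def pmi_scores(patterns):
    """Score each pattern by pointwise mutual information (an assumed measure)
    between the (head tag, marker) context and the modifier tag."""
    total = sum(patterns.values())
    context_counts = Counter()
    modifier_counts = Counter()
    for (h, m, d), n in patterns.items():
        context_counts[(h, m)] += n
        modifier_counts[d] += n
    return {
        (h, m, d): math.log2(n * total / (context_counts[(h, m)] * modifier_counts[d]))
        for (h, m, d), n in patterns.items()
    }


patterns = cluster(TRIPLES, TAGS)
scores = pmi_scores(patterns)

# List patterns from most to least strongly associated.
for pattern, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pattern, patterns[pattern], round(score, 3))
```

In this toy data the highest-scoring patterns would be the candidates a linguist inspects in the first step of the two-step process; a real corpus would also need a frequency threshold, since mutual information is unreliable for rare patterns.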

About this article

Cite this article

Basili, R., Pazienza, M.T. & Velardi, P. Acquisition of selectional patterns in sublanguages. Mach Translat 8, 175–201 (1993). https://doi.org/10.1007/BF00982638

  • DOI: https://doi.org/10.1007/BF00982638

Keywords

  • Statistical Measure
  • Word Pair
  • Computational Linguistic
  • Word Association
  • Language Translation