What can be learned from raw texts?

Basili, Roberto; Pazienza, Maria Teresa; Velardi, Paola

doi:10.1007/BF00982637

An integrated tool for the acquisition of case roles, taxonomic relations and disambiguation criteria

Published: September 1993

Volume 8, pages 147–173, (1993)

Machine Translation

Roberto Basili¹,
Maria Teresa Pazienza¹ &
Paola Velardi²

Abstract

The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts are still a matter of debate and experimentation. In this paper we present an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines typical NLP techniques, such as (shallow) syntax and semantic markers, with numerical processing. The lexical data extracted by this method, called clustered association data, are used for a variety of purposes, such as the detection of selectional restrictions, the derivation of syntactic ambiguity criteria and the acquisition of taxonomic relations.
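As a rough illustration of what clustering association data might look like, the sketch below counts syntactic word associations after replacing each word with a semantic marker. It is not ARIOSTO's algorithm, which the abstract does not specify. The tag names, the relation labels (`V_N`, `N_V`), the sample triples and the function `cluster_associations` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical semantic markers, hand-assigned for illustration only.
SEMANTIC_TAGS = {
    "farmer": "HUMAN_ENTITY",
    "company": "HUMAN_ENTITY",
    "wheat": "VEGETABLE",
    "grapes": "VEGETABLE",
    "sell": "ACT",
    "harvest": "ACT",
}

# Hypothetical output of a shallow parser: (word, relation, word) triples.
ASSOCIATIONS = [
    ("sell", "V_N", "wheat"),
    ("sell", "V_N", "grapes"),
    ("harvest", "V_N", "wheat"),
    ("farmer", "N_V", "harvest"),
    ("company", "N_V", "sell"),
]


def cluster_associations(associations, tags):
    """Count associations after replacing each word by its semantic tag.

    Triples containing a word without a tag are skipped.
    """
    clustered = Counter()
    for left, relation, right in associations:
        if left in tags and right in tags:
            clustered[(tags[left], relation, tags[right])] += 1
    return clustered


clustered = cluster_associations(ASSOCIATIONS, SEMANTIC_TAGS)
for pattern, count in clustered.most_common():
    print(pattern, count)
```

On this toy input the three verb-object pairs collapse into a single `(ACT, V_N, VEGETABLE)` pattern. Generalised counts of this kind are what would make a selectional restriction visible even when each individual word pair is rare.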

Author information

Authors and Affiliations

Dip. di Ingegneria Elettronica, Università di Roma “Tor Vergata”, Italy
Roberto Basili & Maria Teresa Pazienza
Istituto d'Informatica, Università di Ancona, Italy
Paola Velardi

About this article

Cite this article

Basili, R., Pazienza, M.T. & Velardi, P. What can be learned from raw texts? Mach Translat 8, 147–173 (1993). https://doi.org/10.1007/BF00982637

Issue Date: September 1993
DOI: https://doi.org/10.1007/BF00982637
