An occurrence-based model of word categorization

Annals of Mathematics and Artificial Intelligence

Abstract

Small corpora pose problems for traditional statistical analysis because their data are sparse. We discuss a methodology for classifying words in edited, plain-text corpora that has the potential to work on relatively small corpora. This approach, which we call occurrence-based processing, records which contexts occur around a given word but pays no attention to the number of times each context occurs. We obtain good results on an artificial language and compare them with Elman's connectionist analysis of the same artificial language. Results on real-world corpora are more modest, but they are sufficient to draw some methodological and language-theoretic conclusions.
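The core idea of occurrence-based processing, recording which contexts appear around a word while ignoring how often they appear, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: a context is taken to be the pair of immediately adjacent words, `<s>` and `</s>` are hypothetical boundary markers, the helper names `context_sets` and `jaccard` are invented, and Jaccard overlap is swapped in as a simple way to compare two context sets. The article's actual context definition and classification procedure may differ.

```python
from collections import defaultdict


def context_sets(tokens):
    """Map each word to the SET of (left, right) neighbour pairs seen around it.

    Using a set rather than a counter is the occurrence-based step:
    a context either occurred with a word or it did not.
    """
    contexts = defaultdict(set)
    # Hypothetical boundary markers so the first and last tokens have contexts.
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        contexts[padded[i]].add((padded[i - 1], padded[i + 1]))
    return dict(contexts)


def jaccard(a, b):
    """Overlap of two context sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy corpus; "the dog runs ." appears twice, which changes no context set.
tokens = (
    "the dog runs . the cat runs . the dog sleeps . "
    "the cat sleeps . the dog runs ."
).split()
ctx = context_sets(tokens)

print(sorted(ctx["dog"]))                  # [('the', 'runs'), ('the', 'sleeps')]
print(jaccard(ctx["dog"], ctx["cat"]))     # 1.0
print(jaccard(ctx["dog"], ctx["runs"]))    # 0.0
```

In this toy corpus the two nouns share all of their contexts and the noun and verb share none, so grouping words by context overlap separates them without using any frequency information.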

References

  1. A.V. Aho, J.E. Hopcroft and J.D. Ullman, Data Structures and Algorithms (Addison-Wesley, Reading, MA, 1983).

  2. P.A. Bensch, Neostructuralism: A commentary on the correlations between the work of Zelig Harris and Jeffrey Elman, CRL Newsletter, 5.2, Center for Research in Language, University of California, San Diego (1991).

  3. P.A. Bensch, Occurrence-based word categorization, Doctoral Dissertation, University of California, San Diego (1993).

  4. M. Brent, Semantic classification of verbs from their syntactic contexts: Automated lexicography with implications for child language acquisition, Proceedings of the 12th Meeting of the Cognitive Science Society (1990) pp. 428–437.

  5. E. Brill and M. Marcus, Tagging an unfamiliar text with minimum human supervision, Working Notes of the AAAI Fall Symp. on Probabilistic Approaches to Natural Language, ed. R. Goldman (AAAI Press, 1992).

  6. N. Chomsky, Aspects of the Theory of Syntax (MIT Press, Cambridge, MA, 1965).

  7. N. Chomsky, Lectures on Government and Binding: The Pisa Lectures (Foris, Dordrecht, Holland, 1981).

  8. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis (Wiley, New York, 1973).

  9. J.L. Elman, Representation and structure in connectionist models, CRL Technical Report 8903, Center for Research in Language, University of California, San Diego (1989).

  10. J.L. Elman, Finding structure in time, Cognitive Sci. 14 (1990) 179–211.

  11. S. Finch and N. Chater, Bootstrapping syntactic categories using statistical methods, Background and Experiments in Machine Learning of Natural Language, ed. W. Daelemans and D. Powers (Institute for Language Technology and AI, Tilburg University).

  12. T. Givon, SYNTAX: A Functional-Typological Introduction, Vol. I (Benjamins, Amsterdam, 1984).

  13. J. Grimshaw, Form, function, and the language acquisition device, The Logical Problem of Language Acquisition, ed. C.L. Baker and J.J. McCarthy (MIT Press, Cambridge, MA, 1981).

  14. R. Grishman, L. Hirschman and N.T. Nhan, Discovery procedures for sublanguage selectional patterns: Initial experiments, Comput. Linguistics 12.3 (1986) 205–215.

  15. Z. Harris, A Grammar of English on Mathematical Principles (Wiley, New York, 1982).

  16. Z. Harris, The Form of Information in Science: Analysis of an Immunology Sublanguage (Kluwer Academic, Dordrecht, The Netherlands, 1989).

  17. T.C. Hu, Combinatorial Algorithms (Addison-Wesley, Reading, MA, 1982).

  18. J. Macnamara, Names for Things: A Study of Child Language (Bradford Books/MIT Press, Cambridge, MA, 1982).

  19. S. Pinker, Language Learnability and Language Development (Harvard University Press, Cambridge, MA, 1984).

  20. S. Pinker, Learnability and Cognition: The Acquisition of Argument Structure (MIT Press, Cambridge, MA, 1989).

  21. A. Radford, Transformational Grammar: A First Course (Cambridge University Press, Cambridge, 1988).

  22. H. Schutze, Part-of-speech induction from scratch, manuscript to be presented at ACL93.

  23. J.R. Taylor, Linguistic Categorization: Prototypes in Linguistic Theory (Clarendon Press, Oxford, 1989).

Cite this article

Bensch, P.A., Savitch, W.J. An occurrence-based model of word categorization. Ann Math Artif Intell 14, 1–16 (1995). https://doi.org/10.1007/BF01530891

  • DOI: https://doi.org/10.1007/BF01530891