Constructing Empirical Formulas for Testing Word Similarity by the Inductive Method of Model Self-Organization

  • Pavel Makagonov
  • Mikhail Alexandrov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2389)


Identifying words with the same base meaning is a necessary step in many algorithms of computational linguistics and text processing. We propose a knowledge-poor approach to this task: an empirical formula based on the number of coincident letters in the initial parts of two words and the number of non-coincident letters in their final parts. To construct such a formula for a given language, we use the inductive method of model self-organization developed by A. Ivahnenko. This method considers a set of models (formulas) of a given class and selects the best ones using training samples and test samples. We give a detailed example for English. We also show how to apply the formula to create a word frequency list.
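The idea in the abstract can be illustrated with a minimal Python sketch. It is not the formula from the paper, which the abstract does not state. It assumes a hypothetical linear rule in which two words count as similar when n/s <= a + b*y, where y is the length of the common initial part, n is the number of non-coincident final letters, and s is the total number of letters. The names `pair_features`, `is_similar`, `error_rate`, and `select_model`, as well as the coefficient grid, are illustrative assumptions. The selection step is a simplified stand-in for the inductive method: candidates are ranked on a training sample and the final choice is made on a separate test sample.

```python
from itertools import product


def pair_features(w1, w2):
    """Return (y, n, s) for a word pair.

    y: number of coincident letters in the common initial part
    n: number of non-coincident letters in the two final parts
    s: total number of letters in both words
    """
    y = 0
    for c1, c2 in zip(w1, w2):
        if c1 != c2:
            break
        y += 1
    n = (len(w1) - y) + (len(w2) - y)
    s = len(w1) + len(w2)
    return y, n, s


def is_similar(w1, w2, a, b):
    """Hypothetical rule (an assumption): similar when n/s <= a + b*y."""
    y, n, s = pair_features(w1.lower(), w2.lower())
    return s > 0 and n / s <= a + b * y


def error_rate(sample, a, b):
    """Share of labeled pairs (w1, w2, label) that the rule misclassifies."""
    wrong = sum(is_similar(w1, w2, a, b) != label for w1, w2, label in sample)
    return wrong / len(sample)


def select_model(train, test, grid_a, grid_b, keep=3):
    """Keep the `keep` best (a, b) candidates on the training sample,
    then pick the one with the lowest error on the test sample."""
    ranked = sorted(product(grid_a, grid_b),
                    key=lambda ab: error_rate(train, *ab))
    return min(ranked[:keep], key=lambda ab: error_rate(test, *ab))
```

For example, `pair_features("asked", "asking")` gives `(3, 5, 11)`: the common initial part is "ask", five letters differ in the final parts, and the two words have eleven letters in total. Choosing the final model on a sample that was not used for ranking reflects the role of the test sample mentioned in the abstract.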


Empirical Formula; Word Pair; Automatic Documentation; Mathematical Linguistics; External Criterion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Gelbukh, A. (1992): Effective implementation of morphology model for an inflectional natural language. J. Automatic Documentation and Mathematical Linguistics, Allerton Press, 26, NI, pp. 22–31.
  2. Gelbukh, A. (ed.) (2000): Computational Linguistics and Intelligent Text Processing. Proc. of CICLing-2002, IPN, Mexico City, 2002, 430 pp.
  3. Gelbukh, A. and G. Sidorov (2002): A Method for Development of Automatic Morphological Analysis Systems for Inflective Languages. In: Text, Speech and Dialogue, Proc. of TSD-2002, September 2002, Lecture Notes in Computer Science, Springer Verlag.
  4. Gelbukh, A. (2002): A data structure for prefix search under access locality requirements and its application to spelling correction. J. Computación y Sistemas.
  5. Ivahnenko, A. (1980): Manual on typical algorithms of modeling. Tehnika Publ., Kiev (in Russian).
  6. Makagonov, P., Alexandrov, M. (2002): Empirical formula for testing word similarity and its application for constructing a word frequency list. In: A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, Proc. of CICLing-2002, Lecture Notes in Computer Science, N 2276, Springer Verlag, pp. 425–432.
  7. Manning, C. D., Schütze, H. (1999): Foundations of statistical natural language processing. MIT Press.
  8. Porter, M. (1980): An algorithm for suffix stripping. Program, 14, pp. 130–137.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Pavel Makagonov (1)
  • Mikhail Alexandrov (2)

  1. Moscow Mayor’s Directorate, Moscow City Government, Moscow, Russia
  2. Center for Computing Research, National Polytechnic Institute (IPN), Mexico
