Abstract
This article presents a combination of unsupervised and supervised learning techniques for the generation of word segmentation rules from a raw list of words. First, a language bias for word segmentation is introduced, and a simple genetic algorithm is used to search for the segmentation with the best bias value. In the second phase, the words segmented by the genetic algorithm are used as input for the first-order decision list learner CLOG. The result is a set of first-order rules which can be used to segment unseen words. When applied to either the training data or unseen data, these rules produce segmentations which are linguistically meaningful and, to a large degree, conform to the annotation provided.
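The first, unsupervised phase can be pictured as a genetic search over candidate segmentations. The sketch below is illustrative only: it encodes each word as a single prefix/suffix split point and uses a simple compression-style bias (total characters in the combined prefix and suffix lexicons) as a stand-in for the language bias described in the article; the word list, fitness function, and all parameter values are hypothetical.

```python
import random

# Hypothetical toy vocabulary; the article works from a raw word list.
WORDS = ["walked", "talked", "walking", "talking", "jumped", "jumping"]

def random_individual(words):
    # One split point per word (1 .. len-1), i.e. word = prefix + suffix.
    return [random.randint(1, len(w) - 1) for w in words]

def fitness(individual, words):
    # Assumed compression-style bias: segmentations that reuse a small
    # set of prefixes and suffixes across the lexicon score higher.
    prefixes = {w[:s] for w, s in zip(words, individual)}
    suffixes = {w[s:] for w, s in zip(words, individual)}
    cost = sum(map(len, prefixes)) + sum(map(len, suffixes))
    return -cost  # higher fitness = more compact lexicons

def crossover(a, b):
    # Single-point crossover over the vector of split points.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(individual, words, rate=0.1):
    # Occasionally resample a word's split point.
    return [random.randint(1, len(w) - 1) if random.random() < rate else s
            for w, s in zip(words, individual)]

def evolve(words, pop_size=30, generations=200):
    pop = [random_individual(words) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, words), reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)), words)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda ind: fitness(ind, words))
    return [(w[:s], w[s:]) for w, s in zip(words, best)]

if __name__ == "__main__":
    random.seed(0)
    for prefix, suffix in evolve(WORDS):
        print(f"{prefix}-{suffix}")
```

In the article's pipeline, the segmented words produced by such a search would then serve as training examples for CLOG, which induces first-order rules covering unseen words; that supervised step is not sketched here.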
Kazakov, D., Manandhar, S. Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming. Machine Learning 43, 121–162 (2001). https://doi.org/10.1023/A:1007629103294