Abstract
This article presents a combination of unsupervised and supervised learning techniques for the generation of word segmentation rules from a raw list of words. First, a language bias for word segmentation is introduced, and a simple genetic algorithm is used to search for the segmentation with the best bias value. In the second phase, the words segmented by the genetic algorithm are used as input for the first-order decision list learner CLOG. The result is a set of first-order rules which can be used to segment unseen words. When applied to either the training data or unseen data, these rules produce segmentations which are linguistically meaningful and, to a large degree, conform to the annotation provided.
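The first, unsupervised phase can be pictured as a genetic search over candidate segmentations. The sketch below is illustrative only: it encodes each word as a single prefix/suffix split point and uses a simple compression-style bias (total characters in the combined prefix and suffix lexicons) as a stand-in for the language bias described in the article; the word list, fitness function, and all parameter values are hypothetical.

```python
import random

# Hypothetical toy vocabulary; the article works from a raw word list.
WORDS = ["walked", "talked", "walking", "talking", "jumped", "jumping"]

def random_individual(words):
    # One split point per word (1 .. len-1), i.e. word = prefix + suffix.
    return [random.randint(1, len(w) - 1) for w in words]

def fitness(individual, words):
    # Assumed compression-style bias: segmentations that reuse a small
    # set of prefixes and suffixes across the lexicon score higher.
    prefixes = {w[:s] for w, s in zip(words, individual)}
    suffixes = {w[s:] for w, s in zip(words, individual)}
    cost = sum(map(len, prefixes)) + sum(map(len, suffixes))
    return -cost  # higher fitness = more compact lexicons

def crossover(a, b):
    # Single-point crossover over the vector of split points.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def mutate(individual, words, rate=0.1):
    # Occasionally resample a word's split point.
    return [random.randint(1, len(w) - 1) if random.random() < rate else s
            for w, s in zip(words, individual)]

def evolve(words, pop_size=30, generations=200):
    pop = [random_individual(words) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, words), reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)), words)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda ind: fitness(ind, words))
    return [(w[:s], w[s:]) for w, s in zip(words, best)]

if __name__ == "__main__":
    random.seed(0)
    for prefix, suffix in evolve(WORDS):
        print(f"{prefix}-{suffix}")
```

In the article's pipeline, the segmented words produced by such a search would then serve as training examples for CLOG, which induces first-order rules covering unseen words; that supervised step is not sketched here.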
Kazakov, D., Manandhar, S. Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming. Machine Learning 43, 121–162 (2001). https://doi.org/10.1023/A:1007629103294