Abstract
In this paper, we investigate the application of machine learning methods to natural language data and show how the rules induced by machine learning change after the original data are compressed by grouping together entries, attributes, and attribute values. We also show how excessive compression of the input data may affect the accuracy of the induced rules.
Cite this article
Grzymala-Busse, J.W., Than, S. Data compression in machine learning applied to natural language. Behavior Research Methods, Instruments, & Computers 25, 318–321 (1993). https://doi.org/10.3758/BF03204518