LEFT–Logical Expressions Feature Transformation: A Framework for Transformation of Symbolic Features
Abstract
The accuracy of a classifier relies heavily on how its input data are encoded and represented. Many machine learning algorithms require input vectors composed of numeric values, to which arithmetic and comparison operators can be applied. However, many real-life applications involve symbolic, or ‘nominal’, data, for which these operators are not available. This paper presents a framework called logical expression feature transformation (LEFT), which maps symbolic attributes to a continuous domain for further processing by a learning machine. LEFT is generic: it can be used with any suitable clustering method and any appropriate distance metric. The proposed method was tested on synthetic and real-life datasets. The results show that the framework not only achieves dimensionality reduction but also improves classifier accuracy.
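The general idea of pairing a clustering step with a distance metric to turn symbolic attributes into a small number of continuous features can be illustrated with a minimal sketch. This is not the LEFT algorithm: its logical-expression construction is not described in this abstract and is omitted here. As stand-ins for "any suitable clustering method" and "any appropriate distance metric", the sketch assumes a simple k-modes clustering and the simple matching dissimilarity. Each symbolic row is then represented by its dissimilarity to each cluster centre. The function names `mismatch`, `k_modes`, and `transform` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Sequence[str]


def mismatch(a: Row, b: Row) -> float:
    """Simple matching dissimilarity: fraction of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_modes(rows: List[Row], k: int, iters: int = 10) -> List[Tuple[str, ...]]:
    """Minimal k-modes clustering, a hypothetical stand-in for 'any suitable clustering method'.

    Centres start as the first k distinct rows, so results depend on row order.
    """
    centres: List[Tuple[str, ...]] = []
    for r in rows:
        if tuple(r) not in centres:
            centres.append(tuple(r))
        if len(centres) == k:
            break
    for _ in range(iters):
        # Assign every row to its nearest centre (ties go to the lower index).
        groups: List[List[Row]] = [[] for _ in centres]
        for r in rows:
            j = min(range(len(centres)), key=lambda c: mismatch(r, centres[c]))
            groups[j].append(r)
        # Recompute each centre as the per-attribute mode of its members;
        # an empty cluster keeps its previous centre.
        updated = [
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*g)) if g else c
            for c, g in zip(centres, groups)
        ]
        if updated == centres:
            break  # converged
        centres = updated
    return centres


def transform(rows: List[Row], centres: List[Tuple[str, ...]]) -> List[List[float]]:
    """Map each symbolic row to k continuous features: its dissimilarity to each centre."""
    return [[mismatch(r, c) for c in centres] for r in rows]


# Toy symbolic data (colour, shape, texture); purely illustrative.
rows = [
    ("red", "round", "smooth"),
    ("green", "long", "rough"),
    ("red", "round", "rough"),
    ("green", "long", "smooth"),
    ("red", "oval", "smooth"),
    ("green", "oval", "rough"),
]
centres = k_modes(rows, k=2)
features = transform(rows, centres)
one_hot_width = sum(len(set(col)) for col in zip(*rows))  # width of a binary (one-hot) encoding
print(one_hot_width, "->", len(centres))  # 7 -> 2
print(features[2])  # [0.3333333333333333, 0.6666666666666666]
```

In this toy case, a binary (one-hot) encoding would need 7 features. The distance-to-centre mapping yields 2 continuous features, one per cluster. That is the kind of dimensionality reduction the abstract refers to, although the actual reduction depends on the number of clusters chosen.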
Keywords
Feature Vector, Symbolic Data, Logical Expression, Binary Encode, Breast Cancer Dataset