Abstract
Memory-based reasoning and k-nearest neighbor searching are frequently adopted data mining techniques, but they scale poorly to large data sets. Indexing is a promising solution. However, categorical attributes are difficult to index, because the categories of a nominal attribute have no linear ordering. In this paper, we propose heuristic algorithms that map categories to numbers while preserving as many of the distance relationships among categories as possible. We empirically study the performance of the algorithms under different distance situations.
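To make the idea of a distance-preserving mapping concrete, the following is a minimal sketch and not the heuristic algorithms proposed in the paper, which the abstract does not spell out. It assumes a symmetric table of pairwise distances between categories is given, and it substitutes a FastMap-style pivot projection: the two farthest categories serve as pivots, and every category is placed on the line through them. The function name `map_categories_to_numbers`, the example categories, and the distance values are all hypothetical.

```python
from itertools import combinations


def map_categories_to_numbers(categories, dist):
    """Map each category to a number with a FastMap-style pivot projection.

    `dist` maps frozenset({x, y}) to the distance between two distinct
    categories. This is an illustrative stand-in, not the paper's method.
    """
    def d(x, y):
        # Distance lookup; a category is at distance 0 from itself.
        return 0.0 if x == y else dist[frozenset((x, y))]

    # Use the two categories that are farthest apart as pivots.
    a, b = max(combinations(categories, 2), key=lambda pair: d(*pair))
    d_ab = d(a, b)
    if d_ab == 0:
        raise ValueError("all categories coincide; no ordering to recover")

    # Project every category onto the line through the pivots (cosine law).
    return {
        c: (d(a, c) ** 2 + d_ab ** 2 - d(b, c) ** 2) / (2 * d_ab)
        for c in categories
    }


# Hypothetical example: three categories with made-up pairwise distances.
colors = ["red", "orange", "blue"]
distances = {
    frozenset(("red", "orange")): 1,
    frozenset(("orange", "blue")): 3,
    frozenset(("red", "blue")): 4,
}
mapping = map_categories_to_numbers(colors, distances)
print(mapping)
```

In this toy case the three distances happen to fit on a line exactly, so the mapped numbers reproduce every pairwise distance. In general a one-dimensional mapping can only preserve some of the relationships, which is why the abstract speaks of preserving as many as possible. Once categories are numbers, an ordinary ordered index can be built over the attribute.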
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kuo, HC., Lin, YS., Huang, JP. (2004). Distance Preserving Mapping from Categories to Numbers for Indexing. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science(), vol 3214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30133-2_166
DOI: https://doi.org/10.1007/978-3-540-30133-2_166
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23206-3
Online ISBN: 978-3-540-30133-2
eBook Packages: Springer Book Archive