Abstract
Clustering of data is a difficult problem that arises in many fields and applications. The challenge grows as the dimensionality of the input space increases and as feature scales differ from one another. Hierarchical clustering methods are more flexible than their partitioning counterparts, as they do not need the number of clusters as input. Still, plain hierarchical clustering does not provide a satisfactory framework for extracting meaningful results in such cases. Major drawbacks have to be tackled, such as the curse of dimensionality and initial error propagation, as well as complexity and data set size issues. In this paper we propose an unsupervised extension to hierarchical clustering by means of feature selection, in order to overcome the first drawback, thus increasing the robustness of the whole algorithm. The results of applying this clustering to a portion of the data set in question are then refined and extended to the whole data set through a classification step, using the k-nearest neighbor classification technique, in order to tackle the latter two problems. The performance of the proposed methodology is demonstrated through its application to a variety of well-known, publicly available data sets.
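The two-stage idea in the abstract (cluster a portion of the data hierarchically, then extend the labels to the rest with k-nearest neighbor voting) can be illustrated with a minimal sketch. This is not the authors' algorithm: the feature selection and refinement steps are omitted, and the average-linkage criterion, the Euclidean distance, the `distance_threshold` stopping rule, the choice of the first `sample_size` points as the clustered portion, and all function names are assumptions made only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, distance_threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    Merging stops once the two closest clusters are farther apart than
    distance_threshold, so the number of clusters is not an input.
    Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > distance_threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def knn_label(query, sample, sample_labels, k=3):
    """Majority vote among the k sample points nearest to query."""
    order = sorted(range(len(sample)), key=lambda i: euclidean(query, sample[i]))
    votes = Counter(sample_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cluster_then_extend(data, sample_size, distance_threshold, k=3):
    """Cluster the first sample_size points, then label the rest by k-NN."""
    sample = data[:sample_size]
    sample_labels = agglomerative(sample, distance_threshold)
    rest = [knn_label(p, sample, sample_labels, k) for p in data[sample_size:]]
    return sample_labels + rest


# Toy data: two well-separated groups; the last two points are outside the sample.
data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9),
        (0.2, 0.1), (4.9, 5.2), (0.05, 0.1), (5.2, 5.1)]
labels = cluster_then_extend(data, sample_size=6, distance_threshold=2.0, k=3)
print(labels)
```

Only the sampled portion goes through the quadratic-cost pairwise merging, while each remaining point needs just one pass over the sample. This is one plausible reading of how a classification step could address the complexity and data set size issues the abstract mentions.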
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mylonas, P., Wallace, M., Kollias, S. (2004). Using k-Nearest Neighbor and Feature Selection as an Improvement to Hierarchical Clustering. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science, vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_21
DOI: https://doi.org/10.1007/978-3-540-24674-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21937-8
Online ISBN: 978-3-540-24674-9
eBook Packages: Springer Book Archive