Abstract
Clustering is a powerful exploratory technique for extracting knowledge from data. Many existing clustering techniques require the number of clusters to be specified in advance. In contrast, triangular kernel-nearest neighbor-based clustering (TKNN) has been shown to determine both the number of clusters and their members automatically. TKNN provides good solutions for clustering non-spherical and high-dimensional data without prior knowledge of data labels. However, there is no definitive measure for evaluating the accuracy of a clustering result. To evaluate the performance of the proposed TKNN clustering algorithm, we therefore used various benchmark classification datasets. TKNN is proposed for discovering true clusters of arbitrary shape, size and density contained in the datasets. The experimental results on benchmark datasets showed the effectiveness of our technique: the proposed TKNN achieved more accurate clustering results and required less processing time than k-means, ILGC, DBSCAN and KFCM.
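The abstract does not give the TKNN procedure itself, so the following is only a minimal sketch of the two ingredients its name suggests: a triangular kernel and a bandwidth taken from nearest-neighbor distances. The function names, the choice of the k-th neighbor distance as bandwidth, and the normalization are assumptions made for illustration, not the authors' algorithm.

```python
import math


def triangular_kernel(u):
    """Triangular kernel: 1 - |u| for |u| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(u))


def knn_triangular_density(points, k):
    """Density score per point (up to a constant factor).

    Assumption: the bandwidth of each point is the distance to its
    k-th nearest neighbor, so dense regions get narrow kernels.
    """
    n = len(points)
    dim = len(points[0])
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        h = max(dists[k - 1], 1e-12)  # guard against duplicate points
        weight = sum(triangular_kernel(d / h) for d in dists)
        scores.append(weight / (n * h ** dim))
    return scores


# Four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_triangular_density(points, k=3)
```

In this toy example the distant point receives a much lower score than the four square corners, which is the kind of signal a density-based method could use to separate clusters from isolated points.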
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Musdholifah, A., Hashim, S.Z.M. (2013). Triangular Kernel Nearest-Neighbor-Based Clustering Algorithm for Discovering True Clusters. In: Washio, T., Luo, J. (eds) Emerging Trends in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36778-6_11
DOI: https://doi.org/10.1007/978-3-642-36778-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36777-9
Online ISBN: 978-3-642-36778-6
eBook Packages: Computer Science, Computer Science (R0)