The European Information Society pp 135-157 | Cite as
A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means
The overarching objective of this paper is to introduce a novel Fast, Efficient, and Scalable k-means k-means (FES-k-means*) algorithm. This algorithm is designed to increase the overall performance of the standard k-means clustering technique. The FES-k-means* algorithm uses a hybrid approach that comprises the k-d tree data structure, the nearest neighbor query, the standard k-means algorithm, and Mashor’s adaptation rate. The algorithm is tested using two real datasets and two synthetic datasets and is employed twice on all four datasets. The first trial consisted of previously MIL-SOM* trained data, and the second was on raw, untrained data. The approach presented with this method enables unfounded knowledge discovery, otherwise unclaimed by conventional clustering methods. When used in conjunction with the MIL-SOM* training technique, theFES-k-means* algorithm reduces the computation time and produces quality clusters. In particular, the robust FES-k-means* method opens doors to (1) faster cluster production than conventional clustering methods, (2) scalability allowing application in other platforms, and its ability to handle small and large datasets, compact or scattered, and (3) efficient geospatial data analysis of large datasets. All of the above makes FES-k-means* live up to defending its well-deserved name—Fast, Efficient, and Scalable k-means (FES-k-means*). The findings of this study are vital to the relatively new and expanding subfield of geospatial data management.
Keywords
spatial data data mining clustering techniques large datasets kmeansPreview
Unable to display preview. Download preview PDF.
References
- Alsabti, K., S. Ranka, V. Singh. (1998). An efficient k-means clustering algorithm. Proceedings in IPPS: 11th International Parallel Processing Symposium Workshop on High Performance Data Mining, IEEE, Computer Society Press.Google Scholar
- Bacć ´o, F., Lobo, V., and M. Painho. (2004) Geo-Self-Organizing Map (Geo-SOM) for Building and Exploring Homogeneous Regions. In Egenhofer, M.J., Freskes, C. and Miller, H.J., (eds.) Geographical Information Science. Proceedings of Third International Conference, GIScience. Adelphi, MD, USA. October 20–23, 2004. Springer-Verlay Berlin Heidelberg.Google Scholar
- Ball, G. H., and D. J. Hall. (1967) A clustering technique for summarizing multivariate data. Behavioral Sciences. 12(2):153–55.CrossRefGoogle Scholar
- Birant, D. and A. Kut. (2007) ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering. Elsevier. 601(1): 208–221.Google Scholar
- Chinrungrueng C., and C. H. Sequin. (1995) Optimal adaptive k-means algorithm with dynamic adjustment of learning rate. IEEE Transactions on neural networks. 6(1): 157–69.CrossRefGoogle Scholar
- Ester, M., H.P. Kriegel, J. Sander, and X. Xu. (1996). A density based algorithm for discovering clusters in large geospatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon.Google Scholar
- Guha, S., R. Rastogi, and K. Shim. (1998) Cure: An efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Seattle, WA, June). ACM Press, New York, NY.Google Scholar
- Jain, A.K., M.N. Murty, and P.J. Flynn. (1999). Data clustering: a review. ACM Computing Surveys. 31(3): 264–323.CrossRefGoogle Scholar
- Kanungo, T., D.M. Mount, N. S. Netanyahu, et al. (2002). An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on pattern analysis and machine intelligence. 24(7): 881–92.CrossRefGoogle Scholar
- Karypis, G., E.H Han, V. Kumar. (1999) Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer Special Issue on Data Analysis and Mining. 32(8): 68–75.Google Scholar
- Kim, K. and E. Yamashita (2007)Using a k-means clustering algorithm to examine patterns of pedestrian involved crashes in Honolulu, Hawaii. Journal of Advanced Transportation. 41(1): 60–89.CrossRefGoogle Scholar
- Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE. 78(9): 1464–80.pGoogle Scholar
- Krishna, K. and M. Narasimha. (1999) Genetic k-means algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part B. 29(3): 433–439.CrossRefGoogle Scholar
- Kruskal, J.B. (1964) Non-metric multidimensional scaling: a numerical method. Psychometrika 29: 115–29.CrossRefGoogle Scholar
- Likas A., N. Vlassis, and J.J. Verbeek. (2003) The global k-means clustering algorithm. Pattern Recognition. 36: 451–61.CrossRefGoogle Scholar
- MacQueen, J.B. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory, University of California. 282–97.Google Scholar
- Mashor, M.Y. (1998) Improving the performance of k-means clustering algorithm to position the centers of RBF network. International Journal of the Computer, The Internet and Management. 6(2).Google Scholar
- Ng, R. and J. Han. (1994) Efficient and Effective Clustering methods for spatial data mining. In Proceedings of the 20th International Conference of ery Large Data Bases (VLDB), September 1994, Santiago, Chile. 144–55.Google Scholar
- Oyana TJ, Achenie LEK, Cuadros-Vargas E, Rivers PA, and Scott KE. (2006) A Mathematical Improvement of the Self-Organizing Map Algorithm. Chapter 8: ICT and Mathematical Modelling (pp 522–531). In Mwakali J.A. and Taban-Wani G. (eds.): Advance in Engineering and Technology, London: Elsevier Ltd. 847.Google Scholar
- Pham D.T., S.S. Dimov, C.D. Nguyen. (2004) A two-phase k-means algorithm for large datasets. Proc. Instn Mech. Engrs, Part C: Journal Mechanical Engineering Science. 218(C). 1269–73.Google Scholar
- Roussopoulos N., S. Kelley, F. Vincent. (1995) Nearest Neighbor Queries. In Proceedings of Special Interest Group on Management of data (SIGMOD), San Jose, CA, USA. 71–79.Google Scholar
- Sammon, J. W. (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on Computers. 18(5): 401–09.CrossRefGoogle Scholar
- Song, M. and S. Rajasekaran. (2005) Finding frequent item sets by transaction mapping. Symposium on Applied Computing. 488–492.Google Scholar
- Steinley, D. (2006) K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology. 59: 1–34.CrossRefGoogle Scholar
- Su, M. and H. T. Chang. (2000) Fast self-organizing feature map algorithm. IEEE transactions on neural network. 11(3): 721–33.CrossRefGoogle Scholar
- Theodoridis, S. and K. Koutroumbas. (2003) Pattern Recognition (2ed). Elsevier Science (USA). China machine press. 431–535.Google Scholar
- Tsai, C.F., C.W. Tsai, H.C. Wu, and T. Yang.(2004) ACODF: a novel data clustering approach for data mining in large databases. Journal of Systems and Software. 73(1): 133–145.CrossRefGoogle Scholar
- Vesanto, J. and E. Alhoniemi. (2000) Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks. 11(3): 586–600.CrossRefGoogle Scholar
- Vrahatis, M. N., B. Boutsinas, P. Alevizos, and G. Pavlides. (2002) The new k-windows algorithm for improving the k-means clustering algorithm. Journal of Complexity. 18. 375–91.CrossRefGoogle Scholar
- Wang, L., L. Bo, L. Jiao. (2006) A modified k-means clustering with a density-sensitive distance metric. Lecture notes in Computer Science. Springer Berlin , Heidelber. 4062: 544–551.Google Scholar
- Xu, R and D. Wunsch II. (2005) Survey of Clustering Algorithms. IEEE Transactions on Neural Networks. 16(3): 645–78.CrossRefGoogle Scholar
- Zhang T., R. Ramakrishnan, M. Livny. (1996) BIRCH: An efficient data clustering method for very large databases. Proc. ACM SIGMOD Conf. Management of Data. 103–14.Google Scholar