A Geospatial Implementation of a Novel Delineation Clustering Algorithm Employing the K-means

  • Tonny J. Oyana
  • Kara E. Scott
Part of the Lecture Notes in Geoinformation and Cartography book series (LNGC)

The overarching objective of this paper is to introduce a novel Fast, Efficient, and Scalable k-means k-means (FES-k-means*) algorithm. This algorithm is designed to increase the overall performance of the standard k-means clustering technique. The FES-k-means* algorithm uses a hybrid approach that comprises the k-d tree data structure, the nearest neighbor query, the standard k-means algorithm, and Mashor’s adaptation rate. The algorithm is tested using two real datasets and two synthetic datasets and is employed twice on all four datasets. The first trial consisted of previously MIL-SOM* trained data, and the second was on raw, untrained data. The approach presented with this method enables unfounded knowledge discovery, otherwise unclaimed by conventional clustering methods. When used in conjunction with the MIL-SOM* training technique, theFES-k-means* algorithm reduces the computation time and produces quality clusters. In particular, the robust FES-k-means* method opens doors to (1) faster cluster production than conventional clustering methods, (2) scalability allowing application in other platforms, and its ability to handle small and large datasets, compact or scattered, and (3) efficient geospatial data analysis of large datasets. All of the above makes FES-k-means* live up to defending its well-deserved name—Fast, Efficient, and Scalable k-means (FES-k-means*). The findings of this study are vital to the relatively new and expanding subfield of geospatial data management.

Keywords

spatial data data mining clustering techniques large datasets kmeans 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alsabti, K., S. Ranka, V. Singh. (1998). An efficient k-means clustering algorithm. Proceedings in IPPS: 11th International Parallel Processing Symposium Workshop on High Performance Data Mining, IEEE, Computer Society Press.Google Scholar
  2. Bacć ´o, F., Lobo, V., and M. Painho. (2004) Geo-Self-Organizing Map (Geo-SOM) for Building and Exploring Homogeneous Regions. In Egenhofer, M.J., Freskes, C. and Miller, H.J., (eds.) Geographical Information Science. Proceedings of Third International Conference, GIScience. Adelphi, MD, USA. October 20–23, 2004. Springer-Verlay Berlin Heidelberg.Google Scholar
  3. Ball, G. H., and D. J. Hall. (1967) A clustering technique for summarizing multivariate data. Behavioral Sciences. 12(2):153–55.CrossRefGoogle Scholar
  4. Birant, D. and A. Kut. (2007) ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data & Knowledge Engineering. Elsevier. 601(1): 208–221.Google Scholar
  5. Chinrungrueng C., and C. H. Sequin. (1995) Optimal adaptive k-means algorithm with dynamic adjustment of learning rate. IEEE Transactions on neural networks. 6(1): 157–69.CrossRefGoogle Scholar
  6. Ester, M., H.P. Kriegel, J. Sander, and X. Xu. (1996). A density based algorithm for discovering clusters in large geospatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon.Google Scholar
  7. Guha, S., R. Rastogi, and K. Shim. (1998) Cure: An efficient clustering algorithm for large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Seattle, WA, June). ACM Press, New York, NY.Google Scholar
  8. Jain, A.K., M.N. Murty, and P.J. Flynn. (1999). Data clustering: a review. ACM Computing Surveys. 31(3): 264–323.CrossRefGoogle Scholar
  9. Kanungo, T., D.M. Mount, N. S. Netanyahu, et al. (2002). An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on pattern analysis and machine intelligence. 24(7): 881–92.CrossRefGoogle Scholar
  10. Karypis, G., E.H Han, V. Kumar. (1999) Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer Special Issue on Data Analysis and Mining. 32(8): 68–75.Google Scholar
  11. Kim, K. and E. Yamashita (2007)Using a k-means clustering algorithm to examine patterns of pedestrian involved crashes in Honolulu, Hawaii. Journal of Advanced Transportation. 41(1): 60–89.CrossRefGoogle Scholar
  12. Kohonen, T. (1990) The self-organizing map. Proceedings of the IEEE. 78(9): 1464–80.pGoogle Scholar
  13. Krishna, K. and M. Narasimha. (1999) Genetic k-means algorithm. IEEE Transactions on Systems, Man and Cybernetics, Part B. 29(3): 433–439.CrossRefGoogle Scholar
  14. Kruskal, J.B. (1964) Non-metric multidimensional scaling: a numerical method. Psychometrika 29: 115–29.CrossRefGoogle Scholar
  15. Likas A., N. Vlassis, and J.J. Verbeek. (2003) The global k-means clustering algorithm. Pattern Recognition. 36: 451–61.CrossRefGoogle Scholar
  16. MacQueen, J.B. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory, University of California. 282–97.Google Scholar
  17. Mashor, M.Y. (1998) Improving the performance of k-means clustering algorithm to position the centers of RBF network. International Journal of the Computer, The Internet and Management. 6(2).Google Scholar
  18. Ng, R. and J. Han. (1994) Efficient and Effective Clustering methods for spatial data mining. In Proceedings of the 20th International Conference of ery Large Data Bases (VLDB), September 1994, Santiago, Chile. 144–55.Google Scholar
  19. Oyana TJ, Achenie LEK, Cuadros-Vargas E, Rivers PA, and Scott KE. (2006) A Mathematical Improvement of the Self-Organizing Map Algorithm. Chapter 8: ICT and Mathematical Modelling (pp 522–531). In Mwakali J.A. and Taban-Wani G. (eds.): Advance in Engineering and Technology, London: Elsevier Ltd. 847.Google Scholar
  20. Pham D.T., S.S. Dimov, C.D. Nguyen. (2004) A two-phase k-means algorithm for large datasets. Proc. Instn Mech. Engrs, Part C: Journal Mechanical Engineering Science. 218(C). 1269–73.Google Scholar
  21. Roussopoulos N., S. Kelley, F. Vincent. (1995) Nearest Neighbor Queries. In Proceedings of Special Interest Group on Management of data (SIGMOD), San Jose, CA, USA. 71–79.Google Scholar
  22. Sammon, J. W. (1969) A nonlinear mapping for data structure analysis. IEEE Transactions on Computers. 18(5): 401–09.CrossRefGoogle Scholar
  23. Song, M. and S. Rajasekaran. (2005) Finding frequent item sets by transaction mapping. Symposium on Applied Computing. 488–492.Google Scholar
  24. Steinley, D. (2006) K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology. 59: 1–34.CrossRefGoogle Scholar
  25. Su, M. and H. T. Chang. (2000) Fast self-organizing feature map algorithm. IEEE transactions on neural network. 11(3): 721–33.CrossRefGoogle Scholar
  26. Theodoridis, S. and K. Koutroumbas. (2003) Pattern Recognition (2ed). Elsevier Science (USA). China machine press. 431–535.Google Scholar
  27. Tsai, C.F., C.W. Tsai, H.C. Wu, and T. Yang.(2004) ACODF: a novel data clustering approach for data mining in large databases. Journal of Systems and Software. 73(1): 133–145.CrossRefGoogle Scholar
  28. Vesanto, J. and E. Alhoniemi. (2000) Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks. 11(3): 586–600.CrossRefGoogle Scholar
  29. Vrahatis, M. N., B. Boutsinas, P. Alevizos, and G. Pavlides. (2002) The new k-windows algorithm for improving the k-means clustering algorithm. Journal of Complexity. 18. 375–91.CrossRefGoogle Scholar
  30. Wang, L., L. Bo, L. Jiao. (2006) A modified k-means clustering with a density-sensitive distance metric. Lecture notes in Computer Science. Springer Berlin , Heidelber. 4062: 544–551.Google Scholar
  31. Xu, R and D. Wunsch II. (2005) Survey of Clustering Algorithms. IEEE Transactions on Neural Networks. 16(3): 645–78.CrossRefGoogle Scholar
  32. Zhang T., R. Ramakrishnan, M. Livny. (1996) BIRCH: An efficient data clustering method for very large databases. Proc. ACM SIGMOD Conf. Management of Data. 103–14.Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Tonny J. Oyana
    • 1
  • Kara E. Scott
    • 1
  1. 1.Department of Geography and Environmental ResourcesSouthern Illinois UniversityMC 4514, CarbondaleUSA

Personalised recommendations