GeoInformatica, Volume 7, Issue 3, pp 229–253

ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata

  • Diansheng Guo
  • Donna J. Peuquet
  • Mark Gahegan
Article

Abstract

The unprecedented size and high dimensionality of existing geographic datasets make the complex patterns that potentially lurk in the data hard to find. Clustering is one of the most important techniques for geographic knowledge discovery. However, existing clustering methods have two severe drawbacks for this purpose. First, spatial clustering methods focus on the specific characteristics of distributions in 2-D or 3-D space, while general-purpose high-dimensional clustering methods have limited power in recognizing spatial patterns that involve neighbors. Second, clustering methods in general are not geared toward the human-computer interaction needed to effectively tease out complex patterns. In the current paper, an approach is proposed that opens up the “black box” of the clustering process for easy understanding, steering, focusing, and interpretation, and thus supports effective exploration of large and high-dimensional geographic data. The proposed approach builds a hierarchical spatial cluster structure within the high-dimensional feature space and uses this combined space to discover multi-dimensional (combined spatial and non-spatial) patterns with efficient computational clustering methods and highly interactive visualization techniques. More specifically, it integrates (1) a hierarchical spatial clustering method that generates a 1-D spatial cluster ordering preserving the hierarchical cluster structure, and (2) a density- and grid-based technique that supports the interactive identification of interesting subspaces and the subsequent search for clusters in each subspace. The proposed approach is implemented in a fully open and interactive manner, supported by various visualization techniques.

Keywords: geographic knowledge discovery; spatial clustering and ordering; hierarchical subspace clustering; visualization and interaction
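The abstract does not spell out how the 1-D spatial cluster ordering in item (1) is constructed. As a minimal sketch, assuming plain single-linkage merging over all pairwise Euclidean distances (a practical implementation would likely restrict candidate edges to a sparse neighbor graph rather than all O(n²) pairs, and the paper's own procedure may differ), one way to obtain an ordering in which every cluster of the hierarchy occupies a contiguous run is to merge clusters shortest-edge-first, Kruskal style, and concatenate their member lists at each merge. The function name `spatial_cluster_ordering` and the rule of orienting the two lists so the closest endpoints meet are hypothetical illustrations, not taken from the paper.

```python
import math
from itertools import combinations


def spatial_cluster_ordering(points):
    """Single-linkage merging of 2-D points into a 1-D ordering.

    Returns (order, merges):
      order  -- list of point indices; every cluster formed during merging
                occupies a contiguous run of this list.
      merges -- (merge_distance, member_indices) for each merge, in order.
    """
    n = len(points)
    if n == 0:
        return [], []
    # Simplification (assumption): consider all O(n^2) point pairs as
    # candidate edges, shortest first, as in Kruskal's MST algorithm.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))                # union-find forest
    sequence = {i: [i] for i in range(n)}  # cluster root -> ordered members

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def gap(pair):
        # Distance between the two points that would become adjacent.
        left, right = pair
        return math.dist(points[left[-1]], points[right[0]])

    merges = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # both endpoints already in the same cluster
        a, b = sequence.pop(ri), sequence.pop(rj)
        # Reversing a list keeps its sub-clusters contiguous, so we may pick
        # the orientation whose junction points are closest (hypothetical rule).
        left, right = min(
            [(a, b), (a, b[::-1]), (a[::-1], b), (a[::-1], b[::-1])], key=gap
        )
        parent[rj] = ri
        sequence[ri] = left + right
        merges.append((d, list(sequence[ri])))
        if len(sequence[ri]) == n:
            break  # all points joined: hierarchy complete
    (order,) = sequence.values()
    return order, merges


# Two well-separated groups: {0, 1, 4} near the origin and {2, 3} near (10, 10).
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 2)]
order, merges = spatial_cluster_ordering(pts)
print(order)  # → [3, 2, 4, 1, 0]
```

Reading `merges` from the last entry backward walks the hierarchy from coarse to fine, and each recorded cluster can be highlighted as one contiguous segment of `order`. That contiguity is what would let a single 1-D axis stand in for the spatial hierarchy when it is placed alongside non-spatial attributes in a linked visualization.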

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Diansheng Guo (1)
  • Donna J. Peuquet (1)
  • Mark Gahegan (1)
  1. Department of Geography and GeoVISTA Center, Pennsylvania State University, University Park