
Journal of Intelligent Information Systems, Volume 27, Issue 3, pp 243–266

Spatial ordering and encoding for geographic data mining and visualization

  • Diansheng Guo
  • Mark Gahegan
Article

Abstract

Geographic information (e.g., locations, networks, and nearest neighbors) is unique and differs from aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining and visualization to take into account both the geographic information and multiple aspatial variables in the detection of patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that transform spatial relations into a one-dimensional ordering and encoding that preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is also amenable to processing together with aspatial variables by any existing (non-spatial) data mining method. We design a set of measures to evaluate nine different ordering/encoding methods: two space-filling curves, six hierarchical clustering-based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving spatial locality. Moreover, clustering-based methods can encode not only simple geographic locations (e.g., x and y coordinates) but also a wide range of other spatial relations (e.g., network distances or arbitrarily weighted graphs).
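
As a rough illustration of what a one-dimensional spatial ordering looks like, the sketch below sorts 2D points along a Morton (Z-order) space-filling curve. The abstract does not name the two space-filling curves that were evaluated, so the choice of the Morton curve is an assumption made for illustration, and the function names `interleave_bits` and `morton_order`, the 16-bit grid resolution, and the sample points are all hypothetical rather than taken from the article.

```python
def interleave_bits(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of ix and iy into one Morton key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)       # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return key


def morton_order(points, bits: int = 16):
    """Return the indices of `points` sorted along a Z-order curve.

    Coordinates are rescaled onto a square integer grid (an illustrative
    choice, not the article's procedure) before the keys are computed.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Use one common span so the grid stays square; avoid division by zero.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    cells = (1 << bits) - 1

    def key(i):
        ix = int((points[i][0] - min_x) / span * cells)
        iy = int((points[i][1] - min_y) / span * cells)
        return interleave_bits(ix, iy, bits)

    return sorted(range(len(points)), key=key)


# Hypothetical sample: two points near the origin, two near the far corner,
# and one in the upper-left corner.
points = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 9.8), (0.1, 9.9)]
order = morton_order(points)
print(order)
```

In the resulting ordering, points that are close in the plane tend to sit next to each other, which is the locality property the evaluation measures. A clustering-based ordering would replace the key function with a traversal of a dendrogram's leaves.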

Keywords

Spatial data mining, Spatio-temporal visualization, Space-filling curve, Hierarchical clustering, Linear ordering, Multidimensional scaling

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  1. Department of Geography, University of South Carolina, Columbia, USA
  2. GeoVISTA Center, Department of Geography, Pennsylvania State University, University Park, USA
