GAC-GEO: a generic agglomerative clustering framework for geo-referenced datasets

Jiamthapthaksin, Rachsuda; Eick, Christoph F.; Lee, Seungchan

doi:10.1007/s10115-010-0355-3


Regular Paper
Published: 12 November 2010

Volume 29, pages 597–628 (2011)

Knowledge and Information Systems

Rachsuda Jiamthapthaksin¹,², Christoph F. Eick² & Seungchan Lee²


Abstract

Major challenges in clustering geo-referenced data include identifying arbitrarily shaped clusters, properly utilizing spatial information, coping with diverse extrinsic characteristics of clusters, and supporting region discovery tasks. The goal of region discovery is to identify interesting regions in geo-referenced datasets based on a domain expert’s notion of interestingness. Almost all agglomerative clustering algorithms focus only on the first challenge. The goal of the proposed work is to develop agglomerative clustering frameworks that address all four challenges. In particular, we propose a generic agglomerative clustering framework for geo-referenced datasets (GAC-GEO) that generalizes agglomerative clustering by allowing for three plug-in components. GAC-GEO agglomerates neighboring clusters, maximizing a plug-in fitness function that captures the notion of interestingness of clusters. It enhances typical agglomerative clustering algorithms in two ways: fitness functions support task-specific clustering, whereas generic neighboring relationships increase the number of merging candidates. We also demonstrate that existing agglomerative clustering algorithms can be considered special cases of GAC-GEO. We evaluate the proposed framework on an artificial dataset and two real-world applications involving region discovery. The experimental results show that GAC-GEO is capable of identifying arbitrarily shaped hotspots for different data mining tasks.
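
The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`agglomerate`, `toy_fitness`, `within_distance`), the greedy "stop when no merge improves fitness" rule, the size-squared fitness, and the distance-threshold neighbor relation are all assumptions chosen for illustration. Only two plug-ins are shown, the fitness function and the neighboring relationship, since the abstract does not describe the third.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

Point = Tuple[float, float]
Cluster = FrozenSet[Point]


def agglomerate(
    clusters: List[Cluster],
    fitness: Callable[[List[Cluster]], float],
    are_neighbors: Callable[[Cluster, Cluster], bool],
) -> List[Cluster]:
    """Greedily merge neighboring clusters while the plug-in fitness improves.

    The stopping rule (no merge raises the fitness) is an assumption of
    this sketch, not a detail taken from the paper.
    """
    current = list(clusters)
    best_score = fitness(current)
    while True:
        best_merge = None
        # Only pairs accepted by the plug-in neighbor relation are candidates.
        for i, j in combinations(range(len(current)), 2):
            if not are_neighbors(current[i], current[j]):
                continue
            candidate = [c for k, c in enumerate(current) if k not in (i, j)]
            candidate.append(current[i] | current[j])
            score = fitness(candidate)
            if score > best_score:
                best_score, best_merge = score, candidate
        if best_merge is None:
            return current
        current = best_merge


def toy_fitness(clustering: List[Cluster]) -> float:
    """Hypothetical fitness: rewards larger clusters (sum of squared sizes)."""
    return float(sum(len(c) ** 2 for c in clustering))


def within_distance(eps: float) -> Callable[[Cluster, Cluster], bool]:
    """Hypothetical neighbor relation: some pair of points is at most eps apart."""
    def are_neighbors(a: Cluster, b: Cluster) -> bool:
        return any(math.dist(p, q) <= eps for p in a for q in b)
    return are_neighbors


# Two well-separated groups of points, each point starting as its own cluster.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
singletons = [frozenset([p]) for p in points]
regions = agglomerate(singletons, toy_fitness, within_distance(1.5))
print(sorted(len(c) for c in regions))
```

Because the toy fitness always rewards a merge, the neighbor relation alone decides where merging stops here. A task-specific fitness function would instead reject merges that dilute whatever a domain expert considers interesting.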



Author information

Authors and Affiliations

Computer Science Department, Faculty of Science and Technology, Assumption University, Bangkok, Thailand
Rachsuda Jiamthapthaksin
Computer Science Department, University of Houston, Houston, TX, USA
Rachsuda Jiamthapthaksin, Christoph F. Eick & Seungchan Lee


Corresponding author

Correspondence to Rachsuda Jiamthapthaksin.


About this article

Cite this article

Jiamthapthaksin, R., Eick, C.F. & Lee, S. GAC-GEO: a generic agglomerative clustering framework for geo-referenced datasets. Knowl Inf Syst 29, 597–628 (2011). https://doi.org/10.1007/s10115-010-0355-3


Received: 30 November 2009
Revised: 19 July 2010
Accepted: 22 October 2010
Published: 12 November 2010
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10115-010-0355-3
