Abstract
We provide in this article a branch-and-bound algorithm that solves the problem of finding the k closest pairs of points (p,q), p ∈ P,q ∈ Q, considering two sets of points in the euclidean plane P,Q stored in external memory assuming that only one of the sets has a spatial index. This problem arises naturally in many scenarios, for instance when the set without an index is the answer to a spatial query. The main idea of our algorithm is to partition the space occupied by the set without an index into several cells or subspaces and to make use of the properties of a set of metrics defined on two Minimum Bounding Rectangles (MBRs). We evaluated our algorithm for different values of k by means of a series of experiments that considered both synthetical and real world datasets. We compared the performance of our algorithm with that of techniques that either assume that both datasets have a spatial index or that none has an index. The results show that our algorithm needs only between a 0.3 and a 35 % of the disk accesses required by such techniques. Our algorithm also shows a good scalability, both in terms of k and of the size of the data set.
Similar content being viewed by others
Notes
1K = 1,000 points.
References
Blott S, Weber R (1997) A simple vector-approximation file for similarity search in high-dimensional vector spaces. Technical report, Institute of Information Systems, ETH Zentrum, Zurich, Switzerland
Brinkhoff T, Kriegel H-P, Seeger B (1993) Efficient processing of spatial joins using R-trees. In: ACM SIGMOD conference on management of data. ACM, Washington, DC, pp 237–246
Corral A (2002) Algoritmos para el procesamiento de consultas espaciales utilizando R-trees. La consulta de los pares más cercanos y su aplicación en bases de datos espaciales. PhD thesis, Universidad de Almería, Escuela Politécnica Superior, España
Corral A, Almendros-Jiménez JM (2007) A performance comparison of distance-based query algorithms using R-trees in spatial databases. Inf Sci 177:2207–2237
Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2004) Algorithms for processing k-closest-pair queries in spatial databases. Data Knowl Eng 49(1):67–104
Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2000) Closest pair queries in spatial databases. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 189–200
Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2006) Cost models for distance joins queries using r-trees. Data Knowl Eng 57:1–36
Gaede V, Günther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231
Günther O (1993) Efficient computation of spatial joins. In: Proceedings of the 9th international conference on data engineering. IEEE Computer Society., Washington, DC, pp 50–59
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD conference on management of data. ACM, pp 47–57
Hjaltason GR, Samet H (1998) Incremental distance join algorithms for spatial databases. In: ACM SIGMOD conference on management of data. Seattle, WA, pp 237–248
Huang Y-W, Jing N, Rundensteiner EA (1997) A cost model for estimating the performance of spatial joins using R-trees. In: SSDBM, pp 30–38
Jacox EH, Samet H (2007) Spatial join techniques. ACM Trans Database Syst 32
Mamoulis N, Papadias D (2003) Slot index spatial join. IEEE Trans Knowl Data Eng 15(1):211–231
Ming-Ling L, Chinya R (1994) Spatial joins using seeded trees. In: ACM SIGMOD conference on management of data. Minneapolis, Minnesota, pp 209–220
Ming-Ling L, Chinya R (1996) Spatial hash-joins. In: ACM SIGMOD conference on management of data. Montreal, Canada, pp 247–258
Patel JM, DeWitt DJ (1996) Partition based spatial-merge join. In: SIGMOD ’96: Proceedings of the 1996 ACM SIGMOD international conference on Management of data. ACM Press, New York, pp 259–270
Pincheira M, Gutierrez G, Gajardo L (2010) Closest pair query on spatial data sets without index. In: Ochoa SF, Meza F, Mery D, Cubillos C (eds) Proceedings of the XXIX international conference of the Chilean computer science society. IEEE Computer Society, Antofagasta, pp 178–182, 15–19 November 2010
Qiao S, Tang C, Peng J, Li H, Ni S (2008) Efficient k-closest-pair range-queries in spatial databases. In: Proceedings of the 2008 the 9th international conference on web-age information management, WAIM ’08. IEEE Computer Society, Washington, DC, pp 99–104
Sang-Wook K, Wan-Sup C, Min-Jae L, Kyu-Young W (1995) A new algorithm for processing joins using the multilevel grid file. In: Proceedings of the 4th international conference on database systems for advanced applications (DASFAA). World Scientific Press, pp 115–123
Shan J, Zhang D, Salzberg B (2003) On spatial-range closest-pair query. In: Hadzilacos T, Manolopoulos Y, Roddick J, Theodoridis Y (eds) Advances in spatial and temporal databases, vol 2750. Lecture notes in computer science. Springer, pp 252–269
Shekhar S, Chawla S (2003) Spatial databases—a tour. Prentice Hall
Acknowledgements
We are grateful to Miguel Pincheira for providing a software that simulates the algorithm in [18], to Antonio Corral for some kind suggestions and to two anonymous referees for many observations that helped us improve this article.
Author information
Authors and Affiliations
Corresponding author
Additional information
This work was partially supported by the research project DIUBB 073218 A/R, and was partially done in the context of a postdoctoral stay by the first author at the University of A Coruña (Spain), supported by project MECESUP UBB0704.
Rights and permissions
About this article
Cite this article
Gutiérrez, G., Sáez, P. The k closest pairs in spatial databases. Geoinformatica 17, 543–565 (2013). https://doi.org/10.1007/s10707-012-0169-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10707-012-0169-4