The k closest pairs in spatial databases

When only one set is indexed

GeoInformatica

Abstract

In this article we present a branch-and-bound algorithm for finding the k closest pairs of points (p, q), p ∈ P, q ∈ Q, where P and Q are two sets of points in the Euclidean plane stored in external memory and only one of the sets has a spatial index. This problem arises naturally in many scenarios, for instance when the set without an index is the answer to a spatial query. The main idea of our algorithm is to partition the space occupied by the unindexed set into several cells or subspaces and to exploit the properties of a set of metrics defined on two Minimum Bounding Rectangles (MBRs). We evaluated our algorithm for different values of k through a series of experiments on both synthetic and real-world datasets. We compared its performance with that of techniques that assume either that both datasets have a spatial index or that neither does. The results show that our algorithm needs only between 0.3 % and 35 % of the disk accesses required by such techniques. Our algorithm also scales well, both in terms of k and of the size of the data set.
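The pruning idea in the abstract can be illustrated with a small in-memory sketch. It is not the algorithm of the article, whose details are not given here. It assumes that Q plays the role of the unindexed set and is bucketed into a uniform grid, that sorted chunks of P stand in for the leaf nodes of a spatial index, and that a single MBR metric, the minimum possible distance between two rectangles, serves as the lower bound. All function names and the parameters `cell_size` and `leaf_size` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mbr_of(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def min_min_dist(a, b):
    """Lower bound on the distance between any point in MBR a and any in MBR b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def k_closest_pairs_brute(P, Q, k):
    """Reference answer: examine every pair (p, q)."""
    return heapq.nsmallest(k, ((dist(p, q), p, q) for p in P for q in Q))


def k_closest_pairs_pruned(P, Q, k, cell_size=4, leaf_size=4):
    """Same answer as the brute-force version, skipping far-apart groups."""
    if k <= 0 or not P or not Q:
        return []
    # Assumption: the unindexed set Q is partitioned into uniform grid cells.
    cells = defaultdict(list)
    for q in Q:
        key = (math.floor(q[0] / cell_size), math.floor(q[1] / cell_size))
        cells[key].append(q)
    cell_list = list(cells.values())
    # Assumption: sorted chunks of P stand in for the leaves of a spatial index.
    P_sorted = sorted(P)
    leaves = [P_sorted[i:i + leaf_size] for i in range(0, len(P_sorted), leaf_size)]
    # Visit (leaf, cell) combinations in increasing order of their lower bound.
    candidates = sorted(
        (min_min_dist(mbr_of(leaf), mbr_of(cell)), i, j)
        for i, leaf in enumerate(leaves)
        for j, cell in enumerate(cell_list)
    )
    best = []  # max-heap of at most k pairs, stored with negated distances
    for bound, i, j in candidates:
        # Every remaining combination has a bound at least this large, so
        # once the bound exceeds the current k-th distance we can stop.
        if len(best) == k and bound > -best[0][0]:
            break
        for p in leaves[i]:
            for q in cell_list[j]:
                d = dist(p, q)
                if len(best) < k:
                    heapq.heappush(best, (-d, p, q))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p, q))
    return sorted((-nd, p, q) for nd, p, q in best)
```

Because the lower bound never exceeds the true distance of any pair drawn from the two rectangles, stopping at the first bound larger than the current k-th distance cannot discard a better pair. In an external-memory setting each skipped combination would correspond to disk pages that are never read, which is the kind of saving the abstract measures in disk accesses.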

Notes

  1. 1K = 1,000 points.

References

  1. Blott S, Weber R (1997) A simple vector-approximation file for similarity search in high-dimensional vector spaces. Technical report, Institute of Information Systems, ETH Zentrum, Zurich, Switzerland

  2. Brinkhoff T, Kriegel H-P, Seeger B (1993) Efficient processing of spatial joins using R-trees. In: ACM SIGMOD conference on management of data. ACM, Washington, DC, pp 237–246

  3. Corral A (2002) Algoritmos para el procesamiento de consultas espaciales utilizando R-trees. La consulta de los pares más cercanos y su aplicación en bases de datos espaciales. PhD thesis, Universidad de Almería, Escuela Politécnica Superior, España

  4. Corral A, Almendros-Jiménez JM (2007) A performance comparison of distance-based query algorithms using R-trees in spatial databases. Inf Sci 177:2207–2237

  5. Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2004) Algorithms for processing k-closest-pair queries in spatial databases. Data Knowl Eng 49(1):67–104

  6. Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2000) Closest pair queries in spatial databases. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 189–200

  7. Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2006) Cost models for distance joins queries using R-trees. Data Knowl Eng 57:1–36

  8. Gaede V, Günther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231

  9. Günther O (1993) Efficient computation of spatial joins. In: Proceedings of the 9th international conference on data engineering. IEEE Computer Society, Washington, DC, pp 50–59

  10. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: ACM SIGMOD conference on management of data. ACM, pp 47–57

  11. Hjaltason GR, Samet H (1998) Incremental distance join algorithms for spatial databases. In: ACM SIGMOD conference on management of data. Seattle, WA, pp 237–248

  12. Huang Y-W, Jing N, Rundensteiner EA (1997) A cost model for estimating the performance of spatial joins using R-trees. In: SSDBM, pp 30–38

  13. Jacox EH, Samet H (2007) Spatial join techniques. ACM Trans Database Syst 32

  14. Mamoulis N, Papadias D (2003) Slot index spatial join. IEEE Trans Knowl Data Eng 15(1):211–231

  15. Lo M-L, Ravishankar CV (1994) Spatial joins using seeded trees. In: ACM SIGMOD conference on management of data. Minneapolis, Minnesota, pp 209–220

  16. Lo M-L, Ravishankar CV (1996) Spatial hash-joins. In: ACM SIGMOD conference on management of data. Montreal, Canada, pp 247–258

  17. Patel JM, DeWitt DJ (1996) Partition based spatial-merge join. In: SIGMOD ’96: Proceedings of the 1996 ACM SIGMOD international conference on Management of data. ACM Press, New York, pp 259–270

  18. Pincheira M, Gutierrez G, Gajardo L (2010) Closest pair query on spatial data sets without index. In: Ochoa SF, Meza F, Mery D, Cubillos C (eds) Proceedings of the XXIX international conference of the Chilean computer science society. IEEE Computer Society, Antofagasta, pp 178–182, 15–19 November 2010

  19. Qiao S, Tang C, Peng J, Li H, Ni S (2008) Efficient k-closest-pair range-queries in spatial databases. In: Proceedings of the 9th international conference on web-age information management, WAIM ’08. IEEE Computer Society, Washington, DC, pp 99–104

  20. Kim S-W, Cho W-S, Lee M-J, Whang K-Y (1995) A new algorithm for processing joins using the multilevel grid file. In: Proceedings of the 4th international conference on database systems for advanced applications (DASFAA). World Scientific Press, pp 115–123

  21. Shan J, Zhang D, Salzberg B (2003) On spatial-range closest-pair query. In: Hadzilacos T, Manolopoulos Y, Roddick J, Theodoridis Y (eds) Advances in spatial and temporal databases, vol 2750. Lecture notes in computer science. Springer, pp 252–269

  22. Shekhar S, Chawla S (2003) Spatial databases—a tour. Prentice Hall

Acknowledgements

We are grateful to Miguel Pincheira for providing software that simulates the algorithm in [18], to Antonio Corral for his kind suggestions, and to two anonymous referees for many observations that helped us improve this article.

Author information

Corresponding author

Correspondence to Gilberto Gutiérrez.

Additional information

This work was partially supported by the research project DIUBB 073218 A/R, and was partially done in the context of a postdoctoral stay by the first author at the University of A Coruña (Spain), supported by project MECESUP UBB0704.

About this article

Cite this article

Gutiérrez, G., Sáez, P. The k closest pairs in spatial databases. Geoinformatica 17, 543–565 (2013). https://doi.org/10.1007/s10707-012-0169-4

  • DOI: https://doi.org/10.1007/s10707-012-0169-4
