Top-k spatial distance joins

Abstract

Top-k joins have been extensively studied when numerical-valued attributes are joined on an equality predicate. Other types of join attributes and predicates have received little to no attention. In this paper, we consider spatial objects that are assigned a score (e.g., a ranking). Given two collections R, S of such objects and a spatial distance threshold 𝜖, we introduce the top-k spatial distance join (k-SDJoin) to identify the k pairs of objects that have the highest combined score (based on an aggregate function γ) among all object pairs in R × S with a spatial distance of at most 𝜖. State-of-the-art methods for relational top-k joins can be adapted for k-SDJoin, but their focus is on minimizing the number of objects accessed from the inputs; when spatial objects are joined, however, the computational cost can easily become the bottleneck. In view of this, we propose a novel evaluation algorithm that greatly reduces the computational cost without compromising the access cost. The main idea is to access and efficiently join blocks of objects from each collection, using appropriate bounds to avoid computing the entire spatial 𝜖-distance join. As the performance of our solution relies heavily on the size of the input blocks, we devise an approach for automated block size tuning, enhanced by a novel generic model for estimating the number of objects to be accessed from each input. Contrary to previous efforts, our model employs cheap-to-compute statistics and requires no prior knowledge of the data distribution. Our extensive experimental analysis demonstrates the efficiency of our algorithm compared to methods based on existing literature that prioritize either the ranking or the spatial join component of k-SDJoin queries.
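
To make the k-SDJoin definition concrete, the following is a minimal brute-force sketch in Python. It is not the block-based algorithm proposed in the paper; it simply enumerates R × S, keeps the pairs within distance 𝜖, and returns the k pairs with the highest aggregate score. The object layout `(x, y, score)`, the function name `naive_k_sdjoin`, the use of Euclidean distance on points, and the default choice of sum for γ are all assumptions made for illustration.

```python
import heapq
import math
from typing import Callable, Iterable, List, Tuple

# Hypothetical object layout (an assumption): a 2D point with a score.
SpatialObject = Tuple[float, float, float]  # (x, y, score)


def naive_k_sdjoin(
    R: Iterable[SpatialObject],
    S: Iterable[SpatialObject],
    eps: float,
    k: int,
    gamma: Callable[[float, float], float] = lambda a, b: a + b,
) -> List[Tuple[float, SpatialObject, SpatialObject]]:
    """Brute-force reference for the k-SDJoin definition.

    Returns up to k triples (combined score, r, s), best first, where
    r is from R, s is from S, and their Euclidean distance is <= eps.
    """
    S = list(S)  # S is scanned once per object of R
    candidates = (
        (gamma(r[2], s[2]), r, s)
        for r in R
        for s in S
        if math.dist(r[:2], s[:2]) <= eps  # spatial distance predicate
    )
    # Keep only the k pairs with the highest combined score.
    return heapq.nlargest(k, candidates, key=lambda t: t[0])
```

This baseline computes the entire 𝜖-distance join, which is exactly the cost the abstract says the proposed algorithm avoids; it is useful mainly as a correctness check on small inputs.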

Notes

  1. An exception is the work of [21], which, however, is restricted to a specific type of attribute (probabilities) and a specific aggregation function (product).
  2. Input collections R and S need not be sorted on their scoring attribute, for example, if they stem from previous query operators that produce such interesting orders.
  3. When a dataset is sorted in descending order of its scoring attribute, the lowest seen score is equivalent to the last seen score.
  4. In this paper, we define the dist function on non-leaf entries as the minimum distance between the MBRs of two tree entries or between the MBR of a tree entry and an object, i.e., \(dist(e,e^{\prime }) = MINDIST(e,e^{\prime })\) or \(dist(o,e) = MINDIST(o,e)\), respectively.
  5. We briefly discuss the cost of automatically determining block size λ in the next section.
  6. We denote by \(r_{c_{R}}\) and \(s_{c_{S}}\) the \(c_{R}\)-th and \(c_{S}\)-th objects in the sorted inputs, respectively.
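
The MINDIST function mentioned in Note 4 can be illustrated with a short Python sketch. It assumes axis-aligned MBRs and Euclidean distance, which is the usual setting for R-tree entries; the paper's exact formulation is not shown on this page, so the representation `(lower corner, upper corner)` and the helper names `mindist` and `point_as_mbr` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical MBR layout (an assumption): (lower corner, upper corner).
MBR = Tuple[Sequence[float], Sequence[float]]


def mindist(a: MBR, b: MBR) -> float:
    """Minimum Euclidean distance between two axis-aligned rectangles.

    Per dimension, the gap is zero when the two extents overlap and is
    otherwise the distance between the nearer faces.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    squared = 0.0
    for alo, ahi, blo, bhi in zip(a_lo, a_hi, b_lo, b_hi):
        gap = max(0.0, alo - bhi, blo - ahi)
        squared += gap * gap
    return math.sqrt(squared)


def point_as_mbr(p: Sequence[float]) -> MBR:
    """Treat a point object as a degenerate MBR, covering dist(o, e)."""
    return (p, p)
```

Because MINDIST never exceeds the distance between any two objects enclosed by the entries, a pair of entries with MINDIST greater than 𝜖 cannot contain a qualifying pair and can be skipped.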

References

  1. Arge L, Procopiuc O, Ramaswamy S, Suel T, Vitter JS (1998) Scalable sweeping-based spatial join. In: VLDB’98, Proceedings of 24th International Conference on Very Large Data Bases, New York City, pp 570–581
  2. Belussi A, Faloutsos C (1995) Estimating the selectivity of spatial queries using the ‘correlation’ fractal dimension. In: VLDB’95, Proceedings of 21st International Conference on Very Large Data Bases, Zurich, pp 299–310
  3. Brinkhoff T, Kriegel HP, Seeger B (1993) Efficient processing of spatial joins using R-trees. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, pp 237–246
  4. Chakrabarti K, Chaudhuri S, Ganti V (2011) Interval-based pruning for top-k processing over compressed lists. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, Hannover, pp 709–720
  5. Chan EPF (2003) Buffer queries. IEEE TKDE 15(4):895–910
  6. Corral A, Manolopoulos Y, Theodoridis Y, Vassilakopoulos M (2000) Closest pair queries in spatial databases. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 189–200
  7. Doulkeridis C, Vlachou A, Kotidis Y, Polyzotis N (2012) Processing of rank joins in highly distributed systems. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, pp 606–617
  8. Fagin R, Lotem A, Naor M (2001) Optimal aggregation algorithms for middleware. In: Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Santa Barbara, pp 102–113
  9. Faloutsos C, Seeger B, Traina A, Traina C Jr (2000) Spatial join selectivity using power laws. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 177–188
  10. Finger J, Polyzotis N (2009) Robust and efficient algorithms for rank join evaluation. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2009, Providence, pp 415–428
  11. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD’84, Proceedings of Annual Meeting, Boston, pp 47–57
  12. Hjaltason GR, Samet H (1998) Incremental distance join algorithms for spatial databases. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, pp 237–248
  13. Hu H, Li G, Bao Z, Feng J, Wu Y, Gong Z, Xu Y (2016) Top-k spatio-textual similarity join. IEEE TKDE 28(2):551–565
  14. Ilyas IF, Aref WG, Elmagarmid AK (2003) Supporting top-k join queries in relational databases. In: VLDB 2003, Proceedings of 29th International Conference on Very Large Data Bases, Berlin, pp 754–765
  15. Ilyas IF, Shah R, Aref WG, Vitter JS, Elmagarmid AK (2004) Rank-aware query optimization. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, pp 203–214
  16. Jacox EH, Samet H (2007) Spatial join techniques. ACM Trans Database Syst 32(1):7
  17. Kiefer J (1953) Sequential minimax search for a maximum. Proc Am Math Soc 4(3):502–506
  18. Kim Y, Shim K (2012) Parallel top-k similarity join algorithms using MapReduce. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, pp 510–521
  19. Koudas N, Muthukrishnan S, Srivastava D (2000) Optimal histograms for hierarchical range queries. In: Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Dallas, pp 196–204
  20. Li C, Chang KCC, Ilyas IF, Song S (2005) RankSQL: query algebra and optimization for relational top-k queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, pp 131–142
  21. Ljosa V, Singh AK (2008) Top-k spatial joins of probabilistic objects. In: Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, Cancún, pp 566–575
  22. Mamoulis N, Papadias D (2001) Multiway spatial joins. ACM Trans Database Syst 26(4):424–475
  23. Mamoulis N, Yiu ML, Cheng KH, Cheung DW (2007) Efficient top-k aggregation of ranked inputs. ACM TODS 32(3):19–63
  24. Martinenghi D, Tagliasacchi M (2010) Proximity rank join. PVLDB 3(1):352–363
  25. Natsev A, Chang YC, Smith JR, Li CS, Vitter JS (2001) Supporting incremental join queries on ranked inputs. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, Roma, pp 281–290
  26. Nobari S, Tauheed F, Heinis T, Karras P, Bressan S, Ailamaki A (2013) TOUCH: in-memory spatial join by hierarchical data-oriented partitioning. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, pp 701–712
  27. Ntarmos N, Patlakas I, Triantafillou P (2014) Rank join queries in NoSQL databases. PVLDB 7(7):493–504
  28. Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Advances in spatial and temporal databases, 7th international symposium, SSTD 2001, Redondo Beach, Proceedings, pp 443–459
  29. Patel JM, DeWitt DJ (1996) Partition based spatial-merge join. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, pp 259–270
  30. Petersen SB, Neves-Petersen MT, Henriksen SB, Mortensen RJ, Geertz-Hansen HM (2012) Scale-free behaviour of amino acid pair interactions in folded proteins. PLoS ONE 7(7):1–14
  31. Poosala V, Haas PJ, Ioannidis YE, Shekita EJ (1996) Improved histograms for selectivity estimation of range predicates. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, pp 294–305
  32. Qi S, Bouros P, Mamoulis N (2013) Efficient top-k spatial distance joins. In: Advances in spatial and temporal databases - 13th international symposium, SSTD 2013, Munich, pp 1–18
  33. Qian Z, Xu J, Zheng K, Zhao P, Zhou X (2018) Semantic-aware top-k spatial keyword queries. World Wide Web 21(3):573–594
  34. Ray S, Simion B, Brown AD, Johnson R (2014) Skew-resistant parallel in-memory spatial join. In: Conference on scientific and statistical database management, SSDBM’14, Aalborg, pp 6:1–6:12
  35. Roussopoulos N, Kelley S, Vincent F (1995) Nearest neighbor queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, pp 71–79
  36. Saouk M, Doulkeridis C, Vlachou A, Nørvåg K (2016) Efficient processing of top-k joins in MapReduce. In: 2016 IEEE International Conference on Big Data, BigData 2016, Washington, pp 570–577
  37. Schnaitter K, Polyzotis N (2010) Optimal algorithms for evaluating rank joins in database systems. ACM TODS 35(1):6:1–6:47
  38. Schnaitter K, Spiegel J, Polyzotis N (2007) Depth estimation for ranking query optimization. In: Proceedings of the 33rd International Conference on Very Large Data Bases. University of Vienna, Austria, pp 902–913
  39. Shin H, Moon B, Lee S (2000) Adaptive multi-stage distance join processing. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, pp 343–354
  40. Smith AJ (1978) Sequentiality and prefetching in database systems. ACM TODS 3(3):223–247
  41. Wu M, Berti-Équille L, Marian A, Procopiuc CM, Srivastava D (2010) Processing top-k join queries. PVLDB 3(1-2):860–870
  42. Xiao C, Wang W, Lin X, Shang H (2009) Top-k set similarity joins. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, pp 916–927
  43. Xin D, Han J, Chang KC (2007) Progressive and selective merge: computing top-k with ad-hoc ranking functions. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Beijing, pp 103–114
  44. Zhang S, Han J, Liu Z, Wang K, Xu Z (2009) SJMR: parallelizing spatial join with MapReduce on clusters. In: Proceedings of the 2009 IEEE International Conference on Cluster Computing, New Orleans, pp 1–8
  45. Zhao K, Zhou S, Tan KL, Zhou A (2005) Supporting ranked join in peer-to-peer networks. In: 16th International Workshop on Database and Expert Systems Applications (DEXA’05), pp 796–800
  46. Zhu M, Papadias D, Lee DL, Zhang J (2005) Top-k spatial joins. IEEE TKDE 17(4):567–579

Author information

Correspondence to Shuyao Qi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qi, S., Bouros, P. & Mamoulis, N. Top-k spatial distance joins. Geoinformatica (2020). https://doi.org/10.1007/s10707-020-00393-z

Keywords

  • Top-k join
  • Spatial join