Top-k spatial-keyword publish/subscribe over sliding window

  • Regular Paper
The VLDB Journal

Abstract

With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data has been generated in a streaming fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-\(k\) monitoring problem over a sliding window of streaming data; that is, we continuously maintain the top-\(k\) most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. To provide the most recent information under controllable memory cost, a sliding window model is employed on the streaming geo-textual data. To the best of our knowledge, this is the first work to study top-\(k\) spatial-keyword publish/subscribe over a sliding window. A novel centralized system, called Skype (Top-k Spatial-keyword Publish/Subscribe), is proposed in this paper. In Skype, to continuously maintain top-\(k\) results for massive subscriptions, we devise a novel indexing structure upon subscriptions such that each incoming message can be immediately delivered on its arrival. To reduce the expensive top-\(k\) re-evaluation cost triggered by message expiration, we develop a novel cost-based \(k\)-skyband technique to reduce the number of re-evaluations in a cost-effective way. Extensive experiments verify the efficiency and effectiveness of our proposed techniques. Furthermore, to support better scalability and higher throughput, we propose a distributed version of Skype, namely DSkype, on top of Storm, which is a popular distributed stream processing system. With the help of fine-tuned subscription/message distribution mechanisms, DSkype can achieve orders of magnitude speed-up over its centralized version.
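
To make the role of the \(k\)-skyband concrete, the following is a minimal sketch of the basic idea, not the cost-based technique or the subscription index proposed in the paper. It assumes a single subscription, a count-based sliding window, and relevance scores that are already computed; the class name `SkybandTopK` and its methods are hypothetical. A message is kept only while fewer than \(k\) newer messages have a score at least as high, so the current top-\(k\) can always be read from the retained set without rescanning the window when a message expires.

```python
class SkybandTopK:
    """Top-k over a count-based sliding window for one subscription.

    Illustrative only: scores are assumed to be given, and dominance is
    defined as "newer and score at least as high".
    """

    def __init__(self, k, window_size):
        self.k = k
        self.window_size = window_size
        self.arrivals = 0   # number of messages seen so far
        self.skyband = []   # entries [arrival_id, score, dominators], oldest first

    def insert(self, score):
        """Add a new message and return its arrival id."""
        self.arrivals += 1
        msg_id = self.arrivals
        # Drop entries that have slid out of the window.
        oldest_valid = msg_id - self.window_size + 1
        self.skyband = [e for e in self.skyband if e[0] >= oldest_valid]
        # The new message is newer than every stored entry, so it dominates
        # each entry whose score is not strictly higher.
        for entry in self.skyband:
            if entry[1] <= score:
                entry[2] += 1
        # An entry with k or more dominators can never be in the top-k again,
        # because all of its dominators outlive it in the window.
        self.skyband = [e for e in self.skyband if e[2] < self.k]
        self.skyband.append([msg_id, score, 0])
        return msg_id

    def top_k(self):
        """Return up to k (arrival_id, score) pairs, best first; ties favor newer."""
        ranked = sorted(self.skyband, key=lambda e: (-e[1], -e[0]))
        return [(e[0], e[1]) for e in ranked[: self.k]]


if __name__ == "__main__":
    monitor = SkybandTopK(k=2, window_size=4)
    for s in [0.9, 0.5, 0.7, 0.3, 0.6, 0.8]:
        monitor.insert(s)
    print(monitor.top_k())
```

In this sketch the retained set is usually much smaller than the window, which is why expiration of a top-\(k\) message does not force a full re-evaluation.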

Notes

  1. Apache Storm project. http://storm.apache.org/.

  2. Apache HBase project. https://hbase.apache.org/.

  3. Apache Spark project. http://spark.apache.org/streaming/.

  4. Apache Samza project. http://samza.apache.org/.

  5. https://github.com/apache/storm.

  6. The same technique as in [29] is used to compute the \(k\)-skyband.

  7. Apache Hadoop project. https://hadoop.apache.org/.

  8. https://dev.twitter.com/rest/public.

  9. http://storm.apache.org/releases/0.10.0/Concepts.html.

  10. http://storm.apache.org/releases/0.10.0/Trident-tutorial.html.

  11. http://storm.apache.org/releases/0.10.0/STORM-UI-REST-API.html.

  12. The time decay function and related index in CIQ are removed to adapt to our problem.

  13. http://geonames.usgs.gov.

  14. http://www.yelp.com/.

  15. https://en.wikipedia.org/wiki/Tfidf.

  16. http://storm.apache.org/2015/11/05/storm0100-released.html.

  17. http://zookeeper.apache.org/doc/r3.4.8/.

References

  1. Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)

  2. Aly, A.M., Mahmood, A.R., Hassan, M.S., Aref, W.G., Ouzzani, M., Elmeleegy, H., Qadah, T.: AQWA: adaptive query-workload-aware partitioning of big spatial data. Proc. VLDB Endow. 8(13), 2062–2073 (2015)

  3. Avriel, M.: Nonlinear Programming: Analysis and Methods. Courier Corporation, MA, USA (2003)

  4. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS (2002)

  5. Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: WWW, pp. 131–140 (2007)

  6. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  7. Böhm, C., Ooi, B.C., Plant, C., Yan, Y.: Efficiently processing continuous k-nn queries on data streams. In: ICDE, pp. 156–165 (2007)

  8. Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J.Y.: Efficient query evaluation using a two-level retrieval process. In: CIKM, pp. 426–434 (2003)

  9. Buckley, C., Lewit, A.F.: Optimization of inverted vector searches. In: SIGIR, pp. 97–110 (1985)

  10. Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE, p. 5 (2006)

  11. Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD (2013)

  12. Chen, L., Cong, G., Cao, X., Tan, K.: Temporal spatial-keyword top-k publish/subscribe. In: ICDE (2015)

  13. Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. In: PVLDB (2013)

  14. Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., Suel, T.: Text vs. space: efficient geo-search query processing. In: CIKM, pp. 423–432 (2011)

  15. Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. Proc. VLDB Endow. 2(1), 337–348 (2009)

  16. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)

  17. Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: SIGIR, pp. 993–1002 (2011)

  18. Eldawy, A., Mokbel, M.F.: SpatialHadoop: a MapReduce framework for spatial data. In: ICDE, pp. 1352–1363 (2015)

  19. Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: ICDE, pp. 656–665 (2008)

  20. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York (2008)

  21. Guo, L., Zhang, D., Li, G., Tan, K., Bao, Z.: Location-aware pub/sub system: When continuous moving queries meet dynamic event streams. In: SIGMOD, pp. 843–857 (2015)

  22. Guo, T., Cao, X., Cong, G.: Efficient algorithms for answering the m-closest keywords query. In: SIGMOD (2015)

  23. Hariharan, R., Hore, B., Li, C., Mehrotra, S.: Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: SSDBM, p. 16 (2007)

  24. Hu, H., Liu, Y., Li, G., Feng, J., Tan, K.: A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions. In: ICDE, pp. 711–722 (2015)

  25. Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: SIGKDD, pp. 802–810 (2013)

  26. Lu, J., Lu, Y., Cong, G.: Reverse spatial and textual k nearest neighbor search. In: SIGMOD, pp. 349–360 (2011)

  27. Mahmood, A.R., Aly, A.M., Qadah, T., Rezig, E.K., Daghistani, A., Madkour, A., Abdelhamid, A.S., Hassan, M.S., Aref, W.G., Basalamah, S.M.: Tornado: a distributed spatio-textual stream processing system. Proc. VLDB Endow. 8(12), 2020–2031 (2015)

  28. Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)

  29. Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: SIGMOD, pp. 635–646 (2006)

  30. Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: MD-HBase: a scalable multi-dimensional data infrastructure for location aware services. In: MDM, pp. 7–16 (2011)

  31. Pripuzic, K., Zarko, I., Aberer, K.: Time and space-efficient sliding window top-k query processing. ACM Trans. Database Syst. 40, 1 (2015)

  32. Ranjan, R.: Streaming big data processing in datacenter clouds. IEEE Cloud Comput. 1(1), 78–83 (2014)

  33. Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: SSTD, pp. 205–222 (2011)

  34. Sadoghi, M., Jacobsen, H.: BE-tree: an index structure to efficiently match Boolean expressions over high-dimensional discrete space. In: SIGMOD, pp. 637–648 (2011)

  35. Shraer, A., Gurevich, M., Fontoura, M., Josifovski, V.: Top-k publish-subscribe for social annotation of news. Proc. VLDB Endow. 6, 385–396 (2013)

  36. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE Computer Society Washington, DC, USA, pp. 1–10 (2010). doi:10.1109/MSST.2010.5496972

  37. Wang, X., Zhang, Y., Zhang, W., Lin, X., Wang, W.: AP-tree: efficiently support continuous spatial-keyword queries over stream. In: ICDE, pp. 1107–1118 (2015)

  38. Whang, S., Brower, C., Shanmugasundaram, J., Vassilvitskii, S., Vee, E., Yerneni, R., Garcia-Molina, H.: Indexing Boolean expressions. Proc. VLDB Endow. 2(1), 37–48 (2009)

  39. Xiao, C., Wang, W., Lin, X., Yu, J.X., Wang, G.: Efficient similarity joins for near-duplicate detection. ACM Trans. Database Syst. 36, 5 (2011)

  40. Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: Efficient in-memory spatial analytics. In: SIGMOD (2016)

  41. Yi, K., Yu, H., Yang, J., Xia, G., Chen, Y.: Efficient maintenance of materialized top-k views. In: ICDE (2003)

  42. Zhang, D., Chan, C., Tan, K.: An efficient publish/subscribe index for ecommerce databases. Proc. VLDB Endow. 7(8), 613–624 (2014)

  43. Zhang, D., Chan, C., Tan, K.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR, pp. 355–364 (2014)

  44. Zhang, Y., Lin, X., Yuan, Y., Kitsuregawa, M., Zhou, X., Yu, J.X.: Duplicate-insensitive order statistics computation over data streams. IEEE Trans. Knowl. Data Eng. 22(4), 493–507 (2010)

  45. Zhou, Y., Xie, X., Wang, C., Gong, Y., Ma, W.: Hybrid index structures for location-based web search. In: CIKM, pp. 155–162 (2005)

Acknowledgements

Ying Zhang is supported by ARC DE140100679 and DP130103245. Wenjie Zhang is supported by ARC DP150103071 and DP150102728. Xuemin Lin is supported by NSFC 61232006, ARC DP170101628, and DP150102728.

Author information

Corresponding author

Correspondence to Xiang Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1477 KB)

About this article

Cite this article

Wang, X., Zhang, W., Zhang, Y. et al. Top-k spatial-keyword publish/subscribe over sliding window. The VLDB Journal 26, 301–326 (2017). https://doi.org/10.1007/s00778-016-0453-2

  • DOI: https://doi.org/10.1007/s00778-016-0453-2
