Top-k spatial-keyword publish/subscribe over sliding window

Wang, Xiang; Zhang, Wenjie; Zhang, Ying; Lin, Xuemin; Huang, Zengfeng

doi:10.1007/s00778-016-0453-2

Regular Paper
Published: 11 January 2017

Volume 26, pages 301–326, (2017)
The VLDB Journal

Xiang Wang ORCID: orcid.org/0000-0001-9449-5726¹,
Wenjie Zhang¹,
Ying Zhang²,
Xuemin Lin¹ &
…
Zengfeng Huang¹

1939 Accesses
28 Citations

Abstract

With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data has been generated in a streaming fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-\(k\) monitoring problem over a sliding window of streaming data; that is, we continuously maintain the top-\(k\) most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. To provide the most recent information under a controllable memory cost, a sliding window model is employed on the streaming geo-textual data. To the best of our knowledge, this is the first work to study top-\(k\) spatial-keyword publish/subscribe over a sliding window. A novel centralized system, called Skype (Top-\(k\) Spatial-keyword Publish/Subscribe), is proposed in this paper. In Skype, to continuously maintain top-\(k\) results for massive subscriptions, we devise a novel indexing structure over subscriptions such that each incoming message can be delivered immediately on its arrival. To reduce the expensive top-\(k\) re-evaluation cost triggered by message expiration, we develop a novel cost-based \(k\)-skyband technique that reduces the number of re-evaluations in a cost-effective way. Extensive experiments verify the efficiency and effectiveness of our proposed techniques. Furthermore, to support better scalability and higher throughput, we propose a distributed version of Skype, namely DSkype, on top of Storm, a popular distributed stream processing system. With the help of fine-tuned subscription/message distribution mechanisms, DSkype can achieve orders of magnitude speed-up over its centralized version.
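To make the re-evaluation problem concrete, the following is a minimal sketch of a plain \(k\)-skyband buffer for a single subscription over a time-based sliding window. It is not the cost-based variant or the subscription index proposed in the paper, and it assumes the relevance score of each message has already been computed. The names SkybandTopK, Entry, insert, and top_k are hypothetical and chosen only for illustration. A message is dropped once k newer messages score at least as high, since those messages will outlive it and it can never re-enter the top-\(k\).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    msg_id: str
    arrival: int           # arrival timestamp of the message
    score: float           # precomputed relevance to this subscription (assumed given)
    dominated_by: int = 0  # number of newer messages with score >= this one


class SkybandTopK:
    """Plain k-skyband buffer for one subscription (illustrative sketch only)."""

    def __init__(self, k: int, window: int) -> None:
        self.k = k
        self.window = window            # a message arriving at t is live during [t, t + window)
        self.entries: List[Entry] = []  # kept in arrival order

    def _expire(self, now: int) -> None:
        # Drop messages that have slid out of the window.
        self.entries = [e for e in self.entries if now - e.arrival < self.window]

    def insert(self, msg_id: str, arrival: int, score: float) -> None:
        # Assumes messages are inserted in non-decreasing arrival order.
        self._expire(arrival)
        kept: List[Entry] = []
        for e in self.entries:
            if e.score <= score:
                # The new message is newer and at least as relevant: it dominates e.
                e.dominated_by += 1
            if e.dominated_by < self.k:
                kept.append(e)  # still inside the k-skyband
        kept.append(Entry(msg_id, arrival, score))
        self.entries = kept

    def top_k(self, now: int) -> List[str]:
        # The top-k of the live window is always contained in the k-skyband,
        # so no scan over the full window is needed after an expiration.
        self._expire(now)
        ranked = sorted(self.entries, key=lambda e: (-e.score, -e.arrival))
        return [e.msg_id for e in ranked[: self.k]]
```

For example, with k = 2 and a window of 10 time units, inserting scores 0.9, 0.5, 0.7, 0.6 at times 0 to 3 evicts the second message as soon as the fourth arrives, and the buffer can still answer the top-2 after the first message expires without revisiting the stream. The trade-off is memory: the full \(k\)-skyband may hold more than k messages per subscription, which is the cost that a cost-based scheme such as the one described in the abstract aims to balance against re-evaluation work.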

Notes

Apache Storm project. http://storm.apache.org/.
Apache HBase project. https://hbase.apache.org/.
Apache Spark project. http://spark.apache.org/streaming/.
Apache Samza project. http://samza.apache.org/.
https://github.com/apache/storm.
The same technique in [29] is used to compute \(k\)-skyband.
Apache Hadoop project. https://hadoop.apache.org/.
https://dev.twitter.com/rest/public.
http://storm.apache.org/releases/0.10.0/Concepts.html.
http://storm.apache.org/releases/0.10.0/Trident-tutorial.html.
http://storm.apache.org/releases/0.10.0/STORM-UI-REST-API.html.
The time decay function and related index in CIQ are removed to adapt to our problem.
http://geonames.usgs.gov.
http://www.yelp.com/.
https://en.wikipedia.org/wiki/Tfidf.
http://storm.apache.org/2015/11/05/storm0100-released.html.
http://zookeeper.apache.org/doc/r3.4.8/.

References

Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)
Aly, A.M., Mahmood, A.R., Hassan, M.S., Aref, W.G., Ouzzani, M., Elmeleegy, H., Qadah, T.: AQWA: adaptive query-workload-aware partitioning of big spatial data. Proc. VLDB Endow. 8(13), 2062–2073 (2015)
Avriel, M.: Nonlinear Programming: Analysis and Methods. Courier Corporation, MA, USA (2003)
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: PODS (2002)
Bayardo, R.J., Ma, Y., Srikant, R.: Scaling up all pairs similarity search. In: WWW, pp. 131–140 (2007)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Böhm, C., Ooi, B.C., Plant, C., Yan, Y.: Efficiently processing continuous k-nn queries on data streams. In: ICDE, pp. 156–165 (2007)
Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J.Y.: Efficient query evaluation using a two-level retrieval process. In: CIKM, pp. 426–434 (2003)
Buckley, C., Lewit, A.F.: Optimization of inverted vector searches. In: SIGIR, pp. 97–110 (1985)
Chaudhuri, S., Ganti, V., Kaushik, R.: A primitive operator for similarity joins in data cleaning. In: ICDE, p. 5 (2006)
Chen, L., Cong, G., Cao, X.: An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD (2013)
Chen, L., Cong, G., Cao, X., Tan, K.: Temporal spatial-keyword top-k publish/subscribe. In: ICDE (2015)
Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: an experimental evaluation. In: PVLDB (2013)
Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., Suel, T.: Text vs. space: efficient geo-search query processing. In: CIKM, pp. 423–432 (2011)
Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. Proc. VLDB Endow. 2(1), 337–348 (2009)
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Ding, S., Suel, T.: Faster top-k document retrieval using block-max indexes. In: SIGIR, pp. 993–1002 (2011)
Eldawy, A., Mokbel, M.F.: Spatialhadoop: a mapreduce framework for spatial data. In: ICDE, pp. 1352–1363 (2015)
Felipe, I.D., Hristidis, V., Rishe, N.: Keyword search on spatial databases. In: ICDE, pp. 656–665 (2008)
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York (2008)
Guo, L., Zhang, D., Li, G., Tan, K., Bao, Z.: Location-aware pub/sub system: When continuous moving queries meet dynamic event streams. In: SIGMOD, pp. 843–857 (2015)
Guo, T., Cao, X., Cong, G.: Efficient algorithms for answering the m-closest keywords query. In: SIGMOD (2015)
Hariharan, R., Hore, B., Li, C., Mehrotra, S.: Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems. In: SSDBM, p. 16 (2007)
Hu, H., Liu, Y., Li, G., Feng, J., Tan, K.: A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions. In: ICDE, pp. 711–722 (2015)
Li, G., Wang, Y., Wang, T., Feng, J.: Location-aware publish/subscribe. In: SIGKDD, pp. 802–810 (2013)
Lu, J., Lu, Y., Cong, G.: Reverse spatial and textual k nearest neighbor search. In: SIGMOD, pp. 349–360 (2011)
Mahmood, A.R., Aly, A.M., Qadah, T., Rezig, E.K., Daghistani, A., Madkour, A., Abdelhamid, A.S., Hassan, M.S., Aref, W.G., Basalamah, S.M.: Tornado: a distributed spatio-textual stream processing system. Proc. VLDB Endow. 8(12), 2020–2031 (2015)
Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: SIGMOD, pp. 635–646 (2006)
Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: Md-hbase: A scalable multi-dimensional data infrastructure for location aware services. In: MDM, pp. 7–16 (2011)
Pripuzic, K., Zarko, I., Aberer, K.: Time and space-efficient sliding window top-k query processing. ACM Trans Database Syst. 40, 1 (2015)
Ranjan, R.: Streaming big data processing in datacenter clouds. IEEE Cloud Comput. 1(1), 78–83 (2014)
Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvåg, K.: Efficient processing of top-k spatial keyword queries. In: SSTD, pp. 205–222 (2011)
Sadoghi, M., Jacobsen, H.: Be-tree: an index structure to efficiently match Boolean expressions over high-dimensional discrete space. In: SIGMOD, pp. 637–648 (2011)
Shraer, A., Gurevich, M., Fontoura, M., Josifovski, V.: Top-k publish-subscribe for social annotation of news. Proc. VLDB Endow. 6, 385–396 (2013)
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The Hadoop distributed file system. In: Proceedings of the 2010 IEEE 26th symposium on mass storage systems and technologies (MSST). IEEE Computer Society Washington, DC, USA, pp. 1–10 (2010). doi:10.1109/MSST.2010.5496972
Wang, X., Zhang, Y., Zhang, W., Lin, X., Wang, W.: Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In: ICDE, pp. 1107–1118 (2015)
Whang, S., Brower, C., Shanmugasundaram, J., Vassilvitskii, S., Vee, E., Yerneni, R., Garcia-Molina, H.: Indexing boolean expressions. Proc. VLDB Endow. 2(1), 37–48 (2009)
Xiao, C., Wang, W., Lin, X., Yu, J.X., Wang, G.: Efficient similarity joins for near-duplicate detection. ACM Trans. Database Syst. 36, 5 (2011)
Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: Efficient in-memory spatial analytics. In: SIGMOD (2016)
Yi, K., Yu, H., Yang, J., Xia, G., Chen, Y.: Efficient maintenance of materialized top-k views. In: ICDE (2003)
Zhang, D., Chan, C., Tan, K.: An efficient publish/subscribe index for ecommerce databases. Proc. VLDB Endow. 7(8), 613–624 (2014)
Zhang, D., Chan, C., Tan, K.: Processing spatial keyword query as a top-k aggregation query. In: SIGIR, pp. 355–364 (2014)
Zhang, Y., Lin, X., Yuan, Y., Kitsuregawa, M., Zhou, X., Yu, J.X.: Duplicate-insensitive order statistics computation over data streams. IEEE Trans. Knowl. Data Eng. 22(4), 493–507 (2010)
Zhou, Y., Xie, X., Wang, C., Gong, Y., Ma, W.: Hybrid index structures for location-based web search. In: CIKM, pp. 155–162 (2005)

Acknowledgements

Ying Zhang is supported by ARC DE140100679 and DP130103245. Wenjie Zhang is supported by ARC DP150103071 and DP150102728. Xuemin Lin is supported by NSFC61232006, ARC DP170101628, and DP150102728.

Author information

Authors and Affiliations

School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
Xiang Wang, Wenjie Zhang, Xuemin Lin & Zengfeng Huang
Centre for Artificial Intelligence, University of Technology, Sydney, Australia
Ying Zhang

Corresponding author

Correspondence to Xiang Wang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1477 KB)

About this article

Cite this article

Wang, X., Zhang, W., Zhang, Y. et al. Top-k spatial-keyword publish/subscribe over sliding window. The VLDB Journal 26, 301–326 (2017). https://doi.org/10.1007/s00778-016-0453-2

Received: 20 May 2016
Revised: 21 October 2016
Accepted: 22 December 2016
Published: 11 January 2017
Issue Date: June 2017
DOI: https://doi.org/10.1007/s00778-016-0453-2
