Distributed and Parallel Databases, Volume 37, Issue 1, pp 101–132

High-throughput publish/subscribe on top of LSM-based storage

  • Mohiuddin Abdul Qader
  • Vagelis Hristidis
Part of the following topical collections:
  1. Special Issue on Scientific and Statistical Data Management

Abstract

State-of-the-art publish/subscribe systems are efficient when the subscriptions are relatively static (for instance, the set of followers in Twitter) or can fit in memory. However, many big data and IoT-based applications now follow a highly dynamic query paradigm, where both continuous queries and data entries number in the millions and can arrive and expire rapidly. In this paper, we propose and compare several publish/subscribe storage architectures, based on the popular NoSQL log-structured merge tree (LSM) storage paradigm, to support high-throughput and highly dynamic publish/subscribe systems. Our framework naturally supports subscriptions on both historic and future streaming data, and generates instant notifications. We also extend our framework to efficiently support self-joining subscriptions, where streaming pub/sub records join with past pub/sub entries. Further, we show how hierarchical attributes, such as concept ontologies, can be efficiently supported; for example, a publication’s topic is “politics” whereas a subscription’s topic is “US politics.” We implemented and experimentally evaluated our methods on the popular LSM-based LevelDB system, using real datasets, for simple match and self-joining subscriptions on both flat and hierarchical attributes. Our results show that our approaches achieve significantly higher throughput than state-of-the-art baselines.
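
The idea of serving both historic and future data can be illustrated with a minimal Python sketch. It is not the architecture proposed in the paper: it assumes flat topic attributes, keeps everything in plain in-memory dictionaries instead of an LSM store such as LevelDB, and the class and method names (`TinyPubSubStore`, `subscribe`, `publish`) are hypothetical. It only shows that a new subscription can be answered from stored publications, while a new publication is matched against stored subscriptions.

```python
from collections import defaultdict


class TinyPubSubStore:
    """Toy in-memory stand-in for a pub/sub store.

    Hypothetical illustration only: the paper builds on LSM storage,
    whereas this sketch uses ordinary dictionaries keyed by topic.
    """

    def __init__(self):
        self.publications = defaultdict(list)  # topic -> stored payloads
        self.subscriptions = defaultdict(set)  # topic -> subscriber ids

    def subscribe(self, subscriber_id, topic):
        """Register a subscription and return the historic matches."""
        self.subscriptions[topic].add(subscriber_id)
        return list(self.publications[topic])

    def unsubscribe(self, subscriber_id, topic):
        """Expire a subscription so later publications skip it."""
        self.subscriptions[topic].discard(subscriber_id)

    def publish(self, topic, payload):
        """Store a publication and return the subscribers to notify now."""
        self.publications[topic].append(payload)
        return sorted(self.subscriptions[topic])


store = TinyPubSubStore()
store.publish("politics", "p1")                  # arrives before any subscription
historic = store.subscribe("alice", "politics")  # matched against stored data
notified = store.publish("politics", "p2")       # matched against stored subscriptions
```

In this sketch, `historic` holds the publication stored before the subscription existed, and `notified` lists the subscribers that would receive an instant notification for the later publication.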

Keywords

Log-structured merge tree; LevelDB; Publish/subscribe; Self-join subscription; Dewey; Big data; Internet of things; Continuous lookup queries

Notes

Acknowledgements

This project is partially supported by NSF Grants IIS-1447826 and IIS-1619463.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science & Engineering, University of California Riverside, Riverside, USA
