A distributed B+Tree indexing method for processing range queries over streaming data

Safaee, Shahab; Mirabi, Meghdad; Rahmani, Amir Masoud; Safaei, Ali Asghar

doi:10.1007/s10586-023-04015-9

Published: 07 May 2023

Cluster Computing, Volume 27, pages 1251–1274 (2024)

Shahab Safaee¹,
Meghdad Mirabi¹,
Amir Masoud Rahmani² &
Ali Asghar Safaei³

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a high rate. Stream databases raise new challenges for query processing, both because streaming data changes constantly over time and because users submit a wider range of queries than in traditional databases. In this paper, we propose a system architecture with components for both distributed indexing of streaming data and distributed processing of range queries on streaming data. Instead of creating one large, centralized B+Tree index structure, we create a set of small B+Tree indexes, one for every partition of the streaming data. We also design a distributed range search algorithm that each individual machine in a Spark cluster can use to process range queries independently on each partition of the streaming data. With the proposed architecture, both the indexing of streaming data and the querying over it can be performed in a distributed and parallel manner. Through several experiments, we demonstrate that the proposed indexing method is scalable and efficient for processing range queries on streaming data compared with existing centralized B+Tree indexing methods; it can therefore be used for applications involving data streams with a large volume of data elements and a large number of range queries.
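The per-partition idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a sorted list searched with `bisect` stands in for each small B+Tree, a thread pool stands in for the machines of a Spark cluster, and the round-robin partitioning scheme is an assumption. The names `PartitionIndex`, `partition_stream`, and `distributed_range_search` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


class PartitionIndex:
    """Ordered index over one partition (a stand-in for a small B+Tree)."""

    def __init__(self, records):
        # Keeping records sorted by key mimics the ordered leaf level of a B+Tree.
        self._records = sorted(records, key=lambda kv: kv[0])
        self._keys = [k for k, _ in self._records]

    def range_search(self, low, high):
        """Return all (key, value) records with low <= key <= high."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return self._records[start:end]


def partition_stream(records, num_partitions):
    """Split one batch of stream records round-robin (assumed scheme)."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts


def distributed_range_search(indexes, low, high):
    """Query every partition index independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        partials = pool.map(lambda idx: idx.range_search(low, high), indexes)
        merged = [rec for part in partials for rec in part]
    return sorted(merged, key=lambda kv: kv[0])


# One small batch of (key, value) stream elements, indexed per partition.
batch = [(k, f"event-{k}") for k in (42, 7, 19, 88, 3, 56, 23, 71)]
indexes = [PartitionIndex(p) for p in partition_stream(batch, 3)]
result = distributed_range_search(indexes, 19, 56)
print([k for k, _ in result])  # keys in [19, 56]: 19, 23, 42, 56
```

Each partition answers the query without consulting any other partition, so the only coordination step is the final merge of partial results. In a real deployment the per-partition work would run on separate cluster nodes rather than threads.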

Data availability

The source codes and datasets used in the paper are available from the first author on reasonable request.

Notes

https://grouplens.org/.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Department of Computer Engineering, Faculty of Engineering, South Tehran Branch, Islamic Azad University, Tehran, Iran
Shahab Safaee & Meghdad Mirabi
Future Technology Research Center, National Yunlin University of Science and Technology, Douliou, Yunlin, Taiwan
Amir Masoud Rahmani
Department of Medical Informatics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
Ali Asghar Safaei

Contributions

All authors contributed to the study conception and design. The first draft of the manuscript was written by SS and MM and was then reviewed by AMR and AAS. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Meghdad Mirabi.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose. The authors declare no conflict of interest.

Ethical approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Safaee, S., Mirabi, M., Rahmani, A.M. et al. A distributed B+Tree indexing method for processing range queries over streaming data. Cluster Comput 27, 1251–1274 (2024). https://doi.org/10.1007/s10586-023-04015-9

Received: 08 August 2022
Revised: 21 November 2022
Accepted: 20 April 2023
Published: 07 May 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10586-023-04015-9
