Abstract
NoSQL databases have gained great popularity in recent years. Most of them use the Log-Structured Merge (LSM) tree, which provides high write throughput and fast lookups by primary key. Searching by non-key attributes, however, is very slow because the entire LSM-tree must be scanned. A secondary index can overcome this problem. Typically, a secondary index covers all items in the database equally. This is not effective in big data stores, where some items are queried very often and others never. Adaptive merging was introduced to address this problem. Its key idea is to build the secondary index incrementally as a side-product of query processing. Consequently, the database is indexed only partially, depending on the query workload.
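As a rough illustration of building an index as a side-product of queries, the Python sketch below follows the classic adaptive-merging scheme on a single in-memory attribute. It is not the authors' implementation. The class name `AdaptiveMergingIndex`, the `run_size` parameter, and the use of mutable in-memory lists in place of LSM files are all assumptions made for this example. Initial runs are sorted by the secondary value, and each range query moves the qualifying entries out of every run into one final sorted index, then answers from that index.

```python
from bisect import bisect_left, insort
from typing import List, Tuple

# An index entry pairs a secondary-attribute value with a primary key.
Entry = Tuple[int, str]


class AdaptiveMergingIndex:
    """Hypothetical in-memory sketch of adaptive merging (not the paper's code)."""

    def __init__(self, rows: List[Entry], run_size: int) -> None:
        # Index initialization: cut the input into chunks and sort each one
        # by secondary value to form the initial runs.
        self.runs: List[List[Entry]] = [
            sorted(rows[i:i + run_size]) for i in range(0, len(rows), run_size)
        ]
        # The final index starts empty and grows only as queries arrive.
        self.final: List[Entry] = []

    def query(self, lo: int, hi: int) -> List[str]:
        """Return primary keys whose secondary value lies in [lo, hi]."""
        for run in self.runs:
            # (lo,) sorts before every (lo, pk); (hi + 1,) sorts after every (hi, pk).
            start = bisect_left(run, (lo,))
            end = bisect_left(run, (hi + 1,))
            # Move the qualifying entries out of the run into the final index.
            for entry in run[start:end]:
                insort(self.final, entry)
            del run[start:end]
        # Drop runs that have been fully merged.
        self.runs = [run for run in self.runs if run]
        # Every entry in [lo, hi] is now in the final index.
        i = bisect_left(self.final, (lo,))
        j = bisect_left(self.final, (hi + 1,))
        return [pk for _, pk in self.final[i:j]]

    def unmerged(self) -> int:
        """Number of entries still waiting in the initial runs."""
        return sum(len(run) for run in self.runs)


idx = AdaptiveMergingIndex(
    [(5, "a"), (1, "b"), (7, "c"), (3, "d"), (5, "e"), (9, "f")], run_size=2
)
print(idx.query(3, 5))  # ['d', 'a', 'e']
print(idx.unmerged())   # 3
```

After `query(3, 5)`, the entries with values 3 to 5 live in the final index and no longer occupy the runs, so repeating that query reads no run data at all; only ranges that are actually requested ever get indexed.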
This paper considers adaptive merging of the secondary index in LSM-based stores. In this approach, the secondary index can be initiated at an arbitrary moment. Thereafter, only the requested data are inserted into the secondary index; they are retrieved in parallel from independent immutable files created during index initialization. The method works in a dynamic database environment where database modifications interleave with user queries. The experiments show that the proposed approach outperforms traditional methods by about 30%.
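The abstract states that requested data are fetched in parallel from independent immutable files created at index initialization. The Python sketch below shows one hypothetical way to structure that step; it is not the method from the paper. Runs are immutable tuples standing in for files, each run is scanned by its own task on a `concurrent.futures.ThreadPoolExecutor`, and a list of already-merged value ranges (`covered`, an assumption of this sketch) prevents the same entries from being inserted twice, since immutable runs cannot have entries deleted. The helper names `uncovered`, `add_range`, `scan_run`, and `ParallelAdaptiveIndex` are illustrative only, and handling of concurrent database modifications is not modeled.

```python
from bisect import bisect_left, insort
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Entry = Tuple[int, str]  # (secondary value, primary key)
Range = Tuple[int, int]  # inclusive range of secondary values


def uncovered(lo: int, hi: int, covered: List[Range]) -> List[Range]:
    """Parts of [lo, hi] not yet merged; `covered` is sorted and disjoint."""
    gaps: List[Range] = []
    cur = lo
    for c_lo, c_hi in covered:
        if c_hi < cur:
            continue  # merged range lies entirely to the left
        if c_lo > hi:
            break  # remaining merged ranges lie beyond the query
        if c_lo > cur:
            gaps.append((cur, c_lo - 1))
        cur = max(cur, c_hi + 1)
        if cur > hi:
            break
    if cur <= hi:
        gaps.append((cur, hi))
    return gaps


def add_range(covered: List[Range], lo: int, hi: int) -> List[Range]:
    """Insert [lo, hi] and coalesce overlapping or adjacent ranges."""
    merged: List[Range] = []
    for c_lo, c_hi in sorted(covered + [(lo, hi)]):
        if merged and c_lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], c_hi))
        else:
            merged.append((c_lo, c_hi))
    return merged


def scan_run(run: Sequence[Entry], gaps: List[Range]) -> List[Entry]:
    """Read qualifying entries from one immutable sorted run (nothing is deleted)."""
    found: List[Entry] = []
    for g_lo, g_hi in gaps:
        found.extend(run[bisect_left(run, (g_lo,)):bisect_left(run, (g_hi + 1,))])
    return found


class ParallelAdaptiveIndex:
    """Hypothetical sketch: parallel extraction from immutable runs."""

    def __init__(self, rows: List[Entry], run_size: int, workers: int = 4) -> None:
        # Runs are stored as tuples to mirror immutable files.
        self.runs: List[Tuple[Entry, ...]] = [
            tuple(sorted(rows[i:i + run_size])) for i in range(0, len(rows), run_size)
        ]
        self.final: List[Entry] = []
        self.covered: List[Range] = []
        self.workers = workers

    def query(self, lo: int, hi: int) -> List[str]:
        gaps = uncovered(lo, hi, self.covered)
        if gaps:
            # One task per run; a real system would keep a long-lived pool.
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                parts = list(pool.map(scan_run, self.runs, [gaps] * len(self.runs)))
            # Merge into the final index on a single thread, so no locking is needed.
            for part in parts:
                for entry in part:
                    insort(self.final, entry)
            self.covered = add_range(self.covered, lo, hi)
        i = bisect_left(self.final, (lo,))
        j = bisect_left(self.final, (hi + 1,))
        return [pk for _, pk in self.final[i:j]]


idx = ParallelAdaptiveIndex(
    [(5, "a"), (1, "b"), (7, "c"), (3, "d"), (5, "e"), (9, "f")], run_size=2
)
print(idx.query(3, 5))  # ['d', 'a', 'e']
print(idx.query(4, 8))  # ['a', 'e', 'c']
print(idx.covered)      # [(3, 8)]
```

The second query scans the runs only for values 6 to 8, because 4 and 5 were already merged by the first one. In standard CPython builds, threads do not speed up pure-Python CPU work because of the global interpreter lock; they are used here only to show the one-task-per-run structure, which would pay off when runs are read from storage.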
Acknowledgment
The paper is supported by Wroclaw University of Science and Technology (subvention number: IDUB/8211204601).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Macyna, W., Kukowski, M., Zwarzko, M. (2023). Multi-core Adaptive Merging of the Secondary Index for LSM-Based Stores. In: Strauss, C., Amagasa, T., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2023. Lecture Notes in Computer Science, vol 14147. Springer, Cham. https://doi.org/10.1007/978-3-031-39821-6_20
DOI: https://doi.org/10.1007/978-3-031-39821-6_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39820-9
Online ISBN: 978-3-031-39821-6
eBook Packages: Computer Science, Computer Science (R0)