Journal of Computer Science and Technology, Volume 32, Issue 1, pp. 41–54

dCompaction: Speeding up Compaction of the LSM-Tree via Delayed Compaction

Regular Paper

Abstract

Key-value (KV) stores have become a backbone of large-scale applications in today's data centers. Write-optimized data structures like the Log-Structured Merge-tree (LSM-tree) and its variants are widely used in KV storage systems such as BigTable and RocksDB. A conventional LSM-tree organizes KV items into multiple, successively larger components, and uses compaction to push KV items from a smaller component to the adjacent larger component until the KV items reach the largest component. Unfortunately, the current compaction scheme incurs significant write amplification due to repeated KV item reads and writes, and thus results in poor throughput. We propose a new compaction scheme, delayed compaction (dCompaction), that decreases write amplification. dCompaction postpones some compactions and gathers them into a following compaction. In this way, it avoids repeated KV item reads and writes during compaction, and consequently improves the throughput of LSM-tree based KV stores. We implement dCompaction on RocksDB and conduct extensive experiments. Validation using the YCSB framework shows that, compared with RocksDB, dCompaction improves write performance by about 40% while delivering comparable read performance.
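To make the delayed-compaction idea concrete, below is a minimal Python sketch of the scheme as the abstract describes it. This is an illustration under simplifying assumptions, not the authors' implementation or RocksDB's API: every name in it (VirtualTable, DELAY_LIMIT, compact) is hypothetical, and a physical table is modeled as a plain Python dictionary. A postponed compaction emits a virtual table that merely records its input tables; the eventual real compaction flattens all gathered inputs and merges them once, so intermediate KV items are not repeatedly read and rewritten.

# Hypothetical sketch of delayed compaction; names are illustrative only.
DELAY_LIMIT = 2  # assumed knob: how many compactions may be postponed

class VirtualTable:
    """Result of a postponed compaction: records its input tables
    instead of physically merging their KV items (no reads/writes)."""
    def __init__(self, inputs):
        self.inputs = inputs

    def physical_tables(self):
        # Recursively flatten nested virtual tables so the eventual
        # real compaction sees every underlying physical table.
        for t in self.inputs:
            if isinstance(t, VirtualTable):
                yield from t.physical_tables()
            else:
                yield t

def compact(inputs, delayed=0):
    """Postpone the merge while under DELAY_LIMIT; otherwise perform
    one physical merge over all gathered inputs (newest table wins)."""
    if delayed < DELAY_LIMIT:
        return VirtualTable(inputs)  # delayed: zero KV reads/writes now
    merged = {}
    for table in inputs:  # inputs are ordered oldest -> newest
        sources = (table.physical_tables()
                   if isinstance(table, VirtualTable) else [table])
        for t in sources:
            merged.update(t)  # a physical table is just a KV dict here
    return dict(sorted(merged.items()))

# Two postponed compactions gather three tables; only the third pays
# the merge cost, instead of rewriting intermediate results each time.
v1 = compact([{"a": 1, "c": 3}, {"b": 2}], delayed=0)
v2 = compact([v1, {"c": 30}], delayed=1)
print(compact([v2, {"d": 4}], delayed=2))  # {'a': 1, 'b': 2, 'c': 30, 'd': 4}

In this toy run, the KV items of the first two tables are physically rewritten once rather than at every intermediate compaction; avoiding those intermediate rewrites is the write-amplification saving the abstract targets.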

Keywords

key-value store; Log-Structured Merge-tree (LSM-tree); write amplification; delayed compaction

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China