Skip to main content
Log in

dCompaction: Delayed Compaction for the LSM-Tree

  • Published:
International Journal of Parallel Programming Aims and scope Submit manuscript

Abstract

Key-value (KV) stores have become a backbone of large-scale applications in today’s data centers. Write-optimized data structures like the Log-Structured Merge-tree (LSM-tree) and their variants are widely used in KV storage systems like BigTable and RocksDB. Conventional LSM-tree organizes KV items into multiple, successively larger components, and uses compaction to push KV items from one smaller component to another adjacent larger component until the KV items reach the largest component. Unfortunately, current compaction scheme incurs significant write amplification due to repeated KV item reads and writes, and then results in poor throughput. We propose a new compaction scheme, delayed compaction (dCompaction), that decreases write amplification. dCompaction postpones some compactions and gather them into the following compaction. In this way, it avoids KV item reads and writes during compaction, and consequently improves the throughput of LSM-tree based KV stores. We implement dCompaction on RocksDB, and conduct extensive experiments. Validation using YCSB framework shows that compared with RocksDB dCompaction has about 30% write performance improvements and also comparable read performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Sears, R., Ramakrishnan, R.: bLSM: a general purpose log-structured merge tree. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD’12, pp. 217–228. ACM, New York, NY, USA (2012)

  2. Google: LevelDB. http://code.google.com/p/LevelDB (2012)

  3. Facebook: RocksDB. http://rocksdb.org/ (2013)

  4. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E.: Bigtable: a distributed storage system for structured data. In: OSDI 2006: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, pp 15–25. USENIX Association, Berkeley, CA, USA (2006)

  5. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system[J]. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

    Article  Google Scholar 

  6. HBase Documentation. Hbase: Bigtable-like structured storage for hadoop hdfs. http://wiki.apache.org/hadoop/Hbasea, (2011)

  7. Neil, P.O., Cheng, E., Gawlick, D., Neil, E.O.: The log-structured merge-tree (LSM-tree). Acta Inf. 33(4), 351–385 (1996)

    Article  MATH  Google Scholar 

  8. Redis: http://redis.io/

  9. Memcached: http://memcached.org/

  10. Huang, Q., Birman, K., van Renesse, R., Lloyd, W., Kumar, S., Li, H.C.: An analysis of facebook photo caching. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP’03)

  11. Atikoglu, B., Yuehai, X., Frachtenberg, E., Jiang, S., Paleczny, M.: Workload analysis of a large-scale key-value store. In: SIGMETRICS (2012)

  12. Shetty, P., Spillane, R., el at.: Building workload-independent storage with VT-trees. In: 11th USENIX Conference on File and Storage Technologies (2013)

  13. Jermaine, C., Omiecinski, E., Yee, W.G.: The partitioned exponential file for database storage management. VLDB J. 16(4), 417–437 (2007)

    Article  Google Scholar 

  14. Zigang, Z., Yinliang, Y., Bingsheng, H., et al.: Pipelined compaction for the LSM-tree. In: 28th International Parallel and Distributed Processing Symposium, pp. 777–786 (2014)

  15. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC’00, pp. 143–154. ACM, New York, NY, USA (2010)

  16. Escriva, R., Wong, B., Sirer, E.G.: HyperDex: a distributed, searchable key-value store. SIGCOMM Comput. Commun. Rev. 42(4), 25–36 (2012)

    Article  Google Scholar 

  17. Bender, M.A., Farach-Colton, M., Fineman, J.T., Fogel, Y.R., Kuszmaul, B.C., Nelson, J.: Cache-oblivious streaming b-trees. In: Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’07), pp. 81–92. ACM, New York, NY, USA (2007)

  18. Spillane, R.P., Shetty, P.J., Zadok, E., Dixit, S., Archak, S.: An efficient multi-tier tablet server storage architecture. In: Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC’11, pp. 1–14. ACM, New York, NY, USA (2011)

  19. Li, Y., He, B., Yang, R.J., Luo, Q., Yi, K.: Tree indexing on solid state drives. Proc. VLDB Endow. 3(1–2), 1195–1206 (2010)

    Article  Google Scholar 

  20. Chazelle, B., Guibas, L.J.: Fractional cascading: a data structuring technique with geometric applications. In: Automata, Languages and Programming, volume 194 of Lecture Notes in Computer Science, pp. 90–100. Springer, Berlin Heidelberg (1985)

  21. Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)

    Article  MATH  Google Scholar 

  22. Wu, X., et al.: LSM-trie: an LSM-treebased ultra-large key-value store for small data. In: USENIX Annual Technical Conference (2015)

Download references

Acknowledgements

We thank the anonymous reviewers for their helpful feedback. This work is partly supported by MOST’s 13th FYP project of China under Grant No. 2016YFB1000202, and National Science Foundation of China under Grants Nos. 61303056, 61379042 and Youth Innovation Promotion Association, CAS, No. 2016146.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fengfeng Pan.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Pan, F., Yue, Y. & Xiong, J. dCompaction: Delayed Compaction for the LSM-Tree. Int J Parallel Prog 45, 1310–1325 (2017). https://doi.org/10.1007/s10766-016-0472-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10766-016-0472-z

Keywords

Navigation