Design and implementation of an efficient flushing scheme for cloud key-value storage

Cluster Computing

Abstract

A key-value store is an essential component that is increasingly in demand in many scale-out environments, including social networks, online retail, and cloud services. Modern key-value storage engines provide many features, including transactions, versioning, and replication. In storage engines, transaction processing provides atomicity and durability by using write-ahead logging (WAL), which flushes log data to persistent storage before the corresponding data page is written when a transaction commits synchronously. However, according to our observation, WAL is a performance bottleneck in key-value storage engines, since flushing log data to persistent storage incurs significant lock contention and fsync() overhead, even with the various optimizations in the existing scheme. In this article, we propose an approach to improve the performance of key-value storage by optimizing the existing flushing scheme, combining group commit with a consolidation array. Our scheme aggregates multiple log-flush requests into a single large request on the fly and completes the request early. It is an efficient group commit that reduces the number of lock acquisitions and fsync() calls in the synchronous commit path while providing the same transactional guarantees as the existing scheme. Furthermore, we integrate our flushing scheme into the replication system and evaluate it on multiple nodes. We implement our scheme on the WiredTiger storage engine. The experimental results show that our scheme improves the performance of key-value workloads compared to the existing scheme.
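
To make the mechanism concrete, the following is a minimal C sketch, under stated assumptions, of how concurrent committers can join a consolidation slot so that only the first joiner (the leader) issues a single fsync() for the whole group. The structure log_slot, the function commit_record(), and their fields are hypothetical names used for illustration only; they do not reflect the actual WiredTiger code.

    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical consolidation slot: concurrent committers join it, and only
     * the leader flushes the combined log data with one fsync(). The mutex is
     * assumed to be initialized elsewhere with pthread_mutex_init(). */
    struct log_slot {
        pthread_mutex_t lock;
        uint64_t        group_size;  /* total size of joined log records   */
        uint64_t        end_lsn;     /* highest LSN joined into the slot   */
        int             joined;      /* number of committers in the group  */
        volatile int    done;        /* completion flag set by the leader  */
    };

    /* Called by each committing thread after it has copied its log record
     * into the shared log buffer; returns once the record is durable. */
    void commit_record(struct log_slot *slot, int log_fd,
                       uint64_t my_lsn, uint64_t my_size)
    {
        pthread_mutex_lock(&slot->lock);
        int leader = (slot->joined++ == 0);   /* first joiner is the leader */
        slot->group_size += my_size;
        if (my_lsn > slot->end_lsn)
            slot->end_lsn = my_lsn;
        pthread_mutex_unlock(&slot->lock);

        if (leader) {
            fsync(log_fd);    /* one flush covers every record in the group */
            slot->done = 1;   /* completion flag releases the other joiners */
        } else {
            while (!slot->done)
                ;             /* followers spin; a real engine would block  */
        }
    }

In this sketch the critical section under the lock only updates slot metadata, and the expensive fsync() is amortized over every transaction that joined the group, which is the effect the proposed flushing scheme aims for.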


Notes

  1. A leader is the first thread to acquire a log buffer.

  2. The offset is its own LSN.

  3. The group size is the size of the total joined log records in the slot.

  4. The completion flag denotes whether the LSN is completed or not.

  5. Replication can be broadly classified into two categories: synchronous and asynchronous. In synchronous replication, a transaction is committed simultaneously on all nodes; the master and the slave always remain synchronized, so the data is guaranteed to be consistent on all nodes when the transaction commits. In asynchronous replication, transactions are committed at the master server first and are then replicated to the slave, which means that the master and slave may not be consistent. The advantage of asynchronous replication is that it is faster and scales better than synchronous replication; however, the data is not guaranteed to be consistent on all nodes. The sketch after these notes contrasts the two commit paths.
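
As a rough illustration of this distinction, the sketch below contrasts the two commit paths in C. All names here (struct txn, flush_local_log(), send_to_slave(), wait_for_ack(), queue_for_shipping(), mark_committed()) are hypothetical stubs introduced for this example; they are not the API of WiredTiger or of any particular replication system.

    #include <stdio.h>

    /* Hypothetical transaction record and helper stubs. */
    struct txn { int id; };

    static void flush_local_log(struct txn *t)   { /* fsync() of the local WAL */ }
    static void send_to_slave(struct txn *t)     { /* ship log record to slave */ }
    static void wait_for_ack(struct txn *t)      { /* block for slave's ack    */ }
    static void queue_for_shipping(struct txn *t){ /* background replication   */ }
    static void mark_committed(struct txn *t)    { printf("txn %d committed\n", t->id); }

    /* Synchronous replication: the master commits only after the slave has
     * acknowledged the log record, so all nodes see consistent data. */
    void commit_synchronous(struct txn *t)
    {
        flush_local_log(t);
        send_to_slave(t);
        wait_for_ack(t);
        mark_committed(t);
    }

    /* Asynchronous replication: the master commits immediately and ships the
     * record in the background, so the slave may temporarily lag behind. */
    void commit_asynchronous(struct txn *t)
    {
        flush_local_log(t);
        mark_committed(t);
        queue_for_shipping(t);
    }

The only difference between the two paths is whether the master waits for the slave's acknowledgement before declaring the transaction committed, which is exactly the consistency-versus-latency trade-off described in the note above.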

References

  1. Aguilera, M.K., Leners, J.B., Walfish, M.: Yesquel: scalable sql storage for web applications. In: Proceedings of the 25th Symposium on Operating Systems Principles, ACM, pp. 245–262 (2015)

  2. Arulraj, J., Perron, M., Pavlo, A.: Write-behind logging. Proc. VLDB Endow. 10(4), 337–348 (2016)

  3. Atikoglu, B., Xu, Y., Frachtenberg, E., Jiang, S., Paleczny, M.: Workload analysis of a large-scale key-value store. ACM SIGMETRICS Perform. Eval. Rev. 40, 53–64 (2012)

  4. Banker, K.: MongoDB in Action. Manning Publications Co., Greenwich (2011)

  5. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading (1987)

  6. Carlson, J.L.: Redis in Action. Manning Publications Co., Greenwich (2013)

  7. Chen, S.: Flashlogging: exploiting flash devices for synchronous logging performance. In: SIGMOD, New York, NY, USA, SIGMOD’09, ACM, pp. 73–86 (2009)

  8. Amazon Elastic Compute Cloud: Amazon Web Services (2011). Accessed 9 November 2011

  9. Felber, P., Pasin, M., Rivière, É., Schiavoni, V., Sutra, P., Coelho, F., Oliveira, R., Matos, M., Vilaça, R.: On the support of versioning in distributed key-value stores. In: 2014 IEEE 33rd International Symposium on Reliable Distributed Systems (SRDS), IEEE, pp. 95–104 (2014)

  10. Fitzpatrick, B.: Distributed caching with memcached. Linux J. 2004(124), 5 (2004)

  11. Fruhwirt, P., Kieseberg, P., Schrittwieser, S., Huber, M., Weippl, E.: Innodb database forensics: reconstructing data manipulation queries from redo logs. In: 2012 Seventh International Conference on Availability, Reliability and Security (ARES) (2012)

  12. Gao, S., Xu, J., He, B., Choi, B., Hu, H.: Pcmlogging: reducing transaction logging overhead with pcm. In: 20th ACM International Conference on Information and Knowledge Management, New York, NY, USA, CIKM’11, ACM, pp. 2401–2404 (2011)

  13. Goel, S., Buyya, R.: Data replication strategies in wide-area distributed systems. In: Enterprise Service Computing: From Concept to Deployment. IGI Global, pp. 211–241 (2007)

  14. Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Elsevier, San Francisco (1992)

  15. Han, J., Haihong, E., Le, G., Du, J.: Survey on NoSQL database. In: 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), IEEE, pp. 363–366 (2011)

  16. Helland, P., et al.: Group commit timers and high volume transaction systems. In: High Performance Transaction Systems. Springer, New York, pp. 301–329 (1989)

  17. Huang, J., Schwan, K., Qureshi, M.K.: Nvram-aware logging in transaction systems. Proc. VLDB Endow. 8(4), 389–400 (2014)

  18. Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. Proc. VLDB Endow. 3(1–2), 681–692 (2010)

  19. Kang, W.-H., Lee, S.-W., Moon, B., Kee, Y.-S., Oh, M.: Durable write cache in flash memory ssd for relational and nosql databases. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14 (2014)

  20. Kang, W.-H., Lee, S.-W., Moon, B., Oh, G.-H., Min, C.: X-FTL: transactional FTL for SQLite databases. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, SIGMOD’13, ACM, pp. 97–108 (2013)

  21. Kopytov, A.: Sysbench: a system performance benchmark. http://sysbench.sourceforge.net (2004)

  22. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

  23. Lee, S.-W., Moon, B., Park, C., Kim, J.-M., Kim, S.-W.: A case for flash memory ssd in enterprise database applications. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, pp. 1075–1086 (2008)

  24. Mathur, A., Cao, M., Bhattacharya, S., Dilger, A., Tomas, A., Vivier, L.: The new ext4 filesystem: current status and future plans. In: Ottawa Linux Symposium. http://ols.108.redhat.com/2007/Reprints/mathur-Reprint.pdf (2007)

  25. Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. (TODS) 17(1), 94–162 (1992)

  26. MongoDB: https://www.mongodb.com/press/wired-tiger (2014)

  27. NVM express: http://www.nvmexpress.org (2012)

  28. Oh, G., Seo, C., Mayuram, R., Kee, Y.-S., Lee, S.-W.: SHARE interface in flash storage for relational and NoSQL databases. In: Proceedings of the 2016 International Conference on Management of Data, New York, NY, USA, SIGMOD’16, ACM, pp. 343–354 (2016)

  29. Ouyang, X., Nellans, D., Wipfel, R., Flynn, D., Panda, D.K.: Beyond block I/O: rethinking traditional storage primitives. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp. 301–311 (2011)

  30. Ramakrishnan, R., Gehrke, J.: Database Management Systems. Osborne/McGraw-Hill, Berkeley (2000)

  31. SAMSUNG 843Tn Data Center Series: http://www.samsung.com/semiconductor/global/file/insight/2015/08/PSG2014_2H_FINAL-1.pdf

  32. Samsung: XS1715 Ultra-fast Enterprise Class. http://www.samsung.com/global/business/semiconductor/file/product/XS1715_ProdOverview_2014_1.pdf (2014)

  33. Sivasubramanian, S.: Amazon dynamodb: a seamlessly scalable non-relational database service. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, pp. 729–730 (2012)

  34. Son, Y., Kang, H., Han, H., Yeom, H.Y.: An empirical evaluation and analysis of the performance of nvm express solid state drive. Cluster Comput. 19, 1–13 (2016)

  35. Son, Y., Kang, H., Han, H., Yeom, H.Y.: Improving performance of cloud key-value storage using flushing optimization. In: 2016 IEEE 1st International Workshops on Foundations and Applications of Self* Systems (FAS*W), pp. 42–47 (2016)

  36. Son, Y., Yeom, H., Han, H.: Optimizing i/o operations in file systems for fast storage devices. IEEE Trans. Comput. 66, 1071–1084 (2016)

  37. Song, N.Y., Son, Y., Han, H., Yeom, H.Y.: Efficient memory-mapped i/o on fast storage device. ACM Trans. Storage 12(4), 19:1–19:27 (2016)

  38. Sumbaly, R., Kreps, J., Gao, L., Feinberg, A., Soman, C., Shah, S.: Serving large-scale batch computed data with project voldemort. In: Proceedings of the 10th USENIX Conference on File and Storage Technologies, USENIX Association, p. 18 (2012)

  39. Wang, T., Johnson, R.: Scalable logging through emerging non-volatile memory. Proc. VLDB Endow. 7(10) (2014)

  40. WiredTiger: http://www.wiredtiger.com (2014)

Download references

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2015M3C4A7065581, 2015M3C4A7065645). Prof. Han's work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2014R1A1A2055032).

Author information

Corresponding author

Correspondence to Hyuck Han.

Additional information

A preliminary version [35] of this article was presented at the 1st IEEE International Workshops on Foundations and Applications of Self* Systems, Augsburg, Germany, September 2016.

About this article

Cite this article

Son, Y., Yeom, H.Y. & Han, H. Design and implementation of an efficient flushing scheme for cloud key-value storage. Cluster Comput 20, 3551–3563 (2017). https://doi.org/10.1007/s10586-017-1101-3
