Design and implementation of an efficient flushing scheme for cloud key-value storage

Cluster Computing

Abstract

A key-value store is an essential component that is increasingly in demand in many scale-out environments, including social networks, online retail, and cloud services. Modern key-value storage engines provide many features, including transactions, versioning, and replication. In storage engines, transaction processing provides atomicity and durability by using write-ahead logging (WAL), which flushes log data to persistent storage before the corresponding data page is written when a transaction commits synchronously. However, according to our observation, WAL is a performance bottleneck in key-value storage engines, since flushing log data to persistent storage incurs significant lock contention and fsync() overhead, even with the various optimizations in the existing scheme. In this article, we propose an approach to improve the performance of key-value storage by optimizing the existing flushing scheme, combining group commit with a consolidation array. Our scheme aggregates multiple log-flush requests into a single large request on the fly and completes the request early. It is an efficient group commit that reduces the number of lock acquisitions and fsync() calls in the synchronous commit path while providing the same transactional guarantees as the existing scheme. Furthermore, we integrate our flushing scheme into the replication system and evaluate it on multiple nodes. We implement our scheme on the WiredTiger storage engine. The experimental results show that our scheme improves the performance of key-value workloads compared to the existing scheme.
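
To make the mechanism concrete, the following is a minimal C sketch, under stated assumptions, of how concurrent committers can join a consolidation slot so that only the first joiner (the leader) issues a single fsync() for the whole group. The structure log_slot, the function commit_record(), and their fields are hypothetical names used for illustration only; they do not reflect the actual WiredTiger code.

    #include <pthread.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical consolidation slot: concurrent committers join it, and only
     * the leader flushes the combined log data with one fsync(). The mutex is
     * assumed to be initialized elsewhere with pthread_mutex_init(). */
    struct log_slot {
        pthread_mutex_t lock;
        uint64_t        group_size;  /* total size of joined log records   */
        uint64_t        end_lsn;     /* highest LSN joined into the slot   */
        int             joined;      /* number of committers in the group  */
        volatile int    done;        /* completion flag set by the leader  */
    };

    /* Called by each committing thread after it has copied its log record
     * into the shared log buffer; returns once the record is durable. */
    void commit_record(struct log_slot *slot, int log_fd,
                       uint64_t my_lsn, uint64_t my_size)
    {
        pthread_mutex_lock(&slot->lock);
        int leader = (slot->joined++ == 0);   /* first joiner is the leader */
        slot->group_size += my_size;
        if (my_lsn > slot->end_lsn)
            slot->end_lsn = my_lsn;
        pthread_mutex_unlock(&slot->lock);

        if (leader) {
            fsync(log_fd);    /* one flush covers every record in the group */
            slot->done = 1;   /* completion flag releases the other joiners */
        } else {
            while (!slot->done)
                ;             /* followers spin; a real engine would block  */
        }
    }

In this sketch the critical section under the lock only updates slot metadata, and the expensive fsync() is amortized over every transaction that joined the group, which is the effect the proposed flushing scheme aims for.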


Notes

  1. A leader is the first thread to acquire a log buffer.

  2. The offset is its own LSN.

  3. The group size is the size of the total joined log records in the slot.

  4. The completion flag denotes whether the LSN is completed or not.

  5. Replication can be broadly classified into two categories: synchronous and asynchronous. In synchronous replication, a transaction is committed simultaneously on all nodes; the master and the slave always remain synchronized, so the data is guaranteed to be consistent on all nodes when the transaction commits. In asynchronous replication, transactions are committed at the master server first and are then replicated to the slave, which means that the master and slave may not be consistent. The advantage of asynchronous replication is that it is faster and scales better than synchronous replication; however, the data is not guaranteed to be consistent on all nodes. The sketch after these notes contrasts the two commit paths.
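
As a rough illustration of this distinction, the sketch below contrasts the two commit paths in C. All names here (struct txn, flush_local_log(), send_to_slave(), wait_for_ack(), queue_for_shipping(), mark_committed()) are hypothetical stubs introduced for this example; they are not the API of WiredTiger or of any particular replication system.

    #include <stdio.h>

    /* Hypothetical transaction record and helper stubs. */
    struct txn { int id; };

    static void flush_local_log(struct txn *t)   { /* fsync() of the local WAL */ }
    static void send_to_slave(struct txn *t)     { /* ship log record to slave */ }
    static void wait_for_ack(struct txn *t)      { /* block for slave's ack    */ }
    static void queue_for_shipping(struct txn *t){ /* background replication   */ }
    static void mark_committed(struct txn *t)    { printf("txn %d committed\n", t->id); }

    /* Synchronous replication: the master commits only after the slave has
     * acknowledged the log record, so all nodes see consistent data. */
    void commit_synchronous(struct txn *t)
    {
        flush_local_log(t);
        send_to_slave(t);
        wait_for_ack(t);
        mark_committed(t);
    }

    /* Asynchronous replication: the master commits immediately and ships the
     * record in the background, so the slave may temporarily lag behind. */
    void commit_asynchronous(struct txn *t)
    {
        flush_local_log(t);
        mark_committed(t);
        queue_for_shipping(t);
    }

The only difference between the two paths is whether the master waits for the slave's acknowledgement before declaring the transaction committed, which is exactly the consistency-versus-latency trade-off described in the note above.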

References

  1. Aguilera, M.K., Leners, J.B., Walfish, M.: Yesquel: scalable sql storage for web applications. In: Proceedings of the 25th Symposium on Operating Systems Principles, ACM, pp. 245–262 (2015)

  2. Arulraj, J., Perron, M., Pavlo, A.: Write-behind logging. Proc. VLDB Endow. 10(4), 337–348 (2016)

  3. Atikoglu, B., Xu, Y., Frachtenberg, E., Jiang, S., Paleczny, M.: Workload analysis of a large-scale key-value store. ACM SIGMETRICS Perform. Eval. Rev. 40, 53–64 (2012)

  4. Banker, K.: MongoDB in Action. Manning Publications Co., Greenwich (2011)

  5. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley, Reading (1987)

  6. Carlson, J.L.: Redis in Action. Manning Publications Co., Greenwich (2013)

  7. Chen, S.: Flashlogging: exploiting flash devices for synchronous logging performance. In: SIGMOD, New York, NY, USA, SIGMOD’09, ACM, pp. 73–86 (2009)

  8. Amazon Elastic Compute Cloud: Amazon Web Services (2011). Accessed 9 November 2011

  9. Felber, P., Pasin, M., Rivière, É., Schiavoni, V., Sutra, P., Coelho, F., Oliveira, R., Matos, M., Vilaça, R.: On the support of versioning in distributed key-value stores. In: 2014 IEEE 33rd International Symposium on Reliable Distributed Systems (SRDS), IEEE, pp. 95–104 (2014)

  10. Fitzpatrick, B.: Distributed caching with memcached. Linux J. 2004(124), 5 (2004)

  11. Fruhwirt, P., Kieseberg, P., Schrittwieser, S., Huber, M., Weippl, E.: Innodb database forensics: reconstructing data manipulation queries from redo logs. In: 2012 Seventh International Conference on Availability, Reliability and Security (ARES) (2012)

  12. Gao, S., Xu, J., He, B., Choi, B., Hu, H.: Pcmlogging: reducing transaction logging overhead with pcm. In: 20th ACM International Conference on Information and Knowledge Management, New York, NY, USA, CIKM’11, ACM, pp. 2401–2404 (2011)

  13. Goel, S., Buyya, R.: Data replication strategies in wide-area distributed systems. In: Enterprise Service Computing: From Concept to Deployment. IGI Global, pp. 211–241 (2007)

  14. Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Elsevier, San Francisco (1992)

  15. Han, J., Haihong, E., Le, G., Du, J.: Survey on NoSQL database. In: 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), IEEE, pp. 363–366 (2011)

  16. Helland, P., et al.: Group commit timers and high volume transaction systems. In: High Performance Transaction Systems. Springer, New York, pp. 301–329 (1989)

  17. Huang, J., Schwan, K., Qureshi, M.K.: Nvram-aware logging in transaction systems. Proc. VLDB Endow. 8(4), 389–400 (2014)

  18. Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. Proc. VLDB Endow. 3(1–2), 681–692 (2010)

  19. Kang, W.-H., Lee, S.-W., Moon, B., Kee, Y.-S., Oh, M.: Durable write cache in flash memory ssd for relational and nosql databases. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14 (2014)

  20. Kang, W.-H., Lee, S.-W., Moon, B., Oh, G.-H., Min, C.: X-FTL: transactional FTL for SQLite databases. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, SIGMOD’13, ACM, pp. 97–108 (2013)

  21. Kopytov, A.: Sysbench: a system performance benchmark. http://sysbench.sourceforge.net (2004)

  22. Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. ACM SIGOPS Oper. Syst. Rev. 44(2), 35–40 (2010)

  23. Lee, S.-W., Moon, B., Park, C., Kim, J.-M., Kim, S.-W.: A case for flash memory ssd in enterprise database applications. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, pp. 1075–1086 (2008)

  24. Mathur, A., Cao, M., Bhattacharya, S., Dilger, A., Tomas, A., Vivier, L.: The new ext4 filesystem: current status and future plans. In: Ottawa Linux Symposium. http://ols.108.redhat.com/2007/Reprints/mathur-Reprint.pdf (2007)

  25. Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. (TODS) 17(1), 94–162 (1992)

  26. MongoDB: https://www.mongodb.com/press/wired-tiger (2014)

  27. NVM express: http://www.nvmexpress.org (2012)

  28. Oh, G., Seo, C., Mayuram, R., Kee, Y.-S., Lee, S.-W.: SHARE interface in flash storage for relational and NoSQL databases. In: Proceedings of the 2016 International Conference on Management of Data, New York, NY, USA, SIGMOD’16, ACM, pp. 343–354 (2016)

  29. Ouyang, X., Nellans, D., Wipfel, R., Flynn, D., Panda, D.K.: Beyond block I/O: rethinking traditional storage primitives. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp. 301–311 (2011)

  30. Ramakrishnan, R., Gehrke, J.: Database Management Systems. Osborne/McGraw-Hill, Berkeley (2000)

  31. SAMSUNG 843Tn Data Center Series: http://www.samsung.com/semiconductor/global/file/insight/2015/08/PSG2014_2H_FINAL-1.pdf

  32. Samsung: XS1715 Ultra-fast Enterprise Class. http://www.samsung.com/global/business/semiconductor/file/product/XS1715_ProdOverview_2014_1.pdf (2014)

  33. Sivasubramanian, S.: Amazon dynamodb: a seamlessly scalable non-relational database service. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ACM, pp. 729–730 (2012)

  34. Son, Y., Kang, H., Han, H., Yeom, H.Y.: An empirical evaluation and analysis of the performance of nvm express solid state drive. Cluster Comput. 19, 1–13 (2016)

  35. Son, Y., Kang, H., Han, H., Yeom, H.Y.: Improving performance of cloud key-value storage using flushing optimization. In: 2016 IEEE 1st International Workshops on Foundations and Applications of Self* Systems (FAS*W), pp. 42–47 (2016)

  36. Son, Y., Yeom, H., Han, H.: Optimizing i/o operations in file systems for fast storage devices. IEEE Trans. Comput. 66, 1071–1084 (2016)

  37. Song, N.Y., Son, Y., Han, H., Yeom, H.Y.: Efficient memory-mapped i/o on fast storage device. ACM Trans. Storage 12(4), 19:1–19:27 (2016)

  38. Sumbaly, R., Kreps, J., Gao, L., Feinberg, A., Soman, C., Shah, S.: Serving large-scale batch computed data with project voldemort. In: Proceedings of the 10th USENIX Conference on File and Storage Technologies, USENIX Association, p. 18 (2012)

  39. Wang, T., Johnson, R.: Scalable logging through emerging non-volatile memory. Proc. VLDB Endow. 7(10) (2014)

  40. WiredTiger: http://www.wiredtiger.com (2014)

Download references

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2015M3C4A7065581, 2015M3C4A7065645). Prof. Han's work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2014R1A1A2055032).

Author information

Corresponding author

Correspondence to Hyuck Han.

Additional information

A preliminary version [35] of this article was presented at the 1st IEEE International Workshops on Foundations and Applications of Self* Systems, Augsburg, Germany, September 2016.

About this article

Cite this article

Son, Y., Yeom, H.Y. & Han, H. Design and implementation of an efficient flushing scheme for cloud key-value storage. Cluster Comput 20, 3551–3563 (2017). https://doi.org/10.1007/s10586-017-1101-3
