ALOR: Adaptive Layout Optimization of Raft Groups for Heterogeneous Distributed Key-Value Stores

  • Yangyang Wang
  • Yunpeng Chai
  • Xin Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11276)

Abstract

Many distributed key-value storage systems employ the simple and effective Raft protocol to ensure data consistency. They usually assume a homogeneous hardware configuration across the nodes of the underlying cluster and thus distribute data evenly. However, today’s distributed systems tend to be heterogeneous in their nodes’ I/O devices, because worn I/O devices are regularly replaced and expensive new storage media (e.g., non-volatile memory) are emerging. In this paper, we propose a new data layout scheme called Adaptive Layout Optimization of Raft groups (ALOR), which takes the hardware heterogeneity of the cluster into account. ALOR optimizes the data layout of Raft groups to achieve a better practical load balance, which leads to higher performance. ALOR consists of two components: leader migration in Raft groups and a skewed data layout based on cold data migration. We conducted experiments on a practical heterogeneous cluster, and the results indicate that, on average, ALOR improves throughput by 36.89% and reduces latency and 99th-percentile tail latency by 24.54% and 21.32%, respectively.
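
The abstract does not spell out how leaders are placed, so the following is only an illustrative sketch of the general idea of a capacity-aware leader layout, not the ALOR algorithm itself. It assumes each node has a hypothetical relative I/O weight and splits a given number of Raft group leaders across nodes in proportion to those weights, using largest-remainder rounding. The function name `leader_quota`, the node names, and the weights are all assumptions made for illustration.

```python
def leader_quota(weights, num_groups):
    """Split num_groups Raft leaders across nodes in proportion to weight.

    weights: dict mapping node name -> relative I/O capability (assumed,
             e.g. obtained from a benchmark; not taken from the paper).
    Returns a dict mapping node name -> number of leaders to host.
    """
    total = sum(weights.values())
    # Ideal (fractional) share of leaders for each node.
    exact = {node: num_groups * w / total for node, w in weights.items()}
    # Start from the floor of each share.
    quota = {node: int(share) for node, share in exact.items()}
    leftover = num_groups - sum(quota.values())
    # Give the remaining leaders to the nodes with the largest fractional parts.
    by_remainder = sorted(weights, key=lambda n: exact[n] - quota[n], reverse=True)
    for node in by_remainder[:leftover]:
        quota[node] += 1
    return quota


# Hypothetical three-node cluster: faster devices receive more leaders.
print(leader_quota({"nvme": 3.0, "ssd": 2.0, "hdd": 1.0}, 12))
```

With an even layout each of the three nodes would host 4 of the 12 leaders; under the assumed weights the faster node hosts more and the slowest node fewer.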

Notes

Acknowledgement

This work is supported by the National Key Research and Development Program of China (No. 2018YFB1004401), the National Natural Science Foundation of China (No. 61732014, 61472427, and 61572353), the Beijing Natural Science Foundation (No. 4172031), the National Science Foundation of Tianjin (17JCYBJC15400), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 16XNLQ02), and the open research program of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (No. CARCH201702).

Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  1. Key Laboratory of Data Engineering and Knowledge Engineering, MOE, Beijing, China
  2. School of Information, Renmin University of China, Beijing, China
  3. College of Intelligence and Computing, Tianjin University, Tianjin, China
