Abstract
In recent years, Raft has been widely adopted in distributed systems (e.g., Etcd, TiKV, and PolarFS) to ensure distributed consensus because it is effective and easy to implement. However, the performance of virtual nodes in cloud environments is usually heterogeneous and fluctuating, owing to the “noisy neighbor” problem and cost-efficiency considerations, so Raft’s strong leader mechanism faces a serious performance challenge. Specifically, when the leader node is slow, the performance of the whole system degrades accordingly, since both write and read requests are blocked by the slow leader. To solve this problem, we propose vRaft, a modified version of Raft optimized for virtualized environments. It breaks Raft’s strong leader restriction and can fully utilize temporarily fast followers to accelerate the processing of both write and read requests in a virtualized cloud environment, without affecting Raft’s linearizability guarantee. Experiments on virtual nodes in Tencent Cloud indicate that, compared with Raft, vRaft improves throughput by up to 64.2%, reduces average latency by 38.1%, and shortens tail latency by 88.5% under a typical read/write-balanced workload.
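The effect of a slow leader described above can be illustrated with a toy latency model. This is only a sketch under simplifying assumptions, not vRaft's algorithm or any real Raft implementation: the function name `raft_write_latency` and the millisecond figures are hypothetical, network delay is ignored, and the leader's processing time is assumed to lie entirely on the critical path before followers persist the entry in parallel.

```python
def raft_write_latency(leader_ms, follower_ms):
    """Toy model of the commit latency of one write under a strong leader.

    Assumption: the leader first processes the entry (leader_ms), then all
    followers persist it in parallel (follower_ms[i]); the write commits
    once a majority of the cluster, counting the leader, holds the entry.
    """
    cluster_size = 1 + len(follower_ms)
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1  # follower acknowledgements besides the leader
    if acks_needed == 0:
        return leader_ms
    # The commit waits only for the acks_needed-th fastest follower.
    quorum_follower_ms = sorted(follower_ms)[acks_needed - 1]
    return leader_ms + quorum_follower_ms


# Three nodes, one of which is temporarily slow (10 ms vs. 2 ms).
slow_leader = raft_write_latency(10, [2, 2])   # slow node is the leader
fast_leader = raft_write_latency(2, [2, 10])   # slow node is a follower
print(slow_leader, fast_leader)
```

In this model a slow follower is masked by the quorum, whereas a slow leader delays every request, which is the asymmetry the abstract attributes to the strong leader mechanism.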
Acknowledgement
This work is supported by the National Key Research and Development Program of China (No. 2019YFE0198600) and the National Natural Science Foundation of China (Nos. 61972402, 61972275, and 61732014).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Chai, Y. (2021). vRaft: Accelerating the Distributed Consensus Under Virtualized Environments. In: Jensen, C.S., et al. Database Systems for Advanced Applications. DASFAA 2021. Lecture Notes in Computer Science, vol 12681. Springer, Cham. https://doi.org/10.1007/978-3-030-73194-6_4
DOI: https://doi.org/10.1007/978-3-030-73194-6_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73193-9
Online ISBN: 978-3-030-73194-6
eBook Packages: Computer Science, Computer Science (R0)