Computer Science
Racos: Improving Erasure Coding State Machine Replication using Leaderless Consensus
Document Type
Conference Paper
Abstract
Cloud storage systems often adopt state machine replication (SMR) to ensure reliability and availability. Most SMR systems use "full-copy" replication across all nodes, which leads to degraded performance for data-intensive workloads, due to high disk and network I/O costs. Erasure coding has recently been integrated with leader-based SMR systems to reduce the costs, e.g., RS-Paxos, CRaft, HRaft, and FRaft. However, these systems still have bottlenecks at the leader, limiting their performance when handling large datasets. To address the bottlenecks, this paper proposes Racos, which integrates erasure coding with a recent leaderless SMR protocol, Rabia. Unlike Paxos or Raft, Rabia uses a leaderless design for reaching consensus, making it suitable for our purpose. Compared to a leader-based design, Racos distributes workload evenly, alleviating the bottlenecks. We integrate our system Racos with etcd, a distributed key-value store that powers many production systems including Kubernetes. Our evaluation, using YCSB, shows that Racos outperforms the closest competitors by up to 2.26x in throughput within local-area networks and reduces median latency by up to 76.8% in a wide variety of workloads.
Publication Title
SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing
Publication Date
11-2024
First Page
600
Last Page
617
ISBN
9798400712869
DOI
10.1145/3698038.3698511
Keywords
Erasure coding, Leaderless, Rabia, state machine replication
Repository Citation
Zarnstorff, Jonathan; Lebow, Lucas; Siems, Christopher; Remuck, Dillon; Ruiz, Colin; and Tseng, Lewis, "Racos: Improving Erasure Coding State Machine Replication using Leaderless Consensus" (2024). Computer Science. 240.
https://commons.clarku.edu/faculty_computer_sciences/240
APA Citation
Zarnstorff, J., Lebow, L., Siems, C., Remuck, D., Ruiz, C., & Tseng, L. (2024, November). Racos: Improving Erasure Coding State Machine Replication using Leaderless Consensus. In Proceedings of the 2024 ACM Symposium on Cloud Computing (pp. 600-617).
Cross Post Location
Student Publications