Computer Science

Racos: Improving Erasure Coding State Machine Replication using Leaderless Consensus

Document Type

Conference Paper

Abstract

Cloud storage systems often adopt state machine replication (SMR) to ensure reliability and availability. Most SMR systems use "full-copy" replication across all nodes, which leads to degraded performance for data-intensive workloads due to high disk and network I/O costs. Erasure coding has recently been integrated with leader-based SMR systems to reduce these costs, e.g., RS-Paxos, CRaft, HRaft, and FRaft. However, these systems still have bottlenecks at the leader, limiting their performance when handling large datasets. To address the bottlenecks, this paper proposes Racos, which integrates erasure coding with a recent leaderless SMR protocol, Rabia. Unlike Paxos or Raft, Rabia uses a leaderless design for reaching consensus, making it suitable for our purpose. Compared to a leader-based design, Racos distributes workload evenly, alleviating the bottlenecks. We integrate our system Racos with etcd, a distributed key-value store that powers many production systems, including Kubernetes. Our evaluation, using YCSB, shows that Racos outperforms the closest competitors by up to 2.26x in throughput within local-area networks and reduces median latency by up to 76.8% across a wide variety of workloads.

Publication Title

SoCC 2024 - Proceedings of the 2024 ACM Symposium on Cloud Computing

Publication Date

11-2024

First Page

600

Last Page

617

ISBN

9798400712869

DOI

10.1145/3698038.3698511

Keywords

Erasure coding, Leaderless, Rabia, state machine replication

APA Citation

Zarnstorff, J., Lebow, L., Siems, C., Remuck, D., Ruiz, C., & Tseng, L. (2024, November). Racos: Improving Erasure Coding State Machine Replication using Leaderless Consensus. In Proceedings of the 2024 ACM Symposium on Cloud Computing (pp. 600-617).

Cross Post Location

Student Publications
