Computer Science
Randomized error removal for online spread estimation in data streaming
Document Type
Article
Abstract
Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvement for single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while reducing memory footprint. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing overhead and query overhead than the state of the art, yet achieve significant accuracy improvement in spread estimation. We formally analyze the performance of these new designs. We implement them in both hardware and software, and use real-world data traces to evaluate their performance in comparison with the state of the art. The experimental results show that our best sketch significantly improves over the best existing work in terms of estimation accuracy, data item processing throughput, and online query throughput.
Publication Title
Proceedings of the VLDB Endowment
Publication Date
2021
Volume
14
Issue
6
First Page
1040
Last Page
1052
ISSN
2150-8097
DOI
10.14778/3447689.3447707
Keywords
accuracy improvement, data streaming, error removal, hardware and software, memory footprint, multi flows, processing overhead, state of the art
Repository Citation
Wang, Haibo; Ma, Chaoyi; Odegbile, Olufemi O.; Chen, Shigang; and Peir, Jih Kwon, "Randomized error removal for online spread estimation in data streaming" (2021). Computer Science. 174.
https://commons.clarku.edu/faculty_computer_sciences/174
APA Citation
Wang, H., Ma, C., Odegbile, O. O., Chen, S., & Peir, J. K. (2021). Randomized error removal for online spread estimation in data streaming. Proceedings of the VLDB Endowment, 14(6), 1040–1052. https://doi.org/10.14778/3447689.3447707