Abstract
Modern stream processing systems typically need to ingest and correlate data from multiple sources. However, these sources are outside the system's control and are prone to software errors and unavailability, causing data anomalies that must be remedied before the data is processed. In this context, anomalies such as data duplication are among the most prominent challenges of stream processing. Data duplication can hinder real-time analysis of data for decision making. This paper investigates these challenges and presents an experimental analysis of operators and auxiliary tools that help with data deduplication. The results show that using external mechanisms increases data delivery time. However, these mechanisms are essential for an ingestion process to guarantee that no data is lost and that no duplicates are persisted.
Original language | English |
---|---|
Title of host publication | DEBS '23: Proceedings of the 17th ACM International Conference on Distributed and Event-Based Systems |
Publisher | Association for Computing Machinery |
Publication date | 27 Jun 2023 |
Pages | 91–102 |
ISBN (Electronic) | 9798400701221 |
Publication status | Published - 27 Jun 2023 |
Event | 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23 - Neuchatel, Switzerland. Duration: 27 Jun 2023 → 30 Jun 2023 |
Conference
Conference | 17th ACM International Conference on Distributed and Event-based Systems - DEBS '23 |
---|---|
Country/Territory | Switzerland |
City | Neuchatel |
Period | 27/06/2023 → 30/06/2023 |