Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Informatik

PhD Talk by Samira Akili

*Decompositions for Distributed Complex Event Processing*

Die Veranstalung findet digital per Zoom statt. Eine Zoom-Einladung finden Sie hier. (nur mit Informatik/HU-Account)

Decompositions for Distributed Complex Event Processing

Complex Event Processing (CEP) systems continuously evaluate query workloads over high-rate event streams to detect patterns of interest. While the expressiveness of common CEP languages allows the definition of intricate queries, their evaluation is computationally expensive, with common evaluation algorithms exhibiting exponential worst-case complexity.

Recently, the relevance of CEP has extended to applications involving networks of event sources, such as sensor networks, Internet of Things (IoT) systems, or smart factories. Once applied in such networked applications, the challenge of CEP query evaluation is exacerbated due to additional constraints imposed by bandwidth limitations. While the distribution of query evaluation among the event sources enables performance optimization, state-of-the-art approaches do not tap into the potential of truly distributed CEP.

In this dissertation, we study the distribution and in-network evaluation of CEP workloads.

First, we propose Multi-Sink Evaluation (MuSE) for CEP queries with rich semantics supporting Kleene closure and negation in networks resembling a complete graph. MuSE leverages extensive query decomposition techniques together with multi-sink placements to minimize the transmission of events over the network required for query evaluation. We introduce MuSE graphs as a formal model to compactly describe decomposition, event flow, and placement for a given network and query workload.

Second, we explore the complementary aspect of event transmission. While commonly the event exchange follows a push-based mechanism, where an event is sent upon its creation, the adoption of pull-based communication presents an opportunity for additional transmission cost reduction. In this scenario, events are transmitted only in response to received pull-requests. We introduce predicate-based push-pull (PrePP) plans comprising a model of pull-requests that enables fine-granular filtering of events at their sources based on query predicates.

Third, we extend our model to a setting where the underlying network graph has an arbitrary topology. In such networks, not only does event distribution offer optimization potential for distributed evaluation, but also the routes in the network over which events are forwarded for query evaluation. For this purpose, we propose In-network Evaluation (INEv) graphs which additionally capture event routing. INEv graphs employ fine-granular forwarding strategies that also take into account partial results of queries and events disseminated in the network.

Finally, we leverage our findings in the context of in-network query evaluation to address the problem of parallel CEP query processing. While bandwidth constraints significantly impact the performance of CEP evaluation in event-sourced networks, this is not the case for cluster-based setups, where communication costs may take a backseat. In scenarios where a set of processing units evaluates CEP query workload in parallel, the primary objective is to provide high throughput and low latency. We identified that the interplay of the ingestion and comparison rate offered by the processing units serves as a critical factor limiting throughput optimization. Based thereon, we design a cost model that guides the decomposition of query workloads and introduce Decomposition-based Parallel (DecoPa) plans, leveraging query decomposition to facilitate efficient, parallel CEP processing.

For all proposed models, we define formal properties, reason on their correctness, and provide construction algorithms. In several comprehensive experimental studies, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques with respect to network costs, latency, and throughput. Overall, this dissertation lays the foundation for advanced, decentralized, and parallelized CEP systems that are both sound and efficient in handling high-frequency event streams.