Robust resource management in distributed stream processing systems
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2018 Dr. Xunyun Liu
Stream processing is an emerging in-memory computing paradigm that ingests dynamic data streams with a process-once-arrival strategy. It yields real-time insights by applying continuous queries over data in motion, giving rise to a wide range of time-critical applications such as fraud detection, algorithmic trading and health surveillance. Resource management is an integral part of the deployment process, ensuring that the stream processing system meets the requirements articulated in the Service Level Agreement (SLA). It involves constructing the system deployment stack over distributed resources and continuously adjusting it to cope with the constantly changing runtime environment and the fluctuating workload. However, most existing resource management techniques are optimised towards a pre-configured deployment platform, and thus face a variety of challenges in resource provisioning, operator parallelisation, task scheduling, and state management when trying to realise robustness, i.e. maintaining a certain level of performance and reliability guarantee with minimum resource costs.
In this thesis, we investigate novel techniques and solutions for robust resource management to tackle the challenges that arise in the cloud deployment of stream processing systems. The outcome is a series of research works that incorporate SLA-awareness into the resource management process and relieve developers of the burden of monitoring, analysing, and rectifying the performance and reliability problems encountered during execution. Specifically, we have advanced the state-of-the-art by making the following contributions:
• A stepwise profiling and controlling framework that improves application performance by automatically scaling up the parallelism degree of streaming operators. It also ensures proper resource allocation between data sources and data sinks to avoid processing backlogs and starvation.
• A resource-efficient scheduler that monitors the application execution, models the resource consumption, and consolidates the task placement to improve cost efficiency without causing resource contention.
• A replication-based state management framework that masks state loss in the case of node crashes and JVM failures, and that reduces the fault-tolerance overhead by eliminating the need for synchronisation with a remote state storage.
• A performance-oriented deployment framework that conducts iterative scaling of the streaming application to reach its pre-defined targets on throughput and latency, regardless of the initial amount of resource allocation.
Keywords: stream processing; resource management; task scheduling; Data Stream Management Systems; fault-tolerance