Computing and Information Systems - Theses

    Distributed data stream processing and task placement on edge-cloud infrastructure
Amarasinghe, Gayashan Niroshana (2021)
The rapid growth of smart, connected edge devices with substantial processing power has made ubiquitous computing possible. These edge devices either produce streams of information about the environment in which they are deployed, or are located in proximity to such information producers. Distributed Data Stream Processing is a programming paradigm introduced to process these event streams and acquire relevant insights for informed decision-making. While deploying data stream processing frameworks on distributed cloud infrastructure has been the convention, for latency-critical real-time applications that rely on data streams produced outside the cloud on edge devices, the communication overhead between the cloud and the edge is detrimental. Privacy concerns about where data streams are processed also contribute to the move towards utilising edge devices for processing user-specific data. The emergence of Edge Computing has helped to mitigate these challenges by enabling processes to execute on edge devices, utilising their otherwise untapped potential. Distributed data stream processing that shares edge and cloud computing infrastructure is a nascent field, which we believe has many practical real-world applications such as federated learning, augmented/virtual reality, and healthcare. In this thesis, we investigate novel modelling techniques and solutions for sharing the workload of distributed data stream processing applications across edge and cloud computing infrastructure. The outcome of this study is a series of research works that emanate from a comprehensive model and a simulation framework developed using this model, which we utilise to develop workload sharing strategies that consider the intrinsic characteristics of data stream processing applications executed on edge and cloud resources.
First, we focus on developing a comprehensive model that represents the inherent characteristics of data stream processing applications: the event generation rate and the distribution of event sizes at the sources, the selectivity and productivity distributions at the operators, the placement of tasks onto resources, and metrics such as end-to-end latency, processing latency, networking latency and power consumption. We also incorporate the processing, networking, power consumption, and curating characteristics of edge and cloud computing infrastructure into the model from the perspective of data stream processing. Based on this model, we develop a simulation tool, which we call ECSNeT++, and verify its accuracy by comparing the latency and power consumption metrics acquired from the calibrated simulator against a real test-bed, both executing identical applications. We show that, with proper calibration, ECSNeT++ can model a real deployment. With ECSNeT++ publicly available as open-source software, and with the verified accuracy of our results, it can be used effectively to predict the behaviour and performance of stream processing applications running on large-scale, heterogeneous edge and cloud computing infrastructure.

Next, we investigate how to optimally share the application workload between edge and cloud computing resources while upholding quality-of-service requirements. A typical data stream processing application is formed as a directed acyclic graph of tasks, consisting of sources that generate events, operators that process incoming events, and sinks that act as destinations for event streams. In order to share the workload of such an application, these tasks need to be placed onto the available computing resources.
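The application structure described above, a directed acyclic graph of sources, operators, and sinks, can be sketched minimally as follows. The names and fields here are illustrative assumptions for exposition only; they are not the thesis' actual model or the ECSNeT++ API.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    kind: str                      # "source", "operator", or "sink"
    selectivity: float = 1.0       # expected output events per input event
    downstream: list = field(default_factory=list)

def output_rate(task: Task, input_rate: float) -> float:
    """Expected event emission rate of a task given its input event rate."""
    return input_rate * task.selectivity

# A three-task pipeline: source -> filtering operator -> sink.
sink = Task("sink", "sink", selectivity=0.0)
flt = Task("filter", "operator", selectivity=0.5, downstream=[sink])
src = Task("sensor", "source", downstream=[flt])

rate_at_filter = output_rate(src, 100.0)       # source emits 100 events/s
rate_at_sink = output_rate(flt, rate_at_filter)  # half pass the filter
```

Propagating rates through the graph in this way is what lets a model estimate per-operator load, and hence the latency and energy consequences of placing each task on an edge or cloud resource.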
To this end, we devise an optimisation framework, consisting of a constraint satisfaction formulation and a system model, that aims to minimise end-to-end latency through appropriate placement of tasks on either cloud or edge devices. We test our optimisation framework using ECSNeT++, with realistic topologies and calibration, and show that it achieves an 8-14% latency reduction and a 14-15% energy reduction compared to the conventional cloud-only placement, and a 14-16% latency reduction compared to a naive edge-only placement while also reducing the energy consumption per event by 1-5%. Finally, in order to cater to the multitude of applications that operate under dynamic conditions, we propose a semi-dynamic task switching methodology that optimises the end-to-end latency of the application. Here, we approach the task placement problem for changing environment conditions in two phases: in the first phase, locally optimal task placements are acquired for discrete environment conditions; these are then fed to the second phase, where the problem is modelled as an Infinite Horizon Markov Decision Process with discounted rewards. Solving this problem yields an optimal policy, and we show that this optimal policy improves the performance of distributed data stream processing applications compared with both a dynamic greedy task placement approach and static task placement. For real-world applications executed on ECSNeT++, our approach improves latency by as much as 10-17% on average compared to a fully dynamic greedy approach.
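The second phase can be illustrated with a toy sketch, under assumptions that are ours rather than the thesis': states are discrete environment conditions, actions are the candidate placements from phase one, and rewards are negated end-to-end latencies. Value iteration then solves the infinite-horizon discounted MDP for an optimal switching policy. All numbers below are illustrative.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a, s, s']: transition probabilities; R[a, s]: expected reward."""
    n_states = R.shape[1]
    V = np.zeros(n_states)
    while True:
        # Bellman backup: Q[a, s] = R[a, s] + gamma * E[V(next state)]
        Q = R + gamma * np.einsum('ast,t->as', P, V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values and policy
        V = V_new

# Two environment conditions, two placements (edge-heavy vs cloud-heavy);
# here the environment dynamics do not depend on the chosen placement.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[-5.0, -20.0],    # edge-heavy: low latency in condition 0
              [-12.0, -8.0]])   # cloud-heavy: low latency in condition 1
V, policy = value_iteration(P, R)
# policy prescribes the edge-heavy placement in condition 0 and the
# cloud-heavy placement in condition 1.
```

Because the policy is computed offline from the phase-one placements, the running application only needs to look up and switch to the prescribed placement when the environment condition changes, rather than re-solving the placement problem greedily at every change.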