Computing and Information Systems - Theses

  • Item
    Workflow Scheduling in Cloud and Edge Computing Environments with Deep Reinforcement Learning
    Jayanetti, Jayanetti Arachchige Amanda Manomi ( 2023-08)
Cloud computing has firmly established itself as a mandatory platform for delivering computing services over the internet in an efficient manner. More recently, novel computing paradigms such as edge computing have emerged to complement the traditional cloud computing paradigm. Owing to the multitude of benefits offered by cloud and edge computing environments, these platforms are increasingly used for the execution of workflows. The problem of scheduling workflows in a distributed system is NP-hard in the general case. Scheduling workflows across highly dynamic cloud and edge computing environments is even more complex due to the inherent challenges of these environments, including the need to satisfy diverse, often contradictory objectives, to coordinate executions across highly distributed infrastructures, and to cope with dynamic operating conditions. These requirements collectively give rise to the need for adaptive workflow scheduling algorithms that are capable of satisfying diverse optimization goals amid highly dynamic conditions. Deep Reinforcement Learning (DRL) has emerged as a promising paradigm for dealing with highly dynamic and complex problems due to the ability of DRL agents to learn to operate in stochastic environments. Despite the benefits of DRL, there are multiple challenges associated with the application of DRL techniques, including multi-objectivity, the curse of dimensionality, partial observability and multi-agent coordination. In this thesis, we propose novel DRL algorithms and architectures to efficiently overcome these challenges.
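The kind of learning-based scheduling this abstract describes can be illustrated with a minimal tabular Q-learning sketch; all task names, resources and latencies below are hypothetical, and the simple table stands in for the deep networks used in the thesis:

```python
import random

# Hypothetical setup: two workflow tasks, two resources
# (0 = edge node, 1 = cloud VM); reward is negative task latency.
LATENCY = {
    ("t1", 0): 1.0, ("t1", 1): 3.0,
    ("t2", 0): 4.0, ("t2", 1): 2.0,
}

def train(episodes=500, alpha=0.1, epsilon=0.2, seed=42):
    """Learn per-task resource preferences with epsilon-greedy Q-learning."""
    random.seed(seed)
    q = {key: 0.0 for key in LATENCY}
    for _ in range(episodes):
        for task in ("t1", "t2"):
            if random.random() < epsilon:      # explore
                action = random.choice([0, 1])
            else:                              # exploit current estimate
                action = max((0, 1), key=lambda a: q[(task, a)])
            reward = -LATENCY[(task, action)]
            # single-step (contextual-bandit style) update, no bootstrapping
            q[(task, action)] += alpha * (reward - q[(task, action)])
    return q

def policy(q, task):
    """Greedy resource choice for a task under the learned Q-values."""
    return max((0, 1), key=lambda a: q[(task, a)])
```

With the learned Q-values, the greedy policy assigns each task to its lower-latency resource; real workflow scheduling adds task dependencies, richer state and multiple, possibly conflicting objectives.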
  • Item
    Microservices-based Internet of Things Applications Placement in Fog Computing Environments
    Pallewatta, Pallewatta Kankanamge Samodha Kanchani ( 2023-02)
The Internet of Things (IoT) paradigm is rapidly improving various application domains such as healthcare, smart city, Industrial IoT (IIoT), and intelligent transportation by interweaving sensors, actuators and data analytics platforms to create smart environments. Initially, cloud-centric IoT was introduced as a viable solution for processing and storing massive amounts of data generated by IoT devices. However, with rapidly increasing data volumes, data transmission from geo-distributed IoT devices to the centralised Cloud incurs high network congestion and high latency. Thus, cloud-centric IoT often fails to satisfy the Quality of Service (QoS) requirements of latency-sensitive and bandwidth-hungry IoT application services. The Fog computing paradigm extends cloud-like services towards the edge of the network, thus offering low-latency service delivery. However, Fog nodes are distributed, heterogeneous and resource-constrained, creating the need to utilise both Fog and Cloud resources to execute IoT applications in a QoS-aware manner. Meanwhile, MicroService Architecture (MSA) has emerged as a powerful application architecture capable of satisfying the development and deployment needs of rapidly evolving IoT applications. The fine-grained modularity of microservices, their independently deployable and scalable nature, along with the lack of centralised management, demonstrate immense potential in harnessing the power of distributed Fog and Cloud resources to meet the QoS requirements of IoT applications. Furthermore, the loosely coupled nature of microservices enables the dynamic composition of distributed microservices to achieve diverse performance requirements of IoT applications while utilising distributed computing resources. To this end, efficient placement of microservices plays a vital role, and scalable placement techniques can use MSA characteristics to harvest the full potential of the Fog computing paradigm.
This thesis investigates novel placement techniques and systems for microservices-based IoT applications in Fog computing environments. The proposed approaches identify MSA characteristics that help overcome challenges within Fog computing environments and make use of them to fulfil heterogeneous QoS requirements of IoT application services in terms of service latency, budget, throughput and reliability while utilising Fog and Cloud resources in a balanced manner. This thesis advances the state-of-the-art in Fog computing by making the following key contributions: 1. A comprehensive taxonomy and literature review on the placement of microservices-based IoT applications considering different aspects, namely modelling microservices-based applications, creating application placement policies, microservice composition, and performance evaluation, in Fog computing environments. 2. A distributed placement technique for scalable deployment of microservices to minimise the latency of the application services and network usage due to IoT data transmission. 3. A robust placement technique for batch placement of microservices-based IoT applications, where the technique considers the placement of a set of applications simultaneously to optimise the QoS satisfaction of application services in terms of makespan, budget and throughput while dynamically utilising Fog and Cloud resources. 4. A reliability-aware placement technique for proactive redundant placement of microservices to improve reliability satisfaction in a throughput- and cost-aware manner. 5. A software framework for microservices-based IoT application placement and dynamic composition across federated Fog and Cloud computing environments.
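As a rough illustration of latency-aware microservice placement over heterogeneous Fog and Cloud nodes (node names, capacities and latencies are illustrative assumptions, not the thesis's algorithms):

```python
# Hypothetical nodes: two constrained Fog nodes and an abundant Cloud,
# ordered for placement by their latency to the IoT data sources.
NODES = [
    {"name": "fog-1", "cpu": 2,   "latency_ms": 5},
    {"name": "fog-2", "cpu": 1,   "latency_ms": 8},
    {"name": "cloud", "cpu": 100, "latency_ms": 60},
]

def place(microservices):
    """Greedily assign each microservice to the lowest-latency node
    that still has spare CPU, falling back to the Cloud."""
    free = {n["name"]: n["cpu"] for n in NODES}
    ordered = sorted(NODES, key=lambda n: n["latency_ms"])
    placement = {}
    for ms in microservices:
        for node in ordered:
            if free[node["name"]] >= ms["cpu"]:
                free[node["name"]] -= ms["cpu"]
                placement[ms["name"]] = node["name"]
                break
    return placement
```

Larger services spill over to the Cloud while small latency-sensitive ones stay at the edge; the thesis's techniques additionally handle inter-service communication, batching, reliability and throughput constraints.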
  • Item
    Energy and Time Aware Scheduling of Applications in Edge and Fog Computing Environments
    Goudarzi, Mohammad ( 2022)
The Internet of Things (IoT) paradigm is playing a principal role in the advancement of many application scenarios such as healthcare, smart city, transportation, entertainment, and agriculture, which significantly affect the daily life of humans. The smooth execution of these applications requires sufficient computing and storage resources to support the massive amount of data generated by IoT devices. However, IoT devices are intrinsically resource-limited and are not capable of efficient processing and storage of large volumes of data. Hence, IoT devices require surrogate resources for the smooth execution of their heterogeneous applications, which can be either computation-intensive or latency-sensitive. Cloud datacenters are among the potential resource providers for IoT devices. However, as they reside at a multi-hop distance from IoT devices, they cannot efficiently execute IoT applications, especially latency-sensitive ones. The Fog computing paradigm, which extends Cloud services to the edge of the network within the proximity of IoT devices, offers low-latency execution of IoT applications. Hence, it can improve the response time and service startup time of IoT applications, and reduce network congestion. Also, it can reduce the energy consumption of IoT devices by minimizing their active time. However, Fog servers are resource-limited compared to Cloud servers, preventing them from executing all types of IoT applications, especially extremely computation-intensive applications. Hence, Cloud servers are used to support Fog servers to create a robust computing environment with heterogeneous types of resources. Consequently, the Fog computing paradigm is highly dynamic, distributed, and heterogeneous. Thus, without efficient scheduling techniques for the management of IoT applications, it is difficult to harness the full potential of this computing paradigm for different IoT-driven application scenarios.
This thesis focuses on different scheduling techniques for the management of IoT applications in Fog computing environments while considering: a) IoT devices' characteristics, b) the structure of IoT applications, c) the context of resource providers, d) the networking characteristics of the Fog servers, e) the execution cost of running IoT applications, and f) the dynamics of the computing environment. This thesis advances the state-of-the-art by making the following contributions: 1. A comprehensive taxonomy and literature review on the scheduling of IoT applications from different perspectives, namely application structure, environmental architecture, optimization properties, decision engine characteristics, and performance evaluation, in Fog computing environments. 2. A distributed Fog-driven scheduling technique for network resource allocation in dense and ultra-dense Fog computing environments to optimize throughput and satisfy users' heterogeneous demands. 3. A distributed scheduling technique for the batch placement of concurrent IoT applications to optimize the execution time of IoT applications and the energy consumption of IoT devices. 4. A distributed application placement and migration management technique to optimize the execution time of IoT applications, the energy consumption of IoT devices, and the migration downtime in hierarchical Fog computing environments. 5. A Distributed Deep Reinforcement Learning (DDRL) technique for scheduling IoT applications in highly dynamic Fog computing environments to optimize the execution time of IoT applications and the energy consumption of IoT devices. 6. System software for scheduling IoT applications in multi-Cloud Fog computing environments. 7. A detailed study outlining challenges and new research directions for the scheduling of IoT applications in Fog computing environments.
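A minimal sketch of the time/energy trade-off that drives such scheduling decisions, assuming illustrative per-server execution time and IoT-device energy estimates (the server names and numbers below are invented):

```python
# Hypothetical execution-time (s) and IoT-device energy (J) estimates
# for running one application module on each candidate resource.
SERVERS = {
    "iot-local": {"time_s": 10.0, "energy_j": 8.0},
    "fog":       {"time_s": 4.0,  "energy_j": 3.0},
    "cloud":     {"time_s": 6.0,  "energy_j": 2.0},
}

def weighted_cost(time_s, energy_j, w=0.5):
    """Weighted sum of execution time and device energy; w trades them off."""
    return w * time_s + (1 - w) * energy_j

def best_server(w=0.5):
    """Pick the server minimising the weighted time/energy cost."""
    return min(SERVERS, key=lambda s: weighted_cost(**SERVERS[s], w=w))
```

Shifting the weight toward energy favours offloading further away (e.g. to the Cloud), while a balanced weight keeps latency-sensitive work at the Fog; the thesis's techniques make such decisions per application module under dynamic conditions.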
  • Item
    Intelligent Scaling of Container-based Web Applications in Geographically Distributed Clouds
    Aldwyan, Yasser ( 2021)
Cloud data centers are increasingly distributed around the globe. Recently, containerisation, a lightweight virtualization technology, has been rapidly adopted as an application packaging mechanism for efficient, consistent web application deployment and scaling within and across Cloud-based data centers. To leverage Cloud elasticity and scalability, containers commonly run on elastic, scalable clusters of virtual machines (VMs). Such global infrastructure and lightweight deployment capabilities offer an ideal choice for deploying latency-sensitive web applications in multiple locations to serve globally distributed users. However, Cloud providers currently lack intelligent deployment and elasticity capabilities for managing container-based web applications, including their containers and VMs, across widely dispersed data centers. This thesis investigates several problems related to the lack of such capabilities. These include deployment problems such as where and how to deploy VM clusters as well as geo-replicated application containers across data centers to address potential outages while considering wide-area network latency issues. It also considers how to dynamically deploy clusters across data centers to handle potential spatial workload fluctuations at minimum cost. This in turn gives rise to elasticity problems for multi-cluster container-based web applications deployed to multiple data centers. These problems include how to rapidly scale overloaded clusters at the VM level through temporary inter-cluster resource utilisation to avoid Cloud VM provisioning delays. Ideally this should provide sufficient VM resources for the timely launching of new containers in response to sudden workload spikes and avoid costly resource over-provisioning.
A further challenge is how to control elastic scaling for both containers and VMs while considering application-level metrics and potential variations in container processing capacity, due to performance interference in shared Cloud data centers. Key to this is the need to optimise performance, availability and costs in a flexible and intelligent manner. This thesis aims to enhance the state-of-the-art in the deployment and elasticity of container-based web applications in geographically distributed Cloud environments, by tackling the above-mentioned problems using meta-heuristics and queuing theory. The thesis makes the following key contributions: 1. it provides an approach for latency-aware failover deployment of container-based web applications across multiple Cloud-based data centers to maintain performance with associated SLOs under normal conditions and in the presence of failures; 2. it provides an approach for dynamic elastic deployment of container-based clusters, both in terms of the quantity and placement across data centers, whilst offering trade-offs between cost and performance in the context of geographic web workload changes; 3. it offers a cost-efficient, rapid auto-elastic scaling approach for bursty multi-cluster container-based web applications deployed across data centers that scales containers in overloaded situations in a timely and cost-efficient fashion; 4. it presents a two-level elasticity controller algorithm that seamlessly auto-scales at both the container and VM levels based on application-level metrics and queuing-based performance models through estimating the container capacity needed without violating SLOs; 5. it supports dynamic, inter-data center latency-aware container scheduling policies for cross-data center clusters that are able to optimise the overall performance, and 6.
it presents extensive experiments using case studies based on the container technologies Docker, Docker-Swarm and Kubernetes on the Australia-wide distributed Cloud computing environment (NeCTAR) and international (commercial) cloud data centres.
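The queuing-flavoured capacity estimation behind contribution 4 can be sketched as follows, assuming a fixed mean service time, a target per-container utilisation and a fixed container-per-VM packing (all numbers illustrative):

```python
import math

def containers_needed(arrival_rate, service_time_s, target_util=0.7):
    """Containers required so per-container utilisation stays below target.

    offered_load (in Erlangs) = arrival rate x mean service time; dividing
    by the target utilisation leaves headroom for bursts.
    """
    offered_load = arrival_rate * service_time_s
    return max(1, math.ceil(offered_load / target_util))

def vms_needed(containers, containers_per_vm=4):
    """VMs required to host the containers, assuming fixed packing."""
    return math.ceil(containers / containers_per_vm)
```

For example, 100 requests/s at a 50 ms mean service time yields 5 Erlangs of offered load, hence 8 containers at 70% target utilisation and 2 VMs at 4 containers per VM; the thesis's controller refines such estimates with application-level metrics and measured capacity variation.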
  • Item
    Distributed data stream processing and task placement on edge-cloud infrastructure
    Amarasinghe, Gayashan Niroshana ( 2021)
Indubitable growth of smart and connected edge devices with substantial processing power has made ubiquitous computing possible. These edge devices either produce streams of information related to the environment in which they are deployed, or are located in proximity to such information producers. Distributed Data Stream Processing is a programming paradigm introduced to process these event streams to acquire relevant insights in order to make informed decisions. While deploying data stream processing frameworks on distributed cloud infrastructure has been the convention, for latency-critical real-time applications that rely on data streams produced outside the cloud on edge devices, the communication overhead between the cloud and the edge is detrimental. Privacy concerns surrounding where the data streams are processed are also contributing to the move towards utilisation of edge devices for processing user-specific data. The emergence of Edge Computing has helped to mitigate these challenges by enabling processes to execute on edge devices and utilise their unused potential. Distributed data stream processing that shares edge and cloud computing infrastructure is a nascent field which we believe to have many practical real-world applications such as federated learning, augmented/virtual reality and healthcare applications. In this thesis, we investigate novel modelling techniques and solutions for sharing the workload of distributed data stream processing applications that utilise edge and cloud computing infrastructure. The outcome of this study is a series of research works that emanate from a comprehensive model and a simulation framework developed using this model, which we utilise to develop workload sharing strategies that consider the intrinsic characteristics of data stream processing applications executed on edge and cloud resources.
First, we focus on developing a comprehensive model for representing the inherent characteristics of data stream processing applications, such as the event generation rate and the distribution of event sizes at the sources, the selectivity and productivity distribution at the operators, the placement of tasks onto the resources, and recording metrics such as end-to-end latency, processing latency, networking latency and power consumption. We also incorporate the processing, networking, power consumption, and curating characteristics of edge and cloud computing infrastructure into the model from the perspective of data stream processing. Based on our model, we develop a simulation tool, which we call ECSNeT++, and verify its accuracy by comparing the latency and power consumption metrics acquired from the calibrated simulator and a real test-bed, both of which execute identical applications. We show that ECSNeT++ can model a real deployment, with proper calibration. With the public availability of ECSNeT++ as open source software, and the verified accuracy of our results, ECSNeT++ can be used effectively for predicting the behaviour and performance of stream processing applications running on large scale, heterogeneous edge and cloud computing infrastructure. Next, we investigate how to optimally share the application workload between edge and cloud computing resources while upholding quality of service requirements. A typical data stream processing application is formed as a directed acyclic graph of tasks that consists of sources that generate events, operators that process incoming events and sinks that act as destinations for event streams. In order to share the workload of such an application, these tasks need to be placed onto the available computing resources.
To this end, we devise an optimisation framework, consisting of a constraint satisfaction formulation and a system model, that aims to minimise end-to-end latency through appropriate placement of tasks either on cloud or edge devices. We test our optimisation framework using ECSNeT++, with realistic topologies and calibration, and show that our framework achieves an 8-14% latency reduction and a 14-15% energy reduction compared to a conventional cloud-only placement, and a 14-16% latency reduction compared to a naive edge-only placement while also reducing the energy consumption per event by 1-5%. Finally, in order to cater to the multitude of applications that operate under dynamic conditions, we propose a semi-dynamic task switching methodology that can be applied to optimise the end-to-end latency of the application. Here, we approach the task placement problem for changing environment conditions in two phases: in the first phase, locally optimal task placements are acquired for discrete environment conditions; these are then fed to the second phase, where the problem is modelled as an Infinite Horizon Markov Decision Process with discounted rewards. By solving this problem, an optimal policy can be obtained, and we show that this optimal policy can improve the performance of distributed data stream processing applications when compared with a dynamic greedy task placement approach as well as static task placement. For real-world applications executed on ECSNeT++, our approach can improve latency by as much as 10-17% on average when compared to a fully dynamic greedy approach.
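The second-phase formulation described above (task placement as an infinite-horizon discounted MDP) can be sketched with a toy two-state model; the states, rewards and transition probabilities below are invented for illustration, and transitions are assumed independent of the chosen placement for simplicity:

```python
# Two environment conditions (states) and two candidate placements
# (actions). Transitions model workload dynamics only; all numbers
# are illustrative, not from the thesis.
STATES = ("low_load", "high_load")
ACTIONS = ("edge", "cloud")
REWARD = {("low_load", "edge"): 10, ("low_load", "cloud"): 6,
          ("high_load", "edge"): 2, ("high_load", "cloud"): 8}
P = {"low_load": {"low_load": 0.8, "high_load": 0.2},
     "high_load": {"low_load": 0.4, "high_load": 0.6}}

def value_iteration(gamma=0.9, eps=1e-6):
    """Solve the discounted MDP by value iteration."""
    v = {s: 0.0 for s in STATES}
    while True:
        nv = {s: max(REWARD[(s, a)]
                     + gamma * sum(P[s][s2] * v[s2] for s2 in STATES)
                     for a in ACTIONS)
              for s in STATES}
        if max(abs(nv[s] - v[s]) for s in STATES) < eps:
            return nv
        v = nv

def optimal_policy(v, gamma=0.9):
    """Greedy placement per state under the converged value function."""
    return {s: max(ACTIONS,
                   key=lambda a: REWARD[(s, a)]
                   + gamma * sum(P[s][s2] * v[s2] for s2 in STATES))
            for s in STATES}
```

Solving this toy model yields the expected policy (edge placement under low load, cloud under high load); the thesis feeds locally optimal placements per discrete condition into a richer MDP of this shape.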
  • Item
    Cost-efficient Management of Cloud Resources for Big Data Applications
    Islam, Muhammed Tawfiqul ( 2020)
Analyzing a vast amount of business and user data on big data analytics frameworks is becoming a common practice in organizations to gain a competitive advantage. These frameworks are usually deployed in a computing cluster to meet the analytics demands in every major domain, including business, government, financial markets, and health care. However, buying and maintaining a massive amount of on-premise resources is costly and difficult, especially for start-ups and small business organizations. Cloud computing provides infrastructure, platform, and software systems for storing and processing data. Thus, Cloud resources can be utilized to set up a cluster with a required big data processing framework. However, several challenges need to be addressed for Cloud-based big data processing, including: deciding how much Cloud resource is needed for each application, how to maximize the utilization of these resources to improve applications' performance, and how to reduce the monetary cost of resource usage. In this thesis, we focus on a user-centric view, where a user can be either an individual or a small/medium business organization that wants to deploy a big data processing framework on the Cloud. We explore how resource management techniques can be tailored to various user demands, such as performance improvement and deadline guarantees for the applications, all while reducing the monetary cost of using the cluster. In particular, we propose efficient resource allocation and scheduling mechanisms for Cloud-deployed Apache Spark clusters.
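A toy sketch of cost-efficient cluster sizing in this spirit, with hypothetical VM types and prices (real Spark cluster provisioning would also weigh memory, data locality and workload profiles):

```python
import math

# Hypothetical on-demand VM offerings (cores, hourly price).
VM_TYPES = {
    "small": {"cores": 2, "price": 0.10},
    "large": {"cores": 8, "price": 0.35},
}

def cheapest_cluster(cores_needed):
    """Cheapest homogeneous cluster that meets a total core demand."""
    best = None
    for name, spec in VM_TYPES.items():
        count = math.ceil(cores_needed / spec["cores"])
        cost = count * spec["price"]
        if best is None or cost < best[2]:
            best = (name, count, cost)
    return best
```

Note the non-obvious outcome this captures: for larger demands the bigger VM type can be cheaper per core, while small demands are served most cheaply by a single small VM.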
  • Item
    Profit optimization of resource management for big data analytics-as-a-service platforms in cloud computing environments
    Zhao, Yali ( 2020)
Discovering optimal resource management solutions to support data analytics to extract value from big data is an increasingly important research area. It is fair to say that the success of many organizations, companies, and individuals now relies heavily on data analytics solutions. Cloud computing greatly supports big data analytics by providing scalable resources based on user demand and supporting elastic resource provisioning in a pay-as-you-go model. Big data Analytics as a Service (AaaS) platforms provision AaaS to various domains as consumable services in an easy-to-use manner across cloud computing environments. AaaS platforms aim to deliver efficient data analytics solutions to benefit decision-making and problem solving in a wide range of application domains such as engineering, science, and government. However, big data analytics solutions face a range of challenges: the dynamic nature of query requests; the heterogeneity of cloud resources; differing Quality of Service (QoS) requirements; potentially lengthy data processing times and the associated expensive resource costs; and big data processing demands under potentially limited/constrained budgets, deadlines and/or data accuracies. These challenges need to be tackled by efficient resource management solutions that support AaaS platforms in delivering reliable, cost-effective and fast AaaS. Optimal resource management solutions are essential for AaaS platforms to maximize profits and minimize query times while guaranteeing Service Level Agreements (SLAs) during AaaS delivery. To tackle the above challenges, this thesis systematically studies profit optimization solutions to support AaaS platforms. Key contributions are made through a range of resource management solutions.
These include admission control and resource scheduling algorithms that address various problem scenarios in which data must be processed under heterogeneous, constrained or limited budgets, deadlines, or accuracies, with the support of data splitting and/or data sampling-based methods that reduce data processing times and costs at a potential accuracy trade-off. These algorithms allow AaaS platforms to optimize profits and minimize query times through optimal resource management, thereby increasing market share by maximizing query admissions and improving reputation by delivering SLA-supported AaaS solutions.
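An admission-control decision of the kind described, with data sampling as the accuracy trade-off, might be sketched as follows (the field names, price and fixed sample fractions are illustrative assumptions, not the thesis's algorithms):

```python
def admit(query, price_per_core_hour=0.5):
    """Admit a query if it fits its deadline and budget, trying smaller
    data samples (lower cost and time, lower accuracy) before rejecting."""
    for fraction in (1.0, 0.5, 0.25):
        # work shrinks proportionally with the sampled data fraction
        est_hours = query["full_core_hours"] * fraction / query["cores"]
        est_cost = query["full_core_hours"] * fraction * price_per_core_hour
        if est_hours <= query["deadline_h"] and est_cost <= query["budget"]:
            return {"admitted": True, "sample_fraction": fraction}
    return {"admitted": False, "sample_fraction": None}
```

A query that misses its deadline on the full data set may still be admitted on a sample, which is the profit/accuracy trade-off the admission-control algorithms formalise.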
  • Item
    Anomaly-aware Management of Cloud Computing Resources
    Kardani Moghaddam, Sara ( 2019)
Cloud computing supports on-demand provisioning of resources in a virtualized, shared environment. Although the virtualization and elasticity characteristics of cloud resources make this paradigm feasible, without efficient management of resources the cloud system's performance can degrade substantially. Efficient management of resources is required due to the inherent dynamics of the cloud environment, such as workload changes, and hardware and software issues such as hardware failures and software bugs. In order to meet the performance expectations of users, a comprehensive understanding of the performance dynamics and proper management actions are required. With the advent of data analysis techniques, this goal can be achieved by analyzing large volumes of monitored data to discover abnormalities in the performance data. This thesis focuses on anomaly-aware resource scaling mechanisms, which combine anomaly detection techniques with resource scaling in the cloud to improve the performance of the system in terms of quality of service and utilization of resources. It demonstrates how anomaly detection techniques can help to identify abnormalities in the behaviour of the system and trigger relevant resource reconfiguration actions to reduce performance degradation in the application. The thesis advances the state-of-the-art in this field by making the following contributions: 1. A taxonomy and comprehensive survey on performance analysis frameworks in the context of cloud resource management. 2. An Isolation-based anomaly detection module to identify performance anomalies in web-based applications considering cloud dynamics. 3. An Isolation-based iterative feature refinement to remove unrelated and noisy features and reduce the complexity of the anomaly detection process in high-dimensional data. 4. A joint anomaly-aware resource scaling mechanism for cloud-hosted applications.
The approach tries to identify both the anomaly event and the root cause of the problem and triggers proper vertical and horizontal scaling actions to avoid or reduce performance degradation. 5. An adaptive Deep Reinforcement Learning (DRL) based scaling framework which leverages the knowledge of the anomaly detection module to decide on proper decision-making epochs. The scaling actions are encoded in the DRL action space and action-value estimates are obtained by training multi-layer neural networks.
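The isolation-based detection idea in contributions 2 and 3 can be sketched with a hand-rolled, simplified variant of random isolation splits; this illustrates the principle only and is not the thesis's module:

```python
import random

def isolation_depth(point, data, depth=0, max_depth=10, rng=None):
    """Depth at which random axis-aligned splits isolate `point`.

    The core intuition of isolation-based detection: anomalous points
    are separated from the rest of the data in very few splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # keep only the points on the same side of the split as `point`
    same_side = [p for p in data if (p[dim] < split) == (point[dim] < split)]
    return isolation_depth(point, same_side, depth + 1, max_depth, rng)

def anomaly_score(point, data, trees=200, seed=7):
    """Higher score (shorter average isolation path) = more anomalous."""
    rng = random.Random(seed)
    avg = sum(isolation_depth(point, data, rng=rng) for _ in range(trees)) / trees
    return -avg
```

An outlying performance sample is isolated after one or two splits on average, while points inside a dense cluster need many; the thesis couples such scores with root-cause analysis and scaling actions.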
  • Item
    A big data infrastructure for real-time traffic analytics on the cloud
    Gong, Yikai ( 2019)
With the increasing urbanisation occurring globally, cities are facing unprecedented challenges. One major challenge is related to traffic and the increasingly common congestion issues that arise in cities. At the same time, digital data is being created across all walks of life by industry, governments and society more generally. The term "big data" has now entered the common vernacular. Big data can include officially captured data, e.g. from traffic measurement systems of government organisations such as VicRoads in Australia, as well as other forms of data generated by the population at large, e.g. social media. This thesis explores the unique characteristics of traffic-related data and focuses on the development and evaluation of an underpinning Cloud-based platform that can tackle some of the unique big data challenges related to such data. In particular, the thesis focuses on challenges related to the volume, velocity and variety of traffic data. We explore how different forms of data, including official sensor data such as the Sydney Coordinated Adaptive Traffic System (SCATS) that is widely rolled out across Victoria and supported by VicRoads, can be processed in real time, as well as how social media data such as Twitter can be used as a cheaper proxy for SCATS to better understand traffic in cities. We also develop novel real-time clustering algorithms that tackle the unique spatial and temporal aspects of traffic-related data.
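A minimal sketch of density-based hotspot detection over spatial traffic observations, assuming a simple fixed-size grid (the thesis's clustering algorithms also handle the temporal dimension and streaming updates; coordinates and thresholds below are illustrative):

```python
from collections import Counter

def hotspot_cells(points, cell_deg=0.01, min_points=3):
    """Bucket (lat, lon) readings into grid cells (~1 km at this cell
    size) and flag cells dense enough to suggest a congestion hotspot."""
    counts = Counter(
        (round(lat / cell_deg), round(lon / cell_deg)) for lat, lon in points
    )
    return {cell for cell, n in counts.items() if n >= min_points}
```

Fed with a handful of geotagged observations, only the cell containing a dense cluster of readings is flagged, while isolated readings are ignored as noise.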