Computing and Information Systems - Theses

    Scheduling and management of data intensive application workflows in grid and cloud computing environments
    PANDEY, SURAJ (2010)
    Large-scale scientific experiments are being conducted in collaboration with teams that are dispersed around the globe. Each team shares its data and uses distributed resources to conduct experiments. As a result, scientific data are replicated and cached at distributed locations around the world. These data are part of application workflows, which are designed to reduce the complexity of executing and managing applications on distributed computing environments. To execute these workflows in a time- and cost-efficient manner, a workflow management system must take into account the presence of multiple data sources in addition to the distributed compute resources provided by platforms such as Grids and Clouds. This thesis therefore builds upon an existing workflow architecture and proposes enhanced scheduling algorithms specifically designed for managing data-intensive applications. It begins with a comprehensive survey of the scheduling techniques that formed the core of Grid systems in the past. It then proposes an architecture that incorporates data management components and examines its practical feasibility by executing several real-world applications, such as Functional Magnetic Resonance Imaging (fMRI) and Evolutionary Multi-objective Optimization algorithms, using distributed Grid and Cloud resources. Next, it proposes several heuristic-based algorithms that take into account the time and cost incurred in transferring data from multiple sources while scheduling tasks. All of the proposed heuristics are based on a multi-source parallel data retrieval technique, in contrast to retrieving data from a single best resource, as was done in the past. In addition to a non-linear modeling approach, the thesis explores iterative techniques, such as particle swarm optimization, to obtain schedules more quickly. In summary, this thesis makes several contributions towards the scheduling and management of data-intensive application workflows.
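    The abstract contrasts multi-source parallel data retrieval with fetching a file from a single best replica, but does not spell out the partitioning policy. The sketch below illustrates the general idea under loudly stated assumptions: each replica's bandwidth to the compute node is known and constant, latency and contention are ignored, and the file is split in proportion to bandwidth so that every segment finishes at the same moment. The names `Replica`, `split_transfer`, `parallel_time`, and `single_best_time` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    bandwidth_mb_s: float  # assumed constant bandwidth (MB/s) from replica to compute node


def split_transfer(size_mb: float, replicas: list[Replica]) -> dict[str, float]:
    """Assign each replica a segment proportional to its bandwidth (hypothetical policy)."""
    total_bw = sum(r.bandwidth_mb_s for r in replicas)
    if total_bw <= 0:
        raise ValueError("at least one replica must have positive bandwidth")
    return {r.name: size_mb * r.bandwidth_mb_s / total_bw for r in replicas}


def parallel_time(size_mb: float, replicas: list[Replica]) -> float:
    # With proportional splitting, all segments finish together at size / sum(bandwidth).
    return size_mb / sum(r.bandwidth_mb_s for r in replicas)


def single_best_time(size_mb: float, replicas: list[Replica]) -> float:
    # Baseline: download the whole file from the fastest replica only.
    return size_mb / max(r.bandwidth_mb_s for r in replicas)


if __name__ == "__main__":
    reps = [Replica("A", 10.0), Replica("B", 20.0), Replica("C", 30.0)]
    print(split_transfer(1200.0, reps))    # {'A': 200.0, 'B': 400.0, 'C': 600.0}
    print(parallel_time(1200.0, reps))     # 20.0
    print(single_best_time(1200.0, reps))  # 40.0
```

    In this toy example, pulling segments from three replicas in parallel halves the transfer time relative to the single fastest source. Real schedulers would also have to account for latency, changing bandwidth, and per-transfer cost, which this sketch deliberately omits.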
    The major contributions are: (i) enhancing the abstract workflow architecture with components that handle multi-source parallel data transfers; (ii) deploying several real-world application workflows using the proposed architecture and testing the feasibility of the design on real testbeds; (iii) proposing a non-linear model for scheduling workflows with the objective of minimizing both execution time and execution cost; (iv) proposing static and dynamic workflow scheduling heuristics that leverage the presence of multiple data sources to minimize total execution time; (v) designing and implementing a particle-swarm-optimization-based heuristic that provides feasible solutions to the workflow scheduling problem with good convergence; and (vi) implementing a prototype workflow management system consisting of a portal as the user interface, a workflow engine that implements all the proposed scheduling heuristics and the real-world application workflows, and plug-ins to communicate with Grid and Cloud resources.
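    Contribution (v) mentions a particle swarm optimization (PSO) heuristic for mapping workflow tasks to resources. The abstract does not describe its encoding or fitness function, so the following is only a generic sketch, assuming a continuous position per task that is truncated to a resource index and a fitness equal to execution cost plus data-transfer cost along workflow edges. The toy instance (`EXEC_COST`, `EDGES`, `TRANSFER_COST`) and all function names are hypothetical, and the inertia and acceleration parameters are common textbook choices rather than values from the thesis.

```python
import random
from itertools import product

# Hypothetical toy instance: 3 tasks, 2 resources.
EXEC_COST = [[4.0, 2.0], [3.0, 5.0], [2.0, 2.5]]  # EXEC_COST[task][resource]
EDGES = [(0, 1, 10.0), (0, 2, 5.0)]               # (parent, child, data in MB)
TRANSFER_COST = [[0.0, 0.1], [0.1, 0.0]]          # cost per MB between resources


def total_cost(assign):
    """Execution cost plus transfer cost for every parent->child edge."""
    cost = sum(EXEC_COST[t][r] for t, r in enumerate(assign))
    for parent, child, mb in EDGES:
        cost += mb * TRANSFER_COST[assign[parent]][assign[child]]
    return cost


def decode(position, n_res):
    # Truncate each continuous coordinate to a resource index in [0, n_res - 1].
    return [min(int(x), n_res - 1) for x in position]


def pso_schedule(n_tasks, n_res, particles=20, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(0, n_res) for _ in range(n_tasks)] for _ in range(particles)]
    vel = [[0.0] * n_tasks for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [total_cost(decode(p, n_res)) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + personal pull + global pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the coordinate stays inside the decodable range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), n_res - 1e-9)
            cost = total_cost(decode(pos[i], n_res))
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return decode(gbest, n_res), gbest_cost


def brute_force(n_tasks, n_res):
    # Exhaustive search, feasible only for tiny instances; used for comparison.
    return min(total_cost(list(a)) for a in product(range(n_res), repeat=n_tasks))


if __name__ == "__main__":
    assignment, cost = pso_schedule(len(EXEC_COST), 2)
    print(assignment, cost, brute_force(len(EXEC_COST), 2))
```

    On an instance this small exhaustive search is trivial; a population-based heuristic such as PSO only becomes worthwhile when the number of task-to-resource mappings grows too large to enumerate.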