Brokering algorithms for data replication and migration across cloud-based data stores
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2017 Dr. Yaser Mansouri
Cloud computing provides users with highly reliable, scalable, and flexible computing and storage resources in a pay-as-you-go manner. Data storage services are gaining popularity, and many organizations are considering moving data out of their in-house data centers to so-called Cloud Storage Providers (CSPs). However, reliance on a single CSP introduces challenges in terms of service unavailability, vendor lock-in, high network latency to end users, and unaffordable monetary cost to application providers. These factors are vital for data-intensive applications that experience time-varying workloads, and the providers of these applications need to offer users storage services at an affordable monetary cost within the required Quality of Service (QoS). Using multiple CSPs is a promising solution: dispersing data across CSPs that offer several storage classes with different prices and performance metrics increases availability, improves mobility, lowers network latency, and reduces monetary cost. Selecting among these storage classes is a non-trivial problem. This thesis presents a set of algorithms that address this problem and help application providers select appropriate storage services, so that the data management cost of data-intensive applications is minimized while the QoS specified by users is met. The thesis advances this field by making the following key contributions: (1) Data placement algorithms that select storage services for replicating non-striped and striped objects, respectively, to minimize storage cost for a given availability and to maximize availability for a given budget. (2) A dual cloud-based storage architecture for data placement, which optimizes data management cost (i.e., storage, read, write, and potential migration costs) and treats user-perceived latency for reading and writing data as a monetary cost.
(3) An optimal offline algorithm and two online algorithms with provable performance guarantees for data placement, which exploit pricing differences across storage classes owned by different CSPs to optimize data management cost for a given number of replicas of an object while respecting user-perceived latency. (4) A lightweight object placement algorithm that uses geo-distributed storage classes to optimize data management cost for a dynamically determined number of replicas of an object. (5) The design and implementation of a prototype system for empirical latency evaluation in the context of a data placement framework across the services of two cloud providers (Amazon S3 and Microsoft Azure).
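As a rough illustration of the trade-off in contribution (1), the sketch below picks the cheapest set of data stores whose combined availability meets a target. It is not one of the thesis's algorithms: it uses plain exhaustive search, assumes store failures are independent, and all store names, availabilities, and prices are hypothetical.

```python
from itertools import combinations
from math import prod

# Hypothetical data stores: name -> (availability, storage price in $/GB-month).
# These figures are made up for illustration and do not come from the thesis.
STORES = {
    "store_a": (0.99, 0.023),
    "store_b": (0.999, 0.030),
    "store_c": (0.95, 0.010),
    "store_d": (0.98, 0.0125),
}


def replica_availability(names, stores=STORES):
    """Probability that at least one replica is reachable.

    Assumes stores fail independently, which is a simplification.
    """
    return 1.0 - prod(1.0 - stores[n][0] for n in names)


def cheapest_placement(target, size_gb, stores=STORES):
    """Exhaustively search all subsets of stores.

    Returns (monthly_cost, store_names) for the cheapest subset whose
    availability is at least `target`, or None if no subset qualifies.
    """
    best = None
    for k in range(1, len(stores) + 1):
        for names in combinations(sorted(stores), k):
            if replica_availability(names, stores) < target:
                continue
            cost = size_gb * sum(stores[n][1] for n in names)
            if best is None or cost < best[0]:
                best = (cost, names)
    return best


if __name__ == "__main__":
    # Place a 100 GB object with a target availability of 99.99%.
    print(cheapest_placement(target=0.9999, size_gb=100))
```

Exhaustive search is exponential in the number of stores, so it only suits small examples like this one. The dual problem from the same contribution (maximize availability under a budget) could be sketched the same way by swapping the constraint and the objective.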
Keywords: cloud-based data stores; data replication; data migration; read cost; write cost; storage cost; monetary cost optimization