Computing and Information Systems - Theses

Search Results

Now showing 1 - 10 of 186
  • Item
    On the predictability and efficiency of cultural markets with social influence and position biases
    Abeliuk Kimelman, Andrés ( 2016)
Every day people make a staggering number of decisions about what to buy, what to read and where to eat. The interplay between individual choices and collective opinion is responsible for much of the observed complexity of social behaviors. The impact of social influence on the behavior of individuals may distort the quality perceived by customers, causing quality and popularity to fall out of sync. Understanding how people respond to this information will enable us to predict social behavior and even steer it towards desired goals. In this thesis, we take a step forward by studying how, and to what extent, one can optimize cultural markets to reduce their unpredictability and improve their efficiency. Our results contrast with earlier work, which focused on showing the unpredictability and inequalities created by social influence. We show, experimentally and theoretically, that social influence can help correctly identify high-quality products and that much of its induced unpredictability can be controlled. We study a dynamic process in which choices are affected by social influence and by the position in which products are displayed. This model is used to explore the evolution of cultural markets under different policies on how items are displayed. We show that in the presence of social signals, by leveraging position effects, one can increase the expected profit and reduce the unpredictability of cultural markets. In particular, we propose two policies for displaying products and prove that the limiting distribution of market shares converges to a monopoly for the product of highest quality, making the market both optimal and predictable asymptotically. Finally, we put our theoretical results to an experimental test and present a policy that mitigates the disparities between popularity and quality that emerge from social and position biases. We report results from a randomized social experiment that we conducted online. The experiment consisted of a web interface displaying science news articles that participants could read and later recommend. We evaluated different policies for presenting items to people and measured their impact on the unpredictability of the market. Our results provide a unique insight into the impact of product display policies on the dynamics of cultural markets.
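A minimal sketch of the kind of dynamic process described here, with illustrative (assumed) parameters rather than the thesis's actual model: at each step a simulated consumer picks a product with probability proportional to its intrinsic quality, the visibility of the slot it occupies, and its accumulated popularity, and a display policy then re-ranks products, e.g. by quality or by popularity.

```python
import random

def simulate_market(qualities, visibility, steps=10000, rank_by="quality"):
    """Toy cultural-market dynamics with social influence and position bias.

    qualities  : assumed intrinsic appeal of each product (illustrative values)
    visibility : position weights; visibility[0] is the most prominent slot
    rank_by    : display policy -- "quality" or "popularity"
    """
    n = len(qualities)
    purchases = [0] * n                      # social signal (download counts)
    order = list(range(n))                   # current display order

    for _ in range(steps):
        # appeal of the product in slot s: quality x position bias x (1 + popularity)
        weights = [qualities[p] * visibility[s] * (1 + purchases[p])
                   for s, p in enumerate(order)]
        total = sum(weights)
        r, acc = random.random() * total, 0.0
        for s, w in enumerate(weights):
            acc += w
            if r <= acc:
                purchases[order[s]] += 1
                break
        # re-rank products according to the display policy
        key = (lambda p: qualities[p]) if rank_by == "quality" else (lambda p: purchases[p])
        order.sort(key=key, reverse=True)

    return purchases

# e.g. five products and a steeply decaying position visibility profile
shares = simulate_market([0.9, 0.7, 0.5, 0.3, 0.1], [1.0, 0.6, 0.4, 0.25, 0.15])
```

In this toy model, ranking by quality tends to channel the social signal towards the best product, echoing the asymptotic monopoly result described above.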
  • Item
    Simulation of whole mammalian kidneys using complex networks
    Gale, Thomas ( 2016)
Modelling of kidney physiology can contribute to understanding of kidney function by formalising existing knowledge into mathematical equations and computational procedures. Modelling in this way can suggest further research or stimulate theoretical development. The quantitative description provided by the model can then be used to make predictions and identify further areas for experimental or theoretical research, which can then be carried out, focusing on areas where the model and reality differ and creating an iterative process of improved understanding. Better understanding of organ function can contribute to the prevention and treatment of disease, as well as to efforts to engineer artificial organs. Existing research in the area of kidney modelling generally falls into one of three categories:
• Morphological and anatomical models that describe the form and structure of the kidney
• Tubule and nephron physiological models that describe the function of small internal parts of the kidney
• Whole kidney physiological models that describe aggregate function but without any internal detail
There is little overlap or connection between these categories of kidney models as they currently exist. This thesis brings together these three types of kidney models by computer-generating an anatomical model using data from rat kidneys, simulating dynamics and interactions using the resulting whole rat kidney model with explicit representation of each nephron, and comparing the simulation results against physiological data from rats. This thesis also describes methods for simulation and analysis of the physiological model using high performance computer hardware. In unifying the three types of models above, this thesis makes the following contributions:
• Development of methods for automated construction of anatomical models of arteries, nephrons and capillaries based on rat kidneys. These methods produce a combined network and three-dimensional Euclidean space model of kidney anatomy.
• Extension of complex network kidney models to include modelling of blood flow in an arterial network and modelling of vascular coupling communication between nephrons using the same arterial network.
• Development of methods for simulation of kidney models on high performance computer hardware, and for storage and analysis of the resulting data. The methods used include multithreaded parallel computation and GPU hardware acceleration.
• Analysis of results from whole kidney simulations explicitly modelling all nephrons in a rat kidney, including comparison with animal data at both the whole organ and nephron levels. Analysis methods that bring together the three-dimensional Euclidean space representation of anatomy with the complex network used for simulation are developed and applied.
• Demonstration that the computational methods presented are able to scale up to the number of nephrons found in human kidneys.
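As a rough illustration (not the thesis code) of the combined network and anatomical representation, the sketch below builds a toy arterial tree as a directed graph and splits renal-artery inflow at each bifurcation in proportion to the number of nephrons each subtree supplies; the tree shape, counts, and flow rule are all illustrative assumptions.

```python
import networkx as nx

def build_toy_arterial_tree(depth=3):
    """Binary arterial tree; leaves stand in for afferent arterioles feeding one nephron each."""
    g = nx.DiGraph()
    def grow(node, d):
        if d == 0:
            g.nodes[node]["nephrons"] = 1
            return
        for child in (node * 2 + 1, node * 2 + 2):
            g.add_edge(node, child)
            grow(child, d - 1)
    g.add_node(0)
    grow(0, depth)
    return g

def nephron_counts(g, node):
    """Number of nephrons supplied by the subtree rooted at `node`."""
    kids = list(g.successors(node))
    if not kids:
        return g.nodes[node].get("nephrons", 0)
    total = sum(nephron_counts(g, k) for k in kids)
    g.nodes[node]["nephrons"] = total
    return total

def distribute_flow(g, root=0, inflow=1.0):
    """Split inflow at each bifurcation in proportion to downstream nephron counts."""
    nephron_counts(g, root)
    g.nodes[root]["flow"] = inflow
    for node in nx.topological_sort(g):
        kids = list(g.successors(node))
        if not kids:
            continue
        total = sum(g.nodes[k]["nephrons"] for k in kids)
        for k in kids:
            g.nodes[k]["flow"] = g.nodes[node]["flow"] * g.nodes[k]["nephrons"] / total
    return g

tree = distribute_flow(build_toy_arterial_tree())
```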
  • Item
    Machine learning for feedback in massive open online courses
He, Jizheng (2016)
Massive Open Online Courses (MOOCs) have received widespread attention for their potential to scale higher education, with multiple platforms such as Coursera, edX and Udacity appearing in recent years. Online courses from elite universities around the world are offered for free, so that anyone with internet access can learn anywhere. Enormous enrolments and great diversity of students have been widely observed in MOOCs. Despite their popularity, MOOCs are limited in reaching their full potential by a number of issues. One of the major problems is their notoriously low completion rates. A number of studies have focused on identifying the factors leading to this problem; one such factor is the lack of interactivity and support. There is broad agreement in the literature that interaction and communication play an important role in improving student learning. Interaction in MOOCs has been shown to help students ease their feelings of isolation and frustration, develop their own knowledge, and improve their learning experience. A natural way of improving interactivity is providing feedback to students on their progress and problems. MOOCs give rise to vast amounts of student engagement data, bringing opportunities to gain insights into student learning and provide feedback. This thesis focuses on applying and designing new machine learning algorithms to assist instructors in providing student feedback. In particular, we investigate three main themes: i) identifying at-risk students who may not complete courses, as a step towards timely intervention; ii) exploring the suitability of automatically discovered forum topics as instruments for modelling students' ability; iii) similarity search in heterogeneous information networks. The first theme can help instructors design interventions for at-risk students to improve retention. The second theme is inspired by recent research on the measurement of student learning in the education research community. Educators have explored using latent, complex patterns of engagement, instead of traditional visible assessment tools (e.g. quizzes and assignments), to measure a hypothesised distinctive and complex learning skill that promotes learning in MOOCs. This process is often human-intensive and time-consuming. Inspired by this research, and by the importance of MOOC discussion forums for understanding student learning and providing feedback, we investigate whether students' participation across forum discussion topics can indicate their academic ability. The third theme is a generic study of utilising the rich semantic information in heterogeneous information networks to help find similar objects. MOOCs contain diverse and complex student engagement data, a typical example of a heterogeneous information network, and so could benefit from this study. We make the following contributions to solving the above problems. Firstly, we propose transfer learning algorithms based on regularised logistic regression to identify, week by week, students who are at risk of not completing a course. The predicted probabilities, which are well calibrated and smoothed, can be used not only for the identification of at-risk students but also for subsequent interventions. We envision an intervention that presents the probability of success or failure to borderline students, with the hypothesis that they can be motivated by being classified as "nearly there".
Secondly, we combine topic models with measurement models to discover topics from students' online forum postings. The topics are constrained to fit measurement models as statistical evidence of instruments for measuring student ability. In particular, we focus on two measurement models, the Guttman scale and the Rasch model. To the best of our knowledge, this is the first study to explore the suitability of using topics discovered from MOOC forum content as instruments for measuring student ability by combining topic models with psychometric measurement models in this way. Furthermore, these scaled topics imply a range of difficulty levels, which can be useful for monitoring the health of a course, refining curricula and student assessment, and providing personalised feedback based on student ability and topic difficulty. Thirdly, we extend an existing meta path-based similarity measure by incorporating transitive similarity and temporal dynamics in heterogeneous information networks, evaluated on the DBLP bibliographic network. The proposed similarity measure could be applied in MOOC settings to find similar students or threads, or to recommend threads in MOOC forums, by modelling student interactions in MOOC forums as a heterogeneous information network.
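A minimal sketch of the classifier underlying the first theme, using scikit-learn's regularised logistic regression on hypothetical weekly engagement features; the transfer learning across courses and the calibration/smoothing steps described above are not reproduced here. The probability output, rather than a hard label, is what would be surfaced to borderline students.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical weekly engagement features per student:
# [lectures_viewed, quiz_attempts, forum_posts]
X_train = np.array([[5, 2, 3], [0, 0, 0], [4, 1, 0], [1, 0, 1], [6, 3, 5], [2, 0, 0]])
y_train = np.array([1, 0, 1, 0, 1, 0])   # 1 = completed the course, 0 = dropped out

# L2-regularised logistic regression; C controls the regularisation strength
clf = LogisticRegression(penalty="l2", C=1.0).fit(X_train, y_train)

# Predicted completion probabilities flag at-risk ("nearly there") students for intervention
X_this_week = np.array([[3, 1, 0], [0, 1, 0]])
p_complete = clf.predict_proba(X_this_week)[:, 1]
at_risk = p_complete < 0.5
```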
  • Item
    Application of automated feedback for the improvement of data quality in web-based clinical collaborations
    Glöckner, Stephan ( 2016)
Background: Biomedical research typically relies on data collected from patients in clinical settings. This is currently a fraught process due to the diversity and heterogeneity of data management systems, the numerous data standards and the sensitivities around the access to and sharing of such data. To tackle this, international biomedical registries are often established and targeted to specific diseases and communities. The quality of the data in such registries is essential to ensure that clinical research findings can be translated into clinical care. However, at present clinical data management systems developed for biomedical research rarely perform quality assurance procedures during ongoing data collection. Similarly, clinical trials typically perform data quality assessment at the end of the trial. This is too late. We argue that data quality assurance procedures for cost reduction and data process improvement have to be implemented as an integral and ongoing part of disease registries and the data they are used to collect. Such an approach requires that all aspects of the data collection effort be considered, including the intrinsic and extrinsic motivational factors of data entry personnel and the organisations in which they work. Technical solutions that encourage better behaviour and hence improve data quality are thus encouraged.
Hypothesis: The web-based interactions between data entry users and data management systems can be used to improve data quality. Leveraging the technological advances of web-based registries, new feedback mechanisms can be used to improve the overall quality of the data captured by registries. This should lead to streamlined and improved data capture methods that support the users and ultimately benefit clinical research more generally. This thesis proposes that web-based data quality feedback can motivate registry data entry personnel, increase their contributions and ultimately improve the quality of registry data and its (re-)use to support clinical trials.
Methods: To explore the causes of low data quality and user motivation, a survey and an assessment of quality indicators in a multicentre clinical setting were performed. Based on this, we developed and evaluated a stage-wise framework for web-based feedback and measured data quality trends, including the factors that can impact user motivation in data entry. This was explored in the International Niemann-Pick Disease Registry (INPDR) and two major international clinical trials associated with the European Network for the Study of Adrenal Tumours (ENSAT). We also considered the role of patients in data collection through mobile applications supporting data collection within the context of the Environmental Determinants of Islet Auto-immunity (ENDIA) clinical study.
Results: Researchers are motivated when they see the contribution resulting from their data entry and the improvement in the treatment of patients. The results of the survey and the framework evaluation highlight the effectiveness of web-based automated data quality feedback. It was found that data quality feedback to researchers and the research community improves data quality. Case studies showed an increase in data quality over the period of observation of this research, noting that these studies are still ongoing. The stage-wise framework for evaluating data entry user behaviour after feedback was applied to one trial, which showed that feedback encouraged users to enter both more and higher quality data.
Conclusions: Recent literature confirms the need for data quality feedback as an ongoing and near-real-time activity associated with data capture. Centralised data monitoring requires a general framework that can be adjusted for a variety of trials and studies. The proposed stage-wise research method should be improved to measure the outcome of data quality feedback against a control group and/or against known benchmarks where they exist. Data quality dimensions need to be adapted to the research interests of each study. In the age of big data and mobile health, further research is needed on the upcoming challenges of data trustworthiness and record eligibility to tackle current and future research objectives. The findings highlight how biomedical research registries have to be designed with a focus on data quality and feedback mechanisms.
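As a rough sketch of the kind of automated, ongoing quality indicator such feedback could be built on (the actual INPDR/ENSAT indicators are not reproduced here), the snippet below computes per-centre completeness and plausibility rates for hypothetical registry records; the required fields and plausibility ranges are illustrative assumptions.

```python
from datetime import date

# Hypothetical registry records; None marks a missing value
records = [
    {"centre": "A", "dob": date(1990, 4, 2), "weight_kg": 72.0},
    {"centre": "A", "dob": None,             "weight_kg": 430.0},   # missing + implausible
    {"centre": "B", "dob": date(2001, 9, 8), "weight_kg": 58.5},
]

REQUIRED = ["dob", "weight_kg"]
PLAUSIBLE = {"weight_kg": (0.3, 350.0)}   # assumed plausibility range

def quality_report(records):
    """Per-centre completeness (no missing required fields) and plausibility rates."""
    report = {}
    for rec in records:
        stats = report.setdefault(rec["centre"], {"n": 0, "complete": 0, "plausible": 0})
        stats["n"] += 1
        stats["complete"] += all(rec.get(f) is not None for f in REQUIRED)
        stats["plausible"] += all(
            rec.get(f) is None or lo <= rec[f] <= hi for f, (lo, hi) in PLAUSIBLE.items()
        )
    return {c: {"completeness": s["complete"] / s["n"], "plausibility": s["plausible"] / s["n"]}
            for c, s in report.items()}

print(quality_report(records))   # feedback that could be shown to data entry users per centre
```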
  • Item
    Scheduling and rostering with learning constraint solvers
    Downing, Nicholas Ronald ( 2016)
In this research we investigate using Constraint Programming (CP) with Lazy Clause Generation (LCG), that is, constraint solvers with nogood learning, to tackle a number of well-known scheduling, rostering, and planning problems. We extend a number of CP constraints to be usable in an LCG solver, and we investigate alternative search strategies that use unsatisfiable cores, i.e. the reasons for failure returned by the LCG solver, to guide the search for optimal solutions. We give comprehensive analysis and experiments. These experiments show that while adding more and more sophisticated constraint propagators to LCG delivers diminishing returns, unsatisfiable-core optimization, which leverages the infrastructure provided by LCG, can deliver significant benefits that are unavailable in CP without LCG. Overall, we demonstrate that LCG is a highly competitive technology for solving realistic industrial scheduling and rostering problems to optimality, although the problems are smaller than those tackled by competing algorithms that do not prove optimality.
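A minimal sketch of assumption-based unsatisfiable-core extraction of the kind a learning solver exposes, here using the (assumed available) python-sat package and a simple disjoint-cores lower bound over soft selector literals; the thesis's actual unsatisfiable-core search strategies are considerably more sophisticated.

```python
from pysat.solvers import Glucose3

def disjoint_core_lower_bound(hard_clauses, soft_clauses):
    """Lower bound on the number of violated soft clauses via disjoint unsatisfiable cores."""
    solver = Glucose3()
    for cl in hard_clauses:
        solver.add_clause(cl)

    # attach a fresh selector literal s_i to each soft clause (clause OR -s_i)
    top = max((abs(l) for cl in hard_clauses + soft_clauses for l in cl), default=0)
    selectors = []
    for i, cl in enumerate(soft_clauses):
        s = top + 1 + i
        solver.add_clause(cl + [-s])
        selectors.append(s)

    active, lower_bound = set(selectors), 0
    while not solver.solve(assumptions=sorted(active)):
        core = set(solver.get_core())        # selectors involved in the conflict
        active -= core                       # drop them so the next core is disjoint
        lower_bound += 1
    solver.delete()
    return lower_bound

# toy instance: hard clause (x1 or x2); soft unit clauses preferring -x1, -x2, x3
print(disjoint_core_lower_bound([[1, 2]], [[-1], [-2], [3]]))   # -> 1
```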
  • Item
    Inferring sensitive information from seemingly innocuous smartphone data
    Quattrone, Anthony ( 2016)
Smartphones have become ubiquitous and provide considerable benefits for users. Personal data considered both sensitive and non-sensitive is commonly stored on smartphones. It has been established that the use of smartphones can lead to users enjoying less privacy when a data leak occurs. In this thesis, we aim to determine whether non-sensitive data stored on a smartphone can be analyzed to derive sensitive information. We also aim to develop methods for protecting smartphone users from privacy attacks. While privacy research is an active area, more work is needed to determine the types of inferences that can be made about a person and how accurate a profile is derivable from smartphone data. We demonstrate how straightforward it is for third-party app developers to embed code in inconspicuous apps to capture and mine data. Our studies show that a large number of apps found in popular app stores commonly request special app permissions in order to gain access to sensitive data. In most cases, we could not find any functional benefit in exchange for accessing the sensitive data. Additional data that is not local to the device but stored on a social networking service can also be extracted unnoticed via mobile apps, provided the user has social networking apps installed. Current research shows that users do not easily comprehend the implications of granting apps access to sensitive permissions. Apps can easily transfer captured data to cloud services via high-speed wireless networks. This is difficult for users to detect since mobile platforms do not provide alerts when it occurs. With access to sensitive mobile data, we developed and performed a number of case studies to learn details about an individual. Modern smartphones are capable of sending continuous location updates to services, providing a near real-time proxy of where a user is located. We were able to determine where a user was traveling simply by analyzing the point-of-interest results from continuous location queries sent to a location-based service, without referencing user location data. With continuous location data, we were able to determine personal encounters between individuals and their relationships in real time. We also found that even diagnostic data commonly used to debug apps, which appears to be anonymous, is useful in identifying an individual. Mobile devices disclose the indoor location of the user indirectly via the wireless signals emitted by the device. We were able to locate users indoors to within 1 m accuracy by analyzing Bluetooth beacons, using a localization scheme we developed. With the understanding that mobile data is sometimes needed by apps to provide functionality, we developed a system called PrivacyPalisade designed to protect users from revealing sensitive information. The system detects when apps request permissions that are uncharacteristic for the app's category. Overall we found that smartphones store seemingly non-sensitive data that can reveal sensitive information upon further inspection, and that this information is easily accessible. In turn, an analyst can use this data to build a detailed individual profile of private and sensitive information. With a growing number of users expressing privacy concerns, techniques to better protect privacy are needed to allow manufacturers to meet their users' privacy requirements. The protection methods demonstrated in PrivacyPalisade can be adopted to make smartphone platforms more privacy aware.
Thesis contributions: In this thesis, we show how sensitive information can be inferred from seemingly innocuous data and propose a protection system, by performing the following:
* Provide a comprehensive literature review of the current state of privacy research and how it relates to the use of smartphones.
* Demonstrate an inference attack on trajectory privacy by reconstructing a route using only query results.
* Develop an algorithm that combines range-based and range-free localization schemes to perform indoor localization using Bluetooth with accuracy of up to 1 m (see the sketch below).
* Analyze diagnostic data commonly sent by smartphones and use it to identify users in a dataset with accuracy of up to 97%.
* Develop an algorithm to infer potential encounters between smartphone users in real time by proposing the use of a constraint nearest neighbor (c-NN) spatial query.
* Develop and demonstrate PrivacyPalisade, a system built for Android with the aim of protecting against privacy attacks.
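A rough sketch of the range-based half of such an indoor localization scheme (the thesis combines range-based and range-free methods): convert Bluetooth RSSI to distance with a log-distance path-loss model and solve a least-squares trilateration over known beacon positions. The path-loss constants and beacon data below are illustrative assumptions.

```python
import numpy as np

def rssi_to_distance(rssi, tx_power=-59.0, path_loss_exponent=2.0):
    """Log-distance path-loss model; tx_power is the assumed RSSI at 1 m."""
    return 10 ** ((tx_power - rssi) / (10 * path_loss_exponent))

def trilaterate(beacons, distances):
    """Least-squares position estimate from beacon coordinates and range estimates.

    Linearises the circle equations by subtracting the last beacon's equation.
    """
    beacons = np.asarray(beacons, float)
    distances = np.asarray(distances, float)
    A = 2 * (beacons[:-1] - beacons[-1])
    b = (distances[-1] ** 2 - distances[:-1] ** 2
         + np.sum(beacons[:-1] ** 2, axis=1) - np.sum(beacons[-1] ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# three beacons at known indoor positions (metres) and observed RSSI values
beacons = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
rssi = [-65.0, -70.0, -62.0]
print(trilaterate(beacons, [rssi_to_distance(r) for r in rssi]))
```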
  • Item
    Natural optimisation modelling for software developers
    Francis, Kathryn Glenn ( 2016)
Constraint solving technology has been successfully applied to large industrial combinatorial optimisation problems with high impact, but is not widely applied to smaller problems. Unfortunately, including even basic optimisation or constraint satisfaction functionality in an application currently requires the use of an entirely separate paradigm with which most software developers are not familiar. The aim of this thesis is to demonstrate the potential of an interface to a constraint solver that is much more approachable for general software developers, in order to promote wider use of the technology. Instead of defining a conventional constraint model directly, programmers use native code (in the application programming language) to define how to combine individual decisions to construct a candidate solution, and how to evaluate the result (see the sketch below). Constraint satisfaction or optimisation is then seamlessly integrated into the wider application through automatic conversion between this definition and a conventional model solved by an external solver. We call this a native programming language interface. This thesis presents a prototype implementation of a native Java interface to a finite domain constraint solver, before exploring several ideas for improving the automated conversion from procedural code into an equivalent constraint model. This conversion process has already been studied in the context of software engineering applications, and the improvements discussed here should be transferable back to that domain. The following new techniques are presented.
1. A novel query-based approach to handling destructive assignments, designed to improve the translation particularly when aliasing between variables is allowed.
2. An alternative technique (loop untangling) for creating copies of loop bodies in such a way that the uncertainty within the loop body is reduced, at the expense of a greater number of copies and an unknown execution order.
3. A new global constraint (reaching definitions) generalising the query-based translation technique to eliminate the assumption of a known order of execution for assignments (for use with loop untangling), and to achieve stronger deduction in some cases.
To support these new techniques, two further contributions are included in the thesis.
1. A study of the circuit constraint, exploring how this constraint can be extended to the subcircuit/subpath case, as required to constrain the execution path when using loop untangling, and how both versions can be implemented in a lazy clause generation constraint solver.
2. As part of the implementation of the reaching definitions constraint, a discussion of how optional variables (solver variables that are allowed to take no value) can be implemented in a lazy clause generation solver.
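A minimal sketch of the "native interface" idea, in Python rather than the thesis's Java prototype: the programmer writes ordinary code that makes decisions through a chooser object and returns a score, and a backend searches over those decisions. Here naive enumeration stands in for the thesis's automatic translation to a constraint model; the knapsack example is purely illustrative.

```python
import itertools

class Chooser:
    """Replays one fixed sequence of decisions into ordinary application code."""
    def __init__(self, decisions):
        self._decisions = iter(decisions)
    def choose(self, options):
        return options[next(self._decisions)]

def optimise(solution_builder, domains):
    """Naive backend: enumerate every decision sequence. A real backend would
    instead translate `solution_builder` into a constraint model for a solver."""
    best, best_score = None, float("-inf")
    for decisions in itertools.product(*[range(len(d)) for d in domains]):
        candidate, score = solution_builder(Chooser(decisions))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# The developer writes plain procedural code: pack items into a knapsack of capacity 10.
ITEMS = [("tent", 8, 30), ("stove", 4, 18), ("lamp", 3, 10)]   # (name, weight, value)

def build(chooser):
    picked = [item for item in ITEMS if chooser.choose([False, True])]
    weight = sum(w for _, w, _ in picked)
    value = sum(v for _, _, v in picked)
    return picked, (value if weight <= 10 else -1)   # infeasible candidates score -1

print(optimise(build, [[False, True]] * len(ITEMS)))
```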
  • Item
    Auto-scaling and deployment of web applications in distributed computing clouds
    Qu, Chenhao ( 2016)
Cloud Computing, which allows users to acquire and release resources based on real-time demand from large data centers in a pay-as-you-go model, has attracted considerable attention from the ICT industry. Many web application providers have moved or plan to move their applications to the Cloud, as it enables them to focus on their core business by freeing them from the task and cost of managing their own data center infrastructure, which is often over-provisioned or under-provisioned under a dynamic workload. Applications these days commonly serve customers from geographically dispersed regions. Therefore, to meet stringent Quality of Service (QoS) requirements, they have to be deployed in multiple data centers close to end-customer locations. However, efficiently utilizing Cloud resources to reach high cost-efficiency, low network latency, and high availability is a challenging task for web application providers, especially when the provider intends to deploy the application in multiple geographically distributed Cloud data centers. The problems of how to identify satisfactory Cloud offerings, how to choose the geographical locations of data centers so that network latency is minimized, how to provision the application at minimum cost, and how to guarantee high availability under failures and flash crowds must all be addressed to enable QoS-aware and cost-efficient utilization of Cloud resources. In this thesis, we investigate techniques and solutions for these questions to help application providers efficiently manage the deployment and provisioning of their applications in distributed computing Clouds. The thesis extends the state of the art by making the following contributions:
1. A hierarchical fuzzy inference approach for identifying satisfactory Cloud services according to individual requirements.
2. Algorithms for selecting multi-Cloud data centers and deploying applications on them to minimize Service Level Objective (SLO) violations for web applications requiring strong consistency.
3. An auto-scaler for web applications that achieves both high availability and significant cost savings by using heterogeneous spot instances (see the sketch below).
4. An approach that mitigates the impact of short-term application overload caused by either resource failures or flash crowds in any individual data center through geographical load balancing.
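A rough sketch of the kind of scale-out decision contribution 3 builds on; the thesis's actual auto-scaler accounts for spot terminations and availability targets, whereas the thresholds, instance types, and per-type cap below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    capacity_rps: float      # requests per second one instance can serve (assumed)
    spot_price: float        # assumed current spot price, $/hour

def scale_out_plan(demand_rps, headroom, instance_types):
    """Greedily pick the cheapest-per-capacity instances to cover demand plus headroom.

    Spreading across several spot types limits the impact of any one spot market spiking.
    """
    target = demand_rps * (1 + headroom)
    ranked = sorted(instance_types, key=lambda t: t.spot_price / t.capacity_rps)
    plan, covered = {}, 0.0
    for t in ranked:
        while covered < target:
            plan[t.name] = plan.get(t.name, 0) + 1
            covered += t.capacity_rps
            if plan[t.name] >= 3:        # cap per type to stay heterogeneous
                break
        if covered >= target:
            break
    return plan

types = [InstanceType("m4.large", 400, 0.03), InstanceType("c4.xlarge", 900, 0.08)]
print(scale_out_plan(demand_rps=2500, headroom=0.2, instance_types=types))
```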
  • Item
    Algorithms for advanced path optimization problems
    Aljubayrin, Saad ( 2016)
With the ever-increasing popularity of smartphones equipped with a Global Positioning System (GPS), people tend to use GPS-based applications to assist them in reaching their destinations. Different people can have different optimization criteria in path finding. This thesis contributes to improving current navigation systems by studying and solving three new path finding problems.
Finding Lowest-Cost Paths in Settings with Safe and Preferred Zones Problem: Given a set of safe or preferred zones with zero or low cost, this problem finds paths that minimize the cost of travel from an origin to a destination. A life-critical application of this problem is navigating through scattered populated areas (safe zones) in hazardous environments such as deserts. In a more familiar scenario, a tourist who plans to walk to a given destination may prefer a path that visits interesting streets and blocks (preferred zones), e.g. those with interesting houses, galleries, or other sights, as much as possible. We solved this problem by proposing an algorithm that utilizes the properties of hyperbolas to elegantly describe a sparsely connected graph of safe (preferred) zones.
Skyline Trips of Multiple POIs Categories Problem: Given a road network with a set of Points of Interest (POIs) from different categories, a list of items the user is planning to purchase and a pricing function for the items at each relevant POI, this problem finds the skyline trips in terms of both trip length and aggregated trip cost. This problem has important applications in everyday life: it helps people choose the most suitable trip among the skyline trips based on two dimensions, total trip length and aggregated trip cost. We proposed a framework and two effective algorithms that solve the problem efficiently in real time and produce near-optimal results when tested on real datasets (see the sketch below).
Finding Non-Dominated Paths in Uncertain Road Networks Problem: Given a source and a destination, this problem finds optimal, non-dominated paths connecting the source and the destination, where optimality is defined in terms of stochastic dominance among the cost distributions of paths. This algorithm helps users choose the most suitable paths based on their personal timing preferences. We design an A*-based framework and propose a three-stage dominance examination method that employs extreme values in each candidate path's cost distribution for early detection of dominated paths.
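A minimal sketch of the skyline (Pareto-front) filtering step behind the second problem: given candidate trips, each described by total length and aggregated purchase cost (both to be minimised), keep only trips not dominated by another trip. The candidate trips are assumed inputs here; generating them efficiently over the road network is the hard part the thesis addresses.

```python
def skyline(trips):
    """Return trips not dominated in (length, cost); smaller is better in both dimensions."""
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])
    return [t for t in trips if not any(dominates(o, t) for o in trips if o is not t)]

# hypothetical candidate trips: (total length in km, aggregated purchase cost in $)
candidates = [(12.0, 45.0), (9.5, 60.0), (15.0, 40.0), (13.0, 50.0), (9.5, 58.0)]
print(skyline(candidates))   # -> [(12.0, 45.0), (15.0, 40.0), (9.5, 58.0)]
```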
  • Item
    Energy-efficient management of resources in container-based clouds
    Fotuhi Piraghaj, Sareh ( 2016)
Cloud computing enables access to a shared pool of virtual resources through the Internet, and its adoption rate is increasing because of its high availability, scalability and cost effectiveness. However, cloud data centers are among the fastest-growing energy consumers, and half of their energy consumption is wasted, mostly because of inefficient allocation of server resources. Therefore, this thesis focuses on software-level energy management techniques that are applicable to containerized cloud environments. Containerized clouds are studied because containers are rapidly gaining popularity and are set to become a major deployment model in cloud environments. The main objective of this thesis is to propose an architecture and algorithms to minimize data center energy consumption while maintaining the required Quality of Service (QoS). The objective is addressed through improvements in resource utilization at both the server and virtual machine levels. We investigated two avenues for minimizing energy consumption in a containerized cloud environment, namely VM sizing and container consolidation. The key contributions of this thesis are as follows:
1. A taxonomy and survey of energy-efficient resource management techniques in PaaS and CaaS environments.
2. A novel architecture for virtual machine customization and task mapping in a containerized cloud environment.
3. An efficient VM sizing technique for hosting containers and an investigation of the impact of workload characterization on the efficiency of the determined VM sizes.
4. The design and implementation of a simulation toolkit that enables modeling of containerized cloud environments.
5. A framework for dynamic consolidation of containers and a novel correlation-aware container consolidation algorithm (see the sketch below).
6. A detailed comparison of the energy efficiency of container consolidation algorithms with traditional virtual machine consolidation for containerized cloud environments.
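A rough sketch of the intuition behind contribution 5 (the thesis's algorithm, metrics, and thresholds differ): when consolidating containers onto hosts, prefer a placement where the new container's CPU usage history is weakly correlated with the host's existing load, so that usage peaks are less likely to coincide. The usage traces and capacity below are illustrative assumptions.

```python
import numpy as np

def place_container(container_usage, hosts, capacity=100.0):
    """Pick the host whose aggregate CPU-usage history is least correlated with the container.

    container_usage : 1-D array of recent CPU usage samples (assumed percentage units)
    hosts           : list of lists of usage arrays for containers already on each host
    """
    best_host, best_corr = None, float("inf")
    for i, host in enumerate(hosts):
        aggregate = np.sum(host, axis=0)
        if np.max(aggregate + container_usage) > capacity:
            continue                                   # would overload this host at peak
        corr = np.corrcoef(aggregate, container_usage)[0, 1]
        if corr < best_corr:
            best_host, best_corr = i, corr
    return best_host

# two hosts with hypothetical usage histories over 6 samples
hosts = [
    [np.array([30, 40, 55, 60, 45, 30.0])],                        # peaks mid-window
    [np.array([20, 20, 25, 20, 25, 20.0]), np.array([10.0] * 6)],  # flat load
]
new_container = np.array([10, 15, 30, 35, 20, 10.0])               # also peaks mid-window
print(place_container(new_container, hosts))                        # expected: host 1
```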